Planet MariaDB

June 23, 2018

Valeriy Kravchuk

On Partitioning in MySQL

Back in April I was preparing for vacations that my wife and I planned to spend in the UK. Among other things, I wanted to visit a customer's office in London and discuss a few MySQL and MariaDB related topics, let's call them "stories". I tried to prepare myself for the discussion and collected a list of known active bugs (what else could I do as a MySQL entomologist) for each of them. Surely a live discussion was not a suitable place to share lists of bugs (and for some "stories" they were long), so I promised to share them later, in my blog. The time to do what I promised has finally come!

One of the stories we briefly discussed was the "partitioning story". Right now I can immediately identify at least 47 active MySQL bugs in the related category. While preparing, I checked the same list and picked up 15 or so bug reports that were meant to illustrate my points. Let me share them here in no specific order, and add a few more.
In April the latest still active partitioning bug reported by the MySQL community was Bug #88916 - "Assertion `table->s->db_create_options == part_table->s->db_create_options'", from my colleague Elena Stepanova. Note the very simple test case that leads to an assertion failure in debug builds; it was immediately verified.

Recently two more bugs were reported. The reporter of Bug #91190 - "DROP PARTITION and REORGANIZE PARTITION are slow" suspects a performance regression in MySQL 8.0.11. I've subscribed to this bug and am following the progress carefully. Same with Bug #91203 - "For partitions table, deal with NULL with is mismatch with reference guide". I think what happens with a NULL value and range partitioning perfectly matches the manual, but the fact that the INFORMATION_SCHEMA.PARTITIONS table may return wrong information after dropping the partition with the NULL value is somewhat unexpected.
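
For reference, here is a quick sketch (database, table and partition names are made up) of what the manual says about NULL handling with RANGE partitioning, namely that the NULL row goes to the lowest partition, and how INFORMATION_SCHEMA.PARTITIONS can be checked afterwards:

mysql -e "CREATE DATABASE IF NOT EXISTS test;"
mysql -e "CREATE TABLE test.tr (c1 INT) PARTITION BY RANGE (c1)
          (PARTITION p0 VALUES LESS THAN (10),
           PARTITION p1 VALUES LESS THAN MAXVALUE);"
mysql -e "INSERT INTO test.tr VALUES (NULL), (5), (15);"
# The NULL row should be counted in p0; note that TABLE_ROWS is only an estimate for InnoDB.
mysql -e "SELECT PARTITION_NAME, TABLE_ROWS FROM INFORMATION_SCHEMA.PARTITIONS
          WHERE TABLE_SCHEMA='test' AND TABLE_NAME='tr';"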

Now back to the original lists for the "story" I prepared in April:
  • Bug #60023 - "No Loose Index Scan for GROUP BY / DISTINCT on InnoDB partitioned table". It was reported by Rene' Cannao', and since 2013 I have strongly suspected that it's fixed in MySQL 5.6+ or, as noted in another comment, that it may depend on statistics being properly collected for the table. Still, the status remains "Verified".
  • Bug #78164 - "alter table command affect partitioned table data directory". Your custom DATA DIRECTORY settings may get lost when ALTER is applied to the whole table. Quick test shows that at least in MariaDB 10.3.7 this is no longer the case. The bug is still "Verified".
  • Bug #85126 - "Delete by range in presence of partitioning and no PK always picks wrong index". It was reported by Riccardo Pizzi 16 months ago, immediately verified (without explicit list of versions affected, by the way). One more case when ordering of indexes in CREATE TABLE may matter...
  • Bug #81712 - "lower_case_table_names=2 ignored on ADD PARTITION on Windows". Who cares about Windows these days?
  • Bug #84356 - "General tablespace table encryption". It seems partitioning allows one to overcome a documented limitation. If this is intended, then the manual is wrong; otherwise I suspect a lack of careful testing of the integration of partitioning with other features.
  • Bug #88673 - "Regression CREATE TBL from 5.7.17 to 20 (part #1: innodb_file_per_table = ON)." I've probably mentioned this bug reported by Jean-François Gagné in more than one blog post already. Take care and do not use long partition names.
  • Bug #85413 - "Failing to rename a column involved in partition". As simple as it sounds, and it still happens.
  • Bug #83435 - "ALTER TABLE is very slow when using PARTITIONED table". It was reported by Roel Van de Paar back in 2016 and still remains "Verified".
  • Bug #73084 - "Exchanging partitions defined with DATA DIRECTORY and INDEX DIRECTORY options". The bug still remains "Open" (see Bug #77772 also).
  • Bug #73648 - "innodb table replication is very slow with some of the partitioned table". It seems to be fixed last year as internal Bug #25687813 (see release notes for 5.6.38), but nobody cares to find this older duplicate and change its status or re-verify it.
  • Bug #83750 - "Import via TTS of a partitioned table only uses 1 cpu core". This feature requested by Daniël van Eeden makes a lot of sense. I truly hope to see parallel operations implemented for partitioned tables in GA MySQL versions (I saw some parallel processing for partitions done for some upcoming "6.1" or so version back in 2008 in Riga, during the last MySQL company meeting I attended).
  • Bug #64498 - "Running out of file handles when ALTERing partitioned MyISAM table". Too many file handles are needed. This is a documented limitation that DBAs should still take into account.
I also prepared a separate small list of partition pruning bugs:
  • Bug #83248 - "Partition pruning is not working with LEFT JOIN". I reported it back in 2016 and it is still not fixed. There are reasons to think it is not so easy. (A quick way to check whether pruning happens at all for a given query is sketched right after this list.)
  • Bug #75085 - "Partition pruning on key partitioning with ENUM". It was reported by  Daniël van Eeden back in 2014!
  • Bug #77318 - "Selects waiting on MDL when altering partitioned table". One of the worst expectations a DBA may have is that partitioned tables help to work around "global" MDL locks because of partition pruning! This is not the case.
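
As mentioned above, a quick way to check whether pruning happens for a given query is to look at the partitions column of EXPLAIN output (EXPLAIN PARTITIONS in 5.6 and older). This sketch reuses the made-up test.tr table from the earlier example:

mysql -e "EXPLAIN SELECT * FROM test.tr WHERE c1 = 5\G" | grep -i partitions

If pruning works, only p0 should be listed; seeing both p0 and p1 means all partitions would be scanned.
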
Does this story have any moral? I think so, and for me it's the following:
  1. Partitioning bugs do not get proper attention from Oracle engineers. We see bugs with the wrong status, and even a bug with a clear test case and a duplicate that has been "Open" for 4 years. Some typical use cases are affected badly, and still there are no fixes (even though since 5.7 we have native partitioning in InnoDB, and changing the implementation gave a good chance to review and either fix or re-check these bugs).
  2. MySQL DBAs should expect all kinds of surprises when running usual DDL statements (even ALTER TABLE to add a column) against partitioned tables. In the best case, DDL is just unexpectedly slow for them.
  3. Partition pruning may not work the way one expects.
  4. We miss parallel processing for partitioned tables. Partitions should allow us to speed up queries and DDL, not slow them down...
  5. One can suspect that there is no careful internal testing performed on the integration of partitioning with other features, or even on basic partition maintenance operations.

by Valeriy Kravchuk (noreply@blogger.com) at June 23, 2018 05:21 PM

June 22, 2018

Peter Zaitsev

This Week in Data with Colin Charles 43: Polyglots, Security and DataOps.Barcelona

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

This is a short working week for me due to a family emergency. It caused me to skip speaking at DataOps.Barcelona and miss hanging out with the awesome speakers and attendees. This is the first time I’ve missed a scheduled talk, and I received many messages about my absence. I am sure we will all meet again soon.

One of the talks I was planning to give at DataOps.Barcelona will be available as a Percona webinar next week: Securing Your Database Servers from External Attacks on Thursday, June 28, 2018, at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4). I am also giving a MariaDB 10.3 overview on Tuesday, June 26, 2018, at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4). I will “virtually” see you there.

If you haven’t already read Werner Vogels’ post A one size fits all database doesn’t fit anyone, I highly recommend it. It is true there is no “one size fits all” solution when it comes to databases. This is why Percona has made “the polyglot world” a theme. It’s why Amazon offers different database flavors: relational (Aurora for MySQL/PostgreSQL, RDS for MySQL/PostgreSQL/MariaDB Server), key-value (DynamoDB), document (DynamoDB), graph (Neptune), in-memory (ElastiCache for Redis & Memcached), search (Elasticsearch service). The article has a plethora of use cases, from AirBnB using Aurora, to Snapchat Stories and Tinder using DynamoDB, to Thomson Reuters using Neptune, down to McDonald’s using ElastiCache and Expedia using Elasticsearch. This kind of detail, and customer use case, is great.

There are plenty more stories and anecdotes in the post, and it validates why Percona is focused not just on MySQL, but also MariaDB, MongoDB, PostgreSQL and polyglot solutions. From a MySQL lens, it’s also worth noting that not one storage engine fits every use case. Facebook famously migrated a lot of their workload from InnoDB to MyRocks, and it is exciting to see Mark Callaghan stating that there are already three big workloads on MyRocks in production, with another two coming soon.

Releases

  • MariaDB 10.1.34 – including fixes for InnoDB defragmentation and full text search (MDEV-15824). This was from the WebScaleSQL tree, ported by KakaoTalk to MariaDB Server.
  • Percona XtraDB Cluster 5.6.40-26.25 – now with Percona Server for MySQL 5.6.40, including a new variable to configure rolling schema upgrade (RSU) wait for active commit connection timeouts.
  • Are you using the MariaDB Connector/C, Connector/J or Connector/ODBC? A slew of updates abound.

Link List

Industry Updates

Upcoming appearances

  • OSCON – Portland, Oregon, USA – July 16-19 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 43: Polyglots, Security and DataOps.Barcelona appeared first on Percona Database Performance Blog.

by Colin Charles at June 22, 2018 09:26 PM

Finding the Right Direction: MongoDB Compass – Community Version

MongoDB Compass

In this blog post, we will talk a bit about the product MongoDB Compass. This new tool comes in three main versions: Community, Enterprise and Enterprise Read Only. MongoDB Compass Community is free, but a bit limited. It allows you to connect to your MongoDB database to run queries, check query execution plans, manage indexes, and create or drop collections and databases. The paid-for versions offer some additional features such as Schema Analysis, Real Time Server Stats, and Document Validation.

We will focus on the Community version here, and look at how we can workaround its limitations using free open source software.

Of course, MongoDB 3.6 was released in November 2017 and it comes with a lot of new features. We’ve already covered those in some of our blog posts and webinars, which you might find interesting:

Using MongoDB Compass Community

The installation is very straightforward and it is available for all operating systems. We used the macOS version as an example, but the product looks the same on all supported operating systems, including Linux with a GUI.

Here are the main screens of MongoDB Compass; they are pretty self-explanatory.

Database List

Collection List

Collection content (Documents)

Query explain

Indexes

In the community version, we don’t have Real Time Server Status, Document Validation, or Schema Analysis available. I’ve left these features offered by MongoDB Compass Enterprise out of this article.

However, following the philosophy offered in an earlier blog post — why pay if open source has you covered — I’d like to demonstrate some free tools that offer the same functionality.

I should highlight that Percona doesn’t have a partnership with those companies. These examples represent my suggestions of how open source software can deliver the same functionality as enterprise versions. There are other options out there, and if you know any that you think should be here please let us know!

Schema Validation

For schema analysis, it is very likely that, behind the scenes, Compass runs something similar to Variety, an open source project started by James Cropcho and currently maintained by a few people: https://github.com/variety/variety#core-maintainers.

With this tool, users can generate reports about collections, schemas and their field types.
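
As a rough illustration, and assuming you have downloaded variety.js from that repository, a schema analysis run against a hypothetical "users" collection in the "test" database looks like this:

mongo test --eval "var collection = 'users'" variety.js

The output lists every field found in the collection, its types and how often each one occurs.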

The schema validator is a wrapper for creating collection validation rules; this blog post from MongoDB explains in detail how to create validations.
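
For completeness, here is a minimal, hypothetical validator created from the shell (database, collection and field names are examples only):

mongo mydb --eval '
  db.createCollection("contacts", {
    validator: { $and: [ { email: { $type: "string" } },
                         { status: { $in: ["active", "inactive"] } } ] }
  })'

Documents inserted into contacts that do not match the validator are then rejected (or only flagged, depending on the validationAction setting).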

Real-time Server Status

Real-time Server Status shows details about the server itself. It shows the current number of operations, memory used and network throughput. Those metrics can be gathered with open source or “homemade” scripts.

Most of the metrics are based on db.serverStatus().
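
As a rough sketch of such a "homemade" script, the following loop (interval and fields chosen arbitrarily) polls a few of those serverStatus() counters from the shell:

while true; do
  mongo --quiet --eval 'var s = db.serverStatus();
    print(new Date(), "opcounters:", JSON.stringify(s.opcounters),
          "residentMB:", s.mem.resident,
          "netIn/netOut:", s.network.bytesIn, s.network.bytesOut)'
  sleep 5
done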

We also have Percona Monitoring and Management, or PMM, which provides enterprise-grade monitoring features free of charge and monitors not only MongoDB but also MySQL and PostgreSQL. See more at https://www.percona.com/doc/percona-monitoring-and-management/index.html

However, if you didn’t like Compass there are a lot of GUI tools available to run queries with IntelliSense, and this search will reveal the most common ones.

In summary, Compass is a great tool. However, with the limitations imposed on the Community version, it is just another user-friendly client. It is up to the user to choose Compass over the other options available, and if Community Compass is your choice, I hope you found this discussion useful.

The post Finding the Right Direction: MongoDB Compass – Community Version appeared first on Percona Database Performance Blog.

by Adamo Tonete at June 22, 2018 12:48 PM

June 21, 2018

Peter Zaitsev

Lock Down: Enforcing SELinux with Percona XtraDB Cluster

SELinux for PXC security

Why do I spend time blogging about security frameworks? Because, although there are some resources available on the Web, none apply to Percona XtraDB Cluster (PXC) directly. Actually, I rarely encounter a MySQL setup where SELinux is enforced, and never when Percona XtraDB Cluster (PXC) or another Galera replication implementation is used. As we’ll see, there are good reasons for that. I originally thought this post would be a simple “how to” but it ended up with a pull request to modify the SST script and a few other surprises.

Some context

These days, with all the major security breaches of the last few years, the importance of security in IT cannot be highlighted enough. For that reason, security in MySQL has been progressively tightened from version to version and the default parameters are much more restrictive than they used to be. That’s all good, but it is only at the MySQL level: if there is still a breach allowing access to MySQL, someone could in theory do everything the mysql user is allowed to do. To prevent such a situation, the operations that mysqld can do should be limited to only what it really needs to do. SELinux’s purpose is exactly that. You’ll find SELinux on RedHat/Centos and their derived distributions. Debian, Ubuntu and OpenSuse use another framework, AppArmor, which is functionally similar to SELinux. I’ll talk about AppArmor in a future post; let’s focus for now on SELinux.

The default behavior of many DBAs and Sysadmins appears to be: “if it doesn’t work, disable SELinux”. Sure enough, it often solves the issue but it also removes an important security layer. I believe disabling SELinux is the wrong cure so let’s walk through the steps of configuring a PXC cluster with SELinux enforced.

Starting point

As a starting point, I’ll assume you have a running PXC cluster operating with SELinux in permissive mode. That likely means the file “/etc/sysconfig/selinux” looks like this:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=permissive
# SELINUXTYPE= can take one of three two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

For the purpose of writing this article, I created a 3-node PXC cluster with the hosts BlogSELinux1, BlogSELinux2 and BlogSELinux3. On BlogSELinux1, I set SELinux to permissive mode and truncated the audit.log; SELinux violations are logged in the audit.log file.

[root@BlogSELinux1 ~]# getenforce
Permissive
[root@BlogSELinux1 ~]# echo '' > /var/log/audit/audit.log

Let’s begin by covering the regular PXC operation items like start, stop, SST Donor, SST Joiner, IST Donor and IST Joiner. As we execute the steps in the list, the audit.log file will record SELinux related elements.
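
If you want to watch the violations being recorded while you run through the steps, something as simple as this works (the double space after "avc:" matches the audit.log format):

[root@BlogSELinux1 ~]# tail -f /var/log/audit/audit.log | grep --line-buffered 'avc:  denied'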

Stop and start

Those are easy:

[root@BlogSELinux1 ~]# systemctl stop mysql
[root@BlogSELinux1 ~]# systemctl start mysql

SST Donor

On BlogSELinux3:

[root@BlogSELinux3 ~]# systemctl stop mysql

then on BlogSELinux2:

[root@BlogSELinux2 ~]# systemctl stop mysql
[root@BlogSELinux2 ~]# rm -f /var/lib/mysql/grastate.dat
[root@BlogSELinux2 ~]# systemctl start mysql

SST Joiner

We have BlogSELinux1 and BlogSELinux2 up and running, we just do:

[root@BlogSELinux1 ~]# systemctl stop mysql
[root@BlogSELinux1 ~]# rm -f /var/lib/mysql/grastate.dat
[root@BlogSELinux1 ~]# systemctl start mysql

IST Donor

We have BlogSELinux1 and BlogSELinux2 up and running, we just do:

[root@BlogSELinux2 ~]# systemctl stop mysql

Then on the first node:

[root@BlogSELinux1 ~]# mysql -e 'create database test;';
[root@BlogSELinux1 ~]# mysql -e 'create table test.testtable (id int not null, primary key (id)) engine=innodb;'
[root@BlogSELinux1 ~]# mysql -e 'insert into test.testtable (id) values (1);'

Those statements put some data in the gcache, now we just restart the second node:

[root@BlogSELinux2 ~]# systemctl start mysql

IST Joiner

We have BlogSELinux1 and BlogSELinux2 up and running, we just do:

[root@BlogSELinux1 ~]# systemctl stop mysql

Then on the second node:

[root@BlogSELinux2 ~]# mysql -e 'insert into test.testtable (id) values (2);'

to insert some data in the gcache and we restart the first node:

[root@BlogSELinux1 ~]# systemctl start mysql

First run

Now that we performed the basic operations of a cluster while recording the security violations in permissive mode, we can look at the audit.log file and start building the SELinux policy. Let’s begin by installing the tools needed to manipulate the SELinux audit log and policy files with:

[root@BlogSELinux1 ~]# yum install policycoreutils-python.x86_64

Then, we’ll use the audit2allow tool to analyze the audit.log file:

[root@BlogSELinux1 ~]# grep -i denied /var/log/audit/audit.log | grep mysqld_t | audit2allow -M PXC
******************** IMPORTANT ***********************
To make this policy package active, execute:
semodule -i PXC.pp

We end up with 2 files, PXC.te and PXC.pp. The pp file is a compiled version of the human readable te file. If we examine the content of the PXC.te file, at the beginning, we have the require section listing all the involved SELinux types and classes:

module PXC 1.0;
require {
        type unconfined_t;
        type init_t;
        type auditd_t;
        type mysqld_t;
        type syslogd_t;
        type NetworkManager_t;
        type unconfined_service_t;
        type system_dbusd_t;
        type tuned_t;
        type tmp_t;
        type dhcpc_t;
        type sysctl_net_t;
        type kerberos_port_t;
        type kernel_t;
        type unreserved_port_t;
        type firewalld_t;
        type systemd_logind_t;
        type chronyd_t;
        type policykit_t;
        type udev_t;
        type mysqld_safe_t;
        type postfix_pickup_t;
        type sshd_t;
        type crond_t;
        type getty_t;
        type lvm_t;
        type postfix_qmgr_t;
        type postfix_master_t;
        class process { getattr setpgid };
        class unix_stream_socket connectto;
        class system module_request;
        class netlink_tcpdiag_socket { bind create getattr nlmsg_read setopt };
        class tcp_socket { name_bind name_connect };
        class file { getattr open read write };
        class dir search;
}

Then, using these types and classes, the policy file adds a series of generic allow rules matching the denied found in the audit.log file. Here’s what I got:

#============= mysqld_t ==============
allow mysqld_t NetworkManager_t:process getattr;
allow mysqld_t auditd_t:process getattr;
allow mysqld_t chronyd_t:process getattr;
allow mysqld_t crond_t:process getattr;
allow mysqld_t dhcpc_t:process getattr;
allow mysqld_t firewalld_t:process getattr;
allow mysqld_t getty_t:process getattr;
allow mysqld_t init_t:process getattr;
#!!!! This avc can be allowed using the boolean 'nis_enabled'
allow mysqld_t kerberos_port_t:tcp_socket name_bind;
allow mysqld_t kernel_t:process getattr;
#!!!! This avc can be allowed using the boolean 'domain_kernel_load_modules'
allow mysqld_t kernel_t:system module_request;
allow mysqld_t lvm_t:process getattr;
allow mysqld_t mysqld_safe_t:process getattr;
allow mysqld_t policykit_t:process getattr;
allow mysqld_t postfix_master_t:process getattr;
allow mysqld_t postfix_pickup_t:process getattr;
allow mysqld_t postfix_qmgr_t:process getattr;
allow mysqld_t sysctl_net_t:file { getattr open read };
allow mysqld_t syslogd_t:process getattr;
allow mysqld_t system_dbusd_t:process getattr;
allow mysqld_t systemd_logind_t:process getattr;
allow mysqld_t tuned_t:process getattr;
allow mysqld_t udev_t:process getattr;
allow mysqld_t unconfined_service_t:process getattr;
allow mysqld_t unconfined_t:process getattr;
allow mysqld_t tuned_t:process getattr;
allow mysqld_t udev_t:process getattr;
allow mysqld_t sshd_t:process getattr;
allow mysqld_t self:netlink_tcpdiag_socket { bind create getattr nlmsg_read setopt };
allow mysqld_t self:process { getattr setpgid };
#!!!! The file '/var/lib/mysql/mysql.sock' is mislabeled on your system.
#!!!! Fix with $ restorecon -R -v /var/lib/mysql/mysql.sock
#!!!! This avc can be allowed using the boolean 'daemons_enable_cluster_mode'
allow mysqld_t self:unix_stream_socket connectto;
allow mysqld_t sshd_t:process getattr;
allow mysqld_t sysctl_net_t:dir search;
allow mysqld_t sysctl_net_t:file { getattr open read };
allow mysqld_t syslogd_t:process getattr;
allow mysqld_t system_dbusd_t:process getattr;
allow mysqld_t systemd_logind_t:process getattr;
#!!!! WARNING 'mysqld_t' is not allowed to write or create to tmp_t.  Change the label to mysqld_tmp_t.
allow mysqld_t tmp_t:file write;
allow mysqld_t tuned_t:process getattr;
allow mysqld_t udev_t:process getattr;
allow mysqld_t unconfined_service_t:process getattr;
allow mysqld_t unconfined_t:process getattr;
#!!!! This avc can be allowed using one of the these booleans:
#     nis_enabled, mysql_connect_any
allow mysqld_t unreserved_port_t:tcp_socket { name_bind name_connect };

I can understand some of these rules. For example, one of the TCP ports used by Kerberos is 4444 and it is also used by PXC for the SST transfer. Similarly, MySQL needs to write to /tmp. But what about all the other rules?

Troubleshooting

We could load the PXC.pp module we got in the previous section and consider our job done. It will likely allow the PXC node to start and operate normally, but what exactly is happening? Why did MySQL, or one of its subprocesses, ask for the process attribute getattr of all the running processes like sshd, syslogd and cron? Looking directly in the audit.log file, I found many entries like these:

type=AVC msg=audit(1527792830.989:136): avc:  denied  { getattr } for  pid=3683 comm="ss"
  scontext=system_u:system_r:mysqld_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=process
type=AVC msg=audit(1527792830.990:137): avc:  denied  { getattr } for  pid=3683 comm="ss"
  scontext=system_u:system_r:mysqld_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=process
type=AVC msg=audit(1527792830.991:138): avc:  denied  { getattr } for  pid=3683 comm="ss"
  scontext=system_u:system_r:mysqld_t:s0 tcontext=system_u:system_r:syslogd_t:s0 tclass=process

So, ss, a network utility tool, scans all the processes. That rang a bell… I knew where to look: the SST script. Here’s the source of the problem in the wsrep_sst_xtrabackup-v2 file:

wait_for_listen()
{
    local HOST=$1
    local PORT=$2
    local MODULE=$3
    for i in {1..300}
    do
        ss -p state listening "( sport = :$PORT )" | grep -qE 'socat|nc' && break
        sleep 0.2
    done
    echo "ready ${HOST}:${PORT}/${MODULE}//$sst_ver"
}

This bash function is used when the node is a joiner and it checks, using ss, if the TCP port used by socat or nc is open. The check is needed in order to avoid replying too early with the “ready” message. The code is functionally correct but wrong, security-wise. Instead of looking for a socat or nc command in the list of processes owned by the mysql user, it checks if any of the processes has opened the SST port, and only then does it check if the name of the command is socat or nc. Since we don’t know which processes will be running on the server, we can’t write a good security profile. For example, in the future, one could add the ntpd daemon, causing PXC to fail to start yet again. To avoid that, the function needs to be modified like this:

wait_for_listen()
{
    local HOST=$1
    local PORT=$2
    local MODULE=$3
    for i in {1..300}
    do
        sleep 0.2
        # List only our (mysql user) processes to avoid triggering SELinux
        for cmd in $(ps -u $(id -u) -o pid,comm | sed 's/^\s*//g' | tr ' ' '|' | grep -E 'socat|nc')
        do
            pid=$(echo $cmd | cut -d'|' -f1)
            # List the sockets of the pid
            sockets=$(ls -l /proc/$pid/fd | grep socket | cut -d'[' -f2 | cut -d ']' -f1 | tr '\n' '|')
            if [[ -n $sockets ]]; then
                # Is one of these sockets listening on the SST port?
                # If so, we need to break from 2 loops
                grep -E "${sockets:0:-1}" /proc/$pid/net/tcp | \
                  grep "00000000:$(printf '%X' $PORT)" > /dev/null \
                  && break 2
            fi
        done
    done
    echo "ready ${HOST}:${PORT}/${MODULE}//$sst_ver"
}

The modified function removes many of the denied messages in the audit log file and greatly simplifies the content of PXC.te. I tested the above modification and made a pull request to PXC. Among the remaining items, we have:

allow mysqld_t self:process { getattr setpgid };

setpgid is often called after a fork to set the process group, usually through the setsid call. MySQL uses fork when it starts with the daemonize option, but our installation of Percona XtraDB Cluster uses mysqld_safe and does not directly run as a daemon. Another fork call is part of the wsrep source files and is used to launch processes like the SST script; it is done when mysqld is already running with reduced privileges. This latter invocation is certainly our culprit.
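
If you want to see for yourself which commands triggered which denials, a quick and dirty summary can be pulled from the audit.log using the comm= field of the AVC records shown above (a sketch, not a polished tool):

[root@BlogSELinux1 ~]# grep 'avc:  denied' /var/log/audit/audit.log | \
    sed -n 's/.*denied  { \([^}]*\)}.*comm="\([^"]*\)".*/\2 \1/p' | sort | uniq -c | sort -rn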

TCP ports

What about TCP ports? PXC uses quite a few. Of course there is the 3306/tcp port used to access MySQL. Galera also uses the ports 4567/tcp for replication, 4568/tcp for IST and 4444/tcp for SST. Let’s have a look at which ports SELinux allows PXC to use:

[root@BlogSELinux1 audit]# semanage port -l | grep mysql
mysqld_port_t                  tcp      1186, 3306, 63132-63164

No surprise, port 3306/tcp is authorized, but if you are new to MySQL, you may wonder what uses port 1186/tcp. It is the port used by NDB Cluster for inter-node communication (NDB API). Now, if we try to add the missing ports:

[root@BlogSELinux1 audit]# semanage port -a -t mysqld_port_t -p tcp 4567
ValueError: Port tcp/4567 already defined
[root@BlogSELinux1 audit]# semanage port -a -t mysqld_port_t -p tcp 4568
[root@BlogSELinux1 audit]# semanage port -a -t mysqld_port_t -p tcp 4444
ValueError: Port tcp/4444 already defined

4568/tcp was successfully added, but 4444/tcp and 4567/tcp failed because they are already assigned to another security context. For example, 4444/tcp belongs to the kerberos security context:

[root@BlogSELinux1 audit]# semanage port -l | grep kerberos_port
kerberos_port_t                tcp      88, 750, 4444
kerberos_port_t                udp      88, 750, 4444

A TCP port is not allowed by SELinux to belong to more than one security context. We have no other choice than to move the two missing ports to the mysqld_t security context:

[root@BlogSELinux1 audit]# semanage port -m -t mysqld_port_t -p tcp 4444
[root@BlogSELinux1 audit]# semanage port -m -t mysqld_port_t -p tcp 4567
[root@BlogSELinux1 audit]# semanage port -l | grep mysqld
mysqld_port_t                  tcp      4567, 4444, 4568, 1186, 3306, 63132-63164

If you happen to be planning to deploy a Kerberos server on the same servers, you may have to run PXC using a different port for Galera replication. In that case, and in the case where you want to run MySQL on a port other than 3306/tcp, you’ll need to add the port to the mysqld_port_t context like we just did above. Do not worry too much about port 4567/tcp; it is reserved for tram which, from what I found, is a remote access protocol for routers.

Non-default paths

It is very common to run MySQL with non-standard paths/directories. With SELinux, you don’t list the authorized paths in the security context; you add the security context labels to the paths. Adding a context label is a two-step process, basically change and apply. For example, if you are using /data as the MySQL datadir, you need to do:

semanage fcontext -a -t mysqld_db_t "/data(/.*)?"
restorecon -R -v /data

On a RedHat/Centos 7 server, the MySQL file contexts and their associated paths are:

[root@BlogSELinux1 ~]# bzcat /etc/selinux/targeted/active/modules/100/mysql/cil | grep filecon
(filecon "HOME_DIR/\.my\.cnf" file (system_u object_r mysqld_home_t ((s0) (s0))))
(filecon "/root/\.my\.cnf" file (system_u object_r mysqld_home_t ((s0) (s0))))
(filecon "/usr/lib/systemd/system/mysqld.*" file (system_u object_r mysqld_unit_file_t ((s0) (s0))))
(filecon "/usr/lib/systemd/system/mariadb.*" file (system_u object_r mysqld_unit_file_t ((s0) (s0))))
(filecon "/etc/my\.cnf" file (system_u object_r mysqld_etc_t ((s0) (s0))))
(filecon "/etc/mysql(/.*)?" any (system_u object_r mysqld_etc_t ((s0) (s0))))
(filecon "/etc/my\.cnf\.d(/.*)?" any (system_u object_r mysqld_etc_t ((s0) (s0))))
(filecon "/etc/rc\.d/init\.d/mysqld" file (system_u object_r mysqld_initrc_exec_t ((s0) (s0))))
(filecon "/etc/rc\.d/init\.d/mysqlmanager" file (system_u object_r mysqlmanagerd_initrc_exec_t ((s0) (s0))))
(filecon "/usr/bin/mysqld_safe" file (system_u object_r mysqld_safe_exec_t ((s0) (s0))))
(filecon "/usr/bin/mysql_upgrade" file (system_u object_r mysqld_exec_t ((s0) (s0))))
(filecon "/usr/libexec/mysqld" file (system_u object_r mysqld_exec_t ((s0) (s0))))
(filecon "/usr/libexec/mysqld_safe-scl-helper" file (system_u object_r mysqld_safe_exec_t ((s0) (s0))))
(filecon "/usr/sbin/mysqld(-max)?" file (system_u object_r mysqld_exec_t ((s0) (s0))))
(filecon "/usr/sbin/mysqlmanager" file (system_u object_r mysqlmanagerd_exec_t ((s0) (s0))))
(filecon "/usr/sbin/ndbd" file (system_u object_r mysqld_exec_t ((s0) (s0))))
(filecon "/var/lib/mysql(-files|-keyring)?(/.*)?" any (system_u object_r mysqld_db_t ((s0) (s0))))
(filecon "/var/lib/mysql/mysql\.sock" socket (system_u object_r mysqld_var_run_t ((s0) (s0))))
(filecon "/var/log/mariadb(/.*)?" any (system_u object_r mysqld_log_t ((s0) (s0))))
(filecon "/var/log/mysql.*" file (system_u object_r mysqld_log_t ((s0) (s0))))
(filecon "/var/run/mariadb(/.*)?" any (system_u object_r mysqld_var_run_t ((s0) (s0))))
(filecon "/var/run/mysqld(/.*)?" any (system_u object_r mysqld_var_run_t ((s0) (s0))))
(filecon "/var/run/mysqld/mysqlmanager.*" file (system_u object_r mysqlmanagerd_var_run_t ((s0) (s0))))

If you want to avoid security issues with SELinux, you should stay within those paths. A good example of an offending path is the PXC configuration file and directory which are now located in their own directory. These are not labeled correctly for SELinux:

[root@BlogSELinux1 ~]# ls -Z /etc/per*
-rw-r--r--. root root system_u:object_r:etc_t:s0       /etc/percona-xtradb-cluster.cnf
/etc/percona-xtradb-cluster.conf.d:
-rw-r--r--. root root system_u:object_r:etc_t:s0       mysqld.cnf
-rw-r--r--. root root system_u:object_r:etc_t:s0       mysqld_safe.cnf
-rw-r--r--. root root system_u:object_r:etc_t:s0       wsrep.cnf

I must admit that even if the security context labels on those files were not set, I got no audit messages and everything worked normally. Nevertheless, adding the labels is straightforward:

[root@BlogSELinux1 ~]# semanage fcontext -a -t mysqld_etc_t "/etc/percona-xtradb-cluster\.cnf"
[root@BlogSELinux1 ~]# semanage fcontext -a -t mysqld_etc_t "/etc/percona-xtradb-cluster\.conf\.d(/.*)?"
[root@BlogSELinux1 ~]# restorecon -v /etc/percona-xtradb-cluster.cnf
restorecon reset /etc/percona-xtradb-cluster.cnf context system_u:object_r:etc_t:s0->system_u:object_r:mysqld_etc_t:s0
[root@BlogSELinux1 ~]# restorecon -R -v /etc/percona-xtradb-cluster.conf.d/
restorecon reset /etc/percona-xtradb-cluster.conf.d context system_u:object_r:etc_t:s0->system_u:object_r:mysqld_etc_t:s0
restorecon reset /etc/percona-xtradb-cluster.conf.d/wsrep.cnf context system_u:object_r:etc_t:s0->system_u:object_r:mysqld_etc_t:s0
restorecon reset /etc/percona-xtradb-cluster.conf.d/mysqld.cnf context system_u:object_r:etc_t:s0->system_u:object_r:mysqld_etc_t:s0
restorecon reset /etc/percona-xtradb-cluster.conf.d/mysqld_safe.cnf context system_u:object_r:etc_t:s0->system_u:object_r:mysqld_etc_t:s0

Variables check list

Here is a list of all the variables you should check for paths used by MySQL

  • datadir, default is /var/lib/mysql, where MySQL stores its data
  • basedir, default is /usr, where binaries and libraries can be found
  • character_sets_dir, default is basedir/share/mysql/charsets, charsets used by MySQL
  • general_log_file, default is the datadir, where the general log is written
  • init_file, no default, sql file read and executed when the server starts
  • innodb_undo_directory, default is datadir, where InnoDB stores the undo files
  • innodb_tmpdir, default is tmpdir, where InnoDB creates temporary files
  • innodb_temp_data_file_path, default is in the datadir, where InnoDB creates the temporary tablespace
  • innodb_parallel_doublewrite_path, default is in the datadir, where InnoDB creates the parallel doublewrite buffer
  • innodb_log_group_home_dir, default is the datadir, where InnoDB writes its transactional log files
  • innodb_data_home_dir, default is the datadir, used as a default value for the InnoDB files
  • innodb_data_file_path, default is in the datadir, path of the system tablespace
  • innodb_buffer_pool_filename, default is in the datadir, where InnoDB writes the buffer pool dump information
  • lc_messages_dir, basedir/share/mysql
  • log_bin_basename, default is the datadir, where the binlogs are stored
  • log_bin_index, default is the datadir, where the binlog index file is stored
  • log_error, no default value, where the MySQL error log is stored
  • pid-file, no default value, where the MySQL pid file is stored
  • plugin_dir, default is basedir/lib/mysql/plugin, where the MySQL plugins are stored
  • relay_log_basename, default is the datadir, where the relay logs are stored
  • relay_log_info_file, default is the datadir, may include a path
  • slave_load_tmpdir, default is tmpdir, where the slave stores files coming from LOAD DATA INFILE statements
  • slow_query_log_file, default is in the datadir, where the slow queries are logged
  • socket, no defaults, where the Unix socket file is created
  • ssl_*, SSL/TLS related files
  • tmpdir, default is /tmp, where temporary files are stored
  • wsrep_data_home_dir, default is the datadir, where galera stores its files
  • wsrep_provider->base_dir, default is wsrep_data_home_dir
  • wsrep_provider->gcache_dir, default is wsrep_data_home_dir, where the gcache file is stored
  • wsrep_provider->socket.ssl_*, no defaults, where the SSL/TLS related files for the Galera protocol are stored

That’s quite a long list and I may have missed some. If for any of these variables you use a non-standard path, you’ll need to adjust the context labels as we just did above.
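
A quick way to see what your server is actually using for some of these is to query them directly; the sketch below assumes you can connect as root and only covers a handful of the variables listed above:

[root@BlogSELinux1 ~]# mysql -uroot -p -N -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('datadir','tmpdir','log_error','slow_query_log_file','log_bin_basename','plugin_dir')"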

All together

I would understand if you feel a bit lost; I am not a SELinux guru and it took me some time to gain a decent understanding of how it works. Let’s recap how we can enable SELinux for PXC from what we learned in the previous sections.

1. Install the SELinux utilities

yum install policycoreutils-python.x86_64

2. Allow the TCP ports used by PXC

semanage port -a -t mysqld_port_t -p tcp 4568
semanage port -m -t mysqld_port_t -p tcp 4444
semanage port -m -t mysqld_port_t -p tcp 4567

3. Modify the SST script

Replace the wait_for_listen function in the /usr/bin/wsrep_sst_xtrabackup-v2 file by the version above. Hopefully, the next PXC release will include a SELinux friendly wait_for_listen function.

4. Set the security context labels for the configuration files

These steps seem optional, but are included for completeness:

semanage fcontext -a -t mysqld_etc_t "/etc/percona-xtradb-cluster\.cnf"
semanage fcontext -a -t mysqld_etc_t "/etc/percona-xtradb-cluster\.conf\.d(/.*)?"
restorecon -v /etc/percona-xtradb-cluster.cnf
restorecon -R -v /etc/percona-xtradb-cluster.conf.d/

5. Create the policy file PXC.te

Create the file PXC.te with this content:

module PXC 1.0;
require {
        type unconfined_t;
        type mysqld_t;
        type unconfined_service_t;
        type tmp_t;
        type sysctl_net_t;
        type kernel_t;
        type mysqld_safe_t;
        class process { getattr setpgid };
        class unix_stream_socket connectto;
        class system module_request;
        class file { getattr open read write };
        class dir search;
}
#============= mysqld_t ==============
allow mysqld_t kernel_t:system module_request;
allow mysqld_t self:process { getattr setpgid };
allow mysqld_t self:unix_stream_socket connectto;
allow mysqld_t sysctl_net_t:dir search;
allow mysqld_t sysctl_net_t:file { getattr open read };
allow mysqld_t tmp_t:file write;

6. Compile and load the policy module

checkmodule -M -m -o PXC.mod PXC.te
semodule_package -o PXC.pp -m PXC.mod
semodule -i PXC.pp

7. Run for a while in Permissive mode

Set SELinux to permissive mode in /etc/sysconfig/selinux and reboot. Validate that everything works fine in permissive mode and check the audit.log for any denied messages. If there are denied messages, address them.
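
If new denied messages do show up, the same audit2allow workflow used earlier can regenerate and reload the module, for example:

[root@BlogSELinux1 ~]# grep -i denied /var/log/audit/audit.log | grep mysqld_t | audit2allow -M PXC
[root@BlogSELinux1 ~]# semodule -i PXC.pp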

8. Enforce SELINUX

Last step, enforce SELinux:

setenforce 1
perl -pi -e 's/SELINUX=permissive/SELINUX=enforcing/g' /etc/sysconfig/selinux

Conclusion

As we can see, enabling SELinux with PXC is not straightforward but, once the process is understood, it is not that hard either. In an IT world where security is more than ever a major concern, enabling SELinux with PXC is a nice step forward. In an upcoming post, we’ll look at the other security framework, AppArmor.

The post Lock Down: Enforcing SELinux with Percona XtraDB Cluster appeared first on Percona Database Performance Blog.

by Yves Trudeau at June 21, 2018 03:57 PM

Jean-Jerome Schmidt

MySQL on Docker: Running a MariaDB Galera Cluster without Orchestration Tools - DB Container Management - Part 2

As we saw in the first part of this blog, a strongly consistent database cluster like Galera does not play well with container orchestration tools like Kubernetes or Swarm. We showed you how to deploy Galera and configure process management for Docker, so you retain full control of the behaviour. This blog post is the continuation of that; we are going to look into the operation and maintenance of the cluster.

To recap some of the main points from part 1 of this blog, we deployed a three-node Galera cluster, with ProxySQL and Keepalived, on three different Docker hosts, where all MariaDB instances run as Docker containers. The following diagram illustrates the final deployment:

Graceful Shutdown

To perform a graceful MySQL shutdown, the best way is to send SIGTERM (signal 15) to the container:

$ docker kill -s 15 {db_container_name}

If you would like to shut down the cluster, repeat the above command on all database containers, one node at a time. The above is similar to performing "systemctl stop mysql" on a systemd-managed MariaDB service. Using the "docker stop" command is pretty risky for a database service because it waits for a 10-second timeout and Docker will send SIGKILL if this duration is exceeded (unless you use a proper --time value).
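
If you do prefer "docker stop", a longer timeout gives the database a chance to shut down cleanly before SIGKILL is sent, for example (120 seconds is an arbitrary value):

$ docker stop --time=120 {db_container_name}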

The last node that shuts down gracefully will have a seqno not equal to -1 and the safe_to_bootstrap flag set to 1 in /{datadir volume}/grastate.dat on the Docker host, for example on host2:

$ cat /containers/mariadb2/datadir/grastate.dat
# GALERA saved state
version: 2.1
uuid:    e70b7437-645f-11e8-9f44-5b204e58220b
seqno:   7099
safe_to_bootstrap: 1

Detecting the Most Advanced Node

If the cluster didn't shut down gracefully, or the node that you are trying to bootstrap wasn't the last node to leave the cluster, you probably won't be able to bootstrap one of the Galera nodes and might encounter the following error:

2016-11-07 01:49:19 5572 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node.
It was not the last one to leave the cluster and may not contain all the updates.
To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .

Galera honours the node that has the safe_to_bootstrap flag set to 1 as the first reference node. This is the safest way to avoid data loss and ensure the correct node always gets bootstrapped.

If you get this error, you have to find out the most advanced node first, before picking that node as the first to be bootstrapped. Create a transient container (with the --rm flag), map it to the same datadir and configuration directory as the actual database container, and add two MySQL command flags, --wsrep_recover and --wsrep_cluster_address. For example, if we want to know mariadb1's last committed number, we need to run:

$ docker run --rm --name mariadb-recover \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --volume /containers/mariadb1/datadir:/var/lib/mysql \
        --volume /containers/mariadb1/conf.d:/etc/mysql/conf.d \
        mariadb:10.2.15 \
        --wsrep_recover \
        --wsrep_cluster_address=gcomm://
2018-06-12  4:46:35 139993094592384 [Note] mysqld (mysqld 10.2.15-MariaDB-10.2.15+maria~jessie) starting as process 1 ...
2018-06-12  4:46:35 139993094592384 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
...
2018-06-12  4:46:35 139993094592384 [Note] Plugin 'FEEDBACK' is disabled.
2018-06-12  4:46:35 139993094592384 [Note] Server socket created on IP: '::'.
2018-06-12  4:46:35 139993094592384 [Note] WSREP: Recovered position: e70b7437-645f-11e8-9f44-5b204e58220b:7099

The last line is what we are looking for. MariaDB prints out the cluster UUID and the sequence number of the most recently committed transaction. The node which holds the highest number is deemed the most advanced node. Since we specified --rm, the container will be removed automatically once it exits. Repeat the above step on every Docker host, replacing the --volume paths with the respective database container volumes.
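
If you only care about that last line, the very same command can be piped through grep; this is just a convenience variation of the command above (the password is the example one used throughout this post):

$ docker run --rm --name mariadb-recover \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --volume /containers/mariadb1/datadir:/var/lib/mysql \
        --volume /containers/mariadb1/conf.d:/etc/mysql/conf.d \
        mariadb:10.2.15 \
        --wsrep_recover \
        --wsrep_cluster_address=gcomm:// 2>&1 | grep 'Recovered position'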

Once you have compared the values reported by all database containers and decided which container holds the most up-to-date data, change the safe_to_bootstrap flag to 1 inside /{datadir volume}/grastate.dat manually. Let's say all nodes are reporting the same exact sequence number; we can then just pick mariadb3 to be bootstrapped by changing its safe_to_bootstrap value to 1:

$ vim /containers/mariadb3/datadir/grastate.dat
...
safe_to_bootstrap: 1

Save the file and start bootstrapping the cluster from that node, as described in the next chapter.

Bootstrapping the Cluster

Bootstrapping the cluster is similar to the first docker run command we used when starting up the cluster for the first time. If mariadb1 is the chosen bootstrap node, we can simply re-run the created bootstrap container:

$ docker start mariadb0 # on host1

Otherwise, if the bootstrap container does not exist on the chosen node, let's say on host2, run the bootstrap container command and map the existing mariadb2's volumes. We are using mariadb0 as the container name on host2 to indicate it is a bootstrap container:

$ docker run -d \
        --name mariadb0 \
        --hostname mariadb0.weave.local \
        --net weave \
        --publish "3306" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --volume /containers/mariadb2/datadir:/var/lib/mysql \
        --volume /containers/mariadb2/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address=gcomm:// \
        --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
        --wsrep_node_address=mariadb0.weave.local

You may notice that this command is slightly shorter as compared to the previous bootstrap command described in this guide. Since we already have the proxysql user created in our first bootstrap command, we may skip these two environment variables:

  • --env MYSQL_USER=proxysql
  • --env MYSQL_PASSWORD=proxysqlpassword

Then, start the remaining MariaDB containers, stop the bootstrap container and start the existing MariaDB container on the bootstrapped host. Basically, the order of commands would be:

$ docker start mariadb1 # on host1
$ docker start mariadb3 # on host3
$ docker stop mariadb0 # on host2
$ docker start mariadb2 # on host2

At this point, the cluster is started and is running at full capacity.
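
A quick sanity check at this point is to verify the cluster size from any node (using the example root password):

$ docker exec -it mariadb1 mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"

It should report 3 for this three-node example.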

Resource Control

Memory is a very important resource in MySQL. This is where the buffers and caches are stored, and it's critical for MySQL to reduce the impact of hitting the disk too often. On the other hand, swapping is bad for MySQL performance. By default, there are no resource constraints on the running containers. Containers use as much of a given resource as the host’s kernel will allow. Another important thing is the file descriptor limit. You can increase the limit of open file descriptors, or "nofile", to something higher to cater for the number of files the MySQL server can open simultaneously. Setting this to a high value won't hurt.

To cap memory allocation and increase the file descriptor limit to our database container, one would append --memory, --memory-swap and --ulimit parameters into the "docker run" command:

$ docker kill -s 15 mariadb1
$ docker rm -f mariadb1
$ docker run -d \
        --name mariadb1 \
        --hostname mariadb1.weave.local \
        --net weave \
        --publish "3306:3306" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --memory 16g \
        --memory-swap 16g \
        --ulimit nofile:16000:16000 \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --volume /containers/mariadb1/datadir:/var/lib/mysql \
        --volume /containers/mariadb1/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address=gcomm://mariadb0.weave.local,mariadb1.weave.local,mariadb2.weave.local,mariadb3.weave.local \
        --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
        --wsrep_node_address=mariadb1.weave.local

Take note that if --memory-swap is set to the same value as --memory, and --memory is set to a positive integer, the container will not have access to swap; if --memory-swap is not set, container swap defaults to twice the --memory value. This is because --memory-swap is the amount of combined memory and swap that can be used, while --memory is only the amount of physical memory that can be used.

Some of the container resources like memory and CPU can be controlled dynamically through "docker update" command, as shown in the following example to upgrade the memory of container mariadb1 to 32G on-the-fly:

$ docker update \
    --memory 32g \
    --memory-swap 32g \
    mariadb1
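
If the buffer pool should grow along with the new memory limit, it can also be resized online (innodb_buffer_pool_size is dynamic in MariaDB 10.2); the target size below is only an illustration:

$ docker exec -it mariadb1 mysql -uroot -p \
    -e "SET GLOBAL innodb_buffer_pool_size = 24 * 1024 * 1024 * 1024"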

Do not forget to tune the my.cnf accordingly to suit the new specs. Configuration management is explained in the next section.

Configuration Management

Most of the MySQL/MariaDB configuration parameters can be changed at runtime, which means you don't need to restart to apply the changes. Check out the MariaDB documentation page for details. Any parameter listed with "Dynamic: Yes" is applied immediately upon change, without the need to restart the MariaDB server. Otherwise, set the parameters inside the custom configuration file on the Docker host. For example, on mariadb3, make the changes to the following file:

$ vim /containers/mariadb3/conf.d/my.cnf

And then restart the database container to apply the change:

$ docker restart mariadb3

Verify that the container starts up properly by looking at the docker logs. Perform this operation on one node at a time if you would like to make cluster-wide changes.
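
A minimal way to do that verification (container name as used above, root password prompted interactively):

$ docker logs --tail 20 mariadb3
$ docker exec -it mariadb3 mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"

The node should report Synced before you move on to the next one.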

Backup

Taking a logical backup is pretty straightforward because the MariaDB image also comes with mysqldump binary. You simply use the "docker exec" command to run the mysqldump and send the output to a file relative to the host path. The following command performs mysqldump backup on mariadb2 and saves it to /backups/mariadb2 inside host2:

$ docker exec -it mariadb2 mysqldump -uroot -p --single-transaction > /backups/mariadb2/dump.sql

A binary backup tool like Percona Xtrabackup or MariaDB Backup requires direct access to the MariaDB data directory. You have to either install this tool inside the container, or on the host machine, or use a dedicated image for this purpose, like the "perconalab/percona-xtrabackup" image, to create the backup and store it inside /tmp/backup on the Docker host:

$ docker run --rm -it \
    -v /containers/mariadb2/datadir:/var/lib/mysql \
    -v /tmp/backup:/xtrabackup_backupfiles \
    perconalab/percona-xtrabackup \
    --backup --host=mariadb2 --user=root --password=mypassword

You can also stop the container with innodb_fast_shutdown set to 0 and copy over the datadir volume to another location in the physical host:

$ docker exec -it mariadb2 mysql -uroot -p -e 'SET GLOBAL innodb_fast_shutdown = 0'
$ docker kill -s 15 mariadb2
$ cp -Rf /containers/mariadb2/datadir /backups/mariadb2/datadir_copied
$ docker start mariadb2

Restore

Restoring is pretty straightforward for mysqldump. You can simply redirect the stdin into the container from the physical host:

$ docker exec -it mariadb2 mysql -uroot -p < /backups/mariadb2/dump.sql

You can also use the standard mysql client command line remotely with proper hostname and port value instead of using this "docker exec" command:

$ mysql -uroot -p -h127.0.0.1 -P3306 < /backups/mariadb2/dump.sql

For Percona Xtrabackup and MariaDB Backup, we have to prepare the backup beforehand. This will roll forward the backup to the time when the backup was finished. Let's say our Xtrabackup files are located under /tmp/backup of the Docker host, to prepare it, simply:

$ docker run --rm -it \
    -v mysql-datadir:/var/lib/mysql \
    -v /tmp/backup:/xtrabackup_backupfiles \
    perconalab/percona-xtrabackup \
    --prepare --target-dir /xtrabackup_backupfiles

The prepared backup under /tmp/backup of the Docker host then can be used as the MariaDB datadir for a new container or cluster. Let's say we just want to verify restoration on a standalone MariaDB container, we would run:

$ docker run -d \
    --name mariadb-restored \
    --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
    -v /tmp/backup:/var/lib/mysql \
    mariadb:10.2.15

If you performed a backup using the stop and copy approach, you can simply duplicate the datadir and use the duplicated directory as a volume mapped to the MariaDB datadir to run on another container. Let's say the backup was copied over under /backups/mariadb2/datadir_copied; we can run a new container by running:

$ mkdir -p /containers/mariadb-restored/datadir
$ cp -Rf /backups/mariadb2/datadir_copied /containers/mariadb-restored/datadir
$ docker run -d \
    --name mariadb-restored \
    --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
    -v /containers/mariadb-restored/datadir:/var/lib/mysql \
    mariadb:10.2.15

The MYSQL_ROOT_PASSWORD must match the actual root password for that particular backup.

Database Version Upgrade

There are two types of upgrade - in-place upgrade or logical upgrade.

In-place upgrade involves shutting down the MariaDB server, replacing the old binaries with the new binaries and then starting the server on the old data directory. Once started, you have to run the mysql_upgrade script to check and upgrade all system tables and also to check the user tables.

The logical upgrade involves exporting SQL from the current version using a logical backup utility such as mysqldump, running the new container with the upgraded version binaries, and then applying the SQL to the new MySQL/MariaDB version. It is similar to backup and restore approach described in the previous section.

Nevertheless, it's a good approach to always back up your database before performing any destructive operations. The following steps are required when upgrading from the current image, MariaDB 10.1.33, to another major version, MariaDB 10.2.15, on mariadb3, which resides on host3:

  1. Backup the database. It doesn't matter whether it's a physical or logical backup, but the latter, using mysqldump, is recommended.

  2. Download the latest image that we would like to upgrade to:

    $ docker pull mariadb:10.2.15
  3. Set innodb_fast_shutdown to 0 for our database container:

    $ docker exec -it mariadb3 mysql -uroot -p -e 'SET GLOBAL innodb_fast_shutdown = 0'
  4. Gracefully shut down the database container:

    $ docker kill --signal=TERM mariadb3
  5. Create a new container with the new image for our database container. Keep the rest of the parameters intact except using the new container name (otherwise it would conflict):

    $ docker run -d \
            --name mariadb3-new \
            --hostname mariadb3.weave.local \
            --net weave \
            --publish "3306:3306" \
            --publish "4444" \
            --publish "4567" \
            --publish "4568" \
            $(weave dns-args) \
            --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
            --volume /containers/mariadb3/datadir:/var/lib/mysql \
            --volume /containers/mariadb3/conf.d:/etc/mysql/mariadb.conf.d \
            mariadb:10.2.15 \
            --wsrep_cluster_address=gcomm://mariadb0.weave.local,mariadb1.weave.local,mariadb2.weave.local,mariadb3.weave.local \
            --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
            --wsrep_node_address=mariadb3.weave.local
  6. Run mysql_upgrade script:

    $ docker exec -it mariadb3-new mysql_upgrade -uroot -p
  7. If no errors occurred, remove the old container, mariadb3 (the new one is mariadb3-new):

    $ docker rm -f mariadb3
  8. Otherwise, if the upgrade process fails in between, we can fall back to the previous container:

    $ docker stop mariadb3-new
    $ docker start mariadb3

Major version upgrades can be performed similarly to minor version upgrades, except you have to keep in mind that MySQL/MariaDB only supports a major upgrade from the immediately preceding version. If you are on MariaDB 10.0 and would like to upgrade to 10.2, you have to upgrade to MariaDB 10.1 first, followed by another upgrade step to MariaDB 10.2.

Take note of the configuration changes introduced and deprecated between major versions.

Failover

In Galera, all nodes are masters and hold the same role. With ProxySQL in the picture, connections that pass through this gateway will be failed over automatically as long as there is a primary component running for Galera Cluster (that is, a majority of nodes are up). The application won't notice any difference if one database node goes down because ProxySQL will simply redirect the connections to the other available nodes.

If the application connects directly to the MariaDB bypassing ProxySQL, failover has to be performed on the application-side by pointing to the next available node, provided the database node meets the following conditions:

  • Status wsrep_local_state_comment is Synced (The state "Desynced/Donor" is also possible, only if wsrep_sst_method is xtrabackup, xtrabackup-v2 or mariabackup).
  • Status wsrep_cluster_status is Primary.

In Galera, an available node doesn't mean it's healthy until the above statuses are verified.
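
A minimal sketch of such a check from one of the Docker hosts (container name and credentials as used in this example) would be:

$ docker exec -it mariadb2 mysql -uroot -p -e \
    "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_local_state_comment','wsrep_cluster_status')"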

Scaling Out

To scale out, we can create a new container in the same network and use the same custom configuration file as the existing container on that particular host. For example, let's say we want to add a fourth MariaDB container on host3; we can use the same configuration file mounted for mariadb3, as illustrated in the following diagram:

Run the following command on host3 to scale out:

$ docker run -d \
        --name mariadb4 \
        --hostname mariadb4.weave.local \
        --net weave \
        --publish "3306:3307" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --volume /containers/mariadb4/datadir:/var/lib/mysql \
        --volume /containers/mariadb3/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address=gcomm://mariadb1.weave.local,mariadb2.weave.local,mariadb3.weave.local,mariadb4.weave.local \
        --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
        --wsrep_node_address=mariadb4.weave.local

Once the container is created, it will join the cluster and perform SST. It can be accessed on port 3307 from outside the Weave network, or on port 3306 from within the host or the Weave network. It's no longer necessary to include mariadb0.weave.local in the cluster address. Once the cluster is scaled out, we need to add the new MariaDB container into the ProxySQL load balancing set via the admin console:

$ docker exec -it proxysql1 mysql -uadmin -padmin -P6032
mysql> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (10,'mariadb4.weave.local',3306);
mysql> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (20,'mariadb4.weave.local',3306);
mysql> LOAD MYSQL SERVERS TO RUNTIME;
mysql> SAVE MYSQL SERVERS TO DISK;

Repeat the above commands on the second ProxySQL instance.
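
To verify the scale-out, check the Galera cluster size on any database node and the runtime server list in ProxySQL. This is a sketch assuming the standard Galera status variables and the ProxySQL admin interface used above:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
mysql> SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers ORDER BY hostgroup_id, hostname;

The first command (on any MariaDB node) should report the number of nodes you expect after adding mariadb4, and the second (on the ProxySQL admin console, port 6032) should show mariadb4.weave.local as ONLINE in both hostgroups.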

Finally, for the last step (you may skip this part if you have already run the "SAVE .. TO DISK" statement in ProxySQL), edit proxysql.cnf on host1 and host2 to make the change persistent across container restarts:

$ vim /containers/proxysql1/proxysql.cnf # host1
$ vim /containers/proxysql2/proxysql.cnf # host2

And append the mariadb4-related lines under the mysql_servers directive:

mysql_servers =
(
        { address="mariadb1.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb2.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb3.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb4.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb1.weave.local" , port=3306 , hostgroup=20, max_connections=100 },
        { address="mariadb2.weave.local" , port=3306 , hostgroup=20, max_connections=100 },
        { address="mariadb3.weave.local" , port=3306 , hostgroup=20, max_connections=100 },
        { address="mariadb4.weave.local" , port=3306 , hostgroup=20, max_connections=100 }
)

Save the file and we should be good on the next container restart.

Scaling Down

To scale down, simply shut down the container gracefully. The best command would be:

$ docker kill -s 15 mariadb4
$ docker rm -f mariadb4

Remember, if a database node leaves the cluster ungracefully, it is not counted as a scale-down and it will affect the quorum calculation.

To remove the container from ProxySQL, run the following commands on both ProxySQL containers. For example, on proxysql1:

$ docker exec -it proxysql1 mysql -uadmin -padmin -P6032
mysql> DELETE FROM mysql_servers WHERE hostname="mariadb4.weave.local";
mysql> LOAD MYSQL SERVERS TO RUNTIME;
mysql> SAVE MYSQL SERVERS TO DISK;

You can then either remove the corresponding entry inside proxysql.cnf or just leave it as-is; it will be detected as OFFLINE from ProxySQL's point of view anyway.

Summary

With Docker, things are a bit different from the conventional way of handling MySQL or MariaDB servers. Handling stateful services like Galera Cluster is not as easy as stateless applications, and requires proper testing and planning.

In our next blog on this topic, we will evaluate the pros and cons of running Galera Cluster on Docker without any orchestration tools.

by ashraf at June 21, 2018 07:11 AM

June 20, 2018

Peter Zaitsev

Percona Server for MongoDB 3.4.15-2.13 Is Now Available

Percona announces the release of Percona Server for MongoDB 3.4.15-2.13 on June 20, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition. It supports MongoDB 3.4 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.15 and does not include any additional changes.

The Percona Server for MongoDB 3.4.15-2.13 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.4.15-2.13 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at June 20, 2018 04:44 PM

Percona XtraDB Cluster 5.6.40-26.25 Is Now Available

Percona announces the release of Percona XtraDB Cluster 5.6.40-26.25 (PXC) on June 20, 2018. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.40-26.25 is now the current release, based on the following:

All Percona software is open-source and free.

New feature

  • PXC-907: The new variable wsrep_RSU_commit_timeout allows configuring how long (in microseconds) RSU waits for active commit connections.

Fixed Bugs

  • PXC-2128: Duplicated auto-increment values were set for the concurrent sessions on cluster reconfiguration due to the erroneous readjustment.
  • PXC-2059: The error message about the SUPER privilege being required, shown when CREATE TRIGGER statements fail because WSREP is enabled, was made clearer.
  • PXC-2091: The check for the maximum number of rows that can be replicated as part of a single transaction (a Galera limit) was enforced even when replication was disabled with wsrep_on=OFF.
  • PXC-2103: Interruption of a local running transaction in the COMMIT state by a replicated background transaction, while waiting for the binlog backup protection, caused the commit to fail and, eventually, an assert in Galera.
  • PXC-2130: Percona XtraDB Cluster failed to build with Python 3.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

The post Percona XtraDB Cluster 5.6.40-26.25 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at June 20, 2018 02:07 PM

Is Serverless Just a New Word for Cloud Based?

Serverless is a new buzzword in the database industry. Even though it gets tossed around often, there is some confusion about what it really means and how it really works. Serverless architectures rely on third-party Backend as a Service (BaaS) services. They can also include custom code that is run in managed, ephemeral containers on a Functions as a Service (FaaS) platform. In comparison to traditional Platform as a Service (PaaS) server architecture, where you pay a predetermined sum for your instances, serverless applications benefit from reduced costs of operations and lower complexity. They are also considered to be more agile, allowing for reduced engineering efforts.

In reality, there are still servers in a serverless architecture: they are just being used, managed, and maintained outside of the application. But isn’t that a lot like what cloud providers, such as Amazon RDS, Google Cloud, and Microsoft Azure, are already offering? Well, yes, but with several caveats.

When you use any of the aforementioned platforms, you still need to provision the types of instances that you plan to use and define how those platforms will act. For example, will it run MySQL, MongoDB, PostgreSQL, or some other tool? With serverless, these decisions are no longer needed. Instead, you simply consume resources from a shared resource pool, using whatever application suits your needs at that time. In addition, in a serverless world, you are only charged for the time that you use the server instead of being charged whether you use it a lot or a little (or not at all).

Remember When You Joined That Gym?

How many of us have purchased a gym membership at some point in our life? Oftentimes, you walk in with the best of intentions and happily enroll in a monthly plan. “For only $29.95 per month, you can use all of the resources of the gym as much as you want.” But, many of us have purchased such a membership and found that our visits to the gym dwindle over time, leaving us paying the same monthly fee for less usage.

Traditional Database as a Service (DBaaS) offerings are similar to your gym membership: you sign up, select your service options, and start using them right away. There are certainly cases of companies using those services consistently, just like there are gym members who show up faithfully month after month. But there are also companies who spin up database instances for a specific purpose, use the database instance for some amount of time, and then slowly find that they are accessing that instance less and less. However, the fees for the instance, much like the fees for your gym membership, keep getting charged.

What if we had a “pay as you go” gym plan? Well, some of those certainly exist. Serverless architecture is somewhat like this plan: you only pay for the resources when you use them, and you only pay for your specific usage. This would be like charging $5 for access to the weight room and $3 for access to the swimming pool, each time you use one or the other. The one big difference with serverless architecture for databases is that you still need to have your data stored somewhere in the environment and made available to you as needed. This would be like renting a gym locker to store your workout gear so that you didn't have to bring it back and forth each time you visited.

Obviously, you will pay for that storage, whether it is your data or your workout gear, but the storage fees are going to be less than your standard membership. The big advantage is that you have what you need when you need it, and you can access the necessary resources to use whatever you are storing.

With a serverless architecture, you store your data securely on low-cost storage devices and access it as needed. The resources required to process that data are available on an on-demand basis. So, your charges are likely to be lower since you are paying a low fee for data storage and a usage fee on resources. This can work great for companies that do not need 24x7x365 access to their data since they are only paying for the services when they are using them. It's also ideal for developers, who may find that they spend far more time working on their application code than testing it against the database. Instead of paying for the database resources while the data is just sitting there doing nothing, you now pay to store the data and incur the database-associated fees at use time.

Benefits and Risks of Going Serverless

One of the biggest possible benefits of going with a serverless architecture is that you save money and hassle. Money can be saved since you only pay for the resources when you use them. Hassle is reduced since you don’t need to worry about the hardware on which your application runs. These can be big wins for a company, but you need to be aware of some pitfalls.

First, serverless can save you money, but there is no guarantee that it will save you money.

Consider 2 different people who have the exact same cell phone – maybe it’s your dad and your teenage daughter. These 2 users probably have very different patterns of usage: your dad uses the phone sporadically (if at all!) and your teenage daughter seems to have her phone physically attached to her. These 2 people would benefit from different service plans with their provider. For your dad, a basic plan that allows some usage (similar to the base cost of storage in our serverless database) with charges for usage above that cap would probably suffice. However, such a plan for your teenage daughter would probably spiral out of control and incur very high usage fees. For her, an unlimited plan makes sense. What is a great fit for one user is a poor fit for another, and the same is true when comparing serverless and DBaaS options.

The good news is that serverless architectures and DBaaS options, like Amazon RDS, Microsoft Azure, and Google Cloud, reduce a lot of the hassle of owning and managing servers. You no longer need to be concerned about Mean Time Between Failures, power and cooling issues, or many of the other headaches that come with maintaining your hardware. However, this can also have a negative consequence.

The challenge of enforced updates

About the only thing that is consistent about software in today’s world is that it is constantly changing. New versions are released with new features that may or may not be important to you. When a serverless provider decides to implement a new version or patch of their backend, there may be some downstream issues for you to manage. It is always important to test any new updates, but now some of the decisions about how and when to upgrade may be out of your control. Proper notification from the provider gives you a window of time for testing, but they are probably going to flip the switch regardless of whether or not you have completed all of your test cycles. This is true of both serverless and DBaaS options.

A risk of vendor lock-in

A common mantra in the software world is that we want to avoid vendor lock-in. Of course, from the provider’s side, they want to avoid customer churn, so we often find ourselves on opposite sides of the same issue. Moving to a new platform or provider becomes more complex as you cede more aspects of server management to the host. This means that serverless can cause deep lock-in since your application is designed to work with the environment as your provider has configured it. If you choose to move to a different provider, you need to extract your application and your data from the current provider and probably need to rework it to fit the requirements of the new provider.

The challenge of client-side optimization

Another consideration is that optimizations of server-side configurations must necessarily be more generic compared to those you might make to self-hosted servers. Optimization can no longer be done at the server level for your specific application and use; instead, you now rely on a smarter client to perform your necessary optimizations. This requires a skill set that may not exist with some developers: the ability to tune applications client-side.

Conclusion

Serverless is not going away. In fact, it is likely to grow as people come to a better understanding and comfort level with it. You need to be able to make an informed decision regarding whether serverless is right for you. Careful consideration of the pros and cons is imperative for making a solid determination. Understanding your usage patterns, user expectations, development capabilities, and a lot more will help to guide that decision.

In a future post, I’ll review the architectural differences between on-premises, PaaS, DBaaS and serverless database environments.

 

The post Is Serverless Just a New Word for Cloud Based? appeared first on Percona Database Performance Blog.

by Rick Golba at June 20, 2018 11:35 AM

Webinar Thu 6/21: How to Analyze and Tune MySQL Queries for Better Performance

Please join Percona’s MySQL Database Administrator, Brad Mickel, as he presents How to Analyze and Tune MySQL Queries for Better Performance on Thursday, June 21st, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

 

Query performance is essential in making any application successful. In order to finely tune your queries you first need to understand how MySQL executes them, and what tools are available to help identify problems.

In this session you will learn:

  1. The common tools for researching problem queries
  2. What an Index is, and why you should use one
  3. Index limitations
  4. When to rewrite the query instead of just adding a new index
Register Now

 

Brad Mickel

MySQL DBA

Bradley began working with MySQL in 2013 as part of his duties in healthcare billing. After three years in healthcare billing he joined Percona through its bootcamp process, and since then he has served as a remote database administrator on the Atlas team for Percona Managed Services.

The post Webinar Thu 6/21: How to Analyze and Tune MySQL Queries for Better Performance appeared first on Percona Database Performance Blog.

by Bradley Mickel at June 20, 2018 09:09 AM

June 19, 2018

Jean-Jerome Schmidt

Comparing RDS vs EC2 for Managing MySQL or MariaDB on AWS

RDS is a Database as a Service (DBaaS) that automatically configures and maintains your databases in the AWS cloud. The user has limited power over specific configurations in comparison to running MySQL directly on Elastic Compute Cloud (EC2). But RDS is a convenient service, as long as you can live with the instances and configurations that it offers.

Amazon RDS currently supports various MySQL and MariaDB versions, as well as the MySQL-compatible Amazon Aurora DB engine. It does support replication, but as you may expect from a predefined web console, there are some limitations.

Amazon RDS Services

There are some tradeoffs when using RDS. These may not only affect the way you manage and provision your database instances, but also key things like performance, security, and high availability.

In this blog, we will take a look at the differences between using RDS and running MySQL on EC2, with a focus on replication. As we will see, deciding between hosting MySQL on an EC2 instance and using Amazon RDS is not an easy task.

RDS Platform Tradeoffs

The maximum database size that AWS can host depends on your source environment, the allocation of data in your source database, and how busy your system is.

Amazon RDS Environment options
Amazon RDS instance class

AWS is split into regions. Every AWS account has limits, per region, on the number of AWS resources that can be created. Once a limit for a resource has been reached, additional calls to create that resource will fail.

AWS Regions

For Amazon RDS MySQL DB instances, the maximum provisioned storage limit constrains the size of a table to a maximum of 6 TB when using InnoDB file-per-table tablespaces.

The InnoDB file-per-table feature is something that you should consider even if you are not looking to migrate a big database into the cloud. You may notice that some existing DB instances have a lower limit. For example, MySQL DB instances created prior to April 2014 have a file and table size limit of 2 TB. This 2 TB file size limit also applies to DB instances or Read Replicas created from DB snapshots taken before April 2014.

One of the key differences affecting the way you set up and maintain database replication is the lack of the SUPER privilege. To address this limitation, Amazon introduced stored procedures that take care of various DBA tasks. Below are the key procedures for managing MySQL RDS replication.

Skip replication error:

CALL mysql.rds_skip_repl_error;

Stop replication:

CALL mysql.rds_stop_replication;

Start replication:

CALL mysql.rds_start_replication;

Configures an RDS instance as a Read Replica of a MySQL instance running outside of AWS.

CALL mysql.rds_set_external_master;

Reconfigures a MySQL instance to no longer be a Read Replica of a MySQL instance running outside of AWS.

CALL mysql.rds_reset_external_master;

Imports a certificate. This is needed to enable SSL communication and encrypted replication.

CALL mysql.rds_import_binlog_ssl_material;

Removes a certificate.

CALL mysql.rds_remove_binlog_ssl_material;

Changes the replication master log position to the start of the next binary log on the master.

CALL mysql.rds_next_master_log;

While these stored procedures take care of a number of tasks, there is a bit of a learning curve. The lack of the SUPER privilege can also create problems when using external replication monitoring tools.
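
As an illustration, pointing an RDS instance at an external master chains two of these procedures together. The parameter order below follows the AWS documentation at the time of writing, so treat it as a sketch and verify it against the current docs for your engine version; the host, credentials and binlog coordinates are placeholders:

CALL mysql.rds_set_external_master (
  'ec2-master.example.com',   -- external master host (placeholder)
  3306,                       -- port
  'repl',                     -- replication user (placeholder)
  's3cr3tp4SSw0rd',           -- replication password (placeholder)
  'mysql-bin.000012',         -- binary log file on the master (placeholder)
  4,                          -- binary log position (placeholder)
  0);                         -- 0 = unencrypted, 1 = SSL-encrypted replication
CALL mysql.rds_start_replication;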

Amazon RDS does not currently support the following:

  • Global Transaction IDs
  • Transportable Table Space
  • Authentication Plugin
  • Password Strength Plugin
  • Replication Filters
  • Semi-synchronous Replication

Last but not least: access to the shell. Amazon RDS does not allow direct host access to a DB instance via Telnet, Secure Shell (SSH), or Windows Remote Desktop Connection (RDP). You can still connect to the DB from an application host using standard tools like the mysql client.

There are other limitations, as described in the RDS documentation.

High availability with MySQL on EC2

There are options to operate MySQL directly on EC2, and thereby retain control of one’s high availability options. When going down this route, it is important to understand how to leverage the different AWS features that are at your disposal. Make sure you check out our ‘DIY Cloud Database’ white paper.

To automate deployment and management/maintenance tasks (while retaining control), it is possible to use ClusterControl. Just like with RDS, you have the convenience of deploying a database setup in a few minutes via a GUI. Adding nodes, scheduling backups, performing failovers, and so on, can also be conveniently done via the GUI.

Deployment

ClusterControl can automate deployment of different high availability database setups - from master-slave replication to multi-master clusters. All the main MySQL flavours are supported - Oracle MySQL, MariaDB and Percona Server. Some initial setup of VPC/security group is required, and these are well described in the DIY Cloud Database whitepaper. Note that similar concepts apply, whether it is AWS, Google Cloud or Azure.

ClusterControl Deploy in EC2

Galera Cluster is a good alternative to consider when deploying a highly available MySQL service. It has established itself as a credible replacement for traditional MySQL master-slave architectures, although it is not a drop-in replacement. Most applications can still be adapted to run on it. It is possible to define different segments for databases that span across multiple AWS regions.

ClusterControl expand cluster in EC2

It is possible to set up 'hybrid replication' by combining synchronous replication within a Galera Cluster and asynchronous replication between the cluster and one or more slaves. Options like delaying the slave give an additional level of protection to the data.

ClusterControl Add replication in EC2

Proxy layer

To achieve high availability, deploying a highly available setup is not enough. The applications have to somehow know which nodes are working and which ones are not. Changes in topology, e.g. moving a master to another host, also need to be propagated somehow so as to avoid errors in the application layer. ClusterControl supports deployments of proxies like HAProxy, MaxScale, and ProxySQL. For HAProxy and ProxySQL, there are additional options to deploy redundant instances with Keepalived and VirtualIP.

ClusterControl manager load balancers on EC2 nodes

Cross-region replica

Amazon RDS provides read replica services. Cross-region replicas give you the ability to scale reads, as AWS has its services in a number of datacenters around the world. You can create up to five read replicas per source instance, including in other regions, and all of them are accessible and can be used for reading. These nodes are independent and can be used in your upgrade path, or can be promoted to standalone databases.

In addition to that, Amazon offers Multi-AZ deployments based on DRBD, synchronous disk replication. How is it different from Read Replicas? The main difference is that only the database engine on the primary instance is active, which leads to other architectural variations.

As opposed to read replicas, database engine version upgrades happen on the primary. Another difference is that AWS RDS will fail over automatically with DRBD, while read replicas (using asynchronous replication) will require manual operations from you.

Multi-AZ failover on RDS uses a DNS change to point to the standby instance; according to Amazon, this should happen within 60-120 seconds during a failover. Because the standby uses the same storage data as the primary, there will probably be transaction/log recovery. Bigger databases may spend a significant amount of time on InnoDB recovery, so please consider that in your DR plan and RTO calculation.

Of course, this comes at additional cost. Let's take a look at a basic example. The cost of a db.t2.medium host with 2 vCPU and 4GB of RAM is 185.98 USD per month, and the price roughly doubles, to 370.98 USD, when you enable a Multi-AZ (MZ) replica. The price will vary by region, but it will double with MZ.

Cost comparison

In order to achieve the same with EC2, you can deploy your virtual machines in different regions. Each AWS Region is completely independent. The AWS Region setting can be changed in the console, by setting the EC2_REGION environment variable, or it can be overridden by using the --region parameter with the AWS Command Line Interface. When your set of servers is ready, you can use ClusterControl to deploy and monitor your replication. You can also set up replication manually through the console using standard commands.

Cross technology replication

It is possible to set up replication between an Amazon RDS MySQL or MariaDB DB instance and a MySQL or MariaDB instance that is external to Amazon RDS. This is done using the standard MySQL replication method, through binary logs. To enable binary logs you would normally modify the my.cnf configuration, but without access to the shell this is impossible in RDS, so it is done in a less obvious way. You have two options. One is to enable backups - set automated backups on your Amazon RDS DB instance with a retention period higher than 0. The other is to enable replication to a prebuilt slave server. Either of these will enable binary logs, which you can later use for your replication.

Enable binary logs via RDS backup

Maintain the binlogs in your master instance until you have verified that they have been applied on the replica. This maintenance ensures that you can restore your master instance in the event of a failure.

Another roadblock can be permissions. The permissions required to start replication on an Amazon RDS DB instance are restricted and not available to your Amazon RDS master user. Because of this, you must use the Amazon RDS mysql.rds_set_external_master and mysql.rds_start_replication commands to set up replication between your live database and your Amazon RDS database.

Monitor failover events for the Amazon RDS instance that is your replica. If a failover occurs, then the DB instance that is your replica might be recreated on a new host with a different network address. For information on how to monitor failover events, see Using Amazon RDS Event Notification.

In the example below, we will see how to enable replication from RDS to an external DB located on an EC2 instance.
You should have binary logs enabled; we use an RDS slave here.

Specify the number of hours to retain binary logs.

mysql -h RDS_MASTER -u<username> -p<password>
call mysql.rds_set_configuration('binlog retention hours', 7);

On the RDS MASTER, create a replication user with the following commands:

CREATE USER 'repl'@'ec2DBslave' IDENTIFIED BY 's3cr3tp4SSw0rd';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'ec2DBslave';

On RDS SLAVE, run the commands:

mysql -u<username> -p<password> -h RDS_SLAVE
call mysql.rds_stop_replication;
SHOW SLAVE STATUS;  -- note Relay_Master_Log_File and Exec_Master_Log_Pos

On RDS SLAVE, run mysqldump with the following format:

mysqldump -u<username> -p<password> -h RDS_SLAVE --routines --triggers --single-transaction --databases DB1 DB2 DB3 > mysqldump.sql

Import the DB dump to external database:

mysql -u<username> -p<password> -h ec2DBslave
tee import_database.log;
source mysqldump.sql;
CHANGE MASTER TO 
 MASTER_HOST='RDS_MASTER', 
 MASTER_USER='repl',
 MASTER_PASSWORD='s3cr3tp4SSw0rd',
 MASTER_LOG_FILE='<Relay_Master_Log_File>',
 MASTER_LOG_POS=<Exec_Master_Log_Pos>;

Create a replication filter to ignore the tables that AWS creates only on RDS:

CHANGE REPLICATION FILTER REPLICATE_WILD_IGNORE_TABLE = ('mysql.rds\_%');

Start replication:

START SLAVE;

Verify the replication status:

SHOW SLAVE STATUS;

That’s it for now. Managing MySQL on AWS is a big topic. Do let us know your thoughts in the comments section below.

by Bart Oles at June 19, 2018 08:49 PM

Peter Zaitsev

Chunk Change: InnoDB Buffer Pool Resizing

Since MySQL 5.7.5, we have been able to resize the InnoDB Buffer Pool dynamically. This new feature also introduced a new variable — innodb_buffer_pool_chunk_size — which defines the chunk size by which the buffer pool is enlarged or reduced. This variable is not dynamic and, if it is incorrectly configured, it could lead to undesired situations.

Let’s first see how innodb_buffer_pool_size, innodb_buffer_pool_instances and innodb_buffer_pool_chunk_size interact:

The buffer pool can hold several instances and each instance is divided into chunks. There is some information that we need to take into account: the number of instances can go from 1 to 64 and the total amount of chunks should not exceed 1000.

So, for a server with 3GB of RAM, a buffer pool of 2GB with 8 instances and chunks at the default value (128MB), we are going to get 2 chunks per instance:

This means that there will be 16 chunks.
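
The same arithmetic can be checked on a running server; a minimal sketch using the corresponding global variables:

mysql> SELECT @@innodb_buffer_pool_size DIV @@innodb_buffer_pool_chunk_size AS total_chunks, @@innodb_buffer_pool_size DIV (@@innodb_buffer_pool_chunk_size * @@innodb_buffer_pool_instances) AS chunks_per_instance;

For the 2GB buffer pool with 8 instances and 128MB chunks above, this returns 16 total chunks and 2 chunks per instance.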

I’m not going to explain the benefits of having multiple instances, I will focus on resizing operations. Why would you want to resize the buffer pool? Well, there are several reasons, such as:

  • on a virtual server you can add more memory dynamically
  • for a physical server, you might want to reduce database memory usage to make way for other processes
  • on systems where the database size is smaller than available RAM
  • if you expect a huge growth and want to increase the buffer pool on demand

Reducing the buffer pool

Let’s start reducing the buffer pool:

| innodb_buffer_pool_size | 2147483648 |
| innodb_buffer_pool_instances | 8     |
| innodb_buffer_pool_chunk_size | 134217728 |
mysql> set global innodb_buffer_pool_size=1073741824;
Query OK, 0 rows affected (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 1073741824 |
+-------------------------+------------+
1 row in set (0.00 sec)

If we try to decrease it to 1.5GB, the buffer pool will not change and a warning will be shown:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------+
| Level   | Code | Message                                                                         |
+---------+------+---------------------------------------------------------------------------------+
| Warning | 1210 | InnoDB: Cannot resize buffer pool to lesser than chunk size of 134217728 bytes. |
+---------+------+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 2147483648 |
+-------------------------+------------+
1 row in set (0.01 sec)

Increasing the buffer pool

When we try to increase the value from 1GB to 1.5GB, the buffer pool is resized but the requested innodb_buffer_pool_size is considered to be incorrect and is truncated:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+-----------------------------------------------------------------+
| Level   | Code | Message                                                         |
+---------+------+-----------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect innodb_buffer_pool_size value: '1610612736' |
+---------+------+-----------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 2147483648 |
+-------------------------+------------+
1 row in set (0.01 sec)

And the final size is 2GB. Yes! You intended to set the value to 1.5GB and you succeeded in setting it to 2GB. Even if you set it just 1 byte higher, e.g. 1073741825, you will end up with a buffer pool of 2GB.

mysql> set global innodb_buffer_pool_size=1073741825;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_%size' ;
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 2147483648 |
+-------------------------------+------------+
2 rows in set (0.01 sec)

Interesting scenarios

Increasing size in the config file

Let’s suppose one day you get up willing to change or tune some variables in your server, and you decide that as you have free memory you will increase the buffer pool. In this example, we are going to use a server with innodb_buffer_pool_instances = 16 and 2GB of buffer pool size, which will be increased to 2.5GB.

So, we set in the configuration file:

innodb_buffer_pool_size = 2684354560

But then after restart, we found:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 4294967296 |
+-------------------------------+------------+
2 rows in set (0.00 sec)

And the error log says:

2018-05-02T21:52:43.568054Z 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 16, chunk size = 128M

So, after we have set innodb_buffer_pool_size in the config file to 2.5GB, the database gives us a 4GB buffer pool, because of the number of instances and the chunk size. What the message doesn't tell us is the number of chunks, which would be useful for understanding why there is such a huge difference.

Let’s take a look at how that’s calculated.

Increasing instances and chunk size

Changing the number of instances or the chunk size will require a restart and will take into consideration the buffer pool size as an upper limit to set the chunk size. For instance, with this configuration:

innodb_buffer_pool_size = 2147483648
innodb_buffer_pool_instances = 32
innodb_buffer_pool_chunk_size = 134217728

We get this chunk size:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 67108864   |
| innodb_buffer_pool_size       | 2147483648 |
+-------------------------------+------------+
2 rows in set (0.00 sec)

However, we need to understand how this is really working. To get the innodb_buffer_pool_chunk_size it will make this calculation: innodb_buffer_pool_size / innodb_buffer_pool_instances with the result rounded to a multiple of 1MB.

In our example, the calculation is 2147483648 / 32 = 67108864, and since 67108864 % 1048576 = 0, no rounding is needed. The number of chunks will be one chunk per instance.

When does it consider that it needs to use more chunks per instance? When the difference between the required size and the innodb_buffer_pool_size configured in the file is greater or equal to 1MB.

That is why, for instance, if you try to set the innodb_buffer_pool_size equal to 1GB + 1MB – 1B you will get 1GB of buffer pool:

innodb_buffer_pool_size = 1074790399
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:26:43.328313Z 0 [Note] InnoDB: Initializing buffer pool, total size = 1G, instances = 16, chunk size = 64M

But if you set innodb_buffer_pool_size equal to 1GB + 1MB you will get 2GB of buffer pool:

innodb_buffer_pool_size = 1074790400
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:25:48.204032Z 0 [Note] InnoDB: Initializing buffer pool, total size = 2G, instances = 16, chunk size = 64M

This is because it considers that two chunks will fit. We can say that this is how the InnoDB Buffer pool size is calculated:

determine_best_chunk_size{
  # if the configured chunk size does not fit into innodb_buffer_pool_size / innodb_buffer_pool_instances,
  # shrink it to that quotient, rounded down to a multiple of 1MB
  if innodb_buffer_pool_size / innodb_buffer_pool_instances < innodb_buffer_pool_chunk_size
  then
    innodb_buffer_pool_chunk_size = roundDownMB(innodb_buffer_pool_size / innodb_buffer_pool_instances)
  fi
}
determine_amount_of_chunks{
  # start with as many whole chunks per instance as fit into the configured size
  innodb_buffer_amount_chunks_per_instance = roundDown(innodb_buffer_pool_size / innodb_buffer_pool_instances / innodb_buffer_pool_chunk_size)
  # if the configured size exceeds what fits by 1MB or more, add one more chunk per instance
  if innodb_buffer_pool_size - innodb_buffer_amount_chunks_per_instance * innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size >= 1024*1024
  then
    innodb_buffer_amount_chunks_per_instance++
  fi
}
determine_best_chunk_size
determine_amount_of_chunks
innodb_buffer_pool_size = innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size * innodb_buffer_amount_chunks_per_instance

What is the best setting?

In order to analyze the best setting you will need to know that there is an upper limit of 1000 chunks. In our example with 16 instances, we can have no more than 62 chunks per instance.

Another thing to consider is what each chunk represents in percentage terms. Continuing with the example, each chunk per instance represents 1.61%, which means that we can increase or decrease the complete buffer pool size in multiples of this percentage.
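
A quick way to see this granularity on your own server is to express one resize step (one chunk per instance, across all instances) as a percentage of the current pool; a minimal sketch:

mysql> SELECT @@innodb_buffer_pool_chunk_size * @@innodb_buffer_pool_instances AS resize_step_bytes, ROUND(@@innodb_buffer_pool_chunk_size * @@innodb_buffer_pool_instances * 100.0 / @@innodb_buffer_pool_size, 2) AS resize_step_pct;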

From a management point of view, I think that you might want to consider at least a range of 2% to 5% to increase or decrease the buffer. I performed some tests to see the impact of having small chunks and I found no issues but this is something that needs to be thoroughly tested.

The post Chunk Change: InnoDB Buffer Pool Resizing appeared first on Percona Database Performance Blog.

by David Ducos at June 19, 2018 06:02 PM

Webinar Weds 20/6: Percona XtraDB Cluster 5.7 Tutorial Part 2

Including setting up Percona XtraDB Cluster with ProxySQL and PMM

Please join Percona’s Architect, Tibi Köröcz, as he presents Percona XtraDB Cluster 5.7 Tutorial Part 2 on Wednesday, June 20th, 2018, at 7:00 am PDT (UTC-7) / 10:00 am EDT (UTC-4).

 

Never used Percona XtraDB Cluster before? This is the webinar for you! In this 45-minute webinar, we will introduce you to a fully functional Percona XtraDB Cluster.

This webinar will show you how to install Percona XtraDB Cluster with ProxySQL, and monitor it with Percona Monitoring and Management (PMM).

We will also cover topics like bootstrap, IST, SST, certification, common-failure situations and online schema changes.

After this webinar, you will have enough knowledge to set up a working Percona XtraDB Cluster with ProxySQL, in order to meet your high availability requirements.

You can see part one of this series here: Percona XtraDB Cluster 5.7 Tutorial Part 1

Register Now!

Tibor Köröcz

Architect

Tibi joined Percona in 2015 as a Consultant. Before joining Percona, among many other things, he worked at the world’s largest car hire booking service as a Senior Database Engineer. He enjoys trying and working with the latest technologies and applications which can help or work with MySQL together. In his spare time he likes to spend time with his friends, travel around the world and play ultimate frisbee.

 

The post Webinar Weds 20/6: Percona XtraDB Cluster 5.7 Tutorial Part 2 appeared first on Percona Database Performance Blog.

by Tibor Korocz at June 19, 2018 12:18 PM

Percona Database Performance Blog Commenting Issues

We are experiencing an intermittent commenting problem on the Percona Database Performance Blog.

At Percona, part of our purpose is to engage with the open source database community. A big part of this engagement is the Percona Database Performance Blog, and the participation of the community in reading and commenting. We appreciate your interest and contributions.

Currently, we are experiencing an intermittent problem with comments on the blog. Some users are unable to post comments and can receive an error message similar to the following:

We are working on correcting the issue, and apologize if it affects you. If you have a comment that you want to make on a specific blog and this issue affects you, you can also leave comments on Twitter, Facebook or LinkedIn – all blog posts are socialized on those platforms.

Thanks for your patience.

The post Percona Database Performance Blog Commenting Issues appeared first on Percona Database Performance Blog.

by Dave Avery at June 19, 2018 02:28 AM

June 18, 2018

MariaDB Foundation

MariaDB 10.1.34 and latest MariaDB Connectors now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.1.34, the latest stable release in the MariaDB 10.1 series, as well as MariaDB Connector/C 3.0.5, MariaDB Connector/C 2.3.6, MariaDB Connector/J 2.2.5, MariaDB Connector/J 1.7.4, MariaDB Connector/ODBC 3.0.5 and MariaDB Connector/ODBC 2.0.17, the latest stable MariaDB Connector releases. See the release notes and changelogs […]

The post MariaDB 10.1.34 and latest MariaDB Connectors now available appeared first on MariaDB.org.

by Ian Gilfillan at June 18, 2018 02:15 PM

MariaDB AB

MariaDB Server 10.1.34 and updated Connectors now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.1.34 and updated MariaDB Connectors. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.1.34

Release Notes Changelog What is MariaDB 10.1?


Download MariaDB Connectors

by dbart at June 18, 2018 01:51 PM

Peter Zaitsev

Webinar Tues 19/6: MySQL: Scaling and High Availability – Production Experience from the Last Decade(s)

Please join Percona’s CEO, Peter Zaitsev as he presents MySQL: Scaling and High Availability – Production Experience Over the Last Decade(s) on Tuesday, June 19th, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

Percona is known as the MySQL performance experts. With over 4,000 customers, we’ve studied, mastered and executed many different ways of scaling applications. Percona can help ensure your application is highly available. Come learn from our playbook, and leave this talk knowing your MySQL database will run faster and more optimized than before.

Register Now

About Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

The post Webinar Tues 19/6: MySQL: Scaling and High Availability – Production Experience from the Last Decade(s) appeared first on Percona Database Performance Blog.

by Peter Zaitsev at June 18, 2018 11:02 AM

June 15, 2018

Peter Zaitsev

This Week in Data with Colin Charles 42: Security Focus on Redis and Docker a Timely Reminder to Stay Alert

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Much of last week, there was a lot of talk around this article: New research shows 75% of ‘open’ Redis servers infected. It turns out that it helps to always read beyond the headline, because headlines tend to be more sensationalist than you would expect. From the author of Redis, I highly recommend reading Clarifications on the Incapsula Redis security report, because in this case the problem goes beyond the headline: the content is also suspect. Antirez had to write this to help the press (we totally need to help keep reportage accurate).

Not to depart from the Redis world just yet, but Antirez also had some collaboration with the Apple Information Security Team with regards to the Redis Lua subsystem. The details are pretty interesting as documented in Redis Lua scripting: several security vulnerabilities fixed because you’ll note that the Alibaba team also found some other issues. Antirez also ensured that the Redis cloud providers (notably: Redis Labs, Amazon, Alibaba, Microsoft, Google, Heroku, Open Redis and Redis Green) got notified first (and in the comments, compose.io was missing, but now added to the list). I do not know if Linux distributions were also informed, but they will probably be rolling out updates soon.

In the “be careful where you get your software” department: some criminals have figured out they could host some crypto-currency mining software that you would get pre-installed if you used their Docker containers. They’ve apparently made over $90,000. It is good to note that the Backdoored images downloaded 5 million times finally removed from Docker Hub. This, however, was up on the Docker Hub for ten months and they managed to get over 5 million downloads across 17 images. Know what images you are pulling. Maybe this is again more reason for software providers to run their own registries?

James Turnbull is out with a new book: Monitoring with Prometheus. It just got released, I’ve grabbed it, but a review will come shortly. He’s managed all this while pulling off what seems to be yet another great O’Reilly Velocity San Jose Conference.

Releases

A quiet week on this front.

Link List

  • INPLACE upgrade from MySQL 5.7 to MySQL 8.0
  • PostgreSQL relevant: What’s is the difference between streaming replication vs hot standby vs warm standby ?
  • A new paper on Amazon Aurora is out: Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes. It was presented at SIGMOD 2018, and an abstract: “One of the more novel differences between Aurora and other relational databases is how it pushes redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. Doing so reduces networking traffic, avoids checkpoints and crash recovery, enables failovers to replicas without loss of data, and enables fault-tolerant storage that heals without database involvement. Traditional implementations that leverage distributed storage would use distributed consensus algorithms for commits, reads, replication, and membership changes and amplify cost of underlying storage.” Aurora, as you know, avoids distributed consensus under most circumstances. Short 8-page read.
  • Dormando is blogging again, and this was of particular interest — Caching beyond RAM: the case for NVMe. This is done in the context of memcached, which I am certain many use.
  • It is particularly heartening to note that not only does MongoDB use Linkbench for some of their performance testing, they’re also contributing to making it better via a pull request.

Industry Updates

Trying something new here… To cover fundraising, and people on the move in the database industry.

  • Kenny Gorman — who has been on the program committee for several Percona Live conferences, and spoken at the event multiple times before — is the founder and CEO of Eventador, a stream-processing as a service company built on Apache Kafka and Apache Flink, has just raised $3.8 million in funding to fuel their growth. They are also naturally spending this on hiring. The full press release.
  • Jimmy Guerrero (formerly of MySQL and InfluxDB) is now VP Marketing & Community at YugaByte DB. YugaByte was covered in column 13 as having raised $8 million in November 2017.

Upcoming appearances

  • DataOps Barcelona – Barcelona, Spain – June 21-22, 2018 – code dataopsbcn50 gets you a discount
  • OSCON – Portland, Oregon, USA – July 16-19, 2018
  • Percona webinar on MariaDB Server 10.3 – June 26, 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 42: Security Focus on Redis and Docker a Timely Reminder to Stay Alert appeared first on Percona Database Performance Blog.

by Colin Charles at June 15, 2018 04:18 PM

Tuning PostgreSQL for sysbench-tpcc

Percona has a long tradition of performance investigation and benchmarking. Peter Zaitsev, CEO, and Vadim Tkachenko, CTO, led their crew into a series of experiments with MySQL in this space. The discussion that always follows on the results achieved is well known and praised even by the PostgreSQL community. So when Avi joined the team and settled in at Percona just enough to get acquainted with my colleagues, sure enough one of the first questions they asked him was: “did you know sysbench-tpcc also works with PostgreSQL now?!”

sysbench

sysbench is “a scriptable multi-threaded benchmark tool based on LuaJIT (…) most frequently used for database benchmarks“, created and maintained by Alexey Kopytov. It’s been around for a long time now and has been a main reference for MySQL benchmarking since its inception. One of the favorites of Netflix’ Brendan Gregg, we now know. You may remember Sveta Smirnova and Alexander Korotkov’s report on their experiments in Millions of Queries per Second: PostgreSQL and MySQL’s Peaceful Battle at Today’s Demanding Workloads here. In fact, that post may serve as a nice prelude for the tests we want to show you today. It provides a good starting point as a MySQL vs PostgreSQL performance comparison.

The idea behind Sveta and Alexander’s experiments was “to provide an honest comparison for the two popular RDBMSs“, MySQL and PostgreSQL, using “the same tool, under the same challenging workloads and using the same configuration parameters (where possible)“. Since neither pgbench nor sysbench would work effectively with MySQL and PostgreSQL for both writes and reads they attempted to port pgbench‘s workload as a sysbench benchmark. 

sysbench-tpcc

More recently, Vadim came up with an implementation of the famous TPC-C workload benchmark for sysbench, sysbench-tpcc. He has since published a series of tests using Percona Server and MySQL, and worked to make it compatible with PostgreSQL too. For real now, hence the request that awaited us.

Our goal this time was less ambitious than Sveta and Alexander’s. We wanted to show you how we setup PostgreSQL to perform optimally for sysbench-tpcc, highlighting the settings we tuned the most to accomplish this. We ran our tests on the same box used by Vadim in his recent experiments with Percona Server for MySQL and MySQL.

A valid benchmark – benchmark rules

Before we present our results we shall note there are several ways to speed up database performance. You may for example disable full_page_writes, which would make a server crash unsafe, and use a minimalistic wal_level mode, which would block replication capability. These would speed things up but at the expense of reliability, making the server inappropriate for production usage.

For our benchmarks, we made sure we had all the necessary parameters in place to satisfy the following:

  1. ACID Compliance
  2. Point-in-time-recovery
  3. WALs usable by Replica/Slave for Replication
  4. Crash Recovery
  5. Frequent Checkpointing to reduce time for Crash Recovery
  6. Autovacuum
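
To double-check that these durability-related parameters are in effect on the running instance, a minimal sketch using the standard pg_settings view:

postgres=# SELECT name, setting FROM pg_settings WHERE name IN ('fsync', 'full_page_writes', 'synchronous_commit', 'wal_level', 'autovacuum', 'checkpoint_timeout');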

When we initially prepared sysbench-tpcc with PostgreSQL 10.3 the database size was 118 GB. By the time we completed the test, i.e. after 36000 seconds, the DB size had grown up to 335 GB. We have a total of “only” 256 GB of memory available in this server, however, based on the observations from pg_stat_database, pg_statio_user_tables and pg_statio_user_indexes 99.7% of the blocks were always in-memory:

postgres=# select ((blks_hit)*100.00)/(blks_hit+blks_read) AS "perc_mem_hit" from pg_stat_database where datname like 'sbtest';
   perc_mem_hit
---------------------
99.7267224322546
(1 row)

Hence, we consider it to be an in-memory workload with the whole active data set in RAM. In this post we explain how we tuned our PostgreSQL Instance for an in-memory workload, as was the case here.
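
That ratio can also be broken down per table, to confirm that the hottest tables are indeed served from memory; a sketch using the standard pg_statio_user_tables view:

postgres=# SELECT relname, round(heap_blks_hit * 100.0 / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_pct FROM pg_statio_user_tables ORDER BY heap_blks_hit + heap_blks_read DESC LIMIT 10;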

Preparing the database before running sysbench

In order to run a sysbench-tpcc, we must first prepare the database to load some data. In our case, as mentioned above, this initial step resulted in a 118 GB database:

postgres=# select datname, pg_size_pretty(pg_database_size(datname)) as "DB_Size" from pg_stat_database where datname = 'sbtest';
 datname | DB_Size
---------+---------
 sbtest  | 118 GB
(1 row)

This may change depending on the arguments used. Here is the actual command we used to prepare the PostgreSQL Database for sysbench-tpcc:

$ ./tpcc.lua --pgsql-user=postgres --pgsql-db=sbtest --time=120 --threads=56 --report-interval=1 --tables=10 --scale=100 --use_fk=0  --trx_level=RC --db-driver=pgsql prepare

While we were loading the data, we wanted to see if we could speed up the process. Here are the customized PostgreSQL settings we used, some of them directly targeted at accelerating the data load:

shared_buffers = 192GB
maintenance_work_mem = '20GB'
wal_level = 'minimal'
autovacuum = 'OFF'
wal_compression = 'ON'
max_wal_size = '20GB'
checkpoint_timeout = '1h'
checkpoint_completion_target = '0.9'
random_page_cost = 1
max_wal_senders = 0
full_page_writes = ON
synchronous_commit = ON

We’ll discuss most of these parameters in the sections that follow, but we would like to highlight two of them here. We increased maintenance_work_mem to speed up index creation and max_wal_size to delay checkpointing further, but not too much — this is a write-intensive phase after all. Using these parameters it took us 33 minutes to complete the prepare stage, compared with 55 minutes when using the default parameters.

If you are not concerned about crash recovery or ACID, you could turn off full_page_writes, fsync and synchronous_commit. That would speed up the data load much more.

Running a manual VACUUM ANALYZE after sysbench-tpcc’s initial prepare stage

Once we had prepared the database, since it was a newly created DB instance, we ran a manual VACUUM ANALYZE on the database (in parallel jobs) using the command below. We employed all 56 vCPUs available in the server since nothing else was running on the machine:

$ /usr/lib/postgresql/10/bin/vacuumdb -j 56 -d sbtest -z

Having run a vacuum for the entire database we restarted PostgreSQL and cleared the OS cache before executing the benchmark in “run” mode. We repeated this process after each round.
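For reference, here is a minimal sketch of how that restart-and-clear-cache step can be done on a typical Linux box; the service name and the exact cache-dropping approach are assumptions and may differ in your environment:

$ sudo systemctl stop postgresql                 # stop the PostgreSQL service
$ sync                                           # flush dirty pages to disk first
$ echo 3 | sudo tee /proc/sys/vm/drop_caches     # drop page cache, dentries and inodes
$ sudo systemctl start postgresql                # start PostgreSQL again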

First attempt with sysbench-tpcc

When we ran sysbench-tpcc for the first time, we observed a resulting TPS of 1978.48 for PostgreSQL with the server not properly tuned, running with default settings. We used the following command to run sysbench-tpcc for PostgreSQL for 10 hours (or 36000 seconds) for all rounds:

./tpcc.lua --pgsql-user=postgres --pgsql-db=sbtest --time=36000 --threads=56 --report-interval=1 --tables=10 --scale=100 --use_fk=0  --trx_level=RC --pgsql-password=oracle --db-driver=pgsql run

PostgreSQL performance tuning of parameters for sysbench-tpcc (crash safe)

After getting an initial idea of how PostgreSQL performed with the default settings and the actual demands of the sysbench-tpcc workload, we began making progressive adjustments in the settings, observing how they impacted the server’s performance. After several rounds we came up with the following list of parameters (all of these satisfy ACID properties):

shared_buffers = '192GB'
work_mem = '4MB'
random_page_cost = '1'
maintenance_work_mem = '2GB'
wal_level = 'replica'
max_wal_senders = '3'
synchronous_commit = 'on'
seq_page_cost = '1'
max_wal_size = '100GB'
checkpoint_timeout = '1h'
checkpoint_completion_target = '0.9'
autovacuum_vacuum_scale_factor = '0.4'
effective_cache_size = '200GB'
min_wal_size = '1GB'
bgwriter_lru_maxpages = '1000'
bgwriter_lru_multiplier = '10.0'
logging_collector = 'ON'
wal_compression = 'ON'
log_checkpoints = 'ON'
archive_mode = 'ON'
full_page_writes = 'ON'
fsync = 'ON'

Let’s discuss our reasoning behind the tuning of the most important settings:

shared_buffers

Defines the amount of memory PostgreSQL uses for shared memory buffers. It’s arguably its most important setting, often compared (for better or worse) to MySQL’s innodb_buffer_pool_size. The biggest difference, if we dare to compare shared_buffers to the Buffer Pool, is that InnoDB bypasses the OS cache to directly access (read and write) data in the underlying storage subsystem, whereas PostgreSQL does not.

Does this mean PostgreSQL does “double caching” by first loading data from disk into the OS cache to then make a copy of these pages into the shared_buffers area? Yes.

Does this “double caching” make PostgreSQL inferior to InnoDB and MySQL in terms of memory management? No. We’ll discuss why that is the case in a follow up blog post. For now it suffices to say that the actual performance depends on the workload (mix of reads and writes), the size of the “hot data” (the portion of the dataset that is most accessed and modified) and how often checkpointing takes place.

How we chose the setting for shared_buffers to optimize PostgreSQL performance

Due to these factors, the commonly suggested formula of setting shared_buffers to 25% of RAM, or the magic number of “8GB”, is hardly ideal. What seems to be good reasoning, though, is this:

  • If you can fit the whole of your “hot data” in memory, then dedicating most of your memory to shared_buffers pays off nicely, making PostgreSQL behave as close to an in-memory database as possible.
  • If the size of your “hot data” surpasses the amount of memory you have available in the server, then you’re probably better off working with a much smaller shared_buffers area and relying more on the OS cache.

For this benchmark, considering the options we used, we found that dedicating 75% of all the available memory to shared_buffers is ideal. It is enough to fit the entire “hot data” and still leave sufficient memory for the OS to operate, handle connections and everything else.

work_mem

This setting defines the amount of memory that can be used by each query (not session) for internal sort operations (such as ORDER BY and DISTINCT), and hash tables (such as when doing hash-based aggregation). Beyond this, PostgreSQL moves the data into temporary disk files. The challenge is usually finding a good balance here. We want to avoid the use of temporary disk files, which slow down query completion and in turn may cause contention. But we don’t want to over-commit memory, which could even lead to OOM; working with high values for work_mem may be destructive when it is not really needed.

We analyzed the workload produced by sysbench-tpcc and found with some surprise that work_mem doesn’t play a role here, considering the queries that were executed. So we kept the default value of 4MB. Please note that this is seldom the case in production workloads, so it is important to always keep an eye on that parameter.
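One way to verify that work_mem is indeed sufficient is to watch the temporary file counters in pg_stat_database; if they stay flat during the run, queries are not spilling to disk. A minimal sketch, using column names as they exist in PostgreSQL 10:

postgres=# SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
             FROM pg_stat_database
            WHERE datname = 'sbtest';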

random_page_cost

This setting stipulates the cost of fetching a non-sequential disk page, and directly affects the query planner’s decisions. Keeping a conservative (higher) value is particularly important when using high-latency storage, such as spinning disks. That wasn’t our case; since we were on fast SSDs we could afford to equalize random_page_cost with seq_page_cost. So, we set this parameter to 1 as well, down from the default value of 4.

wal_level, max_wal_senders and archive_mode

To set up streaming replication wal_level needs to be set to at least “replica” and archive_mode must be enabled. This means the amount of WAL data produced increases significantly compared to when using default settings for these parameters, which in turn impacts IO. However, we considered these with a production environment in mind.

wal_compression

For this workload, we observed a total of 3359 GB of WAL produced with wal_compression disabled and 1962 GB with it enabled. We enabled wal_compression to reduce IO, that is, the amount (and, most importantly, the rate) of WAL files being written to disk, at the expense of some additional CPU cycles. This proved to be very effective in our case as we had a surplus of CPU available.
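As far as we know, wal_compression can be changed without a server restart, so a sketch of enabling it on a running instance could look like this (ALTER SYSTEM writes to postgresql.auto.conf; adjust to your configuration management approach as needed):

postgres=# ALTER SYSTEM SET wal_compression = 'on';
postgres=# SELECT pg_reload_conf();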

checkpoint_timeout, checkpoint_completion_target and max_wal_size

We set checkpoint_timeout to 1 hour and checkpoint_completion_target to 0.9. This means a checkpoint is forced every hour, and it has 90% of the time before the next checkpoint to spread the writes. However, a checkpoint is also forced whenever max_wal_size worth of WAL has been generated. With these parameters, for a sysbench-tpcc workload, we saw 3 to 4 checkpoints every hour, mostly because of the amount of WAL being generated.

In production environments we would always recommend you perform a manual CHECKPOINT before shutting down PostgreSQL in order to allow for a faster restart (recovery) time. In this context, issuing a manual CHECKPOINT took us between 1 and 2 minutes, after which we were able to restart PostgreSQL in just about 4 seconds. Please note that in our testing environment, taking time to restart PostgreSQL was not a concern, so working with this checkpoint rate benefited us. However, if you cannot afford a couple of minutes for crash recovery it is always suggested to force checkpointing to take place more often, even at the cost of some degraded performance.
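A minimal sketch of that shutdown sequence, assuming a local superuser connection and the default Ubuntu package data directory (both assumptions), might look like this:

postgres=# CHECKPOINT;
$ pg_ctl -D /var/lib/postgresql/10/main stop -m fast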

full_page_writes, fsync and synchronous_commit

We set all of these parameters to ON to satisfy ACID properties.

autovacuum

We enabled autovacuum and other vacuum settings to ensure vacuum is performed in the background. We will discuss the importance of keeping autovacuum enabled in a production environment, as well as the danger of doing otherwise, in a separate post.
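To keep an eye on whether autovacuum is keeping up during the run, a simple check of the per-table statistics can help; the view and columns below exist in PostgreSQL 10, and the query itself is just an illustration:

postgres=# SELECT relname, last_autovacuum, autovacuum_count
             FROM pg_stat_user_tables
            ORDER BY autovacuum_count DESC
            LIMIT 5;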

Amount of WAL (transaction logs) generated after 10 hours of sysbench-tpcc

Before we start to discuss the numbers it is important to highlight that we enabled wal_compression before starting sysbench. As we mentioned above, the amount of WAL generated with wal_compression set to OFF was more than twice the amount generated with compression enabled. We observed that enabling wal_compression resulted in an increase in TPS of 21%. No wonder: the production of WAL has an important impact on IO, so much so that it is very common to find PostgreSQL servers with dedicated storage for WALs only. Thus, it is important to highlight the fact that wal_compression may benefit write-intensive workloads by sparing IO at the expense of additional CPU usage.

To find out the total amount of WALs generated after 10 Hours, we took note at the WAL offset from before we started the test and after the test completed:

WAL Offset before starting the sysbench-tpcc ⇒ 2C/860000D0
WAL Offset after 10 hours of sysbench-tpcc   ⇒ 217/14A49C50

and subtracted one from the other using pg_wal_lsn_diff, as follows:

postgres=# SELECT pg_size_pretty(pg_wal_lsn_diff('217/14A49C50','2C/860000D0'));
pg_size_pretty
----------------
1962 GB
(1 row)

1962 GB of WALs is a fairly big amount of transaction logs produced over 10 hours, considering we had enabled wal_compression.
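For completeness, the offsets themselves can be captured before and after the run with pg_current_wal_lsn() (this is the PostgreSQL 10 name of the function; older releases call it pg_current_xlog_location()). The output shown here is the "before" value from our run:

postgres=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn
---------------------
 2C/860000D0
(1 row)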

We contemplated making use of a separate disk to store WALs to find out by how much more a dedicated storage for transaction logs would benefit overall performance. However, we wanted to keep using the same hardware Vadim had used for his previous tests, so decided against this.

Crash unsafe parameters

Setting full_page_writes, fsync and synchronous_commit to OFF may speed up performance, but it is always crash unsafe unless you have adequate backups and recovery procedures in place to compensate. For example, if you are using a copy-on-write (COW) filesystem with journaling, you may be fine with full_page_writes set to OFF. This may not be true 100% of the time though.

However, we still want to share the results with the crash unsafe parameters mentioned in the paragraph above as a reference.

Results after 10 Hours of sysbench-tpcc for PostgreSQL with default, crash safe and crash unsafe parameters

Here are the final numbers we obtained after running sysbench-tpcc for 10 hours considering each of the scenarios above:

Parameters            | TPS
----------------------+---------
Default / Untuned     | 1978.48
Tuned (crash safe)    | 5736.66
Tuned (crash unsafe)  | 7881.72

Did we expect to get these numbers? Yes and no.

Certainly we expected a properly tuned server to outperform one running with default settings considerably, but we can’t say we expected it to be almost three times better (2.899x). With PostgreSQL making use of the OS cache it is not always the case that tuning shared_buffers in particular will make such a dramatic difference. By comparison, tuning MySQL’s InnoDB Buffer Pool almost always makes a difference. For PostgreSQL, how much tuning helps depends on the workload; in this case, for sysbench-tpcc benchmarks, tuning shared_buffers definitely makes a difference.

On the other hand, the roughly 4x improvement over the default settings when using crash unsafe settings was not much of a surprise.

Here’s an alternative view of the results of our PostgreSQL performance tuning benchmarks:

Sysbench-TPCC with PostgreSQL

What did you think about this experiment? Please let us know in the comments section below and let’s get the conversation going.

Hardware spec

  • Supermicro server:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic
  • PostgreSQL: version 10.3
  • sysbench-tpcc: https://github.com/Percona-Lab/sysbench-tpcc

The post Tuning PostgreSQL for sysbench-tpcc appeared first on Percona Database Performance Blog.

by Avinash Vallarapu at June 15, 2018 12:30 PM

June 14, 2018

Peter Zaitsev

Percona Monitoring and Management: Look After Your pmm-data Container

If you have already deployed PMM server using Docker you might be aware that we begin by creating a special container for persistent PMM data. In this post, I aim to explain the importance of the pmm-data container when you deploy PMM server with Docker. By the end of this post, you will have a fair idea of why this Docker container is needed.

Percona Monitoring and Management (PMM) is a free and open-source solution for database troubleshooting and performance optimization that you can run in your own environment. It provides time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

What is the purpose of pmm-data?

Well, as simple as its name suggests, when PMM Server runs via Docker its data is stored in the pmm-data container. It’s a dedicated data-only container which you create with bind mounts using -v, i.e. data volumes, for holding persistent PMM data. We use pmm-data to compartmentalize the persistent data so you can more easily back up and move data consistently across instances or containers. It acts as a single access point from which other running containers (in this case pmm-server) can access data volumes.

The pmm-data container does not run, but the data it holds is used by pmm-server to build graphs. PMM Server is the core of PMM that aggregates collected data and presents it in the form of tables, dashboards, and graphs in a web interface.

Why do we use docker create?

The docker create command instructs the Docker daemon to create a writable container layer over the specified Docker image. When you execute docker create using the steps shown, it creates a Docker container named pmm-data and initializes data volumes using the -v flag in conjunction with the create command (e.g. /opt/prometheus/data).

Option -v is used multiple times in current versions of PMM to mount multiple data volumes. This allows you to create the data volume container, and then use it from another container, i.e. pmm-server. We do not want to run the pmm-data container, only to create it. Note: the number of data volumes bind mounted may change between versions of PMM.

$ docker create \
   -v /opt/prometheus/data \
   -v /opt/consul-data \
   -v /var/lib/mysql \
   -v /var/lib/grafana \
   --name pmm-data \
   percona/pmm-server:latest /bin/true

Make sure that the data volumes you initialize with the -v option match those given in the example. PMM Server expects you to have bind mounted those directories exactly as demonstrated in the deployment steps. For using different mount points for PMM deployment, please refer to this blog post. Data volumes are very useful: once designated and created you can share them and include them as part of other containers. If you use -v or --volume to bind-mount a file or directory that does not yet exist on the Docker host, -v creates the endpoint for you. It is always created as a directory. Data in the pmm-data volume is actually hosted on the host’s filesystem.
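If you want to see exactly where those volumes ended up on the host, one quick way is to inspect the container’s mounts; the --format template below is only an illustration and the exact output will vary by Docker version and deployment:

$ docker inspect --format '{{ json .Mounts }}' pmm-data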

Why does pmm-data not run?

As we used docker create and not docker run for pmm-data, this container does not run. It simply exists to make sure you retain all PMM data when you upgrade to a newer PMM Server image. Data volumes bind mounted on the pmm-data container are shared with the running pmm-server container because the --volumes-from option is used when launching pmm-server. Here we persisted data using Docker without binding it to pmm-server, by storing the files on the host machine. As long as pmm-data exists, the data exists.
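For context, this is roughly how pmm-server is typically launched against the data container; the port mapping and image tag are examples and may differ in your deployment:

$ docker run -d \
   -p 80:80 \
   --volumes-from pmm-data \
   --name pmm-server \
   --restart always \
   percona/pmm-server:latest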

You can stop, destroy, or replace a container. When a non-running container is using a volume, the volume is still available to Docker and is not removed automatically. You can easily replace the running pmm-server container with a newer version without any impact or loss of data. That is why, since we need to store persistent data, we keep it in a data volume. In our case, the pmm-data container itself never writes to those volumes, which avoids possible corruption.

Why can’t I remove the pmm-data container? What happens if I delete it?

Removing pmm-data container results in the loss of collected metrics data.

If you remove containers that mount volumes, including the initial pmm-server container, or any subsequent containers mounted, such as pmm-server-2, you do not delete the volumes. This allows you to upgrade — or effectively migrate — data volumes between containers. Your data container might be based on an old version of the container image, with known security problems. It is not a big problem since it doesn’t actually run anything, but it doesn’t feel right.

As noted earlier, pmm-data stores metrics data as per the retention. You should not remove or recreate pmm-data container unless you need to wipe out all PMM data and start again. To delete the volume from disk, you must explicitly call docker rm -v against the container with a reference to the volume.
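To make the difference concrete, here is a sketch of the two cases: replacing pmm-server keeps the data, while removing pmm-data with -v destroys it:

$ docker stop pmm-server && docker rm pmm-server   # safe: pmm-data and its volumes remain
$ docker rm -v pmm-data                            # destructive: volumes are deleted, all metrics data is lost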

Some do’s and don’ts

  • Allocate enough disk space on the host for pmm-data to retain data.
    By default, Prometheus stores time-series data for 30 days, and QAN stores query data for 8 days.
  • Manage data retention appropriately as per your disk space available.
    You can back up pmm-data by extracting the data from the container, using the steps mentioned here, to avoid data loss.

In case of any issues with metrics, here’s a good blog post regarding troubleshooting.

The post Percona Monitoring and Management: Look After Your pmm-data Container appeared first on Percona Database Performance Blog.

by Siddhant Sawant at June 14, 2018 12:58 PM

What is the Top Cause of Application Downtime Today?

I frequently talk to our customer base about what keeps them up at night. While there is a large variance of answers, they tend to fall into one of two categories. The first is the conditioned fear of some monster lurking behind the scenes that could pounce at any time. The second, of course, is the actual monster of downtime on a critical system. Ask most tech folks and they will tell you outages seem to only happen late at night or early in the morning. And that they do keep them up.

Entire companies and product lines have been built around providing those in the IT world with some ability to sleep at night. Modern enterprises have spent millions to mitigate the risk and prevent their businesses from having a really bad day because of an outage. Cloud providers are attuned to the downtime dilemma and spend lots of time, money, and effort to build in redundancy and make “High Availability” (HA) as easy as possible. The frequency of “hardware” or server issues continues to dwindle.

Where does the downtime issue start?

In my discussions, most companies I have talked to say their number one cause of outages and customer interruptions is ultimately related to the deployment of new or upgraded code. Often I hear the operations team has little or no involvement with an application until it’s put into production. It is a bit ironic that this is also the area where companies tend to drastically under-invest. They opt instead to invest in ways to “Scale Out or Up”. Or perhaps how to survive asteroids hitting two out of three of their data centers.

Failing over broken or slow code from one server to another does not fix it. Adding more servers to distribute the load can mitigate a problem, but can also escalate the cost dramatically. In most cases, the solutions they apply don’t address the primary cause of the problems.

While there are some fantastic tools out there that can help with getting better visibility into code level issues — such as New Relic, AppDynamics and others — the real problem is that these often end up being used to diagnose issues after they have appeared in production. Most companies carry out some amount of testing before releasing code, but typically it is a fraction of what they should be doing. Working for a company that specializes in open source databases, we get a lot of calls on issues that have prevented companies’ end users from using critical applications. Many of these problems are fixable before they cause a loss of revenue and reputation.

I think it’s time we technology companies start to rethink our QA, testing, and pre-deployment requirements. How much time, effort, and money can we save if we catch these “monsters” before they make it into production?

Not to mention how much better our operations team will sleep . . .

The post What is the Top Cause of Application Downtime Today? appeared first on Percona Database Performance Blog.

by Matt Yonkovit at June 14, 2018 09:43 AM

June 13, 2018

Peter Zaitsev

Zone Based Sharding in MongoDB

In this blog post, we will discuss how to use zone based sharding to deploy a sharded MongoDB cluster in a customized manner, so that queries and data are redirected per geographical groupings. This feature of MongoDB is a part of its Data Center Awareness, which allows queries to be routed to particular MongoDB deployments considering the physical locations or configurations of mongod instances.

Before moving on, let’s have an overview of this feature. You might already have some questions about zone based sharding. Was it recently introduced? If zone-based sharding is something we should use, then what about tag-aware sharding?

MongoDB has supported tag-aware sharding since its early versions. This means tagging a range of shard key values, associating that range with a shard, and redirecting operations to that specific tagged shard. Since version 3.4, this tag-aware sharding is referred to as ZONES. So, the only change is the name, and this is why the sh.addShardTag(shard, tag) method is still used.

How it works

  1. With the help of a shard key, MongoDB allows you to create zones of sharded data – also known as shard zones.
  2. Each zone can be associated with one or more shards.
  3. Similarly, a shard can associate with any number of non-conflicting zones.
  4. MongoDB migrates chunks to the zone range in the selected shards.
  5. MongoDB routes reads and writes to a particular zone range that resides in particular shards.

Useful for what kind of deployments/applications?

  1. In cases where data needs to be routed to a particular shard due to some hardware configuration restrictions.
  2. Zones can be useful if there is the need to isolate specific data to a particular shard. For example, in the case of GDPR compliance that requires businesses to protect data and privacy for an individual within the EU.
  3. If an application is being used geographically and you want a query to route to the nearest shards for both reads and writes.

Let’s consider a Scenario

Consider the scenario of a school where some students are experts in Biology, but most students are experts in Maths. So we have more data for the Maths students compared to the Biology students. In this example, the deployment requires that the Maths students’ data be routed to the shard with the better configuration for a large amount of data. Both reads and writes will be served by specific shards. All the Biology students will be served by another shard. To implement this, we will add tags to deploy the zones to the shards.

For this scenario we have an environment with:

DB: “school”

Collection: “students”

Fields: “sId”, “subject”, “marks” and so on..

Indexed Fields: “subject” and “sId”

We enable sharding:

sh.enableSharding("school")

And create a shard key on “subject” and “sId”:

sh.shardCollection("school.students", {subject: 1, sId: 1});

We have two shards in our test environment

shards:

{  "_id" : "shard0000",  "host" : "127.0.0.1:27001",  "state" : 1 }
{  "_id" : "shard0001",  "host" : "127.0.0.1:27002",  "state" : 1 }

Zone Deployment

1) Disable balancer

To prevent migration of the chunks across the cluster, disable the balancer for the “students” collection:

mongos> sh.disableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Before proceeding further, make sure the balancer is not running. It is not a mandatory step, but it is always good practice to make sure no migration of chunks takes place while configuring zones:

mongos> sh.isBalancerRunning()
false

2) Add shard to the zone

A zone can be associated with a particular shard in the form of a tag, using sh.addShardTag(), so a tag will be added to each shard. Here we are considering two zones, so the tags “MATHS” and “BIOLOGY” need to be added.

mongos> sh.addShardTag( "shard0000" , "MATHS");
{ "ok" : 1 }
mongos> sh.addShardTag( "shard0001" , "BIOLOGY");
{ "ok" : 1 }

We can see zones are assigned in the form of tags as required against each shard.

mongos> sh.status()
 shards:
        {  "_id" : "shard0000",  "host" : "127.0.0.1:27001",  "state" : 1,  "tags" : [ "MATHS" ] }
        {  "_id" : "shard0001",  "host" : "127.0.0.1:27002",  "state" : 1,  "tags" : [ "BIOLOGY" ] }

3) Define ranges for each zone

Each zone covers one or more ranges of shard key values. Note: each range a zone covers is always inclusive of its lower boundary and exclusive of its upper boundary.

mongos> sh.addTagRange(
	"school.students",
	{ "subject" : "maths", "sId" : MinKey},
	{ "subject" : "maths", "sId" : MaxKey},
	"MATHS"
)
{ "ok" : 1 }
mongos> sh.addTagRange(
	"school.students",
	{ "subject" : "biology", "sId" : MinKey},
	{ "subject" : "biology", "sId" : MaxKey},
"BIOLOGY"
)
{ "ok" : 1 }

4) Enable balancer

Now enable the balancer so the chunks will migrate across the shards as per the requirement and all the read and write queries will be routed to the particular shards.

mongos> sh.enableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
mongos> sh.isBalancerRunning()
true

Let’s check how documents get routed as per the tags:

We have inserted 6 documents, 4 documents with “subject”:”maths” and 2 documents with “subject”:”biology”
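For illustration, an insert into this collection might look like the following (the field values here are made up for this example):

mongos> db.students.insert({ "sId" : 101, "subject" : "maths", "marks" : 92 })
WriteResult({ "nInserted" : 1 })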

mongos> db.students.find({"subject":"maths"}).count()
4
mongos> db.students.find({"subject":"biology"}).count()
2

Checking the shard distribution for the students collection:

mongos> db.students.getShardDistribution()
Shard shard0000 at 127.0.0.1:27003
data : 236B docs : 4 chunks : 4
estimated data per chunk : 59B
estimated docs per chunk : 1
Shard shard0001 at 127.0.0.1:27004
data : 122B docs : 2 chunks : 1
estimated data per chunk : 122B
estimated docs per chunk : 2

So in this test case, all the queries for the students collection have been routed as per the tags used, with four documents inserted into shard0000 and two documents inserted into shard0001.

Any queries related to MATHS will route to shard0000 and queries related to BIOLOGY will route to shard0001, hence the load will be distributed as per the configuration of the shard, keeping the database performance optimized.

Sharding MongoDB using zones is a great feature provided by MongoDB. With the help of zones, data can be isolated to the specific shards. Or if we have any kind of hardware or configuration restrictions to the shards, it is a possible solution for routing the operations.

The post Zone Based Sharding in MongoDB appeared first on Percona Database Performance Blog.

by Aayushi Mangal at June 13, 2018 01:58 PM

Jean-Jerome Schmidt

ChatOps - Managing MySQL, MongoDB & PostgreSQL from Slack

What is ChatOps?

Nowadays, we make use of multiple communication channels to manage or receive information from our systems, such as email, chat and applications among others. If we could centralize this in one or just a few applications, and even better, if we could integrate it with the tools that we currently use in our organization, we would be able to automate processes, improve our work dynamics and communication, and have a clearer picture of the current state of our systems. In many companies, Slack or other collaboration tools are becoming the centre and heart of the development and ops teams.

What is ChatBot?

A chatbot is a program that simulates a conversation, receiving input from the user and returning answers based on its programming.

Some products have been developed with this technology that allow us to perform administrative tasks, or keep the team up to date on the current status of the systems.

This allows us, among other things, to integrate the communication tools we use daily with our systems.

CCBot - ClusterControl

CCBot is a chatbot that uses the ClusterControl APIs to manage and monitor your database clusters. You will be able to deploy new clusters or replication setups, keep your team up to date on the status of the databases as well as the status of any administrative jobs (e.g., backups or rolling upgrades). You can also restart failed nodes, add new ones, promote a slave to master, add load balancers, and so on. CCBot supports most of the major chat services like Slack, Flowdock and Hipchat.

CCBot is integrated with the s9s command line, so you have several commands to use with this tool.

ClusterControl Notifications via Slack

Note that you can use Slack to handle alarms and notifications from ClusterControl. Why? A chat room is a good place to discuss incidents. Seeing an actual alarm in a Slack channel makes it easy to discuss it with the team, because all team members actually know what is being discussed and can chime in.

The main difference between CCBot and the integration of notifications via Slack is that, with CCBot, the user initiates the communication via a specific command, generating a response from the system. For notifications, ClusterControl generates an event, for example, a message about a node failure. This event is then sent to the tool that we have integrated for our notifications, for example, Slack.

You can review this post on how to configure ClusterControl in order to send notifications to Slack.

After this, we can see ClusterControl notifications in our Slack:

ClusterControl Slack Integration

CCBot Installation

To install CCBot, once we have installed ClusterControl, we must execute the following script:

$ /var/www/html/clustercontrol/app/tools/install-ccbot.sh

We select which adapter we want to use; in this blog, we will select Slack.

-- Supported Hubot Adapters --
1. slack
2. hipchat
3. flowdock
Select the hubot adapter to install [1-3]: 1

It will then ask us for some information, such as an email, a description, the name we will give to our bot, the port, the API token and the channel to which we want to add it.

? Owner (User <user@example.com>)
? Description (A simple helpful robot for your Company)
Enter your bot's name (ccbot):
Enter hubot's http events listening port (8081):
Enter your slack API token:
Enter your slack message room (general):

To obtain the API token, we must go to our Slack -> Apps (on the left side of our Slack window), look for Hubot and select Install.

CCBot Hubot

We enter the Username, which must match our bot name.

In the next window, we can see the API token to use.

CCBot API Token
Enter your slack API token: xoxb-111111111111-XXXXXXXXXXXXXXXXXXXXXXXX
CCBot installation completed!

Finally, to be able to use all the s9s command line functions with CCBot, we must create a user from ClusterControl:

$ s9s user --create --cmon-user=cmon --group=admins  --controller="https://localhost:9501" --generate-key cmon

For further information about how to manage users, please check the official documentation.

We can now use our CCBot from Slack.

Here we have some examples of commands:

$ s9s --help
CCBot Help

With this command we can see the help for the s9s CLI.

$ s9s cluster --list --long
CCBot Cluster List

With this command we can see a list of our clusters.

$ s9s cluster --cluster-id=17 --stat
CCBot Cluster Stat

With this command we can see the stats of one cluster, in this case cluster id 17.

$ s9s node --list --long
CCBot Node List

With this command we can see a list of our nodes.

$ s9s job --list
CCBot Job List

With this command we can see a list of our jobs.

$ s9s backup --create --backup-method=mysqldump --cluster-id=16 --nodes=192.168.100.34:3306 --backup-directory=/backup
CCBot Backup

With this command we can create a backup with mysqldump, in the node 192.168.100.34. The backup will be saved in the /backup directory.

Now let's see some more complex examples:

$ s9s cluster --create --cluster-type=mysqlreplication --nodes="mysql1;mysql2" --vendor="percona" --provider-version="5.7" --template="my.cnf.repl57" --db-admin="root" --db-admin-passwd="root123" --os-user="root" --cluster-name="MySQL1"
CCBot Create Replication

With this command we can create a MySQL Master-Slave Replication with Percona for MySQL 5.7 version.

CCBot Check Replication Created

And we can check this new cluster.

In ClusterControl Topology View, we can check our current topology with one master and one slave node.

Topology View Replication 1
$ s9s cluster --add-node --nodes=mysql3 --cluster-id=24
CCBot Add Node

With this command we can add a new slave in our current cluster.

Topology View Replication 2

And we can check our new topology in ClusterControl Topology View.

$ s9s cluster --add-node --cluster-id=24 --nodes="proxysql://proxysql"
CCBot Add ProxySQL

With this command we can add a new ProxySQL node named "proxysql" in our current cluster.

Topology View Replication 3

And we can check our new topology in ClusterControl Topology View.

You can check the list of available commands in the documentation.
If we try to use CCBot from a Slack channel, we must add "@ccbot_name" at the beginning of our command:

@ccbot s9s backup --create --backup-method=xtrabackupfull --cluster-id=1 --nodes=10.0.0.5:3306 --backup-directory=/storage/backups

CCBot makes it easier for teams to manage their clusters in a collaborative way. It is fully integrated with the tools they use on a daily basis.

Note

If we get the following error when running the CCBot installer on our ClusterControl host:

-bash: yo: command not found

We must update the nodejs package.

Conclusion

As we said previously, there are several ChatBot alternatives for different purposes, and we can even create our own ChatBot. But while this technology facilitates our tasks and has the advantages we mentioned at the beginning of this blog, not all that glitters is gold.

There is a very important detail to keep in mind - security. We must be very careful when using them, and take all the necessary precautions to know what we allow them to do, in what way, at what moment, by whom and from where.

by Sebastian Insausti at June 13, 2018 01:04 PM

Peter Zaitsev

Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide

Please join Percona’s Sr. Technical Operations Architect, Tim Vaillancourt as he presents MongoDB Backup and Recovery Field Guide on Thursday, June 14, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

This talk will cover backup and recovery solutions for MongoDB replica sets and clusters, focusing on online and low-impact solutions for production systems.

Register for the webinar

Tim Vaillancourt

Senior Technical Operations Architect

With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Prior to Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS.

Prior to moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

The post Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide appeared first on Percona Database Performance Blog.

by Tim Vaillancourt at June 13, 2018 08:04 AM

June 12, 2018

Oli Sennhauser

Select Hello World FromDual with MariaDB PL/SQL

MariaDB 10.3 was released GA a few weeks ago. One of the features which interests me most is the MariaDB Oracle PL/SQL compatibility mode.

So it's time to try it out now...

Enabling Oracle PL/SQL in MariaDB

Oracle PL/SQL syntax is quite different from the old MySQL/MariaDB SQL/PSM syntax, so the old MariaDB parser would throw some errors without modification. The PL/SQL-compatible parser is activated by changing the sql_mode as follows:

mariadb> SET SESSION sql_mode=ORACLE;

or you can make this setting persistent in your my.cnf MariaDB configuration file:

[mysqld]

sql_mode = ORACLE

To verify if the sql_mode is already set you can use the following statement:

mariadb> pager grep --color -i oracle
PAGER set to 'grep --color -i oracle'
mariadb> SELECT @@sql_mode;
| PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE,ORACLE,NO_KEY_OPTIONS,NO_TABLE_OPTIONS,NO_FIELD_OPTIONS,NO_AUTO_CREATE_USER,SIMULTANEOUS_ASSIGNMENT |
mariadb> nopager

Nomen est omen

First of all I tried the most basic and fundamental table in Oracle, the DUAL table:

mariadb> SELECT * FROM dual;
ERROR 1096 (HY000): No tables used

Sad. :-( But this query on the dual table seems to work:

mariadb> SELECT 'Hello World!' FROM dual;
+--------------+
| Hello World! |
+--------------+
| Hello World! |
+--------------+

The second result looks much better. The first query should work as well but does not. We opened a bug at MariaDB without much hope that this bug will be fixed soon...

To get more info on why MariaDB behaves like this, I investigated a bit more:

mariadb> SELECT table_schema, table_name
  FROM information_schema.tables
 WHERE table_name = 'dual';
Empty set (0.001 sec)

Hmmm. It seems to be implemented not as a real table... But normal usage of this table seems to work:

mariadb> SELECT CURRENT_TIMESTAMP() FROM dual;
+---------------------+
| current_timestamp() |
+---------------------+
| 2018-06-07 15:32:11 |
+---------------------+

If you rely heavily in your code on the dual table you can create it yourself. It is defined as follows:

"The DUAL table has one column, DUMMY, defined to be VARCHAR2(1), and contains one row with a value X."

If you want to create the dual table yourself here is the statement:

mariadb> CREATE TABLE `DUAL` (DUMMY VARCHAR2(1));
mariadb> INSERT INTO `DUAL` (DUMMY) VALUES ('X');

Anonymous PL/SQL block in MariaDB

To try some PL/SQL features out, or to run a sequence of PL/SQL commands, you can use anonymous blocks. Unfortunately the MySQL SQL/PSM-style delimiter still seems to be necessary.

It is recommended to use the DELIMITER /, then most of the Oracle examples will work straight out of the box...

DELIMITER /

BEGIN
  SELECT 'Hello world from MariaDB anonymous PL/SQL block!';
END;
/

DELIMITER ;

+--------------------------------------------------+
| Hello world from MariaDB anonymous PL/SQL block! |
+--------------------------------------------------+
| Hello world from MariaDB anonymous PL/SQL block! |
+--------------------------------------------------+

A simple PL/SQL style MariaDB Procedure

DELIMITER /

CREATE OR REPLACE PROCEDURE hello AS
BEGIN
  DECLARE
    vString VARCHAR2(255) := NULL;
  BEGIN
    SELECT 'Hello world from MariaDB PL/SQL Procedure!' INTO vString FROM dual;
    SELECT vString;
  END;
END hello;
/

BEGIN
  hello();
END;
/

DELIMITER ;

A simple PL/SQL style MariaDB Function

DELIMITER /

CREATE OR REPLACE FUNCTION hello RETURN VARCHAR2 DETERMINISTIC AS
BEGIN
  DECLARE
    vString VARCHAR2(255) := NULL;
  BEGIN
    SELECT 'Hello world from MariaDB PL/SQL Function!' INTO vString FROM dual;
    RETURN vString;
  END;
END hello;
/

DECLARE
  vString VARCHAR(255) := NULL;
BEGIN
  vString := hello();
  SELECT vString;
END;
/

DELIMITER ;

An PL/SQL package in MariaDB

Up to here there is nothing really new, just slightly different. But now let us try a PL/SQL package in MariaDB:

DELIMITER /

CREATE OR REPLACE PACKAGE hello AS
  -- must be declared as public!
  PROCEDURE helloWorldProcedure(pString VARCHAR2);
  FUNCTION helloWorldFunction(pString VARCHAR2) RETURN VARCHAR2;
END hello;
/

CREATE OR REPLACE PACKAGE BODY hello AS

  vString VARCHAR2(255) := NULL;

  -- was declared public in PACKAGE
  PROCEDURE helloWorldProcedure(pString VARCHAR2) AS
  BEGIN
    SELECT 'Hello world from MariaDB Package Procedure in ' || pString || '!' INTO vString FROM dual;
    SELECT vString;
  END;

  -- was declared public in PACKAGE
  FUNCTION helloWorldFunction(pString VARCHAR2) RETURN VARCHAR2 AS
  BEGIN
    SELECT 'Hello world from MariaDB Package Function in ' || pString || '!' INTO vString FROM dual;
    return vString;
  END;
BEGIN
  SELECT 'Package initialiser, called only once per connection!';
END hello;
/

DECLARE
  vString VARCHAR2(255) := NULL;
  -- CONSTANT seems to be not supported yet by MariaDB
  -- cString CONSTANT VARCHAR2(255) := 'anonymous block';
  cString VARCHAR2(255) := 'anonymous block';
BEGIN
  CALL hello.helloWorldProcedure(cString);
  SELECT hello.helloWorldFunction(cString) INTO vString;
  SELECT vString;
END;
/

DELIMITER ;

DBMS_OUTPUT package for MariaDB

An Oracle database contains over 200 PL/SQL packages. One of the most common one is the DBMS_OUTPUT package. In this package we can find the Procedure PUT_LINE.

This package/function has not been implemented by MariaDB so far. So we have to do it ourselves:

DELIMITER /

CREATE OR REPLACE PACKAGE DBMS_OUTPUT AS
  PROCEDURE PUT_LINE(pString IN VARCHAR2);
END DBMS_OUTPUT;
/

CREATE OR REPLACE PACKAGE BODY DBMS_OUTPUT AS

  PROCEDURE PUT_LINE(pString IN VARCHAR2) AS
  BEGIN
    SELECT pString;
  END;
END DBMS_OUTPUT;
/

BEGIN
  DBMS_OUTPUT.PUT_LINE('Hello world from MariaDB DBMS_OUTPUT.PUT_LINE!');
END;
/

DELIMITER ;

The other Functions and Procedures have to be implemented later over time...

Now we can try to do all examples from Oracle sources!

by Shinguz at June 12, 2018 09:36 PM

Peter Zaitsev

Webinar Weds 6/13: Performance Analysis and Troubleshooting Methodologies for Databases

Please join Percona’s CEO, Peter Zaitsev as he presents Performance Analysis and Troubleshooting Methodologies for Databases on Wednesday, June 13th, 2018 at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

 

Have you heard about the USE Method (Utilization – Saturation – Errors)? RED (Rate – Errors – Duration), or Golden Signals (Latency – Traffic – Errors – Saturation)?

In this presentation, we will talk briefly about these different-but-similar “focuses”. We’ll discuss how we can apply them to data infrastructure performance analysis, troubleshooting, and monitoring.

We will use MySQL as an example, but most of this talk applies to other database technologies too.

Register for the webinar

About Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

The post Webinar Weds 6/13: Performance Analysis and Troubleshooting Methodologies for Databases appeared first on Percona Database Performance Blog.

by Peter Zaitsev at June 12, 2018 11:48 AM

PXC loves firewalls (and System Admins loves iptables)

Let them stay together.

In recent years, I have seen quite often that users, when installing a product such as PXC, instead of spending five minutes to understand what to do, just run iptables -F and save.

In short, they remove any rules for their firewall.

With this post, I want to show you how easy it can be to do the right thing instead of putting your server at risk. I’ll show you how a slightly more complex setup like PXC (compared to MySQL) can be easily achieved without risky shortcuts.

iptables is the utility used to manage the chains of rules used by the Linux kernel firewall, which is your basic security tool.
Linux comes with a wonderful firewall built into the kernel. As an administrator, you can configure this firewall with interfaces like ipchains  — which we are not going to cover — and iptables, which we shall talk about.

iptables is stateful, which means that the firewall can make decisions based on received packets. This means that I can, for instance, DROP a packet if it’s coming from bad-guy.com.

I can also create a set of rules that will either allow or reject the packet, or redirect it to another chain. This can potentially create a very complex scenario.

However, for today and for this use case let’s keep it simple…  Looking at my own server:

iptables -v -L
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
 250K   29M ACCEPT     all  --  any    any     anywhere             anywhere             state RELATED,ESTABLISHED
    6   404 ACCEPT     icmp --  any    any     anywhere             anywhere
    0     0 ACCEPT     all  --  lo     any     anywhere             anywhere
    9   428 ACCEPT     tcp  --  any    any     anywhere             anywhere             state NEW tcp dpt:ssh
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere             state NEW tcp dpt:mysql
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere
  210 13986 REJECT     all  --  any    any     anywhere             anywhere             reject-with icmp-host-prohibited
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 REJECT     all  --  any    any     anywhere             anywhere             reject-with icmp-host-prohibited
Chain OUTPUT (policy ACCEPT 241K packets, 29M bytes)
 pkts bytes target     prot opt in     out     source               destination

That’s not too bad, my server is currently accepting only SSH and packets on port 3306. Please note that I used the -v option to see more information like IN/OUT, which allows me to identify that row #3 is actually related to my loopback device, and as such it’s good to have it open.

The point is that if I try to run the PXC cluster with these settings it will fail, because the nodes will not be able to see each other.

A quite simple example is what happens when trying to start the second node of the cluster:

2018-05-21T17:56:14.383686Z 0 [Note] WSREP: (3cb4b3a6, 'tcp://10.0.0.21:4567') connection to peer 584762e6 with addr tcp://10.0.0.23:4567 timed out, no messages seen in PT3S

Starting a new node will fail, given that the connectivity will not be established correctly. In the Percona documentation there is a notes section in which we mention that these ports must be open to have the cluster working correctly:

  • 3306 For MySQL client connections and State Snapshot Transfer that use the mysqldump method.
  • 4567 For Galera Cluster replication traffic, multicast replication uses both UDP transport and TCP on this port.
  • 4568 For Incremental State Transfer.
  • 4444 For all other State Snapshot Transfer.

Of course, if you don’t know how to do it that could be a problem, but it is quite simple. Just use the following commands to add the needed rules:

iptables -I INPUT 2 --protocol tcp --match tcp --dport 3306 --source 10.0.0.1/24 --jump ACCEPT
iptables -I INPUT 3 --protocol tcp --match tcp --dport 4567 --source 10.0.0.1/24 --jump ACCEPT
iptables -I INPUT 4 --protocol tcp --match tcp --dport 4568 --source 10.0.0.1/24 --jump ACCEPT
iptables -I INPUT 5 --protocol tcp --match tcp --dport 4444 --source 10.0.0.1/24 --jump ACCEPT
iptables -I INPUT 6 --protocol udp --match udp --dport 4567 --source 10.0.0.1/24 --jump ACCEPT

Once you have done this check the layout again and you should have something like this:

[root@galera1h1n5 gal571]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT tcp -- 10.0.0.0/24 anywhere tcp dpt:mysql
ACCEPT tcp -- 10.0.0.0/24 anywhere tcp dpt:tram
ACCEPT tcp -- 10.0.0.0/24 anywhere tcp dpt:bmc-reporting
ACCEPT tcp -- 10.0.0.0/24 anywhere tcp dpt:krb524
ACCEPT udp -- 10.0.0.0/24 anywhere udp dpt:tram
ACCEPT icmp -- anywhere anywhere
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh
ACCEPT tcp -- anywhere anywhere tcp dpt:mysql
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
Chain FORWARD (policy ACCEPT)
target prot opt source destination
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Try to start the secondary node, and — tadaaa — the node will connect, will provision itself, and finally will start correctly.

All good? Well not really, you still need to perform a final step. We need to make our server accessible also for PMM monitoring agents.

You have PMM, right? If you don’t, take a look here and you will want it. 😀

Anyhow PMM will not work correctly with the rules I have, and the result will be an empty set of graphs when accessing the server statistics. Luckily, PMM has a very easy way to help you identify the issue:

[root@galera1h1n5 gal571]# pmm-admin check-network
PMM Network Status
Server Address | 192.168.1.52
Client Address | 192.168.1.205
* System Time
NTP Server (0.pool.ntp.org) | 2018-05-24 08:05:37 -0400 EDT
PMM Server | 2018-05-24 12:05:34 +0000 GMT
PMM Client | 2018-05-24 08:05:37 -0400 EDT
PMM Server Time Drift | OK
PMM Client Time Drift | OK
PMM Client to PMM Server Time Drift | OK
* Connection: Client --> Server
-------------------- -------
SERVER SERVICE STATUS
-------------------- -------
Consul API OK
Prometheus API OK
Query Analytics API OK
Connection duration | 1.051724ms
Request duration | 311.924µs
Full round trip | 1.363648ms
* Connection: Client <-- Server
-------------- ------------ -------------------- ------- ---------- ---------
SERVICE TYPE NAME REMOTE ENDPOINT STATUS HTTPS/TLS PASSWORD
-------------- ------------ -------------------- ------- ---------- ---------
linux:metrics galera1h1n5 192.168.1.205:42000 DOWN NO NO
mysql:metrics gal571 192.168.1.205:42002 DOWN NO NO
When an endpoint is down it may indicate that the corresponding service is stopped (run 'pmm-admin list' to verify).
If it's running, check out the logs /var/log/pmm-*.log
When all endpoints are down but 'pmm-admin list' shows they are up and no errors in the logs,
check the firewall settings whether this system allows incoming connections from server to address:port in question.
Also you can check the endpoint status by the URL: http://192.168.1.52/prometheus/targets

What more do you want? You have all the information to debug and build your new rules. I just need to open ports 42000 and 42002 on my firewall:

iptables -I INPUT 7 --protocol tcp --match tcp --dport 42000 --source 192.168.1.1/24 --jump ACCEPT
iptables -I INPUT 8 --protocol tcp --match tcp --dport 42002 --source 192.168.1.1/24 --jump ACCEPT

Please note that we are handling the connectivity for PMM using a different range of IPs/subnet. This is because it is best practice to have PXC nodes communicate over a dedicated network/subnet (physical and logical).

Run the test again:

* Connection: Client <-- Server
-------------- ------------ -------------------- ------- ---------- ---------
SERVICE TYPE NAME REMOTE ENDPOINT STATUS HTTPS/TLS PASSWORD
-------------- ------------ -------------------- ------- ---------- ---------
linux:metrics galera1h1n5 192.168.1.205:42000 OK YES YES
mysql:metrics gal571 192.168.1.205:42002 OK YES YES

Done… I just repeat this on all my nodes and my firewall is set to handle the PXC-related security.

Now that all my settings are working well I can save my firewall’s rules:

iptables-save > /etc/sysconfig/iptables

For Ubuntu you may need some additional steps, as described in https://help.ubuntu.com/community/IptablesHowTo#Using_iptables-save.2Frestore_to_test_rules

There are some nice tools to help you even more, if you are very lazy, like UFW and the graphical one, GUFW. Developed to ease iptables firewall configuration, ufw provides a user friendly way to create an IPv4 or IPv6 host-based firewall. By default UFW is disabled in Ubuntu. Given that ultimately they use iptables, and their use is widely covered in other resources such as the official Ubuntu documentation, I won’t cover these here.
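If you prefer UFW, a rough equivalent of the iptables rules above could look like this; the subnets are the ones used earlier in this post, so adapt them to your own networks before applying anything:

$ sudo ufw allow from 10.0.0.0/24 to any port 3306,4444,4567,4568 proto tcp
$ sudo ufw allow from 10.0.0.0/24 to any port 4567 proto udp
$ sudo ufw allow from 192.168.1.0/24 to any port 42000,42002 proto tcp
$ sudo ufw enable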

Conclusion

Please don’t make the mistake of flushing/ignoring your firewall, when making this right is just a matter of 5 commands. It’s easy enough for everyone to do, and it’s good enough to stop basic security attacks.

Happy MySQL (and PXC) to everyone.

The post PXC loves firewalls (and System Admins loves iptables) appeared first on Percona Database Performance Blog.

by Marco Tusa at June 12, 2018 10:37 AM

Jean-Jerome Schmidt

How to Benchmark Performance of MySQL & MariaDB using SysBench

What is SysBench? If you work with MySQL on a regular basis, then you most probably have heard of it. SysBench has been in the MySQL ecosystem for a long time. It was originally written by Peter Zaitsev, back in 2004. Its purpose was to provide a tool to run synthetic benchmarks of MySQL and the hardware it runs on. It was designed to run CPU, memory and I/O tests. It also had an option to execute OLTP workload on a MySQL database. OLTP stands for online transaction processing, typical workload for online applications like e-commerce, order entry or financial transaction systems.

In this blog post, we will focus on the SQL benchmark feature but keep in mind that hardware benchmarks can also be very useful in identifying issues on database servers. For example, the I/O benchmark was intended to simulate InnoDB I/O workload, while the CPU tests involve simulation of a highly concurrent, multi-threaded environment along with tests for mutex contention - something which also resembles a database type of workload.

SysBench history and architecture

As mentioned, SysBench was originally created in 2004 by Peter Zaitsev. Soon after, Alexey Kopytov took over its development. It reached version 0.4.12 and the development halted. After a long break Alexey started to work on SysBench again in 2016. Soon version 0.5 was released with the OLTP benchmark rewritten to use LUA-based scripts. Then, in 2017, SysBench 1.0 was released. This was like day and night compared to the old 0.4.12 version. First and foremost, instead of hardcoded scripts, we now have the ability to customize benchmarks using LUA. For instance, Percona created a TPCC-like benchmark which can be executed using SysBench. Let’s take a quick look at the current SysBench architecture.

SysBench is a C binary which uses LUA scripts to execute benchmarks. Those scripts have to:

  1. Handle input from command line parameters
  2. Define all of the modes which the benchmark is supposed to use (prepare, run, cleanup)
  3. Prepare all of the data
  4. Define how the benchmark will be executed (what queries will look like etc)

Scripts can utilize multiple connections to the database; they can also process results, should you want to create complex benchmarks where queries depend on the result set of previous queries. With SysBench 1.0 it is possible to create latency histograms. It is also possible for the LUA scripts to catch and handle errors through error hooks. There's support for parallelization in the LUA scripts: multiple queries can be executed in parallel, making, for example, provisioning much faster. Last but not least, multiple output formats are now supported. Before, SysBench generated only human-readable output. Now it is possible to generate it as CSV or JSON, making it much easier to do post-processing and generate graphs using, for example, gnuplot, or to feed the data into Prometheus, Graphite or a similar datastore.

Why SysBench?

The main reason why SysBench became popular is the fact that it is simple to use. Someone without prior knowledge can start to use it within minutes. It also provides, by default, benchmarks which cover most of the cases - OLTP workloads, read-only or read-write, primary key lookups and primary key updates - all of which caused most of the issues for MySQL, up to MySQL 8.0. This was also a reason why SysBench was so popular in different benchmarks and comparisons published on the Internet. Those posts helped to promote this tool and made it the go-to synthetic benchmark for MySQL.

Another good thing about SysBench is that, since version 0.5 and the incorporation of LUA, anyone can prepare any kind of benchmark. We already mentioned the TPCC-like benchmark, but anyone can craft something which will resemble her production workload. We are not saying it is simple - it will most likely be a time-consuming process - but having this ability is beneficial if you need to prepare a custom benchmark.

Being a synthetic benchmark, SysBench is not a tool which you can use to tune the configuration of your MySQL servers (unless you prepared LUA scripts with a custom workload or your workload happens to be very similar to the benchmark workloads that SysBench comes with). What it is great for is comparing the performance of different hardware. You can easily compare the performance of, let's say, different types of nodes offered by your cloud provider and the maximum QPS (queries per second) they offer. Knowing that metric and knowing what you pay for a given node, you can then calculate an even more important metric - QP$ (queries per dollar). This will allow you to identify what node type to use when building a cost-efficient environment. Of course, SysBench can also be used for initial tuning and assessing the feasibility of a given design. Let's say we build a Galera cluster spanning across the globe - North America, EU, Asia. How many inserts per second can such a setup handle? What would be the commit latency? Does it even make sense to do a proof of concept, or maybe the network latency is high enough that even a simple workload does not work as you would expect it to?

What about stress-testing? Not everyone has moved to the cloud; there are still companies preferring to build their own infrastructure. Every new server acquired should go through a warm-up period during which you will stress it to pinpoint potential hardware defects. In this case, SysBench can also help - either by executing an OLTP workload which overloads the server, or by using the dedicated benchmarks for CPU, disk and memory.
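As a rough sketch of what those dedicated hardware tests look like (option names as in recent SysBench 1.0 releases; sizes and durations are placeholders you should adapt to your hardware):

# CPU benchmark - prime number calculation with 16 concurrent threads
sysbench cpu --threads=16 --time=60 run

# Memory benchmark - sequential memory access with 16 threads
sysbench memory --threads=16 --time=60 run

# I/O benchmark - create test files, run a random read/write workload, then clean up
sysbench fileio --file-total-size=8G prepare
sysbench fileio --file-total-size=8G --file-test-mode=rndrw --time=300 run
sysbench fileio --file-total-size=8G cleanup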

As you can see, there are many cases in which even a simple, synthetic benchmark can be very useful. In the next paragraph we will look at what we can do with SysBench.

What can SysBench do for you?

What tests can you run?

As mentioned at the beginning, we will focus on OLTP benchmarks and just as a reminder we’ll repeat that SysBench can also be used to perform I/O, CPU and memory tests. Let’s take a look at the benchmarks that SysBench 1.0 comes with (we removed some helper LUA files and non-database LUA scripts from this list).

-rwxr-xr-x 1 root root 1.5K May 30 07:46 bulk_insert.lua
-rwxr-xr-x 1 root root 1.3K May 30 07:46 oltp_delete.lua
-rwxr-xr-x 1 root root 2.4K May 30 07:46 oltp_insert.lua
-rwxr-xr-x 1 root root 1.3K May 30 07:46 oltp_point_select.lua
-rwxr-xr-x 1 root root 1.7K May 30 07:46 oltp_read_only.lua
-rwxr-xr-x 1 root root 1.8K May 30 07:46 oltp_read_write.lua
-rwxr-xr-x 1 root root 1.1K May 30 07:46 oltp_update_index.lua
-rwxr-xr-x 1 root root 1.2K May 30 07:46 oltp_update_non_index.lua
-rwxr-xr-x 1 root root 1.5K May 30 07:46 oltp_write_only.lua
-rwxr-xr-x 1 root root 1.9K May 30 07:46 select_random_points.lua
-rwxr-xr-x 1 root root 2.1K May 30 07:46 select_random_ranges.lua

Let’s go through them one by one.

First, bulk_insert.lua. This test can be used to benchmark the ability of MySQL to perform multi-row inserts. This can be quite useful when checking, for example, the performance of replication or a Galera cluster. In the first case, it can help you answer the question: "how fast can I insert before replication lag kicks in?". In the latter case, it will tell you how fast data can be inserted into a Galera cluster given the current network latency.
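For illustration, a bulk insert run might look like the sketch below. The host, credentials and path are placeholders borrowed from the examples later in this post, not a prescribed setup.

# Create the test table(s), then hammer the server with multi-row INSERTs for 5 minutes
sysbench /root/sysbench/src/lua/bulk_insert.lua --threads=8 --time=300 \
    --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 \
    prepare
sysbench /root/sysbench/src/lua/bulk_insert.lua --threads=8 --time=300 \
    --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 \
    run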

All oltp_* scripts share a common table structure. The first two of them (oltp_delete.lua and oltp_insert.lua) execute single DELETE and INSERT statements. Again, this could be a test for either replication or a Galera cluster - push it to the limits and see what amount of inserting or purging it can handle. We also have other benchmarks focused on particular functionality - oltp_point_select, oltp_update_index and oltp_update_non_index. These will execute a subset of queries - primary key-based selects, index-based updates and non-index-based updates. If you want to test some of these functionalities, the tests are there. We also have more complex benchmarks which are based on OLTP workloads: oltp_read_only, oltp_read_write and oltp_write_only. You can run a read-only workload, which will consist of different types of SELECT queries, you can run only writes (a mix of DELETE, INSERT and UPDATE), or you can run a mix of those two. Finally, using select_random_points and select_random_ranges you can run some random SELECTs, either using random points in an IN() list or random ranges using BETWEEN.

How can you configure a benchmark?

What is also important, benchmarks are configurable - you can run different workload patterns using the same benchmark. Let's take a look at the two most common benchmarks to execute. We'll have a deep dive into the OLTP read_only and OLTP read_write benchmarks. First of all, SysBench has some general configuration options. We will discuss here only the most important ones; you can check all of them by running:

sysbench --help

Let’s take a look at them.

  --threads=N                     number of threads to use [1]

You can define what kind of concurrency you'd like SysBench to generate. MySQL, like every software, has some scalability limitations and its performance will peak at some level of concurrency. This setting helps to simulate different concurrencies for a given workload and check if it has already passed the sweet spot.
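To find that sweet spot you can simply sweep the concurrency level; a minimal sketch, reusing the placeholder connection settings from the examples later in this post:

# Run the same read-only benchmark at increasing concurrency levels and keep only the QPS line
for t in 1 2 4 8 16 32 64 128; do
    sysbench /root/sysbench/src/lua/oltp_read_only.lua --threads=$t --time=60 \
        --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 \
        --tables=10 --table-size=1000000 run | grep "queries:"
done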

  --events=N                      limit for total number of events [0]
  --time=N                        limit for total execution time in seconds [10]

Those two settings govern how long SysBench should keep running. It can either execute some number of queries or it can keep running for a predefined time.

  --warmup-time=N                 execute events for this many seconds with statistics disabled before the actual benchmark run with statistics enabled [0]

This is self-explanatory. SysBench generates statistical results from the tests and those results may be affected if MySQL is in a cold state. Warmup helps to identify "regular" throughput by executing the benchmark for a predefined time, allowing the cache, buffer pools etc. to warm up.

  --rate=N                        average transactions rate. 0 for unlimited rate [0]

By default SysBench will attempt to execute queries as fast as possible. To simulate slower traffic this option may be used. You can define here how many transactions should be executed per second.

  --report-interval=N             periodically report intermediate statistics with a specified interval in seconds. 0 disables intermediate reports [0]

By default SysBench generates a report after it has completed its run, and no progress is reported while the benchmark is running. Using this option you can make SysBench more verbose while the benchmark is still running.

  --rand-type=STRING   random numbers distribution {uniform, gaussian, special, pareto, zipfian} to use by default [special]

SysBench gives you the ability to generate different types of data distribution. All of them may have their own purposes. The default option, 'special', defines several (it is configurable) hot-spots in the data, something which is quite common in web applications. You can also use other distributions if your data behaves in a different way. By making a different choice here you can also change the way your database is stressed. For example, uniform distribution, where all of the rows have the same likelihood of being accessed, is a much more memory-intensive operation. It will use more buffer pool to store all of the data and it will be much more disk-intensive if your data set won't fit in memory. On the other hand, special distribution with a couple of hot-spots will put less stress on the disk as hot rows are more likely to be kept in the buffer pool and access to rows stored on disk is much less likely. For some of the data distribution types, SysBench gives you more tweaks. You can find this info in 'sysbench --help' output.
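For example, to make the workload less cache-friendly you might switch to the uniform distribution; a sketch (same placeholder connection parameters as elsewhere in this post):

# Uniform access pattern - every row is equally likely to be read,
# which puts far more pressure on the buffer pool and the I/O subsystem than 'special'
sysbench /root/sysbench/src/lua/oltp_read_only.lua --rand-type=uniform --threads=16 --time=300 \
    --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 \
    --tables=10 --table-size=1000000 run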

  --db-ps-mode=STRING prepared statements usage mode {auto, disable} [auto]

Using this setting you can decide if SysBench should use prepared statements (as long as they are available in the given datastore - for MySQL it means PS will be enabled by default) or not. This may make a difference while working with proxies like ProxySQL or MaxScale - they have to treat prepared statements in a special way and route all of them to one host, making it impossible to test the scalability of the proxy.

In addition to the general configuration options, each of the tests may have its own configuration. You can check what is possible by running:

root@vagrant:~# sysbench ./sysbench/src/lua/oltp_read_write.lua  help
sysbench 1.1.0-2e6b7d5 (using bundled LuaJIT 2.1.0-beta3)

oltp_read_only.lua options:
  --distinct_ranges=N           Number of SELECT DISTINCT queries per transaction [1]
  --sum_ranges=N                Number of SELECT SUM() queries per transaction [1]
  --skip_trx[=on|off]           Don't start explicit transactions and execute all queries in the AUTOCOMMIT mode [off]
  --secondary[=on|off]          Use a secondary index in place of the PRIMARY KEY [off]
  --create_secondary[=on|off]   Create a secondary index in addition to the PRIMARY KEY [on]
  --index_updates=N             Number of UPDATE index queries per transaction [1]
  --range_size=N                Range size for range SELECT queries [100]
  --auto_inc[=on|off]           Use AUTO_INCREMENT column as Primary Key (for MySQL), or its alternatives in other DBMS. When disabled, use client-generated IDs [on]
  --delete_inserts=N            Number of DELETE/INSERT combinations per transaction [1]
  --tables=N                    Number of tables [1]
  --mysql_storage_engine=STRING Storage engine, if MySQL is used [innodb]
  --non_index_updates=N         Number of UPDATE non-index queries per transaction [1]
  --table_size=N                Number of rows per table [10000]
  --pgsql_variant=STRING        Use this PostgreSQL variant when running with the PostgreSQL driver. The only currently supported variant is 'redshift'. When enabled, create_secondary is automatically disabled, and delete_inserts is set to 0
  --simple_ranges=N             Number of simple range SELECT queries per transaction [1]
  --order_ranges=N              Number of SELECT ORDER BY queries per transaction [1]
  --range_selects[=on|off]      Enable/disable all range SELECT queries [on]
  --point_selects=N             Number of point SELECT queries per transaction [10]

Again, we will discuss the most important options from here. First of all, you have control over what exactly a transaction will look like. Generally speaking, it consists of different types of queries - INSERT, DELETE, different types of SELECT (point lookup, range, aggregation) and UPDATE (indexed, non-indexed). Using variables like:

  --distinct_ranges=N           Number of SELECT DISTINCT queries per transaction [1]
  --sum_ranges=N                Number of SELECT SUM() queries per transaction [1]
  --index_updates=N             Number of UPDATE index queries per transaction [1]
  --delete_inserts=N            Number of DELETE/INSERT combinations per transaction [1]
  --non_index_updates=N         Number of UPDATE non-index queries per transaction [1]
  --simple_ranges=N             Number of simple range SELECT queries per transaction [1]
  --order_ranges=N              Number of SELECT ORDER BY queries per transaction [1]
  --point_selects=N             Number of point SELECT queries per transaction [10]
  --range_selects[=on|off]      Enable/disable all range SELECT queries [on]

you can define what a transaction should look like. As you can see by looking at the default values, the majority of queries are SELECTs - mainly point selects but also different types of range SELECTs (you can disable all of them by setting range_selects to off). You can tweak the workload towards a more write-heavy mix by increasing the number of updates or INSERT/DELETE queries. It is also possible to tweak settings related to secondary indexes and auto increment, but also the data set size (the number of tables and how many rows each of them should hold). This lets you customize your workload quite nicely.

  --skip_trx[=on|off]           Don't start explicit transactions and execute all queries in the AUTOCOMMIT mode [off]

This is another setting, quite important when working with proxies. By default, SysBench will attempt to execute queries in explicit transactions. This way the dataset stays consistent and is not affected: SysBench will, for example, execute INSERT and DELETE on the same row, making sure the data set will not grow (impacting your ability to reproduce results). However, proxies will treat explicit transactions differently - all queries executed within a transaction should be executed on the same host, thus removing the ability to scale the workload. Please keep in mind that disabling transactions will result in the data set diverging from the initial point. It may also trigger some issues like duplicate key errors or such. To be able to disable transactions you may also want to look into:

  --mysql-ignore-errors=[LIST,...] list of errors to ignore, or "all" [1213,1020,1205]

This setting allows you to specify error codes from MySQL which SysBench should ignore (and not kill the connection). For example, to ignore errors like: error 1062 (Duplicate entry '6' for key 'PRIMARY') you should pass this error code: --mysql-ignore-errors=1062

What is also important, each benchmark should present a way to provision a data set for tests, run them and then clean it up after the tests complete. This is done using ‘prepare’, ‘run’ and ‘cleanup’ commands. We will show how this is done in the next section.
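In short, every benchmark follows the same lifecycle; a minimal sketch (placeholder host, credentials and path) looks like this, and the next section walks through it in full:

# 1. Provision the test tables and data
sysbench /root/sysbench/src/lua/oltp_read_write.lua --mysql-host=10.0.0.126 --mysql-user=sbtest \
    --mysql-password=pass --tables=10 --table-size=1000000 prepare

# 2. Execute the benchmark itself
sysbench /root/sysbench/src/lua/oltp_read_write.lua --mysql-host=10.0.0.126 --mysql-user=sbtest \
    --mysql-password=pass --tables=10 --table-size=1000000 --threads=16 --time=300 run

# 3. Drop the test tables
sysbench /root/sysbench/src/lua/oltp_read_write.lua --mysql-host=10.0.0.126 --mysql-user=sbtest \
    --mysql-password=pass --tables=10 --table-size=1000000 cleanup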

Examples

In this section we’ll go through some examples of what SysBench can be used for. As mentioned earlier, we’ll focus on the two most popular benchmarks - OLTP read only and OLTP read/write. Sometimes it may make sense to use other benchmarks, but at least we’ll be able to show you how those two can be customized.

Primary Key lookups

First of all, we have to decide which benchmark we will run, read-only or read-write. Technically speaking it does not make a difference as we can remove writes from R/W benchmark. Let’s focus on the read-only one.

As a first step, we have to prepare a data set. We need to decide how big it should be. For this particular benchmark, using default settings (so, secondary indexes are created), 1 million rows will result in ~240 MB of data. Ten tables of 1,000,000 rows each equals roughly 2.4GB:

root@vagrant:~# du -sh /var/lib/mysql/sbtest/
2.4G    /var/lib/mysql/sbtest/
root@vagrant:~# ls -alh /var/lib/mysql/sbtest/
total 2.4G
drwxr-x--- 2 mysql mysql 4.0K Jun  1 12:12 .
drwxr-xr-x 6 mysql mysql 4.0K Jun  1 12:10 ..
-rw-r----- 1 mysql mysql   65 Jun  1 12:08 db.opt
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:12 sbtest10.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:12 sbtest10.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:10 sbtest1.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:10 sbtest1.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:10 sbtest2.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:10 sbtest2.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:10 sbtest3.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:10 sbtest3.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:10 sbtest4.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:10 sbtest4.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:11 sbtest5.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:11 sbtest5.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:11 sbtest6.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:11 sbtest6.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:11 sbtest7.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:11 sbtest7.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:11 sbtest8.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:11 sbtest8.ibd
-rw-r----- 1 mysql mysql 8.5K Jun  1 12:12 sbtest9.frm
-rw-r----- 1 mysql mysql 240M Jun  1 12:12 sbtest9.ibd

This should give you an idea of how many tables you want and how big they should be. Let's say we want to test an in-memory workload, so we want to create tables which will fit into the InnoDB buffer pool. On the other hand, we also want to make sure there are enough tables not to become a bottleneck (or, that the number of tables matches what you would expect in your production setup). Let's prepare our dataset. Please keep in mind that, by default, SysBench looks for the 'sbtest' schema, which has to exist before you prepare the data set. You may have to create it manually.
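Creating that schema is a one-liner; a sketch, assuming the same connection parameters used in the commands below:

# SysBench expects the 'sbtest' schema to exist before 'prepare' is run
mysql -h 10.0.0.126 -P 3306 -u sbtest -ppass -e "CREATE DATABASE IF NOT EXISTS sbtest;"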

root@vagrant:~# sysbench /root/sysbench/src/lua/oltp_read_only.lua --threads=4 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=10 --table-size=1000000 prepare
sysbench 1.1.0-2e6b7d5 (using bundled LuaJIT 2.1.0-beta3)

Initializing worker threads...

Creating table 'sbtest2'...
Creating table 'sbtest3'...
Creating table 'sbtest4'...
Creating table 'sbtest1'...
Inserting 1000000 records into 'sbtest2'
Inserting 1000000 records into 'sbtest4'
Inserting 1000000 records into 'sbtest3'
Inserting 1000000 records into 'sbtest1'
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest3'...
Creating a secondary index on 'sbtest1'...
Creating a secondary index on 'sbtest4'...
Creating table 'sbtest6'...
Inserting 1000000 records into 'sbtest6'
Creating table 'sbtest7'...
Inserting 1000000 records into 'sbtest7'
Creating table 'sbtest5'...
Inserting 1000000 records into 'sbtest5'
Creating table 'sbtest8'...
Inserting 1000000 records into 'sbtest8'
Creating a secondary index on 'sbtest6'...
Creating a secondary index on 'sbtest7'...
Creating a secondary index on 'sbtest5'...
Creating a secondary index on 'sbtest8'...
Creating table 'sbtest10'...
Inserting 1000000 records into 'sbtest10'
Creating table 'sbtest9'...
Inserting 1000000 records into 'sbtest9'
Creating a secondary index on 'sbtest10'...
Creating a secondary index on 'sbtest9'...

Once we have our data, let’s prepare a command to run the test. We want to test Primary Key lookups therefore we will disable all other types of SELECT. We will also disable prepared statements as we want to test regular queries. We will test low concurrency, let’s say 16 threads. Our command may look like below:

sysbench /root/sysbench/src/lua/oltp_read_only.lua --threads=16 --events=0 --time=300 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=10 --table-size=1000000 --range_selects=off --db-ps-mode=disable --report-interval=1 run

What did we do here? We set the number of threads to 16. We decided that we want our benchmark to run for 300 seconds, without a limit on the number of executed queries. We defined the connectivity to the database, the number of tables and their size. We also disabled all range SELECTs and we disabled prepared statements. Finally, we set the report interval to one second. This is what a sample output may look like:

[ 297s ] thds: 16 tps: 97.21 qps: 1127.43 (r/w/o: 935.01/0.00/192.41) lat (ms,95%): 253.35 err/s: 0.00 reconn/s: 0.00
[ 298s ] thds: 16 tps: 195.32 qps: 2378.77 (r/w/o: 1985.13/0.00/393.64) lat (ms,95%): 189.93 err/s: 0.00 reconn/s: 0.00
[ 299s ] thds: 16 tps: 178.02 qps: 2115.22 (r/w/o: 1762.18/0.00/353.04) lat (ms,95%): 155.80 err/s: 0.00 reconn/s: 0.00
[ 300s ] thds: 16 tps: 217.82 qps: 2640.92 (r/w/o: 2202.27/0.00/438.65) lat (ms,95%): 125.52 err/s: 0.00 reconn/s: 0.00

Every second we see a snapshot of the workload stats. This is quite useful to track and plot - the final report will give you averages only. Intermediate results make it possible to track the performance on a second by second basis. The final report may look like below:
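If you want to plot those per-second snapshots, one hypothetical approach is to capture the output to a file and strip out the tps values with awk; a sketch:

# Save the intermediate report to a file while the benchmark runs
sysbench /root/sysbench/src/lua/oltp_read_only.lua --threads=16 --time=300 --report-interval=1 \
    --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 \
    --tables=10 --table-size=1000000 --range_selects=off --db-ps-mode=disable run | tee sysbench.log

# Extract "second,tps" pairs, ready to be fed into gnuplot or any charting tool
awk '/ tps: / { sec=$2; sub(/s$/, "", sec); print sec "," $7 }' sysbench.log > tps.csv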

SQL statistics:
    queries performed:
        read:                            614660
        write:                           0
        other:                           122932
        total:                           737592
    transactions:                        61466  (204.84 per sec.)
    queries:                             737592 (2458.08 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

Throughput:
    events/s (eps):                      204.8403
    time elapsed:                        300.0679s
    total number of events:              61466

Latency (ms):
         min:                                   24.91
         avg:                                   78.10
         max:                                  331.91
         95th percentile:                      137.35
         sum:                              4800234.60

Threads fairness:
    events (avg/stddev):           3841.6250/20.87
    execution time (avg/stddev):   300.0147/0.02

You will find here information about executed queries and other (BEGIN/COMMIT) statements. You’ll learn how many transactions were executed, how many errors happened, what was the throughput and total elapsed time. You can also check latency metrics and the query distribution across threads.

If we were interested in latency distribution, we could also pass ‘--histogram’ argument to SysBench. This results in an additional output like below:

Latency histogram (values are in milliseconds)
       value  ------------- distribution ------------- count
      29.194 |******                                   1
      30.815 |******                                   1
      31.945 |***********                              2
      33.718 |******                                   1
      34.954 |***********                              2
      35.589 |******                                   1
      37.565 |***********************                  4
      38.247 |******                                   1
      38.942 |******                                   1
      39.650 |***********                              2
      40.370 |***********                              2
      41.104 |*****************                        3
      41.851 |*****************************            5
      42.611 |*****************                        3
      43.385 |*****************                        3
      44.173 |***********                              2
      44.976 |**************************************** 7
      45.793 |***********************                  4
      46.625 |***********                              2
      47.472 |*****************************            5
      48.335 |**************************************** 7
      49.213 |***********                              2
      50.107 |**********************************       6
      51.018 |***********************                  4
      51.945 |**************************************** 7
      52.889 |*****************                        3
      53.850 |*****************                        3
      54.828 |***********************                  4
      55.824 |***********                              2
      57.871 |***********                              2
      58.923 |***********                              2
      59.993 |******                                   1
      61.083 |******                                   1
      63.323 |***********                              2
      66.838 |******                                   1
      71.830 |******                                   1

Once we are good with our results, we can clean up the data:

sysbench /root/sysbench/src/lua/oltp_read_only.lua --threads=16 --events=0 --time=300 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=10 --table-size=1000000 --range_selects=off --db-ps-mode=disable --report-interval=1 cleanup

Write-heavy traffic

Let's imagine here that we want to execute a write-heavy (but not write-only) workload and, for example, test the I/O subsystem's performance. First of all, we have to decide how big the dataset should be. We'll assume ~48GB of data (20 tables, 10,000,000 rows each). We need to prepare it. This time we will use the read-write benchmark.

root@vagrant:~# sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=20 --table-size=10000000 prepare

Once this is done, we can tweak the defaults to force more writes into the query mix:

root@vagrant:~# sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=16 --events=0 --time=300 --mysql-host=10.0.0.126 --mysql-user=sbtest --mysql-password=pass --mysql-port=3306 --tables=20 --delete_inserts=10 --index_updates=10 --non_index_updates=10 --table-size=10000000 --db-ps-mode=disable --report-interval=1 run

As you can see from the intermediate results, transactions are now on the write-heavy side:

[ 5s ] thds: 16 tps: 16.99 qps: 946.31 (r/w/o: 231.83/680.50/33.98) lat (ms,95%): 1258.08 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 16 tps: 17.01 qps: 955.81 (r/w/o: 223.19/698.59/34.03) lat (ms,95%): 1032.01 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 16 tps: 12.00 qps: 698.91 (r/w/o: 191.97/482.93/24.00) lat (ms,95%): 1235.62 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 16 tps: 14.01 qps: 683.43 (r/w/o: 195.12/460.29/28.02) lat (ms,95%): 1533.66 err/s: 0.00 reconn/s: 0.00

Understanding the results

As we showed above, SysBench is a great tool which can help to pinpoint some of the performance issues of MySQL or MariaDB. It can also be used for the initial tuning of your database configuration. Of course, you have to keep in mind that, to get the best out of your benchmarks, you have to understand why the results look like they do. This requires insight into MySQL internal metrics using monitoring tools, for instance ClusterControl. This is quite important to remember - if you don't understand why the performance was what it was, you may draw incorrect conclusions from the benchmarks. There is always a bottleneck, and SysBench can help to surface the performance issues, which you then have to identify.

by krzysztof at June 12, 2018 08:08 AM

June 11, 2018

Peter Zaitsev

ProxySQL Experimental Feature: Native ProxySQL Clustering

ProxySQL Cluster

ProxySQL 1.4.2 introduced native clustering, allowing several ProxySQL instances to communicate with and share configuration updates with each other. In this blog post, I’ll review this new feature and how we can start working with 3 nodes.

Before I continue, let's review two common methods of installing ProxySQL.

ProxySQL as a centralized server

This is the most common installation, where ProxySQL is between application servers and the database. It is simple, but without any high availability. If ProxySQL goes down you lose all connectivity to the database.

ProxySQL Install most common set up

ProxySQL on app instances

Another common setup is to install ProxySQL onto each application server. This is good because the loss of one ProxySQL/App server will not bring down the entire application.

ProxySQL Install master-slave

For more information about the previous installation, please visit this link Where Do I Put ProxySQL?

Sometimes our application and databases grow fast. Maybe you need to add a load balancer, for example, and at that moment you start thinking ... "What could I do to configure and maintain all these ProxySQL nodes without mistakes?"

To do that, there are many tools like Ansible, Puppet, and Chef, but you will need to write/create/maintain scripts to do those tasks. This is really difficult for one person to administer.

Now, there is a native solution, built into ProxySQL, to create and administer a cluster in an easy way.

At the moment this feature is EXPERIMENTAL and subject to change. Think very carefully before installing it in production; in fact, I strongly recommend you wait. However, if you would like to start testing this feature, you need to install ProxySQL 1.4.2 or later.

This clustering feature is really useful if you have installed one ProxySQL per application instance, because all the changes in one of the ProxySQL nodes will be propagated to all the other ProxySQL nodes. You can also configure a “master-slave” style setup with ProxySQL clustering.

There are only 4 tables where you can make changes and propagate the configuration:

  • mysql_query_rules
  • mysql_servers
  • mysql_users
  • proxysql_servers

How does it work?

It's easy. When you make a change like INSERT/DELETE/UPDATE on any of these tables, after running the command LOAD … TO RUNTIME, ProxySQL creates a new checksum of the table's data and increments the version number in the table runtime_checksums_values. Below we can see an example.

admin ((none))>SELECT name, version, FROM_UNIXTIME(epoch), checksum FROM runtime_checksums_values ORDER BY name;
+-------------------+---------+----------------------+--------------------+
| name              | version | FROM_UNIXTIME(epoch) | checksum           |
+-------------------+---------+----------------------+--------------------+
| admin_variables   | 0       | 1970-01-01 00:00:00  |                    |
| mysql_query_rules | 1       | 2018-04-26 15:58:23  | 0x0000000000000000 |
| mysql_servers     | 1       | 2018-04-26 15:58:23  | 0x0000000000000000 |
| mysql_users       | 4       | 2018-04-26 18:36:12  | 0x2F35CAB62143AE41 |
| mysql_variables   | 0       | 1970-01-01 00:00:00  |                    |
| proxysql_servers  | 1       | 2018-04-26 15:58:23  | 0x0000000000000000 |
+-------------------+---------+----------------------+--------------------+

Internally, all nodes are monitoring and communicating with all the other ProxySQL nodes. When another node detects a change in the checksum and version (both at the same time), each node will get a copy of the table that was modified, make the same changes locally, apply the new config to RUNTIME to refresh it and make it visible to the connected applications, and automatically save it to DISK for persistence.

ProxySQL Cluster

The following setup creates a “synchronous cluster” so any changes to these 4 tables on any ProxySQL server will be replicated to all other ProxySQL nodes. Be careful!

How can I start testing this new feature?

1) To start, we need at least 2 nodes. Download and install ProxySQL 1.4.2 or higher and start a clean version.

2) On all nodes, we need to update the following global variables. These changes will set the username and password used by each node’s internal communication to cluster1/clusterpass. These must be the same on all nodes in this cluster.

update global_variables set variable_value='admin:admin;cluster1:clusterpass' where variable_name='admin-admin_credentials';
update global_variables set variable_value='cluster1' where variable_name='admin-cluster_username';
update global_variables set variable_value='clusterpass' where variable_name='admin-cluster_password';
update global_variables set variable_value=200 where variable_name='admin-cluster_check_interval_ms';
update global_variables set variable_value=100 where variable_name='admin-cluster_check_status_frequency';
update global_variables set variable_value='true' where variable_name='admin-cluster_mysql_query_rules_save_to_disk';
update global_variables set variable_value='true' where variable_name='admin-cluster_mysql_servers_save_to_disk';
update global_variables set variable_value='true' where variable_name='admin-cluster_mysql_users_save_to_disk';
update global_variables set variable_value='true' where variable_name='admin-cluster_proxysql_servers_save_to_disk';
update global_variables set variable_value=3 where variable_name='admin-cluster_mysql_query_rules_diffs_before_sync';
update global_variables set variable_value=3 where variable_name='admin-cluster_mysql_servers_diffs_before_sync';
update global_variables set variable_value=3 where variable_name='admin-cluster_mysql_users_diffs_before_sync';
update global_variables set variable_value=3 where variable_name='admin-cluster_proxysql_servers_diffs_before_sync';
load admin variables to RUNTIME;
save admin variables to disk;

3) Add all IPs from the other ProxySQL nodes into each other node:

INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES ('10.138.180.183',6032,100,'PRIMARY');
INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES ('10.138.244.108',6032,99,'SECONDARY');
INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES ('10.138.244.244',6032,98,'SECONDARY');
LOAD PROXYSQL SERVERS TO RUNTIME;
SAVE PROXYSQL SERVERS TO DISK;

At this moment, we have all nodes synced.
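Optionally, you can confirm from any node's admin interface that every peer reports the same checksums; a sketch using the stats table we will also query later in this post (assuming the admin interface listens on 127.0.0.1:6032 with the default admin credentials):

# All peers should show identical checksums for each configuration table
mysql -h 127.0.0.1 -P 6032 -u admin -padmin \
    -e "SELECT hostname, name, checksum, FROM_UNIXTIME(updated_at) updated_at FROM stats_proxysql_servers_checksums ORDER BY hostname, name;"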

In the next example from the log file, we can see when node1 detected node2.

[root@proxysql1 ~]# tail /var/lib/proxysql/proxysql.log
...
2018-05-10 11:19:51 [INFO] Cluster: Fetching ProxySQL Servers from peer 10.138.244.108:6032 started
2018-05-10 11:19:51 [INFO] Cluster: Fetching ProxySQL Servers from peer 10.138.244.108:6032 completed
2018-05-10 11:19:51 [INFO] Cluster: Loading to runtime ProxySQL Servers from peer 10.138.244.108:6032
2018-05-10 11:19:51 [INFO] Destroyed Cluster Node Entry for host 10.138.148.242:6032
2018-05-10 11:19:51 [INFO] Cluster: Saving to disk ProxySQL Servers from peer 10.138.244.108:6032
2018-05-10 11:19:52 [INFO] Cluster: detected a new checksum for proxysql_servers from peer 10.138.180.183:6032, version 6, epoch 1525951191, checksum 0x3D819A34C06EF4EA . Not syncing yet ...
2018-05-10 11:19:52 [INFO] Cluster: checksum for proxysql_servers from peer 10.138.180.183:6032 matches with local checksum 0x3D819A34C06EF4EA , we won't sync.
2018-05-10 11:19:52 [INFO] Cluster: closing thread for peer 10.138.148.242:6032
2018-05-10 11:19:52 [INFO] Cluster: detected a new checksum for proxysql_servers from peer 10.138.244.244:6032, version 4, epoch 1525951163, checksum 0x3D819A34C06EF4EA . Not syncing yet ...
2018-05-10 11:19:52 [INFO] Cluster: checksum for proxysql_servers from peer 10.138.244.244:6032 matches with local checksum 0x3D819A34C06EF4EA , we won't sync
...

Another example is to add users to the table mysql_users. Remember these users are to enable MySQL connections between the application (frontend) and MySQL (backend).

We will add a new username and password on any server; in my test I’ll use node2:

admin proxysql2 ((none))>INSERT INTO mysql_users(username,password) VALUES ('user1','crazyPassword');
Query OK, 1 row affected (0.00 sec)
admin proxysql2 ((none))>LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

In the log file from node3, we can see the update immediately:

[root@proxysql3 ~]# tail /var/lib/proxysql/proxysql.log
...
2018-05-10 11:30:57 [INFO] Cluster: detected a new checksum for mysql_users from peer 10.138.244.108:6032, version 2, epoch 1525951873, checksum 0x2AF43564C9985EC7 . Not syncing yet ...
2018-05-10 11:30:57 [INFO] Cluster: detected a peer 10.138.244.108:6032 with mysql_users version 2, epoch 1525951873, diff_check 3. Own version: 1, epoch: 1525950968. Proceeding with remote sync
2018-05-10 11:30:57 [INFO] Cluster: detected a peer 10.138.244.108:6032 with mysql_users version 2, epoch 1525951873, diff_check 4. Own version: 1, epoch: 1525950968. Proceeding with remote sync
2018-05-10 11:30:57 [INFO] Cluster: detected peer 10.138.244.108:6032 with mysql_users version 2, epoch 1525951873
2018-05-10 11:30:57 [INFO] Cluster: Fetching MySQL Users from peer 10.138.244.108:6032 started
2018-05-10 11:30:57 [INFO] Cluster: Fetching MySQL Users from peer 10.138.244.108:6032 completed
2018-05-10 11:30:57 [INFO] Cluster: Loading to runtime MySQL Users from peer 10.138.244.108:6032
2018-05-10 11:30:57 [INFO] Cluster: Saving to disk MySQL Query Rules from peer 10.138.244.108:6032
2018-05-10 11:30:57 [INFO] Cluster: detected a new checksum for mysql_users from peer 10.138.244.244:6032, version 2, epoch 1525951857, checksum 0x2AF43564C9985EC7 . Not syncing yet ...
2018-05-10 11:30:57 [INFO] Cluster: checksum for mysql_users from peer 10.138.244.244:6032 matches with local checksum 0x2AF43564C9985EC7 , we won't sync.
2018-05-10 11:30:57 [INFO] Cluster: detected a new checksum for mysql_users from peer 10.138.180.183:6032, version 2, epoch 1525951886, checksum 0x2AF43564C9985EC7 . Not syncing yet ...
2018-05-10 11:30:57 [INFO] Cluster: checksum for mysql_users from peer 10.138.180.183:6032 matches with local checksum 0x2AF43564C9985EC7 , we won't sync.
...

What happens if some node is down?

In this example, we will find out what happens if one node is down, has a network glitch, or some other issue. I'll stop ProxySQL on node3:

[root@proxysql3 ~]# service proxysql stop
Shutting down ProxySQL: DONE!

On ProxySQL node1, we can check that node3 is unreachable:

[root@proxysql1 ~]# tailf /var/lib/proxysql/proxysql.log
2018-05-10 11:57:33 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.138.244.244:6032 . Error: Can't connect to MySQL server on '10.138.244.244' (107)
2018-05-10 11:57:33 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.138.244.244:6032 . Error: Can't connect to MySQL server on '10.138.244.244' (107)
2018-05-10 11:57:33 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.138.244.244:6032 . Error: Can't connect to MySQL server on '10.138.244.244' (107)

And another check can be run in any ProxySQL node like node2, for example:

admin proxysql2 ((none))>SELECT hostname, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at FROM stats_proxysql_servers_checksums WHERE name='proxysql_servers' ORDER BY hostname;
+----------------+--------------------+---------------------+---------------------+
| hostname       | checksum           | changed_at          | updated_at          |
+----------------+--------------------+---------------------+---------------------+
| 10.138.180.183 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 12:01:59 |
| 10.138.244.108 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:38 | 2018-05-10 12:01:59 |
| 10.138.244.244 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 11:56:59 |
+----------------+--------------------+---------------------+---------------------+
3 rows in set (0.00 sec)

In the previous result, we can see that node3 (10.138.244.244) is not being updated; the updated_at column should show a later datetime. This means that node3 is not running (or is down, or there is a network glitch).

At this point, any change to any of the tables, mysql_query_rules, mysql_servers, mysql_users, proxysql_servers, will be replicated between nodes 1 & 2.

In this next example, while node3 is offline, we will add another user to the mysql_users table.

admin proxysql2 ((none))>INSERT INTO mysql_users(username,password) VALUES ('user2','passwordCrazy');
Query OK, 1 row affected (0.00 sec)
admin proxysql2 ((none))>LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

That change was propagated to node1:

[root@proxysql1 ~]# tail /var/lib/proxysql/proxysql.log
...
2018-05-10 12:12:36 [INFO] Cluster: detected a peer 10.138.244.108:6032 with mysql_users version 3, epoch 1525954343, diff_check 4. Own version: 2, epoch: 1525951886. Proceeding with remote sync
2018-05-10 12:12:36 [INFO] Cluster: detected peer 10.138.244.108:6032 with mysql_users version 3, epoch 1525954343
2018-05-10 12:12:36 [INFO] Cluster: Fetching MySQL Users from peer 10.138.244.108:6032 started
2018-05-10 12:12:36 [INFO] Cluster: Fetching MySQL Users from peer 10.138.244.108:6032 completed
2018-05-10 12:12:36 [INFO] Cluster: Loading to runtime MySQL Users from peer 10.138.244.108:6032
2018-05-10 12:12:36 [INFO] Cluster: Saving to disk MySQL Query Rules from peer 10.138.244.108:6032
...

We can still see that node3 has been out of sync for about 25 minutes:

admin proxysql2 ((none))>SELECT hostname, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at FROM stats_proxysql_servers_checksums WHERE name='mysql_users' ORDER BY hostname;
+----------------+--------------------+---------------------+---------------------+
| hostname       | checksum           | changed_at          | updated_at          |
+----------------+--------------------+---------------------+---------------------+
| 10.138.180.183 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 12:21:35 |
| 10.138.244.108 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:38 | 2018-05-10 12:21:35 |
| 10.138.244.244 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 12:21:35 |
+----------------+--------------------+---------------------+---------------------+
3 rows in set (0.00 sec)

Let’s start node3 and check if the sync works. node3 should connect to the other nodes and get the last changes.

[root@proxysql3 ~]# tail /var/lib/proxysql/proxysql.log
...
2018-05-10 12:30:02 [INFO] Cluster: detected a peer 10.138.244.108:6032 with mysql_users version 3, epoch 1525954343, diff_check 3. Own version: 1, epoch: 1525955402. Proceeding with remote sync
2018-05-10 12:30:02 [INFO] Cluster: detected a peer 10.138.180.183:6032 with mysql_users version 3, epoch 1525954356, diff_check 3. Own version: 1, epoch: 1525955402. Proceeding with remote sync
…
2018-05-10 12:30:03 [INFO] Cluster: detected peer 10.138.180.183:6032 with mysql_users version 3, epoch 1525954356
2018-05-10 12:30:03 [INFO] Cluster: Fetching MySQL Users from peer 10.138.180.183:6032 started
2018-05-10 12:30:03 [INFO] Cluster: Fetching MySQL Users from peer 10.138.180.183:6032 completed
2018-05-10 12:30:03 [INFO] Cluster: Loading to runtime MySQL Users from peer 10.138.180.183:6032
2018-05-10 12:30:03 [INFO] Cluster: Saving to disk MySQL Query Rules from peer 10.138.180.183:6032

Looking at the status from the checksum table, we can see node3 is now up to date.

admin proxysql2 ((none))>SELECT hostname, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at FROM stats_proxysql_servers_checksums WHERE name='mysql_users' ORDER BY hostname;
+----------------+--------------------+---------------------+---------------------+
| hostname       | checksum           | changed_at          | updated_at          |
+----------------+--------------------+---------------------+---------------------+
| 10.138.180.183 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 12:21:35 |
| 10.138.244.108 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:38 | 2018-05-10 12:21:35 |
| 10.138.244.244 | 0x3D819A34C06EF4EA | 2018-05-10 11:19:39 | 2018-05-10 12:21:35 |
+----------------+--------------------+---------------------+---------------------+
3 rows in set (0.00 sec)

admin proxysql2 ((none))>SELECT hostname, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at FROM stats_proxysql_servers_checksums WHERE name='mysql_users' ORDER BY hostname;
+----------------+--------------------+---------------------+---------------------+
| hostname       | checksum           | changed_at          | updated_at          |
+----------------+--------------------+---------------------+---------------------+
| 10.138.180.183 | 0x3928F574AFFF4C65 | 2018-05-10 12:12:24 | 2018-05-10 12:31:58 |
| 10.138.244.108 | 0x3928F574AFFF4C65 | 2018-05-10 12:12:23 | 2018-05-10 12:31:58 |
| 10.138.244.244 | 0x3928F574AFFF4C65 | 2018-05-10 12:30:19 | 2018-05-10 12:31:58 |
+----------------+--------------------+---------------------+---------------------+
3 rows in set (0.00 sec)

Now we have 3 ProxySQL nodes up to date. This example didn’t add any MySQL servers, hostgroups, etc, because the functionality is the same. The post is intended as an introduction to this new feature and how you can create and test a ProxySQL cluster.

Just remember that this is still an experimental feature and is subject to change with newer versions of ProxySQL.

Summary

This feature is really useful if you have more than one ProxySQL instance running for the same application. It is easy for a single person to maintain and configure, and it is easy to create and attach new nodes.

Hope you find this post helpful!

http://www.proxysql.com/blog/proxysql-cluster
http://www.proxysql.com/blog/proxysql-cluster-part2
http://www.proxysql.com/blog/proxysql-cluster-part3-mysql-servers
https://github.com/sysown/proxysql/wiki/ProxySQL-Cluster

The post ProxySQL Experimental Feature: Native ProxySQL Clustering appeared first on Percona Database Performance Blog.

by Walter Garcia at June 11, 2018 12:18 PM

June 07, 2018

Jean-Jerome Schmidt

MySQL on Docker: Running a MariaDB Galera Cluster without Container Orchestration Tools - Part 1

Container orchestration tools simplify the running of a distributed system, by deploying and redeploying containers and handling any failures that occur. One might need to move applications around, e.g., to handle updates, scaling, or underlying host failures. While this sounds great, it does not always work well with a strongly consistent database cluster like Galera. You can’t just move database nodes around, they are not stateless applications. Also, the order in which you perform operations on a cluster has high significance. For instance, restarting a Galera cluster has to start from the most advanced node, or else you will lose data. Therefore, we’ll show you how to run Galera Cluster on Docker without a container orchestration tool, so you have total control.

In this blog post, we are going to look into how to run a MariaDB Galera Cluster on Docker containers using the standard Docker image on multiple Docker hosts, without the help of orchestration tools like Swarm or Kubernetes. This approach is similar to running a Galera Cluster on standard hosts, but the process management is configured through Docker.

Before we jump further into details, we assume you have installed Docker, disabled SElinux/AppArmor and cleared up the rules inside iptables, firewalld or ufw (whichever you are using). The following are three dedicated Docker hosts for our database cluster:

  • host1.local - 192.168.55.161
  • host2.local - 192.168.55.162
  • host3.local - 192.168.55.163

Multi-host Networking

First of all, the default Docker networking is bound to the local host. Docker Swarm introduces another networking layer called overlay network, which extends the container internetworking to multiple Docker hosts in a cluster called Swarm. Long before this integration came into place, there were many network plugins developed to support this - Flannel, Calico, Weave are some of them.

Here, we are going to use Weave as the Docker network plugin for multi-host networking. This is mainly due to its simplicity to get it installed and running, and support for DNS resolver (containers running under this network can resolve each other's hostname). There are two ways to get Weave running - systemd or through Docker. We are going to install it as a systemd unit, so it's independent from Docker daemon (otherwise, we would have to start Docker first before Weave gets activated).

  1. Download and install Weave:

    $ curl -L git.io/weave -o /usr/local/bin/weave
    $ chmod a+x /usr/local/bin/weave
  2. Create a systemd unit file for Weave:

    $ cat > /etc/systemd/system/weave.service << EOF
    [Unit]
    Description=Weave Network
    Documentation=http://docs.weave.works/weave/latest_release/
    Requires=docker.service
    After=docker.service
    [Service]
    EnvironmentFile=-/etc/sysconfig/weave
    ExecStartPre=/usr/local/bin/weave launch --no-restart $PEERS
    ExecStart=/usr/bin/docker attach weave
    ExecStop=/usr/local/bin/weave stop
    [Install]
    WantedBy=multi-user.target
    EOF
  3. Define IP addresses or hostname of the peers inside /etc/sysconfig/weave:

    $ echo 'PEERS="192.168.55.161 192.168.55.162 192.168.55.163"' > /etc/sysconfig/weave
  4. Start and enable Weave on boot:

    $ systemctl start weave
    $ systemctl enable weave

Repeat the above 4 steps on all Docker hosts. Verify with the following command once done:

$ weave status

The number of peers is what we are looking for. It should be 3:

          ...
          Peers: 3 (with 6 established connections)
          ...

Running a Galera Cluster

Now that the network is ready, it's time to fire up our database containers and form a cluster. The basic rules are:

  • Container must be created under --net=weave to have multi-host connectivity.
  • Container ports that need to be published are 3306, 4444, 4567, 4568.
  • The Docker image must support Galera. If you'd like to use Oracle MySQL, then get the Codership version. If you'd like Percona's, use this image instead. In this blog post, we are using MariaDB's.

The reasons we chose MariaDB as the Galera cluster vendor are:

  • Galera is embedded into MariaDB, starting from MariaDB 10.1.
  • The MariaDB image is maintained by the Docker and MariaDB teams.
  • One of the most popular Docker images out there.

Bootstrapping a Galera Cluster has to be performed in sequence. Firstly, the most up-to-date node must be started with "wsrep_cluster_address=gcomm://". Then, start the remaining nodes with a full address consisting of all nodes in the cluster, e.g., "wsrep_cluster_address=gcomm://node1,node2,node3". To accomplish these steps using containers, we have to do some extra work to ensure all containers are running homogeneously. So the plan is:

  1. We would need to start with 4 containers in this order - mariadb0 (bootstrap), mariadb2, mariadb3, mariadb1.
  2. Container mariadb0 will be using the same datadir and configdir with mariadb1.
  3. Use mariadb0 on host1 for the first bootstrap, then start mariadb2 on host2, mariadb3 on host3.
  4. Remove mariadb0 on host1 to give way for mariadb1.
  5. Lastly, start mariadb1 on host1.

At the end of the day, you would have a three-node Galera Cluster (mariadb1, mariadb2, mariadb3). The first container (mariadb0) is a transient container for bootstrapping purposes only, using cluster address "gcomm://". It shares the same datadir and configdir with mariadb1 and will be removed once the cluster is formed (mariadb2 and mariadb3 are up) and nodes are synced.

By default, Galera is turned off in MariaDB and needs to be enabled with a flag called wsrep_on (set to ON) and wsrep_provider (set to the Galera library path) plus a number of Galera-related parameters. Thus, we need to define a custom configuration file for the container to configure Galera correctly.

Let's start with the first container, mariadb0. Create a file under /containers/mariadb0/conf.d/my.cnf and add the following lines:

$ mkdir -p /containers/mariadb0/conf.d
$ cat /containers/mariadb0/conf.d/my.cnf
[mysqld]

default_storage_engine          = InnoDB
binlog_format                   = ROW

innodb_flush_log_at_trx_commit  = 0
innodb_flush_method             = O_DIRECT
innodb_file_per_table           = 1
innodb_autoinc_lock_mode        = 2
innodb_lock_schedule_algorithm  = FCFS # MariaDB >10.1.19 and >10.2.3 only

wsrep_on                        = ON
wsrep_provider                  = /usr/lib/galera/libgalera_smm.so
wsrep_sst_method                = xtrabackup-v2

Since the image doesn't come with MariaDB Backup (which is the preferred SST method for MariaDB 10.1 and MariaDB 10.2), we are going to stick with xtrabackup-v2 for the time being.

To perform the first bootstrap for the cluster, run the bootstrap container (mariadb0) on host1:

$ docker run -d \
        --name mariadb0 \
        --hostname mariadb0.weave.local \
        --net weave \
        --publish "3306" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --env MYSQL_USER=proxysql \
        --env MYSQL_PASSWORD=proxysqlpassword \
        --volume /containers/mariadb1/datadir:/var/lib/mysql \
        --volume /containers/mariadb1/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address=gcomm:// \
        --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
        --wsrep_node_address=mariadb0.weave.local

The parameters used in the above command are:

  • --name, creates the container named "mariadb0",
  • --hostname, assigns the container a hostname "mariadb0.weave.local",
  • --net, places the container in the weave network for multi-host networking support,
  • --publish, exposes ports 3306, 4444, 4567, 4568 on the container to the host,
  • $(weave dns-args), configures DNS resolver for this container. This command can be translated into Docker run as "--dns=172.17.0.1 --dns-search=weave.local.",
  • --env MYSQL_ROOT_PASSWORD, the MySQL root password,
  • --env MYSQL_USER, creates "proxysql" user to be used later with ProxySQL for database routing,
  • --env MYSQL_PASSWORD, the "proxysql" user password,
  • --volume /containers/mariadb1/datadir:/var/lib/mysql, creates /containers/mariadb1/datadir if it does not exist and maps it to /var/lib/mysql (the MySQL datadir) of the container (for the bootstrap node, this could be skipped),
  • --volume /containers/mariadb1/conf.d:/etc/mysql/mariadb.conf.d, mounts the files under directory /containers/mariadb1/conf.d of the Docker host, into the container at /etc/mysql/mariadb.conf.d.
  • mariadb:10.2.15, uses MariaDB 10.2.15 image from here,
  • --wsrep_cluster_address, Galera connection string for the cluster. "gcomm://" means bootstrap. For the rest of the containers, we are going to use a full address instead.
  • --wsrep_sst_auth, authentication string for SST user. Use the same user as root,
  • --wsrep_node_address, the node hostname, in this case we are going to use the FQDN provided by Weave.

The bootstrap container contains several key things:

  • The name, hostname and wsrep_node_address are mariadb0, but it uses the volumes of mariadb1.
  • The cluster address is "gcomm://"
  • There are two additional --env parameters - MYSQL_USER and MYSQL_PASSWORD. These parameters will create an additional user for our ProxySQL monitoring purposes.

Verify with the following command:

$ docker ps
$ docker logs -f mariadb0

Once you see the following line, it indicates the bootstrap process is completed and Galera is active:

2018-05-30 23:19:30 139816524539648 [Note] WSREP: Synchronized with group, ready for connections

Create the directory to load our custom configuration file in the remaining hosts:

$ mkdir -p /containers/mariadb2/conf.d # on host2
$ mkdir -p /containers/mariadb3/conf.d # on host3

Then, copy the my.cnf that we've created for mariadb0 and mariadb1 to mariadb2 and mariadb3 respectively:

$ scp /containers/mariadb1/conf.d/my.cnf /containers/mariadb2/conf.d/ # on host1
$ scp /containers/mariadb1/conf.d/my.cnf /containers/mariadb3/conf.d/ # on host1

Next, create another 2 database containers (mariadb2 and mariadb3) on host2 and host3 respectively:

$ docker run -d \
        --name ${NAME} \
        --hostname ${NAME}.weave.local \
        --net weave \
        --publish "3306:3306" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --volume /containers/${NAME}/datadir:/var/lib/mysql \
        --volume /containers/${NAME}/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address=gcomm://mariadb0.weave.local,mariadb1.weave.local,mariadb2.weave.local,mariadb3.weave.local \
        --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
        --wsrep_node_address=${NAME}.weave.local

** Replace ${NAME} with mariadb2 or mariadb3 respectively.

However, there is a catch. The entrypoint script checks the mysqld service in the background after the database initialization, using the MySQL root user without a password. Since Galera automatically performs synchronization through SST or IST when starting up, the MySQL root user password on the joining node will change to match the bootstrapped node. Thus, you would see the following error during the first start up:

2018-05-30 23:27:13 140003794790144 [Warning] Access denied for user 'root'@'localhost' (using password: NO)
MySQL init process in progress…
MySQL init process failed.

The trick is to restart the failed containers once more, because this time, the MySQL datadir would have been created (in the first run attempt) and it would skip the database initialization part:

$ docker start mariadb2 # on host2
$ docker start mariadb3 # on host3

Once started, verify by looking at the following line:

$ docker logs -f mariadb2
…
2018-05-30 23:28:39 139808069601024 [Note] WSREP: Synchronized with group, ready for connections

At this point, there are 3 containers running: mariadb0, mariadb2 and mariadb3. Take note that mariadb0 is started using the bootstrap command (gcomm://), which means that if the container is automatically restarted by Docker in the future, it could potentially become disjointed from the primary component. Thus, we need to remove this container and replace it with mariadb1, using the same Galera connection string as the rest and the same datadir and configdir as mariadb0.

First, stop mariadb0 by sending SIGTERM (to ensure the node is shut down gracefully):

$ docker kill -s 15 mariadb0

Then, start mariadb1 on host1 using a similar command to mariadb2 or mariadb3:

$ docker run -d \
        --name mariadb1 \
        --hostname mariadb1.weave.local \
        --net weave \
        --publish "3306:3306" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --volume /containers/mariadb1/datadir:/var/lib/mysql \
        --volume /containers/mariadb1/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address=gcomm://mariadb0.weave.local,mariadb1.weave.local,mariadb2.weave.local,mariadb3.weave.local \
        --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
        --wsrep_node_address=mariadb1.weave.local

This time, you don't need to do the restart trick because the MySQL datadir already exists (created by mariadb0). Once the container is started, verify that the cluster size is 3, the cluster status is Primary and the local state is Synced:

$ docker exec -it mariadb3 mysql -uroot "-pPM7%cB43$sd@^1" -e 'select variable_name, variable_value from information_schema.global_status where variable_name in ("wsrep_cluster_size", "wsrep_local_state_comment", "wsrep_cluster_status", "wsrep_incoming_addresses")'
+---------------------------+-------------------------------------------------------------------------------+
| variable_name             | variable_value                                                                |
+---------------------------+-------------------------------------------------------------------------------+
| WSREP_CLUSTER_SIZE        | 3                                                                             |
| WSREP_CLUSTER_STATUS      | Primary                                                                       |
| WSREP_INCOMING_ADDRESSES  | mariadb1.weave.local:3306,mariadb3.weave.local:3306,mariadb2.weave.local:3306 |
| WSREP_LOCAL_STATE_COMMENT | Synced                                                                        |
+---------------------------+-------------------------------------------------------------------------------+

At this point, our architecture is looking something like this:

Although the run command is pretty long, it describes the container's characteristics well. It's probably a good idea to wrap the command in a script to simplify the execution steps, or use a compose file instead.
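For example, a minimal wrapper sketch along those lines could look like the script below. It simply reuses the exact options from the commands above; the script name, the argument handling and the single-quoting of the password are illustrative assumptions, not part of the original commands.

#!/bin/bash
# start_galera_node.sh - hypothetical helper to start one Galera container.
# Usage: ./start_galera_node.sh <container_name> <wsrep_cluster_address>
NAME=$1
CLUSTER_ADDRESS=$2
ROOT_PASSWORD='PM7%cB43$sd@^1'   # single quotes keep the shell from expanding "$sd"

docker run -d \
        --name "$NAME" \
        --hostname "$NAME.weave.local" \
        --net weave \
        --publish "3306:3306" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --env MYSQL_ROOT_PASSWORD="$ROOT_PASSWORD" \
        --volume /containers/$NAME/datadir:/var/lib/mysql \
        --volume /containers/$NAME/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address="$CLUSTER_ADDRESS" \
        --wsrep_sst_auth="root:$ROOT_PASSWORD" \
        --wsrep_node_address="$NAME.weave.local"

It could then be invoked as, for example, ./start_galera_node.sh mariadb2 gcomm://mariadb0.weave.local,mariadb1.weave.local,mariadb2.weave.local,mariadb3.weave.local.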

Database Routing with ProxySQL

Now we have three database containers running. The only way to access the cluster now is via the individual Docker host's published MySQL port, 3306 (mapped to port 3306 of the container). So what happens if one of the database containers fails? You have to manually fail over the client's connection to the next available node. Depending on the application connector, you could also specify a list of nodes and let the connector do the failover and query routing for you (Connector/J, PHP mysqlnd). Otherwise, it would be a good idea to unify the database resources into a single resource that can be called a service.

This is where ProxySQL comes into the picture. ProxySQL can act as the query router, load balancing the database connections similar to what "Service" in Swarm or Kubernetes world can do. We have built a ProxySQL Docker image for this purpose and will maintain the image for every new version with our best effort.

Before we run the ProxySQL container, we have to prepare the configuration file. The following is what we have configured for proxysql1. We create a custom configuration file under /containers/proxysql1/proxysql.cnf on host1:

$ cat /containers/proxysql1/proxysql.cnf
datadir="/var/lib/proxysql"
admin_variables=
{
        admin_credentials="admin:admin"
        mysql_ifaces="0.0.0.0:6032"
        refresh_interval=2000
}
mysql_variables=
{
        threads=4
        max_connections=2048
        default_query_delay=0
        default_query_timeout=36000000
        have_compress=true
        poll_timeout=2000
        interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
        default_schema="information_schema"
        stacksize=1048576
        server_version="5.1.30"
        connect_timeout_server=10000
        monitor_history=60000
        monitor_connect_interval=200000
        monitor_ping_interval=200000
        ping_interval_server=10000
        ping_timeout_server=200
        commands_stats=true
        sessions_sort=true
        monitor_username="proxysql"
        monitor_password="proxysqlpassword"
}
mysql_servers =
(
        { address="mariadb1.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb2.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb3.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb1.weave.local" , port=3306 , hostgroup=20, max_connections=100 },
        { address="mariadb2.weave.local" , port=3306 , hostgroup=20, max_connections=100 },
        { address="mariadb3.weave.local" , port=3306 , hostgroup=20, max_connections=100 }
)
mysql_users =
(
        { username = "sbtest" , password = "password" , default_hostgroup = 10 , active = 1 }
)
mysql_query_rules =
(
        {
                rule_id=100
                active=1
                match_pattern="^SELECT .* FOR UPDATE"
                destination_hostgroup=10
                apply=1
        },
        {
                rule_id=200
                active=1
                match_pattern="^SELECT .*"
                destination_hostgroup=20
                apply=1
        },
        {
                rule_id=300
                active=1
                match_pattern=".*"
                destination_hostgroup=10
                apply=1
        }
)
scheduler =
(
        {
                id = 1
                filename = "/usr/share/proxysql/tools/proxysql_galera_checker.sh"
                active = 1
                interval_ms = 2000
                arg1 = "10"
                arg2 = "20"
                arg3 = "1"
                arg4 = "1"
                arg5 = "/var/lib/proxysql/proxysql_galera_checker.log"
        }
)

The above configuration will:

  • configure two host groups, the single-writer and multi-writer group, as defined under "mysql_servers" section,
  • send reads to all Galera nodes (hostgroup 20) while write operations will go to a single Galera server (hostgroup 10),
  • schedule the proxysql_galera_checker.sh,
  • use monitor_username and monitor_password as the monitoring credentials created when we first bootstrapped the cluster (mariadb0).

Copy the configuration file to host2, for ProxySQL redundancy:

$ mkdir -p /containers/proxysql2/ # on host2
$ scp /containers/proxysql1/proxysql.cnf /containers/proxysql2/ # on host1

Then, run the ProxySQL containers on host1 and host2 respectively:

$ docker run -d \
        --name=${NAME} \
        --publish 6033 \
        --publish 6032 \
        --restart always \
        --net=weave \
        $(weave dns-args) \
        --hostname ${NAME}.weave.local \
        -v /containers/${NAME}/proxysql.cnf:/etc/proxysql.cnf \
        -v /containers/${NAME}/data:/var/lib/proxysql \
        severalnines/proxysql

** Replace ${NAME} with proxysql1 or proxysql2 respectively.

We specified --restart=always to make the container always available regardless of the exit status, and to start it automatically when the Docker daemon starts. This will make sure the ProxySQL containers act like a daemon.

Verify the MySQL servers status monitored by both ProxySQL instances (OFFLINE_SOFT is expected for the single-writer host group):

$ docker exec -it proxysql1 mysql -uadmin -padmin -h127.0.0.1 -P6032 -e 'select hostgroup_id,hostname,status from mysql_servers'
+--------------+----------------------+--------------+
| hostgroup_id | hostname             | status       |
+--------------+----------------------+--------------+
| 10           | mariadb1.weave.local | ONLINE       |
| 10           | mariadb2.weave.local | OFFLINE_SOFT |
| 10           | mariadb3.weave.local | OFFLINE_SOFT |
| 20           | mariadb1.weave.local | ONLINE       |
| 20           | mariadb2.weave.local | ONLINE       |
| 20           | mariadb3.weave.local | ONLINE       |
+--------------+----------------------+--------------+

At this point, our architecture is looking something like this:

All connections coming in on port 6033 (either from host1, host2 or the container's network) will be load balanced to the backend database containers using ProxySQL. If you would like to access an individual database server, use port 3306 of the physical host instead. There is no virtual IP address configured as a single endpoint for the ProxySQL service yet, but we could have that by using Keepalived, which is explained in the next section.
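As a quick sanity check, an application host can connect through ProxySQL on port 6033 using the "sbtest" user defined in proxysql.cnf, assuming that user has also been created with the same password on the Galera cluster itself:

$ mysql -usbtest -ppassword -h127.0.0.1 -P6033 -e 'SELECT @@hostname'

Repeating the query over several connections should show it being answered by different nodes from the reader hostgroup.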

Virtual IP Address with Keepalived

Since we configured ProxySQL containers to be running on host1 and host2, we are going to use Keepalived containers to tie these hosts together and provide virtual IP address via the host network. This allows a single endpoint for applications or clients to connect to the load balancing layer backed by ProxySQL.

As usual, create a custom configuration file for our Keepalived service. Here is the content of /containers/keepalived1/keepalived.conf:

vrrp_instance VI_DOCKER {
   interface ens33               # interface to monitor
   state MASTER
   virtual_router_id 52          # Assign one ID for this route
   priority 101
   unicast_src_ip 192.168.55.161
   unicast_peer {
      192.168.55.162
   }
   virtual_ipaddress {
      192.168.55.160             # the virtual IP
   }
}

Copy the configuration file to host2 for the second instance:

$ mkdir -p /containers/keepalived2/ # on host2
$ scp /containers/keepalived1/keepalived.conf /containers/keepalived2/ # on host1

Change the priority from 101 to 100 inside the copied configuration file on host2:

$ sed -i 's/101/100/g' /containers/keepalived2/keepalived.conf

**The higher priority instance will hold the virtual IP address (in this case, host1), until the VRRP communication is interrupted (in case host1 goes down).

Then, run the following command on host1 and host2 respectively:

$ docker run -d \
        --name=${NAME} \
        --cap-add=NET_ADMIN \
        --net=host \
        --restart=always \
        --volume /containers/${NAME}/keepalived.conf:/usr/local/etc/keepalived/keepalived.conf \
        osixia/keepalived:1.4.4

** Replace ${NAME} with keepalived1 and keepalived2.

The run command tells Docker to:

  • --name, create a container with the given name (keepalived1 or keepalived2),
  • --cap-add=NET_ADMIN, add Linux capabilities for network admin scope
  • --net=host, attach the container into the host network. This will provide virtual IP address on the host interface, ens33
  • --restart=always, always keep the container running,
  • --volume=/containers/${NAME}/keepalived.conf:/usr/local/etc/keepalived/keepalived.conf, map the custom configuration file for container's usage.

After both containers are started, verify the virtual IP address existence by looking at the physical network interface of the MASTER node:

$ ip a | grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.55.161/24 brd 192.168.55.255 scope global ens33
    inet 192.168.55.160/32 scope global ens33

The clients and applications may now use the virtual IP address, 192.168.55.160, to access the database service. This virtual IP address exists on host1 at this moment. If host1 goes down, keepalived2 will take over the IP address and bring it up on host2. Take note that the configuration for this keepalived does not monitor the ProxySQL containers. It only monitors the VRRP advertisement of the Keepalived peers.
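A simple way to verify the failover behaviour (and easy to undo) is to stop the MASTER Keepalived container and watch the virtual IP move:

$ docker stop keepalived1    # on host1, simulate losing the MASTER instance
$ ip a | grep ens33          # on host2, 192.168.55.160 should now be attached here
$ docker start keepalived1   # on host1; as the higher priority node it will reclaim the VIP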

At this point, our architecture is looking something like this:

Summary

So, now we have a MariaDB Galera Cluster fronted by a highly available ProxySQL service, all running on Docker containers.

In part two, we are going to look into how to manage this setup. We’ll look at how to perform operations like graceful shutdown, bootstrapping, detecting the most advanced node, failover, recovery, scaling up/down, upgrades, backup and so on. We will also discuss the pros and cons of having this setup for our clustered database service.

Happy containerizing!

by ashraf at June 07, 2018 09:58 AM

June 06, 2018

Oli Sennhauser

Special MySQL and MariaDB trainings 2018 in English

Due to strong customer demand, FromDual offers two extra MySQL/MariaDB trainings in 2018 with its training partner The Linuxhotel in Essen (Germany). These trainings are in English.

  • MariaDB Performance Tuning on 5 and 6 September 2018 (2 days).
  • Advanced MySQL/MariaDB training on 26 to 30 November 2018 (5 days).

More information about the contents of the trainings can be found at Advanced MySQL and MariaDB training.

For conditions and booking: MariaDB Performance Tuning and Advanced MySQL Training.

For specific MariaDB or MySQL on-site Consulting or in-house Training please get in contact with us.

by Shinguz at June 06, 2018 01:49 PM

June 05, 2018

MariaDB Foundation

Developer tip: test MariaDB install/upgrade quickly with Docker

Here is a quick tip for any developer who might want to test if the latest development version of MariaDB installs/upgrades. Traditionally, developers seem to have a bunch of virtual machines lying around which they use to test MariaDB installation and upgrade related things. Snapshotting virtual images, keeping them up-to-date, starting, stopping etc. takes a […]

The post Developer tip: test MariaDB install/upgrade quickly with Docker appeared first on MariaDB.org.

by Otto Kekäläinen at June 05, 2018 11:45 AM

Jean-Jerome Schmidt

Webinar: MySQL & MariaDB Performance Tuning for Dummies

You’re running MySQL or MariaDB as your backend database; how do you tune it to make the best use of the hardware? How do you optimize the Operating System? How do you best configure MySQL or MariaDB for a specific database workload?

Do these questions sound familiar to you? Maybe you’re having to deal with that type of situation yourself?

MySQL & MariaDB Performance Tuning Webinar

A database server needs CPU, memory, disk and network in order to function. Understanding these resources is important for anybody managing a production database. Any resource that is weak or overloaded can become a limiting factor and cause the database server to perform poorly.

In this webinar, we’ll discuss some of the settings that are most often tweaked and which can bring you significant improvement in the performance of your MySQL or MariaDB database. We will also cover some of the variables which are frequently modified even though they should not.

Performance tuning is not easy, especially if you’re not an experienced DBA, but you can go a surprisingly long way with a few basic guidelines.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, June 26th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, June 26th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

Agenda

  • What to tune and why?
  • Tuning process
  • Operating system tuning
    • Memory
    • I/O performance
  • MySQL configuration tuning
    • Memory
    • I/O performance
  • Useful tools
  • Do’s and do not’s of MySQL tuning
  • Changes in MySQL 8.0

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

This webinar builds upon blog posts by Krzysztof from the ‘Become a MySQL DBA’ series.

We look forward to “seeing” you there!

by jj at June 05, 2018 09:58 AM

June 04, 2018

Open Query Pty Ltd

GitHub acquired by Microsoft

Microsoft has just acquired GitHub for $7.5bn.  Good or bad?

Injected VC capital was $350m, so ROI for the VCs = 21.4x = very happy VCs.

Microsoft has done excellent work on OSS software in recent years, including on the Linux kernel, PHP, and many others.  Just like Oracle continues to put very good effort into MySQL after the Sun Microsystems acquisition many years ago.

But Microsoft is not an Open Source software company. The open source development model is not something they have built into their business “DNA” – processes (actually many companies that only do OSS haven’t got that either). So why GitHub? Combine it with LinkedIn (acquired by Microsoft earlier), and you have developers’ resumes. That’s valuable. It’s a strategically smart move, for Microsoft.

Will GitHub users benefit, and if so, how?

Well, I expect there’ll be more hoovering of “useful” (meta)data by a corporation, which some LinkedIn users will find handy, but I think it’s mainly beneficial to Microsoft rather than users, and this type of gathering and combining data is fundamentally incompatible with basic privacy.  It will bite, at some point down the line.  It always does.

Fundamentally, GitHub and its use is self-contradictory.  Git explicitly enables distributed source code control and truly distributed development, whereas GitHub is very much centralised.  Don’t just walk away to something else now, that won’t address the actual problem.  Solving it properly will include having bug tracking as part of a repository, and by design not relying on a single central location, or company.  The developer community (and companies) must resolve this external dependency.

by Arjen Lentz at June 04, 2018 10:59 PM

MariaDB AB

Using MariaDB Backup and MariaDB MaxScale to Scale Online

Using MariaDB Backup and MariaDB MaxScale to Scale Online anderskarlsson4 Mon, 06/04/2018 - 10:53

This blog post is rather practical. What it aims to show is how we can use a script that in turn uses MariaDB Backup to back up a MariaDB Server master, how we can create a MariaDB Server slave from this backup and then how we can script an online update to MariaDB MaxScale 2.2 to include the new MariaDB Server slave. Let's start by describing our environment.

Our infrastructure

 


We currently have one MariaDB Server master, 2 MariaDB Server slaves and one instance of MariaDB MaxScale. They are all running CentOS 7.2 and the IP addresses are 192.168.142.110 (MariaDB MaxScale), 192.168.142.111 (MariaDB Server master), 192.168.142.112 and 192.168.142.113 (MariaDB Server slaves).

Setup of the MariaDB servers

The setup of the master server is nothing really complicated in this case, but there are a few things we have to configure to use this as a master. We need to enable the binlog and set a server id, and this means editing the /etc/my.cnf.d/server.cnf file (if you are not on CentOS or RedHat, the location might be different) and add the following to the [mysqld] section:

server_id=111
log-bin=hostonly111

We also need to adjust the [mysqld] section in the same configuration file on the slaves, for example:

server_id=112
log_bin=hostonly112
log_error=error.log
datadir=/var/lib/mysql
report-host=192.168.142.112

This has, of course, to be adjusted to fit your setup, and note that not all of these settings are strictly necessary. I will not show more of the master and slave configuration here, as this is not the goal of this blog.
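One thing the rest of this post does rely on is that a replication user and a user for MariaDB MaxScale already exist on the master. A minimal sketch, assuming the credentials used later in this post (repl/repl for replication, myuser/mypwd for MaxScale) and that the 192.168.142.0/24 network is trusted, could be:

# Run on the master (192.168.142.111). The exact grants needed by the MaxScale
# user depend on the modules in use; check the MaxScale documentation.
mysql -u root -p <<'SQL'
CREATE USER 'repl'@'192.168.142.%' IDENTIFIED BY 'repl';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.142.%';
CREATE USER 'myuser'@'192.168.142.%' IDENTIFIED BY 'mypwd';
GRANT SELECT ON mysql.* TO 'myuser'@'192.168.142.%';
GRANT SHOW DATABASES, REPLICATION CLIENT, PROCESS ON *.* TO 'myuser'@'192.168.142.%';
SQL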

Setup of MariaDB MaxScale

There is a basic setup of MariaDB MaxScale that is assumed here, but note that MariaDB MaxScale, from version 2.2, stores a binary version of its configuration separately. This is useful when you use online dynamic reconfiguration, but it makes things a bit more complicated. What I am showing here, then, is the basic MariaDB MaxScale configuration that is used to support the cluster set up as above, and it is stored in the file /etc/maxscale.cnf:

# Global parameters
#
[maxscale]
threads=auto

# Server definitions
#
[server1]
type=server
address=192.168.142.111
port=3306
protocol=MariaDBBackend

[server2]
type=server
address=192.168.142.112
port=3306
protocol=MariaDBBackend

[server3]
type=server
address=192.168.142.113
port=3306
protocol=MariaDBBackend

# Monitor for the servers
#
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=myuser
passwd=mypwd
monitor_interval=1000

# Service definitions
#
[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server2,server3
user=myuser
passwd=mypwd

# This service enables the use of the MaxAdmin interface
#
[MaxAdmin-Service]
type=service
router=cli

# Listener definitions for the services
#
[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=4006

[MaxAdmin-Listener]
type=listener
service=MaxAdmin-Service
protocol=maxscaled
socket=default

Note that I am not going to cover all the aspects of configuring MariaDB MaxScale here.
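Once MariaDB MaxScale is started with this configuration, the MaxAdmin interface defined above gives a quick way to confirm that all three servers are visible and that the monitor has classified them as master and slaves:

$ maxadmin list servers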

Backing up MariaDB Server using MariaDB Backup

Mariabackup is fully documented in the MariaDB Knowledge Base, so I will not go into details; rather, what I aim to show is a basic bash script that runs a backup. How you run this is not really important, but it has to be run, and the compressed and archived backup has to be placed in the agreed location. Also note that the script runs a prepare, which means that it does any recovery necessary on the backup to create a consistent copy of the running database.

The script is far from complete, but it performs the basics. It does keep old backups, and does a few other things. It is intended to be run in the background, which is why it is configured using variables at the top of the script and not any command line arguments.

#!/bin/bash -eu
#
set -o pipefail
MDB_USER=root
MDB_PWD=
MDB_BACKUPBASE=/home/anders
MDB_BACKUPNAME=backup
MDB_ARCHIVENAME=backuparchive
MDB_BACKUPDIR=$MDB_BACKUPBASE/$MDB_BACKUPNAME
MDB_BACKUPARCHIVEPFX=$MDB_BACKUPBASE/$MDB_ARCHIVENAME
MDB_BACKUPARCHIVEDIR=""
MDB_BACKUPARCHIVETGZ=""
MDB_BACKUPLOG=/tmp/backup.log
MDB_BACKUPCMD=/usr/bin/mariabackup

if [ "$EUID" != "0" ]; then
   echo "$0 must be run as root" >&2
   exit 1
fi

# Check if the backup directory exists.
if [ -e "$MDB_BACKUPDIR" -o -e "$MDB_BACKUPDIR.tgz" ]; then
# Find a backup archive directory.
   for I in {1..10000}; do
      if [ ! -e "$MDB_BACKUPARCHIVEPFX$I" -a ! -e "$MDB_BACKUPARCHIVEPFX$I.tgz" ]; then
        MDB_BACKUPARCHIVEDIR="$MDB_BACKUPARCHIVEPFX$I"
        MDB_BACKUPARCHIVETGZ="$MDB_BACKUPARCHIVEPFX$I.tgz"
        break
      fi
   done

   # Check that a directory was found.
   if [ "x$MDB_BACKUPARCHIVEDIR" = "x" ]; then
      echo "Can't find a suitable backup archive directory" >&2
      exit 1
   fi

   if [ -e "$MDB_BACKUPDIR" ] ; then
      mv $MDB_BACKUPDIR $MDB_BACKUPARCHIVEDIR
   fi

   if [ -e "$MDB_BACKUPDIR.tgz" ] ; then
      mv $MDB_BACKUPDIR.tgz $MDB_BACKUPARCHIVETGZ
   fi
fi

echo >> $MDB_BACKUPLOG
echo "Starting backup on `date +"%Y-%m-%d %H:%M:%S"`" >> $MDB_BACKUPLOG

# Do the backup.
echo "Backup up to $MDB_BACKUPDIR" >> $MDB_BACKUPLOG
$MDB_BACKUPCMD --backup -u $MDB_USER ${MDB_PASS:+"-p$MDB_PASS"} --target-dir=$MDB_BACKUPDIR >> $MDB_BACKUPLOG 2>&1

# Prepare and make the backup consistent.
$MDB_BACKUPCMD --prepare -u $MDB_USER ${MDB_PASS:+"-p$MDB_PASS"} --target-dir=$MDB_BACKUPDIR >> $MDB_BACKUPLOG 2>&1

# Compress and archive the backup.
cd $MDB_BACKUPBASE
tar cvfz $MDB_BACKUPNAME.tgz $MDB_BACKUPNAME >> $MDB_BACKUPLOG 2>&1

As you can see, nothing too complicated and the resulting backup will be placed in the archive file /home/anders/backup.tgz.
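Since the script is configured entirely at the top and writes to its own log file, it lends itself to being run from cron on the master. A minimal sketch, assuming the script has been saved as /usr/local/bin/mariadb_backup.sh and made executable (remember that it has to run as root):

# Root crontab entry on the master: run the backup every night at 02:30.
30 2 * * * /usr/local/bin/mariadb_backup.sh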

Creating a slave from a master backup

The second script to present is the one that creates a slave from a master, using the content of a backup as described above. It is assumed that this slave has MariaDB Server and MariaDB Backup already installed and that ssh is configured so that files can be copied from the master, including appropriate keys. Then it is time to have a look at the script.

The script does quite a few things. First, it figures out a suitable server_id for this slave, then it copies a backup archive from the master server and unpacks it. For a replication slave to be set up appropriately, we need to know the GTID at the point when the backup was taken, so this is recovered from a file that MariaDB Backup generates for us and includes with the backup archive. At this point the MariaDB Server is shut down, if it is running, and the current datadir is saved.
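The file in question is xtrabackup_binlog_info, which holds the binary log file name, the position and the GTID at the time the backup was taken; the third field is the one the script picks up. The values below are purely hypothetical:

$ cat backup/xtrabackup_binlog_info
hostonly111.000002      645     0-111-1234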

It is then time to recover data from the backup and set up the correct ownership of the MariaDB Server directories. At this point we can start the MariaDB Server. All we need to do next is to configure the MariaDB Server as a slave and we are all set.

The script, which will run on the server of the MariaDB Server slave, takes two arguments: the IP address of the MariaDB Server master of this slave, and the host IP of this MariaDB Server slave (an optional third argument can be used to set the slave's server_id explicitly). The latter isn't as obvious as one might think, as many servers, possibly most, have several host IPs (excluding localhost, there is often one connected internally and one to a firewalled DMZ).

#!/bin/bash
#
set -o pipefail
MDB_MASTERHOST=$1
MDB_SLAVEHOST=$2
MDB_USER=root
MDB_PWD=
MDB_CNFFILE=/etc/my.cnf.d/server.cnf
MDB_REPLUSER=repl
MDB_REPLPWD=repl
MDB_DEFAULTDATADIR=/var/lib/mysql
MDB_BACKUPNAME=backup
MDB_RESTORELOG=/tmp/createslave.log
MDB_BACKUPLOC=$MDB_MASTERHOST:$MDB_BACKUPNAME.tgz
MDB_BACKUPCMD=/usr/bin/mariabackup
MDB_MASTEROPTS="-h $MDB_MASTERHOST -u $MDB_REPLUSER ${MDB_REPLPWD:+"-p$MDB_REPLPWD"} --batch --column-names=0"

if [ "$EUID" != "0" ]; then
   echo "$0 must be run as root" >&2
   exit 1
fi

if [ "$#" -lt 2 ]; then
   echo "Usage: $0   []" >&2
   exit 1
fi

# Handle server id.
if [ "$#" -gt 2 ]; then
   MDB_SLAVEID=$3
   if [ "x`echo $MDB_SLAVEID | sed "s/^[0-9]*$//"`" != "x" ]; then
      echo "Slave server id invalid. It must be numeric" >&2
      exit 1
   fi
else
# Get a server id from the master if not specified.
   MDB_SLAVEID="`mysql $MDB_MASTEROPTS -e "SHOW SLAVE HOSTS" | awk '{print $1}' | sort -n | tail -1`"
   if [ "x$MDB_SLAVEID" == "x" ]; then
      MDB_SLAVEID="`mysql $MDB_MASTEROPTS -e "SELECT @@SERVER_ID"`"
   fi
   MDB_SLAVEID=$(($MDB_SLAVEID + 1))
fi

# Check if we have mariabackup
if [ ! -e "$MDB_BACKUPCMD" ]; then
   echo "Cannot find $MDB_BACKUPCMD command. Please install it" >&2
   exit 1
fi
# Check if datadir is set, else set the default.
MDB_DATADIR="`(grep "^ *datadir *=" $MDB_CNFFILE || true) | awk -F= '{print $2}'`"
if [ "x$MDB_DATADIR" == "x" ]; then
   MDB_DATADIR=$MDB_DEFAULTDATADIR
fi

# Print to log.
echo >> $MDB_RESTORELOG
echo "Starting restore on `date +"%Y-%m-%d %H:%M:%S"`" >> $MDB_RESTORELOG

# Copy backup from master.
scp $MDB_BACKUPLOC . >> $MDB_RESTORELOG 2>&1

# Remove old backup, if one exists.
if [ -e "$MDB_BACKUPNAME" ]; then
  rm -rf $MDB_BACKUPNAME
fi

# Unpack backup.
tar xvfz $MDB_BACKUPNAME.tgz >> $MDB_RESTORELOG 2>&1

# Get the GTID from the backup
GTID_POS=`cat $MDB_BACKUPNAME/xtrabackup_binlog_info | awk '{print $3}'`
echo >> $MDB_RESTORELOG 2>&1
echo "Restoring GTID: $GTID_POS" >> $MDB_RESTORELOG 2>&1

# Get MariaDB server status
STATUS=`systemctl is-active mariadb || true`
echo "MariaDB status: $STATUS" >> $MDB_RESTORELOG 2>&1

# Stop MariaDB if it is running.
if [ "$STATUS" = "active" ]; then
   echo "Stopping MariaDB" >> $MDB_RESTORELOG 2>&1
   systemctl stop mariadb >> $MDB_RESTORELOG 2>&1
   STATUS=`systemctl is-active mariadb || true`
   if [ "$STATUS" = "active" ]; then
      echo "Error stopping MariaDB" >> $MDB_RESTORELOG 2>&1
      exit 1
   fi
fi

# Save current datadir if that exists.
if [ -e "$MDB_DATADIR" ]; then
   MDB_DATADIR_SAVE="$MDB_DATADIR`date +\"%Y%m%d_%H%M%S\"`"

   if [ -e "$MDB_DATADIR_SAVE" ]; then
      for I in {1..100000}; do
         MDB_DATADIR_SAVE="$MDB_DATADIR`date +\"%Y%m%d_%H%M%S\"`_$I"
         if [ ! -e "$MDB_DATADIR_SAVE" ]; then
            break
         fi
      done
      if [ -e "$MDB_DATADIR_SAVE" ]; then
         echo "Can't find location for saved datadir" >> $MDB_RESTORELOG 2>&1
         exit 1
      fi
   fi

# Move datadir to saved location.
   mv $MDB_DATADIR $MDB_DATADIR_SAVE
fi


# Find mysqld group in config file.
GRPLINE=`grep -n "\[mysqld\]" $MDB_CNFFILE | tail -1 | awk -F: '{print $1}'`
# If a group wasn't found, then add one.
if [ "x$GRPLINE" == "x" ]; then
   echo "[mysqld]" >> $MDB_CNFFILE
   GRPLINE=`grep -n "\[mysqld\]" $MDB_CNFFILE | awk -F: '{print $1}'`
fi

# Set up section of variables to set.
NEWCNF=""
if [ "x`grep \"^ *server[-_]id *=\" $MDB_CNFFILE`" == "x" ]; then
   NEWCNF="server_id=$MDB_SLAVEID"
fi
if [ "x`grep \"^ *datadir *=\" $MDB_CNFFILE`" == "x" ]; then
   NEWCNF="${NEWCNF}${NEWCNF:+\n}datadir=/var/lib/mysql"
fi
if [ "x`grep \"^ *report[-_]host *=\" $MDB_CNFFILE`" == "x" ]; then
   NEWCNF="${NEWCNF}${NEWCNF:+\n}report_host=$MDB_SLAVEHOST"
fi

# Set up required variables in cnf if necessary.
if [ "x$NEWCNF" != "x" ]; then
   sed -i "${GRPLINE}a$NEWCNF" $MDB_CNFFILE
fi

# Restore from backup.
$MDB_BACKUPCMD --move-back --target-dir=$PWD/$MDB_BACKUPNAME >> $MDB_RESTORELOG 2>&1

# Set correct ownership.
chown -R mysql:mysql $MDB_DATADIR
chmod 755 $MDB_DATADIR

# Start MariaDB again.
systemctl start mariadb >> $MDB_RESTORELOG 2>&1

# Get MariaDB server status
STATUS=`systemctl is-active mariadb || true`

# Stop if MariaDB is not running.
if [ "$STATUS" != "active" ]; then
   echo "Error starting MariaDB" >> $MDB_RESTORELOG 2>&1
   exit 1
fi

# Set up node as slave.
mysql -u $MDB_USER ${MDB_PWD:+"-p$MDB_PWD"} -e "SET GLOBAL gtid_slave_pos = '$GTID_POS'" >> $MDB_RESTORELOG 2>&1
mysql -u $MDB_USER ${MDB_PWD:+"-p$MDB_PWD"} -e "CHANGE MASTER TO MASTER_HOST='$MDB_MASTERHOST', \
  MASTER_USER='$MDB_REPLUSER', MASTER_PASSWORD='$MDB_REPLPWD', MASTER_USE_GTID=current_pos" >> \
  $MDB_RESTORELOG 2>&1
mysql -u $MDB_USER ${MDB_PWD:+"-p$MDB_PWD"} -e "START SLAVE" >> $MDB_RESTORELOG 2>&1

After running the script above, we have yet another slave attached to the master, but one thing remains, which is to tell MariaDB MaxScale to route reads also to the newly attached server.
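As an illustration, bringing up a new slave on, say, 192.168.142.114 and confirming that it is replicating could look like this (the script file name is hypothetical):

# On the new slave host, as root:
$ ./create_slave_from_backup.sh 192.168.142.111 192.168.142.114

# On the master, the new slave should now be reported:
$ mysql -h 192.168.142.111 -u repl -prepl -e "SHOW SLAVE HOSTS"

# On the new slave, both replication threads should be running:
$ mysql -u root -e "SHOW SLAVE STATUS\G" | grep -E "Slave_(IO|SQL)_Running:"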

Automated and online reconfiguration of MariaDB MaxScale

The last step is to add our new slave to the MariaDB MaxScale configuration. The way this script works is that it attaches to the master and uses that to determine which slaves exist. This is compared to the slaves that MariaDB MaxScale knows about, and if one is missing, it is added. Among the programs used by this script are curl and, above all, jq, which is used to parse JSON. These tools have to be installed to use this script.

#!/bin/bash
#
MAX_HOST=localhost
MAX_PORT=8989
MAX_USER=admin
MAX_PWD=mariadb
MDB_PORT=3306
MDB_USER=myuser
MDB_PWD=mypwd
SERVERID_PREFIX=server
CURL_OPTS="--user $MAX_USER:$MAX_PWD --silent"
if [ "x`which curl 2> /dev/null`" == "x" ]; then
   echo "Cannot find curl program" 1>&2
   exit 1
fi
if [ "x`which mysql 2> /dev/null`" == "x" ]; then
   echo "Cannot find mysql program" 1>&2
   exit 1
fi
if [ "x`which jq 2> /dev/null`" == "x" ]; then
   echo "Cannot find jq program" 1>&2
   exit 1
fi

#
# Function to add a server.
# Arguments:
# Server address
# Server node name
#
function addserver {
   curl $CURL_OPTS -H "Content-Type:application/json" http://$MAX_HOST:$MAX_PORT/v1/servers -d '{
   "data": {
      "id": "'$2'",
      "type": "servers",
      "attributes": {
         "parameters": {
            "address": "'$1'",
             "port": 3306,
             "protocol": "MariaDBBackend"
         }
      },
      "relationships": {
         "services" : {
            "data": ['$SERVICELIST']
         },
         "monitors" : {
            "data": ['$MONITORLIST']
         }
      }
   }
}'
}

#
# Function to generate a suitable server_id
#
function get_serverid {
   for I in {1..10000}; do
      found=0
      for S in $SERVERIDS; do
         if [ "$SERVERID_PREFIX$I" == "$S" ]; then
            found=1
            break
         fi
      done
      if [ $found -eq 0 ]; then
         echo "$SERVERID_PREFIX$I"
         break
      fi
   done
   return 0
}

MASTER=`curl $CURL_OPTS http://$MAX_HOST:$MAX_PORT/v1/servers | jq --raw-output '.data[].attributes | select(.state == "Master, Running") | .parameters.address'`
MASTERID=`curl $CURL_OPTS http://$MAX_HOST:$MAX_PORT/v1/servers | jq --raw-output '.data[] | select(.attributes.state == "Master, Running") | .id'`
if [ "x$MASTER" == "x" ]; then
   echo "Cannot find a master node" 1>&2
   exit 1
fi
MASTER_SERVICES=`curl $CURL_OPTS http://$MAX_HOST:$MAX_PORT/v1/servers/$MASTERID | jq --raw-output '.data.relationships.services.data[].id'`
MASTER_MONITORS=`curl $CURL_OPTS http://$MAX_HOST:$MAX_PORT/v1/servers/$MASTERID | jq --raw-output '.data.relationships.monitors.data[].id'`
SERVERS=`curl $CURL_OPTS --silent http://$MAX_HOST:$MAX_PORT/v1/servers | jq --raw-output '.data[].attributes.parameters.address' | sort`
SERVERIDS=`curl $CURL_OPTS --silent http://$MAX_HOST:$MAX_PORT/v1/servers | jq --raw-output '.data[].id' | sort`
SLAVES=`mysql -h $MASTER -P $MDB_PORT -u $MDB_USER -p$MDB_PWD -e "show processlist" --batch | grep "Binlog Dump" | awk '{sub(/:[0-9]*/, "", $3); print $3;}'`

# Create JSON list of services.
SERVICELIST=""
for S in $MASTER_SERVICES; do
   SERVICELIST="${SERVICELIST:+$SERVICELIST,}{\"id\":\"$S\",\"type\":\"services\"}"
done

# Create JSON list of monitors.
MONITORLIST=""
for S in $MASTER_MONITORS; do
   MONITORLIST="${MONITORLIST:+$MONITORLIST,}{\"id\":\"$S\",\"type\":\"monitors\"}"
done

# Loop for all slaves and see if they are defined in maxscale.
for S in $SLAVES; do
   found=0
   for SE in $SERVERS; do
      if [ "$S" == "$SE" ]; then
         found=1
         break;
      fi
   done

# If server is not found in maxscale, then add it.
   if [ $found -eq 0 ]; then
      echo "Server $S not found in MaxScale. Adding"
      SRVID=$(get_serverid)
      echo $SRVID
      addserver $S $SRVID
      SERVERIDS="$SERVERIDS $SRVID"
   fi
done
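After the script has run, the newly added server should show up both through the REST API (using the same credentials as the script) and in MaxAdmin:

$ curl --user admin:mariadb --silent http://localhost:8989/v1/servers | jq --raw-output '.data[].id'
$ maxadmin list servers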

Conclusion

MariaDB MaxScale provides a powerful, flexible and convenient means to build a scalable MariaDB Server cluster, be it Galera or a replicated cluster. MariaDB Backup, on the other hand, is a powerful and flexible online backup solution for MariaDB Server. Combining these technologies means that a powerful and scalable environment can easily be built, and that it can be scaled and reconfigured without downtime.

References

Happy SQL'ing
/Karlsson

MariaDB Backup is the online backup tool that is part of MariaDB TX; it supports many advanced features. MariaDB MaxScale is a database proxy that, among other things, allows read/write split scalability, online configuration and much more, and which is also included with MariaDB TX. By combining these tools, this blog shows how a replication cluster can be scaled online without downtime.


by anderskarlsson4 at June 04, 2018 02:53 PM

June 01, 2018

Peter Zaitsev

This Week in Data with Colin Charles 40: a Peak at Blockchain, Lots of MariaDB News, then Back on the Road

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Shortly after the last dispatch, I jetted off for a spot of vacation (which really meant I was checking out the hype behind Blockchain with a database developer lens at the Blockchain Week NYC), and then some customer visits in Seoul, which explains the short hiatus. Here’s to making this more regular as the summer approaches.

I am about to embark on a fairly long trip, covering a few upcoming appearances: Lisbon for the Percona Engineering meeting, SouthEastLinuxFest in Charlotte, the Open Source Data Centre Conference in Berlin and then the DataOps Barcelona event. I have some discount codes: 50% discount for OSDC with the code OSDC_FOR_FRIENDS, and 50% discount for DataOps Barcelona with the code dataopsbcn50. Expect this column to reflect my travels over the next few weeks.

There has been a lot of news on the MariaDB front: MariaDB 10.3.7 went stable/GA! You might have noticed more fanfare around the release name MariaDB TX 3.0, but the reality is you can still get this download from your usual MariaDB Foundation site. It is worth noting that the MariaDB Foundation 2017 financials have also been released. Some may have noticed a couple months back there was a press release titled Report “State of the Open-Source DBMS Market, 2018” by Gartner Includes Pricing Comparison With MariaDB. This led to a Gartner report on the State of the Open-Source DBMS Market, 2018; although the report has since been pulled. Hopefully we see it surface again.

In the meantime, please do try out MariaDB 10.3.7 and it would be great to hear feedback. I also have an upcoming Percona webinar on MariaDB Server 10.3 on June 26 2018 — when the sign up link appears, I will be sure to include it here.

Well written, and something worth discussing: Should Red Hat Buy or Build a Database?. The Twitter discussion is also worth looking at.

Releases

Link List

Upcoming appearances

Feedback

I look forward to receiving feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 40: a Peak at Blockchain, Lots of MariaDB News, then Back on the Road appeared first on Percona Database Performance Blog.

by Colin Charles at June 01, 2018 04:45 PM

Jean-Jerome Schmidt

How to Recover Galera Cluster or MySQL Replication from Split Brain Syndrome

You may have heard about the term “split brain”. What is it? How does it affect your clusters? In this blog post we will discuss what exactly it is, what danger it may pose to your database, how we can prevent it, and, if everything goes wrong, how to recover from it.

Long gone are the days of single instances, nowadays almost all databases run in replication groups or clusters. This is great for high availability and scalability, but a distributed database introduces new dangers and limitations. One case which can be deadly is a network split. Imagine a cluster of multiple nodes which, due to network issues, was split in two parts. For obvious reasons (data consistency), both parts shouldn’t handle traffic at the same time as they are isolated from each other and data cannot be transferred between them. It is also wrong from the application point of view - even if, eventually, there would be a way to sync the data (although reconciliation of 2 datasets is not trivial). For a while, part of the application would be unaware of the changes made by other application hosts, which accesses the other part of the database cluster. This can lead to serious problems.

The condition in which the cluster has been divided in two or more parts that are willing to accept writes is called “split brain”.

The biggest problem with split brain is data drift, as writes happen on both parts of the cluster. None of MySQL flavors provide automated means of merging datasets that have diverged. You will not find such feature in MySQL replication, Group Replication or Galera. Once the data has diverged, the only option is to either use one of the parts of the cluster as the source of truth and discard changes executed on the other part - unless we can follow some manual process in order to merge the data.

This is why we will start with how to prevent split brain from happening. This is so much easier than having to fix any data discrepancy.

How to prevent split brain

The exact solution depends on the type of the database and the setup of the environment. We will take a look at some of the most common cases for Galera Cluster and MySQL Replication.

Galera cluster

Galera has a built-in “circuit breaker” to handle split brain: it relies on a quorum mechanism. If a majority (50% + 1) of the nodes are available in the cluster, Galera will operate normally. If there is no majority, Galera will stop serving traffic and switch to a so-called “non-Primary” state. This is pretty much all you need to deal with a split brain situation while using Galera. Sure, there are manual methods to force Galera into “Primary” state even if there’s not a majority. The thing is, unless you do that, you should be safe.

The way quorum is calculated has important repercussions: at a single datacenter level, you want to have an odd number of nodes. Three nodes give you a tolerance for the failure of one node (2 nodes meet the requirement of more than 50% of the nodes in the cluster being available). Five nodes will give you a tolerance for the failure of two nodes (5 - 2 = 3, which is more than 50% of 5 nodes). On the other hand, using four nodes will not improve your tolerance over a three node cluster. It would still handle only the failure of one node (4 - 1 = 3, more than 50% of 4), while the failure of two nodes will render the cluster unusable (4 - 2 = 2, just 50%, not more).
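A quick way to check, on any node, whether it still belongs to the Primary Component is to look at the wsrep status variables; a node that has lost quorum will report a non-Primary cluster status and refuse to serve traffic:

$ mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster%'"
$ mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_ready'"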

While deploying Galera cluster in a single datacenter, please keep in mind that, ideally, you would like to distribute nodes across multiple availability zones (separate power source, network, etc.) - as long as they do exist in your datacenter, that is. A simple setup may look like below:

At the multi-datacenter level, those considerations are also applicable. If you want Galera cluster to automatically handle datacenter failures, you should use an odd number of datacenters. To reduce costs, you can use a Galera arbitrator in one of them instead of a database node. Galera arbitrator (garbd) is a process which takes part in the quorum calculation but it does not contain any data. This makes it possible to use it even on very small instances as it is not resource-intensive - although the network connectivity has to be good as it ‘sees’ all the replication traffic. Example setup may look like on a diagram below:

MySQL Replication

With MySQL replication, the biggest issue is that there is no quorum mechanism built in, as there is in Galera Cluster. Therefore, more steps are required to ensure that your setup will not be affected by a split brain.

One method is to avoid cross-datacenter automated failovers. You can configure your failover solution (it can be through ClusterControl, or MHA or Orchestrator) to failover only within single datacenter. If there was a full datacenter outage, it would be up to the admin to decide how to failover and how to ensure that the servers in the failed datacenter will not be used.

There are options to make it more automated. You can use Consul to store data about the nodes in the replication setup, and which one of them is the master. Then it will be up to the admin (or via some scripting) to update this entry and move writes to the second datacenter. You can benefit from an Orchestrator/Raft setup where Orchestrator nodes can be distributed across multiple datacenters and detect split brain. Based on this you could take different actions like, as we mentioned previously, update entries in our Consul or etcd. The point is that this is a much more complex environment to setup and automate than Galera cluster. Below you can find example of multi-datacenter setup for MySQL replication.

Please keep in mind that you still have to create scripts to make it work, i.e. monitor Orchestrator nodes for a split brain and take necessary actions to implement STONITH and ensure that the master in datacenter A will not be used once the network converges and connectivity is restored.

Split brain happened - what to do next?

The worst case scenario happened and we have data drift. We will try to give you some hints what can be done here. Unfortunately, the exact steps will depend mostly on your schema design so it will not be possible to write a precise how-to guide.

What you have to keep in mind is that the ultimate goal will be to copy data from one master to the other and recreate all relations between tables.

First of all, you have to identify which node will continue serving data as master. This is the dataset into which you will merge the data stored on the other “master” instance. Once that’s done, you have to identify data from the old master which is missing on the current master. This will be manual work. If you have timestamps in your tables, you can leverage them to pinpoint the missing data. Ultimately, binary logs will contain all data modifications, so you can rely on them. You may also have to rely on your knowledge of the data structure and the relations between tables. If your data is normalized, one record in one table could be related to records in other tables. For example, your application may insert data into a “user” table which is related to an “address” table using user_id. You will have to find all related rows and extract them.
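If you do end up mining the binary logs of the old master, mysqlbinlog can at least narrow the work down to the time window of the split brain; a minimal sketch, where the timestamps and the binlog file name are hypothetical:

$ mysqlbinlog --verbose --base64-output=DECODE-ROWS \
      --start-datetime="2018-06-01 10:00:00" --stop-datetime="2018-06-01 10:35:00" \
      binlog.000123 > old_master_changes.sql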

Next step will be to load this data into the new master. Here comes the tricky part - if you prepared your setups beforehand, this could be simply a matter of running a couple of inserts. If not, this may be rather complex. It’s all about primary key and unique index values. If your primary key values are generated as unique on each server using some sort of UUID generator or using auto_increment_increment and auto_increment_offset settings in MySQL, you can be sure that the data from the old master you have to insert won’t cause primary key or unique key conflicts with data on the new master. Otherwise, you may have to manually modify data from the old master to ensure it can be inserted correctly. It sounds complex, so let’s take a look at an example.

Let’s imagine we insert rows using auto_increment on node A, which is a master. For the sake of simplicity, we will focus on a single row only. There are columns ‘id’ and ‘value’.

If we insert it without any particular setup, we’ll see entries like below:

1000, ‘some value0’
1001, ‘some value1’
1002, ‘some value2’
1003, ‘some value3’

Those will replicate to the slave (B). If the split brain happens and writes are executed on both the old and the new master, we will end up with the following situation:

A

1000, ‘some value0’
1001, ‘some value1’
1002, ‘some value2’
1003, ‘some value3’
1004, ‘some value4’
1005, ‘some value5’
1006, ‘some value7’

B

1000, ‘some value0’
1001, ‘some value1’
1002, ‘some value2’
1003, ‘some value3’
1004, ‘some value6’
1005, ‘some value8’
1006, ‘some value9’

As you can see, there’s no way to simply dump records with an id of 1004, 1005 and 1006 from node A and store them on node B, because we would end up with duplicated primary key entries. What needs to be done is to change the values of the id column in the rows that will be inserted to a value larger than the maximum value of the id column in the table. This is all that’s needed for single rows. For more complex relations, where multiple tables are involved, you may have to make the changes in multiple locations.

On the other hand, if we had anticipated this potential problem and configured our nodes to store odd id’s on node A and even id’s on node B, the problem would have been so much easier to solve.

Node A was configured with auto_increment_offset = 1 and auto_increment_increment = 2

Node B was configured with auto_increment_offset = 2 and auto_increment_increment = 2
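A minimal sketch of such a configuration, either in each node’s my.cnf or set dynamically (the dynamic form only affects new connections):

# Node A, [mysqld] section in my.cnf:
#   auto_increment_increment = 2
#   auto_increment_offset    = 1
# Node B, [mysqld] section in my.cnf:
#   auto_increment_increment = 2
#   auto_increment_offset    = 2

# Or dynamically:
$ mysql -e "SET GLOBAL auto_increment_increment = 2, GLOBAL auto_increment_offset = 1"   # on node A
$ mysql -e "SET GLOBAL auto_increment_increment = 2, GLOBAL auto_increment_offset = 2"   # on node B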

This is how the data would look on node A before the split brain:

1001, ‘some value0’
1003, ‘some value1’
1005, ‘some value2’
1007, ‘some value3’

After the split brain happens, it will look like below.

Node A:

1001, ‘some value0’
1003, ‘some value1’
1005, ‘some value2’
1007, ‘some value3’
1009, ‘some value4’
1011, ‘some value5’
1013, ‘some value7’

Node B:

1001, ‘some value0’
1003, ‘some value1’
1005, ‘some value2’
1007, ‘some value3’
1008, ‘some value6’
1010, ‘some value8’
1012, ‘some value9’

Now we can easily copy missing data from node A:

1009, ‘some value4’
1011, ‘some value5’
1013, ‘some value7’

And load it to node B ending up with following data set:

1001, ‘some value0’
1003, ‘some value1’
1005, ‘some value2’
1007, ‘some value3’
1008, ‘some value6’
1009, ‘some value4’
1010, ‘some value8’
1011, ‘some value5’
1012, ‘some value9’
1013, ‘some value7’

Sure, rows are not in the original order, but this should be ok. In the worst case scenario you will have to order by ‘value’ column in queries and maybe add an index on it to make the sorting fast.

Now, imagine hundreds or thousands of rows and a highly normalized table structure - to restore one row may mean you will have to restore several of them in additional tables. With a need to change id’s (because you didn’t have protective settings in place) across all related rows and all of this being manual work, you can imagine that this is not the best situation to be in. It takes time to recover and it is an error-prone process. Luckily, as we discussed at the beginning, there are means to minimize chances that split brain will impact your system or to reduce the work that needs to be done to sync back your nodes. Make sure you use them and stay prepared.

by krzysztof at June 01, 2018 09:58 AM

May 31, 2018

Peter Zaitsev

Percona Server for MySQL 5.7.22-22 Is Now Available


Percona announces the GA release of Percona Server for MySQL 5.7.22-22 on May 31, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.7.22, including all the bug fixes in it, Percona Server for MySQL 5.7.22-22 is the current GA release in the Percona Server for MySQL 5.7 series. Percona provides completely open-source and free software.

New Features:
  • A new --encrypt-tmp-files option turns on encryption for the temporary files which Percona Server may create on disk for filesort, binary log transactional caches and Group Replication caches.
Bugs Fixed:
  • Executing the SHOW GLOBAL STATUS expression could cause “data drift” on global status variables in case of a query rollback: the variable, being by its nature a counter and allowing only an increase, could return to its previous value. Bug fixed #3951 (upstream #90351).
  • NUMA support was improved in Percona Server, reverting the upstream implementation back to the original one, due to the upstream variant being less effective in memory allocation. Now the innodb_numa_interleave variable not only enables the NUMA interleave memory policy for the InnoDB buffer pool allocation, but also forces NUMA interleaved allocation at the buffer pool initialization time. Bug fixed #3967.
  • audit_log_include_accounts variable did not take effect if placed in my.cnf configuration file, while still working as intended if set dynamically. Bug fixed #3867.
  • The key_block_size value was set automatically by the Improved MEMORY Storage Engine, which resulted in warnings when changing the engine type to InnoDB, and in a constantly growing key_block_size during alter operations. Bugs fixed #3936, #3940, and #3943.
  • Fixes were introduced to remove GCC 8 compilation warnings for the Percona Server build. Bug fixed #3950.
  • An InnoDB Memcached Plugin code clean-up was backported from MySQL 8.0. Bug fixed  #4506.
  • Percona Server could not be built with the -DWITH_LZ4=system option on Ubuntu 14.04 (Trusty) because the LZ4 packages were too old. Bug fixed #3842.
  • A regression brought during TokuDB code clean-up in 5.7.21-21 was causing assertion in cases when the FT layer returns an error during an alter table operation. Bug fixed #4294.
MyRocks Changes and fixes:
  • UPDATE statements were returning incorrect results because a full table scan was not made on tables with a unique secondary index. Bug fixed #4495 (upstream facebook/mysql-5.6#830).
Other Bugs Fixed:
  • #4451 “Implement better compression algo testing”
  • #4469 “variable use out of scope bug in get_last_key test detected by ASAN in clang 6”
  • #4470 “the cachetable-simple-pin-nonblocking-cheap test occasionally fails due to a locking conflict with the cachetable evictor”
  • #4488 “-Werror is always disabled for innodb_memcached”
  • #1114 “Assertion `inited == INDEX’ failed”
  • #1130 “RBR Replication with concurrent XA in READ-COMMITTED takes supremum pseudo-records and breaks replication”

Find the release notes for Percona Server for MySQL 5.7.22-22 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.7.22-22 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at May 31, 2018 06:06 PM

Don’t Drown in your Data Lake


A data lake is “…a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms…”1. Many companies find value in using a data lake but aren’t clear that they need to properly plan for it and maintain it in order to prevent issues.

The idea of a data lake rose from the need to store data in a raw format that is accessible to a variety of applications and authorized users. Hadoop is often used to query the data, and the necessary structures for querying are created through the query tool (schema on read) rather than as part of the data design (schema on write). There are other tools available for analysis, and many cloud providers are actively developing additional options for creating and managing your data lake. The cloud is often viewed as an ideal place for your data lake since it is inherently elastic and can expand to meet the needs of your data.

Data Lake or Data Swamp?

One of the key components of a functioning data lake is the continuing inflow and egress of data. Some data must be kept indefinitely but some can be archived or deleted after a defined period of time. Failure to remove stale data can result in a data swamp, where the out of date data is taking up valuable and costly space and may be causing queries to take longer to complete. This is one of the first issues that companies encounter in maintaining their data lake. Often, people view the data lake as a “final resting place” for data, but it really should be used for data that is accessed often, or at least occasionally.

A natural spring-fed lake can turn into a swamp due to a variety of factors. If fresh water is not allowed to flow into the lake, this can cause stagnation, meaning that plants and animals that previously were not able to be supported by the lake take hold. Similarly, if water cannot exit the lake at some point, the borders will be breached, and the surrounding land will be inundated. Both of these conditions can cause a once pristine lake to turn into a fetid and undesirable swamp. If data is no longer being added to your data lake, the results will become dated and eventually unreliable. Also, if data is always being added to the lake but is not accessed on a regular basis, this can lead to unrestricted growth of your data lake, with no real plan for how the data will be used. This can become an expensive “cold storage” facility that is likely more expensive than archived storage.

If bad or undesirable items, like old cars or garbage, are thrown into a lake, this can damage the ecosystem, causing unwanted reactions. In a data lake, this is akin to simply throwing data into the data lake with no real rules or rationale. While the data is saved, it may not be useful and can cause negative consequences across the whole environment since it is consuming space and may slow response times. Even though a basic concept of a data lake is that the data does not need to conform to a predefined structure, like you would see with a relational database, it is important that some rules and guidelines exist regarding the type and quality of data that is included in the lake. In the absence of some guidelines, it becomes difficult to access the relevant data for your needs. Proper definition and tagging of content help to ensure that the correct data is accessible and available when needed.

Unrestricted Growth Consequences

Many people have a junk drawer somewhere in their house; a drawer that is filled with old receipts, used tickets, theater programs, and the like. Some of this may be stored for sentimental reasons, but a lot of it is put into this drawer since it was a convenient dropping place for things. Similarly, if we look to the data lake as the “junk drawer” for our company, it is guaranteed to be bigger and more expensive than it truly needs to be.

It is important that the data that is stored in your data lake has a current or expected purpose. While you may not have a current use for some data, it can be helpful to keep it around in case a need arises. An example of this is in the area of machine learning. Providing more ancillary data enables better decisions since it provides a deeper view into the decision process. Therefore, maintaining some data that may not have a specific and current need can be helpful. However, there are cases where maintaining a huge volume of data can be counterproductive. Consider temperature information delivered from a switch. If the temperature reaches a specific threshold, the switch should be shut down. Reporting on the temperature in an immediate and timely manner is important to make an informed decision, but stable temperature data from days, weeks, or months ago could be summarized and stored in a more efficient manner. The granular details can then be purged from the lake.
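
If the lake is queried through a SQL engine, the summarize-then-purge idea might look like the sketch below. This is only an illustration: the tables temperature_raw and temperature_daily_summary, their columns, and the 30-day cutoff are hypothetical placeholders, not taken from any real system.

-- Roll old raw readings up into one row per switch per day...
INSERT INTO temperature_daily_summary (switch_id, reading_date, avg_temp, max_temp)
SELECT switch_id, DATE(reading_ts), AVG(temp_c), MAX(temp_c)
FROM temperature_raw
WHERE reading_ts < CURRENT_DATE - INTERVAL 30 DAY
GROUP BY switch_id, DATE(reading_ts);

-- ...then purge the granular rows that have been summarized.
DELETE FROM temperature_raw
WHERE reading_ts < CURRENT_DATE - INTERVAL 30 DAY;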

So, where is the balance? If you keep all the data, it can make your data lake unwieldy and costly. If you only keep data that has a specific current purpose, you may be impairing your future plans. Obviously, the key is to monitor your access and use of the data frequently, and purge or archive some of the data that is not being regularly used.

Uncontrolled Access Concerns

Since much of the data in your data lake is company confidential, it is imperative that access to that data be controlled. The fact that the data in the lake is stored in its raw format means that it is more difficult to control access. The structures of a relational database provide some of the basis for access control, allowing us to limit who has access to specific queries, tables, fields, schemas, databases, and other objects. In the absence of these structures, controlling access requires more finesse. Determining who has access to what parts of the data in the lake must be handled, as well as isolating the data within your own network environment. Many of these restrictions may already be in place in your current environment, but they should be reviewed before being relied on fully, since the data lake may store information that was previously unavailable to some users. Access should be regularly reviewed to identify potential rogue activities. Encryption options also exist to further secure the data from unwanted access, and file system security can be used to limit access. All of these components must be considered, implemented, and reviewed to ensure that the data is secure.

User Considerations

In a relational database, the data structure inherently determines some of the consistencies and format of the data. This enables users to easily query the data and be assured that they are returning valid results. The lack of such structures in the data lake means that users must be more highly skilled at data manipulation. Having users with less skill accessing the data is possible, but it may not provide the best results. A data scientist is better positioned to access and query the complete data set. Obviously, users with a higher skill set are rare and cost more to hire, but the return may be worth it in the long run.

So What Do I Do Now?

This is an area where there are no hard and fast rules. Each company must develop and implement processes and procedures that make sense for their individual needs. Only with a plan for monitoring inputs, outputs, access patterns, and the like are you able to make a solid determination for your company’s needs. Percona can help to determine a plan for reporting usage, assess security settings, and more. As you are using the data in your data lake, we can also provide guidance regarding tools used to access the data.

1 Wikipedia, May 22, 2018

The post Don’t Drown in your Data Lake appeared first on Percona Database Performance Blog.

by Rick Golba at May 31, 2018 05:37 PM

MongoDB: deploy a replica set with transport encryption (part 3/3)

In this third and final post of the series, we look at how to configure transport encryption on a deployed MongoDB replica set. Security vulnerabilities can arise when internal personnel have legitimate access to the private network, but should not have access to the data. Encrypting intra-node traffic ensures that no one can “sniff” sensitive data on the network.

In part 1 we described MongoDB replica sets and how they work.
In part 2 we provided a step-by-step guide to deploy a simple 3-node replica set, including information on replica set configuration.

Enable Role-Based Access Control

In order for the encryption to be used in our replica set, we first need to activate Role-Based Access Control (RBAC). By default, a MongoDB installation permits anyone to connect and see the data, as in the sample deployment we created in part 2. Having RBAC enabled is mandatory for encryption.

RBAC governs access to a MongoDB system. Users are created and assigned privileges to access specific resources, such as databases and collections. Likewise, for carrying out administrative tasks, users need to be created with specific grants. Once activated, every user must authenticate themselves in order to access MongoDB.

Prior to activating RBAC, let’s create an administrative user. We’ll connect to the PRIMARY member and do the following:

rs-test:PRIMARY> use admin
switched to db admin
rs-test:PRIMARY> db.createUser({user: 'admin', pwd: 'secret', roles:['root']})
Successfully added user: { "user" : "admin", "roles" : [ "root" ] }

Let’s activate the RBAC in the configuration file /etc/mongod.conf on each node

security:
      authorization: enabled

and restart the daemon

sudo service mongod restart

Now to connect to MongoDB we issue the following command:

mongo -u admin -p secret --authenticationDatabase "admin"

Certificates

MongoDB supports X.509 certificate authentication for use with a secure TLS/SSL connection. The members can use X.509 certificates to verify their membership of the replica set.

In order to use encryption, we need to create certificates on all the nodes and have a certification authority (CA) that signs them. Since using a commercial certification authority can be quite costly, we decided to use self-signed certificates. For our purposes, this solution ensures encryption and has no cost. Using a public CA is not necessary inside a private infrastructure.

To proceed with certificate generation we need to have openssl installed on our system and certificates need to satisfy these requirements:

  • every certificate needs to be signed by the same CA
  • the common name (CN) required during the certificate creation must correspond to the hostname of the host
  • any other field requested in the certificate creation should be a non-empty value and, ideally, should reflect our organization details
  • it is also very important that all the fields, except the CN, match those from the certificates of the other cluster members

The following guide describes all the steps to configure internal X.509 certificate-based encryption.

1 – Connect to one of the hosts and generate a new private key using openssl

openssl genrsa -out mongoCA.key -aes256 8192

We have created a new 8192-bit private key and saved it in the file mongoCA.key.
Remember to enter a strong passphrase when requested.

2 – Sign a new CA certificate

Now we are going to create our “fake” local certification authority that we’ll use later to sign each node certificate.

During certificate creation, some fields must be filled out. We could choose these randomly but they should correspond to our organization’s details.

root@psmdb1:~# openssl req -x509 -new -extensions v3_ca -key mongoCA.key -days 365 -out mongoCA.crt
    Enter pass phrase for mongoCA.key:
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [AU]:US
    State or Province Name (full name) [Some-State]:California
    Locality Name (eg, city) []:San Francisco
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Company Ltd
    Organizational Unit Name (eg, section) []:DBA
    Common Name (e.g. server FQDN or YOUR name) []:psmdb
    Email Address []:corrado@mycompany.com

3 – Issue self-signed certificates for all the nodes

For each node, we need to generate a certificate request and sign it using the CA certificate we created in the previous step.

Remember: fill out all the requested fields with the same values on each host, except for the common name (CN), which must correspond to that host's hostname.

For the first node issue the following commands.

openssl req -new -nodes -newkey rsa:4096 -keyout psmdb1.key -out psmdb1.csr
openssl x509 -CA mongoCA.crt -CAkey mongoCA.key -CAcreateserial -req -days 365 -in psmdb1.csr -out psmdb1.crt
cat psmdb1.key psmdb1.crt > psmdb1.pem

for the second node

openssl req -new -nodes -newkey rsa:4096 -keyout psmdb2.key -out psmdb2.csr
openssl x509 -CA mongoCA.crt -CAkey mongoCA.key -CAcreateserial -req -days 365 -in psmdb2.csr -out psmdb2.crt
cat psmdb2.key psmdb2.crt > psmdb2.pem

and for the third node

openssl req -new -nodes -newkey rsa:4096 -keyout psmdb3.key -out psmdb3.csr
openssl x509 -CA mongoCA.crt -CAkey mongoCA.key -CAcreateserial -req -days 365 -in psmdb3.csr -out psmdb3.crt
cat psmdb3.key psmdb3.crt > psmdb3.pem

4 – Place the files

We could execute all of the commands in the previous step on the same host, but now we need to copy the generated files to the proper nodes:

  • Copy the CA certificate file mongoCA.crt to each node
  • Copy each self-signed certificate <hostname>.pem to the corresponding member
  • Create on each member a directory that only the MongoDB user can read, and copy both files there

sudo mkdir -p /etc/mongodb/ssl
sudo chmod 700 /etc/mongodb/ssl
sudo chown -R mongod:mongod /etc/mongodb
sudo cp psmdb1.pem /etc/mongodb/ssl
sudo cp mongoCA.crt /etc/mongodb/ssl

Do the same on each host.

5 – Configure mongod

Finally, we need to instruct mongod about the certificates to enable the encryption.

Change the configuration file /etc/mongod.conf on each host adding the following rows:

net:
   port: 27017
   ssl:
      mode: requireSSL
      PEMKeyFile: /etc/mongodb/ssl/psmdb1.pem
      CAFile: /etc/mongodb/ssl/mongoCA.crt
      clusterFile: /etc/mongodb/ssl/psmdb1.pem
security:
      authorization: enabled
      clusterAuthMode: x509

Restart the daemon

sudo service mongod restart

Make sure to use the proper file names on each host (psmdb2.pem on the psmdb2 host, and so on).

Now, as long as we have made no mistakes, we have a properly configured replica set that is using encrypted connections.

Issue the following command to connect on node psmdb1:

mongo admin --ssl --sslCAFile /etc/mongodb/ssl/mongoCA.crt \
   --sslPEMKeyFile /etc/mongodb/ssl/psmdb1.pem \
   -u admin -p secret --host psmdb1

Access the first two articles in this series

  • Part 1: Introduces basic replica set concepts, how they work and what their main features are
  • Part 2: Provides a step-by-step guide to configure a three-node replica set

The post MongoDB: deploy a replica set with transport encryption (part 3/3) appeared first on Percona Database Performance Blog.

by Corrado Pandiani at May 31, 2018 02:59 PM

MariaDB AB

ALTER TABLE Improvements in MariaDB Server 10.3

MariaDB Server 10.3.7 (the first Generally Available release in the series) includes some ALTER TABLE improvements that are worth mentioning. Last October, I wrote about the Instant ADD COLUMN feature that was introduced in the 10.3.2 alpha release. The effort to support instant ALTER TABLE in MariaDB comes from a collaboration with ServiceNow. The first part of that, instant ADD COLUMN, was brainstormed in April 2017 by engineers from MariaDB Corporation, Alibaba and Tencent. A prototype was first developed by Vin Chen (陈福荣) from the Tencent Game DBA Team and was later refined by our team for the MariaDB version.

Part of the original plan was to introduce syntax for ALTER TABLE…ALGORITHM=INSTANT in order to be able to give a guarantee that the requested operation will be performed instantly, or not at all. This was finally implemented in MariaDB Server 10.3.7. We also introduced the keyword ALGORITHM=NOCOPY, which will refuse an operation if the table would be rebuilt.

Example

CREATE TABLE t(id INT PRIMARY KEY, u INT UNSIGNED NOT NULL UNIQUE)
ENGINE=InnoDB;
INSERT INTO t(id,u) VALUES(1,1),(2,2),(3,3);

SET alter_algorithm=instant;
ALTER TABLE t ADD COLUMN d DATETIME DEFAULT current_timestamp();
--error ER_ALTER_OPERATION_NOT_SUPPORTED
# There is no instant DROP COLUMN yet
ALTER TABLE t DROP COLUMN u;
--error ER_ALTER_OPERATION_NOT_SUPPORTED
ALTER TABLE t DROP COLUMN u, ALGORITHM=NOCOPY;
SET alter_algorithm=default;
ALTER TABLE t DROP COLUMN u;

The example illustrates a new configuration parameter alter_algorithm. A DBA could set it globally in the MariaDB Server configuration to NOCOPY in order to prevent expensive ALTER TABLE statements from being executed by mistake.
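
As a small sketch (not part of the original example), the same protection can also be applied at runtime, since alter_algorithm is a dynamic global and session variable:

-- Refuse any ALTER TABLE that would copy the table; new sessions inherit this default.
SET GLOBAL alter_algorithm = 'NOCOPY';
-- Verify the global default and the value in the current session.
SELECT @@global.alter_algorithm, @@session.alter_algorithm;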

The ALGORITHM=INPLACE syntax, which was added in MariaDB Server 10.0, can misleadingly suggest that no copying ever takes place. However, until the efforts to implement instant schema changes are completed (instant DROP COLUMN and others are being worked on for upcoming MariaDB Server releases), the in-place operation often does involve copying, potentially allocating quite a bit of extra space, not only for the new copy of the table, but also for pre-sorting the data and for logging concurrent modifications to the table (ALTER ONLINE TABLE). With the ALGORITHM=INSTANT and ALGORITHM=NOCOPY clauses, which represent proper subsets of ALGORITHM=INPLACE, we are clarifying the situation.

Imitation is the sincerest form of flattery

We’re happy to hear that MySQL 8.0 will add support for instant ADD COLUMN that is based on our work. But you don’t have to wait to try it.

Try it out

MariaDB Server 10.3.7 was announced as GA on May 25, 2018. Download MariaDB TX 3.0, which includes MariaDB Server 10.3.7 to upgrade your current server and immediately get the benefit of instant ADD COLUMN for your old InnoDB tables.

Note that if you need to export InnoDB data files to older servers than MariaDB Server 10.3, you must rebuild the table first: ALTER TABLE t FORCE;

Perhaps the most user-visible InnoDB changes in the MariaDB Server 10.3.7 GA release are Instant ADD COLUMN for InnoDB tables, the new parameter alter_algorithm and the clauses ALGORITHM=INSTANT and ALGORITHM=NOCOPY that can protect the database from executing costly ALTER TABLE operations taking hours.

Upgrade to MariaDB Server 10.3.7, and enjoy the instant ADD COLUMN with your old InnoDB data files.

by Marko Mäkelä at May 31, 2018 12:10 PM

May 30, 2018

Peter Zaitsev

Percona Server for MySQL 5.6.40-84.0 Is Now Available

Percona announces the release of Percona Server for MySQL 5.6.40-84.0 on May 30, 2018 (downloads are available here and from the Percona Software Repositories). Based on MySQL 5.6.40, including all the bug fixes in it, Percona Server for MySQL 5.6.40-84.0 is now the current GA release in the 5.6 series. All of Percona’s software is open-source and free.

New Features
  • A new string variable version_suffix allows changing the suffix of the Percona Server version string returned by the read-only version variable. This makes it possible to append a custom suffix to the server version number to reflect some build or configuration specifics. Also, version_comment (whose default value is taken from the CMake COMPILATION_COMMENT option) has been converted from a global read-only to a global read-write variable and is now customizable.
  • The query response time plugin can now be disabled at the session level using the new variable query_response_time_session_stats (see the sketch after this list).
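
A minimal sketch of how these new variables might be used; the exact accepted values are an assumption here, so check the Percona Server documentation before relying on them:

-- Sketch only: the suffix and comment strings below are made-up examples.
SET GLOBAL version_suffix = '-84.0-custombuild';
SET GLOBAL version_comment = 'Percona Server, reporting replica build';
SELECT @@version, @@version_comment;
-- Disable query response time statistics collection for the current session only.
SET SESSION query_response_time_session_stats = OFF;
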
Bugs Fixed
  • A compilation warning was fixed for the -DWITH_QUERY_RESPONSE_TIME=ON CMake compilation option, which links QRT statically. Bug fixed #3841.
  • A code clean-up was done to fix clang 6 specific compilation warnings and errors (bug fixed #3893, upstream #90111).
  • Using -DWITHOUT_<PLUGIN>=ON CMake variable to exclude a plugin from the build didn’t work for some plugins, including a number of storage engines. Bug fixed #3901.
  • A clean-up in Percona Server binlog-related code was made to avoid uninitialized memory comparison. Bug fixed #3925 (upstream #90238).
  • Temporary file I/O was not instrumented for Performance Schema. Bug fixed  #3937  (upstream  #90264).
  • A key_block_size value was set automatically by the Improved MEMORY Storage Engine, which resulted in warnings when changing the engine type to InnoDB, and constantly growing key_block_size during alter operations. Bugs fixed #3936, #3940, and #3943.
  • The Percona Server Debian package description included a reference to the /etc/mysql/my.cnf file, which is not actually present in these packages. Bug fixed #2046.
  • Fixes were introduced to remove GCC 8 compilation warnings for the Percona Server build, retaining compatibility with old compiler versions, including GCC 4.4. Bugs fixed #3950 and #4471.
  • A typo in the plugin.cmake file prevented plugins from being compiled statically into the server. Bug fixed #3871 (upstream #89766).
  • The -DWITH_NUMA=ON build option was silently ignored by CMake when the NUMA development package was not installed, instead of failing with an error. Bug fixed #4487.
  • Variables innodb_buffer_pool_populate and numa_interleave mapped to the upstream innodb_numa_interleave variable in 5.6.27-75.0 were reverted to their original implementation due to upstream variant being less effective in memory allocation. Now buffer pool is allocated with MAP_POPULATE, forcing NUMA interleaved allocation at the buffer pool initialization time. Bug fixed #3967.
  • audit_log_include_accounts variable did not take effect if placed in my.cnf configuration file, while still working as intended if set dynamically. Bug fixed #3867.
  • Synchronization between the innodb_kill_idle_transaction and kill_idle_transaction system variables was broken because of a regression in Percona Server 5.6.40-83.2. Bug fixed #3955.
  • Executing the SHOW GLOBAL STATUS expression could cause “data drift” on global status variables in case of a query rollback: the variable, being by its nature a counter and allowing only an increase, could return to its previous value. Bug fixed #3951 (upstream #90351).
  • ALTER TABLE … COMMENT = … statement caused TokuDB to rebuild the whole table, which is not needed, as only FRM metadata should be changed. The fix was provided as a contribution by Fungo Wang. Bugs fixed #4280 and #4292.
  • A number of Percona Server 8.0 TokuDB fixes have been backported to Percona Server 5.6 in preparation for using MySQL 8.0. Bugs fixed #4379, #4380, #4387, #4378, #4383, #4384, #4386, #4382, #4391, #4390, #4392, and #4381.
TokuDB Changes and Fixes
  • Two new variables, tokudb_enable_fast_update and tokudb_enable_fast_upsert, were introduced to facilitate the TokuDB fast updates feature, which involves query optimization to avoid random reads during execution (see the sketch after this list). Bug fixed #4365.
  • A data race was fixed in minicron utility of the PerconaFT, as a contribution by Rik Prohaska. Bug fixed #4281.
  • Row count and cardinality decreased to zero after a long-running REPLACE load, resulting in full table scans for any subsequent action.
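
A minimal sketch of enabling the fast updates feature mentioned above, assuming the two variables are dynamic ON/OFF switches (verify this against the Percona Server documentation):

-- Sketch only: enable TokuDB fast updates and upserts for subsequent sessions.
SET GLOBAL tokudb_enable_fast_update = ON;
SET GLOBAL tokudb_enable_fast_upsert = ON;
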
Other Bugs Fixed
  • #3818 “Orphaned file mysql-test/suite/innodb/r/percona_innodb_kill_idle_trx.result”
  • #3926 “Potentially truncated bitmap file name in log_online_open_bitmap_file_read_only() (storage/innobase/log/log0online.cc)”
  • #2204 “Test main.audit_log_default_db is unstable”
  • #3767 “Fix compilation warnings/errors with clang”
  • #3773 “Incorrect key file for table frequently for tokudb”
  • #3794 “MTR test main.percona_show_temp_tables_stress does not wait for events to start”
  • #3798 “MTR test innodb.percona_extended_innodb_status fails if InnoDB status contains unquoted special characters”
  • #3887 “TokuDB does not compile with -DWITH_PERFSCHEMA_STORAGE_ENGINE=OFF”
  • #4388 “5.7 code still has TOKU_INCLUDE_OPTION_STRUCTS which is a MariaDB specific construct”
  • #4265 “TDB-114 (Change use of MySQL HASH to unordered_map) introduces memory leak”
  • #4277 “memory leaks in TDB-2 and TDB-89 tests”
  • #4276 “Data race on cache table attributes detected by the thread sanitizer”
  • #4451 “Implement better compression algo testing”
  • #4469 “variable use out of scope bug in get_last_key test detected by ASAN in clang 6”
  • #4470 “the cachetable-simple-pin-nonblocking-cheap test occasionally fails due to a locking conflict with the cachetable evictor”
  • #1131 “User_var_log_event::User_var_log_event(const char*, uint, const Format_description_log_event*): Assertion `(bytes_read == (data_written - ((old_pre_checksum_fd || (description_event->checksum_alg == BINLOG_CHECKSUM_ALG_OFF)) ? 0 : 4))) || ((“.

Find the release notes for Percona Server for MySQL 5.6.40-84.0 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.6.40-84.0 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at May 30, 2018 06:55 PM

Jean-Jerome Schmidt

Watch the Replay: How to Migrate to Galera Cluster for MySQL & MariaDB

Watch the replay of this webinar with Severalnines Support Engineer Bart Oles, as he walks us through what you need to know in order to migrate from standalone or a master-slave MySQL/MariaDB setup to Galera Cluster.

When considering such a migration, plenty of questions typically come up, such as: how do we migrate? Does the schema or application change? What are the limitations? Can a migration be done online, without service interruption? What are the potential risks?

Galera Cluster has become a mainstream option for high availability MySQL and MariaDB. And though it is now known as a credible replacement for traditional MySQL master-slave architectures, it is not a drop-in replacement.

It has some characteristics that make it unsuitable for certain use cases; however, most applications can still be adapted to run on it.

The benefits are clear: multi-master InnoDB setup with built-in failover and read scalability.

Check out this walk-through on how to migrate to Galera Cluster for MySQL and MariaDB.

Watch the replay and browse through the slides!

Agenda

  • Application use cases for Galera
  • Schema design
  • Events and Triggers
  • Query design
  • Migrating the schema
  • Load balancer and VIP
  • Loading initial data into the cluster
  • Limitations:
    • Cluster technology
    • Application vendor support
  • Performing Online Migration to Galera
  • Operational management checklist
  • Belts and suspenders: Plan B
  • Demo

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA with over 15 years of experience managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

by jj at May 30, 2018 11:12 AM

Peter Zaitsev

MySQL Test Framework for Percona XtraDB Cluster

MySQL Test Framework

At my latest webinar “MySQL Test Framework (MTR) for Troubleshooting”, I received an interesting question about MTR test cases for Percona XtraDB Cluster (PXC). Particularly about testing SST and IST.

This post is intended to answer this question. It assumes you are familiar with MTR and can write tests for MySQL servers. If you are not, please watch the webinar recording first.

You can find example tests in any PXC tarball package. They are located in the directories mysql-test/suite/galera, mysql-test/suite/galera_3nodes and mysql-test/suite/wsrep, though that last directory only contains a configuration file.

If you simply try to run tests in the galera suite you will find they are all disabled, because the environment variable WSREP_PROVIDER was not set:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ ./mtr --suite=galera
Logging: ./mtr --suite=galera
MySQL Version 5.7.19
Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one...
- using tmpdir: '/tmp/xYgQqOa5b7'
Checking supported features...
- SSL connections supported
- binaries built with wsrep patch
Using suites: galera
Collecting tests...
Checking leftover processes...
- found old pid 30624 in 'mysqld.3.pid', killing it...
process did not exist!
Removing old var directory...
Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'...
Installing system database...
Using parallel: 1
==============================================================================
TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
galera.GAL-419 [ skipped ] Test needs 'big-test' option
...
galera.galera_binlog_checksum [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER?
galera.galera_binlog_event_max_size_min [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER?
galera.galera_flush_gtid [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER?
galera.galera_gtid [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER?
galera.lp1435482 [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER?
^Cmysql-test-run: *** ERROR: Got ^C signal

In order to run these tests you need to set this variable first.

I use the quite outdated 5.7.19 PXC package (the version does not matter for the purpose of this post) and run tests as:

WSREP_PROVIDER=/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so ./mtr --suite=galera

After the variable WSREP_PROVIDER is set, mtr can successfully run:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ WSREP_PROVIDER=/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so ./mtr --suite=galera
Logging: ./mtr --suite=galera
MySQL Version 5.7.19
Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one...
- using tmpdir: '/tmp/I6HfuqkwR1'
Checking supported features...
- SSL connections supported
- binaries built with wsrep patch
Using suites: galera
Collecting tests...
Checking leftover processes...
- found old pid 14271 in 'mysqld.1.pid', killing it...
process did not exist!
- found old pid 14273 in 'mysqld.2.pid', killing it...
process did not exist!
Removing old var directory...
Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'...
Installing system database...
Using parallel: 1
==============================================================================
TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
galera.GAL-419 [ skipped ] Test needs 'big-test' option
...
worker[1] mysql-test-run: WARNING: Waited 60 seconds for /home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/run/mysqld.2.pid to be created, still waiting for 120 seconds...
galera.galera_binlog_checksum [ pass ] 2787
worker[1] mysql-test-run: WARNING: Waited 60 seconds for /home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/run/mysqld.2.pid to be created, still waiting for 120 seconds...
galera.galera_binlog_event_max_size_min [ pass ] 2200
...

Now we are ready to write our first PXC test. The easiest way to get started is to open any existing test and check how it is written. Then modify it so that it replays our own scenario.

Since the question was about testing IST and SST, I will use the test galera_ist_progress as an example. First let’s check that it runs successfully and that it does not have any requirements that could prevent it from running inside regular production binaries:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ WSREP_PROVIDER=/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so ./mtr --suite=galera galera_ist_progress
Logging: ./mtr --suite=galera galera_ist_progress
MySQL Version 5.7.19
Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one...
- using tmpdir: '/tmp/EodvOyCJwo'
Checking supported features...
- SSL connections supported
- binaries built with wsrep patch
Collecting tests...
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'...
Installing system database...
Using parallel: 1
==============================================================================
TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
worker[1] mysql-test-run: WARNING: Waited 60 seconds for /home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/run/mysqld.2.pid to be created, still waiting for 120 seconds...
galera.galera_ist_progress [ pass ] 17970
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 17.970 of 218 seconds executing testcases
Completed: All 1 tests were successful.

Everything is fine. Now let’s look into the test itself.

First, this test has its own configuration file. Let’s check what’s in there:

$ cat suite/galera/t/galera_ist_progress.cnf
!include ../galera_2nodes.cnf
[mysqld.1]
wsrep_provider_options='base_port=@mysqld.1.#galera_port;pc.ignore_sb=true'

galera_2nodes.cnf is one of the standard configuration files in the galera suite. If we look into it we may notice that wsrep_provider_options is defined and overriding this option is not required for all tests.

We’ll continue our review. The test script includes the galera_cluster.inc file:

--source include/galera_cluster.inc

This file is located outside of galera suite and contains 2 lines:

--let $galera_cluster_size = 2
--source include/galera_init.inc

galera_init.inc, in its turn, creates as many nodes as defined by the galera_cluster_size variable and additionally creates a default connection for each of them.

Now let’s step out from galera_ist_progress and check if this knowledge is enough to create our first PXC test.

I created a simple test based on a two node setup which checks a few status and system variables, creates a table, inserts data into it, and ensures that content is accessible on both nodes:

$ cat ~/src/tests/t/pxc.test
--source include/galera_cluster.inc
--connection node_1
--echo We are on node 1
select @@hostname, @@port;
show status like 'wsrep_cluster_size';
show status like 'wsrep_cluster_status';
show status like 'wsrep_connected';
create table t1(id int not null auto_increment primary key, f1 int) engine=innodb;
insert into t1(f1) values(1),(2),(3);
select * from t1;
--connection node_2
--echo We are on node 2
select @@hostname, @@port;
show status like 'wsrep_cluster_size';
show status like 'wsrep_cluster_status';
show status like 'wsrep_connected';
select * from t1;
insert into t1(f1) values(1),(2),(3);
select * from t1;
--connection node_1
--echo We are on node 1
select * from t1;
drop table t1;

However, if I run this test in the main suite, it will fail:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ export WSREP_PROVIDER=/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so
sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ do_test.sh -s ~/mysql_packages -b Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100
Logging: ./mysql-test-run.pl --record --force pxc
MySQL Version 5.7.19
Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one...
- using tmpdir: '/tmp/uUmBztSWUA'
Checking supported features...
- SSL connections supported
- binaries built with wsrep patch
Collecting tests...
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'...
Installing system database...
Using parallel: 1
==============================================================================
TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
main.pxc [ skipped ] Test requires wsrep provider library (libgalera_smm.so). Did you set $WSREP_PROVIDER?
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 0.000 of 108 seconds executing testcases
Completed: All 0 tests were successful.
1 tests were skipped, 1 by the test itself.
=====Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100=====
=====pxc=====
sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ echo $WSREP_PROVIDER
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/lib/libgalera_smm.so

The reason for this failure is that galera suite has default option files that set the necessary variables. Let’s skip those option files for a while and simply run our test in galera suite:

sveta@Thinkie:~/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test$ do_test.sh -s ~/mysql_packages -b Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100 -t galera
Logging: ./mysql-test-run.pl --record --force --suite=galera pxc
MySQL Version 5.7.19
Too long tmpdir path '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/tmp' creating a shorter one...
- using tmpdir: '/tmp/ytqEjnfM7i'
Checking supported features...
- SSL connections supported
- binaries built with wsrep patch
Collecting tests...
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var'...
Installing system database...
Using parallel: 1
==============================================================================
TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
worker[1] mysql-test-run: WARNING: Waited 60 seconds for /home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/var/run/mysqld.2.pid to be created, still waiting for 120 seconds...
galera.pxc [ pass ] 2420
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 2.420 of 208 seconds executing testcases
Completed: All 1 tests were successful.
pxc.result
=====Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100=====
=====pxc=====
We are on node 1
select @@hostname, @@port;
@@hostname @@port
Thinkie 13000
show status like 'wsrep_cluster_size';
Variable_name Value
wsrep_cluster_size 2
show status like 'wsrep_cluster_status';
Variable_name Value
wsrep_cluster_status Primary
show status like 'wsrep_connected';
Variable_name Value
wsrep_connected ON
create table t1(id int not null auto_increment primary key, f1 int) engine=innodb;
insert into t1(f1) values(1),(2),(3);
select * from t1;
id f1
2 1
4 2
6 3
We are on node 2
select @@hostname, @@port;
@@hostname @@port
Thinkie 13004
show status like 'wsrep_cluster_size';
Variable_name Value
wsrep_cluster_size 2
show status like 'wsrep_cluster_status';
Variable_name Value
wsrep_cluster_status Primary
show status like 'wsrep_connected';
Variable_name Value
wsrep_connected ON
select * from t1;
id f1
2 1
4 2
6 3
insert into t1(f1) values(1),(2),(3);
select * from t1;
id f1
2 1
4 2
6 3
7 1
9 2
11 3
We are on node 1
select * from t1;
id f1
2 1
4 2
6 3
7 1
9 2
11 3
drop table t1;

You will see that the test reports that the two nodes run on different ports:

We are on node 1
select @@hostname, @@port;
@@hostname @@port
Thinkie 13000
...
We are on node 2
select @@hostname, @@port;
@@hostname @@port
Thinkie 13004

… and that PXC started:

show status like 'wsrep_cluster_size';
Variable_name Value
wsrep_cluster_size 2
show status like 'wsrep_cluster_status';
Variable_name Value
wsrep_cluster_status Primary
show status like 'wsrep_connected';
Variable_name Value
wsrep_connected ON

And we can also clearly see that each node sees the changes to our test table that were made by the other node.

Now let’s get back to the IST test, defined in galera_ist_progress.test.

In order to test IST, it first stops writes to the cluster:

# Isolate node #2
--connection node_2
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 1';

Then it connects to node 1 and waits until wsrep_cluster_size becomes 1:

--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc

Then it turns wsrep_on OFF on node 2:

--connection node_2
SET SESSION wsrep_on = OFF;
--let $wait_condition = SELECT VARIABLE_VALUE = 'non-Primary' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status';
--source include/wait_condition.inc
SET SESSION wsrep_on = ON;

Now node 2 is completely isolated and node 1 can be updated, so we can test IST when we bring node 2 back online.

--connection node_1
CREATE TABLE t1 (f1 INTEGER) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
INSERT INTO t1 VALUES (2);
INSERT INTO t1 VALUES (3);
INSERT INTO t1 VALUES (4);
INSERT INTO t1 VALUES (5);
INSERT INTO t1 VALUES (6);
INSERT INTO t1 VALUES (7);
INSERT INTO t1 VALUES (8);
INSERT INTO t1 VALUES (9);
INSERT INTO t1 VALUES (10);

After the update is done, node 2 is brought online:

--connection node_2
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 0';
--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 2 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc
--connection node_2
--let $wait_condition = SELECT VARIABLE_VALUE = 'Primary' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status';
--source include/wait_condition.inc

Once node 2 is online, checks for IST progress are performed. To check for IST progress, the test greps the error log file from node 2 where any messages about IST progress are printed:

#
# Grep for expected IST output in joiner log
#
--connection node_1
--let $assert_count = 1
--let $assert_file = $MYSQLTEST_VARDIR/log/mysqld.2.err
--let $assert_only_after = Need state transfer
--let $assert_text = Receiving IST: 11 writesets, seqnos
--let $assert_select = Receiving IST: 11 writesets, seqnos
--source include/assert_grep.inc
--let $assert_text = Receiving IST... 0.0% ( 0/11 events) complete
--let $assert_select = Receiving IST... 0.0% ( 0/11 events) complete
--source include/assert_grep.inc
--let $assert_text = Receiving IST...100.0% (11/11 events) complete
--let $assert_select = Receiving IST...100.0% (11/11 events) complete
--source include/assert_grep.inc

Here is the error log snippet from node 2 when it re-joined the cluster and initiated state transfer.

2018-05-25T17:00:46.908569Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 13)
2018-05-25T17:00:46.908637Z 2 [Note] WSREP: State transfer required:
	Group state: f364a69b-603c-11e8-a632-ce5a4a7d5964:13
	Local state: f364a69b-603c-11e8-a632-ce5a4a7d5964:2
2018-05-25T17:00:46.908673Z 2 [Note] WSREP: New cluster view: global state: f364a69b-603c-11e8-a632-ce5a4a7d5964:13, view# 4: Primary, number of nodes: 2, my index: 1, protocol version 3
2018-05-25T17:00:46.908694Z 2 [Note] WSREP: Setting wsrep_ready to true
2018-05-25T17:00:46.908717Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-05-25T17:00:46.908737Z 2 [Note] WSREP: Setting wsrep_ready to false
2018-05-25T17:00:46.908757Z 2 [Note] WSREP: You have configured 'xtrabackup-v2' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
2018-05-25T17:00:46.908777Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-05-25T17:00:46.908799Z 2 [Note] WSREP: REPL Protocols: 7 (3, 2)
2018-05-25T17:00:46.908831Z 2 [Note] WSREP: Assign initial position for certification: 13, protocol version: 3
2018-05-25T17:00:46.908886Z 0 [Note] WSREP: Service thread queue flushed.
2018-05-25T17:00:46.908934Z 2 [Note] WSREP: Check if state gap can be serviced using IST
2018-05-25T17:00:46.909062Z 2 [Note] WSREP: IST receiver addr using tcp://127.0.0.1:13006
2018-05-25T17:00:46.909232Z 2 [Note] WSREP: Prepared IST receiver, listening at: tcp://127.0.0.1:13006
2018-05-25T17:00:46.909267Z 2 [Note] WSREP: State gap can be likely serviced using IST. SST request though present would be void.
2018-05-25T17:00:46.909489Z 0 [Note] WSREP: Member 1.0 (Thinkie) requested state transfer from '*any*'. Selected 0.0 (Thinkie)(SYNCED) as donor.
2018-05-25T17:00:46.909513Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 13)
2018-05-25T17:00:46.909557Z 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2018-05-25T17:00:46.909602Z 2 [Note] WSREP: GCache history reset: f364a69b-603c-11e8-a632-ce5a4a7d5964:2 -> f364a69b-603c-11e8-a632-ce5a4a7d5964:13
2018-05-25T17:00:46.910221Z 0 [Note] WSREP: 0.0 (Thinkie): State transfer to 1.0 (Thinkie) complete.
2018-05-25T17:00:46.910422Z 0 [Note] WSREP: Member 0.0 (Thinkie) synced with group.
2018-05-25T17:00:47.006802Z 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset
2018-05-25T17:00:47.106423Z 2 [Note] WSREP: Receiving IST: 11 writesets, seqnos 2-13
2018-05-25T17:00:47.106764Z 0 [Note] WSREP: Receiving IST...  0.0% ( 0/11 events) complete.
2018-05-25T17:00:47.109740Z 0 [Note] WSREP: Receiving IST...100.0% (11/11 events) complete.
2018-05-25T17:00:47.110029Z 2 [Note] WSREP: IST received: f364a69b-603c-11e8-a632-ce5a4a7d5964:13
2018-05-25T17:00:47.110433Z 0 [Note] WSREP: 1.0 (Thinkie): State transfer from 0.0 (Thinkie) complete.
2018-05-25T17:00:47.110480Z 0 [Note] WSREP: SST leaving flow control
2018-05-25T17:00:47.110509Z 0 [Note] WSREP: Shifting JOINER -> JOINED (TO: 13)
2018-05-25T17:00:47.110778Z 0 [Note] WSREP: Member 1.0 (Thinkie) synced with group.
2018-05-25T17:00:47.110830Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 13)
2018-05-25T17:00:47.110890Z 2 [Note] WSREP: Synchronized with group, ready for connections

If you want to write your own tests for IST and SST operations you can use existing test cases as a baseline. You are not required to use grep, and can explore your own scenarios. The important parts of the code are:

  • The variable WSREP_PROVIDER must be set before the test run
  • The test should be either in the galera suite or, if you choose to use your own suite, you must copy the definitions from the galera suite default configuration file
  • The test should include the file include/galera_cluster.inc
  • To isolate the node from the cluster, run the following code:

# Isolate node #2
--connection node_2
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 1';
--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc
--connection node_2
SET SESSION wsrep_on = OFF;
--let $wait_condition = SELECT VARIABLE_VALUE = 'non-Primary' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status';
--source include/wait_condition.inc
SET SESSION wsrep_on = ON;

Replace the node numbers if needed.

To bring the node back to the cluster run the following code:

# Restore node #2, IST is performed
--connection node_2
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 0';
--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 2 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc
--connection node_2
--let $wait_condition = SELECT VARIABLE_VALUE = 'Primary' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_status';
--source include/wait_condition.inc

Depending on the size of the updates and gcache, you can test either IST or SST in this way.

The post MySQL Test Framework for Percona XtraDB Cluster appeared first on Percona Database Performance Blog.

by Sveta Smirnova at May 30, 2018 10:43 AM

May 29, 2018

Peter Zaitsev

Deploying PMM on DigitalOcean

It’s very easy to install Percona Monitoring and Management (PMM) on DigitalOcean. If you’ve never used DigitalOcean before, you will find that it is user-friendly and not very expensive. For $5/month you can easily host your PMM on it, letting you monitor your simple infrastructure or try out PMM before implementing it to monitor your production environments.

Let’s prepare the DigitalOcean instance

Log in to DigitalOcean (DO) control panel and click “Create Droplet.”

Log in to DigitalOcean panel and click "Create Droplet."

Thanks to DO you can skip the boring OS setup and save time by using the Docker “One click app” in DO and the Docker image from PMM.

Create Droplet on DigitalOcean

Note: After clicking on “Docker…” choose an instance size that accommodates your budget – PMM can run on as little as the 1GB 1vCPU instance!

Choose Droplet Size

Note: Scroll again!

Next step – select a nearby region

Since the next Percona Live Europe 2018 will be in Frankfurt (https://www.percona.com/blog/2018/04/05/percona-live-europe-2018-save-the-date/), for me the location choice is obvious.

Choose DigitalOcean datacenter region

The final step in this section is ‘Set Hostname’

I recommend you add ‘pmm-server-‘ at the beginning so that you can easily find it in your control panel. The name in my case is ‘pmm-server-docker-s-1vcpu-1gb-fra1-01’ and I’ll use it later in this tutorial.

Finalize and create Droplet hostname

Click “Create” and wait a while. You can follow the process on the dashboard:

Creating the instance of DigitalOcean Droplet

When the Droplet is created, you’ll get an email with your login details.

The next step is ‘Set up PMM on the Droplet’

SSH to the server, change the password, and let’s prepare to install the PMM server.

==================
random@random-vb:~$ ssh root@X.X.X.X
...
"ufw" has been enabled. All ports except 22 (SSH), 80 (http) and 443 (https)
have been blocked by default.
...
Changing password for root.
(current) UNIX password:
Enter new UNIX password:
Retype new UNIX password:
root@pmm-server-docker-s-1vcpu-1gb-fra1-01:~#
====================

Note the output for the first login. You are getting Ubuntu 16.04 with pre-installed Docker.

The instructions for installing PMM are very simple. You can read them at https://www.percona.com/doc/percona-monitoring-and-management/deploy/server/docker.html

1) Pull the latest version from Docker Hub:

docker pull percona/pmm-server:latest

Wait for some time (this depends on your internet connection)

2) Create a container for persistent PMM data

docker create \
   -v /opt/prometheus/data \
   -v /opt/consul-data \
   -v /var/lib/mysql \
   -v /var/lib/grafana \
   --name pmm-data \
   percona/pmm-server:latest /bin/true

3) Create and launch PMM Server in one command

docker run -d \
   -p 80:80 \
   --volumes-from pmm-data \
   --name pmm-server \
   --restart always \
   percona/pmm-server:latest

Just to confirm that your containers are available, go ahead and run “docker ps.” You’ll see something like this:

root@pmm-server-docker-s-1vcpu-1gb-fra1-01:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5513858041f7 percona/pmm-server:latest "/opt/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:80->80/tcp, 443/tcp pmm-server

That’s all! Congratulations! Your PMM server is running.

If you open the IP of your server in the browser, you’ll see something like this:

PMM running in DigitalOcean Droplet instance

There you can see that PMM has already started monitoring itself.

Now you need to install PMM client on your database server and configure it, instructions for this are at https://www.percona.com/doc/percona-monitoring-and-management/deploy/client/index.html

Please note, if you also use DO for the database server and connect to it by external IP, you’ll probably face “the firewall problem.” In this case, you need to open ports using the “ufw” tool (see the welcome message from DigitalOcean). For testing purposes, you can use

ufw allow 42000:42999/tcp

To open only pmm-client related ports, follow https://www.percona.com/doc/percona-monitoring-and-management/glossary.terminology.html#term-ports. To run ufw, you need to use the terminal, and you can find more information about ufw at https://www.digitalocean.com/community/tutorials/ufw-essentials-common-firewall-rules-and-commands. Once you have opened up the ports, PMM should work correctly for this setup.

Final recommendation: depending on your load, you may need to monitor your System Overview dashboard, which you’ll find at http://X.X.X.X/graph/somesymbols/system-overview

If you are out of space, upgrade your DO Droplet.

The post Deploying PMM on DigitalOcean appeared first on Percona Database Performance Blog.

by Roma Novikov at May 29, 2018 12:22 PM

MariaDB AB

Simplify User Account Management with MariaDB MaxScale 2.2 and MariaDB Server 10.3

Configuring database user accounts for MariaDB MaxScale and a backend cluster has typically required a duplicate effort. This is because an account entry must exist for both the real client host and the MaxScale host. MaxScale authenticates incoming users against the user entry with the real client host. When MaxScale creates the routing session, it uses the client’s username and password scramble to authenticate the client to the backend. The backend sees the connection coming from the machine running MaxScale. Unless the host name uses wildcards (low security), entries for both hosts are required. If user accounts are often modified, this duplication may get cumbersome and lead to errors.

MariaDB Server 10.3 adds support for the proxy protocol, which allows a connection to self-designate its host. The protocol states that when a connection has been established, the client should first (before responding to the MySQL handshake) send a proxy protocol header. This header contains the hostname the server should treat the connection as originating from, instead of the real hostname. For security reasons, proxy headers are only allowed from addresses listed in the server variable “proxy_protocol_networks”. The feature thus allows selected IP addresses to act as proxies without having actual user accounts on the database backend. As an example, the header “PROXY TCP4 192.168.0.1 192.168.0.2 56324 443” instructs the server to authenticate the client as if the client was connecting from 192.168.0.1.

This feature can be used to simplify user account management when using MaxScale 2.2 and MariaDB Server 10.3. To enable the feature in MaxScale, add the line “proxy_protocol=on” to a server definition in your MaxScale configuration file (typically this should be added to all server sections).

An example of a MaxScale server definition:

[MyServer1]
type=server
address=123.456.789.0
port=3306
protocol=MariaDBBackend
proxy_protocol=yes

When MaxScale attempts to create a client session on the server, MaxScale first sends a proxy header with the original hostname of the client. If the MaxScale IP is found within the “proxy_protocol_networks” of the server, the header is read and the connection authenticated using the real client address. For the server setting, see server documentation for more information.

Assuming MaxScale IP is “111.222.333.4”, add the following to the [mysqld]-section of the server configuration:

proxy_protocol_networks = 111.222.333.4

With these settings, an incoming client “normal_user” does not need to have an entry for host “111.222.333.4” in the mysql.user table. Only an entry for the real client host is required.
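
As a small sketch of what that single entry might look like (the database name app_db, the 192.168.0.% application subnet and the password are hypothetical placeholders):

-- One account for the real client hosts is enough;
-- no duplicate 'normal_user'@'111.222.333.4' entry for the MaxScale host is needed.
CREATE USER 'normal_user'@'192.168.0.%' IDENTIFIED BY 'change_me';
GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'normal_user'@'192.168.0.%';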

Get MariaDB MaxScale 2.2 and MariaDB Server 10.3 as part of MariaDB TX 3.0, available for download now.

by Esa Korhonen at May 29, 2018 12:07 PM

May 28, 2018

Valeriy Kravchuk

Fun with Bugs #68 - On MySQL Bug Reports I am Subscribed to, Part VII

Last time I reviewed my recent subscriptions to MySQL bugs it was April 1, 2018. I was busy working, blogging about other bugs, running random MTR tests on MySQL 8.0.11, and got two weeks of vacation since then. Now it's time to briefly review 20 recent bug reports (mostly for MySQL 8.0.11) I was interested in.

As usual, I start with most recent bug reports:
  • Bug #91010 - "WolfSSL build broken due to cmake typo". MySQL 8 can be compiled using wolfSSL according to the manual, but as Laurynas Biveinis found this is not the case in practice, and not only because of the typo in libutils.cmake. It seems nobody tried to test this kind of build recently. I wonder what else they had not tested in a hurry to release nice MySQL 8.0.11 GA... 
  • Bug #91009 - "Incorrect usage of std::atomic::compare_exchange_weak". My dear friend Sinisa Milivojevic verified this report by Alexey Kopytov as a feature request. I think it's still a bug, even if it does not have any visible effect on processors currently supported. Let's see what may happen with this report next.
  • Bug #90968 - "Several incorrect function type indirect call UBSan errors". It seems Laurynas Biveinis found yet another kind of testing that Oracle decided not to bother much with while working on MySQL 8 GA release. More test runs with UBSan are needed while working on MySQL 8.0.12.
  • Bug #90959 - "XA transactions can lock forever if a gap lock is also taken on the slave". This bug was reported by Andreas Wederbrand, and an additional test case by Sveta Smirnova shows how serious it might be.
  • Bug #90890 - "CPU overhead for insert benchmark load increased by 2X in 8.0.11". A serious problem for the single-threaded case was reported by Mark Callaghan. The single-thread insert rate continues to drop from 5.6 to 5.7 to 8.0.11.
  • Bug #90847 - "Query returns wrong data if order by is present". Wrong results bugs are the worst, probably, as it may be hard to notice the problem until it's too late... This bug was reported by Vincenzo Antolini.
  • Bug #90794 - "GR 5.7.22 not compatible with earlier versions due to lower_case_table_names". I can afford not to care at all about group replication at the moment, but this regression bug noted by Kenny Gryp may affect many less lucky people during upgrade in production.
  • Bug #90670 - "InnoDB assertion failure in log0write.cc time_elapsed >= 0". I do not see any public attempts to process this bug reported by Mark Callaghan. It may not be easy to repeat, but Mark's idea of putting more useful information in the assert message is great anyway.
  • Bug #90643 - "use different mutex to protect trx_sys->serialisation_list". Nice feature request from Zhai Weixiang.
  • Bug #90617 - "using gdb to attach mysqld will shutdown the instance". I cannot reproduce this bug with binaries I've built from source, but it would be a really awful bug if it happens with Oracle binaries. I am surprised that this bug report by Zhai Weixiang is still "Open" and has not received proper attention from Oracle engineers for more than a month...
  • Bug #90579 - "please document how to configure the dragnet 8.0 logging non-interactively". Unlike Simon Mudd, I have not yet started to read the manual for the new 8.0 features carefully. It may become a source of dozens of additional bug reports if it's of the same quality as, say, MySQL 5.6's manual at the moment of GA. We shall see.
  • Bug #90571 - "Don't generate warnings about successful actions (dragnet filter rules)". MySQL 8 reduced the number of messages in the error log and allows the content to be controlled better, but as Simon Mudd reasonably noted, adding filters successfully should not generate warnings by itself.
  • Bug #90554 - "Undesired change for Windows users in 8.0". As Peter Laursen noted, the idea of disabling network connections if the server is started with --skip-grant-tables may look good from a security point of view, but this unexpected change may leave Windows users (who had not configured any other connectivity options) without a known way to set/reset passwords.
  • Bug #90534 - "InnoDB Cluster members stop by failed start of Group Replication". This bug was reported by Ryusuke Kajiyama. Based on recent comments, this may happen not only on macOS Sierra version 10.12.6 (as it was stated initially).
  • Bug #90484 - "No (easy) way to know if a GR node is writeable or not". Nice feature request from Kenny Gryp to
    "Make it possible to determine _easily_ if a node is part of primary partition AND which node can accept writes."
  • Bug #90448 - "INSERT IGNORE not ignored on SQL thread". It happens for a table with a partition missing for the date, but it is still unexpected. Fortunately this bug reported by Simon Mudd is NOT repeatable on MySQL 8.0.11 GA (it seems to affect only 5.7.x).
  • Bug #90351 - "GLOBAL STATUS variables drift after rollback". As it was noted by Iwo P, the value of Handler_rollback may decrease in some cases. There is a patch from Zsolt Parragi in this bug report.
  • Bug #90331 - "Server does not raise error on unrecognized collation id". This really serious issue was reported by Manuel Ung. Until older server versions start to produce an error when asked for a collation they do not support, it may not be safe to use 8.0.x clients with them.
  • Bug #90307 - "host blocking limit seems not to be used correctly". We all know that Performance Schema is near perfect. But Simon Mudd still found a bug in it that probably affects MySQL 8.0.11 as well.
  • Bug #90291 - "load_file() will not raise an error if secure_file_priv option was not set". This bug was reported by Shahriyar Rzayev from Percona. It is not clear if it was checked on MySQL 8.0.x.
That's all for now. Stay tuned! I plan to write a few more posts about bugs related to partitioning and InnoDB data compression soon.

by Valeriy Kravchuk (noreply@blogger.com) at May 28, 2018 08:18 AM

May 27, 2018

MariaDB Foundation

MariaDB 10.3.7 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.3.7, the first stable release in the MariaDB 10.3 series. See the release notes and changelogs for details. Download MariaDB 10.3.7 Release Notes Changelog What is MariaDB 10.3? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.3.7 Aleksey Midenkov (Tempesta) Alexander Barkov […]

The post MariaDB 10.3.7 now available appeared first on MariaDB.org.

by Ian Gilfillan at May 27, 2018 07:43 PM

May 25, 2018

MariaDB AB

What's New in MariaDB Server 10.3


We are happy to announce the general availability (GA) of MariaDB Server 10.3! This release is a big milestone for the development of MariaDB Server and is the result of a huge effort by the development team and contributors – thanks to everyone involved! With our previous major release of MariaDB Server 10.2 last year, we started a journey of adding more enterprise-grade features to better close the gap with proprietary databases. With MariaDB Server 10.3 we take a huge leap on that journey by being the first enterprise open source database to add features like temporal data processing (through system versioning) as well as compatibility with Oracle sequences and Oracle PL/SQL. At the same time, we want to stay true to our open source and innovative roots by adding support for new storage engines to be able to more easily adapt to different workloads and different hardware available to users. This path allows us to adapt quickly to an ever-changing landscape where new innovations are being created at a constantly accelerated pace. This is our greatest release yet and, with this release, we want to put our stake in the Enterprise database category.

The key enhancements of MariaDB Server 10.3 can be put in the following categories:

  • Temporal data processing (system-versioned tables)

  • Oracle compatibility features

  • Purpose-built storage engines

 

Temporal Data Processing

Temporal data processing through our system versioning feature is one of the more exciting additions in the MariaDB Server 10.3 release. With system versioning, the database will keep track of all changes made to every row in the table. The old versions of the rows are not visible through normal query syntax, but by using a special syntax you can access all of the old versions of the row. This capability lends itself to a large number of use cases, anything from auditing and forensics (finding the exact point-in-time result set from a suspicious query executed some time ago) to things like analyzing changes in your data, comparing customer preferences year to year and a multitude of other possibilities. This feature can be turned on per table and the history can also be deleted periodically so that your table doesn’t grow indefinitely. The use cases are both exciting and endless! For more information on system versioning read our manual or this blog on automatic data versioning.
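
As a small illustrative sketch (the table name is hypothetical, not from the release notes), versioning can also be enabled on an existing table and old history purged periodically:

ALTER TABLE Orders ADD SYSTEM VERSIONING;

-- Later, purge row versions that ended before a given point in time
DELETE HISTORY FROM Orders BEFORE SYSTEM_TIME '2018-01-01 00:00:00';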

 

Oracle Compatibility

As the demand for MariaDB Server has increased in larger enterprises we have also seen a need for features that are readily available in proprietary databases. In order for MariaDB to be easier to use for DBAs and skilled database engineers from other products, we wanted to add as much compatibility as possible.

In MariaDB Server 10.3, we added a new stored routine syntax in addition to the already existing MariaDB SQL/PSM syntax. We now support MariaDB SQL/PL, a syntax designed to be compatible with Oracle PL/SQL. This makes migrating existing applications a lot easier, and existing skills can be used without complex retraining. In the process we also added several new constructs to our stored procedure support, such as the new ROW data type.
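
As a minimal sketch of the ROW data type (the tables and columns here are hypothetical, purely for illustration):

DELIMITER //
CREATE PROCEDURE archive_customer(IN p_id INT)
BEGIN
  -- A ROW variable groups several scalar values, similar to a PL/SQL record
  DECLARE r ROW (id INT, name VARCHAR(100));
  SELECT customer_id, customer_name INTO r FROM customers WHERE customer_id = p_id;
  INSERT INTO customers_archive (customer_id, customer_name) VALUES (r.id, r.name);
END //
DELIMITER ;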

The new syntax isn’t the only new compatibility feature, we also added sequences in order to have a more flexible way of creating unique primary keys than the already existing auto_increment feature. This feature is fully compatible with Oracle sequences. Learn more about how to use sequences in this blog post. Together with features added previously (like window functions, common table expressions (CTEs), etc.) we now have a deep set of enterprise-grade features that can tackle any type of application need.

 

Purpose-Built Storage Engines

At MariaDB, we believe in using the right tool for the right trade. However, we don’t feel that you need to change everything in order to achieve that. We have a unique architecture with pluggable storage engines that allows the user to adapt the database to the use case and workload without changing the main characteristics and features. We believe that this flexibility serves the best interest of the user and we will work on further advancing this with future versions of MariaDB. This architecture will enable both the community and our team to innovate further by adding storage engines designed for new hardware and new use cases. In MariaDB Server 10.3, we introduce two new storage engines that are declared stable, MyRocks and Spider.

MyRocks comes from a collaboration with Facebook where the storage engine is built on top of RocksDB – an open source project mainly maintained by Facebook. The MyRocks storage engine is built using a log-structured merge tree (LSM tree) architecture and is well adapted to high write workloads. MyRocks also has a very high compression ratio and is built to optimize the lifetime of SSD disks.

Spider is a storage engine designed for extreme scale. The Spider storage engine allows you to shard a specific table across multiple nodes. It uses the partitioning protocol to define how the table should be split up and each individual shard will then reside on a remote MariaDB Server that will only handle queries for that particular shard. With Spider you get almost linear scaling for INSERTS and key lookup read queries.
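
For example (a sketch only; the table is hypothetical), MyRocks is enabled and used like any other engine, while Spider additionally needs CREATE SERVER definitions and a partitioning clause mapping shards to remote servers:

INSTALL SONAME 'ha_rocksdb';

CREATE TABLE click_log (
  id BIGINT NOT NULL AUTO_INCREMENT,
  url VARCHAR(255) NOT NULL,
  clicked_at DATETIME NOT NULL,
  PRIMARY KEY (id)
) ENGINE=ROCKSDB;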

 

And there’s more ...

In addition to this, we have added a multitude of features to help speed up schema operations (like instant ADD COLUMN) and other optimizations and compatibility features. The ADD COLUMN feature is another example of our collaboration with customers and partners including Alibaba, Tencent and ServiceNow, and is just the beginning of making heavy DDL operations more usable.

Want all the details? Get a full list of features in MariaDB Server 10.3.

Get MariaDB Server 10.3 as part of the MariaDB TX 3.0 download – now available.


by maxmether at May 25, 2018 10:52 PM

Automatic Data Versioning in MariaDB Server 10.3


MariaDB Server 10.3 comes with a new, very useful feature that will ease the design of many applications. Data versioning is important from several perspectives. Compliance might require that you store data changes. For analytical queries, you may want to look at the data as it was at a specific point in time, and for auditing purposes it is important to know what changes were made and when. Also, if data is deleted, it can be of great value to be able to recover it from history. MariaDB Server now includes a feature named System-Versioned Tables, which is based on the specification in the SQL:2011 standard. It provides automatic versioning of table data.

I’ll walk through the concept of System-Versioned Tables with a very simple example, which will show you what it is all about. Let’s start by creating a database and a table.

CREATE DATABASE Company; 

CREATE TABLE Person (
  Id int(11) NOT NULL AUTO_INCREMENT,
  FirstName varchar(50) NOT NULL,
  LastName varchar(50) NOT NULL,
  Gender char(1) NOT NULL,
  DepartmentId int(11) NOT NULL,
  PRIMARY KEY (Id),
  CONSTRAINT con_gender CHECK (Gender in ('f','m')))
WITH SYSTEM VERSIONING;

It looks exactly like an ordinary table definition except for the last table option, WITH SYSTEM VERSIONING, which turns on automatic versioning for the table. Let’s see how it works by inserting a row into the table.

MariaDB [Company]> INSERT INTO Person (FirstName, LastName, Gender, DepartmentId) VALUES ('Rasmus', 'Johansson', 'm', 1);
Query OK, 1 row affected (0.002 sec)

MariaDB [Company]> SELECT * FROM Person;
+----+-----------+-----------+--------+--------------+
| Id | FirstName | LastName  | Gender | DepartmentId |
+----+-----------+-----------+--------+--------------+
|  1 | Rasmus    | Johansson | m      |            1 |
+----+-----------+-----------+--------+--------------+
1 row in set (0.001 sec)

There we have me as one row in a table. The interesting part starts when we update rows. I’ll change departments a couple of times.

MariaDB [Company]> UPDATE Person SET DepartmentId = 2 WHERE Id = 1;
Query OK, 1 row affected (0.002 sec)
Rows matched: 1  Changed: 1  Inserted: 1  Warnings: 0

MariaDB [Company]> SELECT * FROM Person;
+----+-----------+-----------+--------+--------------+
| Id | FirstName | LastName  | Gender | DepartmentId |
+----+-----------+-----------+--------+--------------+
|  1 | Rasmus    | Johansson | m      |            2 |
+----+-----------+-----------+--------+--------------+
1 row in set (0.001 sec)

MariaDB [Company]> UPDATE Person SET DepartmentId = 3 WHERE Id = 1;
Query OK, 1 row affected (0.003 sec)
Rows matched: 1  Changed: 1  Inserted: 1  Warnings: 0

MariaDB [Company]> SELECT * FROM Person;
+----+-----------+-----------+--------+--------------+
| Id | FirstName | LastName  | Gender | DepartmentId |
+----+-----------+-----------+--------+--------------+
|  1 | Rasmus    | Johansson | m      |            3 |
+----+-----------+-----------+--------+--------------+
1 row in set (0.001 sec)

As you can see, MariaDB Server reports 1 changed row for each update as usual, but also 1 inserted row, which wouldn't be the case for a table without versioning. Each update creates a new version of the row, which has to be inserted into the table. As you can also see above, a normal SELECT will only show the latest version. To see all versions of the rows, MariaDB Server provides the following syntax.

MariaDB [Company]> SELECT * FROM Person FOR SYSTEM_TIME ALL;
+----+-----------+-----------+--------+--------------+
| Id | FirstName | LastName  | Gender | DepartmentId |
+----+-----------+-----------+--------+--------------+
|  1 | Rasmus    | Johansson | m      |            1 |
|  1 | Rasmus    | Johansson | m      |            2 |
|  1 | Rasmus    | Johansson | m      |            3 |
+----+-----------+-----------+--------+--------------+
3 rows in set (0.001 sec)

To see when the rows have been updated, we can include two invisible columns that are created by the automatic versioning. Invisible Columns are another exciting new feature of MariaDB Server 10.3. The invisible columns of automatic versioning are ROW_START and ROW_END. They define the time period for which the version of the row was/is valid.

MariaDB [Company]> SELECT *, ROW_START, ROW_END FROM Person FOR SYSTEM_TIME ALL;
+----+-----------+-----------+--------+--------------+----------------------------+----------------------------+
| Id | FirstName | LastName  | Gender | DepartmentId | ROW_START                  | ROW_END                    |
+----+-----------+-----------+--------+--------------+----------------------------+----------------------------+
|  1 | Rasmus    | Johansson | m      |            1 | 2018-05-03 07:21:12.386980 | 2018-05-03 07:22:29.188266 |
|  1 | Rasmus    | Johansson | m      |            2 | 2018-05-03 07:22:29.188266 | 2018-05-03 07:22:47.596481 |
|  1 | Rasmus    | Johansson | m      |            3 | 2018-05-03 07:22:47.596481 | 2038-01-19 03:14:07.999999 |
+----+-----------+-----------+--------+--------------+----------------------------+----------------------------+
3 rows in set (0.000 sec)

The interesting thing now is to run point-in-time queries to retrieve exactly what the table looked like at a specific date and time. We can do this by using the AS OF syntax:

MariaDB [Company]> SELECT * FROM Person FOR SYSTEM_TIME AS OF TIMESTAMP '2018-05-03 07:22:33';
+----+-----------+-----------+--------+--------------+
| Id | FirstName | LastName  | Gender | DepartmentId |
+----+-----------+-----------+--------+--------------+
|  1 | Rasmus    | Johansson | m      |            2 |
+----+-----------+-----------+--------+--------------+
1 row in set (0.001 sec)

This was just a little glimpse into System-Versioned Tables. In addition to the examples shown above you can place history on separate partitions for performance reasons, exclude columns from versioning and much more.
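
For instance, a hedged sketch (not from the original post) that keeps history on its own partition and excludes one column from versioning:

CREATE TABLE PersonPartitioned (
  Id int(11) NOT NULL AUTO_INCREMENT,
  FirstName varchar(50) NOT NULL,
  LastLogin datetime WITHOUT SYSTEM VERSIONING,  -- no history kept for this column
  PRIMARY KEY (Id))
WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME (
  PARTITION p_hist HISTORY,
  PARTITION p_cur CURRENT);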

Read more about System-Versioned Tables in the MariaDB documentation. Get MariaDB Server 10.3 as part of the MariaDB TX 3.0 download – now available.


Anel Husakovic

Thu, 06/21/2018 - 11:55

Obtaining the ROW_END/timestamp

Hi,
thank you for the article. It is interesting.
How is ROW_END obtained for the last record? In the example it is 2038-01-19, why ?

And could it happen that user insert timestamp (your example AS OF TIMESTAMP) that is half of timestamps of two subsequent rows (mean value) ? If yes, what the result of your query would be ?


by rasmusjohansson at May 25, 2018 10:11 PM

Peter Zaitsev

Webinar Tues, 5/29: MySQL, Percona XtraDB Cluster, ProxySQL, Kubernetes: How they work together

Please join Percona’s Principal Architect Alex Rubin as he presents MySQL, Percona XtraDB Cluster, ProxySQL, Kubernetes: How they work together to give you a highly available cluster database environment on Tuesday, May 29th at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

In this webinar, Alex will discuss how to deploy a highly available MySQL database environment on Kubernetes/Openshift using Percona XtraDB Cluster (PXC) together with MySQL Proxy to implement read/write splitting.

If you have never used Kubernetes and Openshift, or never used PXC / MySQL Proxy, Alex will do a quick introduction to these technologies. There will also be a demo where Alex sets up a PXC cluster with ProxySQL in Openshift Origin and tries to break it.

By the end of this webinar you will have a better understanding of:

  • How to deploy Percona XtraDB Cluster with ProxySQL for HA solutions
  • How to leverage Kubernetes/Openshift in your environments
  • How to troubleshoot performance issues

Register for the webinar

Alexander Rubin, Principal Consultant

Alexander joined Percona in 2013. He has worked with MySQL since 2000 as a DBA and application developer. Before joining Percona he did MySQL consulting as a principal consultant for over 7 years (starting with MySQL AB in 2006, then Sun Microsystems and then Oracle). He has helped many customers design large, scalable and highly available MySQL systems and optimize MySQL performance. Alexander has also helped customers design Big Data stores with Apache Hadoop and related technologies.

The post Webinar Tues, 5/29: MySQL, Percona XtraDB Cluster, ProxySQL, Kubernetes: How they work together appeared first on Percona Database Performance Blog.

by Alexander Rubin at May 25, 2018 11:54 AM

MariaDB AB

Sequences Support in MariaDB Server 10.3


Sequences are defined in the SQL:2003 standard. The idea of sequences is to have a way of requesting unique values on demand. The typical use case for sequences is to have a unique identifier that can be used across multiple tables. In addition, it may be useful in some cases to have the identifier before the actual row is inserted. With the usual approach of an automatically incrementing identifier, the value is only available after the row has been inserted, and it is only unique within its own table. The implementation of sequences in MariaDB Server 10.3 follows the standard and includes compatibility with the way Oracle Database does sequences on top of the standard.

To create a sequence, a create statement is used:

CREATE SEQUENCE Seq1_1  
  START WITH 1  
  INCREMENT BY 1;

This creates a sequence that starts at 1 and is incremented by 1 every time a value is requested from the sequence. In this example, both START WITH and INCREMENT BY could have been left out, since their default values are 1. The sequence will be visible among the tables in the database, i.e. if you run SHOW TABLES it will be there. You can use DESCRIBE on the sequence to see what columns it has.

To test out the usage of sequences let’s create a table.

CREATE TABLE Observation (
  Id int(11) NOT NULL,
  Place varchar(50) NOT NULL,
  BirdId int(11) NOT NULL,
  PRIMARY KEY (Id)
);

Since I want to use sequences this time, I did not put AUTO_INCREMENT on the Id column. Instead I’ll ask for the next value from the sequence in the INSERT statements:

INSERT INTO Observation (Id, Place, BirdId) VALUES (NEXT VALUE FOR Seq1_1, 'Helsinki', 10);
INSERT INTO Observation (Id, Place, BirdId) VALUES (NEXT VALUE FOR Seq1_1, 'Espoo', 10);
INSERT INTO Observation (Id, Place, BirdId) VALUES (NEXT VALUE FOR Seq1_1, 'Kirkkonummi', 10);
INSERT INTO Observation (Id, Place, BirdId) VALUES (NEXT VALUE FOR Seq1_1, 'Hanko', 10);

A bird flies west from Helsinki and is observed in cities along the way. In the INSERT statements there is a call to the sequence: NEXT VALUE FOR Seq1_1, which retrieves the next value from the sequence. Instead of having NEXT VALUE FOR in each INSERT statement, it could have been made the default value of the column in this way:

ALTER TABLE Observation MODIFY Id int(11) NOT NULL DEFAULT NEXT VALUE FOR Seq1_1;

Running a SELECT over the Observation table will look like this:

SELECT * FROM Observation;
+----+-------------+--------+
| Id | Place       | BirdId |
+----+-------------+--------+
|  1 | Helsinki    |     10 |
|  2 | Espoo       |     10 |
|  3 | Kirkkonummi |     10 |
|  4 | Hanko       |     10 |
+----+-------------+--------+
4 rows in set (0.001 sec)

As we can see, the Id column has been populated with numbers that start from 1 and are incremented by 1, as defined in the sequence’s CREATE statement. To get the last retrieved number from the sequence, PREVIOUS VALUE is used:

SELECT PREVIOUS VALUE FOR Seq1_1;
+---------------------------+
| PREVIOUS VALUE FOR Seq1_1 |
+---------------------------+
|                         4 |
+---------------------------+
1 row in set (0.001 sec)

Another useful option for sequences is CYCLE, which means that we start from the beginning after reaching a certain value. For example, if there are 4 phases in a process that are done sequentially and then start from the beginning, we could create a sequence to always be able to retrieve the number of the next phase.

CREATE SEQUENCE Seq1_1_c4  
  START WITH 1  
  INCREMENT BY 1
  MAXVALUE = 4
  CYCLE;

The sequence starts at 1 and is incremented by 1 every time the next value is requested. But when it reaches 4 (MAXVALUE), it will restart from 1 (CYCLE).

It’s also possible to set the next value of a sequence, to ALTER a sequence, or to use sequences in Oracle mode with Oracle-specific syntax. To switch to Oracle mode, use:

SET SQL_MODE=ORACLE;

After that you can retrieve the next value of a sequence in Oracle style:

SELECT Seq1_1.nextval;
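
For example (a quick sketch using the sequence created earlier), setting the current value and changing the increment could look like this:

-- Tell the sequence that 100 has already been used; the next value will be above 100
SELECT SETVAL(Seq1_1, 100);

-- Change the step size going forward
ALTER SEQUENCE Seq1_1 INCREMENT BY 2;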

Read more about sequences in the MariaDB documentation. Get MariaDB Server 10.3 as part of the MariaDB TX 3.0 download – now available.


by rasmusjohansson at May 25, 2018 04:33 AM

May 24, 2018

Peter Zaitsev

Using dbdeployer to manage MySQL, Percona Server and MariaDB sandboxes

dbdeployer by Giuseppe Maxia

Some years ago, Peter Z wrote a blogpost about using MySQL Sandbox to deploy multiple server versions. Last February, Giuseppe  introduced us to its successor: dbdeployer. In this blogpost we will demonstrate how to use it. There is a lot of information in Giuseppe’s post, so head there if you want a deeper dive.

First step is to install it, which is really easy to do now since it’s developed in Go, and standalone executables are provided. You can get the latest version here.

shell> wget https://github.com/datacharmer/dbdeployer/releases/download/1.5.0/dbdeployer-1.5.0.linux.tar.gz
shell> tar xzf dbdeployer-1.5.0.linux.tar.gz
shell> mv dbdeployer-1.5.0.linux ~/bin/dbdeployer

If you have your ~/bin/ directory in the path, you should now be able to run dbdeployer commands.


Let’s start with deploying a latest version vanilla MySQL sandbox.

In the Support Team, we extensively use MySQL Sandbox (the predecessor to dbdeployer) to easily run different flavours and versions of MySQL so that we can test with the same versions our customers present us with. We store MySQL binaries in /opt/, so we can all share them and avoid wasting disk space on duplicated binaries.

The first step to using dbdeployer is getting the binary we want to run, and then unpacking it into the binaries directory.

shell> wget https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.11-linux-glibc2.12-x86_64.tar.gz
shell> dbdeployer --sandbox-binary=/opt/mysql/ unpack mysql-8.0.11-linux-glibc2.12-x86_64.tar.gz

This command will extract and move the files to the appropriate directory, which in this case is under /opt/mysql/ as overridden with the --sandbox-binary argument, so we can use them with the deploy command.

Standalone

To create a new standalone MySQL sandbox with the newly extracted binary, we can use the following command.

shell> dbdeployer --sandbox-binary=/opt/mysql/ deploy single 8.0.11
Creating directory /home/vagrant/sandboxes
Database installed in $HOME/sandboxes/msb_8_0_11
run 'dbdeployer usage single' for basic instructions'
.. sandbox server started

You can read the dbdeployer usage output to have even more information on how the tool works. Next, let’s connect to it.

shell> cd sandboxes/msb_8_0_11/
shell> ./use
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 8.0.11 MySQL Community Server - GPL
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql [localhost] {msandbox} ((none)) > select @@version, @@port;
+-----------+--------+
| @@version | @@port |
+-----------+--------+
| 8.0.11    | 8011 |
+-----------+--------+
1 row in set (0.00 sec)

And that was it! When creating the new instance, dbdeployer will try to use the same port as the version numbers concatenated. If that port is in use, it will try another one, or we can manually override it with the --port argument.

Replication

We can also easily setup a replication environment with just one command.

shell> dbdeployer --sandbox-binary=/opt/mariadb/ deploy replication 10.2.15
Installing and starting master
. sandbox server started
Installing and starting slave1
. sandbox server started
Installing and starting slave2
. sandbox server started
$HOME/sandboxes/rsandbox_10_2_15/initialize_slaves
initializing slave 1
initializing slave 2
Replication directory installed in $HOME/sandboxes/rsandbox_10_2_15
run 'dbdeployer usage multiple' for basic instructions'

Again, you should run the recommended command to get more insight into what can be done. We can use the ./m script to connect to the master, and ./s1 to connect to the first slave. The ./use_all* scripts can come in handy to run commands in many servers at a time.

Multiple sandboxes

Finally, we will see how to create multiple sandboxes with the same version at the same time.

shell> dbdeployer --sandbox-binary=/opt/percona_server/ deploy multiple 5.7.21
Installing and starting node 1
. sandbox server started
Installing and starting node 2
. sandbox server started
Installing and starting node 3
. sandbox server started
multiple directory installed in $HOME/sandboxes/multi_msb_5_7_21
run 'dbdeployer usage multiple' for basic instructions'

This could be useful for setting up environments that are not already covered by the tool, like Galera clusters or semi-sync replication. With this approach, we will at least have a base to start from, and then we can use our own custom scripts. dbdeployer now has templates, which would allow extending its functionality to support this, if needed. I have not yet tried to do so, but it sounds like an interesting project for the future! Let me know if you would be interested in reading more about it.

The post Using dbdeployer to manage MySQL, Percona Server and MariaDB sandboxes appeared first on Percona Database Performance Blog.

by Agustín at May 24, 2018 09:56 PM

Setting up PMM on Google Compute Engine in 15 minutes or less

Percona Monitoring and Management on Google Compute Engine

In this blog post, I will show you how easy it is to set up a Percona Monitoring and Management server on Google Compute Engine from the command line.

First off you will need to have a Google account and install the Cloud SDK tool. You need to create a GCP (Google Cloud Platform) project and enable billing to proceed. This blog assumes you are able to authenticate and SSH into instances from the command line.

Here are the steps to install PMM server in Google Cloud Platform.

1) Create the Compute engine instance with the following command. The example creates an Ubuntu Xenial 16.04 LTS compute instance in the us-west1-b zone with a 100GB persistent disk. For production systems it would be best to use a 500GB disk instead (size=500GB). This should be enough for default data retention settings, although your needs may vary.

jerichorivera@percona-support:~/GCE$ gcloud compute instances create pmm-server --tags pmmserver --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud --machine-type n1-standard-4 --zone us-west1-b --create-disk=size=100GB,type=pd-ssd,device-name=sdb --description "PMM Server on GCP" --metadata-from-file startup-script=deploy-pmm-xenial64.sh
Created [https://www.googleapis.com/compute/v1/projects/thematic-acumen-204008/zones/us-west1-b/instances/pmm-server].
NAME        ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP   STATUS
pmm-server  us-west1-b  n1-standard-4               10.138.0.2   35.233.216.225  RUNNING

Notice that we’ve used --metadata-from-file startup-script=deploy-pmm-xenial64.sh. The file has the following contents:

jerichorivera@percona-support:~$ cat GCE/deploy-pmm-xenial64.sh
#!/bin/bash
set -v
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
# Format the persistent disk, mount it then add to /etc/fstab
sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
sudo mkdir -p /mnt/disks/pdssd
sudo mount -o discard,defaults /dev/sdb /mnt/disks/pdssd/
sudo chmod a+w /mnt/disks/pdssd/
sudo cp /etc/fstab /etc/fstab.backup
echo UUID=`sudo blkid -s UUID -o value /dev/sdb` /mnt/disks/pdssd ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab
# Change docker’s root directory before installing Docker
sudo mkdir /etc/systemd/system/docker.service.d/
cat << EOF > /etc/systemd/system/docker.service.d/docker.root.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// -g /mnt/disks/pdssd/docker/
EOF
sudo apt-get install -y docker-ce
# Creates the deploy.sh script
cat << EOF > /tmp/deploy.sh
#!/bin/bash
set -v
docker pull percona/pmm-server:latest
docker create -v /opt/prometheus/data -v /opt/consul-data -v /var/lib/mysql -v /var/lib/grafana --name pmm-data percona/pmm-server:latest /bin/true
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:latest
EOF

This startup script will be executed right after the compute instance is created. The script will format the persistent disk and mount the file system; create a custom Docker unit file for the purpose of changing Docker’s root directory from /var/lib/docker to /mnt/disks/pdssd/docker; install the Docker package; and create the deploy.sh script.

2) Once the compute engine instance is created, SSH into the instance and check that Docker is running and that its root directory points to the desired folder.

jerichorivera@pmm-server:~$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─docker.root.conf
   Active: active (running) since Wed 2018-05-16 12:53:30 UTC; 45s ago
     Docs: https://docs.docker.com
 Main PID: 4744 (dockerd)
   CGroup: /system.slice/docker.service
           ├─4744 /usr/bin/dockerd -H fd:// -g /mnt/disks/pdssd/docker/
           └─4764 docker-containerd --config /var/run/docker/containerd/containerd.toml
May 16 12:53:30 pmm-server dockerd[4744]: time="2018-05-16T12:53:30.391566708Z" level=warning msg="Your kernel does not support swap memory limit"
May 16 12:53:30 pmm-server dockerd[4744]: time="2018-05-16T12:53:30.391638253Z" level=warning msg="Your kernel does not support cgroup rt period"
May 16 12:53:30 pmm-server dockerd[4744]: time="2018-05-16T12:53:30.391680203Z" level=warning msg="Your kernel does not support cgroup rt runtime"
May 16 12:53:30 pmm-server dockerd[4744]: time="2018-05-16T12:53:30.392913043Z" level=info msg="Loading containers: start."
May 16 12:53:30 pmm-server dockerd[4744]: time="2018-05-16T12:53:30.767048674Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
May 16 12:53:30 pmm-server dockerd[4744]: time="2018-05-16T12:53:30.847907241Z" level=info msg="Loading containers: done."
May 16 12:53:30 pmm-server dockerd[4744]: time="2018-05-16T12:53:30.875129963Z" level=info msg="Docker daemon" commit=9ee9f40 graphdriver(s)=overlay2 version=18.03.1-ce
May 16 12:53:30 pmm-server dockerd[4744]: time="2018-05-16T12:53:30.875285809Z" level=info msg="Daemon has completed initialization"
May 16 12:53:30 pmm-server dockerd[4744]: time="2018-05-16T12:53:30.884566419Z" level=info msg="API listen on /var/run/docker.sock"
May 16 12:53:30 pmm-server systemd[1]: Started Docker Application Container Engine.

3) Add your user to the docker group as shown below, and make the deploy.sh script executable.

jerichorivera@pmm-server:~$ sudo usermod -aG docker $USER
jerichorivera@pmm-server:~$ sudo chmod +x /tmp/deploy.sh

4) Log off from the instance, and then log back in and then execute the deploy.sh script.

jerichorivera@pmm-server:~$ cd /tmp/
jerichorivera@pmm-server:/tmp$ ./deploy.sh
docker pull percona/pmm-server:latest
latest: Pulling from percona/pmm-server
697841bfe295: Pull complete
fa45d21b9629: Pull complete
Digest: sha256:98d2717b4f0ae83fbca63330c39590d69a7fca7ae6788f52906253ac75db6838
Status: Downloaded newer image for percona/pmm-server:latest
docker create -v /opt/prometheus/data -v /opt/consul-data -v /var/lib/mysql -v /var/lib/grafana --name pmm-data percona/pmm-server:latest /bin/true
8977102d419cf8955fd8bbd0ed2c663c75a39f9fbc635238d56b480ecca8e749
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:latest
83c2e6db2efc752a6beeff0559b472f012062d3f163c042e5e0d41cda6481d33

5) Finally, create a firewall rule to allow HTTP port 80 to access the PMM Server. For security reasons, we recommend that you secure your PMM server by adding a password, or limit access to it with a stricter firewall rule to specify which IP addresses can access port 80.

jerichorivera@percona-support:~$ gcloud compute firewall-rules create allow-http-pmm-server --allow tcp:80 --target-tags pmmserver --description "Allow HTTP traffic to PMM Server"
Creating firewall...-Created [https://www.googleapis.com/compute/v1/projects/thematic-acumen-204008/global/firewalls/allow-http-pmm-server].
Creating firewall...done.
NAME                   NETWORK  DIRECTION  PRIORITY  ALLOW   DENY
allow-http-pmm-server  default  INGRESS    1000      tcp:80
jerichorivera@percona-support:~/GCE$ gcloud compute firewall-rules list
NAME                    NETWORK  DIRECTION  PRIORITY  ALLOW                         DENY
allow-http-pmm-server   default  INGRESS    1000      tcp:80
default-allow-icmp      default  INGRESS    65534     icmp
default-allow-internal  default  INGRESS    65534     tcp:0-65535,udp:0-65535,icmp
default-allow-rdp       default  INGRESS    65534     tcp:3389
default-allow-ssh       default  INGRESS    65534     tcp:22

At this point you should have a PMM Server in GCP running on a Compute Engine instance.

The next step is to install pmm-client on the database hosts and add services for monitoring.

Here I’ve launched a single standalone Percona Server 5.6 on another Compute Engine instance in the same project (thematic-acumen-204008).

jerichorivera@percona-support:~/GCE$ gcloud compute instances create mysql1 --tags mysql1 --image-family centos-7 --image-project centos-cloud --machine-type n1-standard-2 --zone us-west1-b --create-disk=size=50GB,type=pd-standard,device-name=sdb --description "MySQL1 on GCP" --metadata-from-file startup-script=compute-instance-deploy.sh
Created [https://www.googleapis.com/compute/v1/projects/thematic-acumen-204008/zones/us-west1-b/instances/mysql1].
NAME    ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
mysql1  us-west1-b  n1-standard-2               10.138.0.3   35.233.187.253  RUNNING

I installed Percona Server 5.6 and pmm-client, and then added services. Take note that since the PMM Server and the MySQL server are in the same project and the same VPC network, we can connect directly through the INTERNAL_IP 10.138.0.2; otherwise, use the EXTERNAL_IP 35.233.216.225.

[root@mysql1 jerichorivera]# pmm-admin config --server 10.138.0.2
OK, PMM server is alive.
PMM Server      | 10.138.0.2
Client Name     | mysql1
Client Address  | 10.138.0.3
[root@mysql1 jerichorivera]#
[root@mysql1 jerichorivera]# pmm-admin check-network
PMM Network Status
Server Address | 10.138.0.2
Client Address | 10.138.0.3
* System Time
NTP Server (0.pool.ntp.org)         | 2018-05-22 06:45:47 +0000 UTC
PMM Server                          | 2018-05-22 06:45:47 +0000 GMT
PMM Client                          | 2018-05-22 06:45:47 +0000 UTC
PMM Server Time Drift               | OK
PMM Client Time Drift               | OK
PMM Client to PMM Server Time Drift | OK
* Connection: Client --> Server
-------------------- -------
SERVER SERVICE       STATUS
-------------------- -------
Consul API           OK
Prometheus API       OK
Query Analytics API  OK
Connection duration | 408.185µs
Request duration    | 6.810709ms
Full round trip     | 7.218894ms
No monitoring registered for this node identified as 'mysql1'.
[root@mysql1 jerichorivera]# pmm-admin add mysql --create-user
[linux:metrics] OK, now monitoring this system.
[mysql:metrics] OK, now monitoring MySQL metrics using DSN pmm:***@unix(/mnt/disks/disk1/data/mysql.sock)
[mysql:queries] OK, now monitoring MySQL queries from slowlog using DSN pmm:***@unix(/mnt/disks/disk1/data/mysql.sock)
[root@mysql1 jerichorivera]# pmm-admin list
pmm-admin 1.10.0
PMM Server      | 10.138.0.2
Client Name     | mysql1
Client Address  | 10.138.0.3
Service Manager | linux-systemd
-------------- ------- ----------- -------- ----------------------------------------------- ------------------------------------------
SERVICE TYPE   NAME    LOCAL PORT  RUNNING  DATA SOURCE                                     OPTIONS
-------------- ------- ----------- -------- ----------------------------------------------- ------------------------------------------
mysql:queries  mysql1  -           YES      pmm:***@unix(/mnt/disks/disk1/data/mysql.sock)  query_source=slowlog, query_examples=true
linux:metrics  mysql1  42000       YES      -
mysql:metrics  mysql1  42002       YES      pmm:***@unix(/mnt/disks/disk1/data/mysql.sock)

Lastly, in case you need to delete the PMM Server instance, just execute the delete command below to completely remove the instance and the attached disk. Be aware that you can remove the boot disk and retain the attached persistent disk if you prefer.

jerichorivera@percona-support:~/GCE$ gcloud compute instances delete pmm-server
The following instances will be deleted. Any attached disks configured
 to be auto-deleted will be deleted unless they are attached to any
other instances or the `--keep-disks` flag is given and specifies them
 for keeping. Deleting a disk is irreversible and any data on the disk
 will be lost.
 - [pmm-server] in [us-west1-b]
Do you want to continue (Y/n)?  y
Deleted [https://www.googleapis.com/compute/v1/projects/thematic-acumen-204008/zones/us-west1-b/instances/pmm-server].

The other option is to install PMM on Google Container Engine, which was explained by Manjot Singh in his blog post.

The post Setting up PMM on Google Compute Engine in 15 minutes or less appeared first on Percona Database Performance Blog.

by Jericho Rivera at May 24, 2018 07:32 PM

MariaDB AB

MariaDB TX 3.0 – First to Deliver on the Promise of Enterprise Open Source


It’s one thing to be open source. It’s another to be enterprise open source.

That begs the question: What does it mean to be enterprise open source?

You have to be 100% committed to the open source community – collaboration, transparency and innovation. You have to be 100% committed to customer success – providing the enterprise features and reliability needed to support mission-critical applications.

However, being committed is not enough. You have to be a leader. You have to challenge proprietary vendors, and that includes vendors who limit open source projects with proprietary extensions and/or plugins.

MariaDB TX 3.0 sets the standard for enterprise open source databases, and as the leader, we’re challenging Oracle, Microsoft and IBM with it. Here’s how.

Oracle Database compatibility

MariaDB TX 3.0 is the first enterprise open source database with Oracle Database compatibility, including support for stored procedures written in PL/SQL. Until now, if you needed Oracle Database compatibility, you needed a proprietary database (IBM DB2 or EnterpriseDB). Today, you can run those Oracle PL/SQL stored procedures on MariaDB TX!

Temporal features

MariaDB TX 3.0 is the first enterprise open source database with temporal features, including built-in system-versioned tables and standard temporal query syntax. Until now, if you needed the functional equivalent of Oracle Flashback queries or Microsoft SQL Server temporal tables, you needed a proprietary database. Today, you can run those temporal queries on MariaDB TX.

Faster schema changes

MariaDB TX 3.0 is the first enterprise open source database to support invisible columns (like Oracle Database), compressed columns and the ability to add columns (with or without default values) to a table without causing all of the rows to be updated (i.e., a table rebuild) – something you can’t do in MySQL or Postgres. Simply said, life is easier with MariaDB TX.
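
As a rough illustration (table and column names are hypothetical, not taken from the product documentation), these schema changes look like this in SQL:

-- Add a column without rebuilding the table (instant in MariaDB 10.3 when added last)
ALTER TABLE orders ADD COLUMN note VARCHAR(100) DEFAULT '';

-- Invisible column: not returned by SELECT * and optional in INSERT statements
ALTER TABLE orders ADD COLUMN audit_token BIGINT INVISIBLE;

-- Compressed column: transparently compressed by the server
ALTER TABLE orders MODIFY payload BLOB COMPRESSED;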

Purpose-built storage

MariaDB TX 3.0 is the first enterprise open source database to support a variety of workloads, all with the same level of performance, by leveraging multiple, purpose-built storage engines: the default storage engine for mixed or read-mostly workloads (InnoDB), an SSD-optimized storage engine for write-intensive workloads (MyRocks) and a distributed storage engine for workloads requiring extreme scalability and/or concurrency (Spider).

While general-purpose databases are limited to supporting one workload really well, MariaDB TX can support a variety of workloads very well – and at the same time. Would you need a NoSQL database if your relational database supported JSON and distributed storage (i.e., scale out)?

You could deploy multiple specialized databases, but wouldn’t you rather standardize on a single database? Well, you can with MariaDB TX.

Data protection

MariaDB TX 3.0 is the first enterprise open source database to support anonymization via complete data obfuscation and pseudonymization via full or partial data masking, features you will need if you want to comply with the EU GDPR and don’t want your company featured in tomorrow's headlines as the security breach of the month. If you’re using Oracle Database, these features are part of Oracle Data Redaction and require Oracle Advanced Security, an extra $7,500 per core. MariaDB TX database administrators sleep well at night.

Conclusion

We created MariaDB TX 3.0 so you can migrate from Oracle/Microsoft/IBM to the enterprise open source database you want without sacrificing the enterprise features you need. Ready?


by Shane Johnson at May 24, 2018 07:28 PM

The Ease of Migrating to MariaDB TX Using New Oracle Compatibility Features


One of the central themes of MariaDB TX 3.0 is reducing the cost, reusing established skill sets and increasing the efficiency, speed and technical viability of migrations from complex, proprietary databases. We’ve added some amazing new features including SEQUENCE constructs, Oracle-style packages, and the ROW data type - making migrations dramatically easier and a true game changer for enterprise open source databases.

With SQL_MODE = ORACLE, MariaDB is now able to parse, depending on the case, approximately 80 percent of legacy Oracle PL/SQL without rewriting the code. Using the core Oracle PL/SQL compatibility in MariaDB TX 3.0, the Development Bank of Singapore (DBS) has been able to migrate more than half of their business-critical applications from Oracle Database to MariaDB in just 12 months. In fact, vast parts of your Oracle PL/SQL code can be migrated seamlessly (see the sketch after the list below), including:

  • Stored Procedure Parameters

  • Non-ANSI Stored Procedure Construct

  • Cursor Syntax

  • Loop Syntax

  • Variable Declaration

  • Data Type inheritance (%TYPE, %ROWTYPE)

  • PL/SQL style Exceptions

  • Synonyms for Basic SQL Types (VARCHAR2, NUMBER, …)


MariaDB Red Rover Migration Practice

We’re here to help with your database migration no matter the challenge. Our migration practice has created and maintained a number of best practices and technical solutions. In order to help customers save costs and untangle complexity, we aim at enabling database teams as quickly as possible. For critical migration steps such as planning, quality assurance, and conducting switchovers, we assist our customers in achieving a smooth and purposeful project scope with these six steps.

  • Migration Assessment: A solid analysis of 8-10 days is the basis of a well-founded, thoughtful and purposeful migration.

  • Proof of Concept: In the beginning of a migration project, we recommend proving the viability of all critical components of the target architecture and the migration process itself, such as:

    • Schema migration in a precise, automated and validated way.

    • Procedure migration with the desired goodness, degree of automation etc.

    • High availability with MariaDB according to SLAs of the customer.

    • Live data replication from the proprietary database to MariaDB.

    • Customer requirements to test and prove in order to achieve a solid decision basis and obtain a clear migration path.

  • Bootstrapping Migration: We enable customers to help save costs in the migration.

  • Migration Consulting: Where necessary and helpful, we accompany the migration project by:

    • Adding and sharing knowledge.

    • Helping to streamline and manage the migration.

    • Keeping the migration on the right track and helping to avoid pitfalls.

  • Switchover: Before the actual migration, we carefully plan, validate, train and conduct the following:

    • Forward switchover steps.

    • Rollback steps where short-term action may be necessary.

    • Points of no return in order to identify and plan for critical dependencies and appropriate solution paths for all eventualities.

  • Pilot phase: We assist with the first weeks of system operation in order to help deal with all aspects of operating, monitoring, optimizing and securing the new migrated system. This includes active support for:

    • DBA teams.

    • Application maintenance teams.

    • Software development teams.

For all these measures, our migration practice aligns with customer methodologies, frameworks, best practices and compliance requirements. This includes PRINCE2, ITIL, data protection and SOX /Euro-SOX, as well as other regulatory requirements.
 

How does this save?

Cost reductions of a migration project are achieved through a number of best practices:

  • Sharing migration knowledge directly with the customer teams and creating competence as early as possible.

  • Enabling multiplication structures such as “train the trainer,” which is also important for quality assurance.

  • Reusing established skill sets when migrating applications or deploying new ones.

In our approach, the migration effort is significantly reduced compared to manual migrations:

  • Efficient, but deep analysis done during the Migration Assessment.

  • Automation implemented in our tooling.

  • Core Oracle PL/SQL compatibility.


Now available: Migration Architect

In addition to our MariaDB TX Subscription, we offer a Migration Architect service in order to help initiate, conduct and finalize a successful migration:

  • Migration Architect is an extension of the MariaDB Red Rover Migration Practice

  • Migration Architect serves in an advisory capacity regarding:

    • Database architecture

    • Migration project planning and conduct

    • Applying best practices from the MariaDB Migration Practice

    • Quality assurance throughout the migration

    • Switchover planning, training and conduct


Want to learn more about Oracle compatibility features in MariaDB TX 3.0? Join our webinar on June 7 to hear what’s new. Register now.



by alexander.bienemann_57274 at May 24, 2018 01:00 PM

May 23, 2018

MariaDB AB

A Look into MariaDB Auditing for GDPR Compliance


When we talk about database auditing, what we are focused on is tracking the use of database records and monitoring each operation on the data.

The goal of auditing activities is to provide a clear and reliable answer to the typical four W questions: Who accessed the database, When did this happen, What was touched, and Where did this access come from? Auditing should also help the security team answer the fifth W: Why did this happen?

Auditing is also a very important task when we want to monitor the database activity to collect information that can help to increase the database performance or debug the application.

When we talk about security, accountability and regulatory compliance, database auditing plays an even more critical role.

An auditing activity is key in achieving accountability as it allows us to investigate malicious or suspicious database activities. It’s used to help DBAs detect excessive user privileges or suspicious activities coming from specific connections.

In particular, the new European Union General Data Protection Regulation (GDPR) says that it will be important to be able to provide detail of changes to personal data to demonstrate that data protection and security procedures are effective and are being followed. Furthermore, we must ensure that data is only accessed by appropriate parties. This means that we need to be able to say who changed an item of data and when they changed it.

It’s broader than GDPR. HIPAA (Health Insurance Portability and Accountability Act) requires healthcare providers to deliver audit trails about anyone and everyone who touches any data in their records. This is down to the row and record level.

Furthermore, if a data breach occurs, organizations must disclose full information on these events to their local data protection authority (DPA) and all customers concerned with the data breach within 72 hours so they can respond accordingly.

MariaDB Audit Plugin

For all these reasons, MariaDB has included the Audit Plugin since version 10.0.10 of MariaDB Server. The purpose of the MariaDB Audit Plugin is to log the server's activity: for each client session, it records who connected to the server (i.e., user name and host), what queries were executed, which tables were accessed and which server variables were changed.

Events that are logged by the MariaDB Audit Plugin are grouped into three different types: CONNECT, QUERY and TABLE events.

There are actually more types of events to allow fine-tuning of the audit, and focus on just the events and statements relevant for a specific organisation. These are detailed on the Log Settings Page.

There are also several system variables to configure the MariaDB Audit Plugin. The Server Audit Status Variables page includes all variables relevant to reviewing the status of the auditing. The overall monitoring should include an alert to verify that auditing is active.

This information is stored in a rotating log file or it may be sent to the local syslog.

For security reasons, it's sometimes recommended to use the system logs instead of a local file: in this case the value of server_audit_output_type needs to be set to syslog.

It is also possible to set up even more advanced and secure solutions such as using a remote syslog service (Read more about the MariaDB Audit Plugin and setting up a rsyslog).
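
As a brief example (the values are for illustration only), the plugin can be enabled and given a typical file-based configuration dynamically:

INSTALL SONAME 'server_audit';

SET GLOBAL server_audit_logging = ON;
SET GLOBAL server_audit_events = 'CONNECT,QUERY,TABLE';
SET GLOBAL server_audit_output_type = 'file';
SET GLOBAL server_audit_file_rotate_size = 1000000;  -- rotate the log at about 1 MB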

What does the MariaDB audit log file look like?

The audit log file is a set of rows in plain text format, written as a list of comma-separated fields to a file. The general format for the logging to the plugin's own file is defined like the following:

[timestamp],[serverhost],[username],[host],[connectionid],
[queryid],[operation],[database],[object],[retcode]

If the log file is sent to syslog the format is slightly different as the syslog has its own standard format (refer to the MariaDB Audit Plugin Log Format page for the details).

A typical MariaDB Audit plugin log file example is:

# tail mlr_Test_audit.log

20180421 09:22:38,mlr_Test,root,localhost,22,0,CONNECT,,,0
20180421 09:22:42,mlr_Test,root,localhost,22,35,QUERY,,'CREATE USER IF NOT EXISTS \'mlr\'@\'%\' IDENTIFIED WITH \'mysql_native_password\' AS \'*F44445443BB93ED07F5FAB7744B2FCE47021238F\'',0
20180421 09:22:42,mlr_Test,root,localhost,22,36,QUERY,,'drop user if exists mlr',0
20180421 09:22:45,mlr_Test,root,localhost,22,0,DISCONNECT,,,0
20180421 09:25:29,mlr_Test,root,localhost,20,0,FAILED_CONNECT,,,1045
20180421 09:25:44,mlr_Test,root,localhost,43,133,WRITE,employees,salaries,
20180421 09:25:44,mlr_Test,root,localhost,43,133,QUERY,employees,'DELETE FROM salaries LIMIT 100',0

Audit Files Analysis

Log files are a great source of information, but only if you have a system in place to consistently review the data. The way you shape your application and database environment also matters: to get useful auditing, for example, it's recommended that every human user has their own account.

Furthermore, from the application standpoint, if applications use application-based accounts rather than native database accounts for end users, each application accessing the same server should at least have its own "application user".
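For illustration only, a hypothetical per-application account (the names, host pattern and privileges are made up) might look like this:

CREATE USER 'billing_app'@'10.0.0.%' IDENTIFIED BY 'choose-a-strong-password';
GRANT SELECT, INSERT, UPDATE, DELETE ON billing.* TO 'billing_app'@'10.0.0.%';

With one account per application (and per human), the username and host fields in the audit log immediately tell you where an operation came from.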

As we said before, you have to use the information collected and analyse it on a regular basis, and, when needed, take immediate action based on those logged events. However, even small environments can generate more information than can be analysed manually.

Monyog 8.5, the most recent release of the monitoring tool included with the MariaDB TX and MariaDB AX subscriptions, adds a very interesting feature for MariaDB: the Audit Log.

This feature parses the audit log maintained by MariaDB Server and displays the content in a clean tabular format.

Monyog accesses the audit log file, the same way it does for other MariaDB log files, including the Slow Query, General Query and Error log.

Through the Monyog interface you can select the server and the time frame for which you want to see the audit log. Clicking on “SHOW AUDIT LOG” then fetches the contents of the log; at most 10,000 rows can be fetched for one time frame.

[Screenshot: Monyog Audit Log overview]

The snapshot above gives you a quick summary of the audit log as percentages: Failed Logins, Failed Events, Schema Changes, Data Changes and Stored Procedures. All these legends are clickable and show the corresponding audit log entries.

Furthermore, you can use the filter option to fetch audit log entries by Username, Host, Operation, Database and Table/Query.


by maria-luisaraviol at May 23, 2018 10:27 PM

Peter Zaitsev

Percona Monitoring and Management 1.11.0 Is Now Available


Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.

In PMM Release 1.11.0, we deliver the following changes:

  • Configurable MySQL Slow Log Rotation – enable or disable rotation, and specify how many files to keep on disk
  • Predictable Graphs – we’ve updated our formulas to use aggregation functions over time for more reliable graphs
  • MySQL Exporter Parsing of my.cnf – we’ve improved how we read my.cnf
  • Annotation improvements – passing multiple strings now results in a single annotation being written

The release includes 1 new feature & improvement and 9 bug fixes.

MySQL Slow Log Rotation Improvements

We spent some time this release going over how we handle MySQL’s Slow Log rotation logic. Query Analytics requires that slow logging be enabled (either to file, or to PERFORMANCE_SCHEMA) and we found that users of Percona Server for MySQL overwhelmingly choose logging to a file in order to take advantage of log_slow_verbosity which provides enhanced InnoDB Usage information. However, the challenge with MySQL’s Slow Log is that it is very verbose and thus the number one concern is disk space. PMM strives to do no harm and so MySQL Slow Log Rotation was a natural fit, but until this release we were very strict and hadn’t enabled any configuration of these parameters.
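For orientation, a hedged sketch of enabling file-based slow logging on Percona Server for MySQL; the values are illustrative and, in particular, long_query_time should be tuned for your workload:

SET GLOBAL slow_query_log = ON;
SET GLOBAL log_output = 'FILE';
SET GLOBAL long_query_time = 0;          -- log everything; adjust to your needs
SET GLOBAL log_slow_verbosity = 'full';  -- Percona Server extension with extra InnoDB details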

Percona Server for MySQL Users have long known about Slow Query Log Rotation and Expiration, but until now had no way of using the in-built Percona Server for MySQL feature while ensuring that PMM wasn’t missing any queries from the Slow Log during file rotation. Or perhaps your use case is that you want to do Slow Log Rotation using logrotate or some other facility. Today with Release 1.11 this is now possible!

We’ve made two significant changes:

  1. You can now specify the number of Slow Log files to remain on disk, and let PMM handle deleting the oldest files first. Default remains unchanged – 1 Slow Log to remain on disk.
  2. Slow Log rotation can now be disabled, for example if you want to manage rotation using logrotate or Percona Server for MySQL Slow Query Log Rotation and Expiration. Default remains unchanged – Slow Log Rotation is ON.

Number of Slow Logs Retained on Disk

Slow Logs Rotation – On or Off

You specify each of these two new controls when setting up the MySQL service. The following example specifies that 5 Slow Log files should remain on disk:

pmm-admin add mysql ... --retain-slow-logs=5

While the following example specifies that Slow Log rotation is to be disabled (flag value of false), with the assumption that you will perform your own Slow Log Rotation:

pmm-admin add mysql ... --slow-log-rotation=false

We don’t currently support modifying option parameters for an existing service definition. This means you must remove, then re-add the service and include the new options.
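A hedged sketch of that remove/re-add cycle (the service name and the remaining flags depend on how the service was originally added):

pmm-admin remove mysql
pmm-admin add mysql ... --retain-slow-logs=5 --slow-log-rotation=false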

We’re including a logrotate script in this post to get you started; it is designed to keep 30 copies of Slow Logs at 1GB each. Note that you’ll need to update the Slow Log location, and ensure a MySQL user account with the SUPER and RELOAD privileges is used for this script to execute successfully.

Example logrotate
/var/mysql/mysql-slow.log {
    nocompress
    create 660 mysql mysql
    size 1G
    dateext
    missingok
    notifempty
    sharedscripts
    postrotate
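       # Assumption: MySQL credentials (an account with SUPER and RELOAD) come from root's ~/.my.cnf.
       # The command below temporarily raises long_query_time so that almost nothing is written
       # while the slow log is flushed and reopened, then restores the saved value.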
       /bin/mysql -e 'SELECT @@global.long_query_time INTO @LQT_SAVE; SET GLOBAL long_query_time=2000; SELECT SLEEP(2); FLUSH SLOW LOGS; SELECT SLEEP(2); SET GLOBAL long_query_time=@LQT_SAVE;'
    endscript
    rotate 30
}

Predictable Graphs

We’ve updated the logic on four dashboards to better handle predictability and also to allow zooming to look at shorter time ranges.  For example, refreshing PXC/Galera graphs prior to 1.11 led to graphs spiking at different points during the metric series. We’ve reviewed each of these graphs and their corresponding queries and added in <aggregation>_over_time() functions so that graphs display a consistent view of the metric series. This improves your ability to drill in on the dashboards so that no matter how short your time range, you will still observe the same spikes and troughs in your metric series. The four dashboards affected by this improvement are:

  • Home Dashboard
  • PXC/Galera Graphs Dashboard
  • MySQL Overview Dashboard
  • MySQL InnoDB Metrics Dashboard

MySQL Exporter parsing of my.cnf

In earlier releases, the MySQL Exporter expected only key=value type flags. It would ignore options without values (e.g. disable-auto-rehash), and could sometimes read the wrong section of the my.cnf file. We’ve updated the parsing engine to be more MySQL compatible.
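For example, a my.cnf fragment like the following (the credentials are hypothetical) mixes key=value options with a valueless one, which older exporter versions could not parse correctly:

[client]
user = pmm
password = secret
disable-auto-rehash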

Annotation improvements

Annotations permit the display of an event on all dashboards in PMM. Users reported that passing more than one string to pmm-admin annotate would generate an error, so we updated the parsing logic so that all strings passed during annotation creation generate a single annotation event. Previously you needed to enclose your strings in quotes so that they would be parsed as a single string.
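In other words (the annotation text here is just an example):

# before 1.11 the text had to be quoted as a single argument:
pmm-admin annotate "failover to replica db02"
# from 1.11 onward, multiple unquoted words become one annotation:
pmm-admin annotate failover to replica db02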

Issues in this release

New Features & Improvements

  • PMM-2432 – Configurable MySQL Slow Log File Rotation

Bug fixes

  • PMM-1187 – Graphs breaks at tight resolution 
  • PMM-2362 – Explain is a part of query 
  • PMM-2399 – RPM for pmm-server is missing some files 
  • PMM-2407 – Menu items are not visible on PMM QAN dashboard 
  • PMM-2469 – Parsing of a valid my.cnf can break the mysqld_exporter 
  • PMM-2479 – PXC/Galera Cluster Overview dashboard: typo in metric names 
  • PMM-2484 – PXC/Galera Graphs display unpredictable results each time they are refreshed 
  • PMM-2503 – Wrong InnoDB Adaptive Hash Index Statistics 
  • PMM-2513 – QAN-agent always changes max_slowlog_size to 0 
  • PMM-2514 – pmm-admin annotate help – fix typos
  • PMM-2515 – pmm-admin annotate – more than 1 annotation 

How to get PMM

PMM is available for installation using three methods:

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

The post Percona Monitoring and Management 1.11.0 Is Now Available appeared first on Percona Database Performance Blog.

by Michael Coburn at May 23, 2018 08:37 PM

Percona Server for MongoDB 3.6.4-1.2 Is Now Available


Percona announces the release of Percona Server for MongoDB 3.6.4-1.2 on May 23, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.6 Community Edition. It supports MongoDB 3.6 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.6.4 and includes the following additional changes:

  • #PSMDB-205: mongod failed to initialize if audit filter was set to record Action type events specified with the $in expression.
  • #PSMDB-207: a premature initialization of the feature compatibility version in global parameters was fixed for the RocksDB storage engine.
  • #PSMDB-209: CentOS 6 and CentOS 7 RPM packages contained config file with a wrong link to the online Percona Memory Engine documentation.

Note: as mentioned in the Percona Server for MongoDB 3.6.3-1.1 Release Notes, MongoRocks is deprecated in Percona Server for MongoDB 3.6.

The Percona Server for MongoDB 3.6.4-1.2 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.6.4-1.2 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at May 23, 2018 05:04 PM

Deploy a MongoDB Replica Set with Transport Encryption (Part 2)


In this article series, we talk about the basic high availability architecture of MongoDB: the replica set.

  • Part 1: We introduced basic replica set concepts, how it works and what its main features are
  • Part 2 (this post): We’ll provide a step-by-step guide to configure a three-node replica set
  • Part 3: We’ll talk about how to configure transport encryption between the nodes

In Part 1 we introduced and described the main features of a MongoDB replica set. In this post, we are going to present a step-by-step guide to deploy a basic and fully operational three-node replica set. We’ll use just regular members, all with priority=1, no arbiter, and no hidden or delayed nodes.

The environment

Our example environment is 3 virtual hosts with Ubuntu 16.04 LTS, although the configuration is the same with CentOS or other Linux distributions.

We have installed Percona Server for MongoDB on each node. Hostnames and IPs are:

  • psmdb1 : 192.168.56.101
  • psmdb2 : 192.168.56.102
  • psmdb3 : 192.168.56.103

It is not the goal of this post to provide installation details, but in case you need them you can follow this guide: https://www.percona.com/doc/percona-server-for-mongodb/LATEST/install/index.html. Installation from the repository is very easy.

Connectivity

Once we have all the nodes with MongoDB installed, we just need to be sure that each one is accessible by all the others on port 27017, the default port.

Since our members are on the same network we can simply try to test the connectivity between each pair of nodes, connecting the mongo client from one node to each of the others.

psmdb1> mongo --host 192.168.56.102 --port 27017
psmdb1> mongo --host 192.168.56.103 --port 27017
psmdb2> mongo --host 192.168.56.101 --port 27017
psmdb2> mongo --host 192.168.56.103 --port 27017
psmdb3> mongo --host 192.168.56.101 --port 27017
psmdb3> mongo --host 192.168.56.102 --port 27017

If the mongo client is not able to connect, we need to check the network configuration, or to configure or disable the firewall.
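For example, a quick check from psmdb1 (assuming netcat is available) and, if needed, a firewall rule on Ubuntu 16.04 with ufw could look like this:

# test raw TCP connectivity to psmdb2 on the default MongoDB port
nc -zv 192.168.56.102 27017
# if ufw is the active firewall, allow the port
sudo ufw allow 27017/tcp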

Hostnames

Configuring hostnames on our hosts is not mandatory for the replica set; in fact, you can configure the replica set using just the IPs and that's fine. But we define the hostnames because they will be very useful when we discuss how to configure internal encryption in Part 3.

We need to ensure that each member is accessible by way of resolvable DNS or hostnames.

Set up each node in the /etc/hosts file

root@psmdb1:~# cat /etc/hosts
127.0.0.1       localhost
192.168.56.101  psmdb1
192.168.56.102  psmdb2
192.168.56.103  psmdb3

Choose a name for the replica set

We are now close to finalizing the configuration.

Now we have to choose a name for the replica set and put it in each member’s configuration file. Let’s say we decide to use rs-test.

Put the replica set name into /etc/mongod.conf (the MongoDB configuration file) on each host. Enter the following:

replication:
     replSetName: "rs-test"

Restart the server:

sudo service mongod restart

Remember to do this on all the nodes.

That’s all we need to do to configure the replication at its most basic. There are obviously other configuration parameters we could set, but maybe we’ll talk about them in another post when discussing more advanced features. For this basic deployment we can assume that all the default values are good enough.

Initiate replication

Now we need to connect to one of the nodes. It doesn’t matter which, just choose one of them and launch mongo shell to connect to the local mongod instance.

Then issue the rs.initiate() command to let the replica set know what all the members are.

mongo> rs.initiate( {
      ... _id: "rs-test",
      ... members: [
      ... { _id: 0, host: "psmdb1:27017" },
      ... { _id: 1, host: "psmdb2:27017" },
      ... { _id: 2, host: "psmdb3:27017" }
      ... ] })

After issuing the command, MongoDB initiates the replication process using the default configuration. A PRIMARY node is elected, and all documents created from now on will be asynchronously replicated to the SECONDARY nodes.

We don’t need to do anything more. The replica set is now working.

We can verify that the replication is working by taking a look at the mongo shell prompt. Once the replica set is up and running the prompt should be like this on the PRIMARY node:

rs-test:PRIMARY>

and like this on the SECONDARY nodes:

rs-test:SECONDARY>

MongoDB lets you know the replica role of the node that you are connected to.

A couple of useful commands

There are several commands to investigate and to do some administrative tasks on the replica set. Here are a couple of them.

To investigate the replica set configuration you can issue rs.conf() on any node

rs-test:PRIMARY> rs.conf()
{
 "_id" : "rs-test",
 "version" : 68835,
 "protocolVersion" : NumberLong(1),
 "members" : [
 {
 "_id" : 0,
 "host" : "psmdb1:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {
},
 "slaveDelay" : NumberLong(0),
 "votes" : 1
 },
 {
 "_id" : 1,
 "host" : "psmdb2:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {
},
 "slaveDelay" : NumberLong(0),
 "votes" : 1
 },
 {
 "_id" : 2,
 "host" : "psmdb3:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {
},
 "slaveDelay" : NumberLong(0),
 "votes" : 1
 }
 ],
 "settings" : {
 "chainingAllowed" : true,
 "heartbeatIntervalMillis" : 2000,
 "heartbeatTimeoutSecs" : 10,
 "electionTimeoutMillis" : 10000,
 "catchUpTimeoutMillis" : 60000,
 "getLastErrorModes" : {
},
 "getLastErrorDefaults" : {
 "w" : 1,
 "wtimeout" : 0
 },
 "replicaSetId" : ObjectId("5aa2600d377adb63d28e7f0f")
 }
}

We can see information about the configured nodes, whether arbiter or hidden, the priority, and other details regarding the heartbeat process.

To investigate the replica set status you can issue rs.status() on any node

rs-test:SECONDARY> rs.status()
{
 "set" : "rs-test",
 "date" : ISODate("2018-05-14T10:16:05.228Z"),
 "myState" : 2,
 "term" : NumberLong(47),
 "syncingTo" : "psmdb3:27017",
 "heartbeatIntervalMillis" : NumberLong(2000),
 "optimes" : {
 "lastCommittedOpTime" : {
 "ts" : Timestamp(1526292954, 1),
 "t" : NumberLong(47)
 },
 "appliedOpTime" : {
 "ts" : Timestamp(1526292964, 1),
 "t" : NumberLong(47)
 },
 "durableOpTime" : {
 "ts" : Timestamp(1526292964, 1),
 "t" : NumberLong(47)
 }
 },
 "members" : [
 {
 "_id" : 0,
 "name" : "psmdb1:27017",
 "health" : 1,
 "state" : 2,
 "stateStr" : "SECONDARY",
 "uptime" : 392,
 "optime" : {
 "ts" : Timestamp(1526292964, 1),
 "t" : NumberLong(47)
 },
 "optimeDate" : ISODate("2018-05-14T10:16:04Z"),
 "syncingTo" : "psmdb3:27017",
 "configVersion" : 68835,
 "self" : true
 },
 {
 "_id" : 1,
 "name" : "psmdb2:27017",
 "health" : 1,
 "state" : 1,
 "stateStr" : "PRIMARY",
 "uptime" : 379,
 "optime" : {
 "ts" : Timestamp(1526292964, 1),
 "t" : NumberLong(47)
 },
 "optimeDurable" : {
 "ts" : Timestamp(1526292964, 1),
 "t" : NumberLong(47)
 },
 "optimeDate" : ISODate("2018-05-14T10:16:04Z"),
 "optimeDurableDate" : ISODate("2018-05-14T10:16:04Z"),
 "lastHeartbeat" : ISODate("2018-05-14T10:16:04.832Z"),
 "lastHeartbeatRecv" : ISODate("2018-05-14T10:16:03.318Z"),
 "pingMs" : NumberLong(0),
 "electionTime" : Timestamp(1526292592, 1),
 "electionDate" : ISODate("2018-05-14T10:09:52Z"),
 "configVersion" : 68835
 },
 {
 "_id" : 2,
 "name" : "psmdb3:27017",
 "health" : 1,
 "state" : 2,
 "stateStr" : "SECONDARY",
 "uptime" : 378,
 "optime" : {
 "ts" : Timestamp(1526292964, 1),
 "t" : NumberLong(47)
 },
 "optimeDurable" : {
 "ts" : Timestamp(1526292964, 1),
 "t" : NumberLong(47)
 },
 "optimeDate" : ISODate("2018-05-14T10:16:04Z"),
 "optimeDurableDate" : ISODate("2018-05-14T10:16:04Z"),
 "lastHeartbeat" : ISODate("2018-05-14T10:16:04.832Z"),
 "lastHeartbeatRecv" : ISODate("2018-05-14T10:16:04.822Z"),
 "pingMs" : NumberLong(0),
 "syncingTo" : "psmdb2:27017",
 "configVersion" : 68835
 }
 ],
 "ok" : 1
}

Here we can see, for example, whether the nodes are reachable and running, and in particular the role each has at this moment: which node is the PRIMARY and which are SECONDARY.

Test replication

Finally, let’s try to test that the replication process is really working as expected.

Connect to the PRIMARY node and create a sample document:

rs-test:PRIMARY> use test
switched to db test
rs-test:PRIMARY> db.foo.insert( {name:"Bruce", surname:"Dickinson"} )
WriteResult({ "nInserted" : 1 })
rs-test:PRIMARY> db.foo.find().pretty()
{
    "_id" : ObjectId("5ae05ac27e6680071caf94b7")
    "name" : "Bruce"
    "surname" : "Dickinson"
}

Then connect to a SECONDARY node and look for the same document.

Remember that, by default, you can’t read data on a SECONDARY node: reads and writes are allowed only on the PRIMARY. So, if you want to read data on a SECONDARY node, you first need to issue the rs.slaveOK() command; if you don’t, you will receive an error.

rs-test:SECONDARY> rs.slaveOK()
rs-test:SECONDARY> show collections
local
foo
rs-test:SECONDARY> db.foo.find().pretty()
{
     "_id" : ObjectId("5ae05ac27e6680071caf94b7")
     "name" : "Bruce"
     "surname" : "Dickinson"
}

As we can see, the SECONDARY node has replicated the creation of the collection foo and the inserted document.

This simple test demonstrates that the replication process is working as expected.

There are more sophisticated features for investigating the replica set and for troubleshooting, but discussing them is not in the scope of this post.

In Part 3, we’ll show how to encrypt the internal replication process we have deployed so far.

Read the first post of this series: Deploy a MongoDB Replica Set with Transport Encryption

The post Deploy a MongoDB Replica Set with Transport Encryption (Part 2) appeared first on Percona Database Performance Blog.

by Corrado Pandiani at May 23, 2018 02:42 PM

May 22, 2018

Peter Zaitsev

Percona Toolkit 3.0.10 Is Now Available


Percona announces the release of Percona Toolkit 3.0.10 on May 22, 2018.

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL®, MariaDB®, Percona Server for MongoDB and MongoDB.

Percona Toolkit, like all Percona software, is free and open source. You can download packages from the website or install from official repositories.

This release includes the following changes:

New Features:

  • PT-131: pt-table-checksum disables the QRT plugin
    The Query Response Time Plugin provides a tool for analyzing information by counting and displaying the number of queries according to the length of time they took to execute. This feature adds a new flag, --disable-qrt-plugin, that leverages Percona Server for MySQL’s new ability to disable the QRT plugin at the session level. The advantage of enabling this Toolkit feature is that the QRT metrics are not impacted by the work that pt-table-checksum performs: they report only the work your application is generating on MySQL, not clouded by the activities of pt-table-checksum.
  • PT-118: pt-table-checksum reports the number of rows of difference between master and slave
    We’re adding support for pt-table-checksum to identify the number of row differences between master and slave. Previously you were able to see only the count of chunks that differed between hosts. This is helpful for situations where you believe you can tolerate some measure of row count drift between hosts, but want to be precise in understanding what that row count difference actually is.

Improvements

  • PT-1546: Improved support for MySQL 8 roles
  • PT-1543: The encrypted table status query causes high load over multiple minutes
    Users reported that listing encrypted table status can be very slow. We’ve made this functionality optional via --list-encrypted-tables, and it is disabled by default.
  • PT-1536: Added info about encrypted tablespaces in pt-mysql-summary
    We’ve improved pt-mysql-summary to include information about encrypted tablespaces. This information is available by using --list-encrypted-tables.

Bug Fixes:

  • PT-1556: pt-table-checksum 3.0.9 does not change binlog_format to statement any more.

pt-show-grants has several known issues when working with MySQL 8 and roles, which Percona aims to address in subsequent Percona Toolkit releases: PT-1560, PT-1559, and PT-1558.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

The post Percona Toolkit 3.0.10 Is Now Available appeared first on Percona Database Performance Blog.

by Michael Coburn at May 22, 2018 08:32 PM

ProxySQL 1.4.8 and Updated proxysql-admin Tool Now in the Percona Repository


ProxySQL 1.4.8, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.8 source and binary packages available at https://percona.com/downloads/proxysql include ProxySQL Admin – a tool, developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.8 are available as well: https://hub.docker.com/r/percona/proxysql/. You can download the original ProxySQL from https://github.com/sysown/proxysql/releases.

This release fixes the following bugs in ProxySQL Admin:

Usability improvement:

  • PR #84: The proxysql-status tool now dumps host_priority and proxysql-admin.cnf. The output format was also changed.

Other improvements and bug fixes:

  • PR #66: The --syncusers option now makes proxysql-admin update the user’s password in the ProxySQL database if there is any password difference between the ProxySQL user and the MySQL user.
  • PSQLADM-45: It was unclear from the help screen that the --config-file option requires an argument.
  • PSQLADM-48: The ${PROXYSQL_DATADIR}/${CLUSTER_NAME}_mode file was not created at ProxySQL-admin upgrade (from 1.4.5 or earlier to 1.4.6 or later).
  • PSQLADM-52: The proxysql_galera_checker script was not checking empty query rules.
  • PSQLADM-54: proxysql_node_monitor did not change the OFFLINE_HARD status properly for nodes coming back online.

ProxySQL is available under the open source GPLv3 license.

The post ProxySQL 1.4.8 and Updated proxysql-admin Tool Now in the Percona Repository appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at May 22, 2018 07:34 PM

Upcoming Webinar Thursday, 5/24: What’s New in MongoDB 3.6


Please join Percona’s Senior Support Engineer, Adamo Tonete as he presents What’s New in MongoDB 3.6 on Thursday, May 24th, 2018, at 12:30 PM PDT (UTC-7) / 3:30 PM EDT (UTC-4).

In this webinar, Adamo will walk through what’s new in MongoDB 3.6, including:

  • Change streams for building reactive, real-time applications
  • Retryable writes for always-on write availability
  • Schema validation with JSON Schema for new data governance controls
  • Fully expressive array updates that perform complex array manipulations in a single atomic update operation
  • New security controls
  • End-to-end compression to create efficient, distributed architectures

This webinar is a summary and follow up to several published blog posts on MongoDB 3.6. More information can be found here.

Download the guide to MongoDB 3.6

 

Adamo Tonete, Senior Technical Services Engineer

Adamo joined Percona in 2015, after working as a MongoDB/MySQL Database Administrator for three years. As the main database member of a startup, he was responsible for suggesting the best architecture and data flows for a worldwide company in a 7/24 environment. Before that, he worked as a Microsoft SQL Server DBA in a large e-commerce company, mainly on performance tuning and automation. Adamo has almost eight years of experience working as a DBA and in the past three years he has moved to NoSQL technologies without giving up relational databases. He likes to play video games and to study everything that is related to engines. Adamo lives with his wife in São Paulo, Brazil.

Register for the webinar

The post Upcoming Webinar Thursday, 5/24: What’s New in MongoDB 3.6 appeared first on Percona Database Performance Blog.

by Adamo Tonete at May 22, 2018 06:59 PM

Shlomi Noach

MySQL master discovery methods, part 6: other methods

This is the sixth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator-specific configuration/advice, and point out where a cross-DC orchestrator/raft setup plays a part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other is applicable.

Hard coded configuration deployment

You may use your source/config repo as a master service discovery method of sorts.

The master's identity would be hard coded into your, say, git repo, to be updated and deployed to production upon failover.

This method is simple and I've seen it being used by companies, in production. Noteworthy:

  • This requires a dependency of production on source availability.
    • The failover tool would need to have access to your source environment.
  • This requires a dependency of production on build/deploy flow.
    • The failover tool would need to kick build, test, deploy process.
  • Code deployment time can be long.
  • Deployment must take place on all relevant hosts, and cause for a mass refresh/reload.
    • It should interrupt processes that cannot reload themselves, such as various commonly used scripts.

Synchronous replication

This series of posts is focused on asynchronous replication, but we will do well to point out a few relevant notes on synchronous replication (Galera, XtraDB Cluster, InnoDB Cluster).

  • Synchronous replication can act in single-writer mode or in multi-writer mode.
  • In single writer mode, apps should connect to a particular master.
    • The identity of such master can be achieved by querying the MySQL members of the cluster.
  • In multi-writer mode, apps can connect to any healthy member of the cluster.
    • This still calls for a check: is the member healthy? (a minimal check is sketched after this list)
  • Synchronous replication is not intended to work well cross DC.
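For Galera-based clusters (Percona XtraDB Cluster, MariaDB Galera Cluster), a hedged sketch of such a health check, using the standard wsrep status variables, could be:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';      -- expect 'Primary'
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'; -- expect 'Synced'
SHOW GLOBAL STATUS LIKE 'wsrep_ready';               -- expect 'ON'

InnoDB Cluster / Group Replication exposes member state differently (e.g. through performance_schema replication tables), so the exact check depends on the technology in use.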

The last bullet should perhaps be highlighted. In a cross-DC setup, and for cross-DC failovers, we are back to same requirements as with asynchronous replication, and the methods illustrated in this series of posts may apply.

  • VIPs make less sense.
  • Proxy-based solutions make a lot of sense.

All posts in this series

by shlomi at May 22, 2018 08:39 AM

Open Query Pty Ltd

How not to respect your users’ privacy

You just run the usual online frameworks, with their extensive plugin range, CDN, Google Analytics, NewRelic, Twitter, Facebook and LinkedIn widgets, and the rest.  Then, you display a notice to your users that your site uses cookies and passes some data to third parties (such as Google Analytics and NewRelic) “to enhance the user experience”.

There. Easy, right? You probably didn’t need to change anything at all. Most companies, sites and applications do this.  Now tell me: given that you probably agree with at least some of the above, how come you display a notice to your users explaining how you respect their privacy?  It can’t both be true.

So yes, this was a test.  And most of us fail, including us.  Why is this?

  1. Are you asking for and storing more data than you actually require for delivering the product or service that you provide?  You can probably only test this by working out the minimum data requirements, questioning each item, and then comparing that list with what you currently actually collect.  There’s likely to be a (large) discrepancy.
  2. Are you using multiple analytics and trackers?  Why?  It does in fact affect the user experience of your site, both in terms of speed as well as privacy.  And you probably don’t actually use all that data.  So think about what you actually use, and get rid of the rest.  That’s a good exercise and an excellent step.
  3. Does your site deliver pixel images for Facebook and others?  If so, why?
  4. Does your site show a “site seal” advertising your SSL certificate’s vendor?  If so, why?
  5. Does your site set one or more cookies for every user, rather than only logged-in users?  If so, why?
  6. Most CMS and frameworks actually make it difficult to not flood users with cookies and third-party tracking. They have become the new bloat.  Example: you use a component that includes a piece  of javascript or css off a vendor-provided CDN. Very convenient, but you’ve just provided site-usage data to that vendor as well as your users’ IP address.
  7. Respecting privacy is not “business as usual” + a notice. It’s just not.

So, privacy is actually really hard, and for a large part because our tools make it so.  They make it so not for your users’ convenience, or even your convenience, but for the vendors of said tools/components. You get some benefit, which in turn could benefit your users, but I think it’s worthwhile to really review what’s actually necessary and what’s not.

A marketing or sales person might easily say “more data is better”, but is it, really?  It affects site speed and user experience. And unless you’ve got your analytics tools really well organised, you’re actually going to find that all that extra data is overhead you don’t need in your company.  If you just collect and use what you really need, you’ll do well. Additionally, it’ll enable you to tell your users/clients honestly about what you do and why, rather than deliver a generic fudge-text as described in the first paragraph of this post.

A few quick hints to check your users’ privacy experience, without relying on third-party sites.

  • Install EFF’s Privacy Badger plugin.  It uses heuristics (rather than a fixed list) to identify suspected trackers and deal with them appropriately (allow, block cookies, block completely).  Privacy Badger provides you with an icon on the right of your location bar, showing a number indicating how many trackers the current page has.  If you click on the icon, you can see details and adjust.  And as a site owner, you’ll want to adjust the site rather than badger it!
  • If you click on the left hand side of your location bar, on the secure icon (because you are already offering https, right?), you can also see details on cookies: both how many and to which domains. If you see any domains which are not yours, they’re caused by components (images, javascript, css) on your page that retrieve bits from elsewhere. Prepare to be shocked.
  • To see in more detail what bits an individual page uses, you can right-click on a page and select “Inspect” then go to the “Sources” tab.  Again, prepare to be shocked.

Use that shock well, to genuinely improve privacy – and thereby respect your users.

Aside from the ethics, I expect that these indicators (cookies, third-party resource requests, trackers, etc) will get used to rank sites and identify bad players. So there’ll be a business benefit in being ahead of this predictable trend.  And again, doing a clean-up will also make your site faster, as well as easier to use.

by Arjen Lentz at May 22, 2018 01:20 AM

May 21, 2018

Peter Zaitsev

Percona Server for MongoDB 3.2.20-3.11 Is Now Available


Percona announces the release of Percona Server for MongoDB 3.2.20-3.11 on May 21, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.2 Community Edition. It supports MongoDB 3.2 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.2.20 and does not include any additional changes.

The Percona Server for MongoDB 3.2.20-3.11 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.2.20-3.11 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at May 21, 2018 08:32 PM

Webinar Wed, 5/23: Troubleshooting MySQL Concurrency Issues with Load Testing Tools


Please join Percona’s Principal Support Escalation Specialist, Sveta Smirnova, as she presents Troubleshooting MySQL Concurrency Issues with Load Testing Tools on Wednesday, May 23, 2018 at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

 

Normally, we use benchmarking tools when we are developing applications. When applications are deployed, benchmark tests are usually too late to help.

This webinar doesn’t cover actual benchmarks, but it does look at how you can use benchmarking tools for troubleshooting. When you need to repeat a situation caused by concurrent client execution, they can be your best option. These types of issues include all kinds of locking and performance issues, along with stalls and crashes.

In this webinar Sveta will cover some of the main tools she uses, such as (but not limited to) SysBench and mysqlslap. She will show how to use the tools’ standard options while working with specific custom problems, and how to script them to develop test cases that are as close to real life scenarios as possible.

Register for the webinar.

Sveta Smirnova, Principal Support Escalation Specialist

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues and bugs, finding patterns that can quickly solve typical issues, and teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona, Sveta worked as a Support Engineer in the MySQL Bugs Analysis Support Group at MySQL AB-Sun-Oracle. She is the author of the book “MySQL Troubleshooting” and of the JSON UDF functions for MySQL.

The post Webinar Wed, 5/23: Troubleshooting MySQL Concurrency Issues with Load Testing Tools appeared first on Percona Database Performance Blog.

by Sveta Smirnova at May 21, 2018 06:41 PM

MariaDB Foundation

MariaDB Foundation financial report for 2017

The 2017 accounting for the MariaDB Foundation has been completed and the key figures are: Total income: 476,952.38 USD Total expenses: 476,952.38 USD Net income after adjustments: 153,890.65 USD Staff costs were about 292 000 USD. Travel costs were only about 30 000 USD. The remaining 23 000 USD is administration (accounting, finances, legal) and other expenses. As […]

The post MariaDB Foundation financial report for 2017 appeared first on MariaDB.org.

by Otto Kekäläinen at May 21, 2018 02:49 PM

Jean-Jerome Schmidt

Understanding Deadlocks in MySQL & PostgreSQL

When working with databases, concurrency control is the concept that ensures that database transactions can be performed concurrently without violating data integrity.

There is a lot of theory and different approaches around this concept and how to accomplish it, but we will briefly refer to the way that PostgreSQL and MySQL (when using InnoDB) handle it, and a common problem that can arise in highly concurrent systems: deadlocks.

These engines implement concurrency control by using a method called MVCC (Multiversion Concurrency Control). In this method, when an item is being updated, the changes will not overwrite the original data, but instead a new version of the item (with the changes) will be created. Thus we will have several versions of the item stored.

One of the main advantages of this model is that locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.

But, if several versions of the same item are stored, which version of it will a transaction see? To answer that question we need to review the concept of transaction isolation. Transactions specify an isolation level that defines the degree to which one transaction must be isolated from resource or data modifications made by other transactions. This degree is directly related to the locking generated by a transaction, and so, as it can be specified at the transaction level, it can determine the impact that a running transaction has on other running transactions.

This is a very interesting and long topic, although we will not go into too much detail in this blog. We’d recommend the PostgreSQL and MySQL official documentation for further reading on this topic.

So, why are we going into the above topics when dealing with deadlocks? Because SQL commands automatically acquire locks to ensure the MVCC behaviour, and the lock type acquired depends on the transaction isolation level defined.

There are several types of locks (again, another long and interesting topic to review for PostgreSQL and MySQL) but the important thing about them is how they interact (more exactly, how they conflict) with each other. Why is that? Because two transactions cannot hold locks of conflicting modes on the same object at the same time. And, not a minor detail: once acquired, a lock is normally held until the end of the transaction.

This is a PostgreSQL example of how locking types conflict with each other:

PostgreSQL Locking types conflict

And for MySQL:

MySQL Locking types conflict

X= exclusive lock         IX= intention exclusive lock
S= shared lock         IS= intention shared lock

So what happens when I have two running transactions that want to hold conflicting locks on the same object at the same time? One of them will get the lock and the other will have to wait.

So now we are in a position to truly understand what is happening during a deadlock.

What is a deadlock then? As you can imagine, there are several definitions for a database deadlock, but I like the following for its simplicity.

A database deadlock is a situation in which two or more transactions are waiting for one another to give up locks.

So for example, the following situation will lead us to a deadlock:

Deadlock example

Here, the application A gets a lock on table 1 row 1 in order to make an update.

At the same time application B gets a lock on table 2 row 2.

Now application A needs to get a lock on table 2 row 2, in order to continue the execution and finish the transaction, but it cannot get the lock because it is held by application B. Application A needs to wait for application B to release it.

But application B needs to get a lock on table 1 row 1, in order to continue the execution and finish the transaction, but it cannot get the lock because it is held by application A.

So here we are in a deadlock situation. Application A is waiting for the resource held by application B in order to finish and application B is waiting for the resource held by application A. So, how to continue? The database engine will detect the deadlock and kill one of the transactions, unblocking the other one and raising a deadlock error on the killed one.

Let's check some PostgreSQL and MySQL deadlock examples:

PostgreSQL

Suppose we have a test database with information from the countries of the world.

world=# SELECT code,region,population FROM country WHERE code IN ('NLD','AUS');
code |          region           | population
------+---------------------------+------------
NLD  | Western Europe            |   15864000
AUS  | Australia and New Zealand |   18886000
(2 rows)

We have two sessions that want to make changes to the database.

The first session will modify the region field for the NLD code, and the population field for the AUS code.

The second session will modify the region field for the AUS code, and the population field for the NLD code.

Table data:

code: NLD
region: Western Europe
population: 15864000
code: AUS
region: Australia and New Zealand
population: 18886000

Session 1:

world=# BEGIN;
BEGIN
world=# UPDATE country SET region='Europe' WHERE code='NLD';
UPDATE 1

Session 2:

world=# BEGIN;
BEGIN
world=# UPDATE country SET region='Oceania' WHERE code='AUS';
UPDATE 1
world=# UPDATE country SET population=15864001 WHERE code='NLD';

Session 2 will hang waiting for Session 1 to finish.

Session 1:

world=# UPDATE country SET population=18886001 WHERE code='AUS';

ERROR:  deadlock detected
DETAIL:  Process 1181 waits for ShareLock on transaction 579; blocked by process 1148.
Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (0,15) in relation "country"

Here we have our deadlock. The system detected it and aborted the transaction in session 1.

Session 2:

world=# BEGIN;
BEGIN
world=# UPDATE country SET region='Oceania' WHERE code='AUS';
UPDATE 1
world=# UPDATE country SET population=15864001 WHERE code='NLD';
UPDATE 1

And we can check that the second session finished correctly after the deadlock was detected and session 1's transaction was rolled back (thus, the lock was released).

To have more details we can see the log in our PostgreSQL server:

2018-05-16 12:56:38.520 -03 [1181] ERROR:  deadlock detected
2018-05-16 12:56:38.520 -03 [1181] DETAIL:  Process 1181 waits for ShareLock on transaction 579; blocked by process 1148.
       Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.
       Process 1181: UPDATE country SET population=18886001 WHERE code='AUS';
       Process 1148: UPDATE country SET population=15864001 WHERE code='NLD';
2018-05-16 12:56:38.520 -03 [1181] HINT:  See server log for query details.
2018-05-16 12:56:38.520 -03 [1181] CONTEXT:  while updating tuple (0,15) in relation "country"
2018-05-16 12:56:38.520 -03 [1181] STATEMENT:  UPDATE country SET population=18886001 WHERE code='AUS';
2018-05-16 12:59:50.568 -03 [1181] ERROR:  current transaction is aborted, commands ignored until end of transaction block

Here we will be able to see the actual commands that were detected on deadlock.


MySQL

To simulate a deadlock in MySQL we can do the following.

As with PostgreSQL, suppose we have a test database with information on actors and movies among other things.

mysql> SELECT first_name,last_name FROM actor WHERE actor_id IN (1,7);
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| PENELOPE   | GUINESS   |
| GRACE      | MOSTEL    |
+------------+-----------+
2 rows in set (0.00 sec)

We have two processes that want to make changes to the database.

The first process will modify the field first_name for actor_id 1, and the field last_name for actor_id 7.

The second process will modify the field first_name for actor_id 7, and the field last_name for actor_id 1.

Table data:

actor_id: 1
first_name: PENELOPE
last_name: GUINESS
actor_id: 7
first_name: GRACE
last_name: MOSTEL

Session 1:

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> UPDATE actor SET first_name='GUINESS' WHERE actor_id='1';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Session 2:

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> UPDATE actor SET first_name='MOSTEL' WHERE actor_id='7';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1';

Session 2 will hang waiting for Session 1 to finish.

Session 1:

mysql> UPDATE actor SET last_name='GRACE' WHERE actor_id='7';

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Here we have our deadlock. The system detected it and rolled back the transaction in session 1.

Session 2:

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> UPDATE actor SET first_name='MOSTEL' WHERE actor_id='7';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1';
Query OK, 1 row affected (8.52 sec)
Rows matched: 1  Changed: 1  Warnings: 0

As we can see in the error, as we saw for PostgreSQL, there is a deadlock between both processes.

For more details we can use the command SHOW ENGINE INNODB STATUS\G:

mysql> SHOW ENGINE INNODB STATUS\G
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-05-16 18:55:46 0x7f4c34128700
*** (1) TRANSACTION:
TRANSACTION 1456, ACTIVE 33 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 54, OS thread handle 139965388506880, query id 15876 localhost root updating
UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1456 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 0000000005af; asc       ;;
2: len 7; hex 2d000001690110; asc -   i  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afca8b3; asc Z   ;;

*** (2) TRANSACTION:
TRANSACTION 1455, ACTIVE 47 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 53, OS thread handle 139965267871488, query id 16013 localhost root updating
UPDATE actor SET last_name='GRACE' WHERE actor_id='7'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1455 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 0000000005af; asc       ;;
2: len 7; hex 2d000001690110; asc -   i  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afca8b3; asc Z   ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1455 lock_mode X locks rec but not gap waiting
Record lock, heap no 202 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0007; asc   ;;
1: len 6; hex 0000000005b0; asc       ;;
2: len 7; hex 2e0000016a0110; asc .   j  ;;
3: len 6; hex 4d4f5354454c; asc MOSTEL;;
4: len 6; hex 4d4f5354454c; asc MOSTEL;;
5: len 4; hex 5afca8c1; asc Z   ;;

*** WE ROLL BACK TRANSACTION (2)

Under the title "LATEST DETECTED DEADLOCK", we can see details of our deadlock.

To see the detail of the deadlock in the mysql error log, we must enable the option innodb_print_all_deadlocks in our database.

mysql> set global innodb_print_all_deadlocks=1;
Query OK, 0 rows affected (0.00 sec)

MySQL Log Error:

2018-05-17T18:36:58.341835Z 12 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
2018-05-17T18:36:58.341869Z 12 [Note] InnoDB:
*** (1) TRANSACTION:
 
TRANSACTION 1812, ACTIVE 42 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140515492943616, query id 8467 localhost root updating
UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1'
2018-05-17T18:36:58.341945Z 12 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
 
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1812 lock_mode X locks rec but not gap waiting
Record lock, heap no 204 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 000000000713; asc       ;;
2: len 7; hex 330000016b0110; asc 3   k  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afdcb89; asc Z   ;;
 
2018-05-17T18:36:58.342347Z 12 [Note] InnoDB: *** (2) TRANSACTION:
 
TRANSACTION 1811, ACTIVE 65 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 12, OS thread handle 140515492677376, query id 9075 localhost root updating
UPDATE actor SET last_name='GRACE' WHERE actor_id='7'
2018-05-17T18:36:58.342409Z 12 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
 
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1811 lock_mode X locks rec but not gap
Record lock, heap no 204 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 000000000713; asc       ;;
2: len 7; hex 330000016b0110; asc 3   k  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afdcb89; asc Z   ;;
 
2018-05-17T18:36:58.342793Z 12 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
 
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1811 lock_mode X locks rec but not gap waiting
Record lock, heap no 205 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0007; asc   ;;
1: len 6; hex 000000000714; asc       ;;
2: len 7; hex 340000016c0110; asc 4   l  ;;
3: len 6; hex 4d4f5354454c; asc MOSTEL;;
4: len 6; hex 4d4f5354454c; asc MOSTEL;;
5: len 4; hex 5afdcba0; asc Z   ;;
 
2018-05-17T18:36:58.343105Z 12 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)

Taking into account what we have learned above about why deadlocks happen, you can see that there is not much we can do on the database side to avoid them. Anyway, as DBAs it is our duty to actually catch them, analyze them, and provide feedback to the developers.

The reality is that these errors are particular to each application, so you will need to check them one by one, and there is no guide to tell you how to troubleshoot them. Keeping this in mind, there are some things you can look for.

Search for long-running transactions. As locks are usually held until the end of a transaction, the longer the transaction, the longer locks are held on the resources. If possible, try to split long-running transactions into smaller/faster ones.

Sometimes it is not possible to actually split the transactions, so the work should focus on trying to execute those operations in a consistent order each time, so transactions form well-defined queues and do not deadlock.

One workaround that you can also propose is to add retry logic to the application (of course, try to solve the underlying issue first) so that, if a deadlock happens, the application will run the same commands again.

Check the isolation levels used; sometimes it helps to change them. Look for commands like SELECT FOR UPDATE and SELECT FOR SHARE, as they generate explicit locks, and evaluate whether they are really needed or whether you can work with an older snapshot of the data. If you cannot remove these commands, one thing you can try is using a lower isolation level such as READ COMMITTED.
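For reference, a minimal sketch of lowering the isolation level for a single session or transaction (syntax per the respective manuals):

-- MySQL: applies to subsequent transactions in this session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- PostgreSQL: applies to the current transaction only
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;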

Of course, always add well-chosen indexes to your tables. Then your queries need to scan fewer index records and consequently set fewer locks.

On a higher level, as a DBA you can take some precautions to minimize locking in general. To name one example, in this case for PostgreSQL, you can avoid adding a default value in the same command in which you add a column. Altering a table takes a really aggressive lock, and setting a default value for the new column will actually update the existing rows that have null values, making this operation take a really long time. So if you split this operation into several commands (adding the column, adding the default, updating the null values), you will minimize the locking impact.
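As an illustration only (the table and column names are hypothetical), the split version described above would look roughly like this in PostgreSQL:

-- instead of: ALTER TABLE t ADD COLUMN status integer DEFAULT 0;
ALTER TABLE t ADD COLUMN status integer;           -- fast, no rewrite of existing rows
ALTER TABLE t ALTER COLUMN status SET DEFAULT 0;   -- only affects future inserts
UPDATE t SET status = 0 WHERE status IS NULL;      -- backfill, ideally in batches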

Of course there are tons of tips like this that DBAs pick up with practice (creating indexes concurrently, creating the PK index separately before adding the PK, and so on), but the important thing is to learn and understand this "way of thinking" and always to minimize the locking impact of the operations we are doing.

by Sebastian Insausti at May 21, 2018 10:16 AM

May 18, 2018

Jean-Jerome Schmidt

Cloud Disaster Recovery for MariaDB and MySQL

MySQL has a long tradition in geographic replication. Distributing clusters to remote data centers reduces the effects of geographic latency by pushing data closer to the user. It also provides a capability for disaster recovery. Due to the significant cost of duplicating hardware in a separate site, not many companies were able to afford it in the past. Another cost is skilled staff who is able to design, implement and maintain a sophisticated multiple data centers environment.

With the Cloud and the DevOps automation revolution, having distributed data centers has never been more accessible to the masses. Cloud providers are increasing the range of services they offer for a better price. One can build cross-cloud, hybrid environments with data spread all over the world. One can make flexible and scalable DR plans to approach a broad range of disruption scenarios. In some cases, that can just be a backup stored offsite. In other cases, it can be a 1-to-1 copy of a production environment running somewhere else.

In this blog we will take a look at some of these cases, and address common scenarios.

Storing Backups in the Cloud

A DR plan is a general term that describes a process to recover disrupted IT systems and other critical assets an organization uses. Backup is the primary method to achieve this. When a backup is in the same data center as your production servers, you risk losing all data if you lose that data center. To avoid that, you should have a policy of creating a copy in another physical location. It's still a good practice to keep a backup on local disk to reduce the time needed to restore. In most cases, you will keep your primary backup in the same data center (to minimize restore time), but you should also have a backup that can be used to restore business procedures when the primary data center is down.

ClusterControl: Upload Backup to the cloud

ClusterControl allows seamless integration between your database environment and the cloud. It provides options for migrating data to the cloud. We offer a full combination of database backups for Amazon Web Services (AWS), Google Cloud Services or Microsoft Azure. Backups can now be executed, scheduled, downloaded and restored directly from your cloud provider of choice. This ability provides increased redundancy, better disaster recovery options, and benefits in both performance and cost savings.

ClusterControl: Managing Cloud Credentials

The first step in setting up a "data center failure-proof" backup is to provide credentials for your cloud operator. You can choose from multiple vendors here. Let's take a look at the setup process for the most popular cloud operator - AWS.

ClusterControl: adding cloud credentials

All you need is the AWS Key ID, the secret, and the region where you want to store your backup. You can get the credentials from the AWS console by following a few steps:

  1. Use your AWS account email address and password to sign in to the AWS Management Console as the AWS account root user.
  2. On the IAM Dashboard page, choose your account name in the navigation bar, and then select My Security Credentials.
  3. If you see a warning about accessing the security credentials for your AWS account, choose Continue to Security Credentials.
  4. Expand the Access keys (access key ID and secret access key) section.
  5. Choose Create New Access Key. Then choose Download Key File to save the access key ID and secret access key to a file on your computer. After you close the dialog box, you will not be able to retrieve this secret access key again.
ClusterControl: Hybrid cloud backup

When all is set, you can adjust your backup schedule and enable the backup-to-cloud option. To reduce network traffic, make sure to enable data compression. It makes backups smaller and minimizes the time needed for upload. Another good practice is to encrypt the backup; ClusterControl creates a key automatically and uses it when you decide to restore. Advanced backup policies should have different retention times for backups stored on servers in the same data center and for backups stored in another physical location. You should set a longer retention period for cloud-based backups, and a shorter one for backups stored near the production environment, as the probability of needing a restore drops with the backup lifetime.

ClusterControl: backup retention policy

Extend your cluster with asynchronous replication

Galera with asynchronous replication can be an excellent solution to build an active DR node in a remote data center. There are a few good reasons to attach an asynchronous slave to a Galera Cluster. Long-running OLAP-type queries on a Galera node might slow down the whole cluster, so they can be offloaded to the slave. And with the delayed-apply option, delayed replication can save you from human errors, as an ill-considered statement (that accidental Enter) will not be applied to your backup node immediately.

ClusterControl: delayed replication

In ClusterControl, extending a Galera node group with asynchronous replication is done in a single page wizard. You need to provide the necessary information about your future or existing slave server. The slave will be set up from an existing backup, or a freshly streamed XtraBackup from the master to the slave.
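
Under the hood, the delay boils down to a single replication setting. A minimal sketch of that setting, run manually on the asynchronous slave and assuming MySQL 5.6+ or MariaDB 10.2.3+ (where MASTER_DELAY is supported):

# Keep the DR slave one hour behind its master.
mysql -e "STOP SLAVE; CHANGE MASTER TO MASTER_DELAY = 3600; START SLAVE;"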

Load balancers in multi-datacenter

Load balancers are a crucial component of MySQL and MariaDB high availability. It’s not enough to have a cluster spanning multiple data centers; you still need your services to be able to reach it. If a load balancer is available in only one data center, its failure will make your entire environment unreachable.

Web proxies in cluster environment

One of the popular methods to hide the complexity of the database layer from an application is to use a proxy. Proxies act as an entry point to the databases; they track the state of the database nodes and should always direct traffic only to the nodes that are available. ClusterControl makes it easy to deploy and configure several different load balancing technologies for MySQL and MariaDB, including ProxySQL and HAProxy, with a point-and-click graphical interface.

ClusterControl: load balancer HA

It also allows you to make this component redundant by adding Keepalived on top of it. To prevent your load balancers from being a single point of failure, one would set up two identical HAProxy, ProxySQL or MariaDB MaxScale instances (one active and one standby in a different data center) and use Keepalived to run the Virtual Router Redundancy Protocol (VRRP) between them. VRRP assigns a Virtual IP address to the active load balancer and transfers the Virtual IP to the standby in case of failure. The switchover is seamless because the two proxy instances need no shared state.

Of course, there are many things to consider to make your databases immune to data center failures.
Proper planning and automation will make it work! Happy Clustering!

by Bart Oles at May 18, 2018 12:17 PM

MariaDB Foundation

MariaDB 10.2.15 and MariaDB Connector/J 2.2.4 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.2.15, the latest stable release in the MariaDB 10.2 series, and MariaDB Connector/J 2.2.4, the latest stable release in the MariaDB Connector/J 2.2 series. See the release notes and changelogs for details. Download MariaDB 10.2.15 Release Notes Changelog What is MariaDB 10.2? MariaDB APT […]

The post MariaDB 10.2.15 and MariaDB Connector/J 2.2.4 now available appeared first on MariaDB.org.

by Ian Gilfillan at May 18, 2018 08:07 AM

MariaDB AB

MariaDB Server 10.2.15 and Connector/J 2.2.4 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.2.15 and MariaDB Connector/J 2.2.4. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.2.15

Release Notes Changelog What is MariaDB 10.2?


Download MariaDB Connector/J 2.2.4

Release Notes Changelog About MariaDB Connector/J

by dbart at May 18, 2018 05:08 AM

May 15, 2018

Peter Zaitsev

About ZFS Performance

If you are a regular reader of this blog, you likely know I like the ZFS filesystem a lot. ZFS has many very interesting features, but I am a bit tired of hearing negative statements on ZFS performance. It feels a bit like people are telling me “Why do you use InnoDB? I have read that MyISAM is faster.” I found the comparison of InnoDB vs. MyISAM quite interesting, and I’ll use it in this post.

To have some data to support my post, I started an AWS i3.large instance with a 1000GB gp2 EBS volume. A gp2 volume of this size is interesting because it is above the burst IOPS level, so it offers a constant 3000 IOPS performance level.

I used sysbench to create a table of 10M rows and then, using export/import tablespace, I copied it 329 times. I ended up with 330 tables for a total size of about 850GB. The dataset generated by sysbench is not very compressible, so I used lz4 compression in ZFS. For the other ZFS settings, I used what can be found in my earlier ZFS posts but with the ARC size limited to 1GB. I then used that plain configuration for the first benchmarks. Here are the results with the sysbench point-select benchmark, a uniform distribution and eight threads. The InnoDB buffer pool was set to 2.5GB.

In both cases, the load is IO bound. The disk is doing exactly the allowed 3000 IOPS. The above graph appears to be a clear demonstration that XFS is much faster than ZFS, right? But is that really the case? The way the dataset has been created is extremely favorable to XFS since there is absolutely no file fragmentation. Once you have all the files opened, a read IOP is just a single fseek call to an offset and XFS doesn’t need to access any intermediate inode. The above result is about as fair as saying MyISAM is faster than InnoDB based only on table scan performance results of unfragmented tables and default configuration. ZFS is much less affected by the file level fragmentation, especially for point-access workloads.

More on ZFS metadata

ZFS stores files in B-trees in a very similar fashion to how InnoDB stores data. To access a piece of data in a B-tree, you need to access the top-level page (often called the root node) and then one block per level down to the leaf node containing the data. With no cache, reading something from a three-level B-tree thus requires 3 IOPS.

Simple three-level B-tree

The extra IOPS performed by ZFS are needed to access those internal blocks in the B-trees of the files. These internal blocks are labeled as metadata. Essentially, in the above benchmark, the ARC is too small to contain all the internal blocks of the table files’ B-trees. If we continue the comparison with InnoDB, it would be like running with a buffer pool too small to contain the non-leaf pages. The test dataset I used has about 600MB of non-leaf pages, about 0.1% of the total size, which was well cached by the 3GB buffer pool. So only one InnoDB page, a leaf page, needed to be read per point-select statement.

To correctly set the ARC size to cache the metadata, you have two choices. First, you can guess values for the ARC size and experiment. Second, you can try to evaluate it by looking at the ZFS internal data. Let’s review these two approaches.

You’ll often read or hear the ratio of 1GB of ARC per 1TB of data, which is about the same 0.1% ratio as for InnoDB. I wrote about that ratio a few times, having nothing better to propose. Actually, I found it depends a lot on the recordsize used. The 0.1% ratio implies a ZFS recordsize of 128KB. A ZFS filesystem with a recordsize of 128KB will use much less metadata than one using a recordsize of 16KB because it has 8x fewer leaf pages. Fewer leaf pages require fewer B-tree internal nodes, hence less metadata. A filesystem with a recordsize of 128KB is excellent for sequential access as it maximizes compression and reduces the IOPS, but it is poor for the small random access operations like the ones MySQL/InnoDB does.

To determine the correct ARC size, you can slowly increase the ARC size and monitor the number of metadata cache-misses with the arcstat tool. Here’s an example:

# echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max
# arcstat -f time,arcsz,mm%,mhit,mread,dread,pread 10
    time  arcsz  mm%  mhit  mread  dread  pread
10:22:49   105M    0     0     0      0      0
10:22:59   113M  100     0    22     73      0
10:23:09   120M  100     0    20     68      0
10:23:19   127M  100     0    20     65      0
10:23:29   135M  100     0    22     74      0

You’ll want the ‘mm%’, the metadata miss percentage, to reach 0. So when the ‘arcsz’ column is no longer growing and you still have high values for ‘mm%’, that means the ARC is too small. Increase the value of ‘zfs_arc_max’ and continue to monitor.

If the 1GB of ARC for 1TB of data ratio is good for large ZFS recordsize, it is likely too small for a recordsize of 16KB. Does 8x more leaf pages automatically require 8x more ARC space for the non-leaf pages? Although likely, let’s verify.

The second option we have is the zdb utility that comes with ZFS, which allows us to view many internal structures including the B-tree list of pages for a given file. The tool needs the inode of a file and the ZFS filesystem as inputs. Here’s an invocation for one of the tables of my dataset:

# cd /var/lib/mysql/data/sbtest
# ls -li | grep sbtest1.ibd
36493 -rw-r----- 1 mysql mysql 2441084928 avr 15 15:28 sbtest1.ibd
# zdb -ddddd mysqldata/data 36493 > zdb5d.out
# more zdb5d.out
Dataset mysqldata/data [ZPL], ID 90, cr_txg 168747, 4.45G, 26487 objects, rootbp DVA[0]=<0:1a50452800:200> DVA[1]=<0:5b289c1600:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=3004977L/3004977P fill=26487 cksum=13723d4400:5d1f47fb738:fbfb87e6e278:1f30c12b7fa1d1
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
     36493    4    16K    16K  1.75G  2.27G   97.62  ZFS plain file
                                        168   bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 148991
        path    /var/lib/mysql/data/sbtest/sbtest1.ibd
        uid     103
        gid     106
        atime   Sun Apr 15 15:04:13 2018
        mtime   Sun Apr 15 15:28:45 2018
        ctime   Sun Apr 15 15:28:45 2018
        crtime  Sun Apr 15 15:04:13 2018
        gen     3004484
        mode    100640
        size    2441084928
        parent  36480
        links   1
        pflags  40800000004
Indirect blocks:
               0 L3    0:1a4ea58800:400 4000L/400P F=145446 B=3004774/3004774
               0  L2   0:1c83454c00:1800 4000L/1800P F=16384 B=3004773/3004773
               0   L1  0:1eaa626400:1600 4000L/1600P F=128 B=3004773/3004773
               0    L0 0:1c6926ec00:c00 4000L/c00P F=1 B=3004773/3004773
            4000    L0 EMBEDDED et=0 4000L/6bP B=3004484
            8000    L0 0:1c69270c00:400 4000L/400P F=1 B=3004773/3004773
            c000    L0 0:1c7fbae400:800 4000L/800P F=1 B=3004736/3004736
           10000    L0 0:1ce3f53600:3200 4000L/3200P F=1 B=3004484/3004484
           14000    L0 0:1ce3f56800:3200 4000L/3200P F=1 B=3004484/3004484
           18000    L0 0:18176fa600:3200 4000L/3200P F=1 B=3004485/3004485
           1c000    L0 0:18176fd800:3200 4000L/3200P F=1 B=3004485/3004485
           ...
           [more than 140k lines truncated]

The last section of the above output is very interesting as it shows the B-tree pages. The ZFS B-tree of the file sbtest1.ibd has four levels. L3 is the root page, L2 holds the first-level (from the top) pages, L1 holds the second-level pages, and L0 holds the leaf pages. The metadata is essentially L3 + L2 + L1. When you change the recordsize property of a ZFS filesystem, you affect only the size of the leaf pages.

The non-leaf pages are always 16KB (4000L) and they are always compressed on disk with lzop (if I read correctly). In the ARC, these pages are stored uncompressed, so they use 16KB of memory each. The fanout of a ZFS B-tree, the largest possible ratio of the number of pages between levels, is 128. With the above output, we can easily calculate the amount of metadata we would need to cache all the non-leaf pages in the ARC.

# grep -c L3 zdb5d.out
1
# grep -c L2 zdb5d.out
9
# grep -c L1 zdb5d.out
1150
# grep -c L0 zdb5d.out
145447

So, each of the 330 tables of the dataset has 1160 non-leaf pages and 145447 leaf pages; a ratio very close to the prediction of 0.8%. For the complete dataset of 749GB, we would need the ARC to be, at a minimum, 6GB to fully cache all the metadata pages. Of course, there is some overhead to add. In my experiments, I found I needed to add about 15% for ARC overhead in order to have no metadata reads at all. The real minimum for the ARC size I should have used is almost 7GB.
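
As a back-of-the-envelope check of that figure, the non-leaf page counts above multiply out as follows (16KB per page once uncompressed in the ARC):

echo "$(( 1160 * 16 * 330 / 1024 )) MB"    # (1+9+1150) pages * 16KB * 330 tables = ~5981 MB, roughly 6GB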

Of course, an ARC of 7GB on a server with 15GB of RAM is not small. Is there a way to do otherwise? The first option we have is to use a larger InnoDB page size, as allowed by MySQL 5.7. Instead of the regular InnoDB page size of 16KB, if you use a page size of 32KB with a matching ZFS recordsize, you will cut the ARC size requirement in half, to 0.4% of the uncompressed size.

Similarly, an InnoDB page size of 64KB with a matching ZFS recordsize would further reduce the ARC size requirement to 0.2%. That approach works best when the dataset is highly compressible. I’ll blog more about the use of larger InnoDB pages with ZFS in the near future. If the use of larger InnoDB page sizes is not a viable option for you, you still have the option of using the ZFS L2ARC feature to save on the required memory.
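
As a sketch only, the alignment involves two pieces: the recordsize of the ZFS dataset and the InnoDB page size chosen at initialization. The dataset name follows the zdb example above, and since recordsize only applies to newly written files, the datadir would have to be rebuilt:

zfs set recordsize=32k mysqldata/data
mysqld --initialize --innodb-page-size=32k --datadir=/var/lib/mysql/data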

So, let’s propose a new rule of thumb for the required ARC/L2ARC size for a given dataset:

  • Recordsize of 128KB => 0.1% of the uncompressed dataset size
  • Recordsize of 64KB => 0.2% of the uncompressed dataset size
  • Recordsize of 32KB => 0.4% of the uncompressed dataset size
  • Recordsize of 16KB => 0.8% of the uncompressed dataset size

The ZFS revenge

In order to improve ZFS performance, I had 3 options:

  1. Increase the ARC size to 7GB
  2. Use a larger Innodb page size like 64KB
  3. Add a L2ARC

I was reluctant to grow the ARC to 7GB, which was nearly half the overall system memory. At best, the ZFS performance would only match XFS. A larger InnoDB page size would increase the CPU load for decompression on an instance with only two vCPUs; not great either. The last option, the L2ARC, was the most promising.

The choice of an i3.large instance type is not accidental. The instance has a 475GB ephemeral NVMe storage device. Let’s try to use this storage for the ZFS L2ARC. The warming of an L2ARC device is not exactly trivial. In my case, with a 1GB ARC, I used:

echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max
echo 838860800 > /sys/module/zfs/parameters/zfs_arc_meta_limit
echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max
echo 134217728 > /sys/module/zfs/parameters/l2arc_write_boost
echo 4 > /sys/module/zfs/parameters/l2arc_headroom
echo 16 > /sys/module/zfs/parameters/l2arc_headroom_boost
echo 0 > /sys/module/zfs/parameters/l2arc_norw
echo 1 > /sys/module/zfs/parameters/l2arc_feed_again
echo 5 > /sys/module/zfs/parameters/l2arc_feed_min_ms
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
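
These tunables only control how the L2ARC is fed; the cache device itself is attached to the pool with a single command. A sketch under assumptions: the pool is named mysqldata and the ephemeral NVMe device of the i3.large instance shows up as /dev/nvme0n1 (device naming varies):

zpool add mysqldata cache /dev/nvme0n1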

I then ran ‘cat /var/lib/mysql/data/sbtest/* > /dev/null’ to force filesystem reads and caches on all of the tables. A key setting here to allow the L2ARC to cache data is the zfs_arc_meta_limit. It needs to be slightly smaller than the zfs_arc_max in order to allow some data to be cached in the ARC. Remember that the L2ARC is fed by the LRU of the ARC. You need to cache data in the ARC in order to have data cached in the L2ARC. Using lz4 in ZFS on the sysbench dataset results in a compression ratio of only 1.28x. A more realistic dataset would compress by more than 2x, if not 3x. Nevertheless, since the content of the L2ARC is compressed, the 475GB device caches nearly 600GB of the dataset. The figure below shows the sysbench results with the L2ARC enabled:

Now, the comparison is very different. ZFS completely outperforms XFS, 5000 qps for ZFS versus 3000 for XFS. The ZFS results could have been even higher but the two vCPUs of the instance were clearly the bottleneck. Properly configured, ZFS can be pretty fast. Of course, I could use flashcache or bcache with XFS and improve the XFS results but these technologies are way more exotic than the ZFS L2ARC. Also, only the L2ARC stores data in a compressed form, maximizing the use of the NVMe device. Compression also lowers the size requirement and cost for the gp2 disk.

ZFS is much more complex than XFS and EXT4 but, that also means it has more tunables/options. I used a simplistic setup and an unfair benchmark which initially led to poor ZFS results. With the same benchmark, very favorable to XFS, I added a ZFS L2ARC and that completely reversed the situation, more than tripling the ZFS results, now 66% above XFS.

Conclusion

We have seen in this post why the general perception is that ZFS under-performs compared to XFS or EXT4. The presence of B-trees for the files has a big impact on the amount of metadata ZFS needs to handle, especially when the recordsize is small. The metadata consists mostly of the non-leaf pages (or internal nodes) of the B-trees. When properly cached, the performance of ZFS is excellent. ZFS allows you to optimize the use of EBS volumes, both in terms of IOPS and size, when the instance has fast ephemeral storage devices. Using the ephemeral device of an i3.large instance for the ZFS L2ARC, ZFS outperformed XFS by 66%.

The post About ZFS Performance appeared first on Percona Database Performance Blog.

by Yves Trudeau at May 15, 2018 06:59 PM

Jean-Jerome Schmidt

Updated: Become a ClusterControl DBA: Safeguarding your Data

In the past four posts of the blog series, we covered deployment of clustering/replication (MySQL/Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, and in the last post, how to make your setup highly available through HAProxy and ProxySQL.

So now that you have your databases up and running and highly available, how do you ensure that you have backups of your data?

You can use backups for multiple things: disaster recovery, providing production data to test or development environments, or even provisioning a slave node. This last case is already covered by ClusterControl. When you add a new (replica) node to your replication setup, ClusterControl will make a backup/snapshot of the master node and use it to build the replica. It can also use an existing backup to stage the replica, in case you want to avoid that extra load on the master. After the backup has been extracted, prepared and the database is up and running, ClusterControl will automatically set up replication.

Creating an Instant Backup

In essence, creating a backup is the same for Galera, MySQL replication, PostgreSQL and MongoDB. You can find the backup section under ClusterControl > Backup, and by default you will see a list of the backups created for the cluster (if any). Otherwise, you will see a placeholder to create a backup:

From here you can click on the "Create Backup" button to make an instant backup or schedule a new backup:

All created backups can also be uploaded to the cloud by toggling "Upload Backup to the Cloud", provided you supply working cloud credentials. By default, all backups older than 31 days will be deleted (configurable via the Backup Retention settings), or you can choose to keep them forever or define a custom period.

"Create Backup" and "Schedule Backup" share similar options except the scheduling part and incremental backup options for the latter. Therefore, we are going to look into Create Backup feature (a.k.a instant backup) in more depth.

As all these various databases have different backup tools, there is obviously some difference in the options you can choose. For instance with MySQL you get to choose between mysqldump and xtrabackup (full and incremental). For MongoDB, ClusterControl supports mongodump and mongodb-consistent-backup (beta), while for PostgreSQL, pg_dump and pg_basebackup are supported. If in doubt about which one to choose for MySQL, check out this blog about the differences and use cases for mysqldump and xtrabackup.

Backing up MySQL and Galera

As mentioned in the previous paragraph, you can make MySQL backups using either mysqldump or xtrabackup (full or incremental). In the "Create Backup" wizard, you can choose which host you want to run the backup on, the location where you want to store the backup files, and its directory and specific schemas (xtrabackup) or schemas and tables (mysqldump).

If the node you are backing up is receiving (production) traffic, and you are afraid the extra disk writes will become intrusive, it is advised to send the backups to the ClusterControl host by choosing the "Store on Controller" option. This will cause the backup to stream the files over the network to the ClusterControl host, and you have to make sure there is enough space available on this node and that the streaming port is open on the ClusterControl host.

There are also several other options, such as whether to use compression and which compression level to apply. The higher the compression level, the smaller the backup will be. However, it requires more CPU for the compression and decompression process.

If you choose xtrabackup as the backup method, extra options open up: desync, backup locks, compression and xtrabackup parallel threads/gzip. The desync option is only applicable for desyncing a node from a Galera cluster. Backup locks use a new MDL lock type to block updates to non-transactional tables and DDL statements for all tables, which is more efficient for InnoDB-specific workloads. If you are running a Galera Cluster, enabling this option is recommended.

After scheduling an instant backup you can keep track of the progress of the backup job in the Activity > Jobs:

After it has finished, you should be able to see a new entry under the backup list.

Backing up PostgreSQL

Similar to the instant backups of MySQL, you can run a backup on your Postgres database. With Postgres backups there are two backup methods supported - pg_dumpall or pg_basebackup. Take note that ClusterControl will always perform a full backup regardless of the chosen backup method.

We have covered this aspect in this details in Become a PostgreSQL DBA - Logical & Physical PostgreSQL Backups.

Backing up MongoDB

For MongoDB, ClusterControl supports the standard mongodump and mongodb-consistent-backup developed by Percona. The latter is still in beta version which provides cluster-consistent point-in-time backups of MongoDB suitable for sharded cluster setups. As the sharded MongoDB cluster consists of multiple replica sets, a config replica set and shard servers, it is very difficult to make a consistent backup using only mongodump.

Note that in the wizard, you don't have to pick a database node to be backed up. ClusterControl will automatically pick the healthiest secondary replica as the backup node. Otherwise, the primary will be selected. When the backup is running, the selected backup node will be locked until the backup process completes.

Scheduling Backups

Now that we have played around with creating instant backups, we can extend that by scheduling backups.

The scheduling is very easy to do: you can select on which days the backup has to be made and at what time it needs to run.

For xtrabackup there is an additional feature: incremental backups. An incremental backup will only back up the data that changed since the last backup. Of course, incremental backups are useless without a full backup as a starting point. Between two full backups, you can have as many incremental backups as you like, but restoring them will take longer.

Once scheduled, the job(s) should become visible under the "Scheduled Backup" tab, and you can edit them by clicking on the "Edit" button. Like with the instant backups, these jobs will schedule the creation of a backup and you can keep track of the progress via the Activity tab.

Backup List

You can find the Backup List under ClusterControl > Backup and this will give you a cluster level overview of all backups made. Clicking on each entry will expand the row and expose more information about the backup:

Each backup is accompanied by a backup log from when ClusterControl executed the job, which is available under the "More Actions" button.

Offsite Backup in Cloud

Since we now have a lot of backups stored either on the database hosts or on the ClusterControl host, we also want to ensure they don’t get lost in case we face a total infrastructure outage (e.g. a data center on fire or flooded). Therefore ClusterControl allows you to store or copy your backups offsite in the cloud. The supported cloud platforms are Amazon S3, Google Cloud Storage and Azure Cloud Storage.

The upload process happens right after the backup is successfully created (if you toggle "Upload Backup to the Cloud") or you can manually click on the cloud icon button of the backup list:

Choose the cloud credential and specify the backup location accordingly:

Restore and/or Verify Backup

From the Backup List interface, you can directly restore a backup to a host in the cluster by clicking on the "Restore" button for the particular backup or click on the "Restore Backup" button:

One nice feature is that it is able to restore a node or cluster using full and incremental backups, as it keeps track of the last full backup made and starts the incremental backups from there. It then groups a full backup together with all incremental backups up to the next full backup. This allows you to restore starting from the full backup and applying the incremental backups on top of it.

ClusterControl supports restore on an existing database node or restore and verify on a new standalone host:

These two options are pretty similar, except the verify one has extra options for the new host information. If you follow the restoration wizard, you will need to specify a new host. If "Install Database Software" is enabled, ClusterControl will remove any existing MySQL installation on the target host and reinstall the database software with the same version as the existing MySQL server.

Once the backup is restored and verified, you will receive a notification on the restoration status and the node will be shut down automatically.

Point-in-Time Recovery

For MySQL, both xtrabackup and mysqldump can be used to perform point-in-time recovery and also to provision a new replication slave for master-slave replication or Galera Cluster. A mysqldump PITR-compatible backup contains one single dump file, with GTID info, binlog file and position. Thus, only the database node that produces binary logs will have the "PITR compatible" option available:

When the PITR-compatible option is toggled, the database and table fields are greyed out, since ClusterControl will always perform a full backup of all databases, events, triggers and routines of the target MySQL server.

Now restoring the backup. If the backup is compatible with PITR, an option will be presented to perform a Point-In-Time Recovery. You will have two options for that - “Time Based” and “Position Based”. For “Time Based”, you can just pass the day and time. For “Position Based”, you can pass the exact position to where you want to restore. It is a more precise way to restore, although you might need to get the binlog position using the mysqlbinlog utility. More details about point in time recovery can be found in this blog.

Backup Encryption

Universally, ClusterControl supports backup encryption for MySQL, MongoDB and PostgreSQL. Backups are encrypted at rest using the AES-256 CBC algorithm. An auto-generated key will be stored in the cluster's configuration file under /etc/cmon.d/cmon_X.cnf (where X is the cluster ID):

$ sudo grep backup_encryption_key /etc/cmon.d/cmon_1.cnf
backup_encryption_key='JevKc23MUIsiWLf2gJWq/IQ1BssGSM9wdVLb+gRGUv0='

If the backup destination is not local, the backup files are transferred in encrypted format. This feature complements the offsite backup on cloud, where we do not have full access to the underlying storage system.

Final Thoughts

We showed you how to get your data backed up and how to store it safely offsite. Recovery is always a different thing. ClusterControl can automatically recover your databases from backups made in the past that are stored on premises or copied back from the cloud.

Obviously there is more to securing your data, especially on the side of securing your connections. We will cover this in the next blog post!

by ashraf at May 15, 2018 05:33 AM

May 14, 2018

Peter Zaitsev

Installing MySQL 8.0 on Ubuntu 16.04 LTS in Five Minutes

Do you want to install MySQL 8.0 on Ubuntu 16.04 LTS? In this quick tutorial, I show you exactly how to do it in five minutes or less.

This tutorial assumes you don’t have MySQL or MariaDB installed. If you do, it’s necessary to uninstall them or follow a slightly more complicated upgrade process (not covered here).

Step 1: Install MySQL APT Repository

Ubuntu 16.04 LTS, also known as Xenial, comes with a choice of MySQL 5.7 and MariaDB 10.0.

If you want to use MySQL 8.0, you need to install the MySQL/Oracle Apt repository first:

wget https://dev.mysql.com/get/mysql-apt-config_0.8.10-1_all.deb
dpkg -i mysql-apt-config_0.8.10-1_all.deb

The MySQL APT repository installation package allows you to pick which MySQL version you want to install, as well as whether you want access to Preview Versions. Let’s leave them all at the defaults:

Installing MySQL 8.0 on Ubuntu

Step 2: Update repository configuration and install MySQL Server

apt-get update
apt-get install mysql-server

Note: Do not forget to run “apt-get update”, otherwise you can get an old version of MySQL from Ubuntu repository installed.

The installation process asks you to set a password for the root user:

Installing MySQL 8.0 on Ubuntu 1

I recommend you set a root password for increased security. If you do not set a password for the root account, “auth_socket” authentication is enabled. This ensures only the operating system’s “root” user can connect to MySQL Server without a password.

Next, the installation script asks you whether to use Strong Password Encryption or Legacy Authentication:

Installing MySQL 8.0 on Ubuntu 2

While using Strong Password Encryption is recommended for security purposes, not all applications and drivers support this new authentication method yet. Going with Legacy Authentication is a safer choice.
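
If you do pick Strong Password Encryption and later hit a driver that cannot handle it, you can still create individual accounts with the legacy plugin; a hypothetical example (user name and password are placeholders):

mysql -u root -p -e "CREATE USER 'app'@'%' IDENTIFIED WITH mysql_native_password BY 'secret';"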

All Done

You should have MySQL 8.0 Server running. You can test it by connecting to it with a command line client:

Installing MySQL 8.0 on Ubuntu 3
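
For example, a quick check like the following should report an 8.0.x version string (using the root password you set during installation):

mysql -u root -p -e "SELECT VERSION();"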

As you can see, it takes just a few simple steps to install MySQL 8.0 on Ubuntu 16.04 LTS.

Installing MySQL 8.0 on Ubuntu 16.04 LTS is easy. Go ahead and give it a try!

The post Installing MySQL 8.0 on Ubuntu 16.04 LTS in Five Minutes appeared first on Percona Database Performance Blog.

by Peter Zaitsev at May 14, 2018 05:27 PM

Shlomi Noach

MySQL master discovery methods, part 5: Service discovery & Proxy

This is the fifth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which replication failure detection and recovery take place. I will share orchestrator-specific configuration/advice, and point out where a cross-DC orchestrator/raft setup plays a part in discovery itself, but for the most part any recovery tool, such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via Service discovery and Proxy

Part 4 presented an anti-pattern setup, where a proxy would infer the identity of the master by drawing conclusions from backend server checks. This led to split brains and undesired scenarios. The problem was the loss of context.

We re-introduce a service discovery component (illustrated in part 3), such that:

  • The app does not own the discovery, and
  • The proxy behaves in an expected and consistent way.

In a failover/service discovery/proxy setup, there is clear ownership of duties:

  • The failover tool owns the failover itself and the notification of the master identity change.
  • The service discovery component is the source of truth as to the identity of the master of a cluster.
  • The proxy routes traffic but does not make routing decisions.
  • The app only ever connects to a single target, but should allow for a brief outage while failover takes place.

Depending on the technologies used, we can further achieve:

  • A hard cut of connections to the old, demoted master M.
  • Blocking/holding off incoming queries for the duration of the failover.

We explain the setup using the following assumptions and scenarios:

  • All clients connect to master via cluster1-writer.example.net, which resolves to a proxy box.
  • We fail over from master M to promoted replica R.

A non planned failover illustration #1

Master M has died, the box had a power failure. R gets promoted in its place. Our recovery tool:

  • Updates service discovery component that R is the new master for cluster1.

The proxy:

  • Either actively or passively learns that R is the new master, rewires all writes to go to R.
  • If possible, kills existing connections to M.

The app:

  • Needs to know nothing. Its connections to M fail, it reconnects and gets through to R.

A non planned failover illustration #2

Master M gets network isolated for 10 seconds, during which time we failover. R gets promoted.

Everything is as before.

If the proxy kills existing connections to M, then the fact that M is back alive becomes meaningless. No one gets through to M. Clients were never aware of its identity anyhow, just as they are unaware of R's identity.

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R.

  • In the process of promotion, M turned read-only.
  • Immediately following promotion, our failover tool updates service discovery.
  • Proxy reloads having seen the changes in service discovery.
  • Our app connects to R.

Discussion

This is a setup we use at GitHub in production. Our components are:

  • orchestrator for failover tool.
  • Consul for service discovery.
  • GLB (HAProxy) for the proxy.
  • Consul template running on proxy hosts (a minimal invocation is sketched right after this list):
    • listening for changes to Consul's KV data
    • regenerating the haproxy.cfg configuration file
    • reloading haproxy
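
The gluing piece on the proxy hosts is the consul-template process. A minimal sketch of such an invocation is shown below; the template path and reload command are assumptions, not the exact GitHub setup - adjust them to your own haproxy.cfg template and init system:

consul-template -template "/etc/haproxy/haproxy.ctmpl:/etc/haproxy/haproxy.cfg:systemctl reload haproxy"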

As mentioned earlier, the apps need not change anything. They connect to a name that is always resolved to proxy boxes. There is never a DNS change.

At the time of failover, the service discovery component must be up and available, to catch the change. Otherwise we do not strictly require it to be up at all times.

For high availability we will have multiple proxies, each of which must listen for changes to the K/V store. Ideally the name (cluster1-writer.example.net in our example) resolves to any available proxy box.

  • This, in itself, is a high availability issue. Thankfully, managing the HA of a proxy layer is simpler than that of a MySQL layer. Proxy servers tend to be stateless and equal to each other.
  • See GLB as one example for a highly available proxy layer. Cloud providers, Kubernetes, two level layered proxies, Linux Heartbeat, are all methods to similarly achieve HA.

See also:

Sample orchestrator configuration

An orchestrator configuration would look like this:

  "ApplyMySQLPromotionAfterMasterFailover": true,
  "KVClusterMasterPrefix": "mysql/master",
  "ConsulAddress": "127.0.0.1:8500",
  "ZkAddress": "srv-a,srv-b:12181,srv-c",
  "PostMasterFailoverProcesses": [
    "/just/let/me/know about failover on {failureCluster}",
  ],

In the above:

  • If ConsulAddress is specified, orchestrator will update given Consul setup with K/V changes.
  • At 3.0.10, ZooKeeper, via ZkAddress, is still not supported by orchestrator.
  • PostMasterFailoverProcesses is here just to point out hooks are not strictly required for the operation to run.

See orchestrator configuration documentation.
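
If you want to see what the proxies will consume, you can inspect the key orchestrator maintains in Consul. This is a hypothetical check assuming the KVClusterMasterPrefix above and a cluster named cluster1; the exact key layout depends on your orchestrator version:

consul kv get mysql/master/cluster1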

All posts in this series

by shlomi at May 14, 2018 08:08 AM

May 12, 2018

MariaDB AB

Streaming Data From MariaDB Server Into MariaDB ColumnStore via MariaDB MaxScale

In this blog post, we look at how to configure Change Data Capture (CDC) from the MariaDB Server to
MariaDB ColumnStore via MariaDB MaxScale. Our goal in this blog post is to have our analytical
ColumnStore instance reflect the changes that happen on our operational MariaDB Server.

MariaDB MaxScale Configuration

We start by creating a MaxScale configuration with binlogrouter and avrorouter instances. The
former acts as a replication slave and fetches binary logs, and the latter processes the binary logs
into CDC records.

[replication-router]
type=service
router=binlogrouter
user=maxuser
passwd=maxpwd
server_id=2
master_id=1
binlogdir=/var/lib/maxscale
mariadb10-compatibility=1
filestem=mariadb-bin

[replication-listener]
type=listener
service=replication-router
protocol=MySQLClient
port=3306

[avro-router]
type=service
router=avrorouter
source=replication-router
avrodir=/var/lib/maxscale

[avro-listener]
type=listener
service=avro-router
protocol=cdc
port=4001

Copy the contents of this file into the `maxscale.cnf` file.

The docker-compose.yml File

The next step is to clone the MaxScale repository and to create the docker-compose file.

To clone the MaxScale repository, execute the following command.

git clone https://github.com/mariadb-corporation/MaxScale.git --branch=2.2 --depth=1

After the command completes, create the `docker-compose.yml` file with the following contents in the
same directory where you cloned MaxScale.
 

version: '2'
services:
    master:
        image: mariadb:10.2
        container_name: master
        environment:
            MYSQL_ALLOW_EMPTY_PASSWORD: Y
        command: mysqld --log-bin=mariadb-bin --binlog-format=ROW --server-id=1
        ports:
            - "3306:3306"

    maxscale:
        build: ./MaxScale/docker/
        container_name: maxscale
        volumes:
            - ./maxscale.cnf:/etc/maxscale.cnf.d/maxscale.cnf
        ports:
            - "3307:3306"
            - "4001:4001"

    mcs:
        image: mariadb/columnstore_singlenode:latest
        container_name: mcs
        ports:
            - "3308:3306"

    adapter:
        image: centos:7
        container_name: adapter
        command: /bin/sleep 0xffffffff

This file contains a MariaDB Server that acts as the master server, a MaxScale instance in a CDC
configuration and a single-node ColumnStore container. We also use a plain CentOS 7 container where
we install the adapter.

To start the cluster, run the following commands.

docker-compose build
docker-compose up -d

Configuring

The next step is to copy the ColumnStore configuration file from the `mcs` container and modify it
to use the container hostname instead of the loopback address. To do this, execute the following
commands.

docker cp mcs:/usr/local/mariadb/columnstore/etc/Columnstore.xml .
sed -i 's/127.0.0.1/mcs/' Columnstore.xml
docker cp Columnstore.xml adapter:/etc/Columnstore.xml

After we have copied the configuration file into the `adapter` container, we are ready to install the adapter.

Installing Adapter

To access the container, execute `docker-compose exec adapter bash`. This will launch a new shell
where the following commands will be executed.

yum -y install epel-release
yum -y install https://downloads.mariadb.com/Data-Adapters/mariadb-columnstore-api/1.1.3/centos/x86_64/7/mariadb-columnstore-api-1.1.3-1-x86_64-centos7.rpm
curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
yum -y install https://downloads.mariadb.com/Data-Adapters/mariadb-streaming-data-adapters/cdc-data-adapter/1.1.3/centos-7/mariadb-columnstore-maxscale-cdc-adapters-1.1.3-1-x86_64-centos7.rpm

After the adapter is installed, exit the shell.

Next we can start preparing the data on the master server and configure the replication between it
and MaxScale.

Preparing Data and Configuring Replication

We connect to the MariaDB Server running on the `master` container with the following command.

mysql -uroot -h 127.0.0.1 -P 3306

Once connected, execute the following SQL. This will prepare the server, create a table and insert
some dummy data into the table. It also modifies the data to emulate changes in the database.

RESET MASTER;
CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxpwd';
GRANT ALL ON *.* TO 'maxuser'@'%';
CREATE DATABASE test;
USE test;
CREATE TABLE t1(id INT);
INSERT INTO t1 VALUES (1), (2), (3);
UPDATE t1 SET id = 4 WHERE id = 2;
DELETE FROM t1 WHERE id = 3;

Once we have created some data, we configure the replication between MaxScale and the master
server. To do this, execute the following command.

mysql -umaxuser -pmaxpwd -h 127.0.0.1 -P 3307 -e "CHANGE MASTER TO MASTER_HOST='master', MASTER_PORT=3306, MASTER_USER='maxuser', MASTER_PASSWORD='maxpwd', MASTER_LOG_FILE='mariadb-bin.000001', MASTER_LOG_POS=4; START SLAVE"

MaxScale will start to replicate events from the master server and process them into CDC records.
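
One quick way to confirm that replication into MaxScale is running is to query the binlogrouter itself, which answers a subset of the usual replication admin commands; this assumes the listener on port 3307 defined in the docker-compose file above:

mysql -umaxuser -pmaxpwd -h 127.0.0.1 -P 3307 -e "SHOW SLAVE STATUS\G"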

Create CDC User

To use the CDC system in MaxScale, we have to create a user for it. Execute the following command to create a user.

docker-compose exec maxscale maxctrl call command cdc add_user avro-router cdcuser cdcpassword

Starting the Adapter

We again execute the commands inside the adapter container. To access the container, execute
`docker-compose exec adapter bash`.

Once inside the container, we can try to start the adapter. Given that the table `test.t1` does not
exist on ColumnStore, the adapter will give us an error when we try to start it:
 

[root@d444d5c5b820 /]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h maxscale -P 4001 test t1
Table not found, create with:

    CREATE TABLE test.t1 (domain int, event_number int, event_type varchar(50), id int, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;

To create the table on ColumnStore, we have to exit the container. Once out of the container, we
connect to the ColumnStore container and create the table described in the error message with the
following command.

mysql -uroot -h 127.0.0.1 -P 3308 -e "CREATE TABLE test.t1 (domain int, event_number int, event_type varchar(50), id int, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;"

Once the table is created, we go back into the adapter container with `docker-compose exec adapter
bash` and try to start it again.

[root@d444d5c5b820 /]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h maxscale -P 4001 test t1
4 rows and 1 transactions inserted in 0.210798 seconds. GTID = 0-1-6
2 rows and 1 transactions inserted in 0.164197 seconds. GTID = 0-1-7

This time we see that it processed a total of six rows of data. We can now connect to the
ColumnStore on another terminal and see what the table contains.

[markusjm@localhost blog]$ mysql -uroot -h 127.0.0.1 -P 3308 -e "SELECT * FROM test.t1"
+--------+--------------+---------------+------+----------+-----------+------------+
| domain | event_number | event_type    | id   | sequence | server_id | timestamp  |
+--------+--------------+---------------+------+----------+-----------+------------+
|      0 |            1 | insert        |    1 |        5 |         1 | 1523948280 |
|      0 |            2 | insert        |    2 |        5 |         1 | 1523948280 |
|      0 |            3 | insert        |    3 |        5 |         1 | 1523948280 |
|      0 |            1 | update_before |    2 |        6 |         1 | 1523948280 |
|      0 |            2 | update_after  |    4 |        6 |         1 | 1523948280 |
|      0 |            1 | delete        |    3 |        7 |         1 | 1523948281 |
+--------+--------------+---------------+------+----------+-----------+------------+

The changes we made on the master MariaDB Server have been propagated to ColumnStore. To understand
what the values are, we can map the SQL statements to the rows in the table.

The first SQL statement is `INSERT INTO t1 VALUES (1), (2), (3);` which inserts three values into
the table. We see that the first three rows in the resultset are of type `insert` and the values
match what we inserted.

The next SQL statement is `UPDATE t1 SET id = 4 WHERE id = 2;` which only touches one row. Although
it modifies only one row in the database, it generated two rows in ColumnStore. This happened
because the MaxScale CDC system stores both the before and after images of the modified row. This
allows easy comparisons between new and old values.

The final SQL statement was `DELETE FROM t1 WHERE id = 3;` which deleted one row. This statement was
converted to a delete entry with the data that was deleted (row with `id` of 3). This allows deleted
data to be retained for analytical and auditing purposes without actually storing it on the master
database.

by markusmakela at May 12, 2018 03:04 AM

May 11, 2018

Peter Zaitsev

This Week in Data with Colin Charles 39: a valuable time spent at rootconf.in

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

rootconf.in 2018 just ended, and it was very enjoyable to be in Bangalore for the conference. The audience was large, the conversations were great, and overall I think this is a rather important conference if you’re into the “DevOps” movement (or are a sysadmin!). From the data store world, Oracle MySQL was a sponsor, as was MyDBOPS (blog), and Elastic. There were plenty more, including Digital Ocean/GoJek/Walmart Labs — many MySQL users.

I took a handful of pictures with people, and here are some of the MyDBOPS team and myself. They have over 20 employees, and serve the Indian market at rates more palatable than straight-up USD rates. Traveling through Asia, many businesses do find local partners and offer local pricing; this becomes more complex in the SaaS space (where everyone generally pays the same rate) and also in the services space.

Colin at Rootconf with Oracle
Some of the Oracle MySQL team who were exhibiting were very happy they got a good amount of traffic to the booth based on stuff discussed at the talk and BOF.

From a talk standpoint, I did a keynote for an hour and also a BoF session for another hour (great discussion, lots of blog post ideas from there), and we had a Q&A session for about 15 minutes. There were plenty of good conversations in the hallway track.

A quick observation that I notice happens everywhere: many people don’t realize what features have existed in MySQL since 5.6/5.7, so they are truly surprised by the stuff in 8.0 as well. It is clear there is a huge market that would thrive around education, not just around feature checklists, but also around how to use features. Sometimes this feels like the MySQL of the mid-2000s; getting apps to also use the new features would be a great thing.

Releases

This seems to have been a quiet week on the releases front.

Are you a user of Amazon Aurora MySQL? There is now the Amazon Aurora Backtrack feature, which allows you to go back in time. It is described as working as follows:

Aurora uses a distributed, log-structured storage system (read Design Considerations for High Throughput Cloud-Native Relational Databases to learn a lot more); each change to your database generates a new log record, identified by a Log Sequence Number (LSN). Enabling the backtrack feature provisions a FIFO buffer in the cluster for storage of LSNs. This allows for quick access and recovery times measured in seconds.

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 

The post This Week in Data with Colin Charles 39: a valuable time spent at rootconf.in appeared first on Percona Database Performance Blog.

by Colin Charles at May 11, 2018 03:42 PM

Jean-Jerome Schmidt

My MySQL Database is Corrupted... What Do I Do Now?

How do MySQL tables get corrupted? There are many ways to spoil data files. Often, corruption is due to defects in the underlying platform, which MySQL relies on to store and retrieve data - disk subsystem, controllers, communication channels, drivers, firmware or other hardware faults. Data corruption can also occur if the MySQL server daemon restarts suddenly, or your server reboots due to a crash of other OS components. If the database instance was in the middle of writing data to disk, it could write the data partially which may end up with a page checksum that is different than expected. There have also been bugs in MySQL so even if the server hardware is ok, MySQL itself can cause corruption.

Usually when MySQL data gets corrupted, the recommendation is to restore it from the last backup, switch to a DR server or, if you have a Galera cluster, take down the affected node so data is served immediately from the other nodes. In some cases you can't - the backup is not there, the cluster was never set up, your replication has been down for a very long time, or the DR procedure was never tested. Even if you have a backup, you may still want to take some actions to attempt recovery, as it may take less time to get back online.

MyISAM, the bad and ugly

InnoDB is more fault-tolerant than MyISAM. It has auto-recovery features and is much safer compared to the older MyISAM engine.

MyISAM tables can easily get corrupted when they receive lots of writes and a lot of locks are taken on them. The storage engine "writes" data to the filesystem cache, and it may take some time before it is flushed to disk. Therefore if your server restarts suddenly, some unknown amount of data in the cache is lost. That's a usual way for MyISAM data to be corrupted. The recommendation is to migrate from MyISAM to InnoDB, but there may be cases where this is not possible.

Primum non nocere, the backup

Before you attempt to repair corrupted tables, you should back up your database files first. Yes, the data is already broken, but this minimizes the risk of further damage that may be caused by a recovery operation. There is no guarantee that any action you take will not harm untouched data blocks. Forcing InnoDB recovery with values greater than 4 can corrupt data files, so make sure you do it with a prior backup and ideally on a separate physical copy of the database.

To back up all of the files from all of your databases, follow these steps:

Stop the MySQL server

service mysqld stop

Type the following command for your datadir.

cp -r /var/lib/mysql /var/lib/mysql_bkp

After we have a backup copy of the data directory, we are ready to start troubleshooting.

Data corruption identification

The error log is your best friend. Usually, when data corruption happens, you will find relevant information (including links to documentation) in the error log. If you don't know where it's located, check my.cnf and the variable log_error; for more details check this article: https://dev.mysql.com/doc/refman/8.0/en/error-log-destination-configuration.html. You should also know your storage engine type. You can find this information in the error log or in information_schema.

mysql> select table_name,engine from information_schema.tables where table_name = '<TABLE>' and table_schema = '<DATABASE>';

The main tools/commands to diagnose issues with data corruption are CHECK TABLE, REPAIR TABLE, and myisamchk. The mysqlcheck client performs table maintenance: It checks, repairs (MyISAM), optimizes or analyzes tables while MySQL is running.

mysqlcheck -uroot -p <DATABASE>

Replace DATABASE with the name of the database, and replace TABLE with the name of the table that you want to check:

mysqlcheck -uroot -p <DATABASE> <TABLE>

Mysqlcheck checks the specified database and tables. If a table passes the check, mysqlcheck displays OK for the table.

employees.departments                              OK
employees.dept_emp                                 OK
employees.dept_manager                             OK
employees.employees                                OK
Employees.salaries
Warning  : Tablespace is missing for table 'employees/salaries'
Error    : Table 'employees.salaries' doesn't exist in engine
status   : Operation failed
employees.titles                                   OK
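
You can also run the same kind of check from the mysql client with CHECK TABLE; it works for both InnoDB and MyISAM tables (the table names below are from the sample employees database used in this post):

mysql> CHECK TABLE employees.titles;
mysql> CHECK TABLE employees.salaries EXTENDED;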

Data corruption issues may also be related to permission problems. In some cases, the OS remounts the filesystem read-only due to I/O errors, or a user accidentally changes ownership of the data files. In such cases, you will find relevant information in the error log.

[root@node1 employees]# ls -rtla
...
-rw-rw----. 1 mysql mysql  28311552 05-10 06:24 titles.ibd
-rw-r-----. 1 root  root  109051904 05-10 07:09 salaries.ibd
drwxr-xr-x. 7 mysql mysql      4096 05-10 07:12 ..
drwx------. 2 mysql mysql      4096 05-10 07:17 .

MySQL Client

MariaDB [employees]> select count(*) from salaries;
ERROR 1932 (42S02): Table 'employees.salaries' doesn't exist in engine

Error log entry

2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Failed to find tablespace for table `employees`.`salaries` in the cache. Attempting to load the tablespace with space id 9
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Operating system error number 13 in a file operation.
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: The error means mysqld does not have the access rights to the directory.
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Cannot open datafile for read-only: './employees/salaries.ibd' OS error: 81
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Operating system error number 13 in a file operation.
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: The error means mysqld does not have the access rights to the directory.
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Could not find a valid tablespace file for `employees/salaries`. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
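
In a case like this, no InnoDB repair is needed at all; restoring ownership of the file and restarting MySQL is usually enough. Assuming the default datadir and that mysqld runs as the mysql user, as in the listing above:

chown mysql:mysql /var/lib/mysql/employees/salaries.ibd
service mysqld restart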

Recovering InnoDB table

If you are using the InnoDB storage engine for a database table, you can run the InnoDB recovery process.
To enable it, MySQL needs the innodb_force_recovery option to be set. innodb_force_recovery forces InnoDB to start up while preventing background operations from running, so that you can dump your tables.

To do this, open my.cnf and add the following line to the [mysqld] section:

[mysqld]
innodb_force_recovery=1

Then restart the MySQL server:

service mysqld restart

Start from innodb_force_recovery=1. If you are able to dump your tables with an innodb_force_recovery value of 3 or less, then you are relatively safe. In many cases you will have to go up to 4, and as you already know, that can corrupt data.

If needed, increase the value; 6 is the maximum and the most dangerous.

Once you are able to start your database, type the following command to export all of the databases to the dump.sql file:

mysqldump --all-databases --add-drop-database --add-drop-table > dump.sql
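
If the instance is too damaged to dump everything at once, you can also try dumping only the affected database, or even a single table (the names are placeholders, as elsewhere in this post):

mysqldump --add-drop-table <DATABASE> > database_dump.sql
mysqldump <DATABASE> <TABLE> > table_dump.sql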

Start mysql, and then try to drop the affected database or databases using the DROP DATABASE command. If MySQL is unable to drop a database, you can delete it manually using the steps below after you stop the MySQL server.

service mysqld stop

If you were unable to drop a database, type the following commands to delete it manually.

cd /var/lib/mysql
rm -rf <DATABASE>

Make sure you do not delete the internal database directories (such as mysql and performance_schema).
After you are done, comment out the following line in the [mysqld] section to disable InnoDB recovery mode.

#innodb_force_recovery=...

Save the changes to the my.cnf file, and then start the MySQL server

service mysqld start

Type the following command to restore the databases from the dump file you created earlier:

mysql> tee import_database.log
mysql> source dump.sql
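
Alternatively, you can load the dump directly from the shell and keep the output in a log file:

mysql -uroot -p < dump.sql > import.log 2>&1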

Repairing MyISAM

If mysqlcheck reports an error for a table, run mysqlcheck with the --repair (or -r) flag to fix it. The mysqlcheck repair option works while the server is up and running.

mysqlcheck -uroot -p -r <DATABASE> <TABLE>

If the server is down, or if for some reason mysqlcheck cannot repair your table, you still have the option to perform recovery directly on the files using myisamchk. With myisamchk, you need to make sure that the server does not have the tables open.

Stop the MySQL server

service mysqld stop
cd /var/lib/mysql

Change to the directory where the database is located.

cd /var/lib/mysql/employees
myisamchk <TABLE>

To check all of the tables in a database, type the following command:

myisamchk *.MYI

If the previous command does not work, you can try deleting temporary files that may be preventing myisamchk from running correctly. To do this, change back to the data directory and run the following command:

ls */*.TMD

If there are any .TMD files listed, delete them:

rm */*.TMD

Then re-run myisamchk.

To attempt to repair a table, execute the following command, replacing TABLE with the name of the table that you want to repair:

myisamchk --recover <TABLE>
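
If --recover is not enough, myisamchk also has a slower but more thorough mode, and the --backup option keeps a copy of the data file before it is modified:

myisamchk --safe-recover --backup <TABLE>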

Restart the MySQL server

service mysqld start

How to avoid data loss

There are several things you can do to minimize the risk of unrecoverable data. First of all, backups. The problem with backups is that they can sometimes be overlooked. For cron-scheduled backups, we usually write wrapper scripts that detect problems in the backup log, but that does not cover the case when the backup didn't start at all. Cron can sometimes hang, and often there is no monitoring set on it. Another potential issue is when the backup was never set up in the first place. Good practice is to run reports from a separate tool that analyzes backup status and informs you about missing backup schedules. You can use ClusterControl for that or write your own programs.
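
As a minimal sketch of that idea, a separate cron job can check that a sufficiently recent backup file exists and alert if it does not. The backup directory, file pattern, age threshold and mail command below are assumptions you would adapt to your own setup:

#!/bin/bash
# Assumed layout: compressed dumps in /var/backups/mysql; alert if none is newer than 24 hours.
BACKUP_DIR=/var/backups/mysql
if ! find "$BACKUP_DIR" -name '*.sql.gz' -mmin -1440 | grep -q .; then
    echo "No recent MySQL backup found in $BACKUP_DIR" | mail -s "MySQL backup missing" dba@example.com
fi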

ClusterControl operational backup report

To reduce the impact of possible data corruption, you should always consider clustered systems. It's just a matter of time before a database crashes or gets corrupted, so it's good to have a copy that you can switch to. It could be master/slave replication. The important aspect here is to have safe automatic recovery to minimize the complexity of the switchover and the recovery time (RTO).

ClusterControl auto recovery features

by Bart Oles at May 11, 2018 09:58 AM

May 10, 2018

Peter Zaitsev

Why We’ve Deprecated MongoRocks in Percona Server for MongoDB 3.6

In this blog post, we'll look at why we deprecated MongoRocks in Percona Server for MongoDB 3.6, and provide some guidance on switching from MongoRocks to WiredTiger.

On April 24, 2018, Percona announced the availability of Percona Server for MongoDB 3.6. If you read the blog post announcing the release, you probably saw the following note:

“MongoRocks is deprecated in Percona Server for MongoDB 3.6 and it will be fully removed in the next major version of Percona Server for MongoDB.”

Why did we make this decision, and what should you do if you’re using MongoRocks?

There are two significant factors that contributed to our decision to deprecate MongoRocks in Percona Server for MongoDB 3.6:

  1. We’ve seen little commercial interest in MongoRocks over the past two years, and
  2. MongoDB’s deep integration with WiredTiger makes supporting alternative storage engines increasingly difficult.

Little Commercial Interest

We haven’t seen strong demand for MongoRocks from our customers. Whenever we talk to Percona customers and Percona Server for MongoDB users, we receive pretty consistent feedback about some new features they would like to see, but nobody ever mentions storage engines. When we ask specifically which storage engine they use, it’s almost always WiredTiger. MongoRocks rarely comes up.

Deep WiredTiger Integration Makes Alternative Storage Engine Support Increasingly Difficult

MongoDB 3.6 introduced a number of exciting new features, including sessions, retryable writes and causal consistency (both of which are based on the sessions work). And, as was formally announced in February, MongoDB 4.0 will bring multi-document transactions for replica sets. All of these important new features work properly in large part because of significant changes MongoDB, Inc. made to the storage layer of WiredTiger. Additionally, given that the MongoDB server team is planning on deprecating MMAPv1 in MongoDB 4.0, we expect MongoDB, Inc. will continue making fundamental changes to WiredTiger to create new features in MongoDB.

Rearchitecting RocksDB (the storage layer of MongoRocks) to work properly with the new features in MongoDB 3.6 and with multi-document transactions in MongoDB 4.0 would be a massive undertaking, and we believe more users will benefit if our engineering resources instead go into frequently-requested features for Percona Server for MongoDB.

For those of you who are using MongoRocks with Percona Server for MongoDB, we know this situation isn’t ideal; but we want to make sure you have a database that’s highly performant and reliable and using all of the latest and greatest features, including sessions (and soon, transactions). The best way to achieve that is to migrate from MongoRocks to WiredTiger and upgrade to Percona Server for MongoDB 3.6.

How to Switch to WiredTiger and then Upgrade

If you are using MongoRocks with an earlier version of Percona Server for MongoDB, and you wish to upgrade to Percona Server for MongoDB 3.6, we strongly encourage you to first switch to WiredTiger, then upgrade. For detailed instructions on how to change MongoDB storage engines without downtime, please see this blog post, appropriately titled, “How to Change MongoDB Storage Engines Without Downtime.” You can then follow the steps from the Upgrading to 3.6 section of the Percona Server for MongoDB 3.6 documentation.
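
Before you start, it is worth confirming which storage engine each node is actually running. One quick way, assuming the standard mongo shell is installed on the host, is:

mongo --eval "db.serverStatus().storageEngine"

If the output mentions rocksdb rather than wiredTiger, that node still needs to be migrated before the upgrade.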

We suspect sessions and transactions are just the tip of the iceberg of great new functionality that MongoDB will be able to implement by building deep integrations between the database and WiredTiger. We look forward to seeing what comes next!

The post Why We’ve Deprecated MongoRocks in Percona Server for MongoDB 3.6 appeared first on Percona Database Performance Blog.

by Jeff Sandstrom at May 10, 2018 03:00 PM

Jean-Jerome Schmidt

Deploying Cloud Databases with ClusterControl 1.6

ClusterControl 1.6 comes with tighter integration with AWS, Azure and Google Cloud, so it is now possible to launch new instances and deploy MySQL, MariaDB, MongoDB and PostgreSQL directly from the ClusterControl user interface. In this blog, we will show you how to deploy a cluster on Amazon Web Services.

Note that this new feature requires two modules called clustercontrol-cloud and clustercontrol-clud. The former is a helper daemon which extends CMON capability of cloud communication, while the latter is a file manager client to upload and download files on cloud instances. Both packages are dependencies of the clustercontrol UI package, which will be installed automatically if they do not exist. See the Components documentation page for details.
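
If you want to double-check that both packages ended up on the ClusterControl host, something along these lines should do (the rpm command assumes an RPM-based distribution; use dpkg -l instead on Debian or Ubuntu):

rpm -qa | grep clustercontrol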

Cloud Credentials

ClusterControl allows you to store and manage your cloud credentials under Integrations (side menu) -> Cloud Providers:

The supported cloud platforms in this release are Amazon Web Services, Google Cloud Platform and Microsoft Azure. On this page, you can add new cloud credentials, manage existing ones and also connect to your cloud platform to manage resources.

The credentials that have been set up here can be used to:

  • Manage cloud resources
  • Deploy databases in the cloud
  • Upload backup to cloud storage

The following is what you would see if you clicked on the "Manage AWS" button:

You can perform simple management tasks on your cloud instances. You can also check the VPC settings under "AWS VPC" tab, as shown in the following screenshot:

The above features are useful as reference, especially when preparing your cloud instances before you start the database deployments.

Database Deployment on Cloud

In previous versions of ClusterControl, database deployment on cloud would be treated similarly to deployment on standard hosts, where you had to create the cloud instances beforehand and then supply the instance details and credentials in the "Deploy Database Cluster" wizard. The deployment procedure was unaware of any extra functionality and flexibility in the cloud environment, like dynamic IP and hostname allocation, NAT-ed public IP address, storage elasticity, virtual private cloud network configuration and so on.

With version 1.6, you just need to supply the cloud credentials, which can be managed via the "Cloud Providers" interface, and follow the "Deploy in the Cloud" deployment wizard. From the ClusterControl UI, click Deploy and you will be presented with the following options:

At the moment, the supported cloud providers are the three big players: Amazon Web Services (AWS), Google Cloud and Microsoft Azure. We are going to integrate more providers in future releases.

On the first page, you will be presented with the Cluster Details options:

In this section, you need to select the cluster type: MySQL Galera Cluster, MongoDB Replica Set or PostgreSQL Streaming Replication. The next step is to choose the vendor for the selected cluster type. At the moment, the following vendors and versions are supported:

  • MySQL Galera Cluster - Percona XtraDB Cluster 5.7, MariaDB 10.2
  • MongoDB Cluster - MongoDB 3.4 by MongoDB, Inc and Percona Server for MongoDB 3.4 by Percona (replica set only).
  • PostgreSQL Cluster - PostgreSQL 10.0 (streaming replication only).

In the next step, you will be presented with the following dialog:

Here you can configure the selected cluster type accordingly. Pick the number of nodes. The Cluster Name will be used as the instance tag, so you can easily recognize this deployment in your cloud provider dashboard. No space is allowed in the cluster name. My.cnf Template is the template configuration file that ClusterControl will use to deploy the cluster. It must be located under /usr/share/cmon/templates on the ClusterControl host. The rest of the fields are pretty self-explanatory.

The next dialog is to select the cloud credentials:

You can choose the existing cloud credentials or create a new one by clicking on the "Add New Credential" button. The next step is to choose the virtual machine configuration:

Most of the settings in this step are dynamically populated from the cloud provider by the chosen credentials. You can configure the operating system, instance size, VPC setting, storage type and size and also specify the SSH key location on the ClusterControl host. You can also let ClusterControl generate a new key specifically for these instances. When clicking on "Add New" button next to Virtual Private Cloud, you will be presented with a form to create a new VPC:

VPC is a logical network infrastructure you have within your cloud platform. You can configure your VPC by modifying its IP address range, creating subnets, configuring route tables, network gateways, and security settings. It's recommended to deploy your database infrastructure in this network for isolation, security and routing control.

When creating a new VPC, specify the VPC name and IPv4 address block with subnet. Then, choose whether IPv6 should be part of the network and the tenancy option. You can then use this virtual network for your database infrastructure.

The last step is the deployment summary:

In this stage, you need to choose which subnet under the chosen virtual network the database should run on. Take note that the chosen subnet MUST have auto-assign public IPv4 address enabled. You can also create a new subnet under this VPC by clicking on the "Add New Subnet" button. Verify that everything is correct and hit the "Deploy Cluster" button to start the deployment.

You can then monitor the progress by clicking on the Activity -> Jobs -> Create Cluster -> Full Job Details:

Depending on the connection, it could take 10 to 20 minutes to complete. Once done, you will see the new database cluster listed in the ClusterControl dashboard. For a PostgreSQL streaming replication cluster, you might need to know the master and slave IP addresses once the deployment completes. Simply go to the Nodes tab and you will see the public and private IP addresses in the node list on the left:

Your database cluster is now deployed and running on AWS.

At the moment, scaling up works the same way as with standard hosts: you need to create a cloud instance manually beforehand and specify the host under ClusterControl -> pick the cluster -> Add Node.

Under the hood, the deployment process does the following:

  1. Create cloud instances
  2. Configure security groups and networking
  3. Verify the SSH connectivity from ClusterControl to all created instances
  4. Deploy database on every instance
  5. Configure the clustering or replication links
  6. Register the deployment into ClusterControl

Take note that this feature is still in beta. Nevertheless, you can use it to speed up your development and testing environments by controlling and managing database clusters on different cloud providers from a single user interface.

Database Backup on Cloud

This feature has been around since ClusterControl 1.5.0, and we have now added support for Azure Cloud Storage. This means that you can upload and download backups on all three major cloud providers (AWS, GCP and Azure). The upload happens right after the backup is successfully created (if you toggle "Upload Backup to the Cloud"), or you can manually click the cloud icon button in the backup list:

You can then download and restore backups from the cloud, in case you lost your local backup storage, or if you need to reduce local disk space usage for your backups.

Current Limitations

There are some known limitations for the cloud deployment feature, as stated below:

  • There is currently no 'accounting' in place for the cloud instances. You will need to manually remove the cloud instances if you remove a database cluster.
  • You cannot add or remove a node automatically with cloud instances.
  • You cannot deploy a load balancer automatically with a cloud instance.

We have tested the feature extensively in many environments and setups, but there are always corner cases that we might have missed. For more information, please take a look at the change log.

Happy clustering in the cloud!

by ashraf at May 10, 2018 09:58 AM