Planet MariaDB

August 18, 2018

Valeriy Kravchuk

On Fine MySQL Manual

Today I am going to provide some details on the last item in my list of problems with Oracle's way of MySQL server development: the maintenance of the MySQL Manual. I stated that:
"MySQL Manual still has many details missing and is not fixed fast enough.
Moreover, it is not open source...
"
Let me explain the above:
  1. MySQL Reference Manual is not open source. It used to be built from DocBook XML sources, and probably that's still the case, but you cannot find the source code in open repositories (please correct me if I am wrong, I tried to search...). That's because it is NOT open source. The Preface says this clearly:
    "This documentation is NOT distributed under a GPL license. Use of this documentation is subject to the following terms:
    ...
    Any other use, such as any dissemination of printed copies or use of this documentation, in whole or in part, in another publication, requires the prior written consent from an authorized representative of Oracle. Oracle and/or its affiliates reserve any and all rights to this documentation not expressly granted above.
    "
    It was NOT Oracle who closed the source (as far as I remember, the manual was not GPL even in 2004, when I started to care about MySQL in general). But Oracle had a chance to change the license and set up a better contribution process for MySQL Community users, who include many good writers. They decided not to do this, so creators of forks and derived software have to duplicate efforts and rewrite everything themselves, and the only real way to help make the manual better is to report bugs.
  2. The quality of new documentation has not improved much. We, MySQL Community users, have to report bugs in the manual, as it still has details missing or documented wrongly. Let me illustrate this with the 10 or so most recent documentation requests users made (I skipped reports for features I do not care about for now, like group replication):
    • Bug #91955 - "8.0 broke plugin API for C". According to the comment from an Oracle developer in this bug reported by Laurynas Biveinis, writing plugins in C is not supported any more... But this change is NOT properly documented.
    • Bug #91781 - "Add version matrix in 'Chapter 2 Keywords and Reserved Words'". A good idea that is not consistently implemented in the manual.
    • Bug #91743 - "performance_schema.data_locks not working correctly". A nice feature was added to Performance Schema (now we can check and study InnoDB locks with SQL!), but it is still not completely and properly documented.
    • Bug #91654 - "'Handling an Unexpected Halt of a Replication Slave' Documentation is uncertain". This is also an example of improper bug handling - when I worked in Oracle, newer bugs were usually declared duplicates of older ones. Here the opposite decision was made, even though both reports were from the same user, Uday Varagani, who explicitly asked to change the decision. Obviously, documentation requests do not get any extra care compared to other kinds of bugs, quite the contrary...
    • Bug #91648 - "Numeric literals must be a number between 0 and 9999". Surely ports with numbers larger than 9999 can be used.
    • Bug #91642 - "Document how protocol version 4.0 and 4.1 differ on how strings are terminated". As noted by Rene' Cannao', comments in the code are still sometimes more useful than documentation.
    • Bug #91549 - "MDL lock queue order seems to depend on table names". Make sure to read the last comment in this nice request by Shane Bester, where Dmitry Lenev provides some details on the priorities of MDL locks. There are still cases when bugs and documentation requests document some details better than the fine MySQL Manual!
    • Bug #90997 - "'Handling an Unexpected Halt of a Replication Slave' manual page is wrong". Sveta Smirnova highlighted a totally misleading statement in the manual.
    • Bug #90935 - "Modify relay_log_recovery Documentation". A simple documentation request has stayed "Open" for 3 months. Clearly, processing documentation requests is not the highest priority for Oracle engineers.
    • Bug #90680 - "dragnet logging: document how to use / find error_symbol codes". Even if a request comes from a customer or an otherwise well-known bug reporter, like Simon Mudd, and it's related to a totally new cool feature of MySQL 8, it can wait months for the fix...
    You can draw your own conclusions from the above. But I do not see any good trends in the way new documentation is created or documentation requests are processed recently. The same problems exist as 4 years ago (see more on that in a side note below).
  3. Older documentation requests sometimes get even less attention than recent ones, even though they may highlight problems with the software itself, not the MySQL Manual. Let me illustrate this with a few bugs I reported:
    • Bug #83640 - "Locks set by DELETE statement on already deleted record". I explicitly asked to
      "...document in the manual what locks this kind of DELETE sets when it encountered a record already marked as deleted, and why"
      This still has NOT happened. Moreover, eventually both MySQL and MariaDB engineers decided that the current implementation of locking for this case is wrong and can be improved, so this report ended up as an InnoDB bug. Check the related MDEV-11215 for more details.
    • Bug #73413 - "Manual does not explain MTS implementation in details". This is one of my favorite documentation requests. I got a suggestion to explain, in detail, what I want to see documented. I tried my best, but if you want more details, read this blog.
    • Bug #72368 - "Empty/zero statistics for imported tablespace until explicit ANALYZE TABLE". This is a bug in the way persistent statistics (the feature I truly hate) are re-estimated "automatically" in InnoDB. But until the bug is fixed, I asked to document the current implementation (and the problems with it). So where is the current implementation properly documented? Only in the bug reports, if anywhere...
    • Bug #71294 - "Manual page for P_S.FILE_INSTANCES table does not explain '~' in FILE_NAME". They pretend they do not get the point:
      "Per Marc Alff, the Performance Schema is simply reporting what it gets from runtime. If runtime is reporting nonexistent filenames, that's a server issue.

      Recategorizing this as a server bug.
      "
    • Bug #71293 - "Manual page for P_S.FILE_INSTANCES table does not explain EVENT_NAME values". Now they are waiting for the new DOCUMENTATION column in the setup_instruments table to be filled in with something by server developers... The code is the documentation? OK, but as we know from experience (see slides 44-46 here), the chances of getting a bug in Performance Schema fixed fast are even lower than those of seeing it properly and completely documented...
There are more problems with MySQL documentation (not only the reference manual), but at the moment I consider the 3 highlighted and illustrated above to be the most serious.

Regent's Canal is nice. If only I knew how to operate the locks there... The MySQL Manual also misses some information about locks.
As a side note, it's not the first time I write about MySQL Manual. You can find some useful details in the following posts:
  • "What's wrong with MySQL Manual". In 2014, after spending a few weeks reporting up to 5 documentation bugs per day, I thought that, essentially, there was nothing much wrong with it - it's fine, well indexed by Google, and has meaningful human-readable URLs. The few problems I listed were a lack of careful readers (I tried my best to fix that), limited attention to old documentation requests, some pages with not much useful content, and a lack of "How to" documents. The latter I also tried to fix to some small extent in this blog; see the howto-tagged posts. The real fix came mostly from Percona's blog, though.
  • I have a special "missing manual" tag for blog posts that mention at least one bug in the manual.
  • I tried to use "missing_manual" tag consistently for my own documentation requests. Last year I shared a detailed enough review of the bugs with that tag that were still active.
As yet another side note, I once tried to create a kind of community-driven "missing manual" project, but failed. Writing a manual from scratch is a (hard) full-time job, while my full-time job is providing support to users of MySQL, MariaDB and Percona software...

That said, I also wanted to join MySQL's documentation team in the past, but this was not possible, at least because I am not a native speaker. If anything has changed in this regard, I am still open to a job offer of this kind. My conditions for an acceptable offer from Oracle are known to all interested parties, and they include (but are not limited to) at least 4 times the salary I had before I quit (almost 6 years ago) and working for Oracle's office in Paris (because the Oracle office you are formally employed by matters a lot for your happiness and success as an employee).

If there is no offer in the upcoming week, I'll just continue to write my annoying but hopefully somewhat useful blog posts (until the MySQL Reference Manual becomes a true open source project ;)

by Valeriy Kravchuk (noreply@blogger.com) at August 18, 2018 11:53 AM

August 17, 2018

Peter Zaitsev

Percona Server for MySQL 5.6.41-84.1 Is Now Available

Percona announces the release of Percona Server for MySQL 5.6.41-84.1 on August 17, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.6.41, including all the bug fixes in it. Percona Server for MySQL 5.6.41-84.1 is now the current GA release in the 5.6 series. All of Percona’s software is open-source and free.

Bugs Fixed
  • A simple SELECT query on a table with CHARSET=euckr COLLATE=euckr_bin could return different results each time it was executed. Bug fixed #4513 (upstream 91091).
  • Percona Server 5.6.39-83.1 could crash when altering an InnoDB table that has a full-text search index defined. Bug fixed #3851 (upstream 68987).
Other Bugs Fixed
  • #3325 “online upgrade GTID cause binlog damage in high write QPS situation”
  • #3976 “Errors in MTR tests main.variables-big, main.information_schema-big, innodb.innodb_bug14676111”
  • #4506 “Backport fixes from 8.0 for InnoDB memcached Plugin”

Find the release notes for Percona Server for MySQL 5.6.41-84.1 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.6.41-84.1 Is Now Available appeared first on Percona Database Performance Blog.

by Borys Belinsky at August 17, 2018 10:20 PM

Percona Server for MySQL 5.5.61-38.13 Is Now Available

Percona announces the release of Percona Server for MySQL 5.5.61-38.13 on August 17, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.5.61, including all the bug fixes in it. Percona Server for MySQL 5.5.61-38.13 is now the current GA release in the 5.5 series. All of Percona’s software is open-source and free.

Bugs Fixed
  • The --innodb-optimize-keys option of the mysqldump utility fails when a column name is used as a prefix of a column which has the AUTO_INCREMENT attribute. Bug fixed #4524.
Other Bugs Fixed
  • #4566 “stack-use-after-scope in reinit_io_cache()” (upstream 91603)
  • #4581 “stack-use-after-scope in _db_enter_() / mysql_select_db()” (upstream 91604)
  • #4600 “stack-use-after-scope in _db_enter_() / get_upgrade_info_file_name()” (upstream 91617)
  • #3976 “Errors in MTR tests main.variables-big, main.information_schema-big, innodb.innodb_bug14676111”

Find the release notes for Percona Server for MySQL 5.5.61-38.13 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.5.61-38.13 Is Now Available appeared first on Percona Database Performance Blog.

by Borys Belinsky at August 17, 2018 07:30 PM

Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon

Replication is one of the well-known features that allows us to build an identical copy of a database. It is supported in almost every RDBMS. The advantages of replication may be huge, especially HA (High Availability) and load balancing. But what if we need to build replication between 2 heterogeneous databases like MySQL and PostgreSQL? Can we continuously replicate changes from a MySQL database to a PostgreSQL database? The answer to this question is pg_chameleon.

For replicating continuous changes, pg_chameleon uses the mysql-replication library to pull the row images from MySQL, which are transformed into a jsonb object. A pl/pgsql function in postgres decodes the jsonb and replays the changes into the postgres database. In order to set up this type of replication, your mysql binlog_format must be “ROW”.

A few points you should know before setting up this tool:

  1. Tables that need to be replicated must have a primary key.
  2. It works for PostgreSQL versions > 9.5 and MySQL > 5.5.
  3. binlog_format must be ROW in order to set up this replication.
  4. The Python version must be > 3.3.

When you initialize the replication, pg_chameleon pulls the data from MySQL using the CSV format in slices, to prevent memory overload. This data is flushed to postgres using the COPY command. If COPY fails, it tries INSERT, which may be slow. If INSERT fails, then the row is discarded.
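
In PostgreSQL terms, the fast path and the fallback correspond roughly to the following two statement shapes (the table and file names here are purely illustrative):

COPY sch_sakila.actor FROM '/tmp/actor_slice.csv' WITH (FORMAT csv);    -- fast bulk load of one slice
INSERT INTO sch_sakila.actor VALUES (1, 'PENELOPE', 'GUINESS', now());  -- slow per-row fallback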

To replicate changes from mysql, pg_chameleon mimics the behavior of a mysql slave. It creates the schema in postgres, performs the initial data load, connects to the MySQL replication protocol, and stores the row images in a table in postgres. Then the respective functions in postgres decode those rows and apply the changes. This is similar to storing relay logs in postgres tables and applying them to a postgres schema. You do not have to create a postgres schema using any DDLs. This tool automatically does that for the tables configured for replication. If you need to specifically convert any types, you can specify this in the configuration file.

The following is just an exercise that you can experiment with, and implement if it completely satisfies your requirements. We performed these tests on CentOS Linux release 7.4.

Prepare the environment

Set up Percona Server for MySQL

Install MySQL 5.7 and add the appropriate parameters for replication.

In this exercise, I have installed Percona Server for MySQL 5.7 using YUM repo.

yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm
yum install Percona-Server-server-57
echo "mysql ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
usermod -s /bin/bash mysql
sudo su - mysql

pg_chameleon requires the following parameters to be set in your my.cnf file (the parameter file of your MySQL server). You may add the following parameters to /etc/my.cnf:

binlog_format= ROW
binlog_row_image=FULL
log-bin = mysql-bin
server-id = 1

Now start your MySQL server after adding the above parameters to your my.cnf file.

$ service mysql start

Fetch the temporary root password from mysqld.log, and reset the root password using mysqladmin:

$ grep "temporary" /var/log/mysqld.log
$ mysqladmin -u root -p password 'Secret123!'
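
Before going further, it is worth confirming that the replication settings are in effect; a quick sanity check against the variables set above:

$ mysql -uroot -pSecret123! -e "SELECT @@binlog_format, @@binlog_row_image, @@log_bin, @@server_id"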

Now, connect to your MySQL instance and create the sample schema/tables. I have also created an emp table for validation.

$ wget http://downloads.mysql.com/docs/sakila-db.tar.gz
$ tar -xzf sakila-db.tar.gz
$ mysql -uroot -pSecret123! < sakila-db/sakila-schema.sql
$ mysql -uroot -pSecret123! < sakila-db/sakila-data.sql
$ mysql -uroot -pSecret123! sakila -e "create table emp (id int PRIMARY KEY, first_name varchar(20), last_name varchar(20))"

Create a user for configuring replication using pg_chameleon and give appropriate privileges to the user using the following steps.

$ mysql -uroot -p
create user 'usr_replica'@'%' identified by 'Secret123!';
GRANT ALL ON sakila.* TO 'usr_replica'@'%';
GRANT RELOAD, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'usr_replica'@'%';
FLUSH PRIVILEGES;

While creating the user in your mysql server (‘usr_replica’@’%’), you may wish to replace % with the appropriate IP or hostname of the server on which pg_chameleon is running.
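
You can verify the grants before moving on:

$ mysql -uroot -pSecret123! -e "SHOW GRANTS FOR 'usr_replica'@'%'"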

Set up PostgreSQL

Install PostgreSQL and start the database instance.

You may use the following steps to install PostgreSQL 10.x

yum install https://yum.postgresql.org/10/redhat/rhel-7.4-x86_64/pgdg-centos10-10-2.noarch.rpm
yum install postgresql10*
su - postgres
$/usr/pgsql-10/bin/initdb
$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start

As seen in the following log, create a user in PostgreSQL that pg_chameleon can use to write the changed data to PostgreSQL. Also create the target database.

postgres=# CREATE USER usr_replica WITH ENCRYPTED PASSWORD 'secret';
CREATE ROLE
postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica;
CREATE DATABASE
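
Optionally, confirm that the new role can connect to the target database (this assumes your pg_hba.conf allows password authentication for local connections):

$ psql -U usr_replica -d db_replica -h localhost -c "SELECT current_user, current_database();"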

Steps to install and setup replication using pg_chameleon

Step 1: In this exercise, I installed Python 3.6 and pg_chameleon 2.0.8 using the following steps. You may skip the Python installation steps if you already have the desired Python release. We can create a virtual environment if the OS does not include Python 3.x by default.

yum install gcc openssl-devel bzip2-devel wget
cd /usr/src
wget https://www.python.org/ftp/python/3.6.6/Python-3.6.6.tgz
tar xzf Python-3.6.6.tgz
cd Python-3.6.6
./configure --enable-optimizations
make altinstall
python3.6 -m venv venv
source venv/bin/activate
pip install pip --upgrade
pip install pg_chameleon

Step 2: This tool requires a configuration file to store the source/target server details, and a directory to store the logs. Use the following command to let pg_chameleon create the configuration file template and the respective directories for you.

$ chameleon set_configuration_files

The above command would produce the following output, which shows that it created some directories and a file in the location where you ran the command.

creating directory /var/lib/pgsql/.pg_chameleon
creating directory /var/lib/pgsql/.pg_chameleon/configuration/
creating directory /var/lib/pgsql/.pg_chameleon/logs/
creating directory /var/lib/pgsql/.pg_chameleon/pid/
copying configuration example in /var/lib/pgsql/.pg_chameleon/configuration//config-example.yml

Copy the sample configuration file to another file, let's say default.yml:

$ cd .pg_chameleon/configuration/
$ cp config-example.yml default.yml

Here is how my default.yml file looks after adding all the required parameters. In this file, we can optionally specify the data type conversions, the tables to be skipped from replication, and the DML events that need to be skipped for a selected list of tables.

---
#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''
# type_override allows the user to override the default type conversion into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"
#postgres  destination connection
pg_conn:
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "secret"
  database: "db_replica"
  charset: "utf8"
sources:
  mysql:
    db_conn:
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      sakila: sch_sakila
    limit_tables:
#      - delphis_mediterranea.foo
    skip_tables:
#      - delphis_mediterranea.bar
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
    skip_events:
      insert:
#        - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo
      delete:
#        - delphis_mediterranea #skips deletes on schema delphis_mediterranea
      update:

Step 3: Initialize the replica using this command:

$ chameleon create_replica_schema --debug

The above command creates a schema and nine tables in the PostgreSQL database that you specified in the .pg_chameleon/configuration/default.yml file. These tables are needed to manage replication from source to destination. The same can be observed in the following log.

db_replica=# \dn
  List of schemas
     Name      |    Owner
---------------+-------------
 public        | postgres
 sch_chameleon | target_user
(2 rows)

db_replica=# \dt sch_chameleon.t_*
                 List of relations
    Schema     |       Name       | Type  |    Owner
---------------+------------------+-------+-------------
 sch_chameleon | t_batch_events   | table | target_user
 sch_chameleon | t_discarded_rows | table | target_user
 sch_chameleon | t_error_log      | table | target_user
 sch_chameleon | t_last_received  | table | target_user
 sch_chameleon | t_last_replayed  | table | target_user
 sch_chameleon | t_log_replica    | table | target_user
 sch_chameleon | t_replica_batch  | table | target_user
 sch_chameleon | t_replica_tables | table | target_user
 sch_chameleon | t_sources        | table | target_user
(9 rows)

Step 4: Add the source details to pg_chameleon using the following command. Provide the name of the source as specified in the configuration file. In this example, the source name is mysql and the target is the postgres database defined under pg_conn.

$ chameleon add_source --config default --source mysql --debug

Once you run the above command, you should see that the source details are added to the t_sources table.

db_replica=# select * from sch_chameleon.t_sources;
-[ RECORD 1 ]-------+----------------------------------------------
i_id_source         | 1
t_source            | mysql
jsb_schema_mappings | {"sakila": "sch_sakila"}
enm_status          | ready
t_binlog_name       |
i_binlog_position   |
b_consistent        | t
b_paused            | f
b_maintenance       | f
ts_last_maintenance |
enm_source_type     | mysql
v_log_table         | {t_log_replica_mysql_1,t_log_replica_mysql_2}

$ chameleon show_status --config default
Source id   Source name   Type    Status   Consistent   Read lag   Last read   Replay lag   Last replay
----------- ------------- ------- -------- ------------ ---------- ----------- ------------ -------------
          1 mysql         mysql   ready    Yes                     N/A         N/A

Step 5: Initialize the replica/slave using the following command. Specify the source from which you are replicating the changes to the PostgreSQL database.

$ chameleon init_replica --config default --source mysql --debug

Initialization involves the following tasks on the MySQL server (source).

1. Flush the tables with read lock
2. Get the master’s coordinates
3. Copy the data
4. Release the locks
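
These steps roughly correspond to the classic manual procedure for taking a consistent snapshot from a MySQL master, sketched here in SQL:

FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;   -- record the binlog file name and position
-- ... copy the table data while the lock is held ...
UNLOCK TABLES;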

The above command creates the target schema in your postgres database automatically.
In the default.yml file, we mentioned the following schema_mappings.

schema_mappings:
sakila: sch_sakila

So pg_chameleon created the new schema sch_sakila in the target database db_replica.

db_replica=# \dn
  List of schemas
     Name      |    Owner
---------------+-------------
 public        | postgres
 sch_chameleon | usr_replica
 sch_sakila    | usr_replica
(3 rows)

Step 6: Now, start replication using the following command.

$ chameleon start_replica --config default --source mysql

Step 7: Check replication status and any errors using the following commands.

$ chameleon show_status --config default
$ chameleon show_errors

This is how the status looks:

$ chameleon show_status --source mysql
Source id   Source name   Type    Status    Consistent   Read lag   Last read   Replay lag   Last replay
----------- ------------- ------- --------- ------------ ---------- ----------- ------------ -------------
          1 mysql         mysql   running   No                      N/A         N/A

== Schema mappings ==
Origin schema   Destination schema
--------------- --------------------
sakila          sch_sakila

== Replica status ==
---------------------  ---
Tables not replicated    0
Tables replicated       17
All tables              17
Last maintenance        N/A
Next maintenance        N/A
Replayed rows
Replayed DDL
Skipped rows

Now, you should see that the changes are continuously getting replicated from MySQL to PostgreSQL.

Step 8: To validate, you may insert a record into the MySQL table that we created for the purpose of validation, and check that it is replicated to postgres.

$ mysql -u root -pSecret123! -e "INSERT INTO sakila.emp VALUES (1,'avinash','vallarapu')"
mysql: [Warning] Using a password on the command line interface can be insecure.
$ psql -d db_replica -c "select * from sch_sakila.emp"
 id | first_name | last_name
----+------------+-----------
  1 | avinash    | vallarapu
(1 row)

In the above log, we see that the record that was inserted into the MySQL table was replicated to the PostgreSQL table.

You may also add multiple sources for replication to PostgreSQL (target).
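
For example, a second source could be declared under the existing sources: section of default.yml; the mysql_2 name, host, and schema mapping below are hypothetical:

sources:
  mysql:
    # ... existing source as configured above ...
  mysql_2:
    db_conn:
      host: "10.0.0.2"
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
    schema_mappings:
      world: sch_world
    type: mysql

Each additional source is then registered and initialized the same way, with chameleon add_source --config default --source mysql_2 followed by init_replica and start_replica.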

Reference: http://www.pgchameleon.org/documents/

Please refer to the above documentation to find out about the many more options that are available with pg_chameleon.

The post Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon appeared first on Percona Database Performance Blog.

by Avinash Vallarapu at August 17, 2018 03:38 PM

Jean-Jerome Schmidt

How to Monitor your Database Servers using ClusterControl CLI

How would you like to merge the "top" output from all 5 of your database nodes and sort it by CPU usage, with just a one-liner command? Yeah, you read that right! How about interactive graphs displayed in the terminal interface? We introduced the CLI client for ClusterControl called s9s about a year ago, and it’s been a great complement to the web interface. It’s also open source.

In this blog post, we’ll show you how you can monitor your databases using your terminal and s9s CLI.

Introduction to s9s, The ClusterControl CLI

ClusterControl CLI (or s9s or s9s CLI) is an open source project and optional package introduced with ClusterControl version 1.4.1. It is a command line tool to interact with, control, and manage your database infrastructure using ClusterControl. The s9s command line project is open source and can be found on GitHub.

Starting from version 1.4.1, the installer script will automatically install the package (s9s-tools) on the ClusterControl node.

Some prerequisites. In order for you to run s9s-tools CLI, the following must be true:

  • A running ClusterControl Controller (cmon).
  • s9s client, install as a separate package.
  • Port 9501 must be reachable by the s9s client.

Installing the s9s CLI is straightforward if you install it on the ClusterControl Controller host itself:

$ rm -Rf ~/.s9s
$ wget http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh
$ chmod +x install-s9s-tools.sh
$ ./install-s9s-tools.sh

You can install s9s-tools outside of the ClusterControl server (your workstation laptop or a bastion host), as long as the ClusterControl Controller RPC (TLS) interface is exposed to the public network (it defaults to 127.0.0.1:9501). You can find more details on how to configure this in the documentation page.

To verify if you can connect to ClusterControl RPC interface correctly, you should get the OK response when running the following command:

$ s9s cluster --ping
PING OK 2.000 ms

As a side note, also look at the limitations when using this tool.

Example Deployment

Our example deployment consists of 8 nodes across 3 clusters:

  • PostgreSQL Streaming Replication - 1 master, 2 slaves
  • MySQL Replication - 1 master, 1 slave
  • MongoDB Replica Set - 1 primary, 2 secondary nodes

All database clusters were deployed by ClusterControl using the "Deploy Database Cluster" deployment wizard, and from the UI point of view, this is what we would see in the cluster dashboard:

Cluster Monitoring

We will start by listing out the clusters:

$ s9s cluster --list --long
ID STATE   TYPE              OWNER  GROUP  NAME                   COMMENT
23 STARTED postgresql_single system admins PostgreSQL 10          All nodes are operational.
24 STARTED replication       system admins Oracle 5.7 Replication All nodes are operational.
25 STARTED mongodb           system admins MongoDB 3.6            All nodes are operational.

We see the same clusters as in the UI. We can get more details on a particular cluster by using the --stat flag. Multiple clusters and nodes can also be monitored this way; the command line options can even use wildcards in the node and cluster names:

$ s9s cluster --stat *Replication
Oracle 5.7 Replication
    Name: Oracle 5.7 Replication       Owner: system/admins
      ID: 24                           State: STARTED
    Type: REPLICATION                 Vendor: oracle 5.7
  Status: All nodes are operational.
  Alarms:  0 crit   1 warn
    Jobs:  0 abort  0 defnd  0 dequd  0 faild  7 finsd  0 runng
  Config: '/etc/cmon.d/cmon_24.cnf'
 LogFile: '/var/log/cmon_24.log'

HOSTNAME     CPU   MEMORY      SWAP   DISK      NICs
10.0.0.104   1  6% 992M 120M   0B 0B  19G 13G    10K/s  54K/s
10.0.0.168   1  6% 992M 116M   0B 0B  19G 13G    11K/s  66K/s
10.0.0.156   2 39% 3.6G 2.4G   0B 0B  19G 3.3G  338K/s  79K/s

The output above gives a summary of our MySQL replication together with the cluster status, state, vendor, configuration file and so on. Down the line, you can see the list of nodes that fall under this cluster ID, with a summarized view of system resources for each host, like the number of CPUs, total memory, memory usage, swap, disk and network interfaces. All information shown is retrieved from the CMON database, not directly from the actual nodes.

You can also get a summarized view of all databases on all clusters:

$ s9s  cluster --list-databases --long
SIZE        #TBL #ROWS     OWNER  GROUP  CLUSTER                DATABASE
  7,340,032    0         0 system admins PostgreSQL 10          postgres
  7,340,032    0         0 system admins PostgreSQL 10          template1
  7,340,032    0         0 system admins PostgreSQL 10          template0
765,460,480   24 2,399,611 system admins PostgreSQL 10          sbtest
          0  101         - system admins Oracle 5.7 Replication sys
Total: 5 databases, 789,577,728, 125 tables.

The last line summarizes that we have a total of 5 databases with 125 tables, 4 of them on our PostgreSQL cluster.

For a complete example of usage on s9s cluster command line options, check out s9s cluster documentation.

Node Monitoring

For node monitoring, the s9s CLI has features similar to the cluster option. To get a summarized view of all nodes, you can simply do:

$ s9s node --list --long
STAT VERSION    CID CLUSTER                HOST       PORT  COMMENT
coC- 1.6.2.2662  23 PostgreSQL 10          10.0.0.156  9500 Up and running
poM- 10.4        23 PostgreSQL 10          10.0.0.44   5432 Up and running
poS- 10.4        23 PostgreSQL 10          10.0.0.58   5432 Up and running
poS- 10.4        23 PostgreSQL 10          10.0.0.60   5432 Up and running
soS- 5.7.23-log  24 Oracle 5.7 Replication 10.0.0.104  3306 Up and running.
coC- 1.6.2.2662  24 Oracle 5.7 Replication 10.0.0.156  9500 Up and running
soM- 5.7.23-log  24 Oracle 5.7 Replication 10.0.0.168  3306 Up and running.
mo-- 3.2.20      25 MongoDB 3.6            10.0.0.125 27017 Up and Running
mo-- 3.2.20      25 MongoDB 3.6            10.0.0.131 27017 Up and Running
coC- 1.6.2.2662  25 MongoDB 3.6            10.0.0.156  9500 Up and running
mo-- 3.2.20      25 MongoDB 3.6            10.0.0.35  27017 Up and Running
Total: 11

The left-most column specifies the type of the node. For this deployment, "c" represents ClusterControl Controller, "p" PostgreSQL, "m" MongoDB, "e" Memcached, and "s" generic MySQL nodes. The next one is the host status - "o" for online, "l" for off-line, "f" for failed nodes, and so on. The next one is the role of the node in the cluster. It can be "M" for master, "S" for slave, "C" for controller, and "-" for everything else. The remaining columns are pretty self-explanatory.

You can get the full list by looking at the man page of this component:

$ man s9s-node

From there, we can jump into more detailed stats for all nodes with the --stat flag:

$ s9s node --stat --cluster-id=24
 10.0.0.104:3306
    Name: 10.0.0.104              Cluster: Oracle 5.7 Replication (24)
      IP: 10.0.0.104                 Port: 3306
   Alias: -                         Owner: system/admins
   Class: CmonMySqlHost              Type: mysql
  Status: CmonHostOnline             Role: slave
      OS: centos 7.0.1406 core     Access: read-only
   VM ID: -
 Version: 5.7.23-log
 Message: Up and running.
LastSeen: Just now                    SSH: 0 fail(s)
 Connect: y Maintenance: n Managed: n Recovery: n Skip DNS: y SuperReadOnly: n
     Pid: 16592  Uptime: 01:44:38
  Config: '/etc/my.cnf'
 LogFile: '/var/log/mysql/mysqld.log'
 PidFile: '/var/lib/mysql/mysql.pid'
 DataDir: '/var/lib/mysql/'
 10.0.0.168:3306
    Name: 10.0.0.168              Cluster: Oracle 5.7 Replication (24)
      IP: 10.0.0.168                 Port: 3306
   Alias: -                         Owner: system/admins
   Class: CmonMySqlHost              Type: mysql
  Status: CmonHostOnline             Role: master
      OS: centos 7.0.1406 core     Access: read-write
   VM ID: -
 Version: 5.7.23-log
 Message: Up and running.
  Slaves: 10.0.0.104:3306
LastSeen: Just now                    SSH: 0 fail(s)
 Connect: n Maintenance: n Managed: n Recovery: n Skip DNS: y SuperReadOnly: n
     Pid: 975  Uptime: 01:52:53
  Config: '/etc/my.cnf'
 LogFile: '/var/log/mysql/mysqld.log'
 PidFile: '/var/lib/mysql/mysql.pid'
 DataDir: '/var/lib/mysql/'
 10.0.0.156:9500
    Name: 10.0.0.156              Cluster: Oracle 5.7 Replication (24)
      IP: 10.0.0.156                 Port: 9500
   Alias: -                         Owner: system/admins
   Class: CmonHost                   Type: controller
  Status: CmonHostOnline             Role: controller
      OS: centos 7.0.1406 core     Access: read-write
   VM ID: -
 Version: 1.6.2.2662
 Message: Up and running
LastSeen: 28 seconds ago              SSH: 0 fail(s)
 Connect: n Maintenance: n Managed: n Recovery: n Skip DNS: n SuperReadOnly: n
     Pid: 12746  Uptime: 01:10:05
  Config: ''
 LogFile: '/var/log/cmon_24.log'
 PidFile: ''
 DataDir: ''

Printing graphs with the s9s client can also be very informative. This presents the data the controller collected in various graphs. There are almost 30 graphs supported by this tool, as listed here, and s9s-node enumerates them all. The following shows the server load histogram of all nodes for cluster ID 1, as collected by CMON, right from your terminal:

It is possible to set the start and end date and time. One can view short periods (like the last hour) or longer periods (like a week or a month). The following is an example of viewing the disk utilization for the last hour:

Using the --density option, a different view can be printed for every graph. This density graph shows not the time series, but how frequently the given values were seen (the X-axis represents the density value):

If the terminal does not support Unicode characters, the --only-ascii option can switch them off:

The graphs have colors, where dangerously high values for example are shown in red. The list of nodes can be filtered with --nodes option, where you can specify the node names or use wildcards if convenient.
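
Putting those options together, the invocations look something like the following. The graph names here (load, diskutilization) are assumptions based on the list in the s9s-node man page; run man s9s-node for the exact names supported by your version:

$ s9s node --stat --cluster-id=1 --graph=load
$ s9s node --stat --cluster-id=1 --graph=diskutilization --begin="2018-08-17 13:00:00" --end="2018-08-17 14:00:00"
$ s9s node --stat --cluster-id=1 --graph=load --density --only-ascii --nodes="10.0.0.*"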

Process Monitoring

Another cool thing about the s9s CLI is that it provides a processlist of the entire cluster - a “top” for all nodes, with all processes merged into one. The following command runs the "top" command on all database nodes for cluster ID 24, sorted by CPU consumption, and updated continuously:

$ s9s process --top --cluster-id=24
Oracle 5.7 Replication - 04:39:17                                                                                                                                                      All nodes are operational.
3 hosts, 4 cores, 10.6 us,  4.2 sy, 84.6 id,  0.1 wa,  0.3 st,
GiB Mem : 5.5 total, 1.7 free, 2.6 used, 0.1 buffers, 1.1 cached
GiB Swap: 0 total, 0 used, 0 free,

PID   USER     HOST       PR  VIRT      RES    S   %CPU   %MEM COMMAND
12746 root     10.0.0.156 20  1359348    58976 S  25.25   1.56 cmon
 1587 apache   10.0.0.156 20   462572    21632 S   1.38   0.57 httpd
  390 root     10.0.0.156 20     4356      584 S   1.32   0.02 rngd
  975 mysql    10.0.0.168 20  1144260    71936 S   1.11   7.08 mysqld
16592 mysql    10.0.0.104 20  1144808    75976 S   1.11   7.48 mysqld
22983 root     10.0.0.104 20   127368     5308 S   0.92   0.52 sshd
22548 root     10.0.0.168 20   127368     5304 S   0.83   0.52 sshd
 1632 mysql    10.0.0.156 20  3578232  1803336 S   0.50  47.65 mysqld
  470 proxysql 10.0.0.156 20   167956    35300 S   0.44   0.93 proxysql
  338 root     10.0.0.104 20     4304      600 S   0.37   0.06 rngd
  351 root     10.0.0.168 20     4304      600 R   0.28   0.06 rngd
   24 root     10.0.0.156 20        0        0 S   0.19   0.00 rcu_sched
  785 root     10.0.0.156 20   454112    11092 S   0.13   0.29 httpd
   26 root     10.0.0.156 20        0        0 S   0.13   0.00 rcuos/1
   25 root     10.0.0.156 20        0        0 S   0.13   0.00 rcuos/0
22498 root     10.0.0.168 20   127368     5200 S   0.09   0.51 sshd
14538 root     10.0.0.104 20        0        0 S   0.09   0.00 kworker/0:1
22933 root     10.0.0.104 20   127368     5200 S   0.09   0.51 sshd
28295 root     10.0.0.156 20   127452     5016 S   0.06   0.13 sshd
 2238 root     10.0.0.156 20   197520    10444 S   0.06   0.28 vc-agent-007
  419 root     10.0.0.156 20    34764     1660 S   0.06   0.04 systemd-logind
    1 root     10.0.0.156 20    47628     3560 S   0.06   0.09 systemd
27992 proxysql 10.0.0.156 20    11688      872 S   0.00   0.02 proxysql_galera
28036 proxysql 10.0.0.156 20    11688      876 S   0.00   0.02 proxysql_galera

There is also a --list flag which returns a similar result without the continuous update (similar to the "ps" command):

$ s9s process --list --cluster-id=25

Job Monitoring

Jobs are tasks performed by the controller in the background, so that the client application does not need to wait until the entire job is finished. ClusterControl executes management tasks by assigning an ID for every task and lets the internal scheduler decide whether two or more jobs can be run in parallel. For example, more than one cluster deployment can be executed simultaneously, as well as other long running operations like backup and automatic upload of backups to cloud storage.

In any management operation, it would be helpful if we could monitor the progress and status of a specific job, e.g., scaling out a new slave for our MySQL replication. The following command adds a new slave, 10.0.0.77, to scale out our MySQL replication:

$ s9s cluster --add-node --nodes="10.0.0.77" --cluster-id=24
Job with ID 66992 registered.

We can then monitor job ID 66992 using the job command:

$ s9s job --log --job-id=66992
addNode: Verifying job parameters.
10.0.0.77:3306: Adding host to cluster.
10.0.0.77:3306: Testing SSH to host.
10.0.0.77:3306: Installing node.
10.0.0.77:3306: Setup new node (installSoftware = true).
10.0.0.77:3306: Setting SELinux in permissive mode.
10.0.0.77:3306: Disabling firewall.
10.0.0.77:3306: Setting vm.swappiness = 1
10.0.0.77:3306: Installing software.
10.0.0.77:3306: Setting up repositories.
10.0.0.77:3306: Installing helper packages.
10.0.0.77: Upgrading nss.
10.0.0.77: Upgrading ca-certificates.
10.0.0.77: Installing socat.
...
10.0.0.77: Installing pigz.
10.0.0.77: Installing bzip2.
10.0.0.77: Installing iproute2.
10.0.0.77: Installing tar.
10.0.0.77: Installing openssl.
10.0.0.77: Upgrading openssl openssl-libs.
10.0.0.77: Finished with helper packages.
10.0.0.77:3306: Verifying helper packages (checking if socat is installed successfully).
10.0.0.77:3306: Uninstalling existing MySQL packages.
10.0.0.77:3306: Installing replication software, vendor oracle, version 5.7.
10.0.0.77:3306: Installing software.
...

Or we can use the --wait flag and get a spinner with progress bar:

$ s9s job --wait --job-id=66992
Add Node to Cluster
- Job 66992 RUNNING    [         █] ---% Add New Node to Cluster

That's it for today's monitoring supplement. We hope that you’ll give the CLI a try and get value out of it. Happy clustering!

by ashraf at August 17, 2018 01:05 PM

Peter Zaitsev

This Week in Data with Colin Charles 49: MongoDB Conference Opportunities and Serverless Aurora MySQL

Colin Charles

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Beyond the MongoDB content that will be at Percona Live Europe 2018, there is also a bit of an agenda for MongoDB Europe 2018, happening on November 8 in London—a day after Percona Live in Frankfurt. I expect you’ll see a diverse set of MongoDB content at Percona Live.

The Percona Live Europe Call for Papers closes TODAY! (Friday August 17, 2018)

From Amazon, there have been some good MySQL changes. You now have access to time delayed replication as a strategy for your High Availability and disaster recovery. This works with versions 5.7.22, 5.6.40 and later. It is worth noting that this isn’t documented as working for MariaDB (yet?). It arrived in MariaDB Server in 10.2.3.

Another MySQL change from Amazon? Aurora Serverless MySQL is now generally available. You can build and run applications without thinking about instances: previously, the database function was not all that focused on serverless. This on-demand auto-scaling serverless Aurora should be fun to use. Only Aurora MySQL 5.6 is supported at the moment and also, be aware that this is not available in all regions yet (e.g. Singapore).

Releases

  • pgmetrics is described as an open-source, zero-dependency, single-binary tool that can collect a lot of information and statistics from a running PostgreSQL server and display it in easy-to-read text format or export it as JSON for scripting.
  • PostgreSQL 10.5, 9.6.10, 9.5.14, 9.4.19, 9.3.24, and 11 Beta 3 fix two security vulnerabilities, which may inspire an upgrade.

Link List

Industry Updates

  • Martin Arrieta (LinkedIn) is now a Site Reliability Engineer at Fastly. Formerly of Pythian and Percona.
  • Ivan Zoratti (LinkedIn) is now Director of Product Management at Neo4j. He was previously on founding teams, was the CTO of MariaDB Corporation (then SkySQL), and is a long time MySQL veteran.

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 

The post This Week in Data with Colin Charles 49: MongoDB Conference Opportunities and Serverless Aurora MySQL appeared first on Percona Database Performance Blog.

by Colin Charles at August 17, 2018 12:49 PM

August 16, 2018

Peter Zaitsev

MongoDB: how to use the JSON Schema Validator

The flexibility of MongoDB as a schemaless database is one of its strengths. In early versions, it was left to application developers to ensure that any necessary data validation is implemented. With the introduction of the JSON Schema Validator there are new techniques to enforce data integrity for MongoDB. In this article, we use examples to show you how to use the JSON Schema Validator to introduce validation checks at the database level, and consider the pros and cons of doing so.

Why validate?

MongoDB is a schemaless database. This means that we don’t have to define a fixed schema for a collection. We just need to insert a JSON document into a collection and that’s all. Documents in the same collection can have a completely different set of fields, and even the same field can have different types in different documents. The same field can be a string in some documents and a number in others.
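
For instance, both of the following inserts are perfectly legal in the same (throwaway) collection, even though name has a different type in each document:

MongoDB > db.mixed.insert( { name : "John", age : 35 } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.mixed.insert( { name : 42, hobbies : [ "chess", "guitar" ] } )
WriteResult({ "nInserted" : 1 })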

The schemaless feature has given MongoDB great flexibility and the capability to adapt the database to the changing needs of applications. Let’s say that this flexibility is one of the main reasons to use MongoDB. Relational databases are not so flexible: you always need to define a schema first. Then, when you need to add new columns, create new tables, or change the existing architecture to respond to the needs of the applications, it can sometimes be a very hard task.

The real world can often be messy and MongoDB can really help, but in most cases the real world requires some kind of backbone architecture too. In real applications built on MongoDB there is always some kind of “fixed schema” or “validation rules” in collections and in documents. It’s possible to have in a collection two documents that represent two completely different things.

Well, it’s technically possible, but it doesn’t make sense in most cases for the application. Most of the arguments for enforcing a schema on the data are well known: schemas maintain structure, giving a clear idea of what’s going into the database, reducing preventable bugs and allowing for cleaner code. Schemas are a form of self-documenting code, as they describe exactly what type of data something should be, and they let you know what checks will be performed. It’s good to be flexible, but behind the scenes we need some strong regulations.

So, what we need to do is to find a balance between flexibility and schema validation. In real world applications, we need to define a sort of “backbone schema” for our data and retain the possibility to be flexible to manage specific particularities. In the past developers implemented schema validation in their applications, but starting from version 3.6, MongoDB supports the JSON Schema Validator. We can rely on it to define a fixed schema and validation rules directly into the database and free the applications to take care of it.

Let’s have a look at how it works.

JSON Schema Validator

In fact, a “Validation Schema” was already introduced in 3.2, but the new “JSON Schema Validator” introduced in the 3.6 release is by far the better and friendlier way to manage validation in MongoDB.

What we need to do is define the rules using the $jsonSchema operator in the db.createCollection command. The $jsonSchema operator requires a JSON document where we specify all the rules to be applied to each inserted or updated document: for example, which fields are required, what type the fields must be, what the ranges of the values are, what pattern a specific field must have, and so on.

Let’s have a look at the following example, where we create a people collection, defining validation rules with the JSON Schema Validator.

db.createCollection( "people" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "name", "surname", "email" ],
      properties: {
         name: {
            bsonType: "string",
            description: "required and must be a string" },
         surname: {
            bsonType: "string",
            description: "required and must be a string" },
         email: {
            bsonType: "string",
            pattern: "^.+\@.+$",
            description: "required and must be a valid email address" },
         year_of_birth: {
            bsonType: "int",
            minimum: 1900,
            maximum: 2018,
            description: "the value must be in the range 1900-2018" },
         gender: {
            enum: [ "M", "F" ],
            description: "can be only M or F" }
      }
   }
}})

Based on what we have defined, only 3 fields are strictly required in every document of the collection: name, surname, and email. In particular, the email field must match a specific pattern to be sure the content is a valid address. (Note: to validate an email address you need a more complex regular expression; here we use a simpler version just to check that there is an @ symbol.) The other fields are not required, but in case someone inserts them, we have defined a validation rule.
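
If you want a somewhat stricter check than the bare @ test, a pattern along these lines could be used instead; it is still a simplification, not full RFC-compliant email validation:

email: {
   bsonType: "string",
   pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
   description: "required and must be a valid email address" },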

Let’s try some example inserts to test if everything is working as expected.

Insert a document with one of the required fields missing:

MongoDB > db.people.insert( { name : "John", surname : "Smith" } )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})

Insert a document with all the required fields but with an invalid email address:

MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith.gmail.com" } )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})

Finally, insert a valid document:

MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith@gmail.com" } )
WriteResult({ "nInserted" : 1 })

Now let’s try some more inserts, including the other fields.

MongoDB > db.people.insert( { name : "Bruce", surname : "Dickinson", email : "bruce@gmail.com", year_of_birth : NumberInt(1958), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Corrado", surname : "Pandiani", email : "corrado.pandiani@percona.com", year_of_birth : NumberInt(1971), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Marie", surname : "Adamson", email : "marie@gmail.com", year_of_birth : NumberInt(1992), gender : "F" } )
WriteResult({ "nInserted" : 1 })

The records were inserted correctly because all the rules on the required fields, and on the other non-required fields, were satisfied. Let’s see now a case where the year_of_birth or gender fields are not correct.

MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(1980), gender : "X" } )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", year_of_birth : NumberInt(1899), gender : "F" } )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})

In the first query gender is X, but the valid values are only M or F. In the second query the year of birth is outside the permitted range.

Let’s try now to insert documents with arbitrary extra fields that are not in the JSON Schema Validator.

MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(2000), gender : "M", shirt_size : "XL", preferred_band : "Coldplay" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", gender : "F", shirt_size : "M", preferred_band : "Maroon Five" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.find().pretty()
{
    "_id" : ObjectId("5b6b12e0f213dc83a7f5b5e8"),
    "name" : "John",
    "surname" : "Smith",
    "email" : "john.smith@gmail.com"
}
{
    "_id" : ObjectId("5b6b130ff213dc83a7f5b5e9"),
    "name" : "Bruce",
    "surname" : "Dickinson",
    "email" : "bruce@gmail.com",
    "year_of_birth" : 1958,
    "gender" : "M"
}
{
    "_id" : ObjectId("5b6b1328f213dc83a7f5b5ea"),
    "name" : "Corrado",
    "surname" : "Pandiani",
    "email" : "corrado.pandiani@percona.com",
    "year_of_birth" : 1971,
    "gender" : "M"
}
{
    "_id" : ObjectId("5b6b1356f213dc83a7f5b5ed"),
    "name" : "Marie",
    "surname" : "Adamson",
    "email" : "marie@gmail.com",
    "year_of_birth" : 1992,
    "gender" : "F"
}
{
    "_id" : ObjectId("5b6b1455f213dc83a7f5b5f0"),
    "name" : "Tom",
    "surname" : "Tom",
    "email" : "tom@gmail.com",
    "year_of_birth" : 2000,
    "gender" : "M",
    "shirt_size" : "XL",
    "preferred_band" : "Coldplay"
}
{
    "_id" : ObjectId("5b6b1476f213dc83a7f5b5f1"),
    "name" : "Luise",
    "surname" : "Luise",
    "email" : "tom@gmail.com",
    "gender" : "F",
    "shirt_size" : "M",
    "preferred_band" : "Maroon Five"
}

As we can see, we have the flexibility to add new fields with no restrictions on the permitted values.

Having a really fixed schema

The behavior we have seen so far, permitting the addition of extra fields that are not in the validation rules, is the default. If we would like to be more restrictive and have a really fixed schema for the collection, we need to add the additionalProperties: false parameter in the createCollection command.

In the following example, we create a validator to permit only the required fields. No other extra fields are permitted.

db.createCollection( "people2" , {
   validator: {
     $jsonSchema: {
        bsonType: "object",
        additionalProperties: false,
        properties: {
           _id : {
              bsonType: "objectId" },
           name: {
              bsonType: "string",
              description: "required and must be a string" },
           age: {
              bsonType: "int",
              minimum: 0,
              maximum: 100,
              description: "required and must be in the range 0-100" }
        }
     }
}})

Note a couple of differences:

  • we don’t need to specify the list of required fields; with additionalProperties: false, only the fields listed under properties are permitted
  • we need to list even the _id field explicitly

As you can see in the following test, we are no longer allowed to add extra fields. We can only insert documents using the two declared fields, name and age.

MongoDB > db.people2.insert( {name : "George", age: NumberInt(30)} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people2.insert( {name : "Maria", age: NumberInt(35), surname: "Peterson"} )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})

In this case we lose flexibility, which is the main benefit of having a NoSQL database like MongoDB.

Well, it’s up to you to use it or not. It depends on the nature and goals of your application. I wouldn’t recommend it in most cases.

Add validation to existing collections

We have seen so far how to create a new collection with validation rules. But what about existing collections? How can we add rules to them?

This is quite trivial. The syntax to use in $jsonSchema remains the same; we just need to use the collMod command instead of createCollection. The following example shows how to create validation rules on an existing collection.

First we create a simple new collection, people3, inserting some documents.

MongoDB > db.people3.insert( {name: "Corrado", surname: "Pandiani", year_of_birth: NumberLong(1971)} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( {name: "Tom", surname: "Cruise", year_of_birth: NumberLong(1961), gender: "M"} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( {name: "Kevin", surname: "Bacon", year_of_birth: NumberLong(1964), gender: "M", shirt_size: "L"} )
WriteResult({ "nInserted" : 1 })

Let’s create the validator.

MongoDB > db.runCommand( { collMod: "people3",
   validator: {
      $jsonSchema : {
         bsonType: "object",
         required: [ "name", "surname", "gender" ],
         properties: {
            name: {
               bsonType: "string",
               description: "required and must be a string" },
            surname: {
               bsonType: "string",
               description: "required and must be a string" },
            gender: {
               enum: [ "M", "F" ],
               description: "required and must be M or F" }
         }
       }
},
validationLevel: "moderate",
validationAction: "warn"
})

The two new options validationLevel and validationAction are important in this case.

validationLevel can have the following values:

  • “off”: validation is not applied
  • “strict”: this is the default value. Validation applies to all inserts and updates
  • “moderate”: validation applies to all the valid existing documents. Documents that are already invalid are ignored.

When creating validation rules on existing collections, the “moderate” value is the safest option.

validationAction can have the following values:

  • “error”: this is the default value. The document must pass the validation in order to be written
  • “warn”: a document that doesn’t pass the validation is written anyway, but a warning message is logged

When adding validation rules to an existing collection, the safest option is “warn”.

These two options can also be applied with createCollection. We didn’t use them above because the default values are good in most cases.
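
For completeness, this is how they would look in a createCollection call (a minimal sketch; the people4 collection and its single rule are just an example):

db.createCollection( "people4" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "name" ],
      properties: {
         name: {
            bsonType: "string",
            description: "required and must be a string" }
      }
   }},
   validationLevel: "strict",
   validationAction: "error"
})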

How to investigate a collection definition

If we want to see how a collection was defined, and in particular what its validator rules are, we can use the db.getCollectionInfos() command. The following example shows how to inspect the “schema” we created for the people collection.

MongoDB > db.getCollectionInfos( {name: "people"} )
[
  {
    "name" : "people",
    "type" : "collection",
    "options" : {
      "validator" : {
        "$jsonSchema" : {
          "bsonType" : "object",
          "required" : [
            "name",
            "surname",
            "email"
          ],
          "properties" : {
            "name" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "surname" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "email" : {
              "bsonType" : "string",
              "pattern" : "^.+@.+$",
              "description" : "required and must be a valid email address"
             },
             "year_of_birth" : {
               "bsonType" : "int",
               "minimum" : 1900,
               "maximum" : 2018,
               "description" : "the value must be in the range 1900-2018"
             },
             "gender" : {
               "enum" : [
                 "M",
                 "F"
               ],
             "description" : "can be only M or F"
        }
      }
    }
  }
},
"info" : {
  "readOnly" : false,
  "uuid" : UUID("5b98c6f0-2c9e-4c10-a3f8-6c1e7eafd2b4")
},
"idIndex" : {
  "v" : 2,
  "key" : {
    "_id" : 1
  },
"name" : "_id_",
"ns" : "test.people"
}
}
]

Limitations and restrictions

Validators cannot be defined for collections in the following databases: admin, local, config.

Validators cannot be defined for system.* collections.

A limitation of the current implementation of the JSON Schema Validator is that the error messages don’t tell you which of the rules the document failed to satisfy. You have to work that out yourself by testing manually, which is not easy when dealing with complex documents. More specific error strings, ideally taken from the validator definition, would be very useful when debugging application errors and warnings. This is definitely something that should be improved in future releases.

While waiting for improvements, someone has developed a wrapper for the mongo client that produces more descriptive error strings. You can have a look at https://www.npmjs.com/package/mongo-schemer. You can test it and use it, but pay attention to its caveat: “Running in prod is not recommended due to the overhead of validating documents against the schema”.

Conclusions

Doing schema validation in the application remains, in general, a best practice, but the JSON Schema Validator is a good tool for enforcing validation directly in the database.

Hence, even though it needs some improvements, the JSON Schema feature is good enough for most common cases. We suggest testing it and using it when you really need to create a backbone structure for your data.

The post MongoDB: how to use the JSON Schema Validator appeared first on Percona Database Performance Blog.

by Corrado Pandiani at August 16, 2018 10:57 AM

Jean-Jerome Schmidt

MySQL on Docker: How to Monitor MySQL Containers with Prometheus - Part 1 - Deployment on Standalone and Swarm

Monitoring is a concern for containers, as the infrastructure is dynamic. Containers can be routinely created and destroyed, and are ephemeral. So how do you keep track of your MySQL instances running on Docker?

As with any software component, there are many options out there that can be used. We’ll look at Prometheus, a solution built for distributed infrastructure that works very well with Docker.

This is a two-part blog. In part 1, we cover the deployment aspect of our MySQL containers with Prometheus and its components, running as standalone Docker containers and as Docker Swarm services. In part 2, we will look at the important metrics to monitor from our MySQL containers, as well as integration with paging and notification systems.

Introduction to Prometheus

Prometheus is a full monitoring and trending system that includes built-in, active scraping, storing, querying, graphing, and alerting based on time series data. Prometheus collects metrics through a pull mechanism from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true. It covers all the target metrics we want to measure when running MySQL as Docker containers: physical host metrics, Docker container metrics, and MySQL server metrics.

Take a look at the following diagram which illustrates Prometheus architecture (taken from Prometheus official documentation):

We are going to deploy some MySQL containers (standalone and Docker Swarm) complete with a Prometheus server, a MySQL exporter (i.e., a Prometheus agent that exposes MySQL metrics, which the Prometheus server can then scrape) and also Alertmanager to handle alerts based on the collected metrics.

For more details check out the Prometheus documentation. In this example, we are going to use the official Docker images provided by the Prometheus team.

Standalone Docker

Deploying MySQL Containers

Let's run two standalone MySQL servers on Docker to simplify our deployment walkthrough. One container runs the latest MySQL 8.0 and the other runs MySQL 5.7. Both containers are on the same Docker network, called "db_network":

$ docker network create db_network
$ docker run -d \
--name mysql80 \
--publish 3306 \
--network db_network \
--restart unless-stopped \
--env MYSQL_ROOT_PASSWORD=mypassword \
--volume mysql80-datadir:/var/lib/mysql \
mysql:8 \
--default-authentication-plugin=mysql_native_password

MySQL 8 defaults to a new authentication plugin called caching_sha2_password. For compatibility with the Prometheus MySQL exporter container, let's use the widely-supported mysql_native_password plugin whenever we create a new MySQL user on this server.
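
For example, if you later create additional users by hand on this server, you can pin the plugin explicitly; this is just a sketch with a hypothetical 'app' user:

mysql> CREATE USER 'app'@'%' IDENTIFIED WITH mysql_native_password BY 'apppassword';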

For the second MySQL container running 5.7, we execute the following:

$ docker run -d \
--name mysql57 \
--publish 3306 \
--network db_network \
--restart unless-stopped \
--env MYSQL_ROOT_PASSWORD=mypassword \
--volume mysql57-datadir:/var/lib/mysql \
mysql:5.7

Verify that our MySQL servers are running:

[root@docker1 mysql]# docker ps | grep mysql
cc3cd3c4022a        mysql:5.7           "docker-entrypoint.s…"   12 minutes ago      Up 12 minutes       0.0.0.0:32770->3306/tcp   mysql57
9b7857c5b6a1        mysql:8             "docker-entrypoint.s…"   14 minutes ago      Up 14 minutes       0.0.0.0:32769->3306/tcp   mysql80

At this point, our architecture is looking something like this:

Let's start monitoring them.

Exposing Docker Metrics to Prometheus

Docker has built-in support as a Prometheus target, which we can use to monitor the Docker engine statistics. We can simply enable it by creating a text file called "daemon.json" on the Docker host:

$ vim /etc/docker/daemon.json

And add the following lines:

{
  "metrics-addr" : "12.168.55.161:9323",
  "experimental" : true
}

Here, 192.168.55.161 is the Docker host's primary IP address. Then, restart the Docker daemon to load the change:

$ systemctl restart docker

Since we have defined --restart=unless-stopped in our MySQL containers' run command, the containers will be automatically started after Docker is running.
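
Once Docker is back up, you can verify that the engine's metrics endpoint responds with Prometheus-formatted metrics on the address configured above:

$ curl -s http://192.168.55.161:9323/metrics | head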

Deploying MySQL Exporter

Before we move further, the mysqld exporter requires a MySQL user to be used for monitoring purposes. On our MySQL containers, create the monitoring user:

$ docker exec -it mysql80 mysql -uroot -p
Enter password:
mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporterpassword' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';

Take note that it is recommended to set a max connection limit for the user to avoid overloading the server with monitoring scrapes under heavy load. Repeat the above statements onto the second container, mysql57:

$ docker exec -it mysql57 mysql -uroot -p
Enter password:
mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporterpassword' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
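
You can double-check the user's privileges on either server before wiring up the exporter:

mysql> SHOW GRANTS FOR 'exporter'@'%';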

Let's run the mysqld exporter container, called "mysql80-exporter", to expose the metrics for our MySQL 8.0 instance as below:

$ docker run -d \
--name mysql80-exporter \
--publish 9104 \
--network db_network \
--restart always \
--env DATA_SOURCE_NAME="exporter:exporterpassword@(mysql80:3306)/" \
prom/mysqld-exporter:latest \
--collect.info_schema.processlist \
--collect.info_schema.innodb_metrics \
--collect.info_schema.tablestats \
--collect.info_schema.tables \
--collect.info_schema.userstats \
--collect.engine_innodb_status

And also another exporter container for our MySQL 5.7 instance:

$ docker run -d \
--name mysql57-exporter \
--publish 9104 \
--network db_network \
--restart always \
-e DATA_SOURCE_NAME="exporter:exporterpassword@(mysql57:3306)/" \
prom/mysqld-exporter:latest \
--collect.info_schema.processlist \
--collect.info_schema.innodb_metrics \
--collect.info_schema.tablestats \
--collect.info_schema.tables \
--collect.info_schema.userstats \
--collect.engine_innodb_status

We enabled a bunch of collector flags for the container to expose the MySQL metrics. You can also enable --collect.slave_status and --collect.slave_hosts if you have MySQL replication running on the containers.

We should be able to retrieve the MySQL metrics via curl from the Docker host directly (port 32771 is the published port assigned automatically by Docker for container mysql80-exporter):

$ curl 127.0.0.1:32771/metrics
...
mysql_info_schema_threads_seconds{state="waiting for lock"} 0
mysql_info_schema_threads_seconds{state="waiting for table flush"} 0
mysql_info_schema_threads_seconds{state="waiting for tables"} 0
mysql_info_schema_threads_seconds{state="waiting on cond"} 0
mysql_info_schema_threads_seconds{state="writing to net"} 0
...
process_virtual_memory_bytes 1.9390464e+07

At this point, our architecture is looking something like this:

We are now ready to set up the Prometheus server.

Deploying Prometheus Server

Firstly, create Prometheus configuration file at ~/prometheus.yml and add the following lines:

$ vim ~/prometheus.yml
global:
  scrape_interval:     5s
  scrape_timeout:      3s
  evaluation_interval: 5s

# Our alerting rule files
rule_files:
  - "alert.rules"

# Scrape endpoints
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'mysql'
    static_configs:
      - targets: ['mysql57-exporter:9104','mysql80-exporter:9104']

  - job_name: 'docker'
    static_configs:
      - targets: ['192.168.55.161:9323']

In the Prometheus configuration file, we have defined three jobs - "prometheus", "mysql" and "docker". The first is the job to monitor the Prometheus server itself. The second, "mysql", monitors our MySQL containers; its endpoints are our MySQL exporters on port 9104, which expose the Prometheus-compatible metrics from the MySQL 8.0 and 5.7 instances respectively. The third, "docker", scrapes the Docker engine metrics endpoint we enabled earlier on port 9323. The "alert.rules" is the rule file that we will include in the next blog post for alerting purposes.

We can then map this configuration into the Prometheus container. We also need to create a Docker volume for the Prometheus data for persistence, and expose port 9090 publicly:

$ docker run -d \
--name prometheus-server \
--publish 9090:9090 \
--network db_network \
--restart unless-stopped \
--mount type=volume,src=prometheus-data,target=/prometheus \
--mount type=bind,src="$(pwd)"/prometheus.yml,target=/etc/prometheus/prometheus.yml \
prom/prometheus

Now our Prometheus server is already running and can be accessed directly on port 9090 of the Docker host. Open a web browser and go to http://192.168.55.161:9090/ to access the Prometheus web UI. Verify the target status under Status -> Targets and make sure they are all green:
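
Besides the web UI, the same check can be scripted against Prometheus' HTTP API; every scraped target exports a built-in up metric, which is 1 when the last scrape succeeded:

$ curl -s 'http://192.168.55.161:9090/api/v1/query?query=up'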

At this point, our container architecture is looking something like this:

Our Prometheus monitoring system for the standalone MySQL containers is now deployed.

Docker Swarm

Deploying a 3-node Galera Cluster

Suppose we want to deploy a three-node Galera Cluster in Docker Swarm. We would have to create 3 different services, each service representing one Galera node. Using this approach, we can keep a static, resolvable hostname for each Galera container, together with the MySQL exporter container that will accompany each of them. We will be using the MariaDB 10.2 image maintained by the Docker team to run our Galera cluster.

Firstly, create a MySQL configuration file to be used by our Swarm service:

$ vim ~/my.cnf
[mysqld]

default_storage_engine          = InnoDB
binlog_format                   = ROW

innodb_flush_log_at_trx_commit  = 0
innodb_flush_method             = O_DIRECT
innodb_file_per_table           = 1
innodb_autoinc_lock_mode        = 2
innodb_lock_schedule_algorithm  = FCFS # MariaDB >10.1.19 and >10.2.3 only

wsrep_on                        = ON
wsrep_provider                  = /usr/lib/galera/libgalera_smm.so
wsrep_sst_method                = mariabackup

Create a dedicated database network in our Swarm called "db_swarm":

$ docker network create --driver overlay db_swarm

Import our MySQL configuration file into Docker config so we can load it into our Swarm service when we create it later:

$ cat ~/my.cnf | docker config create my-cnf -
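
To confirm that the config object was created, and to inspect its payload, you can use the standard config subcommands:

$ docker config ls
$ docker config inspect --pretty my-cnf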

Create the first Galera bootstrap service, called "galera0", with "gcomm://" as the cluster address. This is a transient service for the bootstrapping process only. We will delete it once we have the 3 other Galera services running:

$ docker service create \
--name galera0 \
--replicas 1 \
--hostname galera0 \
--network db_swarm \
--publish 3306 \
--publish 4444 \
--publish 4567 \
--publish 4568 \
--config src=my-cnf,target=/etc/mysql/mariadb.conf.d/my.cnf \
--env MYSQL_ROOT_PASSWORD=mypassword \
--mount type=volume,src=galera0-datadir,dst=/var/lib/mysql \
mariadb:10.2 \
--wsrep_cluster_address=gcomm:// \
--wsrep_sst_auth="root:mypassword" \
--wsrep_node_address=galera0

At this point, our database architecture can be illustrated as below:

Then, repeat the following command three times to create 3 different Galera services, replacing {name} with galera1, galera2 and galera3 respectively (or use the shell loop sketched after the command):

$ docker service create \
--name {name} \
--replicas 1 \
--hostname {name} \
--network db_swarm \
--publish 3306 \
--publish 4444 \
--publish 4567 \
--publish 4568 \
--config src=my-cnf,target=/etc/mysql/mariadb.conf.d/my.cnf \
--env MYSQL_ROOT_PASSWORD=mypassword \
--mount type=volume,src={name}-datadir,dst=/var/lib/mysql \
mariadb:10.2 \
--wsrep_cluster_address=gcomm://galera0,galera1,galera2,galera3 \
--wsrep_sst_auth="root:mypassword" \
--wsrep_node_address={name}
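
Alternatively, a small shell loop saves the manual substitution; this is just a sketch equivalent to running the command above three times:

$ for name in galera1 galera2 galera3; do
    docker service create \
      --name ${name} \
      --replicas 1 \
      --hostname ${name} \
      --network db_swarm \
      --publish 3306 \
      --publish 4444 \
      --publish 4567 \
      --publish 4568 \
      --config src=my-cnf,target=/etc/mysql/mariadb.conf.d/my.cnf \
      --env MYSQL_ROOT_PASSWORD=mypassword \
      --mount type=volume,src=${name}-datadir,dst=/var/lib/mysql \
      mariadb:10.2 \
      --wsrep_cluster_address=gcomm://galera0,galera1,galera2,galera3 \
      --wsrep_sst_auth="root:mypassword" \
      --wsrep_node_address=${name}
  done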

Verify our current Docker services:

$ docker service ls 
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
wpcxye3c4e9d        galera0             replicated          1/1                 mariadb:10.2        *:30022->3306/tcp, *:30023->4444/tcp, *:30024-30025->4567-4568/tcp
jsamvxw9tqpw        galera1             replicated          1/1                 mariadb:10.2        *:30026->3306/tcp, *:30027->4444/tcp, *:30028-30029->4567-4568/tcp
otbwnb3ridg0        galera2             replicated          1/1                 mariadb:10.2        *:30030->3306/tcp, *:30031->4444/tcp, *:30032-30033->4567-4568/tcp
5jp9dpv5twy3        galera3             replicated          1/1                 mariadb:10.2        *:30034->3306/tcp, *:30035->4444/tcp, *:30036-30037->4567-4568/tcp

Our architecture is now looking something like this:

We need to remove the Galera bootstrap Swarm service, galera0, to stop it from running, because if its container gets rescheduled by Docker Swarm, a new replica would be started with a fresh new volume. We would then run the risk of data loss, because --wsrep_cluster_address contains "galera0" in the other Galera nodes (or Swarm services). So, let's remove it:

$ docker service rm galera0
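
To double-check that the three remaining members still form one cluster, you can ask Galera directly from any node (the actual task container name differs in Swarm, so look it up with docker ps first; the output below is what you should expect):

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+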

At this point, we have our three-node Galera Cluster:

We are now ready to deploy our MySQL exporter and Prometheus Server.

MySQL Exporter Swarm Service

Log in to one of the Galera nodes and create the exporter user with the proper privileges:

$ docker exec -it {galera1} mysql -uroot -p
Enter password:
mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporterpassword' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';

Then, create the exporter service for each of the Galera services (replace {name} with galera1, galera2 and galera3 respectively):

$ docker service create \
--name {name}-exporter \
--network db_swarm \
--replicas 1 \
-p 9104 \
-e DATA_SOURCE_NAME="exporter:exporterpassword@({name}:3306)/" \
prom/mysqld-exporter:latest \
--collect.info_schema.processlist \
--collect.info_schema.innodb_metrics \
--collect.info_schema.tablestats \
--collect.info_schema.tables \
--collect.info_schema.userstats \
--collect.engine_innodb_status

At this point, our architecture is looking something like this with exporter services in the picture:

Prometheus Server Swarm Service

Finally, let's deploy our Prometheus server. Similar to the Galera deployment, we have to prepare the Prometheus configuration file first before importing it into Swarm using Docker config command:

$ vim ~/prometheus.yml
global:
  scrape_interval:     5s
  scrape_timeout:      3s
  evaluation_interval: 5s

# Our alerting rule files
rule_files:
  - "alert.rules"

# Scrape endpoints
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'galera'
    static_configs:
      - targets: ['galera1-exporter:9104','galera2-exporter:9104', 'galera3-exporter:9104']

In the Prometheus configuration file, we have defined two jobs - "prometheus" and "galera". The first is the job to monitor the Prometheus server itself. The second, "galera", monitors our MySQL containers; its endpoints are our MySQL exporters on port 9104, which expose the Prometheus-compatible metrics from the three Galera nodes. The "alert.rules" is the rule file that we will include in the next blog post for alerting purposes.

Import the configuration file into Docker config to be used with Prometheus container later:

$ cat ~/prometheus.yml | docker config create prometheus-yml -

Let's run the Prometheus server container, and publish port 9090 of all Docker hosts for the Prometheus web UI service:

$ docker service create \
--name prometheus-server \
--publish 9090:9090 \
--network db_swarm \
--replicas 1 \
--config src=prometheus-yml,target=/etc/prometheus/prometheus.yml \
--mount type=volume,src=prometheus-data,dst=/prometheus \
prom/prometheus

Verify with the Docker service command that we have 3 Galera services, 3 exporter services and 1 Prometheus service:

$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                         PORTS
jsamvxw9tqpw        galera1             replicated          1/1                 mariadb:10.2                  *:30026->3306/tcp, *:30027->4444/tcp, *:30028-30029->4567-4568/tcp
hbh1dtljn535        galera1-exporter    replicated          1/1                 prom/mysqld-exporter:latest   *:30038->9104/tcp
otbwnb3ridg0        galera2             replicated          1/1                 mariadb:10.2                  *:30030->3306/tcp, *:30031->4444/tcp, *:30032-30033->4567-4568/tcp
jq8i77ch5oi3        galera2-exporter    replicated          1/1                 prom/mysqld-exporter:latest   *:30039->9104/tcp
5jp9dpv5twy3        galera3             replicated          1/1                 mariadb:10.2                  *:30034->3306/tcp, *:30035->4444/tcp, *:30036-30037->4567-4568/tcp
10gdkm1ypkav        galera3-exporter    replicated          1/1                 prom/mysqld-exporter:latest   *:30040->9104/tcp
gv9llxrig30e        prometheus-server   replicated          1/1                 prom/prometheus:latest        *:9090->9090/tcp

Now our Prometheus server is already running and can be accessed directly on port 9090 from any Docker node. Open a web browser and go to http://192.168.55.161:9090/ to access the Prometheus web UI. Verify the target status under Status -> Targets and make sure they are all green:

At this point, our Swarm architecture is looking something like this:

To be continued...

We now have our database and monitoring stack deployed on Docker. In part 2 of the blog, we will look into the different MySQL metrics to keep an eye on. We’ll also see how to configure alerting with Prometheus.

by ashraf at August 16, 2018 10:18 AM

August 15, 2018

MariaDB AB

MariaDB Server 10.3.9 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.3.9. See the release notes and changelog for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.3.9

Release Notes Changelog What is MariaDB 10.3?

by dbart at August 15, 2018 05:04 PM

MariaDB Foundation

MariaDB 10.3.9 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.3.9, the latest stable release in the MariaDB 10.3 series. See the release notes and changelogs for details. Download MariaDB 10.3.9 Release Notes Changelog What is MariaDB 10.3? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.3.9 Alexander Barkov (MariaDB Corporation) Alexey […]

The post MariaDB 10.3.9 now available appeared first on MariaDB.org.

by Ian Gilfillan at August 15, 2018 05:00 PM

Peter Zaitsev

Webinar Thurs 16/8: Developing an App on MongoDB: Tips and Tricks

Please join Percona’s Sr. Technical Operations Architect Tim Vaillancourt as he presents Developing an App on MongoDB: Tips and Tricks on Thursday, August 16th, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

A lot of developers prefer using MongoDB to other open source databases when developing applications. But why? How do you work with MongoDB to create a well-functioning application?

This webinar will help developers understand what MongoDB does and how it processes requests from applications.

In this webinar, we will cover:

  • Data, Queries and Indexes
  • Using indices efficiently
  • Reducing index and storage size with correct data types
  • The aggregation framework
  • Using the Explain and Operation Profiling features
  • MongoDB features to avoid
  • Using Read and Write Concerns for Integrity
  • Performance
  • Scaling read queries using Read Preference
  • What is MongoDB Sharding?
  • Using Percona Monitoring and Management (PMM) to visualize database usage
  • MongoDB users and built-in roles for application security
  • Using SRV DNS record support

By the end of the lesson, you will know how to avoid common problems with MongoDB at the application stage, instead of fixing them in production.

Register Now

The post Webinar Thurs 16/8: Developing an App on MongoDB: Tips and Tricks appeared first on Percona Database Performance Blog.

by Tim Vaillancourt at August 15, 2018 02:35 PM

MariaDB Foundation

MariaDB on HackerOne

We are pleased to announce the launch of our public bug bounty program on the HackerOne platform: https://hackerone.com/mariadb The aim for this program is twofold: Review the vulnerability submission channels, guidelines and policy for responsible disclosure, as well as asset identification and vulnerability handling process on our side. Encourage researchers to look for vulnerabilities […]

The post MariaDB on HackerOne appeared first on MariaDB.org.

by Teodor Mircea Ionita at August 15, 2018 12:11 PM

August 14, 2018

Peter Zaitsev

Open Source Database Community Blog: The Story So Far

Recently, we initiated a new project, the Open Source Database Community Blog. One way to think of this is as an online, year-round version of the Percona Live conferences. If you have a story to tell, an experience to share, or a lesson to be learned, send it along, as long as it’s related to open source database software, its management, or its application. That’s right: not just Percona software, but any open source database software, in all its forms.

Unlike Percona Live, though, we are not limited by time or space. All submissions are welcome as long as they follow some simple guidelines.

We have already had some excellent posts, and in case this is news to you, here’s a recap:

You can also read Jean-François’s personal blog (unsolicited, but greatly appreciated) on how the process of getting his post up and running went.

About the posts … and what’s in it for you

All of the writers are giving freely of their time and knowledge. So, if you would just like to read some alternative, independent viewpoints, try the blog. If you want to support the writers with constructive exchanges and comments, that would be great.

If you would like to go a little further and provide some feedback about what you’d like to see via our blog poll, that would be awesome. As a community blog, we want to make sure it hits community interests.

You can also drop me a line if there are things that I’ve missed from the poll.

Also, you should know that I have the English covered, but I am not a technical expert. We don’t want (nor would I get) Percona techs to edit or review these blog posts. That’s not the point of the blog!

So, would you consider being a technical editor maybe? Not for all posts, since many of the writers will want to ‘own’ their content. But there could be some new writers who’d appreciate some back up from a senior tech before going ‘live’ with their posts. Might that tech buddy be you?

There are some more ideas, and I have written more about our vision for the blog here.

If you are tempted to write for this, please get in touch; I would love to hear from you. You do not have to be an expert! Content suitable for all levels of experience is welcome.

The post Open Source Database Community Blog: The Story So Far appeared first on Percona Database Performance Blog.

by Lorraine Pocklington, Community Manager at August 14, 2018 09:24 PM

MariaDB Foundation

MariaDB 10.2.17 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.2.17, the latest stable release in the MariaDB 10.2 series. See the release notes and changelogs for details. Download MariaDB 10.2.17 Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.2.17 Alexander Barkov (MariaDB Corporation) Alexey […]

The post MariaDB 10.2.17 now available appeared first on MariaDB.org.

by Ian Gilfillan at August 14, 2018 06:30 PM

MariaDB AB

MariaDB Server 10.2.17 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.2.17. See the release notes and changelog for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.2.17

Release Notes Changelog What is MariaDB 10.2?

by dbart at August 14, 2018 06:06 PM

Federico Razzoli

Open source databases and bug trackers

All software has bugs. I rarely feel scared because of a bug – even famous ones that sometimes make the news. People should manage software wisely, and be able to work around bugs they hit.

Instead, I’m more concerned about how vendors treat their bugs. In particular, I care about transparency, and how projects process their bugs.

Here’s what I mean.

Transparency:

  1. Ability to search for new, active, and closed bugs.
  2. Public tests to check yourself if bugs are fixed, and if regression bugs made their way back into the code.
  3. Public version maintenance policy – if I use an old version, I should know if bugfixes are still backported.
  4. Optional: See developers comments, see their activity on a specific bug.
  5. Optional: See which version should contain a fix (if planned) and when that version will be released (if planned).

I want to highlight the word Optional. Points 1, 2 and 3, in my opinion, should be in place for any project that is not a personal hobby. If they are missing (quite often, I have to say…) the project can still be useful and cool, but it is not being managed properly from a QA point of view.

Bug processing:

I am not able to write a detailed list here. I think that this part depends on project characteristics: what the program does, if it is led by a company or not, how active the community is in development, and so on. Still, common sense should apply. Users are affected by bugs, therefore bugs should be fixed when it is reasonable to do so.

When it comes to open source RDBMSs…

Yes, I’m only talking about relational databases. And I’m only considering the most widely known ones. With this, I don’t want to disrespect any project. There can be great reasons to use NoSQL or lesser-known SQL databases, but NoSQL is a huge set of different products, and I only want to write about what I know (or “know”).

MySQL

MySQL has a public bug tracker, which is good. But they also have a private one, and this has some bad consequences:

  • If you read about a bug that is still active, you cannot be sure that information is up to date.
  • Some bugs are private, for security reasons. Those reasons are often, if not always, quite obscure. Note that MySQL typically runs on open source layers whose security bugs are public.
  • For the same reasons, not all tests are public. This reduces community QA capabilities.

But you can read more detailed information about this topic in Blog of (former?) MySQL Entomologist.

Another flaw is that not all MySQL components have the same level of quality. If you make heavy use of certain features (XA transactions, stored procedures…) you can reasonably expect to hit bugs. This is partly understandable, because Oracle needs to improve the features that are most used and useful. What is much less understandable is how an area of the product that was disappointing and buggy many years ago is still disappointing and buggy.

About the bug tracker… I’d like to highlight something that is rarely mentioned: its search capabilities are powerful. Maybe not so user friendly – you need to know how to use the tool. But once you know, you will be able to find everything you need.

Percona Server

Percona Server is a fork of MySQL which tries to keep the differences as small as possible. Thus, generally MySQL bugs are also Percona Server bugs, and all the features they implement could cause some specific bugs.

If they fix MySQL bugs, you will find this in their documentation.

Specific bugs can be found in Percona Server JIRA.

You can usually find internal details and tests. You know who works on a bug and when. Generally you don’t know in advance which version will contain a fix.

I think they do very well, and no, the fact I worked for Percona does not affect my opinion. Keep in mind, however, that Percona is a customer-centric company. This means that generally they fix bugs that affect their customers. And, obviously, they merge bug fixes that happen in mainstream MySQL code.

MariaDB

They also have a JIRA to report and search bugs. In their JIRA you can also find tests and plenty of technical information written by the developers. More than once, while reading about a bug, I have found interesting details about InnoDB internals.

In MariaDB JIRA you can also find some information that you can’t find in Percona JIRA. You can see the active sprint’s activity, so you know which bugs they are working on and what their status is (todo, in progress, in review, done). And you can see the next releases, including the next planned release dates (keep in mind that they are not commitments). This is a very high level of transparency and potentially a valuable source of information, for example if you are planning upgrades or if you have pressing problems with a specific bug. And of course it’s still interesting if you are just curious.

I also want to highlight that they discuss in the same way contributions and new features. Also, developers often interact with the community using their mailing lists, which makes the whole project much more transparent.

PostgreSQL

PostgreSQL does not have a bug tracker. Bugs can be reported to the dedicated mailing list or via a web form which sends a message to the mailing list. The archives are the only available tool to see information about Postgres bugs. In other words, there is no way to see the list of open bugs. This is sad.

However, Percona recently started to offer support for PostgreSQL. Some time ago, their Careers page indicated that they were looking for someone able to work with Postgres code (yes, I keep an eye on some companies’ career pages, because they are a source of information about their plans). Therefore my hope is that they intend to do the same great QA work with Postgres that they have always done with MySQL, and that they will push to have a publicly available, professional bug tracker.

CockroachDB

CockroachDB uses GitHub issues to track bugs. The tool is quite poor compared to JIRA or bugs.mysql.com, but it has the most important features that you can reasonably expect.

Usually there are not many technical details about bugs. And you don’t know what is happening around a bug, unless there is a new comment.

Why do I always criticise?

Actually, I don’t. But I’m aware that, if a person expresses 3 positive thoughts and 1 criticism, in many cases only the latter will be remembered. This is natural.

Anyway, my spirit is always: please do better. I use my time to criticise what I like, not what I dislike.

My criticism could also be wrong, in which case there could at least be a good discussion in the comments.

Federico

by Federico at August 14, 2018 09:30 AM

August 13, 2018

Oli Sennhauser

Cool new features in FromDual Backup and Recovery Manager 2.0.0

A while ago we released our FromDual Backup and Recovery Manager (brman) 2.0.0 for MariaDB and MySQL. So what are the cool new features of this release?

First of all brman 2.0.0 is compatible with MariaDB 10.3 and MySQL 8.0:

shell> bman --target=brman:secret@127.0.0.1:3318 --type=full --mode=logical --policy=daily

Reading configuration from /etc/mysql/my.cnf
Reading configuration from /home/mysql/.my.cnf
No bman configuration file.

Command line: /home/mysql/product/brman-2.0.0/bin/bman.php --target=brman:******@127.0.0.1:3318 --type=full --mode=logical --policy=daily 

Options from command line
  target          = brman:******@127.0.0.1:3318
  type            = full
  mode            = logical
  policy          = daily

Resulting options
  config          = 
  target          = brman:******@127.0.0.1:3318
  type            = full
  mode            = logical
  policy          = daily
  log             = ./bman.log
  backupdir       = /home/mysql/bck
  catalog-name    = brman_catalog

Logging to   ./bman.log
Backupdir is /home/mysql/bck
Hostname is  chef
Version is   2.0.0 (catalog v0.2.0)

Start backup at 2018-08-13_11-57-31
  Binary logging is disabled.
  Schema to backup: mysql, foodmart, world, test

  schema_name      engine  cnt  data_bytes    index_bytes   table_rows 
  foodmart                    0             0             0           0
  mysql            CSV        2             0             0           4
  mysql            InnoDB     4         65536         49152          17
  mysql            MyISAM    25        515327        133120        2052
  test             InnoDB     3         49152             0           0
  world                       0             0             0           0

  /home/mysql/product/mariadb-10.3/bin/mysqldump --user=brman --host=127.0.0.1 --port=3318 --all-databases  --quick --single-transaction --flush-logs --triggers --routines --hex-blob --events
  to Destination: /home/mysql/bck/daily/bck_full_2018-08-13_11-57-31.sql
  Backup size is 488835
  Backup does NOT contain any binary log information.
  Do MD5 checksum of uncompressed file /home/mysql/bck/daily/bck_full_2018-08-13_11-57-31.sql
  md5sum --binary /home/mysql/bck/daily/bck_full_2018-08-13_11-57-31.sql
  md5 = 31cab19021e01c12db5fe49165a3df93
  /usr/bin/pigz -6 /home/mysql/bck/daily/bck_full_2018-08-13_11-57-31.sql
End backup at 2018-08-13 11:57:31 (rc=0)

Next, brman now also supports mariabackup:

shell> bman --target=brman:secret@127.0.0.1:3318 --type=full --mode=physical --policy=daily
...
Start backup at 2018-08-13_12-02-18
  Backup with tool mariabackup version 10.3.7 (from path /home/mysql/product/mariadb-10.3/bin/mariabackup).
  Schema to backup: mysql, foodmart, world, test

  schema_name      engine  cnt  data_bytes    index_bytes   table_rows 
  foodmart                    0             0             0           0
  mysql            CSV        2             0             0           4
  mysql            InnoDB     4         65536         49152          17
  mysql            MyISAM    25        515327        133120        2052
  test             InnoDB     3         49152             0           0
  world                       0             0             0           0

  Binary logging is disabled.
  /home/mysql/product/mariadb-10.3/bin/mariabackup --defaults-file=/tmp/bck_full_2018-08-13_12-02-18.cnf --user=brman --host=127.0.0.1 --port=3318 --no-timestamp --backup --target-dir=/home/mysql/bck/daily/bck_full_2018-08-13_12-02-18
180813 12:02:19 Connecting to MySQL server host: 127.0.0.1, user: brman, password: set, port: 3318, socket: not set
Using server version 10.3.7-MariaDB
/home/mysql/product/mariadb-10.3/bin/mariabackup based on MariaDB server 10.3.7-MariaDB Linux (x86_64)
mariabackup: uses posix_fadvise().
mariabackup: cd to /home/mysql/database/mariadb-103/data/
mariabackup: open files limit requested 0, set to 1024
mariabackup: using the following InnoDB configuration:
mariabackup:   innodb_data_home_dir =
mariabackup:   innodb_data_file_path = ibdata1:12M:autoextend
mariabackup:   innodb_log_group_home_dir = ./
2018-08-13 12:02:19 0 [Note] InnoDB: Number of pools: 1
mariabackup: Generating a list of tablespaces
2018-08-13 12:02:19 0 [Warning] InnoDB: Allocated tablespace ID 59 for mysql/transaction_registry, old maximum was 0
180813 12:02:19 >> log scanned up to (15975835)
180813 12:02:19 [01] Copying ibdata1 to /home/mysql/bck/daily/bck_full_2018-08-13_12-02-18/ibdata1
180813 12:02:19 [01]        ...done
...

brman 2.0.0 also seamlessly supports all three physical backup methods (mariabackup, xtrabackup and mysqlbackup) in their newest releases.

On a customer request we have added the option --pass-through to pass additional specific options through to the final back-end application (mysqldump, mariabackup, xtrabackup, mysqlbackup):

As an example the customer wanted to pass through the option --ignore-table to mysqldump:

shell> bman --target=brman:secret@127.0.0.1:3318 --type=schema --mode=logical --policy=daily --schema=+world --pass-through="--ignore-table=world.CountryLanguage"

...
Start backup at 2018-08-13_12-11-40
  Schema to backup: world

  schema_name      engine  cnt  data_bytes    index_bytes   table_rows 
  world            InnoDB     3        655360             0        5411

  Binary logging is disabled.
  /home/mysql/product/mariadb-10.3/bin/mysqldump --user=brman --host=127.0.0.1 --port=3318  --quick --single-transaction --flush-logs --triggers --routines --hex-blob --databases 'world' --events --ignore-table=world.CountryLanguage
  to Destination: /home/mysql/bck/daily/bck_schema_2018-08-13_12-11-40.sql
  Backup size is 217054
  Backup does NOT contain any binary log information.
  Do MD5 checksum of uncompressed file /home/mysql/bck/daily/bck_schema_2018-08-13_12-11-40.sql
  md5sum --binary /home/mysql/bck/daily/bck_schema_2018-08-13_12-11-40.sql
  md5 = f07e319c36ee7bb1e662008c4c66a35a
  /usr/bin/pigz -6 /home/mysql/bck/daily/bck_schema_2018-08-13_12-11-40.sql
End backup at 2018-08-13 12:11:40 (rc=0)

In the field it is sometimes desired not to purge the binary logs during a binlog backup, so we added the option --no-purge. It looked like this before:

shell> bman --target=brman:secret@127.0.0.1:3326 --type=binlog --policy=binlog
...
Start backup at 2018-08-13_12-16-48
  Binlog Index file is: /home/mysql/database/mysql-80/data/binlog.index
  Getting lock: /home/mysql/product/brman-2.0.0/lck/binlog-logical-binlog.lock
  Releasing lock: /home/mysql/product/brman-2.0.0/lck/binlog-logical-binlog.lock
  FLUSH /*!50503 BINARY */ LOGS

  Copy /home/mysql/database/mysql-80/data/binlog.000006 to /home/mysql/bck/binlog/bck_binlog.000006
  Binary log binlog.000006 begin datetime is: 2018-08-13 12:14:14 and end datetime is: 2018-08-13 12:14:30
  Do MD5 checksum of /home/mysql/bck/binlog/bck_binlog.000006
  md5sum --binary /home/mysql/bck/binlog/bck_binlog.000006
  md5 = a7ae2a271a6c90b0bb53c562c87f6f7a
  /usr/bin/pigz -6 /home/mysql/bck/binlog/bck_binlog.000006
  PURGE BINARY LOGS TO 'binlog.000007'

  Copy /home/mysql/database/mysql-80/data/binlog.000007 to /home/mysql/bck/binlog/bck_binlog.000007
  Binary log binlog.000007 begin datetime is: 2018-08-13 12:14:30 and end datetime is: 2018-08-13 12:14:31
  Do MD5 checksum of /home/mysql/bck/binlog/bck_binlog.000007
  md5sum --binary /home/mysql/bck/binlog/bck_binlog.000007
  md5 = 5b592e597241694944d70849d7a05f53
  /usr/bin/pigz -6 /home/mysql/bck/binlog/bck_binlog.000007
  PURGE BINARY LOGS TO 'binlog.000008'
...

and like this after:

shell> bman --target=brman:secret@127.0.0.1:3326 --type=binlog --policy=binlog --no-purge

...
Start backup at 2018-08-13_12-18-52
  Binlog Index file is: /home/mysql/database/mysql-80/data/binlog.index
  Getting lock: /home/mysql/product/brman-2.0.0/lck/binlog-logical-binlog.lock
  Releasing lock: /home/mysql/product/brman-2.0.0/lck/binlog-logical-binlog.lock
  FLUSH /*!50503 BINARY */ LOGS

  Copy /home/mysql/database/mysql-80/data/binlog.000015 to /home/mysql/bck/binlog/bck_binlog.000015
  Binary log binlog.000015 begin datetime is: 2018-08-13 12:16:48 and end datetime is: 2018-08-13 12:18:41
  Do MD5 checksum of /home/mysql/bck/binlog/bck_binlog.000015
  md5sum --binary /home/mysql/bck/binlog/bck_binlog.000015
  md5 = 1f9a79c3ad081993b4006c58bf1d6bee
  /usr/bin/pigz -6 /home/mysql/bck/binlog/bck_binlog.000015

  Copy /home/mysql/database/mysql-80/data/binlog.000016 to /home/mysql/bck/binlog/bck_binlog.000016
  Binary log binlog.000016 begin datetime is: 2018-08-13 12:18:41 and end datetime is: 2018-08-13 12:18:42
  Do MD5 checksum of /home/mysql/bck/binlog/bck_binlog.000016
  md5sum --binary /home/mysql/bck/binlog/bck_binlog.000016
  md5 = ef1613e99bbfa78f75daa5ba543e3213
  /usr/bin/pigz -6 /home/mysql/bck/binlog/bck_binlog.000016
...

To make the logical backup (mysqldump) slightly faster, we added the --quick option. This is done automatically, and you cannot influence this behaviour.

  /home/mysql/product/mariadb-10.3/bin/mysqldump --user=brman --host=127.0.0.1 --port=3318  --quick --single-transaction --flush-logs --triggers --routines --hex-blob --events

Some of our customers use brman in combination with MyEnv, and they want an overview of the installed software. So we made the version output of brman MyEnv compliant:

mysql@chef:~ [mariadb-103, 3318]> V

The following FromDual Toolbox Packages are installed:
------------------------------------------------------------------------
MyEnv:           2.0.0
BRman:           2.0.0
OpsCenter:       0.4.0
Fpmmm:           1.0.1
Nagios plug-ins: 1.0.1
O/S:             Linux / Ubuntu
Binaries:        mysql-5.7
                 mysql-8.0
                 mariadb-10.2
                 mariadb-10.3
------------------------------------------------------------------------

mysql@chef:~ [mariadb-103, 3318]>

In MySQL 5.7, general tablespaces were introduced. The mysqldump utility is not aware of general tablespaces and does not dump this information, which leads to errors during restore. FromDual brman checks for general tablespaces and writes them to the backup log, so you can later extract this information at least from there. We consider this a bug in mysqldump. MariaDB up to 10.3 has not implemented this feature yet, so it is not affected by this problem.

...
Start backup at 2018-08-13_12-25-46
  WARNING: 5 general tablespaces found! mysqldump does NOT dump tablespace creation statements.

  CREATE TABLESPACE `brman_test_ts` ADD DATAFILE './brman_test_ts.ibd' FILE_BLOCK_SIZE=4096 ENGINE=InnoDB
  CREATE TABLESPACE `ts2` ADD DATAFILE './ts2.ibd' FILE_BLOCK_SIZE=4096 ENGINE=InnoDB
  CREATE TABLESPACE `ts3` ADD DATAFILE './ts3.ibd' FILE_BLOCK_SIZE=4096 ENGINE=InnoDB
  CREATE TABLESPACE `ts4` ADD DATAFILE './ts4.ibd' FILE_BLOCK_SIZE=4096 ENGINE=InnoDB
  CREATE TABLESPACE `ts1` ADD DATAFILE './ts1.ibd' FILE_BLOCK_SIZE=4096 ENGINE=InnoDB
...

FromDual brman backups are quite complex and can run for quite a long time, so timestamps are logged to let us find out where the time is spent and where the bottlenecks are:

...
  At 2018-08-13 12:27:17 do MD5 checksum of uncompressed file /home/mysql/bck/daily/bck_full_2018-08-13_12-27-16/ib_logfile0
  md5sum --binary /home/mysql/bck/daily/bck_full_2018-08-13_12-27-16/ib_logfile0
  md5 = d41d8cd98f00b204e9800998ecf8427e
  At 2018-08-13 12:27:17 compress file /home/mysql/bck/daily/bck_full_2018-08-13_12-27-16/ib_logfile0
  /usr/bin/pigz -6 /home/mysql/bck/daily/bck_full_2018-08-13_12-27-16/ib_logfile0

  At 2018-08-13 12:27:18 do MD5 checksum of uncompressed file /home/mysql/bck/daily/bck_full_2018-08-13_12-27-16/ibdata1
  md5sum --binary /home/mysql/bck/daily/bck_full_2018-08-13_12-27-16/ibdata1
  md5 = 097ab6d70eefb6e8735837166cd4ba54
  At 2018-08-13 12:27:18 compress file /home/mysql/bck/daily/bck_full_2018-08-13_12-27-16/ibdata1
  /usr/bin/pigz -6 /home/mysql/bck/daily/bck_full_2018-08-13_12-27-16/ibdata1

  At 2018-08-13 12:27:19 do MD5 checksum of uncompressed file /home/mysql/bck/daily/bck_full_2018-08-13_12-27-16/xtrabackup_binlog_pos_innodb
...

A general FromDual policy is to not use the MariaDB/MySQL root user for anything except direct DBA interventions, so backups should be done with their own user. FromDual suggests brman as a username, and the utility complains with a warning if root is used:

shell> bman --target=root@127.0.0.1:3318 --type=full --policy=daily
...
Start backup at 2018-08-13_12-30-29

WARNING: You should NOT use the root user for backup. Please create another user as follows:

         CREATE USER 'brman'@'127.0.0.1' IDENTIFIED BY 'S3cret123';
         GRANT ALL ON *.* TO 'brman'@'127.0.0.1';

         If you want to be more restrictive you can grant privileges as follows:

         GRANT SELECT, LOCK TABLES, RELOAD, PROCESS, TRIGGER, SUPER, REPLICATION CLIENT, SHOW VIEW, EVENT ON *.* TO 'brman'@'127.0.0.1';

         Additionally for MySQL Enterprise Backup (MEB):

         GRANT CREATE, INSERT, DROP, UPDATE ON mysql.backup_progress TO 'brman'@'127.0.0.1';
         GRANT CREATE, INSERT, SELECT, DROP, UPDATE ON mysql.backup_history TO 'brman'@'127.0.0.1';
         GRANT FILE ON *.* TO 'brman'@'127.0.0.1';
         GRANT CREATE, INSERT, DROP, UPDATE ON mysql.backup_sbt_history TO 'brman'@'127.0.0.1';

         Additionally for MariaBackup / XtraBackup:

         GRANT INSERT, SELECT ON PERCONA_SCHEMA.xtrabackup_history TO 'brman'@'127.0.0.1';
...

Some customers have implemented a monitoring solution. FromDual brman can now report the backup return code, backup run time and backup size to the FromDual Performance Monitor for MariaDB and MySQL (fpmmm/Zabbix):

shell> bman --target=brman:secret@127.0.0.1:3318 --type=full --policy=daily --fpmmm-hostname=mariadb-103 --fpmmm-cache-file=/var/cache/fpmmm/fpmmm.FromDual.mariadb-103.cache
...

shell> cat /var/cache/fpmmm/fpmmm.FromDual.mariadb-103.cache
mariadb-103 FromDual.MySQL.backup.full_logical_rc 1534156619 "0"
mariadb-103 FromDual.MySQL.backup.full_logical_duration 1534156619 "129"
mariadb-103 FromDual.MySQL.backup.full_logical_size 1534156619 "7324744568"

Some customers run their databases on shared hosting systems or in cloud solutions where they do not have all the needed database privileges. For those users FromDual brman is much less intrusive now and allows backups on those restricted systems as well:

#
# /home/shinguz/etc/brman.conf
#

policy                = daily
target                = shinguz_brman:secret@localhost
type                  = schema
per-schema            = on
schema                = -shinguz_shinguz
log                   = /home/shinguz/log/bman_backup.log
backupdir             = /home/shinguz/bck


shell> /home/shinguz/brman/bin/bman --config=/home/shinguz/etc/brman.conf 1>/dev/null
...
  WARNING: Binary logging is enabled but you are lacking REPLICATION CLIENT privilege. I cannot get Master Log File and Pos!
  WARNING: I cannot check for GENERAL tablespaces. I lack the PROCESS privilege. This backup might not restore in case of presence of GENERAL tablespaces.
...

Details: the check for binary logging was made less intrusive. If the RELOAD privilege is missing, the --master-data and/or --flush-logs options are omitted. Schema backup does not require the SHOW DATABASES privilege any more.

Some customers want to push their backups directly to another server during backup (not pull them from somewhere else). For those customers the new option --archivedestination was introduced, which replaces the less powerful, now deprecated option --archivedir. Archiving with rsync, scp and sftp is possible now (NFS mounts were possible before already):

shell> bman --target=brman:secret@127.0.0.1:3318 --type=full --policy=daily --archivedestination=sftp://oli@backup.fromdual.com:22/home/oli/bck/production/daily/
...
  /home/mysql/product/mysql-5.7.21/bin/mysqldump --user=root --host=127.0.0.1 --port=33006 --master-data=2 --quick --single-transaction --triggers --routines --hex-blob --events 'tellmatic'
  to Destination: /home/mysql/backup/daily/bck_schema_tellmatic_2018-08-13_11-41-26.sql
  Backup size is 602021072
  Binlog file is mysql-bin.019336 and position is 287833
  Do MD5 checksum of uncompressed file /home/mysql/backup/daily/bck_schema_tellmatic_2018-08-13_11-41-26.sql
  md5sum --binary /home/mysql/backup/daily/bck_schema_tellmatic_2018-08-13_11-41-26.sql
  md5 = 06e1a0acd5da8acf19433b192259c1e1
  /usr/bin/pigz -6 /home/mysql/backup/daily/bck_schema_tellmatic_2018-08-13_11-41-26.sql
  Archiving /home/mysql/backup/daily/bck_schema_tellmatic_2018-08-13_11-41-26.sql.gz to sftp://oli@backup.example.com:/home/oli/bck/production/daily/
  echo 'put "/home/mysql/backup/daily/bck_schema_tellmatic_2018-08-13_11-41-26.sql.gz"' | sftp -b - -oPort=22 oli@backup.fromdual.com:/home/oli/bck/production/daily/

End backup at 2018-08-13 11:42:19 (rc=0)

by Shinguz at August 13, 2018 12:49 PM

Peter Zaitsev

Webinar Tues 8/14: Amazon Migration Service: The Magic Wand to Migrate Away from Your Proprietary Environment to MySQL

Please join Percona’s Solution Engineer, Dimitri Vanoverbeke, as he presents Amazon Migration Service: The Magic Wand to Migrate Away from Your Proprietary Environment to MySQL on Tuesday, August 14th, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

In this talk, we will learn about the Amazon Migration Tool. The talk will cover the possibilities, potential pitfalls prior to migrating and a high-level overview of its functionalities.

Register Now

The post Webinar Tues 8/14: Amazon Migration Service: The Magic Wand to Migrate Away from Your Proprietary Environment to MySQL appeared first on Percona Database Performance Blog.

by Dimitri Vanoverbeke at August 13, 2018 11:27 AM

August 12, 2018

Valeriy Kravchuk

On Oracle's QA for MySQL

In my recent blog posts I presented lists of bugs, fixed and not yet fixed, as usual. Working on these lists side tracked me from the main topic of this summer - problems in Oracle's way of handling MySQL. Time to get back on track!

Among things Oracle could do better for MySQL I mentioned QA:
"Oracle's internal QA efforts still seem to be somewhat limited.
We get regression bugs, ASAN failures, debug assertions, crashes, test failures etc in the official releases, and Oracle MySQL still relies a lot on QA by MySQL Community (while not highlighting this fact that much in public)."
I have to explain these in detail, as it has been a common perception for years that Oracle improved MySQL QA a lot and invests enormously in it, and famous MySQL experts were impressed even 5 years ago:
"Lets take a number we did get the QA team now has 400 person-years of experience on it. Lets say the QA team was 10 people before, and now it is tripled to 30 people. That means the average QA person has over 13 years experience in QA, which is about a year longer than my entire post-college IT career."
I was in the conference hall during that famous keynote, and the QA-related statements in it sounded mostly funny to me. Now, 5 years later, let me try to explain why just adding people and person-years of experience may not work that well. I'll try to present some examples and lists of bugs, as usual, to prove my points.
The Emirates Air Line in London lets you see nice views of the city, and it cost a lot, but it is hardly the most efficient public transport system between the North Greenwich Peninsula and Royal Victoria Dock one could imagine.
  1. We still get all kinds of regression bugs reported by MySQL Community for every release, even MySQL 8.0.12. Here is the short list of random recent examples:
    • Bug #90209 - "Performance regression with > 15K tables in MySQL 8.0 (with general tablespaces)".
    • Bug #91878 - "Wrong results with optimizer_switch='derived_merge=OFF';".
    • Bug #91377 - "Can't Initialize MySQl if internal_tmp_disk_storage_engine is set to MYISAM".
    • Bug #90100 - "Year type column have index, query value more than 2156 , the result is wrong".
    • Bug #91927 - "8.0.12 no longer builds with Mac brew-installed ICU".
    • Bug #91975 - "InnoDB: Assertion failure: dict0dd.cc:1071:!fail".
    It means that Oracle's MySQL QA may NOT do enough or proper regression testing. We sometimes cannot say this for sure, as Oracle hides some test cases. So we, users of MySQL, may simply not know what the intention of some recent change was (tests should show it, even if the fine manual is not clear enough - a topic for my next post).
  2. We still get valid test failure bugs found by MySQL Community members. Some recent examples follows:
    • Bug #90633 - "innodb_fts.ngram_1 test fails (runs too long probably)".
    • Bug #90608 - "rpl_gtid.rpl_perfschema_applier_status_by_worker_gtid_skipped_transaction fails".
    • Bug #90631 - "perfschema.statement_digest_query_sample test fails sporadically".
    • Bug #89431 - "innodb_undo.truncate_recover MTR test failing with a server error log warning". It's fixed, but only in MySQL 8.0.13.
    • Bug #91175 - "rpl_semi_sync_group_commit_deadlock.test is not simulating flush error ".
    • Bug #91022 - "audit_null.audit_plugin_bugs test always failing".
    • Bug #86110 - "A number of MTR test cases fail when run on a server with no PERFSCHEMA".
    For me it means that Oracle's MySQL QA either does not care to run the regression test suite properly, in enough combinations of platforms, options and build types, or does not analyze the failures it gets properly (and releases when needed, not when all tests pass on all platforms). This is somewhat scary.
  3. We still get crashing bugs in GA releases. It's hard to notice them, as they get hidden fast, or as soon as they get public attention, but they do exist, and the latest example, Bug #91928, is discussed here.
  4. It seems some tools that help to discover code problems may not be used properly or regularly in Oracle. I had a separate post, "On Bugs Detected by ASan", where you can find some examples. Luckily, Percona engineers have been testing ASan builds of MySQL 5.7 and 8.0 regularly, for years, and contributing public bug reports back.
  5. Oracle's MySQL QA engineers do not write much about their work in public these days. I can find some posts here and there from 2013 and 2014, but very few in recent years. One may say that's because QA engineers are working hard and have no time for blogging (unlike a lazy, annoying individual like me), but that's not entirely true. There is at least one Oracle engineer who does a lot of QA and makes a lot of information about his work public - Shane Bester - who is brave enough, and cares enough, to report MySQL bugs in public. Ironically, I doubt he has any formal relation to any of the QA teams in Oracle!
  6. A lot of real MySQL QA is still done by the MySQL Community, while these efforts are not much acknowledged these days (you usually get your name mentioned in the official release notes if you submitted a patch, but the fact that you helped Oracle by finding a real bug their QA missed has NOT been advertised since Morgan's last "Community Release Notes", published 2 years ago). Moreover, only the MySQL Community tries to make the QA job popular and educate users about proper tools and approaches (Percona, and Roel Van de Paar personally, are famous for this).
To summarize, it seems to me that real MySQL QA is still largely performed by the MySQL Community, in public, while the impact of Oracle's hidden, and maybe huge, investments in QA is far less clear and visible. Oracle's MySQL QA investments look to me like those in the Emirates Air Line cable car in London - the result is nice to have, but it's the most expensive cable car system ever built, with limited efficiency as public transport for the community.

by Valeriy Kravchuk (noreply@blogger.com) at August 12, 2018 11:28 AM

August 10, 2018

Peter Zaitsev

Tuning Autovacuum in PostgreSQL and Autovacuum Internals

Tuning Autovacuum in PostgreSQL

The performance of a PostgreSQL database can be compromised by dead tuples, since they continue to occupy space and can lead to bloat. We provided an introduction to VACUUM and bloat in an earlier blog post. Now, though, it’s time to look at autovacuum for postgres, and the internals you need to know to maintain the high-performance PostgreSQL database that demanding applications need.

What is autovacuum ?

Autovacuum is one of the background utility processes that starts automatically when you start PostgreSQL. As you see in the following log, the postmaster (parent PostgreSQL process) with pid 2862 has started the autovacuum launcher process with pid 2868. To start autovacuum, you must have the parameter autovacuum set to ON. In fact, you should not set it to OFF in a production system unless you are 100% sure about what you are doing and its implications.

avi@percona:~$ps -eaf | egrep "/post|autovacuum"
postgres  2862     1  0 Jun17 pts/0    00:00:11 /usr/pgsql-10/bin/postgres -D /var/lib/pgsql/10/data
postgres  2868  2862  0 Jun17 ?        00:00:10 postgres: autovacuum launcher process
postgres 15427  4398  0 18:35 pts/1    00:00:00 grep -E --color=auto /post|autovacuum

Why is autovacuum needed ? 

We need VACUUM to remove dead tuples, so that the space occupied by dead tuples can be re-used by the table for future inserts/updates. To know more about dead tuples and bloat, please read our previous blog post. We also need ANALYZE on the table, which updates the table statistics, so that the optimizer can choose optimal execution plans for an SQL statement. It is autovacuum in postgres that is responsible for performing both vacuum and analyze on tables.
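Both can also be triggered manually at any time; for example (a minimal sketch, reusing the scott.employee table that appears later in this post):

percona=# VACUUM ANALYZE scott.employee;
VACUUM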

There exists another background process in postgres called Stats Collector that tracks the usage and activity information. The information collected by this process is used by autovacuum launcher to identify the list of candidate tables for autovacuum. PostgreSQL identifies the tables needing vacuum or analyze automatically, but only when autovacuum is enabled. This ensures that postgres heals itself and stops the database from developing more bloat/fragmentation.

Parameters needed to enable autovacuum in PostgreSQL are :

autovacuum = on  # ( ON by default )
track_counts = on # ( ON by default )

track_counts is used by the stats collector. Without that in place, autovacuum cannot access the candidate tables.

Logging autovacuum

At some point, you may want to log the tables on which autovacuum spends more time. In that case, set the parameter log_autovacuum_min_duration to a value (the unit defaults to milliseconds), so that any autovacuum that runs for more than this value is logged to the PostgreSQL log file. This may help you tune your table-level autovacuum settings appropriately.

# Setting this parameter to 0 logs every autovacuum to the log file.
log_autovacuum_min_duration = '250ms' # Or 1s, 1min, 1h, 1d

Here is an example log of an autovacuum vacuum and analyze:

< 2018-08-06 07:22:35.040 EDT > LOG: automatic vacuum of table "vactest.scott.employee": index scans: 0
pages: 0 removed, 1190 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 110008 removed, 110008 remain, 0 are dead but not yet removable
buffer usage: 2402 hits, 2 misses, 0 dirtied
avg read rate: 0.057 MB/s, avg write rate: 0.000 MB/s
system usage: CPU 0.00s/0.02u sec elapsed 0.27 sec
< 2018-08-06 07:22:35.199 EDT > LOG: automatic analyze of table "vactest.scott.employee" system usage: CPU 0.00s/0.02u sec elapsed 0.15 sec

When does PostgreSQL run autovacuum on a table ? 

As discussed earlier, autovacuum in postgres refers to both automatic VACUUM and ANALYZE, not just VACUUM. An automatic vacuum or analyze runs on a table depending on the following mathematical equations.

The formula for calculating the effective table level autovacuum threshold is :

Autovacuum VACUUM threshold for a table = autovacuum_vacuum_scale_factor * number of tuples + autovacuum_vacuum_threshold

With the equation above, it is clear that if the actual number of dead tuples in a table exceeds this effective threshold, due to updates and deletes, that table becomes a candidate for autovacuum vacuum.

Autovacuum ANALYZE threshold for a table = autovacuum_analyze_scale_factor * number of tuples + autovacuum_analyze_threshold

The above equation says that any table with a total number of inserts/deletes/updates since the last analyze exceeding this threshold is eligible for an autovacuum analyze.

Let’s understand these parameters in detail.

  • autovacuum_vacuum_scale_factor or autovacuum_analyze_scale_factor : the fraction of the table records that will be added to the formula. For example, a value of 0.2 equals 20% of the table records.
  • autovacuum_vacuum_threshold or autovacuum_analyze_threshold : the minimum number of obsolete records or DMLs needed to trigger an autovacuum.

Let’s consider a table: percona.employee with 1000 records and the following autovacuum parameters.

autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50

Using the above mentioned mathematical formulae as reference,

Table : percona.employee becomes a candidate for autovacuum Vacuum when,
Total number of Obsolete records = (0.2 * 1000) + 50 = 250

Table : percona.employee becomes a candidate for autovacuum ANALYZE when,
Total number of Inserts/Deletes/Updates = (0.1 * 1000) + 50 = 150
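If you want to check how close each table currently is to its effective VACUUM threshold, a query along the following lines can help. This is a sketch, not from the original post: it reads the global settings via current_setting() and does not account for per-table overrides.

percona=# SELECT relname,
                 n_dead_tup,
                 round(current_setting('autovacuum_vacuum_scale_factor')::numeric * n_live_tup
                       + current_setting('autovacuum_vacuum_threshold')::numeric) AS av_vacuum_threshold
          FROM pg_stat_user_tables
          ORDER BY n_dead_tup DESC;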

Tuning Autovacuum in PostgreSQL

We need to understand that these are global settings. These settings are applicable to all the databases in the instance. This means that, regardless of the table size, a table becomes eligible for autovacuum vacuum or analyze once the above threshold is reached.

Is this a problem ?

Consider a table with ten records versus a table with a million records. Even though the table with a million records may be involved in transactions far more often, the frequency at which a vacuum or an analyze runs automatically could be greater for the table with just ten records.

Consequently, PostgreSQL allows you to configure individual table-level autovacuum settings that override the global settings.

ALTER TABLE scott.employee SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 100);

Output Log
----------
avi@percona:~$psql -d percona
psql (10.4)
Type "help" for help.
percona=# ALTER TABLE scott.employee SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 100);
ALTER TABLE

The above setting runs autovacuum vacuum on the table scott.employee only once there are more than 100 obsolete records.
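To verify that the table-level settings were stored, you can check pg_class.reloptions (a quick sanity check, not from the original post):

percona=# SELECT relname, reloptions FROM pg_class WHERE relname = 'employee';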

How do we identify the tables that need their autovacuum settings tuned ? 

In order to tune autovacuum for tables individually, you must know the number of inserts/deletes/updates on a table over an interval. You can view the postgres catalog view pg_stat_user_tables to get that information.

percona=# SELECT n_tup_ins as "inserts",n_tup_upd as "updates",n_tup_del as "deletes", n_live_tup as "live_tuples", n_dead_tup as "dead_tuples"
FROM pg_stat_user_tables
WHERE schemaname = 'scott' and relname = 'employee';
 inserts | updates | deletes | live_tuples | dead_tuples
---------+---------+---------+-------------+-------------
      30 |      40 |       9 |          21 |          39
(1 row)

As observed in the above log, taking a snapshot of this data for a certain interval should help you understand the frequency of DMLs on each table. In turn, this should help you with tuning your autovacuum settings for individual tables.

How many autovacuum processes can run at a time ? 

There cannot be more than autovacuum_max_workers autovacuum processes running at a time across the instance/cluster, which may contain more than one database. The autovacuum launcher background process starts a worker process for a table that needs a vacuum or an analyze. If there are four databases with autovacuum_max_workers set to 3, then the 4th database has to wait until one of the existing worker processes becomes free.

Before starting the next autovacuum, the launcher waits for autovacuum_naptime; the default is 1 min on most versions. If you have three databases, the next autovacuum waits for 60/3 = 20 seconds. So, the wait time before starting the next autovacuum is always (autovacuum_naptime/N), where N is the total number of databases in the instance.
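For reference, both knobs live in postgresql.conf; the values below are the defaults (shown here only as a sketch):

autovacuum_max_workers = 3  # maximum number of autovacuum worker processes running concurrently
autovacuum_naptime = 1min   # minimum delay between autovacuum runs on any given database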

Does increasing autovacuum_max_workers alone increase the number of autovacuum processes that can run in parallel ?
NO. This is explained better in the next few lines.

Is VACUUM IO intensive? 

Autovacuum can be considered a cleanup operation. As discussed earlier, there is one worker process per table. Autovacuum reads 8KB (default block_size) pages of a table from disk and modifies/writes the pages containing dead tuples. This involves both read and write IO, so it can be an IO-intensive operation when autovacuum runs on a huge table with many dead tuples during a peak transaction time. To avoid this issue, there are a few parameters that can be set to minimize the impact of vacuum on IO.

The following are the parameters used to tune autovacuum IO

  • autovacuum_vacuum_cost_limit : total cost limit autovacuum could reach (combined by all autovacuum jobs).
  • autovacuum_vacuum_cost_delay : the amount of time, in milliseconds, that autovacuum sleeps each time the accumulated cleanup cost reaches autovacuum_vacuum_cost_limit.
  • vacuum_cost_page_hit : Cost of reading a page that is already in shared buffers and doesn’t need a disk read.
  • vacuum_cost_page_miss : Cost of fetching a page that is not in shared buffers.
  • vacuum_cost_page_dirty : Cost of writing to each page when dead tuples are found in it.

Default Values for the parameters discussed above.
------------------------------------------------------
autovacuum_vacuum_cost_limit = -1 (So, it defaults to vacuum_cost_limit) = 200
autovacuum_vacuum_cost_delay = 20ms
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
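To see what your instance is actually using, you can query pg_settings (a quick check, not from the original post):

percona=# SELECT name, setting, unit
          FROM pg_settings
          WHERE name IN ('autovacuum_vacuum_cost_limit', 'autovacuum_vacuum_cost_delay',
                         'vacuum_cost_page_hit', 'vacuum_cost_page_miss', 'vacuum_cost_page_dirty');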

Consider autovacuum VACUUM running on the table percona.employee.

Let’s imagine what can happen in 1 second. (1 second = 1000 milliseconds)

In a best-case scenario where read latency is 0 milliseconds, autovacuum can wake up and go to sleep 50 times (1000 milliseconds / 20 ms), because the delay between wake-ups needs to be 20 milliseconds.

1 second = 1000 milliseconds = 50 * autovacuum_vacuum_cost_delay

Since the cost associated with reading a page that is already in shared_buffers is 1, in every wake-up 200 pages can be read, and in 50 wake-ups 50 * 200 pages can be read.

If all the pages with dead tuples are found in shared buffers, with an autovacuum_vacuum_cost_delay of 20ms, it can read ((200 / vacuum_cost_page_hit) * 8) KB in each round, after which it needs to wait for autovacuum_vacuum_cost_delay.

Thus, at the most, an autovacuum can read : 50 * 200 * 8 KB = 78.13 MB per second (if blocks are already found in shared_buffers), considering the block_size as 8192 bytes.

If the blocks are not in shared buffers and need to be fetched from disk, an autovacuum can read : 50 * ((200 / vacuum_cost_page_miss) * 8) KB = 7.81 MB per second.

All the information we have seen above is for read IO.

Now, in order to delete dead tuples from a page/block, the cost of a write operation is : vacuum_cost_page_dirty, set to 20 by default.

At the most, an autovacuum can write/dirty : 50 * ((200 / vacuum_cost_page_dirty) * 8) KB = 3.9 MB per second.

Generally, this cost is divided equally among all the autovacuum_max_workers autovacuum processes running in the instance. So, increasing autovacuum_max_workers may delay the autovacuum execution for the currently running autovacuum workers, and increasing autovacuum_vacuum_cost_limit may cause IO bottlenecks. An important point to note is that this behaviour can be overridden by setting the storage parameters of individual tables, which subsequently ignore the global settings.

postgres=# alter table percona.employee set (autovacuum_vacuum_cost_limit = 500);
ALTER TABLE
postgres=# alter table percona.employee set (autovacuum_vacuum_cost_delay = 10);
ALTER TABLE
postgres=#
postgres=# \d+ percona.employee
Table "percona.employee"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
id | integer | | | | plain | |
Options: autovacuum_vacuum_threshold=10000, autovacuum_vacuum_cost_limit=500, autovacuum_vacuum_cost_delay=10

Thus, on a busy OLTP database, always have a strategy to run manual VACUUM on tables that are frequently hit with DMLs, during a low-peak window. You may run as many parallel vacuum jobs as needed when you run them manually, after setting the relevant vacuum cost settings. For this reason, a scheduled manual vacuum job is always recommended alongside finely tuned autovacuum settings.

The post Tuning Autovacuum in PostgreSQL and Autovacuum Internals appeared first on Percona Database Performance Blog.

by Avinash Vallarapu at August 10, 2018 05:44 PM

This Week in Data with Colin Charles 48: Coinbase Powered by MongoDB and Prometheus Graduates in the CNCF

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

The call for submitting a talk to Percona Live Europe 2018 closes today, and while there may be a short extension, have you got your talk submitted already? I suggest doing so ASAP!

I’m sure many of you have heard of cryptocurrencies, the blockchain, and so on. But how many of you realize that Coinbase, an application that handles cryptocurrency trades, matching book orders, and more, is powered by MongoDB? With the hype and growth in interest in late 2017, Coinbase has had to scale. They gave an excellent talk at MongoDB World, titled MongoDB & Crypto Mania (the video is worth a watch), and they’ve also written a blog post, How we’re scaling our platform for spikes in customer demand. They even went driver hacking (the Ruby driver for MongoDB)!

It is great to see a weekly review of happenings in the Vitess world.

PingCap and TiDB have been to many Percona Live events to present, and recently hired Morgan Tocker. Morgan has migrated his blog from MySQL to TiDB. Read more about his experience in, This blog, now Powered by WordPress + TiDB. Reminds me of the early days of Galera Cluster and showing how Drupal could be powered by it!

Releases

Link List

  • Sys Schema MySQL 5.7+ – blogger from Wipro, focusing on an introduction to the sys schema on MySQL (note: still not available in the MariaDB Server fork).
  • Prometheus Graduates in the CNCF, so is considered a mature project. Criteria for graduation is such that “projects must demonstrate thriving adoption, a documented, structured governance process, and a strong commitment to community sustainability and inclusivity.” Percona benefits from Prometheus in Percona Monitoring & Management (PMM), so we should celebrate this milestone!
  • Replicating from MySQL 8.0 to MySQL 5.7
  • A while ago in this column, we linked to Shlomi Noach’s excellent post on MySQL High Availability at GitHub. We were also introduced to GitHub Load Balancer (GLB), which they ran on top of HAProxy. However back then, GLB wasn’t open; now you can get GLB Director: GLB: GitHub’s open source load balancer. The project describes GLB Director as: “… a Layer 4 load balancer which scales a single IP address across a large number of physical machines while attempting to minimise connection disruption during any change in servers. GLB Director does not replace services like haproxy and nginx, but rather is a layer in front of these services (or any TCP service) that allows them to scale across multiple physical machines without requiring each machine to have unique IP addresses.”
  • F1 Query: Declarative Querying at Scale – a well-written paper.

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.


The post This Week in Data with Colin Charles 48: Coinbase Powered by MongoDB and Prometheus Graduates in the CNCF appeared first on Percona Database Performance Blog.

by Colin Charles at August 10, 2018 11:48 AM

August 09, 2018

Peter Zaitsev

Lock Down: Enforcing AppArmor with Percona XtraDB Cluster

Enforcing AppArmor with Percona XtraDB Cluster

Recently, I wrote a blog post showing how to enforce SELinux with Percona XtraDB Cluster (PXC). The Linux distributions derived from RedHat use SELinux. There is another major mandatory access control (MAC) system, AppArmor. Ubuntu, for example, installs AppArmor by default. If you are concerned about computer security and use PXC on Ubuntu, you should enforce AppArmor. This post will guide you through the steps of creating a profile for PXC and enabling it. If you don’t want to waste time, you can just grab my profile; it seems to work fine. Adapt it to your environment if you are using non-standard paths. Look at the section “Copy the profile” for how to install it. For the brave, let’s go!

Install the tools

In order to do anything with AppArmor, we need to install the tools. On Ubuntu 18.04, I did:

apt install apparmor-utils

The apparmor-utils package provides the tools we need to generate a skeleton profile and parse the system logs.

Create a skeleton profile

AppArmor is fairly different from SELinux. Instead of attaching security tags to resources, you specify what a given binary can access, and how, in a text file. Also, processes can inherit permissions from their parent. We will only create a profile for the mysqld_safe script and it will cover the mysqld process and the SST scripts as they are executed under it. You create the skeleton profile like this:

root@BlogApparmor2:~# aa-autodep /usr/bin/mysqld_safe
Writing updated profile for /usr/bin/mysqld_safe.

On Ubuntu 18.04, there seems to be a bug. I reported it and apparently I am not the only one with the issue. If you get a “KeyError” error with the above command, try:

root@BlogApparmor2:~# echo "#include <abstractions>" > /etc/apparmor.d/scripts
root@BlogApparmor2:~# aa-autodep /usr/bin/mysqld_safe

The aa-autodep command creates the profile “usr.bin.mysqld_safe” in the /etc/apparmor.d directory. The initial content is:

root@BlogApparmor2:~# cat /etc/apparmor.d/usr.bin.mysqld_safe
# Last Modified: Wed Jul 25 18:56:31 2018
#include <tunables/global>
/usr/bin/mysqld_safe flags=(complain) {
  #include <abstractions/base>
  #include <abstractions/bash>
  /bin/dash ix,
  /lib/x86_64-linux-gnu/ld-*.so mr,
  /usr/bin/mysqld_safe r,
}

I suggest you add, ahead of time, things you know are needed. In my case, I added:

/etc/mysql/** r,
/usr/bin/innobackupex mrix,
/usr/bin/wsrep_sst_xtrabackup-v2 mrix,
/usr/lib/galera3/* r,
/usr/lib/mysql/plugin/* r,
/usr/sbin/mysqld mrix,
/var/log/mysqld.log w,
owner /tmp/** rw,
owner /var/lib/mysql/** rwk,

This will save time on redundant questions later. Those entries are permissions granted to mysqld_safe. For example, "/etc/mysql/** r," allows reading everything in /etc/mysql and its subdirectories. These lines need to go right after the "/usr/bin/mysqld_safe r," line. Once done, parse and load the profile with:

root@BlogApparmor2:~# apparmor_parser -r /etc/apparmor.d/usr.bin.mysqld_safe

Get a well behaved SST script

If you read my previous blog post on SELinux, you may recall that the wsrep_sst_xtrabackup-v2 script does not behave well, security-wise. The Percona developers have released a fixed version, but it may not be available yet in packaged form. In the meantime, you can download it from GitHub.

Start iterating

My initial thought was to put the profile in complain mode, generate activity, and parse the logs with aa-logprof to get entries to add to the profile. There is likely something I was doing wrong, but in complain mode aa-logprof detected nothing. In order to get something, I had to enforce the profile with:

root@BlogApparmor2:~# aa-enforce /etc/apparmor.d/usr.bin.mysqld_safe

Then, I iterated many times—like more than 20—over the following sequence:

  1. rm -rf /var/lib/mysql/* # optional
  2. systemctl start mysql &
  3. tail -f /var/log/mysqld.log /var/log/kern.log
  4. systemctl stop mysql
  5. ps fax | egrep 'mysqld_safe|mysqld' | grep -v grep | awk '{print $1}' | xargs kill -9 # sometimes
  6. aa-logprof
  7. if something was not right, jump back to step 1

See the next section for how to run aa-logprof. Once that sequence worked well, I tried SST (joiner/donor) roles and IST.

Parse the logs with aa-logprof

Now, the interesting part begins, parsing the logs. Simply begin the process with:

root@BlogApparmor2:~#  aa-logprof

and answer the questions. Be careful: I made many mistakes before I got it right; remember, I am more a DBA than a sysadmin. For example, you’ll get questions like:

Profile:  /usr/sbin/mysqld
Path:     /etc/hosts.allow
New Mode: r
Severity: unknown
 [1 - #include <abstractions/lxc/container-base>]
  2 - #include <abstractions/lxc/start-container>
  3 - /etc/hosts.allow r,
(A)llow / [(D)eny] / (I)gnore / (G)lob / Glob with (E)xtension / (N)ew / Audi(t) / Abo(r)t / (F)inish

AppArmor asks you how it should provide read access to the /etc/hosts.allow file. If you answer right away with “A”, it will add #include <abstractions/lxc/container-base> to the profile. With all the dependencies pulled in by the lxc-related includes, you basically end up allowing nearly everything. You must first press “3” to get:

Profile:  /usr/sbin/mysqld
Path:     /etc/hosts.allow
New Mode: r
Severity: unknown
  1 - #include <abstractions/lxc/container-base>
  2 - #include <abstractions/lxc/start-container>
 [3 - /etc/hosts.allow r,]
(A)llow / [(D)eny] / (I)gnore / (G)lob / Glob with (E)xtension / (N)ew / Audi(t) / Abo(r)t / (F)inish

Notice the “[ ]” have moved to the bottom entry and then, press “A”. You’ll also get questions like:

Profile:  /usr/bin/mysqld_safe
Execute:  /bin/sed
Severity: unknown
(I)nherit / (C)hild / (N)amed / (X) ix On / (D)eny / Abo(r)t / (F)inish

For such a question, my answer is “I” for inherit. After a while, you’ll get through all the questions and you’ll be asked to save the profile:

The following local profiles were changed. Would you like to save them?
 [1 - /usr/bin/mysqld_safe]
(S)ave Changes / Save Selec(t)ed Profile / [(V)iew Changes] / View Changes b/w (C)lean profiles / Abo(r)t
Writing updated profile for /usr/bin/mysqld_safe.

Revise the profile

Do not hesitate to edit the profile if you see, for example, many similar file entries which could be replaced by a “*” or “**”. If you manually modify the profile, you need to parse it to load your changes:

root@BlogApparmor2:~# apparmor_parser -r /etc/apparmor.d/usr.bin.mysqld_safe
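For example, a run of nearly identical entries like the following (hypothetical paths, shown only to illustrate the idea) can be collapsed into a single glob entry:

# before: one entry per file
/var/lib/mysql/ib_logfile0 rw,
/var/lib/mysql/ib_logfile1 rw,
/var/lib/mysql/ibdata1 rw,
# after: one glob entry covering the whole directory tree
owner /var/lib/mysql/** rwk,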

Copy the profile

Once you have a server running with AppArmor enforced on PXC, simply copy the profile to the other servers and parse it. For example:

root@BlogApparmor3:~# cd /etc/apparmor.d
root@BlogApparmor3:/etc/apparmor.d# scp ubuntu@10.0.4.76:/etc/apparmor.d/usr.bin.mysqld_safe .
ubuntu@10.0.4.76's password:
usr.bin.mysqld_safe                                   100% 2285     3.0MB/s   00:00
root@BlogApparmor3:/etc/apparmor.d# aa-enforce usr.bin.mysqld_safe
Setting /etc/apparmor.d/usr.bin.mysqld_safe to enforce mode.

You can always verify if the profile is enforced with:

root@BlogApparmor3:/etc/apparmor.d# aa-status
apparmor module is loaded.
42 profiles are loaded.
20 profiles are in enforce mode.
 /sbin/dhclient
 ...
 /usr/bin/mysqld_safe
 ...
 man_groff

Once the profile is enforced, I strongly advise monitoring the log files on a regular basis to see if anything has been overlooked, and likewise whenever you encounter strange or unexpected behavior with PXC. Make a habit of checking the logs; it might save a lot of frustrating work.

Conclusion

As we have just seen, enabling AppArmor with PXC is not a difficult task; it just requires some patience. AppArmor is an essential component of a layered security approach. It achieves similar goals to the other well-known MAC framework, SELinux. With rising security concerns and the storage of sensitive data in databases, there are compelling reasons to enforce a MAC framework. I hope these two posts will help DBAs and sysadmins to configure and enable MAC for PXC.

The post Lock Down: Enforcing AppArmor with Percona XtraDB Cluster appeared first on Percona Database Performance Blog.

by Yves Trudeau at August 09, 2018 11:26 AM

Webinar Thursday Aug 9: Migrating to AWS Aurora, Monitoring AWS Aurora with PMM

Monitoring Amazon Aurora with PMM

Please join Autodesk’s Senior Database Engineer, Vineet Khanna, and Percona’s Sr. MySQL DBA, Tate McDaniel, as they present Migrating to Aurora and Monitoring with PMM on Thursday, August 9th, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Amazon Web Services (AWS) Aurora is one of the most popular cloud-based RDBMS solutions. The main reason for Aurora’s success is that it is based on the InnoDB storage engine.

In this session, we will talk about how you can efficiently plan for migration to Aurora using Terraform and Percona products and solutions. We will share our Terraform code for launching AWS Aurora clusters, look at tricks for checking data consistency, verify migration paths and effectively monitor the environment using PMM.

The topics in this session include:

  • Why AWS Aurora? What is the future of AWS Aurora?
  • Build Aurora Infrastructure
  • Using Terraform (Without Data)
  • Restore Using Terraform & Percona XtraBackup (Using AWS S3 Bucket)
  • Verify data consistency
  • Aurora migration
  • 1:1 migration
  • Many:1 migration using Percona Server multi-source replication
  • Show benchmarks and PMM dashboard
  • Demo

Register Now


Vineet Khanna, Senior Database Engineer, Autodesk

Vineet Khanna, Senior Database Engineer at Autodesk, has 10+ years of experience as a MySQL DBA. His main professional interests are managing complex database environments, improving database performance, and architecting High Availability solutions for MySQL. He has handled the database environments of organizations like Chegg, Zendesk, and Adobe.


Tate McDaniel, Sr. MySQL DBA

Tate joined Percona in June 2017 as a Remote MySQL DBA. He holds a Bachelor’s degree in Information Systems and Decision Strategies from LSU. He has 10+ years of experience working with MySQL and operations management. His great love is application query tuning. In his off time, he races sailboats, travels the Caribbean by sailboat, and drives all over in an RV.

The post Webinar Thursday Aug 9: Migrating to AWS Aurora, Monitoring AWS Aurora with PMM appeared first on Percona Database Performance Blog.

by Tate McDaniel at August 09, 2018 10:11 AM

How to Change Settings for PMM Deployed via Docker

change settings for PMM deployed docker

When deployed through Docker, Percona Monitoring and Management (PMM) uses environment variables for its configuration.

For example, if you want to adjust the metrics resolution, you can pass -e METRICS_RESOLUTION=Ns as an option to the docker run command:

docker run -d \
  -p 80:80 \
  --volumes-from pmm-data \
  --name pmm-server \
  --restart always \
  -e METRICS_RESOLUTION=2s \
  percona/pmm-server:latest

You would think that if you want to change a setting for an existing installation, you could just stop the container with docker stop and, when you want to start it again, pass the new environment variable with docker start.

Unfortunately, this is not going to work, as docker start does not support changing environment variables, at least not at the time of writing. I assume the idea is to keep containers immutable, and if you want a container with different properties (like environment variables) you should run a new container instead. Here’s how.

Stop and Rename the old container, just in case you want to go back

docker stop pmm-server
docker rename pmm-server pmm-server-old

Refresh the container with the latest version

docker pull percona/pmm-server:latest

Do not miss this step! When you destroy and recreate the container, all the updates you have made through the PMM web interface will be lost. What’s more, the software version will be reset to the one in the Docker image. Running an old PMM version with a data volume modified by a new PMM version may cause unpredictable results. This could include data loss.
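Before removing anything, it may also be worth confirming that the pmm-data volume container is still in place (a quick sanity check, assuming the standard pmm-data container name):

docker inspect --format '{{ .Name }}' pmm-data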

Run the container with the new settings, for example changing METRICS_RESOLUTION

docker run -d \
  -p 80:80 \
  --volumes-from pmm-data \
  --name pmm-server \
  --restart always \
  -e METRICS_RESOLUTION=5s \
  percona/pmm-server:latest

After you’re happy with your new container deployment you can remove the old container

docker rm pmm-server-old

That’s it! You should now be running the latest PMM version with updated configuration settings.

The post How to Change Settings for PMM Deployed via Docker appeared first on Percona Database Performance Blog.

by Peter Zaitsev at August 09, 2018 09:21 AM

Open Query Pty Ltd

INSERT IGNORE … ON DUPLICATE KEY UPDATE … allowed

Not serious, but just interesting: MDEV-16925: INSERT IGNORE … ON DUPLICATE KEY UPDATE … allowed

  • INSERT IGNORE allows an insert to come back with “ok” even if the row already exists.
  • INSERT … ON DUPLICATE KEY UPDATE … intends the UPDATE to be executed in case the row already exists.

So combining the two makes no sense, and while it is harmless, perhaps it would be better if the parser were to throw a syntax error for it.
Currently, it appears the server executes the ON DUPLICATE KEY UPDATE, thus disregarding the IGNORE:

MariaDB [test]> create table dup (a int, b int, unique (a,b));
Query OK, 0 rows affected (0.019 sec)

MariaDB [test]> insert into dup values (1,20),(2,24);
Query OK, 2 rows affected (0.013 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> select * from dup;
+------+------+
| a    | b    |
+------+------+
|    1 |   20 |
|    2 |   24 |
+------+------+
2 rows in set (0.000 sec)

MariaDB [test]> insert ignore foo values (2,24) on duplicate key update b=23;
Query OK, 1 row affected (0.006 sec)

MariaDB [test]> select * from dup;
+------+------+
| a    | b    |
+------+------+
|    1 |   20 |
|    2 |   24 |
+------+------+
2 rows in set (0.000 sec)

MariaDB [test]> insert ignore dup values (2,24) on duplicate key update b=23;
Query OK, 2 rows affected (0.007 sec)

MariaDB [test]> select * from dup;
+------+------+
| a    | b    |
+------+------+
|    1 |   20 |
|    2 |   23 |
+------+------+
2 rows in set (0.001 sec)

It has been noted that INSERT can chuck errors for reasons other than a duplicate key.
As the manual states:

“IGNORE has a similar effect on inserts into partitioned tables where no partition matching a given value is found. Without IGNORE, such INSERT statements are aborted with an error. When INSERT IGNORE is used, the insert operation fails silently for rows containing the unmatched value, but inserts rows that are matched.”

One example would be a foreign key check failing; in this case IGNORE would see the INSERT return without error (but without inserting a row, of course).

However, when combined with ON DUPLICATE KEY, IGNORE may create an ambiguity. If an FK check fails, but there is also a dup key clause, which one takes precedence? That is, will the ON DUPLICATE KEY update be executed in that case if IGNORE is also specified, or will the server just return without error because of the FK check failure?
If we intend to allow this syntax, then the exact behaviour for each of the possible combinations of errors should be documented.
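For anyone wanting to probe this behaviour, here is a hypothetical setup (all table and column names invented for illustration) that triggers both conditions at once:

-- hypothetical schema: a unique key plus a foreign key on the same table
create table parent (id int primary key);
create table child (a int, b int, p_id int, unique (a,b),
  foreign key (p_id) references parent (id));
insert into parent values (1);
insert into child values (1,20,1);
-- this row duplicates (a,b) AND references a non-existent parent:
-- does IGNORE swallow the fk error, or does the ON DUPLICATE KEY
-- update run first and then fail the fk check?
insert ignore into child values (1,20,999) on duplicate key update p_id = 999;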

by Arjen Lentz at August 09, 2018 02:47 AM

August 07, 2018

Peter Zaitsev

Resource Usage Improvements in Percona Monitoring and Management 1.13

PMM 1-13 reduction CPU usage by 5x

In Percona Monitoring and Management (PMM) 1.13 we have adopted Prometheus 2, and with this comes a dramatic improvement in resource usage, along with performance improvements!

What does it mean for you? This means you can have a significantly larger number of servers and database instances monitored by the same PMM installation. Or you can reduce the instance size you use to monitor your environment and save some money.

Let’s look at some stats!

CPU Usage

PMM 1.13 reduction in CPU usage by 5x

Percona Monitoring and Management 1.13 reduction in CPU usage after adopting Prometheus 2 by 8x

We can see an approximate 5x and 8x reduction of CPU usage on these two PMM Servers. Depending on the workload, we see CPU usage reductions to range between 3x and 10x.

Disk Writes

There is also less disk write bandwidth required:

PMM 1.13 reduction in disk write bandwidth

On this instance, the bandwidth reduction is “just” 1.5x. Note this is disk IO for the entire PMM system, which includes more than just the Prometheus component. Prometheus 2 itself promises a much more significant IO bandwidth reduction, according to the official benchmarks.

According to the same benchmark, you should expect disk space usage reduction by 33-50% for Prometheus 2 vs Prometheus 1.8. The numbers will be less for Percona Monitoring and Management, as it also stores Query Statistics outside of Prometheus.

Resource usage on the monitored hosts

Also, resource usage on the monitored hosts is significantly reduced:

Percona Monitoring and Management 1.13 reduction of resource usage by Prometheus 2

Why does CPU usage go down on a monitored host with a Prometheus 2 upgrade? This is because PMM uses TLS for the Prometheus to monitored host communication. Before Prometheus 2, a full handshake was performed for every scrape, taking a lot of CPU time. This was optimized with Prometheus 2, resulting in a dramatic CPU usage decrease.

Query performance is also a lot better with Prometheus 2, meaning dashboards visually load a lot faster, though we did not do any specific benchmarks here to share the hard numbers. Note though this improvement only applies when you’re querying the data which is stored in Prometheus 2.

If you’re querying data that was originally stored in Prometheus 1.8, it will be queried through the much slower and less efficient “Remote Read” interface, using quite a bit more CPU and memory resources.

If you love better efficiency and performance, consider upgrading to PMM 1.13!

The post Resource Usage Improvements in Percona Monitoring and Management 1.13 appeared first on Percona Database Performance Blog.

by Peter Zaitsev at August 07, 2018 07:34 PM

MariaDB Foundation

MariaDB 10.1.35 and MariaDB Galera Cluster 10.0.36 now available

The MariaDB project is pleased to announce the availability of MariaDB 10.1.35 and MariaDB Galera Cluster 10.0.36. Both are stable releases. See the release notes and changelogs for details. Download MariaDB 10.1.35 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Galera Cluster 10.0.36 Release Notes Changelog What […]

The post MariaDB 10.1.35 and MariaDB Galera Cluster 10.0.36 now available appeared first on MariaDB.org.

by Ian Gilfillan at August 07, 2018 03:42 PM

MariaDB AB

MariaDB Server 10.1.35 and 10.0.36 Cluster now available


The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.1.35 and MariaDB Cluster 10.0.36. See the release notes and changelogs for details and visit mariadb.com/downloads to download.


Download MariaDB Server 10.1.35

Release Notes Changelog What is MariaDB 10.1?


Download MariaDB Cluster 10.0.36

Release Notes Changelog What is MariaDB Galera Cluster?



by dbart at August 07, 2018 03:20 PM

Peter Zaitsev

Replicating from MySQL 8.0 to MySQL 5.7

replicate from MySQL 8 to MySQL 5.7

In this blog post, we’ll discuss how to set up replication from MySQL 8.0 to MySQL 5.7. There are some situations in which having this configuration might help. For example, in the case of a MySQL upgrade, it can be useful to have a master running a newer version of MySQL replicating to a slave on an older version, as a rollback plan. Another example is the case of upgrading a master-master replication topology.

Officially, replication is only supported between consecutive major MySQL versions, and only from a lower version master to a higher version slave. Here is an example of a supported scenario:

5.7 master –> 8.0 slave

while the opposite is not supported:

8.0 master –> 5.7 slave

In this blog post, I’ll walk through how to overcome the initial problems and get replication working in this scenario. I’ll also show some errors that can halt the replication if a new feature from MySQL 8 is used.

Here is the initial set up that will be used to build the topology:

slave > select @@version;
+---------------+
| @@version     |
+---------------+
| 5.7.17-log    |
+---------------+
1 row in set (0.00 sec)
master > select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.12    |
+-----------+
1 row in set (0.00 sec)

First, before executing the CHANGE MASTER command, you need to modify the collation on the master server. Otherwise, replication will run into this error:

slave > show slave status\G
                   Last_Errno: 22
                   Last_Error: Error 'Character set '#255' is not a compiled character set and is not specified in the '/opt/percona_server/5.7.17/share/charsets/Index.xml' file' on query. Default database: 'mysql8_1'. Query: 'create database mysql8_1'

This is because the default character set and collation have changed in MySQL 8. According to the documentation:

The default value of the character_set_server and character_set_database system variables has changed from latin1 to utf8mb4.

The default value of the collation_server and collation_database system variables has changed from latin1_swedish_ci to utf8mb4_0900_ai_ci.

Let’s change the collation and the character set to utf8 on MySQL 8 (it is possible to use any option that exists in both versions):

# master my.cnf
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
collation-server=utf8_unicode_ci

You need to restart MySQL 8 to apply the changes. Next, after the restart, you have to create a replication user using mysql_native_password. This is because MySQL 8 changed the default authentication plugin to caching_sha2_password, which is not supported by MySQL 5.7. If you try to execute the CHANGE MASTER command with a user using the caching_sha2_password plugin, you will receive the error message below:

Last_IO_Errno: 2059
Last_IO_Error: error connecting to master 'root@127.0.0.1:19025' - retry-time: 60 retries: 1

To create a user using mysql_native_password :

master> CREATE USER 'replica_user'@'%' IDENTIFIED WITH mysql_native_password BY 'repli$cat';
master> GRANT REPLICATION SLAVE ON *.* TO 'replica_user'@'%';

Finally, we can proceed as usual to build the replication:

master > show master status\G
*************************** 1. row ***************************
File: mysql-bin.000007
Position: 155
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
slave > CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_USER='replica_user', MASTER_PASSWORD='repli$cat',MASTER_PORT=19025, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=155; start slave;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
# This procedure works with GTIDs too
slave > CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_USER='replica_user', MASTER_PASSWORD='repli$cat',MASTER_PORT=19025,MASTER_AUTO_POSITION = 1 ; start slave;

Checking the replication status:

master > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: replica_user
Master_Port: 19025
Connect_Retry: 60
Master_Log_File: mysql-bin.000007
Read_Master_Log_Pos: 155
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 321
Relay_Master_Log_File: mysql-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 155
Relay_Log_Space: 524
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 100
Master_UUID: 00019025-1111-1111-1111-111111111111
Master_Info_File: /home/vinicius.grippa/sandboxes/rsandbox_5_7_17/master/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.01 sec)

Executing a quick test to check if the replication is working:

master > create database vinnie;
Query OK, 1 row affected (0.06 sec)

slave > show databases like 'vinnie';
+-------------------+
| Database (vinnie) |
+-------------------+
| vinnie |
+-------------------+
1 row in set (0.00 sec)

Caveats

Any attempt to use a new feature from MySQL 8, like roles, invisible indexes, or caching_sha2_password, will make the replication stop with an error:

master > alter user replica_user identified with caching_sha2_password by 'sekret';
Query OK, 0 rows affected (0.01 sec)

slave > show slave status\G
               Last_SQL_Errno: 1396
               Last_SQL_Error: Error 'Operation ALTER USER failed for 'replica_user'@'%'' on query. Default database: ''. Query: 'ALTER USER 'replica_user'@'%' IDENTIFIED WITH 'caching_sha2_password' AS '$A$005$H	MEDi\"gQ
                        wR{/I/VjlgBIUB08h1jIk4fBzV8kU1J2RTqeqMq8Q2aox0''

Summary

Replicating from MySQL 8 to MySQL 5.7 is possible. In some scenarios (especially upgrades), this might be helpful, but it is not advisable to have a heterogeneous topology because it will be prone to errors and incompatibilities in some cases.


The post Replicating from MySQL 8.0 to MySQL 5.7 appeared first on Percona Database Performance Blog.

by Vinicius Grippa at August 07, 2018 01:01 PM

August 06, 2018

Peter Zaitsev

Basic Understanding of Bloat and VACUUM in PostgreSQL

VACUUM and Bloat PostgreSQL

Implementation of MVCC (Multi-Version Concurrency Control) in PostgreSQL is different and special when compared with other RDBMS. MVCC in PostgreSQL controls which tuples can be visible to transactions via versioning.

What is versioning in PostgreSQL?

Let’s consider the case of an Oracle or a MySQL database. What happens when you perform a DELETE or an UPDATE of a row? You see an UNDO record maintained in a global UNDO segment. This UNDO segment contains the past image of a row, to help the database achieve consistency (the “C” in A.C.I.D.). For example, if there is an old transaction that depends on the row that got deleted, the row may still be visible to it because the past image is still maintained in the UNDO. If you are an Oracle DBA reading this blog post, you may quickly recollect the error ORA-01555 snapshot too old. What this error means is that you may have a smaller undo_retention, or an UNDO segment not huge enough to retain all the past images (versions) needed by existing or old transactions.

You may not have to worry about that with PostgreSQL.

Then how does PostgreSQL manage UNDO ?

In simple terms, PostgreSQL maintains both the past image and the latest image of a row in its own table. It means UNDO is maintained within each table, and this is done through versioning. Now, we may get a hint that every row of a PostgreSQL table has a version number. And that is absolutely correct. In order to understand how these versions are maintained within each table, you should understand the hidden columns of a table (especially xmin) in PostgreSQL.

Understanding the Hidden Columns of a Table

When you describe a table, you would only see the columns you have added, like you see in the following log.

percona=# \d scott.employee
                                          Table "scott.employee"
  Column  |          Type          | Collation | Nullable |                    Default
----------+------------------------+-----------+----------+------------------------------------------------
 emp_id   | integer                |           | not null | nextval('scott.employee_emp_id_seq'::regclass)
 emp_name | character varying(100) |           |          |
 dept_id  | integer                |           |          |

However, if you look at all the columns of the table in pg_attribute, you should see several hidden columns as you see in the following log.

percona=# SELECT attname, format_type (atttypid, atttypmod)
FROM pg_attribute
WHERE attrelid::regclass::text='scott.employee'
ORDER BY attnum;
 attname  |      format_type
----------+------------------------
 tableoid | oid
 cmax     | cid
 xmax     | xid
 cmin     | cid
 xmin     | xid
 ctid     | tid
 emp_id   | integer
 emp_name | character varying(100)
 dept_id  | integer
(9 rows)

Let’s understand a few of these hidden columns in detail.

tableoid : Contains the OID of the table that contains this row. Used by queries that select from inheritance hierarchies.
More details on table inheritance can be found here : https://www.postgresql.org/docs/10/static/ddl-inherit.html
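Since tableoid is just an OID, you can cast it to regclass to see the table name directly (a quick illustration, not from the original post):

percona=# SELECT tableoid, tableoid::regclass AS table_name, emp_id FROM scott.employee LIMIT 1;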

xmin : The transaction ID (xid) of the inserting transaction for this row version. Upon update, a new row version is inserted. Let’s look at the following log to understand xmin more.

percona=# select txid_current();
 txid_current
--------------
          646
(1 row)
percona=# INSERT into scott.employee VALUES (9,'avi',9);
INSERT 0 1
percona=# select xmin,xmax,cmin,cmax,* from scott.employee where emp_id = 9;
 xmin | xmax | cmin | cmax | emp_id | emp_name | dept_id
------+------+------+------+--------+----------+---------
  647 |    0 |    0 |    0 |      9 | avi      |       9
(1 row)

As you see in the above log, the transaction ID was 646 for the command => select txid_current(). Thus, the immediately following INSERT statement got transaction ID 647, and the record was assigned an xmin of 647. This means that no transaction that started before ID 647 can see this row. In other words, already-running transactions with a txid less than 647 cannot see the row inserted by txid 647.

With the above example, you should now understand that every tuple has an xmin that is assigned the txid that inserted it.

Note: the behavior may change depending on the isolation levels you choose; this will be discussed later in another blog post.

xmax : This value is 0 if it was not a deleted row version. Before the DELETE is committed, the xmax of the row version changes to the ID of the transaction that issued the DELETE. Let’s observe the following log to understand that better.

On Terminal A : We open a transaction and delete a row without committing it.

percona=# BEGIN;
BEGIN
percona=# select txid_current();
 txid_current
--------------
          655
(1 row)
percona=# DELETE from scott.employee where emp_id = 10;
DELETE 1

On Terminal B : Observe the xmax values before and after the delete (that has not been committed).

Before the Delete
------------------
percona=# select xmin,xmax,cmin,cmax,* from scott.employee where emp_id = 10;
 xmin | xmax | cmin | cmax | emp_id | emp_name | dept_id
------+------+------+------+--------+----------+---------
  649 |    0 |    0 |    0 |     10 | avi      |      10
After the Delete
------------------
percona=# select xmin,xmax,cmin,cmax,* from scott.employee where emp_id = 10;
 xmin | xmax | cmin | cmax | emp_id | emp_name | dept_id
------+------+------+------+--------+----------+---------
  649 |  655 |    0 |    0 |     10 | avi      |      10
(1 row)

As you see in the above logs, the xmax value changed to the transaction ID that has issued the delete. If you have issued a ROLLBACK, or if the transaction got aborted, xmax remains at the transaction ID that tried to DELETE it (which is 655) in this case.

Now that we understand the hidden columns xmin and xmax, let’s observe what happens after a DELETE or an UPDATE in PostgreSQL. As we discussed earlier, through the hidden columns in PostgreSQL for every table, we understand that there are multiple versions of rows maintained within each table. Let’s see the following example to understand this better.

We’ll insert 10 records to the table : scott.employee

percona=# INSERT into scott.employee VALUES (generate_series(1,10),'avi',1);
INSERT 0 10

Now, let’s DELETE 5 records from the table.

percona=# DELETE from scott.employee where emp_id > 5;
DELETE 5
percona=# select count(*) from scott.employee;
 count
-------
     5
(1 row)

Now, when you check the count after the DELETE, you do not see the records that have been deleted. To see any row versions that exist in the table but are not visible, we have an extension called pageinspect. The pageinspect module provides functions that allow you to inspect the contents of database pages at a low level, which is useful for debugging purposes. Let’s create this extension to see the older row versions that have been deleted.

percona=# CREATE EXTENSION pageinspect;
CREATE EXTENSION
percona=# SELECT t_xmin, t_xmax, tuple_data_split('scott.employee'::regclass, t_data, t_infomask, t_infomask2, t_bits) FROM heap_page_items(get_raw_page('scott.employee', 0));
 t_xmin | t_xmax |              tuple_data_split
--------+--------+---------------------------------------------
    668 |      0 | {"\\x01000000","\\x09617669","\\x01000000"}
    668 |      0 | {"\\x02000000","\\x09617669","\\x01000000"}
    668 |      0 | {"\\x03000000","\\x09617669","\\x01000000"}
    668 |      0 | {"\\x04000000","\\x09617669","\\x01000000"}
    668 |      0 | {"\\x05000000","\\x09617669","\\x01000000"}
    668 |    669 | {"\\x06000000","\\x09617669","\\x01000000"}
    668 |    669 | {"\\x07000000","\\x09617669","\\x01000000"}
    668 |    669 | {"\\x08000000","\\x09617669","\\x01000000"}
    668 |    669 | {"\\x09000000","\\x09617669","\\x01000000"}
    668 |    669 | {"\\x0a000000","\\x09617669","\\x01000000"}
(10 rows)

We can still see 10 records in the table even after deleting 5 of them. Also, you can observe here that t_xmax is set to the transaction ID that deleted them. These deleted records are retained in the same table to serve any of the older transactions that are still accessing them.

We’ll take a look at what an UPDATE does in the following log.

percona=# DROP TABLE scott.employee ;
DROP TABLE
percona=# CREATE TABLE scott.employee (emp_id INT, emp_name VARCHAR(100), dept_id INT);
CREATE TABLE
percona=# INSERT into scott.employee VALUES (generate_series(1,10),'avi',1);
INSERT 0 10
percona=# UPDATE scott.employee SET emp_name = 'avii';
UPDATE 10
percona=# SELECT t_xmin, t_xmax, tuple_data_split('scott.employee'::regclass, t_data, t_infomask, t_infomask2, t_bits) FROM heap_page_items(get_raw_page('scott.employee', 0));
 t_xmin | t_xmax |               tuple_data_split
--------+--------+-----------------------------------------------
    672 |    673 | {"\\x01000000","\\x09617669","\\x01000000"}
    672 |    673 | {"\\x02000000","\\x09617669","\\x01000000"}
    672 |    673 | {"\\x03000000","\\x09617669","\\x01000000"}
    672 |    673 | {"\\x04000000","\\x09617669","\\x01000000"}
    672 |    673 | {"\\x05000000","\\x09617669","\\x01000000"}
    672 |    673 | {"\\x06000000","\\x09617669","\\x01000000"}
    672 |    673 | {"\\x07000000","\\x09617669","\\x01000000"}
    672 |    673 | {"\\x08000000","\\x09617669","\\x01000000"}
    672 |    673 | {"\\x09000000","\\x09617669","\\x01000000"}
    672 |    673 | {"\\x0a000000","\\x09617669","\\x01000000"}
    673 |      0 | {"\\x01000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x02000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x03000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x04000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x05000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x06000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x07000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x08000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x09000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x0a000000","\\x0b61766969","\\x01000000"}
(20 rows)

An UPDATE in PostgreSQL performs an insert and a delete. Hence, all the records being UPDATED have been deleted and inserted back with the new value. Deleted records have a non-zero t_xmax value.

Records for which you see a non-zero value for t_xmax may be required by the previous transactions to ensure consistency based on appropriate isolation levels.

We discussed xmin and xmax. What are the hidden columns cmin and cmax?

cmax : The command identifier within the deleting transaction or zero. (As per the documentation). However, both cmin and cmax are always the same as per the PostgreSQL source code.

cmin : The command identifier within the inserting transaction. You can see the cmin of the insert statements starting from 0 in the following log.

See the following log to understand how the cmin and cmax values change through inserts and deletes in a transaction.

On Terminal A
---------------
percona=# BEGIN;
BEGIN
percona=# INSERT into scott.employee VALUES (1,'avi',2);
INSERT 0 1
percona=# INSERT into scott.employee VALUES (2,'avi',2);
INSERT 0 1
percona=# INSERT into scott.employee VALUES (3,'avi',2);
INSERT 0 1
percona=# INSERT into scott.employee VALUES (4,'avi',2);
INSERT 0 1
percona=# INSERT into scott.employee VALUES (5,'avi',2);
INSERT 0 1
percona=# INSERT into scott.employee VALUES (6,'avi',2);
INSERT 0 1
percona=# INSERT into scott.employee VALUES (7,'avi',2);
INSERT 0 1
percona=# INSERT into scott.employee VALUES (8,'avi',2);
INSERT 0 1
percona=# COMMIT;
COMMIT
percona=# select xmin,xmax,cmin,cmax,* from scott.employee;
 xmin | xmax | cmin | cmax | emp_id | emp_name | dept_id
------+------+------+------+--------+----------+---------
  644 |    0 |    0 |    0 |      1 | avi      |       2
  644 |    0 |    1 |    1 |      2 | avi      |       2
  644 |    0 |    2 |    2 |      3 | avi      |       2
  644 |    0 |    3 |    3 |      4 | avi      |       2
  644 |    0 |    4 |    4 |      5 | avi      |       2
  644 |    0 |    5 |    5 |      6 | avi      |       2
  644 |    0 |    6 |    6 |      7 | avi      |       2
  644 |    0 |    7 |    7 |      8 | avi      |       2
(8 rows)

If you observe the above output log, you see cmin and cmax values incrementing for each insert.

Now let’s delete 3 records from Terminal A and observe how the values appear in Terminal B before COMMIT.

On Terminal A
---------------
percona=# BEGIN;
BEGIN
percona=# DELETE from scott.employee where emp_id = 4;
DELETE 1
percona=# DELETE from scott.employee where emp_id = 5;
DELETE 1
percona=# DELETE from scott.employee where emp_id = 6;
DELETE 1
On Terminal B, before issuing COMMIT on Terminal A
----------------------------------------------------
percona=# select xmin,xmax,cmin,cmax,* from scott.employee;
 xmin | xmax | cmin | cmax | emp_id | emp_name | dept_id
------+------+------+------+--------+----------+---------
  644 |    0 |    0 |    0 |      1 | avi      |       2
  644 |    0 |    1 |    1 |      2 | avi      |       2
  644 |    0 |    2 |    2 |      3 | avi      |       2
  644 |  645 |    0 |    0 |      4 | avi      |       2
  644 |  645 |    1 |    1 |      5 | avi      |       2
  644 |  645 |    2 |    2 |      6 | avi      |       2
  644 |    0 |    6 |    6 |      7 | avi      |       2
  644 |    0 |    7 |    7 |      8 | avi      |       2
(8 rows)

In the above log, you can see that the cmin and cmax values for the records being deleted again start from 0 and increment for each delete statement. Their values were different before the delete, as we saw earlier. Even if you ROLLBACK, the values remain the same.

After understanding the hidden columns and how PostgreSQL maintains UNDO as multiple versions of rows, the next question is: what cleans up this UNDO from a table? Doesn’t it increase the size of the table continuously? To understand that better, we need to know about VACUUM in PostgreSQL.

VACUUM in PostgreSQL

As seen in the above examples, a record that has been deleted but still occupies space is called a dead tuple. Once no running transaction depends on those dead tuples, they are no longer needed. PostgreSQL therefore runs VACUUM on such tables to reclaim the storage occupied by dead tuples; this reclaimable space is often referred to as bloat. VACUUM scans the pages for dead tuples and records the freed space in the free space map (FSM). Every relation apart from hash indexes has an FSM stored in a separate file called <relation_oid>_fsm.

Here, relation_oid is the oid of the relation that is visible in pg_class.

percona=# select oid from pg_class where relname = 'employee';
  oid
-------
 24613
(1 row)

After a VACUUM, this space is not returned to the filesystem, but it can be re-used by future inserts on this table. VACUUM stores the free space available on each heap (or index) page in the FSM file.
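
If you would like to inspect the FSM contents yourself, the pg_freespacemap contrib extension exposes them. A minimal sketch, assuming the contrib modules are installed:

-- Load the contrib extension that reads the free space map
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

-- One row per page of the relation: the block number and the amount of
-- free space (in bytes) recorded for it in the FSM
SELECT * FROM pg_freespace('scott.employee'::regclass);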

Running a plain VACUUM is a non-blocking operation: it does not take an exclusive lock on the table. This means VACUUM can run on a busy transactional table in production while several transactions are writing to it.
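
You can verify this from another session while a VACUUM is running: plain VACUUM takes only a SHARE UPDATE EXCLUSIVE lock, which does not conflict with row-level reads and writes. A quick sketch:

-- Locks currently held on the table; during a plain VACUUM you should see
-- ShareUpdateExclusiveLock, which does not block SELECT/INSERT/UPDATE/DELETE
SELECT locktype, mode, granted
FROM pg_locks
WHERE relation = 'scott.employee'::regclass;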

As we discussed earlier, an UPDATE of 10 records has generated 10 dead tuples. Let us see the following log to understand what happens to those dead tuples after a VACUUM.

percona=# VACUUM scott.employee ;
VACUUM
percona=# SELECT t_xmin, t_xmax, tuple_data_split('scott.employee'::regclass, t_data, t_infomask, t_infomask2, t_bits) FROM heap_page_items(get_raw_page('scott.employee', 0));
 t_xmin | t_xmax |               tuple_data_split
--------+--------+-----------------------------------------------
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
    673 |      0 | {"\\x01000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x02000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x03000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x04000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x05000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x06000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x07000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x08000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x09000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x0a000000","\\x0b61766969","\\x01000000"}
(20 rows)

In the above log, you can see that the dead tuples have been removed and the space is available for re-use. However, this space is not returned to the filesystem after VACUUM; only future inserts can use it.

VACUUM performs an additional task: sufficiently old rows that were inserted and successfully committed are marked as frozen, which indicates that they are visible to all current and future transactions. We will discuss this in detail in our future blog post “Transaction ID Wraparound in PostgreSQL”.
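
One way to observe the effect of freezing is to track the age of the oldest transaction ID that may still appear in a table. A minimal sketch using the system catalog:

-- relfrozenxid is the oldest XID that can still appear in the table;
-- age() shows how far behind the current XID it is, and freezing by
-- VACUUM pulls this number back down
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relname = 'employee';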

VACUUM does not usually return space to the filesystem unless the dead tuples are beyond the high water mark.

Let’s consider the following example to see when a VACUUM can release space to the filesystem.

Create a table and insert some sample records (drop the scott.employee table from the earlier examples first, if it still exists). Since the rows are inserted in primary key order, they happen to be laid out sequentially on disk; PostgreSQL heap tables are not otherwise clustered by the primary key index.

percona=# CREATE TABLE scott.employee (emp_id int PRIMARY KEY, name varchar(20), dept_id int);
CREATE TABLE
percona=# INSERT INTO scott.employee VALUES (generate_series(1,1000), 'avi', 1);
INSERT 0 1000

Now, run ANALYZE on the table to update its statistics and see how many pages are allocated to the table after the above insert.

percona=# ANALYZE scott.employee ;
ANALYZE
percona=# select relpages, relpages*8192 as total_bytes, pg_relation_size('scott.employee') as relsize
FROM pg_class
WHERE relname = 'employee';
 relpages | total_bytes | relsize
----------+-------------+---------
        6 |       49152 |   49152
(1 row)

Let’s now see how VACUUM behaves when we delete the rows with emp_id > 500.

percona=# DELETE from scott.employee where emp_id > 500;
DELETE 500
percona=# VACUUM ANALYZE scott.employee ;
VACUUM
percona=# select relpages, relpages*8192 as total_bytes, pg_relation_size('scott.employee') as relsize
FROM pg_class
WHERE relname = 'employee';
 relpages | total_bytes | relsize
----------+-------------+---------
        3 |       24576 |   24576
(1 row)

In the above log, you can see that VACUUM has returned half the space to the filesystem. Earlier, the table occupied 6 pages (8KB each, or whatever the block_size parameter is set to). After VACUUM, 3 of those pages have been released to the filesystem.

Now, let’s repeat the same exercise, this time deleting the rows with emp_id < 500.

percona=# DELETE from scott.employee ;
DELETE 500
percona=# INSERT INTO scott.employee VALUES (generate_series(1,1000), 'avi', 1);
INSERT 0 1000
percona=# DELETE from scott.employee where emp_id < 500;
DELETE 499
percona=# VACUUM ANALYZE scott.employee ;
VACUUM
percona=# select relpages, relpages*8192 as total_bytes, pg_relation_size('scott.employee') as relsize
FROM pg_class
WHERE relname = 'employee';
 relpages | total_bytes | relsize
----------+-------------+---------
        6 |       49152 |   49152
(1 row)

In the above example, you can see that the number of pages remains the same after deleting half the records from the table. This means VACUUM has not released any space to the filesystem this time.

As explained earlier, VACUUM can only truncate pages at the end of the table: if the trailing pages contain no live tuples, they can be released back to the filesystem. In the first case, there were no live tuples after the 3rd page, so the 4th, 5th and 6th pages could be truncated and returned to the filesystem. In the second case, the remaining live tuples (emp_id >= 500) sit towards the end of the table, so no trailing pages are empty and nothing can be truncated.

However, if you need to reclaim the space to the filesystem in a scenario like this one, where we deleted all the records with emp_id < 500, you can run VACUUM FULL. VACUUM FULL rebuilds the entire table and returns the space to disk.

percona=# VACUUM FULL scott.employee ;
VACUUM
percona=# VACUUM ANALYZE scott.employee ;
VACUUM
percona=# select relpages, relpages*8192 as total_bytes, pg_relation_size('scott.employee') as relsize
FROM pg_class
WHERE relname = 'employee';
 relpages | total_bytes | relsize
----------+-------------+---------
        3 |       24576 |   24576
(1 row)

Please note that VACUUM FULL is not an ONLINE operation: it is a blocking operation, and you cannot read from or write to the table while VACUUM FULL is in progress. We will discuss ways to rebuild a table online without blocking in a future blog post.
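
Before resorting to VACUUM FULL, it is worth checking how much bloat a table is actually carrying. The statistics views keep approximate per-table counters; a quick sketch:

-- Approximate counts of live and dead tuples, plus the last (auto)vacuum times
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'employee';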

The post Basic Understanding of Bloat and VACUUM in PostgreSQL appeared first on Percona Database Performance Blog.

by Avinash Vallarapu at August 06, 2018 01:46 PM

Webinar Tues 8/14: Utilizing ProxySQL for Connection Pooling in PHP

Please join Percona’s Architect, Tibi Köröcz, as he presents Utilizing ProxySQL for Connection Pooling in PHP on Tuesday, August 14, 2018, at 8:00 am PDT (UTC-7) / 11:00 am EDT (UTC-4).

ProxySQL is a very powerful tool, with extended capabilities. This presentation will demonstrate how to use ProxySQL to gain functionality (seamless database backend switch) and correct problems (applications missing connection pooling).

The presentation will be a real-life study of how we use ProxySQL for connection pooling, database failover and load balancing of the communication between our (third party) PHP application and our master-master MySQL cluster. We will also show monitoring and statistics using Percona Monitoring and Management (PMM).

Register Now!

Tibor Köröcz

Architect


Tibi joined Percona in 2015 as a Consultant. Before joining Percona, among many other things, he worked at the world’s largest car hire booking service as a Senior Database Engineer. He enjoys trying and working with the latest technologies and applications which can help or work with MySQL together. In his spare time he likes to spend time with his friends, travel around the world and play ultimate frisbee.

The post Webinar Tues 8/14: Utilizing ProxySQL for Connection Pooling in PHP appeared first on Percona Database Performance Blog.

by Tibor Korocz at August 06, 2018 09:17 AM

August 04, 2018

Valeriy Kravchuk

Fun with Bugs #70 - On MySQL Bug Reports I am Subscribed to, Part VIII

More than 2 months have passed since my previous review of active MySQL bug reports I am subscribed to, so it's time to describe what I have been interested in this summer.

Let's start with a few bug reports that really surprised me:
  • Bug #91893 - "LOAD DATA INFILE throws error with NOT NULL column defined via SET". The bug was reported yesterday and seems to be a regression in MySQL 8.0.12 versus older versions. At least I have no problem using SET this way to generate columns for LOAD DATA with MariaDB 10.3.7.
  • Bug #91847 - "Assertion `thread_ids.empty()' failed.". As usual, Roel Van de Paar finds funny corner cases and assertion failures of all kinds. This time in MySQL 8.0.12.
  • Bug #91822 - "incorrect datatype in RBR event when column is NULL and not explicit in query". Ernie Souhrada found out that the missing column is given the datatype of the column immediately preceding it, at least according to mysqlbinlog output.
  • Bug #91803 - "mysqladmin shutdown does not wait for MySQL to shut down anymore". My life will never be the same after this. How can we trust anything when even the shutdown command no longer works as expected? I hope this bug is not confirmed after all; it's still "Open".
  • Bug #91769 - "mysql_upgrade return code is '0' even after errors". Good luck to script writers! The bug is still "Open".
  • Bug #91647 - "wrong result while alter an event". Events may just disappear when you alter them. Take care!
  • Bug #91610 - "5.7 or later returns an error on strict mode when del/update with error func". Here Meiji Kimura noted a case where the behavior of strict sql_mode differs in MySQL 5.6 vs newer versions.
  • Bug #91585 - "“dead” code inside the stored proc or function can significantly slow it down". This was proved by Alexander Rubin from Percona.
  • Bug #91577 - "INFORMATION_SCHEMA.INNODB_FOREIGN does not return a correct TYPE". This is a really weird bug in MySQL 8.
  • Bug #91377 - "Can't Initialize MySQl if internal_tmp_disk_storage_engine is set to MYISAM". It seems Oracle tries really hard to get rid of MyISAM by all means in MySQL 8 :)
  • Bug #91203 - "For partitions table, deal with NULL with is mismatch with reference guide". All versions are affected. Maybe the manual is wrong, but then we see weird results in information_schema as well. So, let's agree for now that it's a "Verified" bug in partitioning...
As usual, I am interested in InnoDB-related bugs:
  • Bug #91861 - "The buf_LRU_free_page function may leak some memory in a particular scenario". This is a very interesting bug report about the memory leak that happens when tables are compressed. It shows how to use memory instrumentation in performance_schema to pinpoint the leak. This bug report is still "Open".
  • Bug #91630 - "stack-use-after-scope in innobase_convert_identifier() detected by ASan". Yura Sorokin from Percona had not only reported this problem, but also contributed a patch.
  • Bug #91120 - "MySQL8.0.11: ibdata1 file size can't be more than 4G". Why is nobody trying to do anything about this "Verified" bug, reported 2 months ago?
  • Bug #91048 - "ut_basename_noext now dead code". This was reported by Laurynas Biveinis.
Replication problems are also important to know about:
  • Bug #91744 - "START SLAVE UNTIL going further than it should." This scary bug in a cyclic replication setup was reported by Jean-François Gagné 2 weeks ago and is still "Open" at the moment.
  • Bug #91633 - "Replication failure (errno 1399) on update in XA tx after deadlock". On top of all other problems with XA transactions we have in MySQL, it seems that replication may break upon executing a row update immediately after a forced transaction rollback due to a deadlock being detected while in an XA transaction.
Some optimizer bugs also caught my attention:
  • Bug #91486 - "Wrong column type , view , binary". We have a "Verified" regression bug here without a "regression" tag or exact versions checked. Well done, Sinisa Milivojevic!
  • Bug #91386 - "Index for group-by is not used with primary key for SELECT COUNT(DISTINCT a)". Yet another case where a well known bug reporter, Monty Solomon, had to apply enormous effort to get it "Verified" as a feature request.
  • Bug #91139 - "use index dives less often". A "Verified" feature request from Mark Callaghan.
Last but not least, documentation bugs. We have one today (assuming I do not care that much about group replication):
  • Bug #91074 - "INSTANT add column undocumented". It was reported by my former colleague in Percona, Jaime Crespo. The bug is still "Verified" as of now, but since the MySQL 8.0.12 release I think it's no longer valid. I see a lot of related details here, for example. But nobody cares to close this bug properly and provide the links to the manual that were previously missing.
That's all for today, folks! Stay tuned.

by Valeriy Kravchuk (noreply@blogger.com) at August 04, 2018 01:53 PM

August 03, 2018

MariaDB Foundation

MariaDB Galera Cluster 5.5.61, MariaDB Connector/C 3.0.6 and MariaDB Connector/ODBC 3.0.6 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB Galera Cluster 5.5.61 as well as MariaDB Connector/C 3.0.6 and MariaDB Connector/ODBC 3.0.6 all stable releases. See the release notes and changelogs for details. Download MariaDB Galera Cluster 5.5.61 Release Notes Changelog What is MariaDB Galera Cluster? Download MariaDB Connector/C 3.0.6 Release Notes Changelog […]

The post MariaDB Galera Cluster 5.5.61, MariaDB Connector/C 3.0.6 and MariaDB Connector/ODBC 3.0.6 now available appeared first on MariaDB.org.

by Ian Gilfillan at August 03, 2018 04:33 PM

MariaDB AB

MariaDB Cluster 5.5.61 and updated connectors now available


The MariaDB project is pleased to announce the immediate availability of MariaDB Cluster 5.5.61 and updated MariaDB C/C++ and ODBC connectors. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB Cluster 5.5.61

Release Notes Changelog What is MariaDB Cluster?


Download MariaDB C/C++ and ODBC Connectors

MariaDB Connector/C 3.0.6 Release Notes MariaDB Connector/ODBC 3.0.6 Release Notes



by dbart at August 03, 2018 04:00 PM

Peter Zaitsev

Percona Server for MongoDB 3.6.6-1.4 Is Now Available


Percona announces the release of Percona Server for MongoDB 3.6.6-1.4 on August 3, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.6 Community Edition. It supports MongoDB 3.6 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.6.6 and does not include any additional changes.

The Percona Server for MongoDB 3.6.6-1.4 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.6.6-1.4 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at August 03, 2018 03:21 PM

This Week in Data with Colin Charles 47: MySQL 8.0.12 and It’s Time To Submit!

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Don’t wait: submit a talk for Percona Live Europe 2018, to be held in Frankfurt, 5-7 November 2018. The call for proposals is ending soon, a committee is being created, and it is a great conference to speak at, with a new city to boot!

Releases

  • A big release, MySQL 8.0.12, with INSTANT ADD COLUMN support, BLOB optimisations, changes around replication, the query rewrite plugin and lots more. Naturally this also means the connectors get bumped up to 8.0.12, including a nice new MySQL Shell.
  • A maintenance release, with security fixes, MySQL 5.5.61 as well as MariaDB 5.5.61.
  • repmgr v4.1 helps monitor PostgreSQL replication, and can handle switchovers and failovers.

Link List

  • Saving With MyRocks in The Cloud – a great MyRocks use case, as in the cloud, resources are major considerations and you can save on I/O with MyRocks. As long as your workload is I/O bound, you’re bound to benefit.
  • Hasura GraphQL Engine allows you to get an instant GraphQL API on any PostgreSQL based application. This is in addition to Graphile. For MySQL users, there is Prisma.

Industry Updates

  • Jeremy Cole (LinkedIn) ended his sabbatical to start work at Shopify. He was previously hacking on MySQL and MariaDB Server at Google, and had stints at Twitter, Yahoo!, his co-owned firm Proven Scaling, as well as MySQL AB.
  • Dremio raises $30 million from the likes of Cisco and more for their Series B. They are a “data-as-a-service” company, having raised a total of $45m in two rounds (Crunchbase).

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 47: MySQL 8.0.12 and It’s Time To Submit! appeared first on Percona Database Performance Blog.

by Colin Charles at August 03, 2018 12:10 PM

August 02, 2018

Peter Zaitsev

Amazon RDS Multi-AZ Deployments and Read Replicas


Amazon RDS is a managed relational database service that makes it easier to set up, operate, and scale a relational database in the cloud. One of the common questions we get is: “What is Multi-AZ, how is it different from a Read Replica, and do I need both?” I have tried to answer this question in this blog post, and the answer depends on your application needs: are you looking for High Availability (HA), read scalability … or both?

Before we go into detail, let me explain two common terms used with Amazon AWS.

Region – an AWS region is a separate geographical area like US East (N. Virginia), Asia Pacific (Mumbai), EU (London) etc. Each AWS Region has multiple, isolated locations known as Availability Zones.

Availability Zone (AZ) – AZ is simply one or more data centers, each with redundant power, networking and connectivity, housed in separate facilities. Data centers are geographically isolated within the same region.

What is Multi-AZ?

Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments.

In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica of the master DB in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy, failover support and to minimize latency during system backups. In the event of planned database maintenance, DB instance failure, or an AZ failure of your primary DB instance, Amazon RDS automatically performs a failover to the standby so that database operations can resume quickly without administrative intervention.

You can check in the AWS management console if a database instance is configured as Multi-AZ. Select the RDS service, click on the DB instance and review the details section.

AWS management console showing that instance is Multi-AZ

This screenshot from the AWS management console (above) shows that the database is hosted as a Multi-AZ deployment and the standby replica is deployed in the us-east-1a AZ.

Benefits of Multi-AZ deployment:

  • Replication to the standby replica is synchronous, which makes it highly durable.
  • When a problem is detected on the primary instance, RDS will automatically fail over to the standby under any of the following conditions:
    • The primary DB instance fails
    • An Availability Zone outage occurs
    • The DB instance server type is changed
    • The operating system of the DB instance is undergoing software patching
    • A manual failover of the DB instance was initiated using Reboot with failover
  • The endpoint of the DB instance remains the same after a failover, so the application can resume database operations without manual intervention.
  • If a failure occurs, your availability impact is limited to the time that the automatic failover takes to complete. This helps to achieve increased availability.
  • It reduces the impact of maintenance. RDS performs maintenance on the standby first, promotes the standby to primary master, and then performs maintenance on the old master which is now a standby replica.
  • To prevent any negative impact of the backup process on performance, Amazon RDS creates a backup from the standby replica.

Amazon RDS does not fail over automatically in response to database operations such as long-running queries, deadlocks or database corruption errors. Also, Multi-AZ deployments are limited to a single region; cross-region Multi-AZ is not currently supported.

Can I use an RDS standby replica for read scaling?

A Multi-AZ deployment is not a read scaling solution: you cannot use the standby replica to serve read traffic. Multi-AZ maintains a standby replica for HA/failover. It is available for use only when RDS promotes the standby instance to primary. To service read-only traffic, you should use a Read Replica instead.

What is Read Replica?

Read replicas allow you to have a read-only copy of your database.

When you create a Read Replica, you first specify an existing DB instance as the source. Then Amazon RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot. Standard MySQL asynchronous replication then keeps the Read Replica up-to-date with changes on the source. The source DB must have automatic backups enabled before you can set up a read replica.
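
Because the replica is kept current through regular MySQL asynchronous replication, you can monitor it with the usual replication commands. A minimal check, assuming a MySQL-engine Read Replica:

-- Run on the Read Replica: Seconds_Behind_Master reports the replication lag,
-- and Slave_IO_Running / Slave_SQL_Running report the health of the two threads
SHOW SLAVE STATUS\G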

Benefits of Read Replica

  • A Read Replica helps decrease the load on the primary DB by serving read-only traffic.
  • A Read Replica can be manually promoted to a standalone database instance.
  • You can create Read Replicas within an AZ, across AZs, or across regions.
  • You can have up to five Read Replicas per master, each with its own DNS endpoint. Unlike a Multi-AZ standby replica, you can connect to each Read Replica and use them for read scaling.
  • You can have Read Replicas of Read Replicas.
  • Read Replicas can be Multi-AZ enabled.
  • You can use Read Replicas to take logical backups (mysqldump/mydumper) if you want to store the backups outside of RDS.
  • A Read Replica helps to maintain a copy of the database in a different region for disaster recovery.

At AWS re:Invent 2017, AWS announced the preview of Amazon Aurora Multi-Master, which will allow users to create multiple Aurora writer nodes and help scale reads/writes across multiple AZs. You can sign up for the preview here.

Conclusion

While both Multi-AZ and Read Replicas maintain a copy of the database, they are different in nature. Use Multi-AZ deployments for High Availability and Read Replicas for read scalability. You can further set up a cross-region read replica for disaster recovery.

The post Amazon RDS Multi-AZ Deployments and Read Replicas appeared first on Percona Database Performance Blog.

by Alok Pathak at August 02, 2018 12:37 PM

August 01, 2018

Peter Zaitsev

Percona Monitoring and Management 1.13.0 Is Now Available

PMM (Percona Monitoring and Management) is a free and open-source platform for managing and monitoring MySQL and MongoDB performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

The most significant feature in this release is Prometheus 2; however, we also packed a lot of visual changes into release 1.13:

  • Prometheus 2 – Consumes less resources, and Dashboards load faster!
  • New Dashboard: Network Overview – New dashboard for all things IPv4!
  • New Dashboard: NUMA Overview – New Dashboard! Understand memory allocation across DIMMs
  • Snapshots and Updates Improvements – Clearer instructions for snapshot sharing, add ability to disable update reporting
  • System Overview Dashboard improvements – See high level summary, plus drill in on CPU, Memory, Disk, and Network
  • Improved SingleStat for percentages – Trend line now reflects percentage value

We addressed 13 new features and improvements, and fixed 13 bugs.

Prometheus 2

The long awaited Prometheus 2 release is here! Percona’s internal testing has shown that by upgrading to PMM release 1.13 you will achieve a 3x-10x reduction in CPU usage, which translates into PMM Server being able to handle more instances than you could in 1.12. You won’t see any gaps in graphs, since internally PMM Server will run two instances of Prometheus and leverage remote_read in order to provide consistent graphs!

Our Engineering teams have worked very hard to make this upgrade as transparent as possible – hats off to them for their efforts!!

Lastly on Prometheus 2, we have also added a new set of graphs to the Prometheus Dashboard to help you better understand when your PMM Server may run out of space. We hope you find this useful!

Network Overview Dashboard

We’re introducing a new dashboard that focuses on all things networking: we placed a Last Hour panel highlighting high-level network metrics, then drill into Network Traffic + Details, and finally focus on TCP, UDP, and ICMP behavior.

Snapshots and Updates Improvements

Of most interest to current Percona Customers, we’ve clarified the instructions on how to take a snapshot of a Dashboard in order to highlight that you are securely sharing with Percona. We’ve also raised the sharing timeout to 30 seconds (up from 4 seconds) so that we more reliably share useful data with Percona Support Engineers, as the shorter timeout led to incomplete graphs being shared.

Packed into this feature is also a change to how we report installed version, latest version, and what’s new information:

Lastly, we modified the behavior of the docker environment option DISABLE_UPDATES to remove the Update button.  As a reminder, you can choose to disable update reporting for environments where you want tighter control over (i.e. lock down) who can initiate an update by launching the PMM docker container along with the environment variable as follows:

docker run ... -e DISABLE_UPDATES=TRUE

System Overview Dashboard Improvements

We’ve updated our System Overview Dashboard to focus on the four criteria of CPU, Memory, Disk, and Network, while also presenting a single panel row of high-level information (uptime, count of CPUs, load average, etc.).

The last feature we’re introducing in 1.13 is a fix to SingleStat panels, where the percentage value is now reflected in the level of the trend line in the background. For example, if you have stat panels at 20% and 86%, the line in the background should fill the respective amount of each box.

New Features & Improvements

  • PMM-2225 – Add new Dashboard: Network Overview
  • PMM-2485 – Improve Singlestat for percentage values to accurately display trend line
  • PMM-2550 – Update to Prometheus 2
  • PMM-1667 – New Dashboard: NUMA Overview
  • PMM-1930 – Reduce Durability for MySQL
  • PMM-2291 – Add Prometheus Disk Space Utilization Information
  • PMM-2444 – Increase space for legends
  • PMM-2594 – Upgrade to Percona Toolkit 3.0.10
  • PMM-2610 – Configure Snapshot Timeout Default Higher and Update Instructions
  • PMM-2637 – Check for Updates and Disable Updates Improvements
  • PMM-2652 – Fix “Unexpected error” on Home dashboard after upgrade
  • PMM-2661 – Data resolution on Dashboards became 15sec min instead of 1sec
  • PMM-2663 – System Overview Dashboard Improvements

Bug Fixes

  • PMM-1977 – after upgrade pmm-client (1.6.1-1) can’t start mysql:metrics – can’t find .my.cnf
  • PMM-2379 – Invert colours for Memory Available graph
  • PMM-2413 – Charts on MySQL InnoDB metrics are not fully displayed
  • PMM-2427 – Information loss in CPU Graph with Grafana 5 upgrade
  • PMM-2476 – AWS PMM is broken on C5/M5 instances
  • PMM-2576 – Error in logs for MySQL 8 instance on CentOS
  • PMM-2612 – Wrong information in PMM Scrapes Task
  • PMM-2639 – mysql:metrics does not work on Ubuntu 18.04
  • PMM-2643 – Socket detection and MySQL 8
  • PMM-2698 – Misleading Graphs for Rare Events
  • PMM-2701 – MySQL 8 – Innodb Checkpoint Age
  • PMM-2722 – Memory auto-configuration for Prometheus evaluates to minimum of 128MB in entrypoint.sh

How to get PMM Server

PMM is available for installation using three methods:

The post Percona Monitoring and Management 1.13.0 Is Now Available appeared first on Percona Database Performance Blog.

by Michael Coburn at August 01, 2018 06:20 PM

MariaDB Foundation

MariaDB 10.0.36 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.0.36, the latest stable release in the MariaDB 10.0 series. See the release notes and changelogs for details. Download MariaDB 10.0.36 Release Notes Changelog What is MariaDB 10.0? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.0.36 Alexander Barkov (MariaDB Corporation) Alexey […]

The post MariaDB 10.0.36 now available appeared first on MariaDB.org.

by Ian Gilfillan at August 01, 2018 05:40 PM

MariaDB AB

MariaDB Server 5.5.61 and 10.0.36 now available


The MariaDB project is pleased to announce the immediate availability of MariaDB Server 5.5.61 and MariaDB Server 10.0.36. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB Server 5.5.61

Release Notes Changelog What is MariaDB 5.5?


Download MariaDB Server 10.0.36

Release Notes Changelog What is MariaDB 10.0?



by dbart at August 01, 2018 05:01 PM

Peter Zaitsev

Saving With MyRocks in The Cloud

The main focus of a previous blog post was the performance of MyRocks when using fast SSD devices. However, I figured that MyRocks would be beneficial for use in cloud workloads, where storage is either slow or expensive.

In that earlier post, we demonstrated the benefits of MyRocks, especially for heavy IO workloads. Meanwhile, Mark wrote in his blog that the CPU overhead in MyRocks might be significant for CPU-bound workloads, but this should not be an issue for IO-bound workloads.

In the cloud the cost of resources is a major consideration. Let’s review the annual cost for the processing and storage resources.

Resource                 Cost/year, $   IO cost, $/year   Total, $/year
c5.9xlarge               7881           -                 7881
1TB io1 5000 IOPS        1500           3900              5400
1TB io1 10000 IOPS       1500           7800              9300
1TB io1 15000 IOPS       1500           11700             13200
1TB io1 20000 IOPS       1500           15600             17100
1TB io1 30000 IOPS       1500           23400             24900
3.4TB GP2 (10000 IOPS)   4800           -                 4800

The scenario

The server version is Percona Server 5.7.22.

For the instances, I used c5.9xlarge. The reason for c5 was that it provides high performance Nitro virtualization: Brendan Gregg describes this in his blog post. The rationale for 9xlarge instances was to be able to utilize io1 volumes with 30000 IOPS throughput – smaller instances cap io1 throughput at a lower level.

I also used huge gp2 volumes: 3400GB, as this volume provides guaranteed 10000 IOPS even if we do not use io1 volumes. This is a cheaper alternative to io1 volumes to achieve 10000 IOPS.

For the workload I used sysbench-tpcc 5000W (50 tables * 100W), which for InnoDB resulted in about 471GB of used storage space.

For the cache I used 27GB and 54GB buffer sizes, so the workload is IO-heavy.

I wanted to compare how InnoDB and RocksDB performed under this scenario.

If you are curious I prepared my terraform+ansible deployment files here: https://github.com/vadimtk/terraform-ansible-percona

Before jumping to the results, I should note that for MyRocks I used LZ4 compression for all levels, which results in a final size of 91GB. That is five times smaller than the InnoDB size. This alone provides operational benefits: for example, copying the InnoDB files (471GB) from a backup volume takes longer than 1 hour, while for MyRocks it is about five times faster.
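
As a side note, the per-level compression is controlled through the rocksdb_default_cf_options server variable (normally set in my.cnf), and the resulting footprint can be compared per engine through information_schema. A sketch of both checks; the sbtest schema name is an assumption about where sysbench-tpcc created its tables:

-- Inspect the column family options, including the compression settings
SHOW GLOBAL VARIABLES LIKE 'rocksdb_default_cf_options';

-- Rough on-disk footprint by storage engine; 'sbtest' is a hypothetical
-- schema name, adjust it to wherever sysbench-tpcc loaded its tables
SELECT engine,
       ROUND(SUM(data_length + index_length)/1024/1024/1024, 1) AS size_gb
FROM information_schema.tables
WHERE table_schema = 'sbtest'
GROUP BY engine;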

The benchmark results

So let’s review the results.

InnoDB versus MyRocks throughput in the cloud

Or presenting average throughput in a tabular form:

cachesize   IOPS       engine    avg TPS
27          5000       innodb    132.66
27          5000       rocksdb   481.03
27          10000      innodb    285.93
27          10000      rocksdb   1224.14
27          10000gp2   innodb    227.19
27          10000gp2   rocksdb   1268.89
27          15000      innodb    436.04
27          15000      rocksdb   1839.66
27          20000      innodb    584.69
27          20000      rocksdb   2336.94
27          30000      innodb    753.86
27          30000      rocksdb   2508.97
54          5000       innodb    197.51
54          5000       rocksdb   667.63
54          10000      innodb    433.99
54          10000      rocksdb   1600.01
54          10000gp2   innodb    326.12
54          10000gp2   rocksdb   1559.98
54          15000      innodb    661.34
54          15000      rocksdb   2176.83
54          20000      innodb    888.74
54          20000      rocksdb   2506.22
54          30000      innodb    1097.31
54          30000      rocksdb   2690.91

We can see that MyRocks outperformed InnoDB in every single combination, but it is also important to note the following:

MyRocks on io1 5000 IOPS showed the performance that InnoDB showed in io1 15000 IOPS.

That means InnoDB requires three times more storage throughput, which corresponds to three times more expensive storage. Given that MyRocks also requires less storage capacity, it is possible to save even more.

On the most economical storage (3400GB gp2, which will provide 10000 IOPS) MyRocks showed 4.7 times better throughput.

For the 30000 IOPS storage, MyRocks was still better by 2.45 times.

However, it is worth noting that MyRocks showed a greater variance in throughput during the runs. Let’s review the charts with 1 sec resolution for GP2 and io1 30000 IOPS storage:

Throughput at 1 sec resolution for GP2 and io1 30000 IOPS storage, MyRocks versus InnoDB

Such variance might be problematic for workloads that require stable throughput and where periodical slowdowns are unacceptable.

Conclusion

MyRocks is suitable and beneficial not only for fast SSD, but also for cloud deployments. By requiring less IOPS, MyRocks can provide better performance and save on the storage costs.

However, before evaluating MyRocks, make sure that your workload is IO-bound, i.e. the working set is much bigger than available memory. For CPU-intensive workloads (where the working set fits into memory), MyRocks will be less beneficial or may even perform worse than InnoDB (as described in the blog post A Look at MyRocks Performance).

The post Saving With MyRocks in The Cloud appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at August 01, 2018 03:13 PM

MariaDB AB

MariaDB AX Distributed Tarball Installation


This guide is meant to help set up a MariaDB AX cluster using the tarball (tar.gz) binary image instead of RPM files, with a non-root account, on CentOS 7 machines. We still need to install some dependencies as root using the yum package manager.

By the end of this guide, we will have a cluster of 2 User Module (UM) and 3 Performance Module (PM) nodes running on local storage.

Summary

Download the latest tarball binaries for MariaDB AX.

The following is the summary of tasks to be performed:

  • VM OS prerequisites setup
    • We will be using CentOS 7; the steps should be identical for Red Hat Enterprise Linux as well.
  • Create the MariaDB ColumnStore owner account and group mcsadm and set its password.
  • Set up the /etc/hosts file on all the nodes with IP - HostName mappings for easier access.
  • Download the ColumnStore tarball and extract it under /home/mcsadm.
  • Generate an ssh key on PM1 using the mcsadm user and copy the key to all the nodes.
  • Generate an ssh key on UM1 and copy the public key to UM1 and UM2.
    • This is used for UM1 to UM2 data replication.
  • Generate an ssh key on UM2 and copy the public key to UM2 and UM1.
    • This is used for UM2 to UM1 data replication.
  • Server preparation
    • Using the root user:
      • Install the ColumnStore dependencies on all nodes using yum
      • Set up umask on all nodes
      • Set up sudo access on all nodes for the mcsadm user
      • Set up ulimit on all nodes for the mcsadm user
  • Test the setup using the cluster test tool as the mcsadm user

Preparing the VM OS

There are a few important things that are required before we start the installation.

Note: the following steps must be performed and validated on all the VMs

Disable SELinux

To do this, edit the SELinux configuration file /etc/selinux/config and make sure SELINUX=disabled is set, so that it looks like this:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

After saving and exiting, reboot the VM for the change to take permanent effect. To check whether SELinux has actually been disabled, use either of the two commands (sestatus/getenforce):

[root@localhost ~] sestatus
SELinux status:                 disabled

[root@localhost ~] getenforce
Disabled

Disable firewalld

Firewalld is a standard service that is disabled using the systemctl command on RHEL 7 / CentOS 7. Disable it on all the nodes and check its status using systemctl status firewalld:

[root@localhost ~] systemctl stop firewalld

[root@localhost ~] systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

[root@localhost ~] systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Change localedef

Execute the following on all the nodes to ensure a proper character set across the cluster.

[root@localhost ~] localedef -i en_US -f UTF-8 en_US.UTF-8

Performance optimization considerations

The following are some network-related optimizations we can make; please also consult your network/OS administrators for other optimizations that might be beneficial.

GbE NIC settings:

Modify /etc/rc.d/rc.local to include the following:

/sbin/ifconfig eth0 txqueuelen 10000

Modify /etc/sysctl.conf to include the following:

# increase TCP max buffer size
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1

# recommended to increase this for 1000 BT or higher
net.core.netdev_max_backlog = 2500

# for 10 GigE, use this instead
net.core.netdev_max_backlog = 30000

NOTE: Make sure there is only 1 setting of net.core.netdev_max_backlog in the /etc/sysctl.conf file.

Cache memory settings

To optimize Linux to cache directories and inodes, vm.vfs_cache_pressure can be set to a value lower than 100 to retain caches for inode and directory structures. This helps improve read performance; a value of 10 is suggested. The following commands must all be run as the root user or with sudo.

To check the current value:

cat /proc/sys/vm/vfs_cache_pressure

Add the following to /etc/sysctl.conf to make the cache changes permanent.

vm.vfs_cache_pressure = 10

Create mcsadm account

Now we are ready to create the MariaDB ColumnStore owner account mcsadm which will be used for the installation of the tarball.

The mcsadm user and group must be created on all the nodes. sudo privilege for the mcsadm user is also mandatory; this sudo requirement is being removed in future releases of ColumnStore.

From here on, all steps are performed as the mcsadm user with the help of sudo, unless specified otherwise.

Remember to set the mcsadm user's password, as it will be required later for the key exchange.

[root@localhost ~] groupadd mcsadm
[root@localhost ~] useradd -g mcsadm mcsadm

[root@localhost ~] passwd mcsadm
Changing password for user mcsadm.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Setup /etc/hosts file

Add the following to the file /etc/hosts on all the nodes. This will ensure all nodes are accessible by their respective hostnames.

192.168.56.104 UM1
192.168.56.105 UM2
192.168.56.106 PM1
192.168.56.107 PM2
192.168.56.108 PM3

Download

In this case, we are going to download the tar file directly on the server, but feel free to download it externally and transfer it to the server using your favorite secure file transfer tool.

[mcsadm@pm1 ~]$ wget https://downloads.mariadb.com/MariaDB/mariadb-columnstore/latest/centos/x86_64/7/mariadb-columnstore-1.1.5-1-centos7.x86_64.bin.tar.gz
--2018-06-30 14:02:50--  https://downloads.mariadb.com/MariaDB/mariadb-columnstore/latest/centos/x86_64/7/mariadb-columnstore-1.1.5-1-centos7.x86_64.bin.tar.gz
Resolving downloads.mariadb.com (downloads.mariadb.com)... 51.255.94.155, 2001:41d0:1004:249b::
Connecting to downloads.mariadb.com (downloads.mariadb.com)|51.255.94.155|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 489245964 (467M) [application/octet-stream]
Saving to: ‘mariadb-columnstore-1.1.5-1-centos7.x86_64.bin.tar.gz’

100%[================================================================>] 489,245,964 5.54MB/s   in 2m 16s

2018-06-30 14:05:13 (3.44 MB/s) - ‘mariadb-columnstore-1.1.5-1-centos7.x86_64.bin.tar.gz’ saved [489245964/489245964]

[mcsadm@pm1 ~]$ ls -rlt
total 477780
-rw-rw-r-- 1 mcsadm mcsadm 489245964 Jun 15 16:45 mariadb-columnstore-1.1.5-1-centos7.x86_64.bin.tar.gz

[mcsadm@pm1 downloads]$ tar -zxf mariadb-columnstore-1.1.5-1-centos7.x86_64.bin.tar.gz

[mcsadm@pm1 ~]$ ls -rlt
total 0
drwxr-xr-x 3 mcsadm mcsadm 25 Jun 11 19:29 mariadb
-rw-rw-r-- 1 mcsadm mcsadm 489245964 Jun 15 16:45 mariadb-columnstore-1.1.5-1-centos7.x86_64.bin.tar.gz

[mcsadm@pm1 ~]$ pwd
/home/mcsadm

Generating SSH Keys

Once the tarball is downloaded and extracted under the /home/mcsadm folder, generate the key on the PM1 node using ssh-keygen and then copy it to all the nodes using ssh-copy-id -i.

[mcsadm@pm1 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/mcsadm/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/mcsadm/.ssh/id_rsa.
Your public key has been saved in /home/mcsadm/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:IcAWtetlQw8vHS3QwAgS4vphvY22HdCJaCHYqRHa1ab mcsadm@um1
The key's randomart image is:
+---[RSA 2048]----+
|. . o..-.+ .=.   |
|o= + .B.+ +o *.  |
|= B .+.-. .+B.o  |
| * X o o E.+o-   |
|  * + o S .      |
| -+ o == . E     |
|  E o o--B       |
|    - + .        |
|    o .          |
+----[SHA256]-----+

[mcsadm@pm1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub pm1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mcsadm/.ssh/id_rsa.pub"
The authenticity of host 'pm1 (192.168.56.106)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
mcsadm@pm1's password: ******

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pm1'"
and check to make sure that only the key(s) you wanted were added.

[mcsadm@pm1 ~]$ ssh pm1
Last login: Sat Jun 30 13:37:50 2018
[mcsadm@pm1 ~]$ exit
logout
Connection to pm1 closed.

[mcsadm@pm1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub pm2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mcsadm/.ssh/id_rsa.pub"
The authenticity of host 'pm2 (192.168.56.107)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
mcsadm@pm2's password: ******

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pm2'"
and check to make sure that only the key(s) you wanted were added.

[mcsadm@pm1 ~]$ ssh pm2
[mcsadm@pm2 ~]$ exit
logout
Connection to pm2 closed.

[mcsadm@pm1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub pm3
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mcsadm/.ssh/id_rsa.pub"
The authenticity of host 'pm3 (192.168.56.108)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
mcsadm@pm3's password: ******

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pm3'"
and check to make sure that only the key(s) you wanted were added.

[mcsadm@pm1 ~]$ ssh pm3
[mcsadm@pm3 ~]$ exit
logout
Connection to pm3 closed.

[mcsadm@pm1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub um1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mcsadm/.ssh/id_rsa.pub"
The authenticity of host 'um1 (192.168.56.104)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
mcsadm@um1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'um1'"
and check to make sure that only the key(s) you wanted were added.

[mcsadm@pm1 ~]$ ssh um1
[mcsadm@um1 ~]$ exit
logout
Connection to um1 closed.
[mcsadm@pm1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub um2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mcsadm/.ssh/id_rsa.pub"
The authenticity of host 'um2 (192.168.56.105)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
mcsadm@um2's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'um2'"
and check to make sure that only the key(s) you wanted were added.

[mcsadm@pm1 ~]$ ssh um2
[mcsadm@um2 ~]$ exit
logout
Connection to um2 closed.
[mcsadm@pm1 ~]$

Key Exchange between UM1 and UM2

Generate an SSH key on UM1 and copy it to both UM1 and UM2; similarly, generate another SSH key on UM2 and copy it to both UM2 and UM1.

UM1:

[mcsadm@um1 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/mcsadm/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/mcsadm/.ssh/id_rsa.
Your public key has been saved in /home/mcsadm/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:IcXWteJlQw8vHK3YwAgS4vphvY2bHdCJaCHYqRHa1ro mcsadm@um1
The key's randomart image is:
+---[RSA 2048]----+
|. . o..o.+ .=.   |
|o= + . .+ +o *.  |
|= B . ... .+B.o  |
| * = o o o.+oo   |
|o * + o S .      |
| + o =           |
|  E o o          |
|     + .         |
|    o .          |
+----[SHA256]-----+
[mcsadm@um1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub um1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mcsadm/.ssh/id_rsa.pub"
The authenticity of host 'um1 (192.168.56.104)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
mcsadm@um1's password:
Permission denied, please try again.
mcsadm@um1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'um1'"
and check to make sure that only the key(s) you wanted were added.

[mcsadm@um1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub um2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mcsadm/.ssh/id_rsa.pub"
The authenticity of host 'um2 (192.168.56.105)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
mcsadm@um2's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'um2'"
and check to make sure that only the key(s) you wanted were added.

[mcsadm@um1 ~]$ ssh um2
[mcsadm@um2 ~]$ exit
logout
Connection to um2 closed.
[mcsadm@um1 ~]$

UM2:

[mcsadm@um2 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/mcsadm/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/mcsadm/.ssh/id_rsa.
Your public key has been saved in /home/mcsadm/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:VSiUVyGkQQYcQE/WxeNP6wiWIWtNOfpSqzdEG/uImA8 mcsadm@um2
The key's randomart image is:
+---[RSA 2048]----+
|   .oo=*==+o+.   |
|     +..+o=o     |
|      . .=..     |
|      . B.. .    |
|       BSB o .   |
|      + X   o    |
|    E+ * = o     |
|    o.o * o .    |
|     .o+ .       |
+----[SHA256]-----+
[mcsadm@um2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub um2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mcsadm/.ssh/id_rsa.pub"
The authenticity of host 'um2 (192.168.56.105)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
mcsadm@um2's password:
Permission denied, please try again.
mcsadm@um2's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'um2'"
and check to make sure that only the key(s) you wanted were added.

[mcsadm@um2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub um1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mcsadm/.ssh/id_rsa.pub"
The authenticity of host 'um1 (192.168.56.104)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
mcsadm@um1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'um1'"
and check to make sure that only the key(s) you wanted were added.

Install Dependencies and Configurations

Using the root user, install all the required packages, then set umask, sudo privileges and ulimit for the mcsadm user, and disable "requiretty" for sudoers. All of these are required by ColumnStore on all the nodes.

[root@pm1 local]# yum -y install boost expect perl perl-DBI openssl zlib file sudo libaio rsync snappy net-tools perl-DBD-MySQL
[root@pm1 local]# echo "umask 022" >> /etc/profile
[root@pm1 local]# echo "mcsadm ALL=(ALL)       NOPASSWD: ALL" >> /etc/sudoers
[root@pm1 local]# echo "Defaults:mcsadm !requiretty" >> /etc/sudoers
[root@pm1 local]# echo "@mcsadm hard nofile 65536" >> /etc/security/limits.conf
[root@pm1 local]# echo "@mcsadm soft nofile 65536" >> /etc/security/limits.conf

Cluster test

Once the dependencies have been set up along with the SSH key exchange, execute the ColumnStore cluster tester tool from the PM1 node as the mcsadm user.

[mcsadm@pm1 ~]$ ./mariadb/columnstore/bin/columnstoreClusterTester.sh

*** This is the MariaDB Columnstore Cluster System Test Tool ***

** Validate local OS is supported

Local Node OS System Name : CentOS Linux 7 (Core)

** Run Non-root User directory permissions check on Local Node

Local Node permission test on /tmp : Passed
Local Node permission test on /dev/shm : Passed

** Run MariaDB Console Password check

Passed, no problems detected with a MariaDB password being set without an associated /root/.my.cnf

** Run MariaDB ColumnStore Dependent Package Check

Local Node - Passed, all dependency packages are installed
Failed, Local Node package mariadb-libs is installed, please un-install

Failure occurred, do you want to continue? (y,n) > y


*** Finished Validation of the Cluster, Failures occurred. Check for Error/Failed test results ***

[mcsadm@pm1 ~]$ rpm -qa | grep mariadb
mariadb-libs-5.5.56-2.el7.x86_64
[mcsadm@pm1 ~]$

The above shows a failed cluster test: the default MariaDB libraries provided within the Linux distribution are already installed, and we need to uninstall them for a clean ColumnStore installation.

Remove the MariaDB 5.5 libraries and execute the cluster test one more time.

[mcsadm@pm1 ~]$ sudo rpm -e --nodeps mariadb-libs-5.5.56-2.el7.x86_64
[mcsadm@pm1 ~]$ ./mariadb/columnstore/bin/columnstoreClusterTester.sh

*** This is the MariaDB Columnstore Cluster System Test Tool ***

** Validate local OS is supported

Local Node OS System Name : CentOS Linux 7 (Core)

** Run Non-root User directory permissions check on Local Node

Local Node permission test on /tmp : Passed
Local Node permission test on /dev/shm : Passed

** Run MariaDB Console Password check

Passed, no problems detected with a MariaDB password being set without an associated /root/.my.cnf

** Run MariaDB ColumnStore Dependent Package Check

Local Node - Passed, all dependency packages are installed
Local Node - Passed, all packages that should not be installed aren't installed


*** Finished Validation of the Cluster, all Tests Passed ***

[mcsadm@pm1 ~]$

All the tests have passed; now it's time to configure ColumnStore for 2 UM and 3 PM nodes.

Post Install

Once the cluster test tool reports success, execute post-install and follow the instructions for a distributed install; the user inputs are as follows:

  • Select the type of Data Storage [1=internal, 2=external] () > 1
    • Internal means local disk
  • Enter number of User Modules [1,1024] () > 2

    • Since we are setting up 2 UM nodes

  • Enter Nic Interface #1 Host Name () > um1

    • Enter "um1" here based on the /etc/hosts file setup

  • Enter Nic Interface #1 IP Address of um1 (192.168.56.104) >

    • Just press enter to accept the IP already identified by the setup

  • Enter Nic Interface #2 Host Name (unassigned) >

    • Press Enter without any value here as we are only configuring 1 network interface per node

  • The above three prompts will repeat depending on the number of UM nodes specified

  • Enter number of Performance Modules [1,1024] () > 3

    • Since we are setting up 3 PM nodes

  • Enter Nic Interface #1 Host Name () > pm1

    • Enter "pm1" here based on the /etc/hosts setup

  • Enter Nic Interface #1 IP Address of pm1 (192.168.56.106) >

    • Just press enter to accept the IP already identified by the setup

  • Enter Nic Interface #2 Host Name (unassigned) >

    • Press Enter without any value here as we are only configuring 1 network interface per node

  • The above three will be repeated depending on the number of PM nodes specified

[mcsadm@pm1 ~]$ ./mariadb/columnstore/bin/post-install --installdir=$HOME/mariadb/columnstore
The next steps are:

If installing on a pm1 node:

export COLUMNSTORE_INSTALL_DIR=/home/mcsadm/mariadb/columnstore
export LD_LIBRARY_PATH=/home/mcsadm/mariadb/columnstore/lib:/home/mcsadm/mariadb/columnstore/mysql/lib/mysql:/home/mcsadm/mariadb/columnstore/lib:/home/mcsadm/mariadb/columnstore/mysql/lib:/home/mcsadm/mariadb/columnstore/lib:/home/mcsadm/mariadb/columnstore/mysql/lib:/home/mcsadm/mariadb/columnstore/lib:/home/mcsadm/mariadb/columnstore/mysql/lib
/home/mcsadm/mariadb/columnstore/bin/postConfigure -i /home/mcsadm/mariadb/columnstore

If installing on a non-pm1 using the non-distributed option:

export COLUMNSTORE_INSTALL_DIR=/home/mcsadm/mariadb/columnstore
export LD_LIBRARY_PATH=/home/mcsadm/mariadb/columnstore/lib:/home/mcsadm/mariadb/columnstore/mysql/lib/mysql:/home/mcsadm/mariadb/columnstore/lib:/home/mcsadm/mariadb/columnstore/mysql/lib:/home/mcsadm/mariadb/columnstore/lib:/home/mcsadm/mariadb/columnstore/mysql/lib:/home/mcsadm/mariadb/columnstore/lib:/home/mcsadm/mariadb/columnstore/mysql/lib
/home/mcsadm/mariadb/columnstore/bin/columnstore start

Copy the above export statements into the ~/.bashrc script on the PM1 node. Make sure the tar.gz file is under the /home/mcsadm home folder before executing postConfigure.

[mcsadm@pm1 ~]$ /home/mcsadm/mariadb/columnstore/bin/postConfigure -i /home/mcsadm/mariadb/columnstore

This is the MariaDB ColumnStore System Configuration and Installation tool.
It will Configure the MariaDB ColumnStore System and will perform a Package
Installation of all of the Servers within the System that is being configured.

IMPORTANT: This tool should only be run on the Parent OAM Module
           which is a Performance Module, preferred Module #1

Prompting instructions:

        Press 'enter' to accept a value in (), if available or
        Enter one of the options within [], if available, or
        Enter a new value


===== Setup System Server Type Configuration =====

There are 2 options when configuring the System Server Type: single and multi

  'single'  - Single-Server install is used when there will only be 1 server configured
              on the system. It can also be used for production systems, if the plan is
              to stay single-server.

  'multi'   - Multi-Server install is used when you want to configure multiple servers now or
              in the future. With Multi-Server install, you can still configure just 1 server
              now and add on addition servers/modules in the future.

Select the type of System Server install [1=single, 2=multi] (2) > 2


===== Setup System Module Type Configuration =====

There are 2 options when configuring the System Module Type: separate and combined

  'separate' - User and Performance functionality on separate servers.

  'combined' - User and Performance functionality on the same server

Select the type of System Module Install [1=separate, 2=combined] (1) > 1

Seperate Server Installation will be performed.

NOTE: Local Query Feature allows the ability to query data from a single Performance
      Module. Check MariaDB ColumnStore Admin Guide for additional information.

Enable Local Query feature? [y,n] (n) > n

NOTE: The MariaDB ColumnStore Schema Sync feature will replicate all of the
      schemas and InnoDB tables across the User Module nodes. This feature can be enabled
      or disabled, for example, if you wish to configure your own replication post installation.

MariaDB ColumnStore Schema Sync feature is Enabled, do you want to leave enabled? [y,n] (y) > y


NOTE: MariaDB ColumnStore Replication Feature is enabled

Enter System Name (columnstore-1) >


===== Setup Storage Configuration =====


----- Setup Performance Module DBRoot Data Storage Mount Configuration -----

There are 2 options when configuring the storage: internal or external

  'internal' -    This is specified when a local disk is used for the DBRoot storage.
                  High Availability Server Failover is not Supported in this mode

  'external' -    This is specified when the DBRoot directories are mounted.
                  High Availability Server Failover is Supported in this mode.

Select the type of Data Storage [1=internal, 2=external] (1) > 1

===== Setup Memory Configuration =====

NOTE: Setting 'NumBlocksPct' to 70%
      Setting 'TotalUmMemory' to 50%

===== Setup the Module Configuration =====


----- User Module Configuration -----

Enter number of User Modules [1,1024] (2) > 2

*** User Module #1 Configuration ***

Enter Nic Interface #1 Host Name (um1) > um1
Enter Nic Interface #1 IP Address of um1 (192.168.56.104) >
Enter Nic Interface #2 Host Name (unassigned) >

*** User Module #2 Configuration ***

Enter Nic Interface #1 Host Name (um2) > um2
Enter Nic Interface #1 IP Address of um2 (192.168.56.105) >
Enter Nic Interface #2 Host Name (unassigned) >

----- Performance Module Configuration -----

Enter number of Performance Modules [1,1024] (3) > 3

*** Parent OAM Module Performance Module #1 Configuration ***

Enter Nic Interface #1 Host Name (pm1) > pm1
Enter Nic Interface #1 IP Address of pm1 (192.168.56.106) >
Enter Nic Interface #2 Host Name (unassigned) >
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) > 1

*** Performance Module #2 Configuration ***

Enter Nic Interface #1 Host Name (pm2) > pm2
Enter Nic Interface #1 IP Address of pm2 (192.168.56.107) >
Enter Nic Interface #2 Host Name (unassigned) >
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm2' (2) > 2

*** Performance Module #3 Configuration ***

Enter Nic Interface #1 Host Name (pm3) > pm3
Enter Nic Interface #1 IP Address of pm3 (192.168.56.108) >
Enter Nic Interface #2 Host Name (unassigned) >
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm3' (3) > 3

===== System Installation =====

System Configuration is complete.
Performing System Installation.

Performing a MariaDB ColumnStore System install using a Binary package
located in the /home/mcsadm directory.


Next step is to enter the password to access the other Servers.
This is either your password or you can default to using a ssh key
If using a password, the password needs to be the same on all Servers.

Enter password, hit 'enter' to default to using a ssh key, or 'exit' >


----- Performing Install on 'um1 / um1' -----

Install log file is located here: /tmp/um1_binary_install.log


----- Performing Install on 'um2 / um2' -----

Install log file is located here: /tmp/um2_binary_install.log


----- Performing Install on 'pm2 / pm2' -----

Install log file is located here: /tmp/pm2_binary_install.log


----- Performing Install on 'pm3 / pm3' -----

Install log file is located here: /tmp/pm3_binary_install.log


MariaDB ColumnStore Package being installed, please wait ...  DONE

===== Checking MariaDB ColumnStore System Logging Functionality =====

The MariaDB ColumnStore system logging is setup and working on local server

===== MariaDB ColumnStore System Startup =====

System Configuration is complete.
Performing System Installation.

----- Starting MariaDB ColumnStore on local server -----

MariaDB ColumnStore successfully started

MariaDB ColumnStore Database Platform Starting, please wait ...................... DONE

Run MariaDB ColumnStore Replication Setup..  DONE

MariaDB ColumnStore Install Successfully Completed, System is Active

Enter the following command to define MariaDB ColumnStore Alias Commands

. /home/mcsadm/mariadb/columnstore/bin/columnstoreAlias

Enter 'mcsmysql' to access the MariaDB ColumnStore SQL console
Enter 'mcsadmin' to access the MariaDB ColumnStore Admin console

NOTE: The MariaDB ColumnStore Alias Commands are in /etc/profile.d/columnstoreAlias.sh

[mcsadm@pm1 ~]$

The above indicates a successful installation of MariaDB AX and ColumnStore.

In case of any errors, the log files for all the nodes are located under PM1's /tmp folder with node-specific file names.

Test ColumnStore using mcsmysql from the UM1 node:

[mcsadm@um1 ~]$ mcsmysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 40
Server version: 10.2.15-MariaDB-log Columnstore 1.1.5-1

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show databases;
+---------------------+
| Database            |
+---------------------+
| calpontsys          |
| columnstore_info    |
| infinidb_querystats |
| infinidb_vtable     |
| information_schema  |
| mysql               |
| performance_schema  |
| test                |
+---------------------+
8 rows in set (0.00 sec)

MariaDB [(none)]> create database testdb;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use testdb;
Database changed
MariaDB [testdb]> show tables;
Empty set (0.00 sec)

MariaDB [testdb]> create table tab(id int, name varchar(100));
Query OK, 0 rows affected (0.01 sec)

MariaDB [testdb]> show create table tab\G
*************************** 1. row ***************************
       Table: tab
Create Table: CREATE TABLE `tab` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MariaDB [testdb]> create table tab_cs(id int, name varchar(100)) engine=columnstore;
Query OK, 0 rows affected (0.39 sec)

MariaDB [testdb]> show create table tab_cs\G
*************************** 1. row ***************************
       Table: tab_cs
Create Table: CREATE TABLE `tab_cs` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(100) DEFAULT NULL
) ENGINE=Columnstore DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MariaDB [testdb]> insert into tab_cs select rand()*10000, column_name from information_schema.columns;
Query OK, 1921 rows affected (1.70 sec)
Records: 1921  Duplicates: 0  Warnings: 0

MariaDB [testdb]> select count(*) from tab_cs;
+----------+
| count(*) |
+----------+
|     1921 |
+----------+
1 row in set (0.08 sec)

MariaDB [testdb]>

This concludes a successful MariaDB AX distributed tarball install.

Conclusion

MariaDB ColumnStore is a powerful distributed analytical storage engine available as part of MariaDB AX along with MariaDB MaxScale. Installation using a tarball is quite an easy task.

Summary:

  • Create a new user/group mcsadm
  • Download and untar the tar.gz tarball under /home/mcsadm
  • Perform SSH Key Exchange between PM1 and all nodes
  • Perform SSH Key Exchange between UM1 and UM2 and vice versa
  • Perform a Cluster Test to ensure no problems
  • Execute Post Install
  • Execute Post Configure



by Faisal at August 01, 2018 12:13 PM

July 31, 2018

MariaDB Foundation

MariaDB 5.5.61, MariaDB Connector/Node.js 0.7.0 and MariaDB Connector/J 2.2.6 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 5.5.61, the latest stable release in the MariaDB 5.5 series, as well as MariaDB Connector/Node.js 0.7.0, the first alpha release of the new 100% JavaScript non-blocking MariaDB client for Node.js, compatible with Node.js 6+, and MariaDB Connector/J 2.2.6, the latest stable MariaDB Connector/J release. […]

The post MariaDB 5.5.61, MariaDB Connector/Node.js 0.7.0 and MariaDB Connector/J 2.2.6 now available appeared first on MariaDB.org.

by Ian Gilfillan at July 31, 2018 04:55 PM

Peter Zaitsev

Webinar Wednesday, August 1, 2018: Migrating to AWS Aurora, Monitoring AWS Aurora with PMM

Please join Autodesk’s Senior DevOps Engineer, Sanjeet Deshpande, Autodesk’s Senior Database Engineer, Vineet Khanna, and Percona’s Sr. MySQL DBA, Tate McDaniel as they present Migrating to AWS Aurora, Monitoring AWS Aurora with PMM on Wednesday, August 1st, 2018, at 5:00 PM PDT (UTC-7) / 8:00 PM EDT (UTC-4).

Amazon Web Services (AWS) Aurora is one of the most popular cloud-based RDBMS solutions. The main reason for Aurora’s success is that it’s based on the InnoDB storage engine.

In this session, we will talk about how you can efficiently plan for migrating to AWS Aurora using Terraform and Percona products and solutions. We will share our Terraform code for launching AWS Aurora clusters, look at tricks for checking data consistency, verify migration paths and effectively monitor the environment using Percona Monitoring and Management (PMM).

The topics in this session include:

  • Why AWS Aurora? What is the future of AWS Aurora?
  • Build Aurora infrastructure
  • Using Terraform (without data)
  • Restore using Terraform and Percona XtraBackup (using AWS S3 bucket)
  • Verify data consistency
  • Aurora migration
  • 1:1 migration
  • Many:1 migration using Percona Server for MySQL multi-source replication
  • Show benchmarks and PMM dashboard
  • Demo

Register for the webinar.

Sanjeet Deshpande, Senior DevOps Engineer

Sanjeet is a Senior DevOps engineer with 10+ years of experience, currently working with Autodesk in Singapore. He is experienced in architecting, deploying and managing cloud infrastructures/applications, and in automation. Sanjeet has worked extensively on AWS services like Lambda, SQS, RDS and SNS, to name a few. He has also worked closely with engineering teams on different applications and suggested changes to improve their application uptime.

Tate McDaniel, Sr. MySQL DBA

Tate joined Percona in June 2017 as a Remote MySQL DBA. He holds a Bachelor’s degree in Information Systems and Decision Strategies from LSU. He has 10+ years of experience working with MySQL and operations management. His great love is application query tuning. In his off time, he races sailboats, travels the Caribbean by sailboat, and drives all over in an RV.

Vineet Khanna, Senior Database Engineer

Vineet Khanna, Senior Database Engineer at Autodesk, has 10+ years of experience as a MySQL DBA. His main professional interests are managing complex database environments, improving database performance, and architecting high availability solutions for MySQL. He has handled database environments for organizations like Chegg, Zendesk, and Adobe.

The post Webinar Wednesday, August 1, 2018: Migrating to AWS Aurora, Monitoring AWS Aurora with PMM appeared first on Percona Database Performance Blog.

by Tate McDaniel at July 31, 2018 02:56 PM

July 30, 2018

Peter Zaitsev

Upcoming Webinar Tuesday, 7/31: Using MySQL for Distributed Database Architectures

Please join Percona’s CEO, Peter Zaitsev as he presents Using MySQL for Distributed Database Architectures on Tuesday, July 31st, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

In modern data architectures, we’re increasingly moving from single-node design systems to distributed architectures using multiple nodes – often spread across multiple databases and multiple continents. Such architectures bring many benefits (such as scalability and resiliency), but can also bring a lot of pain if incorrectly designed and executed.

In this presentation, we will look at how we can use MySQL to engineer distributed multi-node systems.

Register for the webinar.

Peter Zaitsev, CEO and Co-Founder

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Inc. 5000 named Percona to their list in 2013, 2014, 2015 and 2016. Peter was an early employee at MySQL AB, eventually leading the company’s High-Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High-Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his ebook Practical MySQL Performance Optimization is one of Percona’s most popular downloads.

 

The post Upcoming Webinar Tuesday, 7/31: Using MySQL for Distributed Database Architectures appeared first on Percona Database Performance Blog.

by Peter Zaitsev at July 30, 2018 04:13 PM

July 28, 2018

Valeriy Kravchuk

Fun with Bugs #69 - On Some Public Bugs Fixed in MySQL 5.7.23

Several MySQL releases happened yesterday, but of them all I am mostly interested in MySQL 5.7.23, as MySQL 5.7 (either directly or indirectly, via forks and upstream fixes they merge) is probably the most widely used MySQL GA release at the moment.

In this post (in a typical manner for this "Fun with Bugs" series)  I'd like to describe several bugs reported by MySQL Community users and fixed in MySQL 5.7.23. As usual, I'll try to concentrate mostly on InnoDB, replication, partitioning and optimizer-related bugs (if any).

Checking MySQL release notes is like swimming with dolphins - it's pure fun
 
I'd like to start with InnoDB and partitioning-related fixes:
  • The most important fix (one that will probably make downgrades from 5.7.23 more problematic) is for Bug #86926 - "The field table_name (varchar(64)) from mysql.innodb_table_stats can overflow.", reported by Jean-François Gagné a year ago. The length of the table_name column in the mysql.innodb_table_stats and mysql.innodb_index_stats tables has been increased from 64 to 199 characters (not 100% sure where this new value comes from), and one has to run mysql_upgrade while upgrading to 5.7.23.
  • Bug #90296 - "Hard error should be report when fsync() return EIO". Now this recent request from Allen Lai is implemented.
  • Bug #88739 - "Mysql scalability improvement patch: Optimized away excessive condition". The patch that removes an unnecessary check for read-only transactions was suggested by Sandeep Sethia.
  • Bug #86370 - "crash on startup, divide by zero, with inappropriate buffer pool configuration". This funny bug was reported by Shane Bester. I wonder how many more bombs of this kind (use of uint or ulint data types) still remain in the code...
  • Bug #87253 - "innodb partition table has unexpected row lock". This bug was reported by Yx Jiang, and a patch was later suggested by Zhenghu Wen. If the comment at the end is correct, the fix is NOT included in MySQL 8.0.12.
Now let's check some replication bugs fixed:

  • Bug #89272 - "Binlog and Engine become inconsistent when binlog cache file gets out of space". This bug was reported by Yoshinori Matsunobu. Good question in the last comment there, by Artem Danilov:
    "I wonder why does only ENOSPC end up with flush errors and clears the binary log cached. What about any other disk error? Can this fix be extended for all disk errors?"
  • Bug #88891 - "Filtered replication leaves GTID holes with create database if not exists". This serious bug was reported by Simon Mudd.
  • Bug #89938 - "Rejoin old primary node may duplicate key when recovery", was reported by Zhenghu Wen, who cares about group replication a lot (unlike me, I have Galera for that).
  • Yet another bug report from Jean-François Gagné, this time in group replication - Bug #89146 - "Binlog name and Pos are wrong in group_replication_applier channel's error msgs".
  • Bug #81500 - "dead branch in search_key_in_table()". This bug was reported 2 years ago by Andrei Elkin, now my colleague (again) in MariaDB. After this fix, the index is properly used on the slave to find rows.
Surely in MySQL 5.7.23 there had to be some community bug fixed that is kept private. This time it's a bug in mysqldump:
"mysqldump exited abnormally for large --where option values. (Bug #26171967, Bug #86496, Bug #27510150)"
and this is complete nonsense to me!

Last but not least, we all should be thankful to Daniël van Eeden for these bug reports and patches contributed:
  • Bug #89001 - "MySQL aborts without proper error message on startup if grant tables are corrupt".
  • Bug #89578 - "Contribution: Use native host check from OpenSSL"
Have you noticed any fixes to the numerous public bug reports for InnoDB FULLTEXT indexes, "online" DDL, InnoDB compression, or maybe the optimizer? I have not, and no wonder - the way these features are used by the Community is obviously not something Oracle cares much about.

Anyway, I'd suggest an upgrade to MySQL 5.7.23 to anyone who cares about security and uses OpenSSL, replication of any kind, or InnoDB partitioned tables in production.

by Valeriy Kravchuk (noreply@blogger.com) at July 28, 2018 05:17 PM

July 27, 2018

Peter Zaitsev

Percona Server for MongoDB 3.4.16-2.14 Is Now Available

Percona announces the release of Percona Server for MongoDB 3.4.16-2.14 on July 27, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition. It supports MongoDB 3.4 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.16 and does not include any additional changes.

The Percona Server for MongoDB 3.4.16-2.14 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.4.16-2.14 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at July 27, 2018 04:58 PM

This Week in Data with Colin Charles 46: OSCON Recap, Google Site Reliability Workbook

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

OSCON happened last week and was incredible. It is true there was less of a database focus and a lot more topics covered; in fact, you’d have found it hard to find database content. There was plenty of interesting content around AI/ML, cloud, SRE, blockchain and more. As a speaker, the 40-minute sessions that included a Q&A were quite compact (I felt they were a little too short, and many speakers sped up towards the end). I guess it will make for more blog content.

The conference’s open source ethos is still extremely strong, and the keynotes exemplified that. It was not just the 20th anniversary of OSCON, but also the 20th anniversary of the Open Source Initiative (OSI). Percona is a sponsor (and I am an individual member). From a sponsor standpoint, Home Depot had a huge booth, but so did the NSA – who, along with giving out stickers, were actively recruiting. Readers might recall mention of the NSA’s involvement with LemonGraph and other open source data projects from column 43.

Google just released The Site Reliability Workbook, which looks like the full PDF (“launch day edition”) of their new book. It includes practical ways to implement SRE. You can pre-order the book, and the official release date is August 4, 2018. This should be the best companion to Site Reliability Engineering: How Google Runs Production Systems, which I highly recommend reading first before getting to the workbook. After a quick perusal of the new release, I can say I like it: the case studies from Evernote and Home Depot are all very interesting from a database standpoint (MySQL, Cloud SQL). Plenty of information is relevant if you’re a Prometheus user as well. I say skim the PDF, and devour the book!

Industry Updates

  • Elastic is on a spree of grabbing folk – Massimo Brignoli (ex-MongoDB, SkySQL, Oracle, and MySQL) joins as a Principal Solutions Architect, and Gerardo Narvaja joins as Sr. Solutions Architect for Pacific Northwest. He departs MariaDB Corporation, and has previously been at Tokutek, Pythian and MySQL.
  • Morgan Tocker (LinkedIn) has joined PingCAP as Senior Product & Community Manager. Previously he was both product manager and community manager at Oracle for MySQL, had a stint at Percona and also was at the original MySQL AB.
  • Baron Schwartz is now Chief Technology Officer of VividCortex, and Amena Ali has become the new CEO.

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 46: OSCON Recap, Google Site Reliability Workbook appeared first on Percona Database Performance Blog.

by Colin Charles at July 27, 2018 04:33 PM

Jean-Jerome Schmidt

Introduction to Failover for MySQL Replication - the 101 Blog

You may have heard about the term “failover” in the context of MySQL replication.

Maybe you wondered what it is as you are starting your adventure with databases.

Maybe you know what it is but you are not sure about potential problems related to it and how they can be solved?

In this blog post we will try to give you an introduction to failover handling in MySQL & MariaDB.

We will discuss what the failover is, why it is unavoidable, what the difference is between failover and switchover. We will discuss the failover process in the most generic form. We will also touch a bit on different issues that you will have to deal with in relation to the failover process.

What does “failover” mean?

MySQL replication is a collection of nodes, each of which may serve one role at a time: it can be a master or a replica. There is only one master node at a given time. This node receives write traffic and it replicates the writes to its replicas.

As you can imagine, being the single point of entry for data into the replication cluster, the master node is quite important. What would happen if it failed and became unavailable?

This is quite a serious condition for a replication cluster: it cannot accept any writes. As you may expect, one of the replicas will have to take over the master’s tasks and start accepting writes. The rest of the replication topology may also have to change - remaining replicas should change their master from the old, failed node to the newly chosen one. This process of “promoting” a replica to become a master after the old master has failed is called “failover”.

On the other hand, “switchover” happens when the user triggers the promotion of the replica. A new master is promoted from a replica pointed to by the user, and the old master, typically, becomes a replica to the new master.

The most important difference between “failover” and “switchover” is the state of the old master. When a failover is performed, the old master is, in some way, not reachable. It may have crashed, or it may have suffered a network partition. It cannot be used at a given moment and its state is, typically, unknown.

On the other hand, when a switchover is performed, the old master is alive and well. This has serious consequences. If a master is unreachable, it may mean that some of the data has not yet been sent to the slaves (unless semi-synchronous replication was used). Some of the data may have been corrupted or sent partially.

There are mechanisms in place to avoid propagating such corruptions to slaves, but the point is that some of the data may be lost in the process. On the other hand, while performing a switchover, the old master is available and data consistency is maintained.
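In its most generic, GTID-based form, the promotion itself boils down to a few statements. Here is a minimal sketch in MySQL 5.7 syntax; the hostname and replication user are placeholders, and real failover tooling wraps these steps in the safety checks discussed in the rest of this post:

-- On the replica being promoted:
STOP SLAVE;
RESET SLAVE ALL;               -- forget the old, failed master entirely
SET GLOBAL read_only = OFF;    -- start accepting writes

-- On every remaining replica, repoint replication to the new master:
STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST = 'new-master.example.com',   -- placeholder hostname
    MASTER_USER = 'repl',                     -- placeholder user
    MASTER_AUTO_POSITION = 1;
START SLAVE;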

Failover process

Let’s spend some time discussing what exactly the failover process looks like.

Master crash is detected

For starters, a master has to crash before the failover can be performed. Once it is not available, a failover is triggered. So far it seems simple, but the truth is, we are already on slippery ground.

First of all, how is master health tested? Is it tested from one location or are tests distributed? Does the failover management software just attempt to connect to the master or does it implement more advanced verifications before master failure is declared?

Let’s imagine the following topology:

We have a master and two replicas. We also have a failover management software located on some external host. What would happen if a network connection between the host with failover software and the master failed?

According to the failover management software, the master has crashed - there’s no connectivity to it. Still, the replication itself is working just fine. What should happen here is that the failover management software would try to connect to replicas and see what their point of view is.

Do they complain about a broken replication or are they happily replicating?

Things may become even more complex. What if we added a proxy (or a set of proxies)? It would be used to route traffic - writes to the master and reads to replicas. What if a proxy cannot access the master? What if none of the proxies can access the master?

This means that the application cannot function under those conditions. Should the failover (actually, it would be more of a switchover as the master is technically alive) be triggered?

Technically, the master is alive but it cannot be used by the application. Here, the business logic has to come in and a decision has to be made.

Preventing old master from running

No matter how and why, if there is a decision to promote one of the replicas to become a new master, the old master has to be stopped and, ideally, it shouldn’t be able to start again.

How this can be achieved depends on the details of the particular environment; therefore this part of the failover process is commonly reinforced by external scripts integrated into the failover process through different hooks.

Those scripts can be designed to use tools available in the particular environment to stop the old master. It can be a CLI or API call that will stop a VM; it can be shell code that runs commands through some sort of “lights out management” device; it can be a script which sends SNMP traps to the Power Distribution Unit that disables the power outlets the old master is using (without electric power we can be sure it will not start again).

If the failover management software is part of a more complex product which also handles recovery of nodes (as is the case for ClusterControl), the old master may be marked as excluded from the recovery routines.

You may wonder why it is so important to prevent the old master from becoming available once more.

The main issue is that in replication setups, only one node can be used for writes. Typically you ensure that by enabling a read_only (and super_read_only, if applicable) variable on all replicas and keeping it disabled only on the master.

Once a new master is promoted, it will have read_only disabled. The problem is that, if the old master is unavailable, we cannot switch it back to read_only=1. If MySQL or the host crashed, this is not much of an issue, as good practice is to have my.cnf configured with that setting so that, once MySQL starts, it always starts in read-only mode.

The problem shows up when it’s not a crash but a network issue. The old master is still running with read_only disabled, it’s just not available. When networks converge, you will end up with two writeable nodes. This may or may not be a problem. Some of the proxies use the read_only setting as an indicator of whether a node is a master or a replica. Two masters showing up at a given moment may result in a huge issue, as data is written to both hosts but replicas get only half of the write traffic (the part that hit the new master).
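For reference, a minimal sketch of the statements involved (super_read_only is available in MySQL 5.7 and later):

-- On every replica, and on a demoted master once it is reachable again:
SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON;   -- also blocks accounts with the SUPER privilege
-- Verify the current state:
SELECT @@global.read_only, @@global.super_read_only;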

Sometimes it’s about hardcoded settings in some of the scripts which are configured to connect to a given host only. Normally they’d fail and someone would notice that the master has changed.

With the old master being available, they will happily connect to it and data discrepancy will arise. As you can see, making sure that the old master will not start is quite a high priority item.

Decide on a master candidate

The old master is down and it will not return from its grave, now it’s time to decide which host we should use as a new master. Usually there is more than one replica to pick from, so a decision has to be made. There are many reasons why one replica may be picked over another, therefore checks have to be performed.

Whitelists and blacklists

For starters, a team managing databases may have its reasons to pick one replica over another when deciding about a master candidate. Maybe it’s using weaker hardware or has some particular job assigned to it (that replica runs backups or analytic queries, or developers have access to it and run custom, hand-made queries). Maybe it’s a test replica where a new version is undergoing acceptance tests before proceeding with the upgrade. Most failover management software supports whitelists and blacklists, which can be utilized to precisely define which replicas should or cannot be used as master candidates.

Semi-synchronous replication

A replication setup may be a mix of asynchronous and semi-synchronous replicas. There’s a huge difference between them - a semi-synchronous replica is guaranteed to contain all of the events from the master. An asynchronous replica may not have received all the data, thus failing over to it may result in data loss. We would rather see semi-synchronous replicas promoted.
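As a sketch of how this looks in stock MySQL 5.7, where semi-synchronous replication is provided by plugins:

-- On the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = ON;

-- On each replica that should be semi-synchronous:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = ON;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;   -- the IO thread must reconnect for the change to take effect

-- Back on the master, confirm semi-sync is actually in effect:
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status';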

Replication lag

Even though a semi-synchronous replica will contain all the events, those events may still reside in relay logs only. With heavy traffic, all replicas, no matter if semi-sync or async, may lag.

The problem with replication lag is that, when you promote a replica, you should reset the replication settings so it will not attempt to connect to the old master. This will also remove all relay logs, even if they are not yet applied - which leads to data loss.

Even if you do not reset the replication settings, you still cannot open a new master to connections if it hasn’t applied all events from its relay log. Otherwise you risk that new queries will affect transactions from the relay log, triggering all sorts of problems (for example, an application may remove some rows which are accessed by transactions from the relay log).

Taking all of this into consideration, the only safe option is to wait for the relay log to be applied. Still, it may take a while if the replica was lagging heavily. Decisions have to be made as to which replica would make a better master - asynchronous but with a small lag, or semi-synchronous but with a lag that would require a significant amount of time to apply.
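With GTID-based replication in MySQL 5.7, one way to check whether a candidate has applied everything it has received is to compare the retrieved and executed GTID sets; a sketch:

-- Returns 1 once everything received from the master has been applied:
SELECT GTID_SUBSET(
           (SELECT RECEIVED_TRANSACTION_SET
              FROM performance_schema.replication_connection_status
             WHERE CHANNEL_NAME = ''),        -- default replication channel
           @@GLOBAL.gtid_executed
       ) AS relay_log_fully_applied;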

Errant transactions

Even though replicas should not be written to, it still could happen that someone (or something) has written to one.

It may have been just a single transaction way in the past, but it still may have a serious effect on the ability to perform a failover. The issue is strictly related to Global Transaction ID (GTID), a feature which assigns a distinct ID to every transaction executed on a given MySQL node.

Nowadays it’s quite a popular setup as it brings great levels of flexibility and it allows for better performance (with multi-threaded replicas).

The issue is that, while re-slaving to a new master, GTID replication requires all events from that master (which have not been executed on replica) to be replicated to the replica.

Let’s consider the following scenario: at some point in the past, a write happened on a replica. It was a long time ago and this event has been purged from the replica’s binary logs. At some point a master has failed and the replica was appointed as a new master. All remaining replicas will be slaved off the new master. They will ask about transactions executed on the new master. It will respond with a list of GTIDs which came from the old master and the single GTID related to that old write. GTIDs from the old master are not a problem as all remaining replicas contain at least the majority of them (if not all) and all missing events should be recent enough to be available in the new master’s binary logs.

Worst case scenario, some missing events will be read from the binary logs and transferred to replicas. The issue is with that old write - it happened on the new master only, while it was still a replica, thus it does not exist on the remaining hosts. It is an old event, therefore there is no way to retrieve it from binary logs. As a result, none of the replicas will be able to slave off the new master. The only solution here is to take manual action and inject an empty event with that problematic GTID on all replicas. It also means that, depending on what happened, the replicas may not be in sync with the new master.

As you can see, it is quite important to track errant transactions and determine if it is safe to promote a given replica to become a new master. If it contains errant transactions, it may not be the best option.
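The detection and the fix can both be sketched with the GTID functions; here @master_gtid_executed stands for the master's gtid_executed set captured earlier (for example, from binary log backups), and 'uuid:N' is a placeholder for the problematic GTID:

-- On the master candidate: anything left after the subtraction
-- was executed on this replica only, i.e. is errant:
SELECT GTID_SUBTRACT(@@GLOBAL.gtid_executed, @master_gtid_executed)
       AS errant_transactions;

-- To neutralize an errant GTID, inject an empty transaction for it
-- on all the other hosts:
SET gtid_next = 'uuid:N';   -- placeholder for the errant GTID
BEGIN; COMMIT;
SET gtid_next = 'AUTOMATIC';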

Failover handling for the application

It is crucial to keep in mind that a master switch, forced or not, does have an effect on the whole topology. Writes have to be redirected to a new node. This can be done in multiple ways, and it is critical to ensure that this change is as transparent to the application as possible. In this section we will take a look at some examples of how the failover can be made transparent to the application.

DNS

One of the ways in which an application can be pointed to a master is by utilizing DNS entries. With low TTL it is possible to change the IP address to which a DNS entry such as ‘master.dc1.example.com’ points. Such a change can be done through external scripts executed during the failover process.

Service discovery

Tools like Consul or etcd can also be used for directing traffic to the correct location. Such tools may contain information that the current master’s IP is set to some value. Some of them also give the ability to use hostname lookups to point to a correct IP. Again, entries in service discovery tools have to be maintained, and one of the ways to do that is to make those changes during the failover process, using hooks executed at different stages of the failover.

Proxy

Proxies may also be used as a source of truth about topology. Generally speaking, no matter how they discover the topology (it can be either an automatic process or the proxy has to be reconfigured when the topology changes), they should contain the current state of the replication chain as otherwise they wouldn’t be able to route queries correctly.

The approach of using a proxy as a source of truth can be quite common in conjunction with collocating proxies on application hosts. There are numerous advantages to collocating proxy and web servers: fast and secure communication using a Unix socket, and keeping a caching layer (as some of the proxies, like ProxySQL, can also do caching) close to the application. In such a case, it makes sense for the application to just connect to the proxy and assume it will route queries correctly.

Failover in ClusterControl

ClusterControl applies industry best practices to make sure that the failover process is performed correctly. It also ensures that the process will be safe - default settings are intended to abort the failover if possible issues are detected. Those settings can be overridden by the user should they want to prioritize failover over data safety.

Once a master failure has been detected by ClusterControl, a failover process is initiated and a first failover hook is immediately executed:

Next, master availability is tested.

ClusterControl does extensive tests to make sure the master is indeed unavailable. This behavior is enabled by default and it is managed by the following variable:

replication_check_external_bf_failover
     Before attempting a failover, perform extended checks by checking the slave status to detect if the master is truly down, and also check if ProxySQL (if installed) can still see the master. If the master is detected to be functioning, then no failover will be performed. Default is 1 meaning the checks are enabled.

As a following step, ClusterControl ensures that the old master is down and if not, that ClusterControl will not attempt to recover it:

The next step is to determine which host can be used as a master candidate. ClusterControl does check whether a whitelist or a blacklist is defined.

You can do that by using the following variables in the cmon configuration file:

replication_failover_blacklist
     Comma separated list of hostname:port pairs. Blacklisted servers will not be considered as a candidate during failover. replication_failover_blacklist is ignored if replication_failover_whitelist is set.
replication_failover_whitelist
     Comma separated list of hostname:port pairs. Only whitelisted servers will be considered as a candidate during failover. If no server on the whitelist is available (up/connected) the failover will fail. replication_failover_blacklist is ignored if replication_failover_whitelist is set.

It is also possible to configure ClusterControl to look for differences in binary log filters across all replicas, using the replication_check_binlog_filtration_bf_failover variable. By default, those checks are disabled. ClusterControl also verifies there are no errant transactions in place, which could cause issues.

You can also ask ClusterControl to auto-rebuild replicas which cannot replicate from the new master, using the following setting in the cmon configuration file:

replication_auto_rebuild_slave
     If the SQL THREAD is stopped and error code is non-zero then the slave will be automatically rebuilt. 1 means enable, 0 means disable (default).

Afterwards, a second script is executed: it is defined in the replication_pre_failover_script setting. Next, the candidate undergoes the preparation process.

ClusterControl waits for relay logs to be applied (ensuring that data loss is minimal). It also checks if there are other transactions available on the remaining replicas which have not been applied to the master candidate. Both behaviors can be controlled by the user, using the following settings in the cmon configuration file (an example excerpt follows the list):

replication_skip_apply_missing_txs
     Force failover/switchover by skipping applying transactions from other slaves. Default disabled. 1 means enabled.
replication_failover_wait_to_apply_timeout
     Candidate waits up to this many seconds to apply outstanding relay log (retrieved_gtids) before failing over. Default -1 seconds (wait forever). 0 means failover immediately.
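Put together, a hypothetical excerpt of these settings in the cmon configuration file could look as follows (the values are examples, not recommendations):

replication_check_external_bf_failover=1
replication_failover_whitelist=10.0.0.11:3306,10.0.0.12:3306
replication_check_binlog_filtration_bf_failover=0
replication_auto_rebuild_slave=1
replication_skip_apply_missing_txs=0
replication_failover_wait_to_apply_timeout=-1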

As you can see, you can force a failover even though not all of the relay log events have been applied - it allows the user to decide what has the higher priority: data consistency or failover velocity.

Finally, the master is elected and the last script is executed (a script which can be defined as replication_post_failover_script).

If you haven’t tried ClusterControl yet, I encourage you to download it (it’s free) and give it a go.


Master detection in ClusterControl

ClusterControl gives you the ability to deploy a full high availability stack, including the database and proxy layers. Master discovery is always one of the issues to deal with.

How does it work in ClusterControl?

A high availability stack, deployed through ClusterControl, consists of three pieces:

  • database layer
  • proxy layer which can be HAProxy or ProxySQL
  • keepalived layer, which, with use of Virtual IP, ensures high availability of the proxy layer

Proxies rely on read_only variables on the nodes.

As you can see in the screenshot above, only one node in the topology is marked as “writable”. This is the master and this is the only node which will receive writes.

A proxy (in this example, ProxySQL) will monitor this variable and it will reconfigure itself automatically.

On the other side of that equation, ClusterControl takes care of topology changes: failovers and switchovers. It will make the necessary changes to the read_only value to reflect the state of the topology after the change. If a new master is promoted, it will become the only writable node. If a master is elected after the failover, it will have read_only disabled.

On top of the proxy layer, keepalived is deployed. It deploys a VIP and monitors the state of the underlying proxy nodes. The VIP points to one proxy node at a given time. If this node goes down, the virtual IP is redirected to another node, ensuring that the traffic directed to the VIP will reach a healthy proxy node.

To sum it up, an application connects to the database using a virtual IP address. This IP points to one of the proxies. The proxies redirect traffic according to the topology structure. Information about the topology is derived from the read_only state. This variable is managed by ClusterControl and is set based on the topology changes the user requested or ClusterControl performed automatically.

by krzysztof at July 27, 2018 10:44 AM

July 26, 2018

Peter Zaitsev

Tuning InnoDB Primary Keys

The choice of good InnoDB primary keys is a critical performance tuning decision. This post will guide you through the steps of choosing the best primary key depending on your workload.

As a principal architect at Percona, one of my main duties is to tune customer databases. There are many aspects related to performance tuning which make the job complex and very interesting. In this post, I want to discuss one of the most important ones: the choice of good InnoDB primary keys. You would be surprised how many times I have had to explain the importance of primary keys, and how many debates I have had around the topic, as people often have preconceived ideas that translate into doing things a certain way without further thinking.

The choice of a good primary key for an InnoDB table is extremely important and can have huge performance impacts. I have seen a customer go from an overloaded x1.16xlarge RDS instance, with close to 1TB of RAM, to doing very well on an r4.4xlarge instance after putting a new primary key in place: a huge impact. Of course, it is not a silver bullet; you need to have a workload like the ones I’ll highlight in the following sections. Keep in mind that tuning comes with trade-offs, especially with the primary key. What you gain somewhere, you have to pay for, performance-wise, elsewhere. You need to calculate what is best for your workload.

What is special about InnoDB primary keys?

InnoDB is called an index-organized storage engine. An index-organized storage engine uses the B-Tree of the primary key to store the data, the table rows. That means a primary key is mandatory with InnoDB. If there is no primary key for a table, InnoDB adds a hidden auto-incremented 6-byte counter to the table and uses that hidden counter as the primary key. There are some issues with the InnoDB hidden primary key. You should always define explicit primary keys on your tables. In summary, you access all InnoDB rows by the primary key values.

An InnoDB secondary index is also a B-Tree. The search key is made of the index columns and the values stored are the primary keys of matching rows. A search by a secondary index very often results in an implicit search by primary key. You can find more information about InnoDB file format in the documentation. Jeremy Cole’s InnoDB Ruby tools are also a great way to learn about InnoDB internals.

What is a B-Tree?

A B-Tree is a data structure optimized for operations on block devices. Block devices, or disks, have rather high data access latency, especially spinning disks. Retrieving a single byte at a random position doesn’t take much less time than retrieving a bigger piece of data like an 8KB or 16KB object. That’s the fundamental argument for B-Trees. InnoDB uses 16KB pieces of data, called pages.

A simple three level B-Tree

Let’s attempt a simplified description of a B-Tree. A B-Tree is a data structure organized around a key. The key is used to search the data inside the B-Tree. A B-Tree normally has multiple levels. The data is stored only in the bottom-most level, the leaves. The pages of the other levels, the nodes, only contain keys and pointers to pages in the next lower level.

When you want to access a piece of data for a given value of the key, you start from the top node, the root node, compare the keys it contains with the search value, and find the page to access at the next level. The process is repeated until you reach the last level, the leaves. In theory, you need one disk read operation per level of the B-Tree. In practice there is always a memory cache, and the nodes, since they are less numerous and accessed often, are easy to cache.

An ordered insert example

Let’s consider the following sysbench table:

mysql> show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=3000001 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql> show table status like 'sbtest1'\G
*************************** 1. row ***************************
           Name: sbtest1
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 2882954
 Avg_row_length: 234
    Data_length: 675282944
Max_data_length: 0
   Index_length: 47775744
      Data_free: 3145728
 Auto_increment: 3000001
    Create_time: 2018-07-13 18:27:09
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

The primary key B-Tree size is Data_length. There is one secondary key B-Tree, the k_1 index, and its size is given by Index_length. The sysbench table was inserted in order of the primary key since the id column is auto-incremented. When you insert in order of the primary key, InnoDB fills its pages with up to 15KB of data (out of 16KB), even when innodb_fill_factor is set to 100. That allows for some row expansion by updates after the initial insert before a page needs to be split. There are also some headers and footers in the pages. If a page is too full and cannot accommodate an update adding more data, the page is split into two. Similarly, if two neighbor pages are less than 50% full, InnoDB will merge them. Here is, for example, a sysbench table inserted in id order:

mysql> select count(*), TABLE_NAME,INDEX_NAME, avg(NUMBER_RECORDS), avg(DATA_SIZE) from information_schema.INNODB_BUFFER_PAGE
    -> WHERE TABLE_NAME='`sbtest`.`sbtest1`' group by TABLE_NAME,INDEX_NAME order by count(*) desc;
+----------+--------------------+------------+---------------------+----------------+
| count(*) | TABLE_NAME         | INDEX_NAME | avg(NUMBER_RECORDS) | avg(DATA_SIZE) |
+----------+--------------------+------------+---------------------+----------------+
|    13643 | `sbtest`.`sbtest1` | PRIMARY    |             75.0709 |     15035.8929 |
|       44 | `sbtest`.`sbtest1` | k_1        |           1150.3864 |     15182.0227 |
+----------+--------------------+------------+---------------------+----------------+
2 rows in set (0.09 sec)
mysql> select PAGE_NUMBER,NUMBER_RECORDS,DATA_SIZE,INDEX_NAME,TABLE_NAME from information_schema.INNODB_BUFFER_PAGE
    -> WHERE TABLE_NAME='`sbtest`.`sbtest1`' order by PAGE_NUMBER limit 1;
+-------------+----------------+-----------+------------+--------------------+
| PAGE_NUMBER | NUMBER_RECORDS | DATA_SIZE | INDEX_NAME | TABLE_NAME         |
+-------------+----------------+-----------+------------+--------------------+
|           3 |             35 |       455 | PRIMARY    | `sbtest`.`sbtest1` |
+-------------+----------------+-----------+------------+--------------------+
1 row in set (0.04 sec)
mysql> select PAGE_NUMBER,NUMBER_RECORDS,DATA_SIZE,INDEX_NAME,TABLE_NAME from information_schema.INNODB_BUFFER_PAGE
    -> WHERE TABLE_NAME='`sbtest`.`sbtest1`' order by NUMBER_RECORDS desc limit 3;
+-------------+----------------+-----------+------------+--------------------+
| PAGE_NUMBER | NUMBER_RECORDS | DATA_SIZE | INDEX_NAME | TABLE_NAME         |
+-------------+----------------+-----------+------------+--------------------+
|          39 |           1203 |     15639 | PRIMARY    | `sbtest`.`sbtest1` |
|          61 |           1203 |     15639 | PRIMARY    | `sbtest`.`sbtest1` |
|          37 |           1203 |     15639 | PRIMARY    | `sbtest`.`sbtest1` |
+-------------+----------------+-----------+------------+--------------------+
3 rows in set (0.03 sec)

The table doesn’t fit in the buffer pool, but the queries give us good insights. The pages of the primary key B-Tree have on average 75 records and store a bit less than 15KB of data. The index k_1 is inserted in random order by sysbench. Why is the filling factor so good? It’s simply because sysbench creates the index after the rows have been inserted and InnoDB uses a sort file to create it.

You can easily estimate the number of levels in an InnoDB B-Tree. The above table needs about 40k leaf pages (3M/75). Each node page holds about 1200 pointers when the primary key is a four-byte integer. The level above the leaves thus has approximately 35 pages, and on top of the B-Tree sits the root node (PAGE_NUMBER = 3). We have a total of three levels.
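
If persistent statistics are enabled (the default), you can cross-check the leaf page estimate against InnoDB's own numbers. A minimal sketch, assuming the sysbench schema used above:

mysql> select index_name, stat_name, stat_value from mysql.innodb_index_stats
    -> WHERE database_name='sbtest' and table_name='sbtest1' and stat_name in ('n_leaf_pages', 'size');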

A randomly inserted example

If you are a keen observer, you have already realized a direct consequence of inserting in random order of the primary key: the pages are often split, and on average the filling factor is only around 65-75%. You can easily see the filling factor from the information schema. I modified sysbench to insert in random order of id and created a table, also with 3M rows. The resulting table is much larger:

mysql> show table status like 'sbtest1'\G
*************************** 1. row ***************************
           Name: sbtest1
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 3137367
 Avg_row_length: 346
    Data_length: 1088405504
Max_data_length: 0
   Index_length: 47775744
      Data_free: 15728640
 Auto_increment: NULL
    Create_time: 2018-07-19 19:10:36
    Update_time: 2018-07-19 19:09:01
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

While the size of the primary key B-Tree inserted in order of id is 644MB, the size when inserted in random order is about 1GB, 60% larger. Obviously, we have a lower page filling factor:

mysql> select count(*), TABLE_NAME,INDEX_NAME, avg(NUMBER_RECORDS), avg(DATA_SIZE) from information_schema.INNODB_BUFFER_PAGE
    -> WHERE TABLE_NAME='`sbtestrandom`.`sbtest1`'group by TABLE_NAME,INDEX_NAME order by count(*) desc;
+----------+--------------------------+------------+---------------------+----------------+
| count(*) | TABLE_NAME               | INDEX_NAME | avg(NUMBER_RECORDS) | avg(DATA_SIZE) |
+----------+--------------------------+------------+---------------------+----------------+
|     4022 | `sbtestrandom`.`sbtest1` | PRIMARY    |             66.4441 |     10901.5962 |
|     2499 | `sbtestrandom`.`sbtest1` | k_1        |           1201.5702 |     15624.4146 |
+----------+--------------------------+------------+---------------------+----------------+
2 rows in set (0.06 sec)

The primary key pages are now filled with only about 10KB of data (~66%). It is a normal and expected consequence of inserting rows in random order. We’ll see that for some workloads, it is bad. For some others, it is a small price to pay.

A practical analogy

It is always good to have a concrete model or analogy in your mind to better understand what is going on. Let’s assume you have been tasked to write the names and arrival time, on paper, of all the attendees arriving at a large event like Percona Live. So, you sit at a table close to the entry with a good pen and a pile of sheets of paper. As people arrive, you write their names and arrival time, one after the other. When a sheet is full, after about 40 names, you move it aside and start writing to a new one. That’s fast and effective. You handle a sheet only once, and when it is full, you don’t touch it anymore. The analogy is simple: a sheet of paper represents an InnoDB page.

The above use case represents an ordered insert. It is very efficient for the writes. Your only issue is with the organizer of the event: she keeps coming to you asking if “Mr. X” or “Mrs. Y” has arrived. You have to scan through your sheets to find the name. That’s the drawback of ordered inserts: reads can be more expensive. Not all reads are expensive; some can be very cheap. For example: “Who were the first ten people to get in?” is super easy. You’ll want an ordered insert strategy when the critical aspects of the application are the rate and the latency of the inserts. That usually means the reads are not user-facing. They are coming from report batch jobs, and as long as these jobs complete in a reasonable time, you don’t really care.

Now, let’s consider a random insertion analogy. For the next day of the event, tired of the organizer questions, you decide on a new strategy: you’ll write the names grouped by the first letter of the last name. Your goal is to ease the searches by name. So you take 26 sheets, and on top of each one, you write a different letter. As the first visitors arrive, you quickly realize you are now spending a lot more time looking for the right sheet in the stack and putting it back at the right place once you added a name to it.

At the end of the morning, you have worked much more. You also have more sheets than the previous day since for some letters there are few names while for others you needed more than a sheet. Finding names is much easier though. The main drawback of random insertion order is the overhead to manage the database pages when adding entries. The database will read and write from/to disk much more and the dataset size is larger.

Determine your workload type

The first step is to determine what kind of workload you have. When you have an insert-intensive workload, very likely the top queries are inserts on some large tables and the database heavily writes to disk. If you repeatedly execute “show processlist;” in the MySQL client, you see these inserts very often. That’s typical of applications logging a lot of data. There are many data collectors and they all wait to insert data. If they wait for too long, some data may be lost. If you have strict SLAs on the insert time and relaxed ones on the read time, you clearly have an insert-oriented workload and you should insert rows in order of the primary key.
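
As a quick first pass, you can sample a few global counters twice, a minute apart, and compare the deltas. This is only a sketch: a high Com_insert rate with many Innodb_data_writes suggests an insert-intensive workload, while a high Com_select rate combined with many Innodb_buffer_pool_reads (pages read from disk) suggests a read-intensive one.

mysql> show global status where Variable_name in
    -> ('Com_insert', 'Com_select', 'Innodb_data_writes', 'Innodb_buffer_pool_reads');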

You may also have a decent insert rate on large tables but these inserts are queued and executed by batch processes. Nobody is really waiting for these inserts to complete and the server can easily keep up with the number of inserts. What matters for your application is the large number of read queries going to the large tables, not the inserts. You already went through query tuning and even though you have good indexes, the database is reading from disk at a very high rate.

When you look at the MySQL processlist, you repeatedly see the same select query forms hitting the large tables. The only option seems to be adding more memory to lower the disk reads, but the tables are growing fast and you can’t add memory forever. We’ll discuss the read-intensive workload in detail in the next section.

If you can’t figure out whether you have an insert-heavy or read-heavy workload, maybe you just don’t have a big workload. In such a case, the default would be to use ordered inserts, and the best way to achieve this with MySQL is through an auto-increment integer primary key. That’s the default behavior of many ORMs.

A read-intensive workload

I have seen quite a few read-intensive workloads over my consulting years, mostly with online games and social networking applications. On top of that, some games have social networking features like watching the scores of your friends as they progress through the game. Before we go further, we first need to confirm the reads are inefficient. When reads are inefficient, the top select query forms will be accessing a number of distinct InnoDB pages close to the number of rows examined. The Percona Server for MySQL slow log, when the verbosity level includes “InnoDB”, exposes both quantities, and the pt-query-digest tool includes stats on them. Here’s an example output (I’ve removed some lines):

# Query 1: 2.62 QPS, 0.00x concurrency, ID 0x019AC6AF303E539E758259537C5258A2 at byte 19976
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2018-07-19T20:28:02 to 2018-07-19T20:28:23
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         48      55
# Exec time     76    93ms   637us     3ms     2ms     2ms   458us     2ms
# Lock time    100    10ms    72us   297us   182us   247us    47us   176us
# Rows sent    100   1.34k      16      36   25.04   31.70    4.22   24.84
# Rows examine 100   1.34k      16      36   25.04   31.70    4.22   24.84
# Rows affecte   0       0       0       0       0       0       0       0
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin 100   1.36k      18      35   25.31   31.70    3.70   24.84
# EXPLAIN /*!50100 PARTITIONS*/
select * from friends where user_id = 1234\G

The friends table definition is:

CREATE TABLE `friends` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) unsigned NOT NULL,
  `friend_user_id` int(10) unsigned NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `active` tinyint(4) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_user_id_friend` (`user_id`,`friend_user_id`),
  KEY `idx_friend` (`friend_user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=144002 DEFAULT CHARSET=latin1

I built this simple example on my test server. The table easily fits in memory, so there are no disk reads. What matters here is the relation between “page distin” and “Rows examine”. As you can see, the ratio is close to 1. It means that InnoDB rarely gets more than one row per page it accesses. For a given user_id value, the matching rows are scattered all over the primary key B-Tree. We can confirm this by looking at the output of the sample query:

mysql> select * from friends where user_id = 1234 order by id limit 10;
+-------+---------+----------------+---------------------+--------+
| id    | user_id | friend_user_id | created             | active |
+-------+---------+----------------+---------------------+--------+
|   257 |    1234 |             43 | 2018-07-19 20:14:47 |      1 |
|  7400 |    1234 |           1503 | 2018-07-19 20:14:49 |      1 |
| 13361 |    1234 |            814 | 2018-07-19 20:15:46 |      1 |
| 13793 |    1234 |            668 | 2018-07-19 20:15:47 |      1 |
| 14486 |    1234 |           1588 | 2018-07-19 20:15:47 |      1 |
| 30752 |    1234 |           1938 | 2018-07-19 20:16:27 |      1 |
| 31502 |    1234 |            733 | 2018-07-19 20:16:28 |      1 |
| 32987 |    1234 |           1907 | 2018-07-19 20:16:29 |      1 |
| 35867 |    1234 |           1068 | 2018-07-19 20:16:30 |      1 |
| 41471 |    1234 |            751 | 2018-07-19 20:16:32 |      1 |
+-------+---------+----------------+---------------------+--------+
10 rows in set (0.00 sec)

The rows are often thousands of id values apart. Although the rows are small, about 30 bytes, an InnoDB page doesn’t contain more than 500 rows. As the application becomes popular, there are more and more users and the table size grows like the square of the number of users. As soon as the table outgrows the InnoDB buffer pool, MySQL starts to read from disk. Worst case, with nothing cached, we need one read IOP per friend. If the rate of these selects is 300/s and, on average, every user has 100 friends, MySQL needs to access up to 30000 pages per second. Clearly, this doesn’t scale for long.

We need to determine all the ways the table is accessed. For that, I use pt-query-digest and I raise the limit on the number of query forms returned (a sample invocation follows the list below). Let’s assume I found:

  • 93% of the time by user_id
  • 5% of the time by friend_user_id
  • 2% of the time by id
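
A hypothetical invocation raising the limit (the slow log path is an assumption; by default, pt-query-digest reports only the top query forms):

$ pt-query-digest --limit=100% /var/lib/mysql/slow.log > slow_report.txt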

The above proportions are quite common. When there is a dominant access pattern, we can do something. The friends table is a typical example of a many-to-many table. With InnoDB, we should define such tables as:

CREATE TABLE `friends` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) unsigned NOT NULL,
  `friend_user_id` int(10) unsigned NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `active` tinyint(4) NOT NULL DEFAULT '1',
  PRIMARY KEY (`user_id`,`friend_user_id`),
  KEY `idx_friend` (`friend_user_id`),
  KEY `idx_id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=144002 DEFAULT CHARSET=latin1

Now, the rows are ordered, grouped by user_id, inside the primary key B-Tree, but the inserts are in random order. In other words, we slowed down the inserts to the benefit of the select statements on the table. To insert a row, InnoDB potentially needs one disk read to get the page where the new row is going and one disk write to save it back to disk. Remember, in the previous analogy, we needed to take one sheet from the stack, add a name, and put it back in place. We also made the table bigger: the InnoDB pages are not as full, and the secondary indexes are bigger since the primary key is larger. We also added a secondary index. Now we have less data in the InnoDB buffer pool.

Shall we panic because there is less data in the buffer pool? No, because now when InnoDB reads a page from disk, instead of getting only a single matching row, it gets up to hundreds of matching rows. The number of read IOPS is no longer correlated to the number of friends times the rate of select statements. It is now only a factor of the incoming rate of select statements. The impact of not having enough memory to cache the whole table is much reduced. As long as the storage can perform more read IOPS than the rate of select statements, all is fine. With the modified table, the relevant lines of the pt-query-digest output are now:

# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Rows examine 100   1.23k      16      34   23.72   30.19    4.19   22.53
# pages distin 100     111       2       5    2.09    1.96    0.44    1.96

With the new primary key, instead of 30k read IOPS, MySQL needs to perform only about 588 read IOPS (~300*1.96). That is a much easier workload to handle. The inserts are more expensive, but if their rate is 100/s, it just means 100 read IOPS and 100 write IOPS in the worst case.

The above strategy works well when there is a clear access pattern. Off the top of my head, here are a few other examples where there are usually dominant access patterns:

  • Game leaderboards (by user)
  • User preferences (by user)
  • Messaging application (by from or to)
  • User object store (by user)
  • Likes on items (by item)
  • Comments on items (by item)

What can you do when you don’t have a dominant access pattern? One option is the use of a covering index. The covering index needs to cover all the required columns. The order of the columns is also important, as the first must be the grouping value. Another option is to use partitions to create an easy-to-cache hot spot in the dataset. I’ll discuss these strategies in future posts, as this one is long enough!
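
For instance, keeping the original id primary key on the friends table, a covering index for the user_id lookups could look like the following sketch (the index name is hypothetical; since InnoDB appends the primary key columns to every secondary index entry, this index covers the whole sample query):

ALTER TABLE friends ADD KEY idx_user_covering (user_id, friend_user_id, created, active);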

We have seen in this post a common strategy used to address read-intensive workloads. This strategy doesn’t work all the time — you must access the data through a common pattern. But when it works, and you choose good InnoDB primary keys, you are the hero of the day!

The post Tuning InnoDB Primary Keys appeared first on Percona Database Performance Blog.

by Yves Trudeau at July 26, 2018 05:33 PM

Jean-Jerome Schmidt

Watch the Webinar Replay: Disaster Recovery Planning for MySQL & MariaDB with ClusterControl

Everyone should have a disaster recovery plan for MySQL & MariaDB!

Watch the replay of our webinar with Vinay Joosery, CEO at Severalnines, on Disaster Recovery Planning for MySQL & MariaDB with ClusterControl; especially if you’re unsure about RTO and RPO or whether you should have a secondary datacenter, or are concerned about disaster recovery in the cloud …

Organizations need an appropriate disaster recovery plan in order to mitigate the impact of downtime. But how much should a business invest? Designing a highly available system comes at a cost, and not all businesses and certainly not all applications need five 9s availability; though several 9s are always good to have ;-)

In this webinar, Vinay explains key disaster recovery concepts and walks us through the relevant options from the MySQL & MariaDB ecosystem in order to meet different tiers of disaster recovery requirements; and demonstrates how ClusterControl can help users fully automate an appropriate disaster recovery plan.

Watch the replay!

And/or download the white paper of the same name!

Agenda

  • Business Considerations for DR
    • Is 100% uptime possible?
    • Analyzing risk
    • Assessing business impact
  • Defining DR
    • Outage Timeline
    • RTO
    • RPO
    • RTO + RPO = 0 ?
  • DR Tiers
    • No offsite data
    • Database backup with no Hot Site
    • Database backup with Hot Site
    • Asynchronous replication to Hot Site
    • Synchronous replication to Hot Site
  • Implementing DR with ClusterControl
    • Demo
  • Q&A

Speaker

Vinay Joosery, CEO & Co-Founder, Severalnines

Vinay Joosery, CEO, Severalnines, is a passionate advocate and builder of concepts and business around distributed database systems. Prior to co-founding Severalnines, Vinay held the post of Vice-President EMEA at Pentaho Corporation - the Open Source BI leader. He has also held senior management roles at MySQL / Sun Microsystems / Oracle, where he headed the Global MySQL Telecoms Unit, and built the business around MySQL's High Availability and Clustering product lines. Prior to that, Vinay served as Director of Sales & Marketing at Ericsson Alzato, an Ericsson-owned venture focused on large scale real-time databases.

This webinar builds upon a related white paper written by Vinay on disaster recovery, which you can download here: https://severalnines.com/resources/whitepapers/disaster-recovery-planning-mysql-mariadb.

by jj at July 26, 2018 11:25 AM

July 25, 2018

Valeriy Kravchuk

On Some Problematic Oracle MySQL Server Features

In one of my previous posts I stated that in Oracle's MySQL server some old enough features remain half-baked, not well tested, not properly integrated with each other, and not documented properly. It's time to prove this statement.

I should highlight from the very beginning that most of the features I am going to list are not much improved by other vendors either. But they at least have the option of providing other, fully supported storage engines that may overcome the problems in these features, while Oracle's trend of getting rid of most engines other than InnoDB makes MySQL users more seriously affected by any problems related to InnoDB.

The Royal Pavilion in Brighton looks nice from the outside and is based on some great engineering decisions, but the decorations were never completed, some interiors were ruined and never restored, and the building was used for too many different purposes over the years.
The list of problematic MySQL server features includes (but is not limited to) the following:
  • InnoDB's data compression

    Classical InnoDB compression (row_format=compressed) has limited efficiency and has not received much attention from developers recently. Transparent page compression for InnoDB seems to have started as more of a proof of concept that may not work well in production on commodity hardware and filesystems, and it was not integrated with backup tools (see the sketch after this list).
  • Partitioning

    Bugs reported for this feature by the MySQL Community do not get proper attention. DDL against partitioned tables and partition pruning do not work the way DBAs may expect. We still miss parallel processing for partitioned tables (even though a proof of concept for parallel DDL and some kinds of SELECTs was ready and working 10 years ago). The lack of careful testing of partitioning integration with other features is also visible.
  • InnoDB's FULLTEXT indexes
    This feature appeared in MySQL 5.6, but 5 years later there are still all kinds of serious bugs in it, from wrong results to hangs, debug assertions and crashes. There are performance regressions and missing features compared to MyISAM FULLTEXT indexes, and this makes the idea of using InnoDB for everything even more problematic. The current implementation is not designed to work with really large tables and result sets. DBAs should expect problems during routine maintenance activities, like ALTERing tables or dumps and restores, when any table with an InnoDB FULLTEXT index is involved.

  • InnoDB's "online" DDL implementation
    It is not really "online" in too many important practical cases and senses. Replication ignores LOCK=NONE and the slave starts to apply "concurrent" DML only after commit, which may lead to a huge replication lag. The entire table is rebuilt (data are (re-)written) far too often, either in place or by creating a copy. One recent improvement in MySQL 8, "instant ADD COLUMN", was actually contributed by the Community. The size of the "online log" (kept in memory and in a temporary file), created per table altered or index created, depends on the concurrent DML workload and is hard to predict. For most practical purposes, the good old pt-online-schema-change or gh-ost tools work better.

  • InnoDB's persistent optimizer statistics

    Automatic statistics recalculation does not work as expected, and to get proper statistics explicit ANALYZE TABLE calls are still needed. The implementation is complicated and introduced separate implicit transactions (in dirty reads mode) against statistics tables. Bugs in the implementation do not seem to get proper priority and are not fixed.
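
To make the compression distinction above concrete, here is a sketch of the two table variants (the table names are hypothetical, and transparent page compression additionally requires a filesystem with hole-punching support):

CREATE TABLE t_classic (id INT PRIMARY KEY, payload BLOB)
  ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
CREATE TABLE t_transparent (id INT PRIMARY KEY, payload BLOB)
  ENGINE=InnoDB COMPRESSION='zlib';
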
I listed only those features I recently studied in some detail in my previous blog posts. I've included the main problems with each feature according to my older posts. Click on the links in the list above to find the details.

The Royal Pavilion of InnoDB in MySQL is beautiful from the outside (and in places inside), but it is far from complete, and some historical design decisions do not seem to have improved over the years. We are lucky that it is still used and works nicely for many current purposes, but there are too many dark corners and background threads where even Oracle engineers rarely look, and fewer still are improving them...

by Valeriy Kravchuk (noreply@blogger.com) at July 25, 2018 10:22 AM

July 24, 2018

Peter Zaitsev

Webinar Thursday, July 27, 2018: MongoDB Sharding 101 – Geo-Partitioning

Please join Percona’s Senior Support Engineer, Adamo Tonete, as he presents MongoDB Sharding 101 – Geo-Partitioning on Thursday, July 26th, 2018 at 12:30 PM PDT (UTC-7) / 3:30 PM EDT (UTC-4).

In this webinar, we will discuss the common shard keys and demonstrate how to build a worldwide distributed sharded cluster using tags. Tags can be used to choose where to save your data based on location or any other parameter your application uses. With tags, we can guarantee that users from the US will only write their data to the US datacenter, and users from Europe will only write in the European datacenter, for example.

Register for the webinar now.

This is part two of our MongoDB Sharding 101 webinar series. Part one of this series is also available for viewing. You can download the part one slide deck there as well.

Adamo Tonete, Senior Technical Services Engineer

Adamo joined Percona in 2015, after working as a MongoDB/MySQL Database Administrator for three years. As the main database administrator of a startup, he was responsible for suggesting the best architecture and data flows for a worldwide company in a 7/24 environment. Before that, he worked as a Microsoft SQL Server DBA in a large e-commerce company, mainly on performance tuning and automation. Adamo has almost eight years of experience working as a DBA. In the past three years, he has moved to NoSQL technologies without giving up relational databases. He likes to play videogames and to study everything that is related to engines. Adamo lives with his wife in São Paulo, Brazil.

The post Webinar Thursday, July 27, 2018: MongoDB Sharding 101 – Geo-Partitioning appeared first on Percona Database Performance Blog.

by Adamo Tonete at July 24, 2018 09:02 PM

Jean-Jerome Schmidt

Asynchronous Replication Between MySQL Galera Clusters - Failover and Failback

Galera Cluster enforces strong data consistency, where all nodes in the cluster are tightly coupled. Although network segmentation is supported, replication performance is still bound by two factors:

  • Round trip time (RTT) to the farthest node in the cluster from the originator node.
  • The size of a writeset to be transferred and certified for conflict on the receiver node.

While there are ways to boost the performance of Galera, it is not possible to work around these 2 limiting factors.

Luckily, Galera Cluster was built on top of MySQL, which also comes with its built-in replication feature (duh!). Both Galera replication and MySQL replication exist in the same server software independently. We can make use of these technologies to work together, where all replication within a datacenter will be on Galera while inter-datacenter replication will be on standard MySQL Replication. The slave site can act as a hot-standby site, ready to serve data once the applications are redirected to the backup site. We covered this in a previous blog on MySQL architectures for disaster recovery.

In this blog post, we’ll see how straightforward it is to set up replication between two Galera Clusters (PXC 5.7). Then we’ll look at the more challenging part, that is, handling failures at both node and cluster levels. Failover and failback operations are crucial in order to preserve data integrity across the system.

Cluster Deployment

For the sake of our example, we’ll need at least two clusters and two sites - one for the primary and another one for the secondary. It works similarly to traditional MySQL master-slave replication, but on a bigger scale with three nodes in each site. With ClusterControl, you would achieve this by deploying two separate clusters, one on each site. Then, you would configure asynchronous replication between designated nodes from each cluster.

The following diagram illustrates our default architecture:

We have 6 nodes in total, 3 on the primary site and another 3 on the disaster recovery site. To simplify the node representation, we will use the following notations:

  • Primary site: galera1-P, galera2-P, galera3-P (master)
  • Disaster recovery site: galera1-DR, galera2-DR (slave), galera3-DR

Once the Galera Cluster is deployed, simply pick one node on each site to set up the asynchronous replication link. Take note that ALL Galera nodes must be configured with binary logging and log_slave_updates enabled. Enabling GTID is highly recommended, although not compulsory. On all nodes, configure with the following parameters inside my.cnf:

server_id=40 # this number must be different on every node.
binlog_format=ROW
log_bin = /var/lib/mysql-binlog/binlog
log_slave_updates = ON
gtid_mode = ON
enforce_gtid_consistency = true
expire_logs_days = 7

If you are using ClusterControl, from the web interface, pick Nodes -> the chosen Galera node -> Enable Binary Logging. You might then have to change the server-id on the DR site manually to make sure every node is holding a distinct server-id value.

Then on galera3-P, create the replication user:

mysql> GRANT REPLICATION SLAVE ON *.* to slave@'%' IDENTIFIED BY 'slavepassword';

On galera2-DR, point the slave to the current master, galera3-P:

mysql> CHANGE MASTER TO MASTER_HOST = 'galera3-primary', MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword' , MASTER_AUTO_POSITION=1;

Start the replication slave on galera2-DR:

mysql> START SLAVE;

From the ClusterControl dashboard, once the replication is established, you should see that the DR site has a slave and the Primary Site has 3 masters (nodes that produce binary logs):

The deployment is now complete. Applications should send writes to the Primary Site only, as the replication direction goes from Primary Site to DR site. Reads can be sent to both sites, although the DR site might be lagging behind. Assuming that writes only reach the Primary Site, it should not be necessary to set the DR site to read-only (although it can be a good precaution).
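
If you do want that precaution, marking the DR nodes read-only blocks client writes while replication applier threads remain exempt. A minimal sketch, to be run on each DR node:

mysql> SET GLOBAL read_only = ON;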

This setup will make the primary and disaster recovery site independent of each other, loosely connected with asynchronous replication. One of the Galera nodes in the DR site will be a slave, that replicates from one of the Galera nodes (master) in the primary site. Ensure that both sites are producing binary logs with GTID, and that log_slave_updates is enabled - the updates that come from the asynchronous replication stream will be applied to the other nodes in the cluster.

We now have a system where a cluster failure on the primary site will not affect the backup site. Performance-wise, WAN latency will not impact updates on the active cluster. These are shipped asynchronously to the backup site.

As a side note, it’s also possible to have a dedicated slave instance as replication relay, instead of using one of the Galera nodes as slave.

Node Failover Procedure

In case the current master (galera3-P) fails and the remaining nodes in the Primary Site are still up, the slave on the Disaster Recovery site (galera2-DR) should be directed to any available masters, as shown in the following diagram:

With GTID-based replication, this is a piece of cake. Simply run the following on galera2-DR:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = 'galera1-P', MASTER_AUTO_POSITION=1;
mysql> START SLAVE;

Verify the slave status with:

mysql> SHOW SLAVE STATUS\G
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...            
           Retrieved_Gtid_Set: f66a6152-74d6-ee17-62b0-6117d2e643de:2043395-2047231
            Executed_Gtid_Set: d551839f-74d6-ee17-78f8-c14bd8b1e4ed:1-4,
f66a6152-74d6-ee17-62b0-6117d2e643de:1-2047231
...

Ensure the above values are reported correctly. The executed GTID set should contain two GTID sets at this point: one for all transactions executed on the old master and one for the new master.

Cluster Failover Procedure

If the primary cluster goes down, crashes, or simply loses connectivity from the application standpoint, the application can be directed to the DR site instantly. No database failover is necessary to continue the operation. If the application has connected to the DR site and starts to write, it is important to ensure no other writes are happening on the Primary Site once the DR site is activated.

The following diagram shows our architecture after application is failed over to the DR site:

Assuming the Primary Site is still down, at this point, there is no replication between sites until we re-configure one of the nodes in the Primary Site once it comes back up.

For cleanup purposes, the slave process on the DR site has to be stopped. On galera2-DR, stop the replication slave:

mysql> STOP SLAVE;

The failover to the DR site is now considered complete.

Cluster Failback Procedure

To failback to the Primary Site, one of the Galera nodes must become a slave to catch up on changes that happened on the DR site. The procedure would be something like the following:

  1. Pick one node to become a slave in the Primary Site (galera3-P).
  2. Stop all nodes other than the chosen one (galera1-P and galera2-P). Keep the chosen slave up (galera3-P).
  3. Create a backup from the new master (galera2-DR) in the DR site and transfer it over to the chosen slave (galera3-P).
  4. Restore the backup.
  5. Start the replication slave.
  6. Start the remaining Galera node in the Primary Site, with grastate.dat removed.

The below steps can then be performed to fail back the system to its original architecture - Primary is the master and DR is the slave.

1) Shut down all nodes other than the chosen slave:

$ systemctl stop mysql # galera1-P
$ systemctl stop mysql # galera2-P

Or from the ClusterControl interface, simply pick the node from the UI and click "Shutdown Node".

2) Pick a node in the DR site to be the new master (galera2-DR). Create a mysqldump backup with the necessary parameters (for PXC, pxc_strict_mode has to be set to something other than ENFORCING):

$ mysql -uroot -p -e 'set global pxc_strict_mode = "PERMISSIVE"'
$ mysqldump -uroot -p --all-databases --triggers --routines --events > dump.sql
$ mysql -uroot -p -e 'set global pxc_strict_mode = "ENFORCING"'

3) Transfer the backup to the chosen slave, galera3-P via your preferred remote copy tool:

$ scp dump.sql galera3-primary:~

4) In order to perform RESET MASTER on a Galera node, the Galera replication plugin must be turned off. On galera3-P, disable Galera write-set replication temporarily and then restore the dump file in the very same session:

mysql> SET GLOBAL pxc_strict_mode = 'PERMISSIVE';
mysql> SET wsrep_on=OFF;
mysql> RESET MASTER;
mysql> SOURCE /root/dump.sql;

The wsrep_on variable is a session variable. Therefore, we have to perform the restore operation within the same session using the SOURCE statement. Otherwise, restoring with the standard mysql client would require setting wsrep_on=OFF for that session, or commenting out wsrep_provider in my.cnf before MySQL startup.
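
As a sketch of that alternative, the mysql client's --init-command option can set the session variable before the dump is replayed (assuming the dump file location used above; pxc_strict_mode still has to be relaxed globally first):

$ mysql -uroot -p --init-command="SET SESSION wsrep_on=OFF" < /root/dump.sql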

5) Start the replication thread on the chosen slave, galera3-P:

mysql> CHANGE MASTER TO MASTER_HOST = 'galera2-DR', MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G

6) Start the remaining nodes in the cluster (one node at a time), and force an SST by removing grastate.dat beforehand:

$ rm -Rf /var/lib/mysql/grastate.dat
$ systemctl start mysql

Or from ClusterControl, simply pick the node -> Start Node -> check "Perform an Initial Start".

The above will force other Galera nodes to re-sync with galera3-P through SST and get the most up-to-date data. At this point, the replication direction has switched, from DR to Primary. Write operations are coming to the DR site and the Primary Site has become the replicating site:

From the ClusterControl dashboard, you would notice that the Primary Site has a slave configured while the DR site nodes are all masters. In ClusterControl, the MASTER indicator means all Galera nodes are generating binary logs:

7) Optionally, we can clean up the slave's entries on galera2-DR since it's already become a master:

mysql> RESET SLAVE ALL;

8) Once the Primary Site catches up, we may switch the database traffic from the application back to the primary cluster:

At this point, all writes must go to the Primary Site only. The replication link should be stopped as described under the "Cluster Failover Procedure" section above.

The above-mentioned failback steps should be applied when staging the DR site back from the Primary Site:

  • Stop replication between primary site and DR site.
  • Re-slave one of the Galera nodes on the DR site to replicate from the Primary Site.
  • Start replication between both sites.

Once done, the replication direction has gone back to its original configuration, from Primary to DR. Write operations are coming to the Primary Site and the DR Site is now the replicating site:

Finally, perform some clean ups on the newly promoted master by running "RESET SLAVE ALL".

Advantages

Cluster-to-cluster asynchronous replication comes with a number of advantages:

  • Minimal downtime during the database failover operation. Basically, you can redirect the writes almost instantly to the slave site, if and only if you can protect the master site from receiving writes (as such writes would not be replicated, and would probably be overwritten when re-syncing from the DR site).
  • No performance impact on the primary site since it is independent from the backup (DR) site. Replication from master to slave is performed asynchronously. The master site generates binary logs, the slave site replicates the events and applies the events at some later time.
  • Disaster recovery site can be used for other purposes, e.g., database backup, binary logs backup and reporting or heavy analytical queries (OLAP). Both sites can be used simultaneously, with exceptions on the replication lag and read-only operations on the slave side.
  • The DR cluster could potentially run on smaller instances in a public cloud environment, as long as they can keep up with the primary cluster. The instances can be upgraded if needed. In certain scenarios, it can save you some costs.
  • You only need one extra site for Disaster Recovery, as opposed to active-active Galera multi-site replication setup, which requires at least 3 sites to operate correctly.

Disadvantages

There are also drawbacks having this setup:

  • There is a chance of missing some data during failover if the slave was behind, since replication is asynchronous. This could be improved with semi-synchronous and multi-threaded slave replication, albeit there will be another set of challenges waiting (network overhead, replication gap, etc).
  • Despite the failover operation being fairly simple, the failback operation can be tricky and prone to human error. It requires some expertise on switching master/slave role back to the primary site. It's recommended to keep the procedures documented, rehearse the failover/failback operation regularly and use accurate reporting and monitoring tools.
  • There is no built-in failure detection and notification process. You may need to automate and send out notifications to the relevant person once an unwanted event occurs. One good way is to regularly check the slave's status from another available node's point of view on the master's site before raising an alarm for the operations team.
  • Pretty costly, as you have to set up a similar number of nodes on the disaster recovery site. This is not black and white, as the cost justification usually comes from the requirements of your business. With some planning, it is possible to maximize usage of database resources at both sites, regardless of the database roles.

by ashraf at July 24, 2018 09:43 AM

July 23, 2018

Peter Zaitsev

Webinar Weds 7/25: XA Transactions

Please join Percona Senior MySQL DBA for Managed Services, Dov Endress, as he presents XA Transactions on Wednesday, July 25th, 2018 at 12:00 PM PDT (UTC-7) / 3:00 PM EDT (UTC-4).

Distributed transactions (XA) are becoming more and more vital as applications evolve. In this webinar, we will learn what distributed transactions are and how MySQL implements the XA specification. We will learn the investigatory and debugging techniques necessary to ensure high availability and data consistency across disparate environments.

This webinar is not intended to be an in-depth look at transaction managers, but focuses on resource managers only. It is primarily intended for database administrators and site reliability engineers.

Register Now

Dov Endress, Senior MySQL DBA in Managed Services

Dov joined Percona in the fall of 2015 as a senior support engineer. He learned BASIC in elementary school on an Apple IIE, and his first computer was a Commodore 64. Dov started working in the LAMP stack in 1999 and has been doing so ever since. He lives in northern Nevada with his wife, step-daughter, grandson, a clowder of cats and Macy – the best dog a person could meet. In his free time, he can be found somewhere outdoors or making things in the garage.

The post Webinar Weds 7/25: XA Transactions appeared first on Percona Database Performance Blog.

by Dov Endress at July 23, 2018 06:35 PM

July 22, 2018

Valeriy Kravchuk

Problems with Oracle's Way of MySQL Bugs Database Maintenance

In one of my previous posts I stated that Oracle does not care enough to maintain public MySQL bugs database properly. I think it's time to explain this statement in details.

The fact that https://bugs.mysql.com/ still exists, and that community bug reports there are still processed on a regular basis by my former colleagues, Miguel Solorzano, Sinisa Milivojevic, Umesh Shastry, Bogdan Kecman and others, is awesome. Some probably had not expected this to still be the case 8+ years after Oracle took over the software and the procedures around it. My former bugs verification team still seems to exist and even gets some new members. Moreover, today we have fewer "Open" bugs than 6 years ago, when I was preparing to leave Oracle to join (and build) the best MySQL support team in the industry elsewhere...

That's all good and beyond the best expectation of many. Now, what's wrong then with the way Oracle engineers process community bug reports these days?

In the photo above, which I took last autumn, we see the West Pier in Brighton, England. It keeps collapsing after the major damage it suffered in 2002 and 2003 from storms and fires, and this spring I saw even more ruin. The same happens to the MySQL public bugs database - we see it used less than before, and severe damage is done to the Community by some usual and even relatively simple actions and practices. Let me summarize them.
  1. "Security" bugs are handled in such a way that they never becomes public back, even after the problem is fixed in all affected versions and this fix is documented.

    Moreover, often it takes just the bug reporter checking the "security" flag, by mistake or for whatever reason, for nobody else ever to be able to find out what exactly the bug was about. This is done even in cases when all other vendors keep the information public or open it up after the fix is published. Even worse, sometimes when somebody "escalates" some public bug forgotten for a long time (or wrongly handled) to Oracle engineers, in public, the bug immediately becomes private, even though nobody cared about it for months before. I have complained about this many times everywhere.
    I would so much like to publish all the details of mishandling of Bug #91118, for example, but the bug was made private, so you'd have to just trust my words and quotes, while I prefer to provide arguments everyone can verify... Remember the bug number though and this (top) part of the stack trace:
    ...
    /home/openxs/dbs/8.0/bin/mysqld(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x2b76) [0x1dc78a6]
    /home/openxs/dbs/8.0/bin/mysqld(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int)+0x15f) [0x1c98c1f]
    /home/openxs/dbs/8.0/bin/mysqld(handler::ha_index_next_same(unsigned char*, unsigned char const*, unsigned int)+0x1e4) [0xe909a4]
    /home/openxs/dbs/8.0/bin/mysqld() [0xc5636a]
    ...
    The practice of hiding bugs once and forever was advocated by some MySQL engineers well before Oracle's acquisition, but in Oracle it became a rule that only the bravest engineers can sometimes afford NOT to follow.

  2. Some older bug reports are never processed or never revisited back by Oracle engineers.

    Consider this simple example, Bug #70237 - "the mysqladmin shutdown hangs". It was reported almost 5 years ago by a famous Community member, Zhai Weixiang, who even suggested a patch. This bug had not got a single public comment from anyone in Oracle.
    The oldest open server bug today is Bug #44411 - "some Unicode text garbled in LOAD DATA INFILE with user variables". It was reported more than 9 years ago. Moreover, it was once "Verified", but then silently became "Not a bug" and then was reopened by the bug reporter. Nobody cares, even though it's clear that many Oracle engineers should get notification emails whenever any change to a public bug report happens. This is also fundamentally wrong, no matter what happened to the assignee or the engineer who worked on the bug in the past.

    This specific problem with bugs handling is not new, we always had a backlog of bugs to verify and some bugs were re-checked maybe once in many years, but Oracle now has all kinds of resources to fix this problem, and not just reduce the number of open reports by closing them by all fair and not that fair means... See the next item also.

  3. Recently some bug reports have been handled wrongly, with a trend of wasting the bug reporter's time on irrelevant clarifications, or closing the bug too early when there is a problem reproducing it.

    If you report some MySQL problem to Oracle, be ready to see your report closed soon with a suggestion to contact Support. Check this recent example, Bug #90375 - "Significantly improve performance of simple functions". OK, this is a known limitation, but the user suggested several workarounds that are valid feature requests for the optimizer, which could be smart enough to inline a stored function that just returns some simple expression. Why not verify this as a valid feature request?
    Another great example is Bug #80919 - "MySQL Crashes when Droping Indexes - Long semaphore wait". It was closed (after wasting more than a year on waiting) as a "Duplicate" with such a nice comment:
    "[16 Jan 15:53] Sinisa Milivojevic
    Hi!

    This is a duplicate bug, because it is very similar to an internal-only bug, that is not present in the public bugs database.


    I will provide the internal bug number in the hidden comment, but will update this page with public comment, once when that bug is fixed.
    "
    No reference to the internal bug number, just nothing. If you are wondering what is the real problem, check MDEV-14637.
    Yet another case of "Not a bug", where the decision is questionable and further statements from the bug reporter are ignored is Bug #89065 - "sync_binlog=1 on a busy server and slow binary log filesystem stalls slaves".
    The bug may be "Verified" eventually, but only after some time wasted on discussions and clarifications, even when the problem is obvious and the bug report contains everything needed to understand it. A nice example is Bug #91386 - "Index for group-by is not used with primary key for SELECT COUNT(DISTINCT a)". Yet another example is Bug #91010 - "WolfSSL build broken due to cmake typo". Knowing the parties involved in those discussions in person, I wish they had spent their time on something more useful than arguing over obvious problems.

    This practice discourages users from reporting bugs to Oracle. Not that bug handling mistakes never happened before Oracle (I made many myself), but recently I see more and more wrongly handled bugs. This trend is scary!

  4. Oracle mostly fixes bugs reported internally.

    Just check any recent Release Notes and draw your own conclusion. One may say this means internal QA and developers find bugs even before the Community notices them, but the presence of all kinds of test failures, regression bugs etc. in the same recent versions still tells me that Community QA is essential for MySQL quality. Yet it has not set the agenda for the bug fixing process for many years.
    One may also say that the bug fixing agenda is defined mostly by Oracle customers. I'll let Oracle customers comment here on whether they are happy with what they get. Some of them were not so happy with the speed of resolution, even for their real security-related bugs.
I can continue with more items, but let's stop for now.

If we want MySQL public bugs database to never reach the state of the West Pier (declared to be beyond repair in 2004), we should force Oracle to do something to fix the problems above. Otherwise we'll see further decline of bugs database and Community activity switched elsewhere (to Percona's and MariaDB's public bugs databases, or somewhere else). It would be sad to not have any central public location for all problem reports about core MySQL server functionality...

by Valeriy Kravchuk (noreply@blogger.com) at July 22, 2018 06:57 PM

July 20, 2018

MariaDB AB

MariaDB Connector/Node.js First Alpha Now Available

MariaDB is pleased to announce the immediate availability of MariaDB Connector/Node.js alpha version 0.7.0. This is a non-blocking MariaDB client for Node.js, 100 percent JavaScript, compatible with Node.js 6+.

Why a new client? While there are existing clients that work with MariaDB (such as the mysql and mysql2 clients), the MariaDB Node.js Connector offers new functionality, like Insert Streaming and Pipelining, while making no compromises on performance.

Insert Streaming

Using a Readable stream in your application, you can stream INSERT statements to MariaDB through the Connector.

const https = require('https');

https.get('https://someContent', readableStream => {
    //readableStream implements Readable; the driver will stream its data to the database
    connection.query("INSERT INTO myTable VALUE (?)", [readableStream]);
});

Pipelining

With Pipelining, the Connector sends commands without waiting for server results, preserving order. For instance, consider executing two INSERT statements.

The Connector doesn't wait for query results before sending the next INSERT statement. Instead, it sends queries one after the other, avoiding much of the network latency.

Quick Start

The MariaDB Connector is available through the Node.js repositories. You can install it using npm.

$ npm install mariadb

Using ECMAScript 2017:

const mariadb = require('mariadb');
const pool = mariadb.createPool({host: 'mydb.com', user:'myUser', connectionLimit: 5});

async function asyncFunction() {
  let conn;
  try {
	conn = await pool.getConnection();
	const rows = await conn.query("SELECT 1 as val");
	console.log(rows); //[ {val: 1}, meta: ... ]
	const res = await conn.query("INSERT INTO myTable value (?, ?)", [1, "mariadb"]);
	console.log(res); // { affectedRows: 1, insertId: 1, warningStatus: 0 }

  } catch (err) {
	throw err;
  } finally {
	if (conn) return conn.end();
  }
}


Documentation can be found in the MariaDB Knowledge Base and sources are on GitHub.

Benchmarks

Comparing the MariaDB Connector with other Node.js clients:

promise-mysql  : 1,366 ops/sec ±1.42%
mysql2         : 1,469 ops/sec ±1.63%
mariadb        : 1,802 ops/sec ±1.19%

Benchmarks for the MariaDB Node.js Connector are done using the popular benchmark.js package. You can find the source code for our benchmarks in the benchmarks/ folder.

Roadmap

The MariaDB Node.js Connector remains in development. This is an alpha release so we do not recommend using it in production. Below is a list of features being developed for future connector releases.

  • PoolCluster
  • MariaDB ed25519 plugin authentication
  • Query Timeouts
  • Bulk insertion (that is, fast batch)

Download the MariaDB Node.js Connector directly.

MariaDB is pleased to announce the immediate availability of MariaDB Connector/Node.js 0.7.0, which is the first alpha version. See the release notes and changelog for details.

Login or Register to post comments

by diego Dupin at July 20, 2018 01:33 PM

Peter Zaitsev

InnoDB Cluster in a Nutshell Part 3: MySQL Shell

Welcome to the third part of this series. I’m glad you’re still reading, as hopefully this means you find this subject interesting at least. Previously we presented the first two components of MySQL InnoDB Cluster, Group Replication and MySQL Router, and now we will discuss the last component, MySQL Shell.

MySQL Shell

This is the last component in the cluster and I love it. Oracle has created this tool to centralize cluster management, providing a friendly, command-line based user interface.

The tool can be defined as an advanced MySQL shell, which is much more powerful than the well known MySQL client. With the capacity to work with both relational and document (JSON) data, the tool provides an extended capability to interact with the database from a single place.

MySQL Shell is also able to understand different languages:

  • JavaScript (default) which includes several built-in functions to administer the cluster—create, destroy, restart, etc.—in a very easy way.
  • Python, which provides an easy way to write Python code to interact with the database. This is particularly useful for developers who don’t need to have SQL skills or run applications to test code.
  • SQL, to work in classic mode and query the database as we used to do with the old MySQL client.

A very interesting feature provided with MySQL Shell is the ability to establish different connections to different servers/clusters from within the same shell. There is no need to exit to connect to a different server, just issuing the command \connect will make this happen. As DBA, I find this pretty useful when handling multiple clusters/servers.

Some of the features present in this tool:

  • Capacity to use both Classic and X protocols.
  • Online switch mode to change languages (JavaScript, Python and SQL)
  • Auto-completion of commands using tab, a long-awaited feature missing from the classic MySQL client.
  • Colored output formatting that supports different formats like table, tab-separated and JSON.
  • Batch mode that processes batches of commands, plus an interactive mode that prints output as each line is processed (see the sketch below).
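
For instance, batch mode can run a script of commands non-interactively. A minimal sketch (the script name is hypothetical):

$ mysqlsh --file mytasks.js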

Some sample commands

Samples of the new tool and its execution modes:

#switch modes
\sql
\js
\py
#connect to instance
\connect user@host:[port]
#create a cluster (better to handle through variables)
var cluster=dba.createCluster('percona')
#add instances to cluster
cluster.addInstance('root@192.168.70.2:3306')
#check cluster status
cluster.status()
#using another variable
var cluster2=dba.getCluster('percona')
cluster2.status()
#get cluster structure
cluster.describe()
#rejoin instance to cluster - needs to be executed locally to the instance
cluster.rejoinInstance()
#recover from lost quorum
cluster.forceQuorumUsingPartitionOf('root@localhost:3306')
#reboot cluster after complete outage
cluster.rebootClusterFromCompleteOutage()
#destroy cluster
cluster.dissolve({force:true});

Personally, I think this tool is a very good replacement for the classic MySQL client. Sadly, mysql-server installations do not include MySQL shell by default, but it is worth getting used to. I recommend you try it.

Conclusion

We finally reached the end of this series. I hope you have enjoyed this short introduction to what seems to be Oracle’s bid to have a built-in High Availability solution based on InnoDB. It may become a good competitor to Galera-based solutions. Still, there is a long way to go, as the tool was only just released as GA (April 2018). There are a bunch of things that need to be addressed before it becomes consistent enough to be production-ready. In my personal opinion, it is not—yet. Nevertheless, I think it is a great tool that will eventually be a serious player in the HA field as it’s an excellent, flexible and easy to deploy solution.

The post InnoDB Cluster in a Nutshell Part 3: MySQL Shell appeared first on Percona Database Performance Blog.

by Francisco Bordenave at July 20, 2018 12:58 PM

Jean-Jerome Schmidt

6 Common Failure Scenarios for MySQL & MariaDB, and How to Fix Them

In this blog post, we will analyze 6 different failure scenarios in production database systems, ranging from single-server issues to multi-datacenter failover plans. We will walk you through recovery and failover procedures for the respective scenarios. Hopefully, this will give you a good understanding of the risks you might face and the things to consider when designing your infrastructure.

Database schema corrupted

Let's start with a single node installation - a database setup in its simplest form. Easy to implement, at the lowest cost. In this scenario, you run multiple applications on a single server where each of the database schemas belongs to a different application. The approach for recovery of a single schema would depend on several factors.

  • Do I have a backup?
  • How fast can I restore it?
  • What kind of storage engine is in use?
  • Do I have a PITR-compatible (point in time recovery) backup?

Data corruption can be identified by mysqlcheck.

mysqlcheck -uroot -p <DATABASE>

Replace DATABASE with the name of the database, and replace TABLE with the name of the table that you want to check:

mysqlcheck -uroot -p <DATABASE> <TABLE>

mysqlcheck checks the specified database and tables. If a table passes the check, mysqlcheck displays OK for the table. In the example below, we can see that the table salaries requires recovery.

employees.departments                              OK
employees.dept_emp                                 OK
employees.dept_manager                             OK
employees.employees                                OK
employees.salaries
Warning  : Tablespace is missing for table 'employees/salaries'
Error    : Table 'employees.salaries' doesn't exist in engine
status   : Operation failed
employees.titles                                   OK

For a single node installation with no additional DR servers, the primary approach would be to restore data from backup. But this is not the only thing you need to consider. Having multiple database schemas under the same instance causes an issue when you have to bring your server down to restore data. Another question is whether you can afford to roll back all of your databases to the last backup. In most cases, that would not be possible.

There are some exceptions here. It is possible to restore a single table or database from the last backup when point in time recovery is not needed, though such a process is more complicated. If you have a mysqldump, you can extract your database from it. If you run binary backups with xtrabackup or mariabackup and you have file-per-table enabled, then it is also possible.

Here is how to check if the file-per-table option is enabled:

mysql> SHOW VARIABLES LIKE 'innodb_file_per_table';

If it is not, you can enable it with:

mysql> SET GLOBAL innodb_file_per_table=1;

With innodb_file_per_table enabled, InnoDB stores each table in its own tbl_name.ibd file. Unlike the MyISAM storage engine, with its separate tbl_name.MYD (data) and tbl_name.MYI (index) files, InnoDB stores the data and the indexes together in a single .ibd file. To check your storage engine you need to run:

mysql> select table_name,engine from information_schema.tables where table_name='table_name' and table_schema='database_name';

or directly from the console:

[root@master ~]# mysql -u<username> -p -D<database_name> -e "show table status\G"
Enter password: 
*************************** 1. row ***************************
           Name: test1
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 12
 Avg_row_length: 1365
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2018-05-24 17:54:33
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 

To restore tables from xtrabackup, you need to go through an export process. The backup needs to be prepared before it can be restored, and exporting is done in the preparation stage. Once a full backup is created, run the standard prepare procedure with the additional --export flag:

innobackupex --apply-log --export /u01/backup

This will create additional export files which you will use later on in the import phase. To import a table to another server, first create a new table with the same structure as the one that will be imported on that server:

mysql> CREATE TABLE mydatabase.mytable (...) ENGINE=InnoDB;

discard the tablespace:

mysql> ALTER TABLE mydatabase.mytable DISCARD TABLESPACE;

Then copy the mytable.ibd and mytable.exp files to the database’s data directory, and import its tablespace:

mysql> ALTER TABLE mydatabase.mytable IMPORT TABLESPACE;
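The copy step mentioned above could look like the following minimal sketch (the paths are assumptions: backup prepared in /u01/backup, datadir in /var/lib/mysql):

# copy the exported table files into the target database's data directory
cp /u01/backup/mydatabase/mytable.ibd /var/lib/mysql/mydatabase/
cp /u01/backup/mydatabase/mytable.exp /var/lib/mysql/mydatabase/
# make sure the server can read them
chown mysql:mysql /var/lib/mysql/mydatabase/mytable.ibd /var/lib/mysql/mydatabase/mytable.exp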

However, to do this in a more controlled way, the recommendation would be to restore the database backup on another instance/server and copy what is needed back to the main system. To do so, you need to install a new MySQL instance. This could even be done on the same machine, but that requires more effort to configure so that both instances can run side by side; for example, they would require different communication settings.

You can combine both tasks, restore and installation, using ClusterControl.

ClusterControl will walk you through the available backups on-prem or in the cloud, let you choose exact time for a restore or the precise log position, and install a new database instance if needed.

ClusterControl point in time recovery
ClusterControl restore and verify on a standalone host
ClusterControl restore and verify on a standalone host. Installation options.

You can find more information about data recovery in the blog post My MySQL Database is Corrupted... What Do I Do Now?

Database instance corrupted on the dedicated server

Defects in the underlying platform are often the cause of database corruption. Your MySQL instance relies on a number of things to store and retrieve data: the disk subsystem, controllers, communication channels, drivers and firmware. A crash can affect parts of your data, the MySQL binaries, or even backup files that you store on the system. To separate different applications, you can place them on dedicated servers.

Keeping different application schemas on separate systems is a good idea if you can afford it. One may say that this is a waste of resources, but there is a chance that the business impact will be smaller if only one of them goes down. But even then, you need to protect your database from data loss. Storing a backup on the same server is not a bad idea, as long as you have a copy somewhere else. These days, cloud storage is an excellent alternative to tape backup.

ClusterControl enables you to keep a copy of your backup in the cloud. It supports uploading to the top three cloud providers: Amazon AWS, Google Cloud, and Microsoft Azure.

When you have your full backup restored, you may want to restore it to a certain point in time. Point-in-time recovery will bring the server up to a more recent point than the time of the full backup. To do so, you need to have your binary logs enabled. You can check the available binary logs with:

mysql> SHOW BINARY LOGS;

And current log file with:

mysql> SHOW MASTER STATUS;

Then you can capture the incremental data by dumping the binary logs into an SQL file; the missing operations can then be re-executed:

mysqlbinlog --start-position='14231131' --verbose /var/lib/mysql/binlog.000010 /var/lib/mysql/binlog.000011 /var/lib/mysql/binlog.000012 /var/lib/mysql/binlog.000013 /var/lib/mysql/binlog.000015 > binlog.out
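Once binlog.out has been reviewed (and trimmed, if needed), it can be replayed against the restored server; a minimal sketch, assuming the server runs locally:

# re-execute the missing operations captured from the binary logs
mysql -uroot -p < binlog.out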

The same can be done in ClusterControl.

ClusterControl cloud backup

Database slave goes down

Ok, so you have your database running on a dedicated server. You created a sophisticated backup schedule with a combination of full and incremental backups, upload them to the cloud and store the latest backup on local disks for fast recovery. You have different backup retention policies: shorter for backups stored on local disk drives and extended for your cloud backups.

It sounds like you are well prepared for a disaster scenario. But when it comes to the restore time, it may not satisfy your business needs.

You need a quick failover option: a server that is up and running, applying binary logs from the master where the writes happen. Master/slave replication starts a new chapter in the failover scenario. It's a fast method to bring your application back to life if your master goes down.

But there are a few things to consider in the failover scenario. One is to set up a delayed replication slave, so you can react to fat-finger commands that were triggered on the master server. A slave server can lag behind the master by at least a specified amount of time. The default delay is 0 seconds. Use the MASTER_DELAY option of CHANGE MASTER TO to set the delay to N seconds:

CHANGE MASTER TO MASTER_DELAY = N;
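For example, to lag a slave by one hour (the one-hour value is just an assumption; note that CHANGE MASTER TO requires the slave threads to be stopped first):

STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;
START SLAVE;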

The second is to enable automated failover. There are many automated failover solutions on the market. You can set up automatic failover with command line tools like MHA, MRM or mysqlfailover, or with GUI tools like Orchestrator and ClusterControl. When it's set up correctly, it can significantly reduce your outage.

ClusterControl supports automated failover for MySQL, PostgreSQL and MongoDB replications as well as multi-master cluster solutions Galera and NDB.

ClusterControl replication topology view

When a slave node crashes, or the server is severely lagging behind, you may want to rebuild your slave server. The slave rebuild process is similar to restoring from backup.

ClusterControl rebuild slave

Database multi-master server goes down

Now that you have a slave server acting as a DR node, and your failover process is well automated and tested, your DBA life becomes more comfortable. That's true, but there are a few more puzzles to solve. Computing power is not free, and your business team may ask you to better utilize your hardware: you may want to use your slave server not only as a passive server, but also to serve write operations.

You may then want to investigate a multi-master replication solution. Galera Cluster has become a mainstream option for high availability MySQL and MariaDB. And though it is now known as a credible replacement for traditional MySQL master-slave architectures, it is not a drop-in replacement.

Galera Cluster has a shared-nothing architecture. Instead of shared disks, Galera uses certification-based replication with group communication and transaction ordering to achieve synchronous replication. A database cluster should be able to survive the loss of a node, although it's achieved in different ways. In the case of Galera, the critical aspect is the number of nodes. Galera requires a quorum to stay operational. A three-node cluster can survive the crash of one node. With more nodes in your cluster, you can survive more failures.

The recovery process is automated, so you don’t need to perform any failover operations. However, good practice is to kill nodes and see how fast you can bring them back. In order to make this operation more efficient, you can modify the Galera cache size. If the Galera cache size is not properly planned, the next booting node will have to take a full state transfer instead of fetching only the missing write-sets from the cache.

The failover scenario is as simple as starting the instance. Based on the data in the Galera cache, the booting node will perform SST (restore from a full backup) or IST (apply the missing write-sets). However, this is often linked to human intervention. If you want to automate the entire failover process, you can use ClusterControl’s autorecovery functionality (node and cluster level).

ClusterControl cluster autorecovery

Estimate the Galera cache size:

SET @start := (SELECT SUM(VARIABLE_VALUE/1024/1024)
    FROM information_schema.global_status
    WHERE VARIABLE_NAME LIKE 'WSREP%bytes');
do sleep(60);
SET @end := (SELECT SUM(VARIABLE_VALUE/1024/1024)
    FROM information_schema.global_status
    WHERE VARIABLE_NAME LIKE 'WSREP%bytes');
SET @gcache := (SELECT SUBSTRING_INDEX(
    SUBSTRING_INDEX(@@GLOBAL.wsrep_provider_options, 'gcache.size = ', -1), 'M', 1));
SELECT ROUND((@end - @start), 2) AS `MB/min`,
       ROUND((@end - @start), 2) * 60 AS `MB/hour`,
       @gcache AS `gcache Size(MB)`,
       ROUND(@gcache / ROUND((@end - @start), 2), 2) AS `Time to full(minutes)`;

To make failover more consistent, you should enable gcache.recover=yes in my.cnf. This option will revive the Galera cache on restart, meaning the node can act as a DONOR and serve missing write-sets (facilitating IST instead of SST).
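In my.cnf this could look like the following sketch (the 2G sizing is an assumption; derive yours from the estimate query above):

[mysqld]
# keep enough write-sets for IST and revive the cache on restart
wsrep_provider_options="gcache.size=2G;gcache.recover=yes"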

2018-07-20  8:59:44 139656049956608 [Note] WSREP: Quorum results:
    version    = 4,
    component  = PRIMARY,
    conf_id    = 2,
    members    = 2/3 (joined/total),
    act_id     = 12810,
    last_appl. = 0,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 49eca8f8-0e3a-11e8-be4a-e7e3fe48cb69
2018-07-20  8:59:44 139656049956608 [Note] WSREP: Flow-control interval: [28, 28]
2018-07-20  8:59:44 139656049956608 [Note] WSREP: Trying to continue unpaused monitor
2018-07-20  8:59:44 139657311033088 [Note] WSREP: New cluster view: global state: 49eca8f8-0e3a-11e8-be4a-e7e3fe48cb69:12810, view# 3: Primary, number of nodes: 3, my index: 1, protocol version 3

ProxySQL node goes down

If you have a virtual IP set up, all you have to do is point your application to the virtual IP address and everything should be correct connection-wise. It’s not enough to have your database instances spanning multiple datacenters; you still need your applications to access them. Assuming you have scaled out the number of read replicas, you might want to implement virtual IPs for each of those read replicas as well, for maintenance or availability reasons. This can become a cumbersome pool of virtual IPs that you have to manage. If one of those read replicas faces a crash, you need to re-assign the virtual IP to a different host; otherwise your application will connect either to a host that is down or, in the worst case, to a lagging server with stale data.

ClusterControl HA load balancers topology view

Crashes are not frequent, but more probable than entire servers going down. If, for whatever reason, a slave goes down, something like ProxySQL will redirect all of the traffic to the master, with the risk of overloading it. When the slave recovers, traffic will be redirected back to it. Usually, such downtime shouldn’t take more than a couple of minutes, so the overall severity is medium, even though the probability is also medium.

To have your load balancer components redundant, you can use keepalived.
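A minimal keepalived.conf sketch for floating a virtual IP over two ProxySQL nodes (the interface name, VIP and priorities are assumptions):

vrrp_instance VI_PROXYSQL {
    state MASTER
    interface eth0              # NIC carrying the VIP
    virtual_router_id 51
    priority 101                # set a lower value on the backup node
    advert_int 1
    virtual_ipaddress {
        192.168.10.100          # the VIP your application connects to
    }
}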

ClusterControl: Deploy keepalived for ProxySQL load balancer

Datacenter goes down

The main problem with replication is that there is no majority mechanism to detect a datacenter failure and promote a new master. One of the solutions is to use Orchestrator/Raft. Orchestrator is a topology supervisor that can control failovers. When used along with Raft, Orchestrator becomes quorum-aware. One of the Orchestrator instances is elected as leader and executes recovery tasks. The connection between Orchestrator nodes does not correlate with transactional database commits and is sparse.

Orchestrator/Raft can use extra instances which perform monitoring of the topology. In the case of network partitioning, the partitioned Orchestrator instances won’t take any action. The part of the Orchestrator cluster which has the quorum will elect a new master and make the necessary topology changes.
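A minimal orchestrator.conf.json fragment for a three-node Raft setup might look like this (the IP addresses and data directory are assumptions):

{
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator",
  "RaftBind": "10.0.0.1",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
}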

ClusterControl is used for management, scaling and, most importantly, node recovery: Orchestrator would handle failovers, but if a slave crashed, ClusterControl would make sure it is recovered. Orchestrator and ClusterControl would be located in the same availability zone, separated from the MySQL nodes, to make sure their activity won’t be affected by network splits between availability zones in the datacenter.

by Bart Oles at July 20, 2018 12:32 PM

July 19, 2018

MariaDB AB

Data Streaming with MariaDB


Big Data vs Fast Data

While Big Data is used across the globe by companies to solve their analytical problems, it sometimes becomes a hassle to extract data from a bunch of data sources, do the necessary transformations and then eventually load it into an analytical platform such as Hadoop or something else.

This obviously takes time and effort. Instead of a bunch of ETL jobs, MariaDB provides a data streaming solution directly from OLTP MariaDB TX to OLAP MariaDB AX.

Fast real-time data streaming is achieved with the help of a CDC (Change Data Capture) framework that streams data from MariaDB to MariaDB ColumnStore using MariaDB MaxScale 2.2.x as a bin-log router. MaxScale makes use of the binary logs from a MariaDB TX server and streams the data directly to MariaDB AX (MariaDB ColumnStore) for analytics.

This is achieved with the MaxScale CDC Connector, the CDC Adapter, and the ColumnStore API package. This sounds a bit complex but in reality, it's quite simple.

Here is a quick look at how the data streaming setup works:

MariaDB-server → MaxScale → MaxScale-CDC-Connector → MaxScale-CDC-Adapter → ColumnStore-API → ColumnStore Database


The Setup

Requirements

Here is a quick look at the setup we are going to be working on. I am using Oracle VirtualBox VMs with CentOS 7:

  • 1x CentOS 7 VM for MariaDB TX 3.0

  • 1x CentOS 7 VM for MaxScale as a Replication Router

  • 1x CentOS 7 VM for CDC Adapter + Connector and ColumnStore API

  • 1x CentOS 7 VM for ColumnStore. We will be using a single node "combined" ColumnStore setup for simplicity; refer to the MariaDB ColumnStore Installation Guide page for detailed distributed setup instructions using CentOS 7 VMs.

  • Our setup will look like this:

    • MariaDB (192.168.56.101)
    • MaxScale (192.168.56.102)
    • CDC Server (192.168.56.103)
    • ColumnStore (192.168.56.104)

Note: We will assume that you already have a MariaDB server (the source of our data streaming) and ColumnStore (the data's final destination) readily available for use.

Preparing the VM OS

There are a few important things that are required before we start the installations.

Note: the following steps must be performed and validated on all the VMs

Disable SELinux

For the purposes of testing, we want SELinux disabled. Make sure that your SELinux configuration, in the file /etc/selinux/config, looks something like this on all the nodes:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

The change here is the SELinux setting of course.

After a reboot of the server, check if SELinux has actually been disabled; use either of the two commands (sestatus/getenforce) to confirm:

[root@localhost ~] sestatus
SELinux status:                 disabled

[root@localhost ~] getenforce
Disabled
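If you want to drop SELinux enforcement without waiting for the reboot, you can switch it to permissive mode at runtime (note this yields Permissive, not Disabled; the config change above still takes effect on the next reboot):

[root@localhost ~] setenforce 0
[root@localhost ~] getenforce
Permissive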

Disable firewalld

firewalld is a standard service that is disabled using the systemctl command on RHEL 7/CentOS 7. After disabling it, double-check using systemctl status firewalld on all the nodes:

[root@localhost ~] systemctl stop firewalld

[root@localhost ~] systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

[root@localhost ~] systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Set up the MariaDB account

The mysql user and group need to be created on all the nodes. sudo privilege for the mysql user is also mandatory.

Following this, all steps will be done using the mysql user with the help of sudo unless specified otherwise.

[root@localhost ~] groupadd mysql
[root@localhost ~] useradd -g mysql mysql

[root@localhost ~] echo "mysql ALL=(ALL) ALL" >> /etc/sudoers

[root@localhost ~] passwd mysql
Changing password for user mysql.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
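A slightly safer alternative to appending to /etc/sudoers is a drop-in file (the filename is an assumption):

[root@localhost ~] echo "mysql ALL=(ALL) ALL" > /etc/sudoers.d/mysql
[root@localhost ~] chmod 440 /etc/sudoers.d/mysql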

Enable Networking on the VirtualBox VMs

This is important so that the VMs can use static IP addresses and also access the Internet through the host operating system.

Note: Make sure that NAT and Host-Only Adapters are both enabled for all the VMs.

Network setting within the VMs will need to be modified as follows for NAT:

[root@localhost ~] cat /etc/sysconfig/network-scripts/ifcfg-enp0s3
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=enp0s3
DNS1=8.8.8.8
DNS2=8.8.4.4
UUID=89bfb7a3-17c3-49f9-aaa6-4d66c47ea6fb
DEVICE=enp0s3
ONBOOT=yes
PEERDNS=yes
PEERROUTES=yes

For Host-Only Adapter edit the IP address accordingly for each VM:

[root@localhost ~] cat /etc/sysconfig/network-scripts/ifcfg-enp0s8
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=enp0s8
PREFIX=24
IPADDR=192.168.56.101
UUID=21a845ad-8cae-4a2c-b563-164a1f8a30cf
DEVICE=enp0s8
ONBOOT=yes

Note: Make sure the above two files (ifcfg-enp0s3, ifcfg-enp0s8) are changed on each node/VM accordingly, and that each has its own distinct static IP defined in the file (ifcfg-enp0s8).

Set hostnames on each VM for a better setup; this step is not mandatory but is good practice.

MariaDB VM:

[root@localhost ~] hostnamectl set-hostname mariadb101

MaxScale VM:

[root@localhost ~] hostnamectl set-hostname maxscale102

CDC VM:

[root@localhost ~] hostnamectl set-hostname cdc103
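Optionally, add the hostnames to /etc/hosts on every VM so the nodes can reach each other by name (the columnstore104 name is an assumption, as we never set a hostname on that node):

[root@localhost ~] cat >> /etc/hosts <<EOF
192.168.56.101 mariadb101
192.168.56.102 maxscale102
192.168.56.103 cdc103
192.168.56.104 columnstore104
EOF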

Install MariaDB TX 3.0

Setup the Repository

Login to the MariaDB Server (192.168.56.101)

Note: Not all Linux distros are supported by the repository setup script

  • Once the repository has been set up, simply use "yum install":
    • yum -y install MariaDB-server
    • Take care with Unix case sensitivity of the package names.

We will be using the curl script to set up the MariaDB repositories. On production environments, internet access is normally not available; in that case, we can download the RPMs externally and transfer the files to the servers using your favorite secure file transfer tool.

Installation

We will use the mysql user to install MariaDB on the MariaDB server.

[root@mariadb101 ~] su - mysql
[mysql@mariadb101 ~]$ sudo curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
[info] Repository file successfully written to /etc/yum.repos.d/mariadb.repo.
[info] Adding trusted package signing keys...
[info] Succeessfully added trusted package signing keys.

[mysql@mariadb101 ~]$ sudo yum install MariaDB-server
mariadb-main                                                             | 2.9 kB  00:00:00
mariadb-maxscale                                                         | 2.4 kB  00:00:00
mariadb-tools                                                            | 2.9 kB  00:00:00
(1/3): mariadb-maxscale/7/x86_64/primary_db                              | 6.3 kB  00:00:01
(2/3): mariadb-tools/7/x86_64/primary_db                                 | 9.6 kB  00:00:01
(3/3): mariadb-main/7/x86_64/primary_db                                  |  48 kB  00:00:01
Resolving Dependencies

================================================================================================
 Package                        Arch          Version                      Repository      Size
================================================================================================
Installing:
 MariaDB-compat                 x86_64        10.3.7-1.el7.centos          mariadb-main   2.8 M
     replacing  mariadb-libs.x86_64 1:5.5.56-2.el7
 MariaDB-server                 x86_64        10.3.7-1.el7.centos          mariadb-main   123 M
Installing for dependencies:
 MariaDB-client                 x86_64        10.3.7-1.el7.centos          mariadb-main    53 M
 MariaDB-common                 x86_64        10.3.7-1.el7.centos          mariadb-main   157 k
 galera                         x86_64        25.3.23-1.rhel7.el7.centos   mariadb-main   8.0 M

Transaction Summary
================================================================================================
Install  2 Packages (+40 Dependent packages)

Total download size: 200 M
Is this ok [y/d/N]: Y
...
...
Complete!

Enter "Y" when prompted, it will download and install MariaDB server and Client along with all its dependencies.

Install MariaDB MaxScale 2.2.x

Setup the Repository

Login to the MaxScale server (192.168.56.102) and run "curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash"

Just like with MariaDB, we will set up the MariaDB repositories and install MaxScale using yum.

[root@maxscale102 ~] curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
[info] Repository file successfully written to /etc/yum.repos.d/mariadb.repo.
[info] Adding trusted package signing keys...
[info] Succeessfully added trusted package signing keys.

[root@maxscale102 ~] su - mysql
Last login: Fri Jun 22 14:39:53 EDT 2018 on pts/0

[mysql@maxscale102 ~]$ sudo yum install maxscale
mariadb-main                                                             | 2.9 kB  00:00:00
mariadb-maxscale                                                         | 2.4 kB  00:00:00
mariadb-tools                                                            | 2.9 kB  00:00:00
(1/3): mariadb-maxscale/7/x86_64/primary_db                              | 6.3 kB  00:00:01
(2/3): mariadb-tools/7/x86_64/primary_db                                 | 9.6 kB  00:00:01
(3/3): mariadb-main/7/x86_64/primary_db                                  |  48 kB  00:00:02
Resolving Dependencies
Dependencies Resolved

================================================================================================
 Package               Arch                Version                Repository               Size
================================================================================================
Installing:
 maxscale              x86_64              2.2.9-1                mariadb-maxscale         18 M
Installing for dependencies:
 gnutls                x86_64              3.3.26-9.el7           base                    677 k
 nettle                x86_64              2.7.1-8.el7            base                    327 k
 trousers              x86_64              0.3.14-2.el7           base                    289 k

Transaction Summary
================================================================================================
Install  1 Package (+3 Dependent packages)

Total download size: 19 M
Installed size: 70 M
Is this ok [y/d/N]: Y
...
...
Complete!

[mysql@maxscale102 ~]$ sudo systemctl enable maxscale
Created symlink from /etc/systemd/system/multi-user.target.wants/maxscale.service to /usr/lib/systemd/system/maxscale.service.
[mysql@maxscale102 ~]$ systemctl status maxscale
● maxscale.service - MariaDB MaxScale Database Proxy
   Loaded: loaded (/usr/lib/systemd/system/maxscale.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

Install CDC Server

Setup the Repository

Login to the CDC Server (192.168.56.103). We will need to set up two extra repositories that are currently not automatically added by the curl script; the CDC Adapter, however, will automatically be taken care of by it.

Install the API and the Adapter directly from their download URLs (shown in the commands below); the connector will be downloaded automatically based on the repository setup.

  • Sequence of installation:
    • install maxscale-cdc-connector
    • install epel-release
    • install libuv
    • install columnstore-api
    • install cdc-adapters

Install MaxScale CDC Connector

[root@cdc103 ~] yum install maxscale-cdc-connector
Resolving Dependencies
Dependencies Resolved

================================================================================================
 Package                          Arch            Version        Repository                Size
================================================================================================
Installing:
 maxscale-cdc-connector           x86_64          2.2.9-1        mariadb-maxscale         214 k

Transaction Summary
================================================================================================
Install  1 Package

Total download size: 214 k
Installed size: 1.4 M
Is this ok [y/d/N]: y
...
...
Complete!

Install Extra Packages for Enterprise Linux epel

[root@cdc103 ~] yum install epel-release
Resolving Dependencies
Dependencies Resolved

============================================================================================== 
Package                   Arch                Version              Repository           Size 
==============================================================================================
Installing:
 epel-release              noarch              7-11                 extras               15 k

Transaction Summary
==============================================================================================
Install  1 Package

Total download size: 15 k
Installed size: 24 k
Is this ok [y/d/N]: y
...
...
Complete!

[root@cdc103 ~] yum install libuv
Resolving Dependencies
Dependencies Resolved

============================================================================================== 
Package               Arch                Version                    Repository         Size 
==============================================================================================
Installing:
 libuv                 x86_64              1:1.19.2-1.el7             epel              121 k

Transaction Summary
==============================================================================================
Install  1 Package

Total download size: 121 k
Installed size: 308 k
Is this ok [y/d/N]: y
Downloading packages:
...
...
Complete!

Install ColumnStore API Package

You can install the API directly from the source or download the RPM using wget and then install it locally.

[root@cdc103 ~] yum install https://downloads.mariadb.com/MariaDB/mariadb-columnstore-api/latest/yum/centos/7/x86_64/mariadb-columnstore-api-1.1.5-1-x86_64-centos7.rpm
Resolving Dependencies
Dependencies Resolved

=============================================================================================
 Package                 Arch   Version   Repository                                    Size
=============================================================================================
Installing:
 mariadb-columnstore-api x86_64 1.1.5-1   /mariadb-columnstore-api-1.1.5-1-x86_64      5.4 M

Transaction Summary
=============================================================================================
Install  1 Package
...
...
Complete!

Install CDC Adapter

You can install the CDC Adapter directly from the source or download the RPM using wget and then install it locally.

[root@cdc103 ~] yum install https://downloads.mariadb.com/MariaDB/data-adapters/mariadb-streaming-data-adapters/latest/yum/centos/7/x86_64/mariadb-columnstore-maxscale-cdc-adapters-1.1.5-1-x86_64-centos7.rpm
Resolving Dependencies
Dependencies Resolved

================================================================================================
 Package                                 Arch   Version Repository                         Size
================================================================================================
Installing:
 mariadb-columnstore-data-adapters       x86_64 1.1.5-1 /mariadb-columnstore-maxscale
                                                                                           77 k
Installing for dependencies:
 maxscale-cdc-connector                  x86_64 2.2.9-1 mariadb-maxscale                  214 k

Transaction Summary
================================================================================================
Install  1 Package (+1 Dependent package)

Total size: 291 k
Total download size: 214 k
Installed size: 1.5 M
Is this ok [y/d/N]: y
...
...
Complete!

Setup Communication Between CDC and ColumnStore

Now that the CDC Adapter, API, and Connectors have been installed, we can start to set up communication between the CDC and ColumnStore servers.

In this exercise we are using a single-instance ColumnStore; in the case of a distributed install, we would be working with the ColumnStore UM1 node.

ColumnStore Configuration

Connect to the CDC server using an ssh client and pull /home/mysql/mariadb/columnstore/etc/Columnstore.xml from the ColumnStore server using scp, rsync or any other method available for file transfer between the servers. Columnstore.xml should be downloaded to /etc and owned by the root user.

The file can be downloaded from any ColumnStore node, as Columnstore.xml is automatically synchronized between all the nodes. Use mysql as the remote user to connect to ColumnStore.

As this is the first time connecting to ColumnStore from the CDC server, Linux will ask for a yes/no confirmation and for the mysql user's password.

Once the file is downloaded, ensure that it has 644 permissions so that everyone can read it.

[root@cdc103 ~] scp mysql@192.168.56.104:/home/mysql/mariadb/columnstore/etc/Columnstore.xml /etc
The authenticity of host '192.168.56.104 (192.168.56.104)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.56.104' (ECDSA) to the list of known hosts.
mysql@192.168.56.104's password:
Columnstore.xml                                                         100%   21KB  13.3MB/s   00:00
[root@cdc103 ~]
[root@cdc103 ~] ls -rlt /etc/Columnstore.xml
-rw-r--r-- 1 root root 21089 Jul  1 05:09 /etc/Columnstore.xml
[root@cdc103 ~]

Search and replace 127.0.0.1 with ColumnStore's IP address in the newly downloaded Columnstore.xml file, using the root user since the file is under the /etc/ folder.

[root@cdc103 etc] sed -i 's/127.0.0.1/192.168.56.104/g' /etc/Columnstore.xml
[root@cdc103 etc]

Setup MariaDB as the Replication Master

Now that the CDC node is ready, it's time to set up the MariaDB server as the replication master for MaxScale.

MaxScale will be working as a Replication Slave for this data streaming setup.

Login to the MariaDB master server as the root user, add the following contents to the /etc/my.cnf.d/server.cnf file under the [mysqld] section, and restart the MariaDB process.

[mysqld]
server_id=1
log-bin          = mariadb-bin
binlog-format    = ROW
gtid_strict_mode = 1
log_error
log-slave-updates
[root@mariadb101 ~] systemctl restart mariadb

The server_id will be used when configuring MaxScale as a replication slave node.
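As a quick sanity check after the restart, confirm that binary logging is on and note the current log file and position:

[root@mariadb101 ~] mysql -uroot -p -e "SHOW VARIABLES LIKE 'log_bin'; SHOW MASTER STATUS;"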

Setup Replication User

After MariaDB restarts, login to MariaDB as the root user, create the replication user maxuser and grant it the REPLICATION SLAVE privilege.

The second step is to create a new database and a test table with some data that we are going to use for testing data streaming to ColumnStore.

[mysql@mariadb101 ~] mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 10.3.7-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> RESET MASTER;
Query OK, 0 rows affected (0.004 sec)

MariaDB [(none)]> CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxpwd';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'maxuser'@'%';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> CREATE DATABASE cdc_test;
Query OK, 1 row affected (0.000 sec)

MariaDB [(none)]> USE cdc_test;
Database changed

MariaDB [cdc_test]> CREATE TABLE cdc_tab(id serial, col varchar(100));
Query OK, 0 rows affected (0.010 sec)

MariaDB [cdc_test]> INSERT INTO cdc_tab(col) values ('Row 1'), ('Row 2'), ('Row 3');
Query OK, 3 rows affected (0.004 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [cdc_test]> UPDATE cdc_tab SET col = 'Updated Row 2' WHERE id = 2;
Query OK, 1 row affected (0.004 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [cdc_test]> INSERT INTO cdc_tab(col) values ('Row 4'), ('Row 5'), ('Row 6');
Query OK, 3 rows affected (0.004 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [cdc_test]> DELETE FROM cdc_tab where id=5;
Query OK, 1 row affected (0.005 sec)

MariaDB [cdc_test]>

Events that are captured in the binary logs of the master MariaDB Server:

  • INSERT 3 Records to cdc_tab.
  • UPDATE col for a row in cdc_tab.
  • INSERT 3 Records to cdc_tab.
  • DELETE a row from cdc_tab.

Setup MaxScale

Log on to the MaxScale server and edit the /etc/maxscale.cnf file: remove everything below the [maxscale] threads=auto section and add the following configuration. This will set up MaxScale for data streaming to ColumnStore using the Avro listener.

Take note of the server_id and master_id in the [replication-router] section: master_id points to the master MariaDB server's ID that we defined earlier while setting up MariaDB as a master, while server_id is MaxScale's own ID as a replication slave.

#Replication configuration that points to a particular server_id and binlog
[replication-router]
type=service
router=binlogrouter
user=maxuser
passwd=maxpwd
server_id=2
master_id=1
binlogdir=/var/lib/maxscale
mariadb10-compatibility=1
filestem=mariadb-bin

#Replication listener that will listen to the Master DB using the port #6603
[replication-listener]
type=listener
service=replication-router
protocol=MySQLClient
port=6603

#Avro service that will generate Avro/JSON files from the bin-logs received from the master DB via the replication-router, and store them in avrodir
[avro-router]
type=service
router=avrorouter
source=replication-router
avrodir=/var/lib/maxscale

#Avro listener that is used by the Avro router on a specific port to be used by CDC
[avro-listener]
type=listener
service=avro-router
protocol=cdc
port=4001
A quick explanation of each section:

[replication-router]
  • This section defines a bin-log router from the master MariaDB to the MaxScale slave, using the replication user maxuser which was created earlier.
[replication-listener]
  • This is the mysql client listener service. Any server with a MariaDB client can connect to MaxScale on the port 6603 specified in this section. We will connect to MaxScale later to set it up as a REPLICATION SLAVE using this port.
[avro-router]
  • This is the router service that routes the bin-log data into Avro (JSON) files. CDC will use these Avro files to generate bulk loading scripts for the ColumnStore database.
[avro-listener]
  • The Avro listener uses the avro-router service; this listener is used by the CDC Adapter to get the Avro files and stream them to ColumnStore.

Once /etc/maxscale.cnf has been modified on the MaxScale server, restart MaxScale service.

[root@maxscale102 ~] systemctl restart maxscale

Add CDC User / Password for avro-router service in MaxScale

[mysql@maxscale102 ~]$ maxctrl call command cdc add_user avro-router cdcuser cdcpassword

This user will be used by the MaxScale CDC Adapter to connect to the avro-router service and fetch the Avro data to be loaded into ColumnStore.

Setup MaxScale as a REPLICATION SLAVE

Install the MariaDB client on the MaxScale server. We will use this client to connect to the MaxScale MariaDB listener and set up MaxScale as a replication SLAVE for MariaDB TX.

[mysql@maxscale102 ~]$ sudo yum -y install MariaDB-client
Resolving Dependencies
Dependencies Resolved

==============================================================================================
Package                         Arch            Version                Repository       Size 
==============================================================================================
Installing:
 MariaDB-client                  x86_64          10.3.7-1.el7.centos    mariadb-main     53 M  
 MariaDB-compat                  x86_64          10.3.7-1.el7.centos    mariadb-main    2.8 M
     replacing  mariadb-libs.x86_64 1:5.5.56-2.el7
Installing for dependencies:
 MariaDB-common                  x86_64          10.3.7-1.el7.centos    mariadb-main    157 k 
Running transaction
  Installing : MariaDB-common-10.3.7-1.el7.centos.x86_64                                 1/31
  Installing : MariaDB-compat-10.3.7-1.el7.centos.x86_64                                 2/31
==============================================================================================

Installed:
  MariaDB-client.x86_64 0:10.3.7-1.el7.centos         MariaDB-compat.x86_64 0:10.3.7-1.el7.centos

Dependency Installed:
  MariaDB-common.x86_64 0:10.3.7-1.el7.centos         perl.x86_64 4:5.16.3-292.el7
  perl-Carp.noarch 0:1.26-244.el7                     perl-Encode.x86_64 0:2.51-7.el7
  perl-Exporter.noarch 0:5.68-3.el7                   perl-File-Path.noarch 0:2.09-2.el7
  perl-File-Temp.noarch 0:0.23.01-3.el7               perl-Filter.x86_64 0:1.49-3.el7
  perl-Getopt-Long.noarch 0:2.40-3.el7                perl-HTTP-Tiny.noarch 0:0.033-3.el7
  perl-PathTools.x86_64 0:3.40-5.el7                  perl-Pod-Escapes.noarch 1:1.04-292.el7
  perl-Pod-Perldoc.noarch 0:3.20-4.el7                perl-Pod-Simple.noarch 1:3.28-4.el7
  perl-Pod-Usage.noarch 0:1.63-3.el7                  perl-Scalar-List-Utils.x86_64 0:1.27-248.el7
  perl-Socket.x86_64 0:2.010-4.el7                    perl-Storable.x86_64 0:2.45-3.el7
  perl-Text-ParseWords.noarch 0:3.29-4.el7            perl-Time-HiRes.x86_64 4:1.9725-3.el7
  perl-Time-Local.noarch 0:1.2300-2.el7               perl-constant.noarch 0:1.27-2.el7
  perl-libs.x86_64 4:5.16.3-292.el7                   perl-macros.x86_64 4:5.16.3-292.el7
  perl-parent.noarch 1:0.225-244.el7                  perl-podlators.noarch 0:2.5.1-3.el7
  perl-threads.x86_64 0:1.87-4.el7                    perl-threads-shared.x86_64 0:1.43-6.el7

Replaced:
  mariadb-libs.x86_64 1:5.5.56-2.el7

Complete!

Now we can connect to the MaxScale MariaDB service using the installed client and set up MaxScale as a REPLICATION SLAVE: use the proper master MariaDB IP in CHANGE MASTER TO MASTER_HOST, followed by START SLAVE and SHOW SLAVE STATUS to ensure the slave is running without any issues.

This setup is just like setting up MariaDB master/slave replication; the only difference in this case is that we are setting up MaxScale as a bin-log router slave. Take note that we have used the -h 192.168.56.102 argument to pass in MaxScale's IP address and the -P 6603 argument to pass in the [replication-listener] port from the /etc/maxscale.cnf file. That port serves the mysql client protocol, which is why we are able to connect to MaxScale's MariaDB interface.

[mysql@maxscale102 ~]$ mysql -h 192.168.56.102 -u maxuser -p maxpwd -P 6603
Enter password:
ERROR 1049 (42000): Unknown database 'maxpwd'

Note that the first attempt fails because a space after -p makes the client treat "maxpwd" as a database name; attach the password directly to the flag, as below.

[mysql@maxscale102 ~]$ mysql -h192.168.56.102 -umaxuser -pmaxpwd -P6603
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 10.2.12 2.2.9-maxscale

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.56.101', MASTER_PORT=3306, MASTER_USER='maxuser', MASTER_PASSWORD='maxpwd', MASTER_LOG_FILE='mariadb-bin.000001', MASTER_LOG_POS=4;
Query OK, 0 rows affected (0.001 sec)

MySQL [(none)]> START SLAVE;
Query OK, 0 rows affected (0.201 sec)

MySQL [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Binlog Dump
                  Master_Host: 192.168.56.101
                  Master_User: maxuser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000001
          Read_Master_Log_Pos: 2047
               Relay_Log_File: mariadb-bin.000001
                Relay_Log_Pos: 2047
        Relay_Master_Log_File: mariadb-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 2047
              Relay_Log_Space: 2047
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 5a2189b0-7d1a-11e8-b04f-080027ad63ed
             Master_Info_File: /var/lib/maxscale/master.ini
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave running
           Master_Retry_Count: 1000
                  Master_Bind:
      Last_IO_Error_TimeStamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos:
1 row in set (0.000 sec)

MySQL [(none)]>

Start Data Streaming

With all the settings in place, we can now start data streaming from MariaDB TX to ColumnStore in MariaDB AX using the CDC and Avro data.

Log in to CDC Server and execute mxs_adapter to start streaming.

[mysql@cdc103 ~]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab

Let's review this command and its arguments:

  • -c indicates the path of the Columnstore.xml configuration file
  • -u / -p indicates the CDC User and Password that were created earlier
  • -h is the IP address of the MaxScale server
  • -P is the port on which the MaxScale [avro-listener] service is listening
  • cdc_test is the database name on the source (MariaDB) and target (ColumnStore)
  • cdc_tab is the table name on the source (MariaDB) and target (ColumnStore)

The target database/table should already exist before starting mxs_adapter; if not, mxs_adapter will print a script to create the table in ColumnStore. Let's start the service and see what happens.

[mysql@cdc103 ~]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab
Table not found, create with:

    CREATE TABLE cdc_test.cdc_tab (col varchar(100), domain int, event_number int, event_type varchar(50), id serial, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;

[mysql@cdc103 ~]$

As expected, we get the "Table not found" error message. Copy the CREATE TABLE script and execute it on the ColumnStore node using the mcsmysql interface.

[root@um1 ~] mcsmysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 43
Server version: 10.2.15-MariaDB-log Columnstore 1.1.5-1

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE DATABASE cdc_test;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> CREATE TABLE cdc_test.cdc_tab (col varchar(100), domain int, event_number int, event_type varchar(50), id serial, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;
ERROR 1069 (42000): Too many keys specified; max 0 keys allowed
MariaDB [(none)]>

There is another problem! Our source table had a serial column, which MariaDB treats as a primary key. Since ColumnStore does not support primary keys or indexes, we will need to change the CREATE TABLE script before executing it in the ColumnStore DB.

Simply change serial to bigint unsigned.

Let's log back into ColumnStore mcsmysql and recreate the table.

MariaDB [(none)]> DROP TABLE cdc_test.cdc_tab;
Query OK, 0 rows affected (0.02 sec)

MariaDB [(none)]> CREATE TABLE cdc_test.cdc_tab (col varchar(100), domain int, event_number int, event_type varchar(50), id bigint unsigned, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;
Query OK, 0 rows affected (0.55 sec)

MariaDB [(none)]>

Another thing to note here is that the table structure in ColumnStore is quite different from the source: there are a few additional columns. This table is an event table that captures all the INSERT, UPDATE and DELETE events. We can use this data to identify the latest row for each specific ID and run some analytics.

One more thing to notice here is that, at the moment, one mxs_adapter process can only handle one table at a time. This is to prevent streaming entire MariaDB databases to ColumnStore. Instead, we could create aggregate tables on MariaDB TX and stream that data to ColumnStore for a better analytics use case.
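If you do need to stream several tables, one workaround is to run one mxs_adapter process per table, for example in the background (the second table name, other_tab, and the log paths are hypothetical):

nohup mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab > /var/log/mxs_cdc_tab.log 2>&1 &
nohup mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test other_tab > /var/log/mxs_other_tab.log 2>&1 &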

Let's start the mxs_adapter once more and see if it can stream the data from our source MariaDB TX to target MariaDB AX.

[mysql@cdc103 ~]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab
4 rows and 1 transactions inserted in 3.3203 seconds. GTID = 0-1-6
2 rows and 1 transactions inserted in 0.792351 seconds. GTID = 0-1-7
3 rows and 1 transactions inserted in 0.604614 seconds. GTID = 0-1-8

Success! All three event types are captured: INSERT, UPDATE and DELETE, along with their GTIDs. Let's log in to ColumnStore and review the data in the cdc_tab table.

MariaDB [cdc_test]> SHOW TABLES;
+--------------------+
| Tables_in_cdc_test |
+--------------------+
| cdc_tab            |
+--------------------+
1 row in set (0.00 sec)

MariaDB [cdc_test]> SELECT * FROM cdc_tab;
+---------------+--------+--------------+---------------+------+----------+-----------+------------+
| col           | domain | event_number | event_type    | id   | sequence | server_id | timestamp  |
+---------------+--------+--------------+---------------+------+----------+-----------+------------+
| Row 1         |      0 |            1 | insert        |    1 |        5 |         1 | 1530439436 |
| Row 2         |      0 |            2 | insert        |    2 |        5 |         1 | 1530439436 |
| Row 3         |      0 |            3 | insert        |    3 |        5 |         1 | 1530439436 |
| Row 2         |      0 |            1 | update_before |    2 |        6 |         1 | 1530439675 |
| Updated Row 2 |      0 |            2 | update_after  |    2 |        6 |         1 | 1530439675 |
| Row 4         |      0 |            1 | insert        |    4 |        7 |         1 | 1530439697 |
| Row 5         |      0 |            2 | insert        |    5 |        7 |         1 | 1530439697 |
| Row 6         |      0 |            3 | insert        |    6 |        7 |         1 | 1530439697 |
| Row 5         |      0 |            1 | delete        |    5 |        8 |         1 | 1530439714 |
+---------------+--------+--------------+---------------+------+----------+-----------+------------+
9 rows in set (0.17 sec)

MariaDB [cdc_test]>

We can see the INSERT, UPDATE (before and after) and DELETE events. The sequence column indicates the order in which these events were triggered, and the timestamp column indicates the time when these events took place.

Avro files

While setting up MaxScale we specified avrodir as /var/lib/maxscale. This can be any other location.

[avro-router]
type=service
router=avrorouter
source=replication-router
avrodir=/var/lib/maxscale

Let's see the contents of the avrodir folder. We can see two generated files: the .avsc file contains the data structure (schema), while the .avro file contains the actual data from the bin-logs.

[root@maxscale102 maxscale] pwd
/var/lib/maxscale

[root@maxscale102 maxscale] ls -rlt
total 40
drwxr-xr-x 2 maxscale maxscale    6 Jun 22 14:44 maxscale.cnf.d
-rw-r--r-- 1 maxscale maxscale   54 Jun 22 14:44 maxadmin-users
-rw-r--r-- 1 maxscale maxscale   84 Jun 22 14:44 passwd
drwxr--r-- 2 maxscale maxscale   25 Jul  1 04:25 MariaDB-Monitor
drwxr-xr-x 2 maxscale maxscale    6 Jul  1 06:34 data1288
drwxr-xr-x 2 maxscale maxscale   22 Jul  1 06:46 avro-router
-rw------- 1 maxscale maxscale  183 Jul  1 06:50 master.ini
drwx------ 2 maxscale maxscale  199 Jul  1 06:50 cache
-rw-r--r-- 1 maxscale maxscale 4096 Jul  1 06:50 gtid_maps.db
-rw-r--r-- 1 maxscale maxscale 2047 Jul  1 06:50 mariadb-bin.000001
-rw-r--r-- 1 maxscale maxscale  572 Jul  1 06:50 cdc_test.cdc_tab.000001.avsc
-rw-r--r-- 1 maxscale maxscale  797 Jul  1 06:50 cdc_test.cdc_tab.000001.avro
-rw-r--r-- 1 maxscale maxscale 5120 Jul  1 06:50 avro.index
-rw-r--r-- 1 maxscale maxscale   69 Jul  1 07:29 avro-conversion.ini
[root@maxscale102 maxscale]

In case of some issue with the data streaming, one can try restarting the MaxScale service. If there's still a failure, as a last resort one can delete *.avsc, *.avro, avro.index and avro-conversion.ini and restart the MaxScale service; it should be able to recover.

Since the mxs_adapter keeps running, any new data inserted into the cdc_test.cdc_tab table will automatically be streamed into ColumnStore.
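To verify, insert a new row on the master and check that it shows up on ColumnStore shortly after; a quick sketch:

-- on the master (MariaDB TX)
INSERT INTO cdc_test.cdc_tab(col) VALUES ('Row 7');

-- moments later, on ColumnStore (MariaDB AX)
SELECT * FROM cdc_test.cdc_tab ORDER BY timestamp DESC, sequence DESC LIMIT 1;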

Conclusion

Using this setup, we can stream data directly from OLTP MariaDB TX not only to MariaDB AX but also to other targets that can consume Avro/JSON data. There is a Kafka adapter already available on the Data Adapters download page.

Thanks.

by Faisal at July 19, 2018 04:07 PM

Jean-Jerome Schmidt

Controlling Replication Failover for MySQL and MariaDB with Pre- or Post-Failover Scripts

In a previous post, we discussed how you can take control of the failover process in ClusterControl by utilizing whitelists and blacklists. In this post, we are going to discuss a similar concept. But this time we will focus on integrations with external scripts and applications through numerous hooks made available by ClusterControl.

Infrastructure environments can be built in different ways, as oftentimes there are many options to choose from for a given piece of the puzzle. How do we define which database node to write to? Do you use virtual IP? Do you use some sort of service discovery? Maybe you go with DNS entries and change the A records when needed? What about the proxy layer? Do you rely on ‘read_only’ value for your proxies to decide on the writer, or maybe you make the required changes directly in the configuration of the proxy? How does your environment handle switchovers? Can you just go ahead and execute it, or maybe you have to take some preliminary actions beforehand? For instance, halting some other processes before you can actually do the switch?

It is not possible for failover software to be preconfigured to cover all of the different setups that people can create. This is the main reason to provide different ways of hooking into the failover process. This way you can customize it and make it possible to handle all of the subtleties of your setup. In this blog post, we will look into how ClusterControl’s failover process can be customized using different pre- and post-failover scripts. We will also discuss some examples of what can be accomplished with such customization.

Integrating ClusterControl

ClusterControl provides several hooks that can be used to plug in external scripts. Below you will find a list of those with some explanation.

  1. Replication_onfail_failover_script - this script executes as soon as it has been discovered that a failover is needed. If the script returns non-zero, it will force the failover to abort. If the script is defined but not found, the failover will be aborted. Four arguments are supplied to the script: arg1='all servers', arg2='oldmaster', arg3='candidate', arg4='slaves of oldmaster', and passed like this: 'scriptname arg1 arg2 arg3 arg4'. The script must be accessible on the controller and be executable.
  2. Replication_pre_failover_script - this script executes before the failover happens, but after a candidate has been elected and it is possible to continue the failover process. If the script returns non-zero it will force the failover to abort. If the script is defined but not found, the failover will be aborted. The script must be accessible on the controller and be executable.
  3. Replication_post_failover_script - this script executes after the failover happened. If the script returns non-zero, a Warning will be written in the job log. The script must be accessible on the controller and be executable.
  4. Replication_post_unsuccessful_failover_script - This script is executed after the failover attempt failed. If the script returns non-zero, a Warning will be written in the job log. The script must be accessible on the controller and be executable.
  5. Replication_failed_reslave_failover_script - this script is executed after a new master has been promoted, if reslaving the slaves to the new master fails. If the script returns non-zero, a Warning will be written in the job log. The script must be accessible on the controller and be executable.
  6. Replication_pre_switchover_script - this script executes before the switchover happens. If the script returns non-zero, it will force the switchover to fail. If the script is defined but not found, the switchover will be aborted. The script must be accessible on the controller and be executable.
  7. Replication_post_switchover_script - this script executes after the switchover happened. If the script returns non-zero, a Warning will be written in the job log. The script must be accessible on the controller and be executable.

As you can see, the hooks cover most of the cases where you may want to take some actions - before and after a switchover, before and after a failover, when the reslave has failed or when the failover has failed. All of the scripts are invoked with four arguments (which the script may or may not handle; it is not required to utilize all of them): all servers, the hostname (or IP - as it is defined in ClusterControl) of the old master, the hostname (or IP - as it is defined in ClusterControl) of the master candidate and, as the fourth one, all replicas of the old master. Those options should make it possible to handle the majority of the cases. A minimal sketch of such a hook follows.
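As one hedged illustration (the paths, log file and maintenance-flag check are assumptions for this sketch, not ClusterControl conventions), a pre-failover hook consuming these four arguments might look like this:

#!/bin/bash
# Hypothetical pre-failover hook. ClusterControl passes four arguments:
#   $1 - all servers, $2 - old master, $3 - master candidate, $4 - slaves of the old master
OLD_MASTER="$2"
CANDIDATE="$3"
OLD_SLAVES="$4"
echo "$(date) failover requested: ${OLD_MASTER} -> ${CANDIDATE} (slaves: ${OLD_SLAVES})" >> /var/log/cc_failover_hooks.log
# For pre-failover hooks, a non-zero exit code aborts the failover.
if [ -f /etc/cmon.d/block_failover ]; then
    exit 1
fi
exit 0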

All of those hooks should be defined in a configuration file for a given cluster (/etc/cmon.d/cmon_X.cnf where X is the id of the cluster). An example may look like this:

replication_pre_failover_script=/usr/bin/stonith.py
replication_post_failover_script=/usr/bin/vipmove.sh

Of course, invoked scripts have to be executable, otherwise cmon won’t be able to execute them. Let’s now take a moment and go through the failover process in ClusterControl and see when the external scripts are executed.

Failover process in ClusterControl

We defined all of the hooks that are available:

replication_onfail_failover_script=/tmp/1.sh
replication_pre_failover_script=/tmp/2.sh
replication_post_failover_script=/tmp/3.sh
replication_post_unsuccessful_failover_script=/tmp/4.sh
replication_failed_reslave_failover_script=/tmp/5.sh
replication_pre_switchover_script=/tmp/6.sh
replication_post_switchover_script=/tmp/7.sh

After this, you have to restart the cmon process. Once it’s done, we are ready to test the failover. The original topology looks like this:

A master has been killed and the failover process started. Please note, the more recent log entries are at the top, so you want to follow the failover from the bottom to the top.

As you can see, immediately after the failover job started, it triggers the ‘replication_onfail_failover_script’ hook. Then, all reachable hosts are marked as read_only and ClusterControl attempts to prevent the old master from running.

Next, the master candidate is picked and sanity checks are executed. Once it is confirmed that the master candidate can be used as a new master, the ‘replication_pre_failover_script’ is executed.

More checks are performed, replicas are stopped and slaved off the new master. Finally, after the failover has completed, the final hook, ‘replication_post_failover_script’, is triggered.

When can hooks be useful?

In this section, we’ll go through a couple of example cases where it might be a good idea to implement external scripts. We will not get into any details as those are too closely related to a particular environment. It will be more of a list of suggestions that might be useful to implement.

STONITH script

Shoot The Other Node In The Head (STONITH) is a process of making sure that the old master, which is dead, will stay dead (and yes.. we don’t like zombies roaming about in our infrastructure). The last thing you probably want is an unresponsive old master which then gets back online and, as a result, leaves you with two writable masters. There are precautions you can take to make sure the old master will not be used even if it shows up again, and it is safer for it to stay offline. How to ensure this differs from environment to environment, therefore, most likely, there will be no built-in support for STONITH in the failover tool. Depending on the environment, you may want to execute a CLI command which will stop (and even remove) the VM on which the old master is running. If you have an on-prem setup, you may have more control over the hardware. It might be possible to utilize some sort of remote management (integrated Lights-out or some other remote access to the server). You may also have access to managed power sockets and be able to turn off the power on one of them to make sure the server will never start again without human intervention.
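As a sketch of the cloud variant (assuming an AWS-hosted master; lookup_instance_id is a hypothetical helper mapping the hostname ClusterControl passes in to an EC2 instance ID), a STONITH hook could be:

#!/bin/bash
# Hypothetical STONITH hook: stop the VM that hosted the failed master
# so it cannot come back online as a second writable master.
OLD_MASTER="$2"                                    # old master, as passed by ClusterControl
INSTANCE_ID=$(lookup_instance_id "${OLD_MASTER}")  # hypothetical hostname -> instance ID mapping
aws ec2 stop-instances --instance-ids "${INSTANCE_ID}"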

Service discovery

We already mentioned a bit about service discovery. There are numerous ways one can store information about a replication topology and detect which host is a master. Definitely, one of the more popular options is to use etcd or Consul to store data about the current topology. With it, an application or proxy can rely on this data to send the traffic to the correct node. ClusterControl (just like most of the tools which do support failover handling) does not have a direct integration with either etcd or Consul. The task of updating the topology data is on the user. She can use hooks like replication_post_failover_script or replication_post_switchover_script to invoke some scripts and make the required changes. Another pretty common solution is to use DNS to direct traffic to the correct instances. If you keep the Time-To-Live of a DNS record low, you should be able to define a domain which will point to your master (i.e. writes.cluster1.example.com). This requires a change to the DNS records and, again, hooks like replication_post_failover_script or replication_post_switchover_script can be really helpful to make the required modifications after a failover has happened.
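For instance, a minimal post-failover hook (the KV key name and the Consul setup are assumptions here) could publish the new master to Consul’s KV store:

#!/bin/bash
# Hypothetical post-failover hook: record the newly promoted master in Consul
# so applications and proxies can discover the current writer.
NEW_MASTER="$3"   # master candidate, as passed by ClusterControl
consul kv put clusters/cluster1/mysql/master "${NEW_MASTER}"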

Proxy reconfiguration

Each proxy server that is used has to send traffic to the correct instances. Depending on the proxy itself, master detection can be either (partially) hardcoded or left up to the user to define as she likes. ClusterControl’s failover mechanism is designed in a way that integrates well with proxies that it deployed and configured. It still may happen that there are proxies in place which were not installed by ClusterControl, and they require some manual actions while a failover is being executed. Such proxies can also be integrated with the ClusterControl failover process through external scripts and hooks like replication_post_failover_script or replication_post_switchover_script.
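As a sketch (admin credentials, port and hostgroup numbering are assumptions based on ProxySQL defaults), such a hook could repoint ProxySQL’s writer hostgroup through its admin interface:

#!/bin/bash
# Hypothetical post-failover hook: move the promoted master into ProxySQL's
# writer hostgroup (0 here) via the admin interface on port 6032.
NEW_MASTER="$3"
mysql -h 127.0.0.1 -P 6032 -u admin -padmin -e "
UPDATE mysql_servers SET hostgroup_id = 0 WHERE hostname = '${NEW_MASTER}';
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;"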

Additional logging

It may happen that you’d like to collect data about the failover process for debugging purposes. ClusterControl has extensive printouts to make sure it is possible to follow the process and figure out what happened and why. It still may happen that you would like to collect some additional, custom information. Basically, all of the hooks can be utilized here - you can collect the initial state before the failover, and you can track the state of the environment at all stages of the failover.

by krzysztof at July 19, 2018 11:51 AM

July 18, 2018

Peter Zaitsev

Webinar Wed 7/19: MongoDB Sharding


Please join Percona’s Senior Support Engineer, Adamo Tonete as he presents MongoDB Sharding 101 on July 19th, 2018, at 12:30 PM PDT (UTC-7) / 3:30 PM EDT (UTC-4).


This tutorial is a continuation of advanced topics for the DBA. In it, we will share best practices and tips on how to perform the most common activities.

In this tutorial, we are going to cover MongoDB sharding.

Register Now

The post Webinar Wed 7/19: MongoDB Sharding appeared first on Percona Database Performance Blog.

by Adamo Tonete at July 18, 2018 03:55 PM

Webinar Wed 7/18: MariaDB 10.3 vs. MySQL 8.0


Please join Percona’s Chief Evangelist, Colin Charles as he presents MariaDB 10.3 vs. MySQL 8.0 on Wednesday, July 18th, 2018, at 9:00 AM PDT (UTC-7) / 12:00 PM EDT (UTC-4).


Technical considerations

Are they syntactically similar? Where do these two databases differ? Why would I use one over the other?

MariaDB 10.3 is on the path of gradually diverging from MySQL 8.0. One obvious example is the internal data dictionary currently under development for MySQL 8.0. This is a major change to the way metadata is stored and used within the server, and MariaDB doesn’t have an equivalent feature. Implementing this feature could mark the end of datafile-level compatibility between MySQL and MariaDB.

Non-technical considerations

There are also non-technical differences between MySQL 8.0 and MariaDB 10.3, including:

Licensing: MySQL offers their code as open-source under the GPL, and provides the option of non-GPL commercial distribution in the form of MySQL Enterprise. MariaDB can only use the GPL, because their work is derived from the MySQL source code under the terms of that license.

Support services: Oracle provides technical support, training, certification and consulting for MySQL, while MariaDB has their own support services. Some people will prefer working with smaller companies, as traditionally it affords them more leverage as a customer.

Community contributions: MariaDB touts the fact that they accept more community contributions than Oracle. Part of the reason for this disparity is that developers like to contribute features, bug fixes and other code without a lot of paperwork overhead (and they complain about the Oracle Contributor Agreement). However, MariaDB has its own MariaDB Contributor Agreement — which more or less serves the same purpose.

Colin will take a look at some of the differences between MariaDB 10.3 and MySQL 8.0 and help answer some of the common questions our Database Performance Experts get about the two databases.

Register Now

The post Webinar Wed 7/18: MariaDB 10.3 vs. MySQL 8.0 appeared first on Percona Database Performance Blog.

by Colin Charles at July 18, 2018 02:58 PM

Why Consumer SSD Reviews are Useless for Database Performance Use Case


If you’re reading consumer SSD reviews and using them to estimate SSD performance under database workloads, you’d better stop. Databases are not your typical consumer applications and they do not use IO in the same way.

Let’s look, for example, at this excellent AnandTech review of Samsung 960 Pro –  a consumer NVMe device that I happen to have in my test lab.

Anandtech Table reviewing consumer SSD performance

The summary table is actually great, showing the performance both at Queue Depth 1 (single threaded) and at Queue Depth 32 – a pretty heavy concurrent load.

Even at QD1 we see 50K (4K) writes per second, which should be enough for pretty serious database workloads.

In reality, though, you might be in for some disappointing surprises. While “normal” buffered IO is indeed quite fast, this drive really hates fsync() calls, with a single thread fsync() latency of 3.5ms or roughly 300 fsync/sec. That’s just slightly more than your old school spinning drive.
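(That rate follows directly from the latency: with a single thread issuing one fsync() at a time, 1 s / 3.5 ms ≈ 285 synchronous writes per second.)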

Why is fsync() performance critical for databases?

To achieve durability (the letter “D” in ACID), databases tend to rely on a write-ahead log (WAL), which is written sequentially. The WAL must be synced to disk on every transaction commit using fsync() or similar measures, such as opening the file with the O_SYNC flag. These tend to have similar performance implications.

Other database operations use fsync() too, but writing WAL is where it usually hurts the most.    

In a fully durable configuration, MySQL tends to be impacted even more by poor fsync() performance. It may need to perform as many as three fsync operations per transaction commit. Group commit reduces the impact on throughput, but transaction latency will still be severely impacted.
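For reference, a fully durable MySQL setup corresponds to settings like these (a minimal my.cnf sketch):

[mysqld]
# fsync the InnoDB redo log on every transaction commit
innodb_flush_log_at_trx_commit = 1
# fsync the binary log on every transaction commit
sync_binlog = 1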

Want more bad news? If the fsync() performance is phenomenal on your consumer SSD it indeed might be too good to be true. Over the years, some consumer SSDs “faked” fsync and accepted possible data loss in the case of power failure. This might not be a big deal if you only use them for testing but it is a showstopper for any real use.

Want to know more about your drive’s fsync() performance?  You can use these sysbench commands:

sysbench fileio --time=60 --file-num=1 --file-extra-flags= --file-total-size=4096 --file-block-size=4096 --file-fsync-all=on --file-test-mode=rndwr --file-fsync-freq=0 --file-fsync-end=0  --threads=1 --percentile=99 prepare
sysbench fileio --time=60 --file-num=1 --file-extra-flags= --file-total-size=4096 --file-block-size=4096 --file-fsync-all=on --file-test-mode=rndwr --file-fsync-freq=0 --file-fsync-end=0  --threads=1 --percentile=99 run | grep "avg:"

You can also use ioping, as described in this blog post.

I wish that manufacturers’ tech specifications described fsync() latency, along with a clear statement as to whether the drive guarantees no loss of data on power failure. Likewise, I wish folks doing storage reviews would include these in their research.

Interested in fsync() performance for a variety of devices? Yves Trudeau wrote an excellent blog post about fsync() performance on various storage devices a few months ago.

Other technical resources

Principal Support Escalation Specialist Sveta Smirnova presents Troubleshooting MySQL Concurrency Issues with Load Testing Tools. 

You can download a three-part series of eBooks by Principal Consultant Alexander Rubin and me on MySQL Performance.

The post Why Consumer SSD Reviews are Useless for Database Performance Use Case appeared first on Percona Database Performance Blog.

by Peter Zaitsev at July 18, 2018 11:55 AM

Open Query Pty Ltd

Australian government encryption folly

In IDG’s CIO magazine (17 July 2018): Wickr, Linux Australia, Twilio sign open letter against govt’s encryption crackdown ‘mistake’. Not just those few, though; 76 companies and organisations signed that letter.

Learning Lessons

Encryption is critical to the whole “online thing” working, both for individuals as well as companies.  Let’s look back at history:

  • In most countries’ postal legislation, there’s a law against the Post Office opening letters and packages.
  • Similarly, a bit later, telephone lines couldn’t be tapped unless there was a very specific court order.

This was not just a nice gesture or convenience.  It was critical for

  1. trusting these new communication systems and their facilitating companies, and
  2. enabling people to communicate at greater distances and do business that way, or arrange other matters.

Those things don’t work if there are third parties looking or listening in, and it really doesn’t matter who or what the third party is. Today’s online environment is really not that different.

Various governments have tried to nobble encryption at various points over the past few decades: from trying to ban encryption outright, to requiring super keys in escrow, to exploiting (and possibly creating) weaknesses in encryption mechanisms.  The latter in particular is very damaging, because it’s so arrogant: the whole premise becomes that “your people” are the only bright ones who can figure out and exploit the weakness. Such presumptions are always wrong, even if there is no public proof. A business or criminal organisation that figures it out can make good use of it for their own purposes, provided they keep quiet about it.  And companies are, ironically, much better than governments at keeping secrets.

Best Practice

Apps and web sites, and facilitating infrastructures, should live by the following basic rules:

  • First and foremost, get people on staff or externally who really understand encryption and privacy. Yes, geeks.
  • Use proper and current encryption and signature mechanisms, without shortcuts.  A good algorithm incorrectly utilised is not secure.
  • Only ask for and store data you really need, nothing else.
  • Be selective with collecting metadata (including logging, third party site plugins, advertising, etc). It can easily be a privacy intrusion and also have security consequences.
  • Only retain data as per your actual needs (and legal requirements), no longer.
  • When acting as an intermediary, design for end-to-end encryption with servers merely passing on the encrypted data.

Most of these aspects interact.  For example: if you just pass on encrypted data, but meanwhile collect and store an excess of metadata, you’re not actually delivering a secure environment. On the other hand, by not having data, or keys, or metadata, you’ll neither be a target for criminals nor have anything to hand over to a government.

See also our earlier post on Keeping Data Secure.

But what about criminals?

Would you, when following these guidelines, enable criminals? Hardly. The proper technology is available anyway, and criminal elements who are not technically capable can hire that knowledge and skill to assist them. Fact is, some smart people “go to the dark side” (for reasons of money, ego, or whatever else).  You don’t have to presume that your service or app is the one thing that enables these people. It’s just not.  Which is another reason why these government “initiatives” are folly: they’ll fail at their intended objective, while at the same time impairing and limiting general use of essential security mechanisms.  Governments themselves could do much better by hiring and listening to people who understand these matters.

by Arjen Lentz at July 18, 2018 12:45 AM

July 17, 2018

Jean-Jerome Schmidt

ClusterControl Release 1.6.2: New Backup Management and Security Features for MySQL & PostgreSQL

We are excited to announce the 1.6.2 release of ClusterControl - the all-inclusive database management system that lets you easily automate and manage highly available open source databases in any environment: on-premise or in the cloud.

ClusterControl 1.6.2 introduces exciting new Backup Management and Security & Compliance features for MySQL & PostgreSQL, support for MongoDB v3.6 … and more!

Release Highlights

Backup Management

  • Continuous Archiving and Point-in-Time Recovery (PITR) for PostgreSQL
  • Rebuild a node from a backup with MySQL Galera clusters to avoid SST

Security & Compliance

  • New, consolidated Security section

Additional Highlights

  • Support for MongoDB v 3.6

View the ClusterControl ChangeLog for all the details!

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

View Release Details and Resources

Release Details

Backup Management

One of the issues with MySQL and PostgreSQL is that there aren’t really any out-of-the-box tools for users to simply pick a restore time (in the GUI): certain operations need to be performed to do that, such as finding the full backup, restoring it, and manually applying any changes that happened after the backup was taken.

ClusterControl provides a single process to restore data to point in time with no extra actions needed.

With the same system, users can verify their backups (in the case of MySQL, for instance, ClusterControl will do the installation, set up the cluster, do a restore and, if the backup is sound, mark it as valid - which, as one can imagine, represents a lot of steps).

With ClusterControl, users can not only go back to a point in time, but also pick the exact transaction that happened; and, with surgical precision, restore their data before disaster really strikes.

New for PostgreSQL

Continuous Archiving and Point-in-Time Recovery (PITR) for PostgreSQL: ClusterControl now automates that process and enables continuous WAL archiving as well as PITR with backups.

New for MySQL Galera Cluster

Rebuild a node from a backup with MySQL Galera clusters to avoid SST: ClusterControl reduces the time it takes to recover a node by avoiding streaming a full dataset over the network from another node.

Security & Compliance

The new Security section in ClusterControl lets users easily check which security features they have enabled (or disabled) for their clusters, thus simplifying the process of taking the relevant security measures for their setups.

Additional New Functionalities

View the ClusterControl ChangeLog for all the details!


Download ClusterControl today!

Happy Clustering!

by jj at July 17, 2018 02:51 PM

Peter Zaitsev

When Should I Use Amazon Aurora and When Should I use RDS MySQL?

Now that Database-as-a-service (DBaaS) is in high demand, there is one question regarding AWS services that cannot always be answered easily: when should I use Aurora and when RDS MySQL?

DBaaS cloud services allow users to use databases without configuring physical hardware and infrastructure, and without installing software. I’m not sure if there is a straightforward answer, but when trying to find out which solution best fits an organization there are multiple factors that should be taken into consideration. These may be performance, high availability, operational cost, management, capacity planning, scalability, security, monitoring, etc.

There are also cases where although the workload and operational needs seem to best fit to one solution, there are other limiting factors which may be blockers (or at least need special handling).

In this blog post, I will try to provide some general rules of thumb but let’s first try to give a short description of these products.

What we should really compare is the MySQL and Aurora database engines provided by Amazon RDS.

An introduction to Amazon RDS

Amazon Relational Database Service (Amazon RDS) is a hosted database service which provides multiple database products to choose from, including Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server. We will focus on MySQL and Aurora.

With regards to systems administration, both solutions are time-saving. You get an environment ready to deploy your application and if there are no dedicated DBAs, RDS gives you great flexibility for operations like upgrades or backups. For both products, Amazon applies required updates and the latest patches without any downtime. You can define maintenance windows and automated patching (if enabled) will occur within them. Data is continuously backed up to S3 in real time, with no performance impact. This eliminates the need for backup windows and other, complex or not, scripted procedures. Although this sounds great, the risk of vendor lock-in and the challenges of enforced updates and client-side optimizations are still there.

So, Aurora or RDS MySQL?

Amazon Aurora is a relational, proprietary, closed-source database engine, with all that that implies.

RDS MySQL is 5.5, 5.6 and 5.7 compatible and offers the option to select among minor releases. While RDS MySQL supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability. Until recently, Aurora was only compatible with MySQL 5.6, a notable limitation, but it is now compatible with 5.7 as well.

So, in most cases, no significant application changes are required for either product. Keep in mind that certain MySQL features like the MyISAM storage engine are not available with Amazon Aurora. Migration to RDS can be performed using Percona XtraBackup.

For RDS products, shell access to the underlying operating system is disabled, and access to MySQL user accounts with the “SUPER” privilege isn’t allowed. To configure MySQL variables or manage users, Amazon RDS provides specific parameter groups, APIs and other special system procedures which can be used. If you need to enable remote access, this article will help you do so: https://www.percona.com/blog/2018/05/08/how-to-enable-amazon-rds-remote-access/
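As an illustration (the parameter group name and values here are hypothetical), changing a variable through a parameter group with the AWS CLI looks roughly like this:

aws rds modify-db-parameter-group --db-parameter-group-name my-mysql-params --parameters "ParameterName=max_connections,ParameterValue=500,ApplyMethod=immediate"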

Performance considerations

Although Amazon RDS uses SSDs to achieve better IO throughput for all its database services, Amazon claims that Aurora is able to achieve a 5x performance boost over standard MySQL and provides reliability out of the box. In general, Aurora seems to be faster, but not always.

For example, due to the need to disable the InnoDB change buffer for Aurora (this is one of the keys for the distributed storage engine), and because updates to secondary indexes must be write-through, there is a big performance penalty in write-heavy workloads that update secondary indexes. This is because of the way MySQL relies on the change buffer to defer and merge secondary index updates. If your application performs a high rate of updates against tables with secondary indexes, Aurora performance may be poor. In any case, you should always keep in mind that performance depends on schema design. Before taking the decision to migrate, performance should be evaluated against an application-specific workload. Doing extensive benchmarks will be the subject of a future blog post.

Capacity Planning

Talking about underlying storage, another important thing to take into consideration is that with Aurora there is no need for capacity planning. Aurora storage will automatically grow, from the minimum of 10 GB up to 64 TiB, in 10 GB increments, with no impact on database performance. The table size limit is only constrained by the size of the Aurora cluster volume, which has a maximum of 64 tebibytes (TiB). As a result, the maximum table size for a table in an Aurora database is 64 TiB. For RDS MySQL, the maximum provisioned storage limit constrains the size of a table to a maximum size of 16 TB when using InnoDB file-per-table tablespaces.

Replication

Replication is a really powerful feature of MySQL and MySQL-like products. With Aurora, you can provision up to fifteen replicas, compared to just five in RDS MySQL. All Aurora replicas share the same underlying volume with the primary instance, and this means that replication can be performed in milliseconds, as updates made by the primary instance are instantly available to all Aurora replicas. Failover is automatic with no data loss on Amazon Aurora, and the replicas’ failover priority can be set.

An explanatory description of Amazon Aurora’s architecture can be found in Vadim’s post written a couple of years ago https://www.percona.com/blog/2015/11/16/amazon-aurora-looking-deeper/

The architecture used and the way that replication works on both products show a really significant difference between them. Aurora is a High Availability (HA) solution where you only need to attach a reader, and this automatically becomes Multi-AZ available. Aurora replicates data to six storage nodes in Multi-AZs to withstand the loss of an entire AZ (Availability Zone) or two storage nodes without any availability impact on the client’s applications.

On the other hand, RDS MySQL allows only up to five replicas and the replication process is slower than Aurora. Failover is a manual process and may result in last-minute data loss. RDS for MySQL is not an HA solution, so you have to mark the master as Multi-AZ and attach the endpoints.

Monitoring

Both products can be monitored with a variety of monitoring tools. You can enable automated monitoring and you can define the log types to publish to Amazon CloudWatch. Percona Monitoring and Management (PMM) can also be used to gather metrics.

Be aware that for Aurora there is a limitation for the T2 instances such that Performance Schema can cause the host to run out of memory if enabled.

Costs

Aurora instances will cost you ~20% more than RDS MySQL. If you create Aurora read replicas, the cost of your Aurora cluster will double. Aurora is only available on certain RDS instance sizes. Instance pricing details can be found here and here.

Storage pricing may be a bit tricky. Keep in mind that pricing for Aurora differs from that for RDS MySQL. For RDS MySQL, you have to select the type and size of the EBS volume, and you have to be sure that the provisioned EBS IOPs can be supported by your instance type, as EBS IOPs are restricted by the instance type’s capabilities. Unless you watch for this, you may end up having EBS IOPs that cannot really be used by your instance.

For Aurora, IOPs are only limited by the instance type. This means that if you want to increase IOPs performance on Aurora you should proceed with an instance type upgrade. In any case, Amazon will charge you based on the dataset size and the requests per second.

That said, although for Aurora you pay only for the data you really use (in 10GB increments), if you want high performance you have to select the correct instance. For Aurora, regardless of the instance type, you get billed $0.10 per GB-month and $0.20 per 1 million requests, so if you need high performance the cost may be even more than RDS MySQL. For RDS MySQL, storage costs are based on the EBS type and size.
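To make that concrete with assumed numbers: an Aurora cluster holding 500 GB and serving 100 million requests per month would be billed roughly 500 x $0.10 = $50 for storage plus 100 x $0.20 = $20 for I/O, i.e. about $70/month on top of the instance cost.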

Percona provides support for RDS services and you might be interested in these case studies:

When a more fully customized solution is required, most of our customers usually prefer the use of AWS EC2 instances supported by our managed services offering.

TL;DR
  • If you are looking for a native HA solution then you should use Aurora
  • For a read-intensive workload within an HA environment, Aurora is a perfect match. Combined with ProxySQL for RDS, you can get high flexibility
  • Aurora performance is great, but not as good as expected for write-intensive workloads when secondary indexes exist. In any case, you should benchmark both RDS MySQL and Aurora before taking the decision to migrate. Performance depends heavily on workload and schema design
  • By choosing Amazon Aurora you are fully dependent on Amazon for bug fixes or upgrades
  • If you need to use MySQL plugins you should use RDS MySQL
  • Aurora only supports InnoDB. If you need other engines i.e. MyISAM, RDS MySQL is the only option
  • With RDS MySQL you can use specific MySQL releases
  • Aurora is not included in the AWS free-tier and costs a bit more than RDS MySQL. If you only need a managed solution to deploy services in a less expensive way and out of the box availability is not your main concern, RDS MySQL is what you need
  • If for any reason Performance Schema must be ON, you should not enable this on Amazon Aurora MySQL T2 instances. With the Performance Schema enabled, the T2 instance may run out of memory
  • For both products, you should carefully examine the known issues and limitations listed here https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.KnownIssuesAndLimitations.html and here https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.AuroraMySQL.html

The post When Should I Use Amazon Aurora and When Should I use RDS MySQL? appeared first on Percona Database Performance Blog.

by Ananias Tsalouchidis at July 17, 2018 01:46 PM

July 16, 2018

MariaDB AB

MariaDB Pledges Cure Period for Open Source Licenses


MariaDB is joining Red Hat, Facebook, Google, IBM, Microsoft, HPE and others to adopt a fair cure period to correct license compliance issues for GPLv2 software. Under GPLv2, the license governing MariaDB Server, violators of the license have no opportunity to correct a license violation. A cure period was added to GPLv3 to give users in violation of the license time to fix the violation. Today, we are taking a stand with other open source leaders to add this cure period to our software. That is, users in violation of the license will have time to fix their license violation.

We join the initiative for several reasons. Because we want to be friendly with our users. Because the GPLv3 cure period is tried and tested, and a good way to build confidence in the user base, knowing that users will be treated fairly. Because we believe Open Source deserves greater predictability.

Extending the initiative is good for the MariaDB community, because confidence in fair and just treatment of the user base is important for the adoption and community recognition of any GPLv2 software – MariaDB is no different here. We want to show our appreciation towards the industry consortium taking this simple but helpful step towards an even more fair and cooperative world of Open Source.

Our customers will also benefit. MariaDB Corporation is taking active measures to be a good corporate citizen in the community of Open Source developers. We want to show our support for a consortium that has found a way to eliminate a legal hurdle in the fair use of GPLv2 software. The Cure Period initiative gives both MariaDB users and our customers the confidence to innovate freely with open source by granting them these protections.

The Cure Pledge is a logical next step for MariaDB Corporation, as we have already taken an active role in developing licensing related to open source through our release of the Business Source License in 2017.  The Cure Period initiative specifically relates to GPLv2 software. However, we are immediately extending the same cure period to our BSL software, and once we update BSL with a new version, we will work the corresponding language into that version. In the meantime, our BSL customers do enjoy the same protection.

For the legal details, you can read our Cure Pledge.


by kajarno at July 16, 2018 09:10 PM

Peter Zaitsev

ProxySQL 1.4.9 and Updated proxysql-admin Tool Now in the Percona Repository


ProxySQL 1.4.9, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.9 source and binary packages available at https://percona.com/downloads/proxysql include ProxySQL Admin – a tool, developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.9 are available as well: https://hub.docker.com/r/percona/proxysql/. You can download the original ProxySQL from https://github.com/sysown/proxysql/releases.

This release contains the following bug fixes and enhancements in ProxySQL Admin:

  • PSQLADM-31: proxysql-admin is now able to handle multiple ProxySQL instances using separate configuration files passed with the enable command. This new multi-instance functionality was the reason to fix the detection of a hostgroup, which was previously hardcoded in the script (bug fixed PSQLADM-56), and to fix features broken on multiple clusters: host priority (bug fixed PSQLADM-63), user accounts sync (bug fixed PSQLADM-70), and, finally, reading the right cluster name from the scheduler by the Galera checker script, broken in case of a matching write hostgroup (bug fixed PSQLADM-69).
  • BLD-1068: A “-percona” suffix added to the proxysql version now indicates the Percona built version.
  • PSQLADM-62: The new --without-check-monitor-user option allows using the ProxySQL Admin tool without human interaction, to enable automatic deployments.
  • PSQLADM-57: The proxysql-admin script now checks if the current user has sufficient privileges on the ProxySQL data directory, instead of unconditionally asking to be executed by the root user.
  • PSQLADM-67: When the cluster comes back online after the maintenance activity (in which all nodes are OFFLINE_SOFT and are moved to a reader hostgroup), Galera checker will now promote one of the read nodes to write node in singlewrite mode setup.
  • PSQLADM-58: The proxysql_galera_checker script now terminates if the monitor credentials do not match those of Percona XtraDB Cluster, to avoid situations where monitor user password rotation puts the cluster into a read-only state until the next ProxySQL restart.

ProxySQL is available under the OpenSource license GPLv3.

The post ProxySQL 1.4.9 and Updated proxysql-admin Tool Now in the Percona Repository appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at July 16, 2018 08:31 PM

InnoDB Cluster in a Nutshell: Part 2 MySQL Router


MySQL InnoDB Cluster is an Oracle High Availability solution that can be easily installed over MySQL to provide high availability with multi-master capabilities and automatic failover. In the previous post we presented the first component of InnoDB Cluster, group replication. Now we will go through the second component, MySQL Router. We will address MySQL Shell in the final instalment of this three-part series. By then, you should have a good overview of the features offered by MySQL InnoDB Cluster.

MySQL Router

This component is responsible for distributing the traffic between members of the cluster. It is a proxy-like solution to hide cluster topology from applications, so applications don’t need to know which member of a cluster is the primary node and which are secondaries.

The tool is capable of performing read/write splitting by exposing different interfaces. A common setup is to have one read-write interface and one read-only interface. This is the default behavior, which also exposes two similar interfaces that use the X protocol (i.e. used for CRUD operations and async calls).

The read and write split is done using the concept of roles: Primary for writes and Secondary for read-only. This is analogous to how members of the cluster are named. Additionally, each interface is exposed via a TCP port, so applications only need to know the IP:port combination used for writes and the one used for reads. Then, MySQL Router will take care of connections to cluster members depending on the type of traffic to serve.

MySQL Router is a very simple tool, maybe too simple, as it is a layer-four load balancer and lacks some of the advanced features that some of its competitors have (e.g. ProxySQL).

Here is a short list of the most important features of MySQL Router:

  • As mentioned, read and write split based on roles.
  • Load balancing for both reads and writes, using different algorithms.
  • Configuration is stored in a configuration text file.
  • Automatically detects the cluster topology by connecting and retrieving information; based on this information, the router configures itself with default rules.
  • Automatically detects failing nodes and redirects traffic accordingly.

Algorithms used for routing

An important thing to mention is the routing_strategy algorithms that are available, as they are assigned by default depending on the routing mode:

  • For PRIMARY mode (i.e. writer node – or nodes): uses the first-available algorithm, which picks the first writer node from a list of writers and, in case of failure, moves to the next in the list. If the failing node comes back to life, it is automatically added to the list of servers and becomes PRIMARY again when the cluster assigns this status. When no writers are available, write routing is stopped.
  • For read-only mode (i.e. read nodes): uses the round-robin algorithm between servers listed in the destinations variable. This mode splits read traffic evenly between all servers.

Additional routing_strategy algorithms :

  • next-available: similar to first-available, but in this case a failing node is marked as crashed and can’t get back into the rotation.
  • round-robin-with-fallback: same as round-robin, but adds the ability to use servers from the primary list (writers) to distribute the read traffic.

A sample configuration

For performance purposes, it’s recommended to set up MySQL Router in the same place as the application, considering one instance per application server.
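The file below was produced by bootstrapping the router against one of the cluster members, roughly like this (the host and account shown are illustrative):

$ mysqlrouter --bootstrap root@192.168.70.2:3306 --user=mysqlrouter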

Here you can see a sample configuration file auto-generated by --bootstrap functionality:

$ cat /etc/mysqlrouter/mysqlrouter.conf
# File automatically generated during MySQL Router bootstrap
[DEFAULT]
name=system
user=mysqlrouter
keyring_path=/var/lib/mysqlrouter/keyring
master_key_path=/etc/mysqlrouter/mysqlrouter.key
connect_timeout=30
read_timeout=30
[logger]
level = INFO
[metadata_cache:percona]
router_id=4
bootstrap_server_addresses=mysql://192.168.70.4:3306,mysql://192.168.70.3:3306,mysql://192.168.70.2:3306
user=mysql_router4_56igr8pxhz0m
metadata_cluster=percona
ttl=5
[routing:percona_default_rw]
bind_address=0.0.0.0
bind_port=6446
destinations=metadata-cache://percona/default?role=PRIMARY
routing_strategy=round-robin
protocol=classic
[routing:percona_default_ro]
bind_address=0.0.0.0
bind_port=6447
destinations=metadata-cache://percona/default?role=SECONDARY
routing_strategy=round-robin
protocol=classic
[routing:percona_default_x_rw]
bind_address=0.0.0.0
bind_port=64460
destinations=metadata-cache://percona/default?role=PRIMARY
routing_strategy=round-robin
protocol=x
[routing:percona_default_x_ro]
bind_address=0.0.0.0
bind_port=64470
destinations=metadata-cache://percona/default?role=SECONDARY
routing_strategy=round-robin
protocol=x

We are almost done now, only one post left. The final post is about our third component MySQL Shell, so please keep reading.

The post InnoDB Cluster in a Nutshell: Part 2 MySQL Router appeared first on Percona Database Performance Blog.

by Francisco Bordenave at July 16, 2018 01:07 PM

Jean-Jerome Schmidt

How to Control Replication Failover for MySQL and MariaDB

Automated failover is pretty much a must have for many applications - uptime is taken for granted. It’s quite hard to accept that an application is down for 20 or 30 minutes because someone has to be paged to log in and investigate the situation before taking action.

In the real world, replication setups tend to grow over time and become complex, sometimes messy. And there are constraints. For instance, not every node in a setup makes a good master candidate. Maybe the hardware differs, and some of the replicas have less powerful hardware as they are dedicated to handle specific types of workload? Maybe you are in the middle of a migration to a new MySQL version and some of the slaves have already been upgraded? You’d rather not have a master on a more recent version replicating to old replicas, as this can break replication. If you have two datacenters, one active and one for disaster recovery, you may prefer to pick master candidates only in the active datacenter, to keep the master close to the application hosts. Those are just example situations where you may find yourself in need of manual intervention during the failover process. Luckily, many failover tools have an option to take control of the process by using whitelists and blacklists. In this blog post, we’d like to show you some examples of how you can influence ClusterControl’s algorithm for picking master candidates.

Whitelist and Blacklist Configuration

ClusterControl gives you an option to define both whitelist and blacklist of replicas. A whitelist is a list of replicas which are intended to become master candidates. If none of them are available (either because they are down, or there are errant transactions, or there are other obstacles that prevent any of them from being promoted), failover will not be performed. In this way, you can define which hosts are available to become a master candidate. Blacklists, on the other hand, define which replicas are not suitable to become a master candidate.

Both of those lists can be defined in the cmon configuration file for a given cluster. For example, if your cluster has id of ‘1’, you want to edit ‘/etc/cmon.d/cmon_1.cnf’. For whitelist you will use ‘replication_failover_whitelist’ variable, for blacklist it will be a ‘replication_failover_blacklist’. Both accept a comma separated list of ‘host:port’.

Let’s consider the following replication setup. We have an active master (10.0.0.141) which has two replicas (10.0.0.142 and 10.0.0.143), both act as intermediate masters and have one replica each (10.0.0.144 and 10.0.0.147). We also have a standby master in a separate datacenter (10.0.0.145) which has a replica (10.0.0.146). Those hosts are intended to be used in case of a disaster. Replicas 10.0.0.146 and 10.0.0.147 act as backup hosts. See below screenshot.

Given that the second datacenter is only intended for disaster recovery, we don’t want any of those hosts to be promoted as master. In the worst case scenario, we will take manual action. The second datacenter’s infrastructure is not scaled to the size of the production datacenter (there are three fewer replicas in the DR datacenter), so manual actions are needed anyway before we can promote a host in the DR datacenter. We also would not like a backup replica (10.0.0.147) to be promoted. Nor do we want a third replica in the chain to be picked up as a master (even though it could be done with GTID).

Whitelist Configuration

We can configure either a whitelist or a blacklist to make sure that failover will be handled to our liking. In this particular setup, using a whitelist may be more suitable - we will define which hosts can be used for failover, and if someone adds a new host to the setup, it will not be taken under consideration as a master candidate until someone manually decides it is ok to use it and adds it to the whitelist. If we used a blacklist, adding a new replica somewhere in the chain could mean that such a replica could theoretically be automatically used for failover unless someone explicitly says it cannot be used. Let’s stay on the safe side and define a whitelist in our cluster configuration file (in this case it is /etc/cmon.d/cmon_1.cnf as we have just one cluster):

replication_failover_whitelist=10.0.0.141:3306,10.0.0.142:3306,10.0.0.143:3306

We have to make sure that the cmon process has been restarted to apply changes:

service cmon restart

Let’s assume our master has crashed and cannot be reached by ClusterControl. A failover job will be initiated:

The topology will look like below:

As you can see, the old master is disabled and ClusterControl will not attempt to automatically recover it. It is up to the user to check what has happened, copy any data which may not have been replicated to the master candidate and rebuild the old master:

Then it’s a matter of a few topology changes and we can bring the topology to the original state, just replacing 10.0.0.141 with 10.0.0.142:

Blacklist Configuration

Now we are going to see how the blacklist works. We mentioned that, in our example, it may not be the best option but we will try it for the sake of illustration. We will blacklist every host except 10.0.0.141, 10.0.0.142 and 10.0.0.143 as those are the hosts we want to see as master candidates.

replication_failover_blacklist=10.0.0.145:3306,10.0.0.146:3306,10.0.0.144:3306,10.0.0.147:3306

We will also restart the cmon process to apply configuration changes:

service cmon restart

The failover process is similar. Again, once the master crash is detected, ClusterControl will start a failover job.


When a Replica May Not be a Good Master Candidate

In this short section, we would like to discuss in more detail some of the cases in which you may not want to promote a given replica to become a new master. Hopefully, this will give you some ideas of the cases where you may need to consider introducing more manual control of the failover process.

Different MySQL Version

First, if your replica uses a different MySQL version than the master, it is not a good idea to promote it. Generally speaking, a more recent version is always a no-go, as replication from a new to an old MySQL version is not supported and may not work correctly. This is relevant mostly to major versions (for example, 8.0 replicating to 5.7), but the good practice is to avoid this setup altogether, even if we are talking about small version differences (5.7.x+1 -> 5.7.x). Replicating from a lower to a higher/newer version is supported, as it is a must for the upgrade process, but still, you would rather want to avoid this (for example, if your master is on 5.7.x+1 you would rather not replace it with a replica on 5.7.x).

Different Roles

You may assign different roles to your replicas. You can pick one of them to be available for developers to test their queries on a production dataset. You may use one of them for OLAP workloads. You may use one of them for backups. No matter what it is, typically you would not want to promote such a replica to master. All of those additional, non-standard workloads may cause performance problems due to the additional overhead. A good choice for a master candidate is a replica which is handling a “normal” load, more or less the same type of load as the current master. You can then be certain it will handle the master load after a failover, if it handled it before that.

Different Hardware Specifications

We mentioned different roles for replicas. It is not uncommon to see different hardware specifications too, especially in conjunction with different roles. For example, a backup slave most likely doesn’t have to be as powerful as a regular replica. Developers may also test their queries on a slower database than the production one (mostly because you would not expect the same level of concurrency on development and production databases) and, for example, the CPU core count can be reduced. Disaster recovery setups may also be reduced in size if their main role is to keep up with replication, and it is expected that the DR setup will have to be scaled (both vertically, by sizing up the instance, and horizontally, by adding more replicas) before traffic can be redirected to it.

Delayed Replicas

Some of the replicas may be delayed - it is a very good way of reducing recovery time if data has been lost, but it makes them very bad master candidates. If a replica is delayed by 30 minutes, you will either lose those 30 minutes of transactions, or you will have to wait (probably not 30 minutes as, most likely, the replica can catch up faster) for the replica to apply all delayed transactions. ClusterControl allows you to pick whether you want to wait or whether you want to failover immediately, but this would work ok only for a very small lag - tens of seconds at most. If failover is supposed to take minutes, there’s just no point in using such a replica, and therefore it’s a good idea to blacklist it.

Different Datacenter

We mentioned scaled-down DR setups, but even if your second datacenter is scaled to the size of production, it still may be a good idea to keep the failovers within a single DC only. For starters, your active application hosts may be located in the main datacenter, so moving the master to a standby DC would significantly increase latency for write queries. Also, in case of a network split, you may want to handle this situation manually. MySQL does not have a quorum mechanism built in, therefore it is kind of tricky to correctly handle (in an automatic way) network loss between two datacenters.

by krzysztof at July 16, 2018 10:33 AM

July 13, 2018

Peter Zaitsev

This Week in Data with Colin Charles 45: OSCON and Percona Live Europe 2018 Call for Papers

Colin Charles

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Hello again after the hiatus last week. I’m en route to Portland for OSCON, and am very excited as it is the conference’s 20th anniversary! I hope to see some of you at my talk on July 19.

On July 18, join me for a webinar: MariaDB 10.3 vs. MySQL 8.0 at 9:00 AM PDT (UTC-7) / 12:00 PM EDT (UTC-4). I’m also feverishly working on an update to MySQL vs. MariaDB: Reality Check, now that both MySQL 8.0 and MariaDB Server 10.3 are generally available.

Rather important: Percona Live Europe 2018 Call for Papers is now open. You can submit talk ideas until August 10, and the theme is Connect. Accelerate. Innovate.

Releases

Link List

Industry Updates

Upcoming appearances

  • OSCON – Portland, Oregon, USA – July 16-19 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 45: OSCON and Percona Live Europe 2018 Call for Papers appeared first on Percona Database Performance Blog.

by Colin Charles at July 13, 2018 10:45 PM

On MySQL and Intel Optane performance


Recently, Dimitri published the results of measuring MySQL 8.0 on an Intel Optane storage device. In this blog post, I wanted to look at this in more detail and explore the performance of MySQL 8, MySQL 5.7 and Percona Server for MySQL using a similar setup. The Intel Optane is a very capable device, so I was puzzled that Dimitri chose MySQL options that are either not safe or not recommended for production workloads.

Since we have an Intel Optane in our labs, I wanted to run a similar benchmark, but using settings that we would recommend our customers to use, namely:

  • use innodb_checksum
  • use innodb_doublewrite
  • use binary logs with sync_binlog=1
  • enable (by default) Performance Schema

I still used charset=latin1 (even though the default is utf8mb4 in MySQL 8), and I set the total size of the InnoDB log files to 30GB (as in Dimitri’s benchmark). This setting allocates big InnoDB log files to ensure there is no pressure from adaptive flushing. Though I have concerns about how this works in MySQL 8, that is a topic for another piece of research.

So let’s see how MySQL 8.0 performed with these settings, and compare it with MySQL 5.7 and Percona Server for MySQL 5.7.

I used an Intel Optane SSD 905P 960GB device on a server with two Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz processors (two sockets).

To highlight the performance difference I wanted to show, I used a single case: sysbench 8 tables 50M rows each (which is about ~120GB of data) and buffer pool 32GB. I ran sysbench oltp_read_write in 128 threads.
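For reference, a sketch of that sysbench invocation (connection options omitted; the run time and reporting interval are my assumptions, and the prepare step is analogous):

sysbench oltp_read_write --tables=8 --table-size=50000000 --threads=128 --time=3600 --report-interval=1 run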

First, let’s review the results for MySQL 8 vs MySQL 5.7

After achieving a steady state, we can see that MySQL 8 does not have ANY performance improvements over MySQL 5.7.

Let’s compare this with Percona Server for MySQL 5.7

MySQL 8 versus Percona Server with heavy IO application on Intel Optane

Percona Server for MySQL 5.7 shows about 60% performance improvement over both MySQL 5.7 and MySQL 8.

How did we achieve this? All our improvements are described here: https://www.percona.com/doc/percona-server/LATEST/performance/xtradb_performance_improvements_for_io-bound_highly-concurrent_workloads.html. In short:

  1. Parallel doublewrite.  In both MySQL 5.7 and MySQL 8 writes are serialized by writing to doublewrite.
  2. Multi-threaded LRU flusher. We reported and proposed a solution here https://bugs.mysql.com/bug.php?id=70500. However, Oracle have not incorporated the solution upstream.
  3. Single page eviction. This is another problematic area in MySQL’s flushing algorithm. The bug https://bugs.mysql.com/bug.php?id=81376 was reported over 2 years ago, but unfortunately it’s still overlooked.

Summarizing performance findings:

  • For Percona Server for MySQL during this workload, I observed 1.4 GB/sec reads and 815 MB/sec writes
  • For MySQL 5.7 and MySQL 8 the numbers are 824 MB/sec reads and 530 MB/sec writes.

My opinion is that Oracle focused on addressing the wrong performance problems in MySQL 8 and did not address the real issues. In this benchmark, using real production settings, MySQL 8 does not show any significant performance benefits over MySQL 5.7 for workloads characterized by heavy IO writes.

With this, I should admit that Intel Optane is a very performant storage device. By comparison, on an Intel 3600 SSD under the same workload, for Percona Server I am able to achieve only 2000 tps, which is 2.5 times slower than with Intel Optane.

Drawing some conclusions

So there are a few outcomes I can highlight:

  • Intel Optane is a very capable drive, it is easily the fastest of those we’ve tested so far
  • MySQL 8 is not able to utilize all the power of Intel Optane, unless you use unsafe settings (which to me is the equivalent of driving 200 MPH on a highway without working brakes)
  • Oracle has focused on addressing the wrong IO bottlenecks and has overlooked the real ones
  • To get all the benefits of Intel Optane performance, use a proper server—Percona Server for MySQL—which is able to utilize more IOPS from the device.

The post On MySQL and Intel Optane performance appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at July 13, 2018 03:25 PM

Jean-Jerome Schmidt

How to perform Schema Changes in MySQL & MariaDB in a Safe Way

Before you attempt to perform any schema changes on your production databases, you should make sure that you have a rock-solid rollback plan, and that your change procedure has been successfully tested and validated in a separate environment. At the same time, it’s your responsibility to make sure that the change causes no impact, or the least impact acceptable to the business. It’s definitely not an easy task.

In this article, we will take a look at how to perform database changes on MySQL and MariaDB in a controlled way. We will talk about some good habits in your day-to-day DBA work. We’ll focus on the prerequisites, the tasks during the actual operation, and the problems that you may face when you deal with database schema changes. We will also talk about open source tools that may help you in the process.

Test and rollback scenarios

Backup

There are many ways to lose your data. Schema upgrade failure is one of them. Unlike application code, you can’t drop a bundle of files and declare that a new version has been successfully deployed. You also can’t just put back an older set of files to roll back your changes. Of course, you can run another SQL script to change the database again, but there are cases when the only accurate way to roll back changes is by restoring the entire database from backup.

However, what if you can’t afford to rollback your database to the latest backup, or your maintenance window is not big enough (considering system performance), so you can’t perform a full database backup before the change?

One may have a sophisticated, redundant environment, but as long as data is modified in both primary and standby locations, there is not much to do about it. Many scripts can just be run once, or the changes are impossible to undo. Most of the SQL change code falls into two groups:

  • Run once – you can’t add the same column to the table twice.
  • Impossible to undo – once you’ve dropped that column, it’s gone. You could undoubtedly restore your database, but that’s not precisely an undo.

You can tackle this problem in at least two possible ways. One would be to enable the binary log and take a backup which is compatible with PITR. Such a backup has to be full, complete and consistent. For xtrabackup, as long as it contains a full dataset, it will be PITR-compatible. For mysqldump, there is an option to make it PITR-compatible too. For smaller changes, a variation of the mysqldump backup would be to take only a subset of the data to change. This can be done with the --where option. The backup should be part of the planned maintenance.

mysqldump -u root -p --lock-all-tables --where="employee_id=100" mydb employees > backup_table_tmp_change_07132018.sql

Another possibility is to use CREATE TABLE AS SELECT.

You can store data or simple structure changes in the form of a fixed temporary table. With this approach you will get a source if you need to rollback your changes. It may be quite handy if you don’t change much data. The rollback can be done by taking data out from it. If any failures occur while copying the data to the table, it is automatically dropped and not created, so make sure that your statement creates a copy you need.

Obviously, there are some limitations too.

Because the ordering of the rows in the underlying SELECT statements cannot always be determined, CREATE TABLE ... IGNORE SELECT and CREATE TABLE ... REPLACE SELECT are flagged as unsafe for statement-based replication. Such statements produce a warning in the error log when using statement-based mode and are written to the binary log using the row-based format when using MIXED mode.

A very simple example of such a method could be:

CREATE TABLE tmp_employees_change_07132018 AS SELECT * FROM employees WHERE employee_id=100;
UPDATE employees SET salary=120000 WHERE employee_id=100;
COMMIT;
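If the change later needs to be rolled back, the saved rows act as the source. A sketch of such a rollback (assuming salary was the only column modified):

UPDATE employees e
JOIN tmp_employees_change_07132018 t ON t.employee_id = e.employee_id
SET e.salary = t.salary;
COMMIT;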

Another interesting option may be the MariaDB flashback feature. When a wrong update or delete happens and you would like to revert to a state of the database (or just a table) at a certain point in time, you may use flashback.

Point-in-time rollback enables DBAs to recover data faster by rolling back transactions to a previous point in time rather than performing a restore from a backup. Based on ROW-based DML events, flashback can transform the binary log events and reverse their effects. That means it can help undo given row changes fast. For instance, it can change DELETE events to INSERTs and vice versa, and it will swap the WHERE and SET parts of UPDATE events. This simple idea can dramatically speed up recovery from certain types of mistakes or disasters. For those who are familiar with the Oracle database, it’s a well known feature. The limitation of MariaDB flashback is the lack of DDL support.
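As a sketch of the workflow (it assumes ROW-format binary logs; the binlog file name and positions are placeholders for your actual values), MariaDB’s mysqlbinlog generates the reversed events with the --flashback option:

# produce the reversed events for the offending range, review them, then apply
mysqlbinlog --flashback --start-position=4 --stop-position=568 \
    /var/lib/mysql/mysql-bin.000005 > flashback.sql
mysql -u root -p < flashback.sql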

Create a delayed replication slave

Since version 5.6, MySQL supports delayed replication: a slave server can deliberately lag behind the master by at least a specified amount of time. The default delay is 0 seconds. Use the MASTER_DELAY option of CHANGE MASTER TO to set the delay to N seconds:

CHANGE MASTER TO MASTER_DELAY = N;

It would be a good option if you didn’t have time to prepare a proper recovery scenario. You need a delay long enough to notice the problematic change. The advantage of this approach is that you don’t need to restore your database to extract the data needed to fix your change: the standby DB is up and running, ready to pick up the data, which minimizes the time needed.
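For instance, a one-hour delay (just an illustration; size the delay to your reaction time) would be configured on the slave as:

STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;
START SLAVE;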

Create an asynchronous slave which is not part of the cluster

When it comes to Galera cluster, testing changes is not easy. All nodes run the same data, and heavy load can harm flow control. So you not only need to check whether the changes were applied successfully, but also what the impact on the cluster state was. To make your test procedure as close as possible to the production workload, you may want to add an asynchronous slave to your cluster and run your test there. The test will not impact synchronization between cluster nodes, because technically it’s not part of the cluster, but you will have the option to check it with real data. Such a slave can be easily added from ClusterControl.

ClusterControl add asynchronous slave

As shown in the above screenshot, ClusterControl can automate the process of adding an asynchronous slave in a few ways. You can add the node to the cluster and delay the slave. To reduce the impact on the master, you can use an existing backup instead of the master as the data source when building the slave.

Clone database and measure time

A good test should be as close as possible to the production change. The best way to do this is to clone your existing environment.

ClusterControl Clone Cluster for test

Perform changes via replication

To have better control over your changes, you can apply them on a slave server ahead of time and then do the switchover. For statement-based replication this works fine, but for row-based replication it only works up to a certain degree. Row-based replication allows extra columns to exist at the end of the table, so as long as it can write the first columns, it will be fine. First apply the change to all slaves, then failover to one of the slaves, and then implement the change on the old master and attach it as a slave. If your modification involves inserting or removing a column in the middle of the table, it will not work with row-based replication.

Operation

During the maintenance window, we do not want application traffic on the database. Sometimes it is hard to shut down all applications spread over the whole company. Alternatively, we may want to allow only some specific hosts to access MySQL remotely (for example, the monitoring system or the backup server). For this purpose, we can use Linux packet filtering. To see which packet filtering rules are in place, we can run the following command:

iptables -L INPUT -v

To close the MySQL port on all interfaces we use:

iptables -A INPUT -p tcp --dport mysql -j DROP

and to open the MySQL port again after the maintenance window:

iptables -D INPUT -p tcp --dport mysql -j DROP

For those without root access, you can change max_connections to 1 or enable skip_networking.
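For example, limiting connections can be done on the fly (note that skip_networking is not dynamic and normally requires a configuration change and restart); remember the original value so you can restore it afterwards:

mysql> SHOW GLOBAL VARIABLES LIKE 'max_connections';
mysql> SET GLOBAL max_connections = 1;

After the maintenance window, restore the original value with another SET GLOBAL.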

Logging

To get the logging process started, use the tee command at the MySQL client prompt, like this:

mysql> tee /tmp/my.out;

That command tells MySQL to log both the input and output of your current MySQL session to a file named /tmp/my.out. Then execute your script file with the source command.

To get a better idea of your execution times, you can combine it with the profiler feature. Start the profiler with

SET profiling = 1;

Then execute your query. With

SHOW PROFILES;

you will see a list of queries the profiler has statistics for. Finally, you choose which query to examine with

SHOW PROFILE FOR QUERY 1;
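Putting the pieces together, a logged and profiled maintenance session could look like this (the script path is a placeholder):

mysql> tee /tmp/my.out;
mysql> SET profiling = 1;
mysql> source /tmp/schema_change_07132018.sql;
mysql> SHOW PROFILES;
mysql> SHOW PROFILE FOR QUERY 1;
mysql> notee;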

Schema migration tools

Many times, a straight ALTER on the master is not possible; in most cases it causes lag on the slave, and this may not be acceptable to the applications. What can be done, though, is to execute the change in a rolling mode. You can start with the slaves and, once the change is applied to a slave, promote one of the slaves to be the new master, demote the old master to a slave, and execute the change on it.

A tool that may help with such a task is Percona’s pt-online-schema-change. Pt-online-schema-change is straightforward: it creates a temporary table with the desired new schema (for instance, if we added an index or removed a column from a table). Then, it creates triggers on the old table. Those triggers mirror changes that happen on the original table to the new table during the schema change process. If a row is added to the original table, it is also added to the new one. The tool emulates the way MySQL alters tables internally, but it works on a copy of the table you wish to alter. This means the original table is not locked, and clients may continue to read and change data in it.

Likewise, if a row is modified or deleted on the old table, it is also applied in the new table. Then, a background process of copying data (using LOW_PRIORITY INSERT) between old and new table begins. Once data has been copied, RENAME TABLE is executed.
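A typical pt-online-schema-change invocation might look like this (the database, table and column names are made up for illustration; always do a dry run first):

# dry run first - no changes are made
pt-online-schema-change --alter "ADD COLUMN last_login DATETIME" \
  D=mydb,t=employees --dry-run
# then the actual change
pt-online-schema-change --alter "ADD COLUMN last_login DATETIME" \
  D=mydb,t=employees --execute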

Another interesting tool is gh-ost. Gh-ost creates a temporary table with the altered schema, just like pt-online-schema-change does, and executes INSERT queries to copy data from the old to the new table. However, it does not use triggers, which unfortunately may be the source of many limitations; instead, gh-ost uses the binary log stream to capture table changes and asynchronously applies them onto the ghost table. Once we have verified that gh-ost can execute our schema change correctly, it’s time to actually execute it. Keep in mind that you may need to manually drop old tables that were created by gh-ost while testing the migration. You can also use the --initially-drop-ghost-table and --initially-drop-old-table flags to ask gh-ost to do it for you. The final command to execute is exactly the same as the one used to test the change, we just add --execute to it.
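A gh-ost run could look roughly like this (host, credentials and the schema change itself are placeholders):

# by default gh-ost connects to a replica; see the gh-ost docs for master-only setups
gh-ost \
  --host=127.0.0.1 --user=ghost --password=ghostpass \
  --database=mydb --table=employees \
  --alter="ADD COLUMN last_login DATETIME" \
  --initially-drop-ghost-table --initially-drop-old-table \
  --execute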

pt-online-schema-change and gh-ost are very popular among Galera users. Nevertheless, Galera has some additional options. The two methods, Total Order Isolation (TOI) and Rolling Schema Upgrade (RSU), both have their pros and cons.

TOI - This is the default DDL replication method. The node that originates the writeset detects DDL at parsing time and sends out a replication event for the SQL statement before even starting the DDL processing. Schema upgrades run on all cluster nodes in the same total order sequence, preventing other transactions from committing for the duration of the operation. This method is good when you want your online schema upgrades to replicate through the cluster and don’t mind locking the entire table (similar to how default schema changes happened in MySQL).

SET GLOBAL wsrep_OSU_method='TOI';

RSU - perform the schema upgrades locally. In this method, your changes affect only the node on which they are run; the changes do not replicate to the rest of the cluster. This method is good for non-conflicting operations, and it will not slow down the cluster.

SET GLOBAL wsrep_OSU_method='RSU';

While the node processes the schema upgrade, it desynchronizes with the cluster. When it finishes processing the schema upgrade, it applies delayed replication events and synchronizes itself with the cluster. This could be a good option to run heavy index creations.
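wsrep_OSU_method can also be set at the session level, so a rolling RSU index build could be performed node by node along these lines (the table and index names are illustrative):

-- on each node in turn:
SET SESSION wsrep_OSU_method='RSU';
ALTER TABLE employees ADD INDEX idx_last_login (last_login);
SET SESSION wsrep_OSU_method='TOI';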

Conclusion

We presented here several different methods that may help you plan your schema changes. Of course, it all depends on your application and business requirements. You can design your change plan and perform the necessary tests, but there is still a small chance that something will go wrong. According to Murphy’s law, “things will go wrong in any given situation, if you give them a chance”. So make sure you try out different ways of performing these changes, and pick the one that you are most comfortable with.

by Bart Oles at July 13, 2018 09:29 AM

July 12, 2018

MariaDB AB

Ask the Experts – Extended Q&A on MariaDB TX 3.0 and MariaDB Server 10.3


Until the recent release of MariaDB TX 3.0, several enterprise database features were available only from proprietary vendors such as Oracle, Microsoft and IBM. It’s no wonder that our first webinar on MariaDB TX 3.0 drew a large crowd – and spurred a lively discussion with lots of great follow-up questions.

That’s why we recently hosted an Ask the Experts webinar. We started with a brief overview of those new enterprise-grade features, including temporal tables and queries, the MyRocks and Spider storage engines, and more. Then we dug into the nitty-gritty: a deep-dive Q&A session with MariaDB senior product and engineering leads.

Get answers to your implementation questions and understand how MariaDB TX 3.0 (which includes MariaDB Server 10.3 at its core) can help your organization achieve cost savings, simplify migration, and support multiple workloads with different characteristics, whether transactional, analytical, write-intensive or extreme scale, and more.

Watch the recording on-demand

And if you missed our What’s New in MariaDB TX 3.0 webinar (or just want a review as you head into the Q&A session), watch it on demand anytime.


by MariaDB Team at July 12, 2018 07:59 PM

Serge Frezefond

Porting this Oracle MySQL feature to MariaDB would be great ;-)

Oracle has done great technical work with MySQL. Specifically, a nice job has been done around security. There is one useful feature that exists in Oracle MySQL and currently does not exist in MariaDB: Oracle MySQL offers the possibility to generate asymmetric key pairs from within the server. It is then possible to use [...]

by Serge at July 12, 2018 03:05 PM

Peter Zaitsev

Why MySQL Stored Procedures, Functions and Triggers Are Bad For Performance


MySQL stored procedures, functions and triggers are tempting constructs for application developers. However, as I discovered, there can be an impact on database performance when using MySQL stored routines. Not being entirely sure of what I was seeing during a customer visit, I set out to create some simple tests to measure the impact of triggers on database performance. The outcome might surprise you.

Why stored routines are not optimal performance-wise: the short version

Recently, I worked with a customer to profile the performance of triggers and stored routines. What I’ve learned about stored routines: “dead” code (the code in a branch which will never run) can still significantly slow down the response time of a function/procedure/trigger. We will need to be careful to clean up what we do not need.

Profiling MySQL stored functions

Let’s compare these four simple stored functions (in MySQL 5.7):

Function 1:

CREATE DEFINER=`root`@`localhost` FUNCTION `func1`() RETURNS int(11)
BEGIN
	declare r int default 0;
RETURN r;
END

This function simply declares a variable and returns it. It is a dummy function.

Function 2:

CREATE DEFINER=`root`@`localhost` FUNCTION `func2`() RETURNS int(11)
BEGIN
    declare r int default 0;
    IF 1=2
    THEN
		select levenshtein_limit_n('test finc', 'test func', 1000) into r;
    END IF;
RETURN r;
END

This function calls another function, levenshtein_limit_n (which calculates the Levenshtein distance). But wait: this code will never run, as the condition IF 1=2 will never be true. So this is effectively the same as function 1.

Function 3:

CREATE DEFINER=`root`@`localhost` FUNCTION `func3`() RETURNS int(11)
BEGIN
    declare r int default 0;
    IF 1=2 THEN
		select levenshtein_limit_n('test finc', 'test func', 1) into r;
    END IF;
    IF 2=3 THEN
		select levenshtein_limit_n('test finc', 'test func', 10) into r;
    END IF;
    IF 3=4 THEN
		select levenshtein_limit_n('test finc', 'test func', 100) into r;
    END IF;
    IF 4=5 THEN
		select levenshtein_limit_n('test finc', 'test func', 1000) into r;
    END IF;
RETURN r;
END

Here there are four conditions, and none of them will ever be true: there are four calls of “dead” code. The result of the function call for function 3 will be the same as for function 2 and function 1.

Function 4:

CREATE DEFINER=`root`@`localhost` FUNCTION `func3_nope`() RETURNS int(11)
BEGIN
    declare r int default 0;
    IF 1=2 THEN
		select does_not_exit('test finc', 'test func', 1) into r;
    END IF;
    IF 2=3 THEN
		select does_not_exit('test finc', 'test func', 10) into r;
    END IF;
    IF 3=4 THEN
		select does_not_exit('test finc', 'test func', 100) into r;
    END IF;
    IF 4=5 THEN
		select does_not_exit('test finc', 'test func', 1000) into r;
    END IF;
RETURN r;
END

This is the same as function 3, but the function we are calling does not exist. Well, it does not matter, as select does_not_exit will never run.

So all the functions will always return 0. We would expect the performance of these functions to be the same or very similar. Surprisingly, that is not the case! To measure the performance, I used the benchmark() function to run the same function 1M times. Here are the results:

+-----------------------------+
| benchmark(1000000, func1()) |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (1.75 sec)
+-----------------------------+
| benchmark(1000000, func2()) |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (2.45 sec)
+-----------------------------+
| benchmark(1000000, func3()) |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (3.85 sec)
+----------------------------------+
| benchmark(1000000, func3_nope()) |
+----------------------------------+
|                                0 |
+----------------------------------+
1 row in set (3.85 sec)

As we can see, func3 (with four dead code calls which will never be executed, otherwise identical to func1) runs more than 2x slower than func1(); func3_nope() is identical in terms of response time to func3().

Visualizing all system calls from functions

To figure out what is happening inside the function calls, I used performance_schema / sys schema to create a trace with the ps_trace_thread() procedure:

  1. Get the thread_id for the MySQL connection:
    mysql> select THREAD_ID from performance_schema.threads where processlist_id = connection_id();
    +-----------+
    | THREAD_ID |
    +-----------+
    |        49 |
    +-----------+
    1 row in set (0.00 sec)
  2. Run ps_trace_thread in another connection passing the thread_id=49:
    mysql> CALL sys.ps_trace_thread(49, concat('/var/lib/mysql-files/stack-func1-run1.dot'), 10, 0, TRUE, TRUE, TRUE);
    +--------------------+
    | summary            |
    +--------------------+
    | Disabled 0 threads |
    +--------------------+
    1 row in set (0.00 sec)
    +---------------------------------------------+
    | Info                                        |
    +---------------------------------------------+
    | Data collection starting for THREAD_ID = 49 |
    +---------------------------------------------+
    1 row in set (0.00 sec)
  3. At that point I switched to the original connection (thread_id=49) and run:
    mysql> select func1();
    +---------+
    | func1() |
    +---------+
    |       0 |
    +---------+
    1 row in set (0.00 sec)
  4. The sys.ps_trace_thread procedure collected the data (for 10 seconds, during which I ran select func1()), then it finished its collection and created the dot file:
    +-----------------------------------------------------------------------+
    | Info                                                                  |
    +-----------------------------------------------------------------------+
    | Stack trace written to /var/lib/mysql-files/stack-func3nope-new12.dot |
    +-----------------------------------------------------------------------+
    1 row in set (9.21 sec)
    +-------------------------------------------------------------------------------+
    | Convert to PDF                                                                |
    +-------------------------------------------------------------------------------+
    | dot -Tpdf -o /tmp/stack_49.pdf /var/lib/mysql-files/stack-func3nope-new12.dot |
    +-------------------------------------------------------------------------------+
    1 row in set (9.21 sec)
    +-------------------------------------------------------------------------------+
    | Convert to PNG                                                                |
    +-------------------------------------------------------------------------------+
    | dot -Tpng -o /tmp/stack_49.png /var/lib/mysql-files/stack-func3nope-new12.dot |
    +-------------------------------------------------------------------------------+
    1 row in set (9.21 sec)
    Query OK, 0 rows affected (9.45 sec)

I repeated these steps for all the functions above and then created charts of the commands.

Here are the results:

Func1()

Execution map for func1()

Func2()

Execution map for func2()

Func3()

Execution map for func3()

As we can see, there is an sp/jump_if_not call for every “if” check, followed by an opening tables statement (which is quite interesting). So parsing the “IF” condition makes a difference.

For MySQL 8.0, we can also check the MySQL source code documentation for stored routines, which describes how this is implemented. It reads:

Flow Analysis Optimizations
After code is generated, the low level sp_instr instructions are optimized. The optimization focuses on two areas:

Dead code removal,
Jump shortcut resolution.
These two optimizations are performed together, as they both are a problem involving flow analysis in the graph that represents the generated code.

The code that implements these optimizations is sp_head::optimize().

However, this does not explain why it executes “opening tables”. I have filed a bug.

When slow functions actually make a difference

Well, if we do not plan to run one million of those stored functions, we will never even notice the difference. However, where it will make a difference is... inside a trigger. Let’s say that we have a trigger on a table: every time we update that table, it executes a trigger to update another field. Here is an example: let’s say we have a table called “form” and we simply need to update its creation date:

mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (0.31 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

That is good and fast. Now we create a trigger which will call our dummy func1():

CREATE DEFINER=`root`@`localhost` TRIGGER `test`.`form_AFTER_UPDATE`
AFTER UPDATE ON `form`
FOR EACH ROW
BEGIN
	declare r int default 0;
	select func1() into r;
END

Now repeat the update. Remember: it does not change the result of the update as we do not really do anything inside the trigger.

mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (0.90 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

Just adding a dummy trigger will add 2x overhead: the next trigger, which does not even run a function, introduces a slowdown:

CREATE DEFINER=`root`@`localhost` TRIGGER `test`.`form_AFTER_UPDATE`
AFTER UPDATE ON `form`
FOR EACH ROW
BEGIN
	declare r int default 0;
END
mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (0.52 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

Now, let’s use func3 (which has “dead” code and is equivalent to func1):

CREATE DEFINER=`root`@`localhost` TRIGGER `test`.`form_AFTER_UPDATE`
AFTER UPDATE ON `form`
FOR EACH ROW
BEGIN
	declare r int default 0;
	select func3() into r;
END
mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (1.06 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

However, running the code from func3 inside the trigger (instead of calling the function) speeds up the update:

CREATE DEFINER=`root`@`localhost` TRIGGER `test`.`form_AFTER_UPDATE`
AFTER UPDATE ON `form`
FOR EACH ROW
BEGIN
    declare r int default 0;
    IF 1=2 THEN
		select levenshtein_limit_n('test finc', 'test func', 1) into r;
    END IF;
    IF 2=3 THEN
		select levenshtein_limit_n('test finc', 'test func', 10) into r;
    END IF;
    IF 3=4 THEN
		select levenshtein_limit_n('test finc', 'test func', 100) into r;
    END IF;
    IF 4=5 THEN
		select levenshtein_limit_n('test finc', 'test func', 1000) into r;
    END IF;
END
mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (0.66 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

Memory allocation

Even if the code will never run, MySQL still needs to parse the stored routine (or trigger) code for every execution, which can potentially lead to a memory leak, as described in this bug.

Conclusion

Stored routines and trigger events are parsed when they are executed. Even “dead” code that will never run can significantly affect the performance of bulk operations (e.g. when running this inside a trigger). That also means that disabling a trigger by setting a “flag” (e.g. if @trigger_disable = 0 then ...) can still affect the performance of bulk operations.
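To illustrate that last point, here is a sketch of the “flag” pattern using the same hypothetical form table as above; even with @trigger_disable set to 1, the whole body (including everything inside the IF branch) is still parsed for every affected row:

CREATE DEFINER=`root`@`localhost` TRIGGER `test`.`form_AFTER_UPDATE`
AFTER UPDATE ON `form`
FOR EACH ROW
BEGIN
    IF @trigger_disable = 0 THEN
        -- the real trigger logic would go here
        SET @rows_touched = IFNULL(@rows_touched, 0) + 1;
    END IF;
END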

The post Why MySQL Stored Procedures, Functions and Triggers Are Bad For Performance appeared first on Percona Database Performance Blog.

by Alexander Rubin at July 12, 2018 01:19 PM

July 11, 2018

Jean-Jerome Schmidt

Galera Cluster Recovery 101 - A Deep Dive into Network Partitioning

One of the cool features in Galera is automatic node provisioning and membership control. If a node fails or loses communication, it will be automatically evicted from the cluster and remain non-operational. As long as the majority of nodes is still communicating (Galera calls this the PC, or primary component), there is a very high chance the failed node will be able to automatically rejoin, resync and resume replication once the connectivity is back.

Generally, all Galera nodes are equal: they hold the same data set and the same master role, capable of handling reads and writes simultaneously, thanks to the Galera group communication and certification-based replication plugin. Therefore, there is actually no failover from the database point of view, due to this equilibrium. Failover is only required on the application side, to skip the non-operational nodes while the cluster is partitioned.

In this blog post, we are going to look into how Galera Cluster performs node and cluster recovery in case a network partition happens. Just as a side note, we covered a similar topic in this blog post some time back. Codership has explained Galera's recovery concept in great detail in the documentation page, Node Failure and Recovery.

Node Failure and Eviction

In order to understand recovery, we first have to understand how Galera detects node failure and handles the eviction process. Let's put this into a controlled test scenario so we can understand the eviction process better. Suppose we have a three-node Galera Cluster, as illustrated below:

The following command can be used to retrieve our Galera provider options:

mysql> SHOW VARIABLES LIKE 'wsrep_provider_options'\G

It's a long list, but we just need to focus on some of the parameters to explain the process:

evs.inactive_check_period = PT0.5S; 
evs.inactive_timeout = PT15S; 
evs.keepalive_period = PT1S; 
evs.suspect_timeout = PT5S; 
evs.view_forget_timeout = P1D;
gmcast.peer_timeout = PT3S;

First of all, Galera follows ISO 8601 formatting to represent durations. P1D means the duration is one day, while PT15S means the duration is 15 seconds (note the time designator, T, that precedes the time value). For example, if one wanted to increase evs.view_forget_timeout to one and a half days, one would set P1DT12H, or PT36H.
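If you need different values, these parameters can be adjusted through wsrep_provider_options in the configuration file; a sketch (the values below are examples only, not recommendations):

[mysqld]
wsrep_provider_options="evs.suspect_timeout=PT10S;evs.inactive_timeout=PT30S;gmcast.peer_timeout=PT5S"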

Assuming none of the hosts have been configured with firewall rules, we use the following script, called block_galera.sh, on galera2 to simulate a network failure to/from this node:

#!/bin/bash
# block_galera.sh
# galera2, 192.168.55.172

iptables -I INPUT -m tcp -p tcp --dport 4567 -j REJECT
iptables -I INPUT -m tcp -p tcp --dport 3306 -j REJECT
iptables -I OUTPUT -m tcp -p tcp --dport 4567 -j REJECT
iptables -I OUTPUT -m tcp -p tcp --dport 3306 -j REJECT
# print timestamp
date

By executing the script, we get the following output:

$ ./block_galera.sh
Wed Jul  4 16:46:02 UTC 2018

The reported timestamp can be considered the start of the cluster partitioning: we lose galera2, while galera1 and galera3 are still online and accessible. At this point, our Galera Cluster architecture looks something like this:

From Partitioned Node Perspective

On galera2, you will see some printouts inside the MySQL error log. Let's break them out into several parts. The downtime started around 16:46:02 UTC, and after gmcast.peer_timeout=PT3S, the following appears:

2018-07-04 16:46:05 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') connection to peer 8b2041d6 with addr tcp://192.168.55.173:4567 timed out, no messages seen in PT3S
2018-07-04 16:46:05 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.55.173:4567
2018-07-04 16:46:06 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') connection to peer 737422d6 with addr tcp://192.168.55.171:4567 timed out, no messages seen in PT3S
2018-07-04 16:46:06 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 8b2041d6 (tcp://192.168.55.173:4567), attempt 0

After evs.suspect_timeout = PT5S passes, both galera1 and galera3 are suspected as dead by galera2:

2018-07-04 16:46:07 140454904243968 [Note] WSREP: evs::proto(62116b35, OPERATIONAL, view_id(REG,62116b35,54)) suspecting node: 8b2041d6
2018-07-04 16:46:07 140454904243968 [Note] WSREP: evs::proto(62116b35, OPERATIONAL, view_id(REG,62116b35,54)) suspected node without join message, declaring inactive
2018-07-04 16:46:07 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 737422d6 (tcp://192.168.55.171:4567), attempt 0
2018-07-04 16:46:08 140454904243968 [Note] WSREP: evs::proto(62116b35, GATHER, view_id(REG,62116b35,54)) suspecting node: 737422d6
2018-07-04 16:46:08 140454904243968 [Note] WSREP: evs::proto(62116b35, GATHER, view_id(REG,62116b35,54)) suspected node without join message, declaring inactive

Then, Galera will revise the current cluster view and the position of this node:

2018-07-04 16:46:09 140454904243968 [Note] WSREP: view(view_id(NON_PRIM,62116b35,54) memb {
        62116b35,0
} joined {
} left {
} partitioned {
        737422d6,0
        8b2041d6,0
})
2018-07-04 16:46:09 140454904243968 [Note] WSREP: view(view_id(NON_PRIM,62116b35,55) memb {
        62116b35,0
} joined {
} left {
} partitioned {
        737422d6,0
        8b2041d6,0
})

With the new cluster view, Galera will perform quorum calculation to decide whether this node is part of the primary component. If the new component sees "primary = no", Galera will demote the local node state from SYNCED to OPEN:

2018-07-04 16:46:09 140454288942848 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-07-04 16:46:09 140454288942848 [Note] WSREP: Flow-control interval: [16, 16]
2018-07-04 16:46:09 140454288942848 [Note] WSREP: Trying to continue unpaused monitor
2018-07-04 16:46:09 140454288942848 [Note] WSREP: Received NON-PRIMARY.
2018-07-04 16:46:09 140454288942848 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 2753699)

With the latest change on the cluster view and node state, Galera returns the post-eviction cluster view and global state as below:

2018-07-04 16:46:09 140454222194432 [Note] WSREP: New cluster view: global state: 55238f52-41ee-11e8-852f-3316bdb654bc:2753699, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2018-07-04 16:46:09 140454222194432 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

You can see that the following global status values of galera2 changed during this period:

mysql> SELECT * FROM information_schema.global_status WHERE variable_name IN ('WSREP_CLUSTER_STATUS','WSREP_LOCAL_STATE_COMMENT','WSREP_CLUSTER_SIZE','WSREP_EVS_DELAYED','WSREP_READY');
+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| VARIABLE_NAME             | VARIABLE_VALUE                                                                                                                    |
+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| WSREP_CLUSTER_SIZE        | 1                                                                                                                                 |
| WSREP_CLUSTER_STATUS      | non-Primary                                                                                                                       |
| WSREP_EVS_DELAYED         | 737422d6-7db3-11e8-a2a2-bbe98913baf0:tcp://192.168.55.171:4567:1,8b2041d6-7f62-11e8-87d5-12a76678131f:tcp://192.168.55.173:4567:2 |
| WSREP_LOCAL_STATE_COMMENT | Initialized                                                                                                                       |
| WSREP_READY               | OFF                                                                                                                               |
+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+

At this point, the MySQL/MariaDB server on galera2 is still accessible (the database is listening on 3306 and Galera on 4567), and you can query the mysql system tables and list out the databases and tables. However, when you jump into the non-system tables and make a simple query like this:

mysql> SELECT * FROM sbtest1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

You will immediately get an error indicating that WSREP is loaded but not ready for use by this node, as reported by the wsrep_ready status. This is due to the node losing its connection to the Primary Component, so it enters the non-operational state (the local node status was changed from SYNCED to OPEN). Data reads from nodes in a non-operational state are considered stale, unless you set wsrep_dirty_reads=ON to permit reads; even then, Galera still rejects any command that modifies or updates the database.
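For example, if stale reads are acceptable for a particular session (say, for reporting), a sketch of permitting them would be:

mysql> SET SESSION wsrep_dirty_reads=ON;
mysql> SELECT * FROM sbtest1 LIMIT 1;

Writes will still be rejected until the node rejoins the Primary Component.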

Finally, Galera will keep listening for, and reconnecting to, other members in the background indefinitely:

2018-07-04 16:47:12 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 8b2041d6 (tcp://192.168.55.173:4567), attempt 30
2018-07-04 16:47:13 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 737422d6 (tcp://192.168.55.171:4567), attempt 30
2018-07-04 16:48:20 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 8b2041d6 (tcp://192.168.55.173:4567), attempt 60
2018-07-04 16:48:22 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 737422d6 (tcp://192.168.55.171:4567), attempt 60

The eviction process performed by Galera group communication for the partitioned node during a network issue can be summarized as follows:

  1. Disconnects from the cluster after gmcast.peer_timeout.
  2. Suspects other nodes after evs.suspect_timeout.
  3. Retrieves the new cluster view.
  4. Performs quorum calculation to determine the node's state.
  5. Demotes the node from SYNCED to OPEN.
  6. Attempts to reconnect to the primary component (other Galera nodes) in the background.

From Primary Component Perspective

On galera1 and galera3 respectively, after gmcast.peer_timeout=PT3S, the following appears in the MySQL error log:

2018-07-04 16:46:05 139955510687488 [Note] WSREP: (8b2041d6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.55.172:4567
2018-07-04 16:46:06 139955510687488 [Note] WSREP: (8b2041d6, 'tcp://0.0.0.0:4567') reconnecting to 62116b35 (tcp://192.168.55.172:4567), attempt 0

After evs.suspect_timeout = PT5S passes, galera2 is suspected as dead by galera3 (and galera1):

2018-07-04 16:46:10 139955510687488 [Note] WSREP: evs::proto(8b2041d6, OPERATIONAL, view_id(REG,62116b35,54)) suspecting node: 62116b35
2018-07-04 16:46:10 139955510687488 [Note] WSREP: evs::proto(8b2041d6, OPERATIONAL, view_id(REG,62116b35,54)) suspected node without join message, declaring inactive

Galera checks whether the other nodes respond to group communication; on galera3, it finds that galera1 is in a primary and stable state:

2018-07-04 16:46:11 139955510687488 [Note] WSREP: declaring 737422d6 at tcp://192.168.55.171:4567 stable
2018-07-04 16:46:11 139955510687488 [Note] WSREP: Node 737422d6 state prim

Galera revises the cluster view of this node (galera3):

2018-07-04 16:46:11 139955510687488 [Note] WSREP: view(view_id(PRIM,737422d6,55) memb {
        737422d6,0
        8b2041d6,0
} joined {
} left {
} partitioned {
        62116b35,0
})
2018-07-04 16:46:11 139955510687488 [Note] WSREP: save pc into disk

Galera then removes the partitioned node from the Primary Component:

2018-07-04 16:46:11 139955510687488 [Note] WSREP: forgetting 62116b35 (tcp://192.168.55.172:4567)

The new Primary Component now consists of two nodes, galera1 and galera3:

2018-07-04 16:46:11 139955502294784 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2

The Primary Component members will then exchange state with each other to agree on the new cluster view and global state:

2018-07-04 16:46:11 139955502294784 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-07-04 16:46:11 139955510687488 [Note] WSREP: (8b2041d6, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-07-04 16:46:11 139955502294784 [Note] WSREP: STATE EXCHANGE: sent state msg: b3d38100-7f66-11e8-8e70-8e3bf680c993
2018-07-04 16:46:11 139955502294784 [Note] WSREP: STATE EXCHANGE: got state msg: b3d38100-7f66-11e8-8e70-8e3bf680c993 from 0 (192.168.55.171)
2018-07-04 16:46:11 139955502294784 [Note] WSREP: STATE EXCHANGE: got state msg: b3d38100-7f66-11e8-8e70-8e3bf680c993 from 1 (192.168.55.173)

Galera calculates and verifies the quorum of the state exchange between online members:

2018-07-04 16:46:11 139955502294784 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 27,
        members    = 2/2 (joined/total),
        act_id     = 2753703,
        last_appl. = 2753606,
        protocols  = 0/8/3 (gcs/repl/appl),
        group UUID = 55238f52-41ee-11e8-852f-3316bdb654bc
2018-07-04 16:46:11 139955502294784 [Note] WSREP: Flow-control interval: [23, 23]
2018-07-04 16:46:11 139955502294784 [Note] WSREP: Trying to continue unpaused monitor

Galera updates the new cluster view and global state after galera2 eviction:

2018-07-04 16:46:11 139955214169856 [Note] WSREP: New cluster view: global state: 55238f52-41ee-11e8-852f-3316bdb654bc:2753703, view# 28: Primary, number of nodes: 2, my index: 1, protocol version 3
2018-07-04 16:46:11 139955214169856 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-07-04 16:46:11 139955214169856 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-07-04 16:46:11 139955214169856 [Note] WSREP: Assign initial position for certification: 2753703, protocol version: 3
2018-07-04 16:46:11 139956691814144 [Note] WSREP: Service thread queue flushed.

Galera then cleans up the partitioned node (galera2) from the active list:

2018-07-04 16:46:14 139955510687488 [Note] WSREP: cleaning up 62116b35 (tcp://192.168.55.172:4567)

At this point, both galera1 and galera3 will be reporting similar global status:

mysql> SELECT * FROM information_schema.global_status WHERE variable_name IN ('WSREP_CLUSTER_STATUS','WSREP_LOCAL_STATE_COMMENT','WSREP_CLUSTER_SIZE','WSREP_EVS_DELAYED','WSREP_READY');
+---------------------------+------------------------------------------------------------------+
| VARIABLE_NAME             | VARIABLE_VALUE                                                   |
+---------------------------+------------------------------------------------------------------+
| WSREP_CLUSTER_SIZE        | 2                                                                |
| WSREP_CLUSTER_STATUS      | Primary                                                          |
| WSREP_EVS_DELAYED         | 1491abd9-7f6d-11e8-8930-e269b03673d8:tcp://192.168.55.172:4567:1 |
| WSREP_LOCAL_STATE_COMMENT | Synced                                                           |
| WSREP_READY               | ON                                                               |
+---------------------------+------------------------------------------------------------------+

They list the problematic member in the wsrep_evs_delayed status. Since the local state is "Synced", these nodes are operational, and you can redirect client connections from galera2 to any of them. If this step is inconvenient, consider putting a load balancer in front of the database to simplify the connection endpoint for the clients.
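If you script that redirection (or a load balancer health check), the same status variables shown above can drive the decision; a minimal probe might be:

mysql> SHOW STATUS LIKE 'wsrep_local_state_comment';

Only nodes reporting Synced should receive client traffic.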

Node Recovery and Joining

A partitioned Galera node will keep attempting to establish a connection with the Primary Component indefinitely. Let's flush the iptables rules on galera2 to let it connect to the remaining nodes:

# on galera2
$ iptables -F

Once the node is able to connect to one of the other nodes, Galera will start re-establishing the group communication automatically:

2018-07-09 10:46:34 140075962705664 [Note] WSREP: (1491abd9, 'tcp://0.0.0.0:4567') connection established to 8b2041d6 tcp://192.168.55.173:4567
2018-07-09 10:46:34 140075962705664 [Note] WSREP: (1491abd9, 'tcp://0.0.0.0:4567') connection established to 737422d6 tcp://192.168.55.171:4567
2018-07-09 10:46:34 140075962705664 [Note] WSREP: declaring 737422d6 at tcp://192.168.55.171:4567 stable
2018-07-09 10:46:34 140075962705664 [Note] WSREP: declaring 8b2041d6 at tcp://192.168.55.173:4567 stable

Node galera2 will then connect to one of the Primary Component members (in this case galera1, node ID 737422d6) to get the current cluster view and node states:

2018-07-09 10:46:34 140075962705664 [Note] WSREP: Node 737422d6 state prim
2018-07-09 10:46:34 140075962705664 [Note] WSREP: view(view_id(PRIM,1491abd9,142) memb {
        1491abd9,0
        737422d6,0
        8b2041d6,0
} joined {
} left {
} partitioned {
})
2018-07-09 10:46:34 140075962705664 [Note] WSREP: save pc into disk

Galera will then perform state exchange with the rest of the members that can form the Primary Component:

2018-07-09 10:46:34 140075954312960 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE EXCHANGE: sent state msg: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE EXCHANGE: got state msg: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f from 0 (192.168.55.172)
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE EXCHANGE: got state msg: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f from 1 (192.168.55.171)
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE EXCHANGE: got state msg: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f from 2 (192.168.55.173)

The state exchange allows galera2 to calculate the quorum and produce the following result:

2018-07-09 10:46:34 140075954312960 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 71,
        members    = 2/3 (joined/total),
        act_id     = 2836958,
        last_appl. = 0,
        protocols  = 0/8/3 (gcs/repl/appl),
        group UUID = 55238f52-41ee-11e8-852f-3316bdb654bc

Galera will then promote the local node state from OPEN to PRIMARY, and establish the node's connection to the Primary Component:

2018-07-09 10:46:34 140075954312960 [Note] WSREP: Flow-control interval: [28, 28]
2018-07-09 10:46:34 140075954312960 [Note] WSREP: Trying to continue unpaused monitor
2018-07-09 10:46:34 140075954312960 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2836958)

As reported by the above line, Galera calculates how far the node is behind the cluster. This node requires a state transfer to catch up to writeset number 2836958 from 2761994:

2018-07-09 10:46:34 140075929970432 [Note] WSREP: State transfer required:
        Group state: 55238f52-41ee-11e8-852f-3316bdb654bc:2836958
        Local state: 55238f52-41ee-11e8-852f-3316bdb654bc:2761994
2018-07-09 10:46:34 140075929970432 [Note] WSREP: New cluster view: global state: 55238f52-41ee-11e8-852f-3316bdb654bc:2836958, view# 72: Primary, number of nodes:
3, my index: 0, protocol version 3
2018-07-09 10:46:34 140075929970432 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-07-09 10:46:34 140075929970432 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-07-09 10:46:34 140075929970432 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-07-09 10:46:34 140075929970432 [Note] WSREP: Assign initial position for certification: 2836958, protocol version: 3

Galera prepares the IST listener on port 4568 on this node and asks any Synced node in the cluster to become a donor. In this case, Galera automatically picked galera3 (192.168.55.173), but it could also pick a donor from the list under wsrep_sst_donor (if defined) for the syncing operation:

2018-07-09 10:46:34 140075996276480 [Note] WSREP: Service thread queue flushed.
2018-07-09 10:46:34 140075929970432 [Note] WSREP: IST receiver addr using tcp://192.168.55.172:4568
2018-07-09 10:46:34 140075929970432 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168.55.172:4568
2018-07-09 10:46:34 140075954312960 [Note] WSREP: Member 0.0 (192.168.55.172) requested state transfer from '*any*'. Selected 2.0 (192.168.55.173)(SYNCED) as donor.
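As a side note, if you want to control donor selection rather than let Galera pick one, wsrep_sst_donor can be set in the configuration; a sketch (the node name must match the donor's wsrep_node_name value, and the trailing comma lets Galera fall back to any other available donor):

[mysqld]
wsrep_sst_donor="galera3,"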

It will then change the local node state from PRIMARY to JOINER. At this stage, galera2's state transfer request is granted, and it starts caching write-sets:

2018-07-09 10:46:34 140075954312960 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2836958)
2018-07-09 10:46:34 140075929970432 [Note] WSREP: Requesting state transfer: success, donor: 2
2018-07-09 10:46:34 140075929970432 [Note] WSREP: GCache history reset: 55238f52-41ee-11e8-852f-3316bdb654bc:2761994 -> 55238f52-41ee-11e8-852f-3316bdb654bc:2836958
2018-07-09 10:46:34 140075929970432 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset

Node galera2 starts receiving the missing writesets from the selected donor's gcache (galera3):

2018-07-09 10:46:34 140075954312960 [Note] WSREP: 2.0 (192.168.55.173): State transfer to 0.0 (192.168.55.172) complete.
2018-07-09 10:46:34 140075929970432 [Note] WSREP: Receiving IST: 74964 writesets, seqnos 2761994-2836958
2018-07-09 10:46:34 140075593627392 [Note] WSREP: Receiving IST...  0.0% (    0/74964 events) complete.
2018-07-09 10:46:34 140075954312960 [Note] WSREP: Member 2.0 (192.168.55.173) synced with group.
2018-07-09 10:46:34 140075962705664 [Note] WSREP: (1491abd9, 'tcp://0.0.0.0:4567') connection established to 737422d6 tcp://192.168.55.171:4567
2018-07-09 10:46:41 140075962705664 [Note] WSREP: (1491abd9, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-07-09 10:46:44 140075593627392 [Note] WSREP: Receiving IST... 36.0% (27008/74964 events) complete.
2018-07-09 10:46:54 140075593627392 [Note] WSREP: Receiving IST... 71.6% (53696/74964 events) complete.
2018-07-09 10:47:02 140075593627392 [Note] WSREP: Receiving IST...100.0% (74964/74964 events) complete.
2018-07-09 10:47:02 140075929970432 [Note] WSREP: IST received: 55238f52-41ee-11e8-852f-3316bdb654bc:2836958
2018-07-09 10:47:02 140075954312960 [Note] WSREP: 0.0 (192.168.55.172): State transfer from 2.0 (192.168.55.173) complete.

Once all the missing writesets are received and applied, Galera will promote galera2 to JOINED (up to seqno 2837012):

2018-07-09 10:47:02 140075954312960 [Note] WSREP: Shifting JOINER -> JOINED (TO: 2837012)
2018-07-09 10:47:02 140075954312960 [Note] WSREP: Member 0.0 (192.168.55.172) synced with group.

The node then applies any cached writesets in its slave queue and finishes catching up with the cluster. Once its slave queue is empty, Galera will promote it to SYNCED, indicating the node is now operational and ready to serve clients:

2018-07-09 10:47:02 140075954312960 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 2837012)
2018-07-09 10:47:02 140076605892352 [Note] WSREP: Synchronized with group, ready for connections

At this point, all nodes are back operational. You can verify by using the following statements on galera2:

mysql> SELECT * FROM information_schema.global_status WHERE variable_name IN ('WSREP_CLUSTER_STATUS','WSREP_LOCAL_STATE_COMMENT','WSREP_CLUSTER_SIZE','WSREP_EVS_DELAYED','WSREP_READY');
+---------------------------+----------------+
| VARIABLE_NAME             | VARIABLE_VALUE |
+---------------------------+----------------+
| WSREP_CLUSTER_SIZE        | 3              |
| WSREP_CLUSTER_STATUS      | Primary        |
| WSREP_EVS_DELAYED         |                |
| WSREP_LOCAL_STATE_COMMENT | Synced         |
| WSREP_READY               | ON             |
+---------------------------+----------------+

The wsrep_cluster_size reported as 3 and the cluster status is Primary, indicating galera2 is part of the Primary Component. The wsrep_evs_delayed has also been cleared and the local state is now Synced.

The recovery process for the partitioned node during a network issue can be summarized as follows:

  1. Re-establishes group communication to other nodes.
  2. Retrieves the cluster view from one of the Primary Component.
  3. Performs state exchange with the Primary Component and calculates the quorum.
  4. Changes the local node state from OPEN to PRIMARY.
  5. Calculates the gap between local node and the cluster.
  6. Changes the local node state from PRIMARY to JOINER.
  7. Prepares IST listener/receiver on port 4568.
  8. Requests state transfer via IST and picks a donor.
  9. Starts receiving and applying the missing writeset from chosen donor's gcache.
  10. Changes the local node state from JOINER to JOINED.
  11. Catches up with the cluster by applying the cached writesets in the slave queue.
  12. Changes the local node state from JOINED to SYNCED.

Cluster Failure

A Galera Cluster is considered failed if no primary component (PC) is available. Consider a similar three-node Galera Cluster as depicted in the diagram below:

A cluster is considered operational if all nodes, or the majority of them, are online. Online means they are able to see each other through Galera's replication traffic or group communication. If no traffic is coming in and out from a node, the cluster will send a heartbeat beacon for the node to respond to in a timely manner. Otherwise, the node will be put into the delayed or suspected list, according to how it responds.

If a node goes down, let's say node C, the cluster remains operational, because nodes A and B are still in quorum with 2 votes out of 3 and can form a primary component. You should get the following cluster state on A and B:

mysql> SHOW STATUS LIKE 'wsrep_cluster_status';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+

Now suppose the primary switch goes down, as illustrated in the following diagram:

At this point, every single node loses communication with the others, and the cluster state will be reported as non-Primary on all nodes (as happened to galera2 in the previous case). Every node calculates the quorum and finds out that it is in the minority (1 vote out of 3), thus losing quorum, which means no Primary Component is formed, and consequently all nodes refuse to serve any data. This is deemed a cluster failure.

Once the network issue is resolved, Galera will automatically re-establish communication between members, exchange node states and determine the possibility of reforming the primary component by comparing node state, UUIDs and seqnos. If the probability is there, Galera will merge the primary components, as shown in the following lines:

2018-06-27  0:16:57 140203784476416 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2018-06-27  0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-06-27  0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: sent state msg: 5885911b-795c-11e8-8683-931c85442c7e
2018-06-27  0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: got state msg: 5885911b-795c-11e8-8683-931c85442c7e from 0 (192.168.55.171)
2018-06-27  0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: got state msg: 5885911b-795c-11e8-8683-931c85442c7e from 1 (192.168.55.172)
2018-06-27  0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: got state msg: 5885911b-795c-11e8-8683-931c85442c7e from 2 (192.168.55.173)
2018-06-27  0:16:57 140203784476416 [Warning] WSREP: Quorum: No node with complete state:

        Version      : 4
        Flags        : 0x3
        Protocols    : 0 / 8 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : SYNCED
        Prim UUID    : 5224a024-791b-11e8-a0ac-8bc6118b0f96
        Prim  seqno  : 5
        First seqno  : 112714
        Last  seqno  : 112725
        Prim JOINED  : 3
        State UUID   : 5885911b-795c-11e8-8683-931c85442c7e
        Group UUID   : 55238f52-41ee-11e8-852f-3316bdb654bc
        Name         : '192.168.55.171'
        Incoming addr: '192.168.55.171:3306'

        Version      : 4
        Flags        : 0x2
        Protocols    : 0 / 8 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : SYNCED
        Prim UUID    : 5224a024-791b-11e8-a0ac-8bc6118b0f96
        Prim  seqno  : 5
        First seqno  : 112714
        Last  seqno  : 112725
        Prim JOINED  : 3
        State UUID   : 5885911b-795c-11e8-8683-931c85442c7e
        Group UUID   : 55238f52-41ee-11e8-852f-3316bdb654bc
        Name         : '192.168.55.172'
        Incoming addr: '192.168.55.172:3306'

        Version      : 4
        Flags        : 0x2
        Protocols    : 0 / 8 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : SYNCED
        Prim UUID    : 5224a024-791b-11e8-a0ac-8bc6118b0f96
        Prim  seqno  : 5
        First seqno  : 112714
        Last  seqno  : 112725
        Prim JOINED  : 3
        State UUID   : 5885911b-795c-11e8-8683-931c85442c7e
        Group UUID   : 55238f52-41ee-11e8-852f-3316bdb654bc
        Name         : '192.168.55.173'
        Incoming addr: '192.168.55.173:3306'

2018-06-27  0:16:57 140203784476416 [Note] WSREP: Full re-merge of primary 5224a024-791b-11e8-a0ac-8bc6118b0f96 found: 3 of 3.
2018-06-27  0:16:57 140203784476416 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 5,
        members    = 3/3 (joined/total),
        act_id     = 112725,
        last_appl. = 112722,
        protocols  = 0/8/3 (gcs/repl/appl),
        group UUID = 55238f52-41ee-11e8-852f-3316bdb654bc
2018-06-27  0:16:57 140203784476416 [Note] WSREP: Flow-control interval: [28, 28]
2018-06-27  0:16:57 140203784476416 [Note] WSREP: Trying to continue unpaused monitor
2018-06-27  0:16:57 140203784476416 [Note] WSREP: Restored state OPEN -> SYNCED (112725)
2018-06-27  0:16:57 140202564110080 [Note] WSREP: New cluster view: global state: 55238f52-41ee-11e8-852f-3316bdb654bc:112725, view# 6: Primary, number of nodes: 3, my index: 2, protocol version 3

A good indicator that the re-bootstrapping process succeeded is the following line in the error log:

[Note] WSREP: Synchronized with group, ready for connections
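
If you would rather script this check than eyeball the log, the following sketch does both; the error log path is an assumption to adjust for your installation:

# Look for the synchronization message in the error log
# (the path is an assumption; adjust to your setup):
grep "Synchronized with group" /var/log/mysql/error.log

# Or ask the server directly; expect the value "Synced"
# once the node has rejoined and caught up:
mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"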

ClusterControl Auto Recovery

ClusterControl comes with automatic node and cluster recovery features, because it oversees and understands the state of all nodes in the cluster. Automatic recovery is enabled by default if the cluster is deployed using ClusterControl. To enable or disable it, simply click the power icon in the summary bar (illustrated with a screenshot in the original post).

A green icon means automatic recovery is turned on, while red means it is off. You can monitor the recovery progress from the Activity -> Jobs dialog. In this case, galera2 was totally inaccessible due to firewall blocking, forcing ClusterControl to report the failure (again shown as a screenshot in the original post).

The recovery process only commences after a grace period (30 seconds), to give the Galera node a chance to recover on its own first. If ClusterControl fails to recover a node or cluster, it first pulls the MySQL error logs from all accessible nodes and raises the necessary alarms to notify the user via email, or by pushing critical events to third-party integration modules like PagerDuty, VictorOps or Slack. Manual intervention is then required. For Galera Cluster, ClusterControl will keep trying to recover from the failure until you mark the node as under maintenance or disable the automatic recovery feature.

ClusterControl's automatic recovery is one of the features most appreciated by our users. It helps you take the necessary actions quickly, with a complete report of what was attempted and recommended steps for further troubleshooting. Users with support subscriptions can get extra hands by escalating the issue to our technical support team for assistance.

Conclusion

Galera's automatic node recovery and membership control are neat features that simplify cluster management, improve database reliability, and reduce the risk of human error that commonly haunts other open-source database replication technologies like MySQL Replication, Group Replication and PostgreSQL Streaming/Logical Replication.

by ashraf at July 11, 2018 02:28 PM

Peter Zaitsev

Percona Server for MongoDB 3.6.5-1.3 Is Now Available


Percona announces the release of Percona Server for MongoDB 3.6.5-1.3 on July 11, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, highly scalable database that is a fully compatible, drop-in replacement for MongoDB 3.6 Community Edition. It supports MongoDB 3.6 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.6.5 and does not include any additional changes.

The Percona Server for MongoDB 3.6.5-1.3 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.6.5-1.3 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at July 11, 2018 02:25 PM

AMD EPYC Performance Testing… or Don’t get on the wrong side of SystemD


Ever since AMD released their EPYC CPU for servers I wanted to test it, but I did not have the opportunity until recently, when Packet.net started offering bare metal servers for a reasonable price. So I started a couple of instances to test Percona Server for MySQL under this CPU. In this benchmark, I discovered some interesting discrepancies in performance between AMD and Intel CPUs when running under systemd.

The setup

To test CPU performance, I used a read-only in-memory sysbench OLTP benchmark, as it burns CPU cycles and no IO is performed by Percona Server.
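
The exact invocation is not shown here, so the following is only an illustrative sketch of such a run, in sysbench 1.0 syntax; the host, credentials, table count and table size are assumptions:

# Load a data set sized to fit in the buffer pool (in-memory workload):
sysbench oltp_read_only --mysql-host=127.0.0.1 --mysql-user=sbtest \
  --mysql-password=sbtest --tables=8 --table-size=1000000 prepare

# Run the read-only benchmark at a given concurrency level:
sysbench oltp_read_only --mysql-host=127.0.0.1 --mysql-user=sbtest \
  --mysql-password=sbtest --tables=8 --table-size=1000000 \
  --threads=48 --time=300 --report-interval=10 run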

For this benchmark I used Packet.net c2.medium.x86 instances powered by AMD EPYC 7401P processors. The OS is exposed to 48 CPU threads.

For the OS, I tried:

  • Ubuntu 16.04 with default kernel 4.4 and upgraded to 4.15
  • Ubuntu 18.04 with kernel 4.15
  • Percona Server for MySQL started under systemd and without systemd, for reasons which will become apparent later (see the sketch after this list)
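
The two start methods look roughly like this; the unit name and paths are assumptions, as the exact commands are not part of this post:

# Start as a systemd service (the unit name depends on the package;
# typically mysql or mysqld):
sudo systemctl start mysql

# Start from a bash shell, bypassing systemd:
sudo mysqld_safe --defaults-file=/etc/mysql/my.cnf &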

To have some points for comparison, I also ran a similar workload on my 2-socket Intel CPU server, with CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. I recognize this is not the most recent Intel CPU, but it was the best I had at the time, and it also exposes 48 CPU threads.

Ubuntu 16

First, let’s review the results for Ubuntu 16, in tabular format (the original post also presents them as a chart):

Threads | kernel 4.4, systemd | kernel 4.4, NO systemd | kernel 4.15
1       | 943.44              | 948.7                  | 899.82
2       | 1858.58             | 1792.36                | 1774.45
4       | 3533.2              | 3424.05                | 3555.94
8       | 6762.35             | 6731.57                | 7010.51
12      | 10012.18            | 9950.3                 | 10062.82
16      | 13063.39            | 13043.55               | 12893.17
20      | 15347.68            | 15347.56               | 14756.27
24      | 16886.24            | 16864.81               | 16176.76
30      | 18150.2             | 18160.07               | 17860.5
36      | 18923.06            | 18811.64               | 19528.27
42      | 19374.86            | 19463.08               | 21537.79
48      | 20110.81            | 19983.05               | 23388.18
56      | 20548.51            | 20362.31               | 23768.49
64      | 20860.51            | 20729.52               | 23797.14
72      | 21123.71            | 21001.06               | 23645.95
80      | 21370               | 21191.24               | 23546.03
90      | 21622.54            | 21441.73               | 23486.29
100     | 21806.67            | 21670.38               | 23392.72
128     | 22161.42            | 22031.53               | 23315.33
192     | 22388.51            | 22207.26               | 22906.42
256     | 22091.17            | 21943.37               | 22305.06
512     | 19524.41            | 19381.69               | 19181.71

There are a few conclusions we can draw from this data:

  1. The AMD EPYC CPU scales quite well with the number of CPU threads.
  2. The more recent kernel helps to boost throughput.

Ubuntu 18.04

Now, let’s review the results for Ubuntu 18.04:

Threads | Ubuntu 18, systemd | Ubuntu 18, NO systemd
1       | 833.14             | 843.68
2       | 1684.21            | 1693.93
4       | 3346.42            | 3359.82
8       | 6592.88            | 6597.48
12      | 9477.92            | 9487.93
16      | 12117.12           | 12149.17
20      | 13934.27           | 13933
24      | 15265.1            | 15152.74
30      | 16846.02           | 16061.16
36      | 18488.88           | 16726.14
42      | 20493.57           | 17360.56
48      | 22217.47           | 17906.4
56      | 22564.4            | 17931.83
64      | 22590.29           | 17902.95
72      | 22472.75           | 17857.73
80      | 22421.99           | 17766.76
90      | 22300.09           | 17773.57
100     | 22275.24           | 17646.7
128     | 22131.86           | 17411.55
192     | 21750.8            | 17134.63
256     | 21177.25           | 16826.53
512     | 18296.61           | 17418.72

This is where the result surprised me: on Ubuntu 18.04, running Percona Server for MySQL as a systemd service gave up to 24% better throughput than starting it from a bash shell. I do not know exactly what causes this dramatic difference. systemd uses different slices for services and user commands, and somehow this affects the performance.
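
To see the slice placement for yourself, you can compare where systemd puts the service against where a shell-started process lands; a quick sketch, assuming a unit named mysql (service processes normally end up in system.slice, shell-started ones in user.slice):

# Which slice is the service unit assigned to?
systemctl show mysql -p Slice

# Which cgroup is the running mysqld actually in?
cat /proc/$(pidof mysqld)/cgroup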

Baseline benchmark

To establish a baseline, I ran the same benchmark on my Intel box running Ubuntu 16, trying two kernels, 4.13 and 4.15:

Threads | kernel 4.13, systemd | kernel 4.15, systemd | kernel 4.15, NO systemd
1       | 820.07               | 798.42               | 864.21
2       | 1563.31              | 1609.96              | 1681.91
4       | 2929.63              | 3186.01              | 3338.47
8       | 6075.73              | 6279.49              | 6624.49
12      | 8743.38              | 9256.18              | 9622.6
16      | 10580.14             | 11351.31             | 11984.64
20      | 12790.96             | 12599.78             | 14147.1
24      | 14213.68             | 14659.49             | 15716.61
30      | 15983.78             | 16096.03             | 17530.06
36      | 17574.46             | 18098.36             | 20085.9
42      | 18671.14             | 19808.92             | 21875.84
48      | 19431.05             | 22036.06             | 23986.08
56      | 19737.92             | 22115.34             | 24275.72
64      | 19946.57             | 21457.32             | 24054.09
72      | 20129.7              | 21729.78             | 24167.03
80      | 20214.93             | 21594.51             | 24092.86
90      | 20194.78             | 21195.61             | 23945.93
100     | 20753.44             | 21597.26             | 23802.16
128     | 20235.24             | 20684.34             | 23476.82
192     | 20280.52             | 20431.18             | 23108.36
256     | 20410.55             | 20952.64             | 22775.63
512     | 20953.73             | 22079.18             | 23489.3

Here we see the opposite result with systemd: Percona Server running from a bash shell shows better throughput than the systemd service. So for some reason, systemd affects AMD and Intel CPUs differently. Please let me know if you have any ideas on how to deal with the impact that systemd has on performance.
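
One experiment that might isolate the slice effect (my suggestion, not something benchmarked in this post) is to start the server by hand but place it into system.slice with systemd-run, then re-run sysbench against it:

# Launch mysqld in a transient scope under system.slice, mimicking the
# service's cgroup placement while still starting it from a shell
# (binary path and options are illustrative; adjust to your setup):
sudo systemd-run --scope --slice=system.slice -- /usr/sbin/mysqld --user=mysql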

Conclusions

So there are some conclusions from these results:

  1. AMD EPYC shows decent performance scalability; the newer kernel helps to improve it.
  2. systemd has different effects on throughput for AMD and Intel CPUs.
  3. With AMD, throughput declines under a highly concurrent workload (512 threads), while Intel does not show such a decline.

The post AMD EPYC Performance Testing… or Don’t get on the wrong side of SystemD appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at July 11, 2018 11:52 AM