Planet MariaDB

January 17, 2019

Oli Sennhauser

MariaDB/MySQL Environment MyEnv 2.0.2 has been released

FromDual has the pleasure to announce the release of the new version 2.0.2 of its popular MariaDB, Galera Cluster and MySQL multi-instance environment MyEnv.

The new MyEnv can be downloaded here. How to install MyEnv is described in the MyEnv Installation Guide.

In the inconceivable case that you find a bug in MyEnv, please report it to the FromDual bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Upgrade from 1.1.x to 2.0

Please look at the MyEnv 2.0.0 Release Notes.

Upgrade from 2.0.x to 2.0.2

shell> cd ${HOME}/product
shell> tar xf /download/myenv-2.0.2.tar.gz
shell> rm -f myenv
shell> ln -s myenv-2.0.2 myenv

Plug-ins

If you are using plug-ins for showMyEnvStatus, create all the links in the new directory structure:

shell> cd ${HOME}/product/myenv
shell> ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/

Upgrade of the instance directory structure

Between MyEnv 1.0 and 2.0 the directory structure of instances changed fundamentally. Nevertheless, MyEnv 2.0 works fine with MyEnv 1.0 directory structures.

Changes in MyEnv 2.0.2

MyEnv

  • Error message fixed.
  • bind_address 0.0.0.0 is optimized to *.
  • States up and down are coloured now.
  • Complaint about a missing symbolic link to my.cnf added.
  • New start-timeout configuration variable added. Important for Galera SST (see the sketch after this list).
  • Default MariaDB my.cnf hash added to avoid complaints.
  • mysqld is consistently searched in sbin, bin and libexec now for RHEL/CentOS 7 compatibility.
  • Avoid EGPCS error messages during MyEnv start/stop.
  • Unused aReleaseVersion removed; as a side effect, up no longer has performance issues in huge MyEnv set-ups with older MySQL releases.
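
The start-timeout value goes into the MyEnv configuration file. A minimal sketch of what such an entry could look like (the 300-second value is an illustrative assumption, not a documented default; check the MyEnv documentation for the exact syntax and section):

# in ${HOME}/etc/myenv.conf: give a Galera node up to 5 minutes for SST
# before its start is considered failed
start-timeout = 300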

MyEnv Installer

  • Function answerQuestion now works after a previous error message.
  • Try and catch for existing configuration file improved.
  • Default answer on error is "q"; the instance name and blacklist name checks are fixed.
  • myenv.conf backup file has a correct timestamp now.
  • Create symlink to datadir for my.cnf.
  • Purging a database is now done from the instancedir, not the datadir.

MyEnv Utilities

  • galera_monitor.sh output made nicer.
  • Script az_test.php added; an initial test already found a bug in MariaDB 10.3.
  • Script slave_monitor.sh added.
  • Option check made more careful for drop_partition.php and merge_partition.php.
  • Timestamp problem fixed for year change in split_partition.php.

For subscriptions for commercial use of MyEnv please get in contact with us.

by Shinguz at January 17, 2019 06:35 PM

January 16, 2019

Peter Zaitsev

Percona Server for MongoDB 3.2.22-3.13 Is Now Available

Percona is glad to announce the release of Percona Server for MongoDB 3.2.22-3.13 on January 16, 2019. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.2 Community Edition. It supports MongoDB 3.2 protocols and drivers.

Percona Server for MongoDB extends the functionality of MongoDB Community Edition by including the Percona Memory Engine storage engine, as well as several enterprise-grade features. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.2.22. There are no additional improvements or new features on top of those upstream fixes.

The Percona Server for MongoDB 3.2.22-3.13 release notes are available in the official documentation.

by Borys Belinsky at January 16, 2019 04:04 PM

January 15, 2019

Valeriy Kravchuk

Using dbdeployer With MariaDB Server

Some time ago I noted that one of the tools I use for testing various MySQL and MariaDB cases and for reproducing potential bugs, MySQL-Sandbox, is not updated any more. It turned out that active development switched to its port in Go called dbdeployer. You can find detailed information about dbdeployer and the reasons behind developing it, provided by its author, Giuseppe Maxia, here and there. See also this post at the Percona blog for a quick review of its main features. One of the points of dbdeployer (and one of the reasons to use Go) is that it is built once (per supported platform) and then the binaries are downloaded from GitHub and used everywhere, without any problems with dependencies etc.

I had added checking dbdeployer to my long ToDo list, as I planned to use it (if not MySQL-Sandbox) for some tests and posts related to resolving typical practical problems with MariaDB GTID-based replication. Yesterday I allocated some time to finally try it and, as usual, I started with building it from source (as for me, database-related software that I cannot build from source on my test systems is not attractive as something new to study and use). I was immediately surprised by the lack of instructions on how to do this at GitHub, no Makefile of any kind etc. All I was able to find was the build.sh script. Correction: just check README.md on how to build it properly, as Giuseppe Maxia explained in the comment.

Good, regular structure is important for deployment
Fortunately this is not the first project written in Go that I've tried to build (or change somehow and then build). The first one was this replication manager (which has proper build instructions in its docs). So, I thought I knew what to do. I installed the missing golang package on my netbook with Ubuntu 14.04 that I had at hand and tried the following typical steps:
openxs@ao756:~/go$ export GOPATH=$HOME/go
openxs@ao756:~/go$ echo $GOPATH
/home/openxs/go
openxs@ao756:~/go$ go get github.com/datacharmer/dbdeployer
# github.com/datacharmer/dbdeployer/common
src/github.com/datacharmer/dbdeployer/common/strutils.go:170: undefined: sort.Slice
...
That was a bit surprising, but a quick Google search showed that this could be caused by an outdated (pre-1.8) version of the golang package. So, dbdeployer requires golang 1.8 or newer, and there was no such package for my good old Ubuntu (it has some 1.2.x only). One day I'll upgrade it, but so far I am OK with 14.04 for all other testing purposes, so I had to give up on the idea to build from source temporarily.

Today, during a few free minutes, I retried on my good old desktop box with Fedora 27 (where I had surely built some Go project(s) successfully):
[openxs@fc23 go]$ uname -a
Linux fc23 4.18.19-100.fc27.x86_64 #1 SMP Wed Nov 14 22:04:34 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[openxs@fc23 ~]$ ls go
pkg  src
[openxs@fc23 ~]$ echo $GOPATH

[openxs@fc23 ~]$ export GOPATH=$HOME/go
[openxs@fc23 ~]$ cd go
[openxs@fc23 go]$ go version
go version go1.9.7 linux/amd64
This environment should work for build, so I've proceeded with:
[openxs@fc23 go]$ go get github.com/datacharmer/dbdeployer
[openxs@fc23 go]$ ls src/github.com/
datacharmer/   jmoiron/       nsf/           tanji/
go-sql-driver/ mattn/         ogier/
[openxs@fc23 go]$ ls src/github.com/datacharmer/dbdeployer/
abbreviations/ compare/       docs/          mkreadme/      test/
.build/        concurrent/    .git/          rest/          unpack/
cmd/           cookbook/      .github/       sandbox/       vendor/
common/        defaults/      globals/       scripts/
Now let's try that scripts/build.sh with linux as a parameter, as based on what I found it's the way to build Linux binaries:
[openxs@fc23 go]$ MKDOCS=1 src/github.com/datacharmer/dbdeployer/scripts/build.sh linux
+ env GOOS=linux GOARCH=386 go build --tags docs -o dbdeployer-1.17.0-docs.linux .
+ env GOOS=linux GOARCH=amd64 go build -o sort_versions.linux sort_versions.go
/home/openxs/go/src/github.com/datacharmer/dbdeployer
-rwxrwxr-x. 1 openxs openxs 8.1M Jan 14 10:27 dbdeployer-1.17.0-docs.linux
-rw-rw-r--. 1 openxs openxs 3.0M Jan 14 10:27 dbdeployer-1.17.0-docs.linux.tar.gz
[openxs@fc23 go]$ ls
bin  pkg  src
[openxs@fc23 go]$ ls bin
dbdeployer
[openxs@fc23 go]$ bin/dbdeployer --version
dbdeployer version 1.17.0
Now we know how to build dbdeployer from source, if needed. If some dependencies are missing you'll be informed, and a similar go get ... command should allow you to install them.
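
For example, had the build complained about a missing package, installing it would have looked something like this (the package path below is just a placeholder, not a real dbdeployer dependency I verified):
[openxs@fc23 go]$ go get github.com/some_vendor/some_dependency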

I was somewhat surprised to see MariaDB NOT mentioned at all in README.md. It says:
"DBdeployer is a tool that deploys MySQL database servers easily."
while good old MySQL-Sandbox also mentions MariaDB explicitly:
"This package is a sandbox for testing features under any version of MySQL from 3.23 to 8.0 (and any version of MariaDB.)"
So, my idea was to double check that dbdeployer is both MySQL-Sandbox compatible and MariaDB compatible (it is). I have several sandboxes already created in the past. I also have MariaDB 10.2.21 .tar.gz binaries that I want to use with dbdeployer for further testing:
[openxs@fc23 go]$ ls ~/sandboxes/
clear_all        rsandbox_mariadb-10_0_19  send_kill_all  test_replication
plugin.conf      rsandbox_mariadb-10_1_12  start_all      use_all
restart_all      rsandbox_mysql-8_0_12     status_all
rsandbox_8_0_12  sandbox_action            stop_all
[openxs@fc23 go]$ ls ~/*.tar.gz
/home/openxs/galera-25.3.22-x86_64.tar.gz
/home/openxs/galera-25.3.24-x86_64.tar.gz
/home/openxs/galera-25.3.25-glibc_214-x86_64.tar.gz
/home/openxs/mariadb-10.2.12-linux-x86_64.tar.gz
/home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz
With dbdeployer one has to unpack the .tar.gz first, with the dbdeployer unpack command. So, I tried it immediately:
[openxs@fc23 go]$ bin/dbdeployer unpack ~/mariadb-10.2.21-linux-x86_64.tar.gz
directory '/home/openxs/opt/mysql' not found
You should create it or provide an alternate base directory using --sandbox-binary
It seems the tool now wants to use ~/opt/mysql as the directory to unpack to, while MySQL-Sandbox silently used ~:
[openxs@fc23 go]$ ls ~ | grep 8.0
8.0.12
I made a lame attempt to force it to use ~, but failed for a reason I was too lazy to study:
[openxs@fc23 go]$ bin/dbdeployer --sandbox-binary=/home/openxs unpack /home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz
Unpacking tarball /home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz to $HOME/10.2.21
.........100.........200....&tar.Header{Name:"mariadb-10.2.21-linux-x86_64/mysql-test/mysql-test-run", Mode:511, Uid:1021, Gid:1004, Size:0, ModTime:time.Time{wall:0x0, ext:63681810892, loc:(*time.Location)(0xa47aa0)}, Typeflag:0x32, Linkname:"./mysql-test-run.pl", Uname:"dbart", Gname:"my", Devmajor:0, Devminor:0, AccessTime:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, ChangeTime:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Xattrs:map[string]string(nil)}
#ERROR: symlink ./mysql-test-run.pl mariadb-10.2.21-linux-x86_64/mysql-test/mysql-test-run: file exists
I just created ~/opt/mysql and proceeded with the default configuration. After the unpack step completed successfully, I proceeded with the deploy step to create a new replication sandbox:
[openxs@fc23 go]$ bin/dbdeployer unpack /home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz
Unpacking tarball /home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz to $HOME/opt/mysql/10.2.21
.........100.........200.........300.........400.........500.........600.........700.........800... ...
Renaming directory /home/openxs/opt/mysql/mariadb-10.2.21-linux-x86_64 to /home/openxs/opt/mysql/10.2.21

[openxs@fc23 go]$ bin/dbdeployer deploy replication 10.2.21
Installing and starting master
. sandbox server started
Installing and starting slave1
. sandbox server started
Installing and starting slave2
. sandbox server started
$HOME/sandboxes/rsandbox_10_2_21/initialize_slaves
initializing slave 1
initializing slave 2
Replication directory installed in $HOME/sandboxes/rsandbox_10_2_21
run 'dbdeployer usage multiple' for basic instructions'
We have access to nice enough documentation:
[openxs@fc23 go]$ bin/dbdeployer usage multiple
 USING MULTIPLE SERVER SANDBOX
On a replication sandbox, you have the same commands (run "dbdeployer usage single"),
with an "_all" suffix, meaning that you propagate the command to all the members.
Then you have "./m" as a shortcut to use the master, "./s1" and "./s2" to access
the slaves (and "s3", "s4" ... if you define more).

In group sandboxes without a master slave relationship (group replication and
multiple sandboxes) the nodes can be accessed by ./n1, ./n2, ./n3, and so on.

start_all    [options] > starts all nodes
status_all             > get the status of all nodes
restart_all  [options] > restarts all nodes
stop_all               > stops all nodes
use_all         "SQL"  > runs a SQL statement in all nodes
use_all_masters "SQL"  > runs a SQL statement in all masters
use_all_slaves "SQL"   > runs a SQL statement in all slaves
clear_all              > stops all nodes and removes all data
m                      > invokes MySQL client in the master
s1, s2, n1, n2         > invokes MySQL client in slave 1, 2, node 1, 2


The scripts "check_slaves" or "check_nodes" give the status of replication in the sandbox.
A typical sandbox directory (with some differences, like use_all_slaves etc.) is created in ~/sandboxes/, and shortcut commands work as expected:
[openxs@fc23 go]$ cd ~/sandboxes/rsandbox_10_2_21/
[openxs@fc23 rsandbox_10_2_21]$ ls
check_slaves       n2           sbdescription.json  test_sb_all
clear_all          node1        send_kill_all       use_all
initialize_slaves  node2        start_all           use_all_masters
m                  restart_all  status_all          use_all_slaves
master             s1           stop_all
n1                 s2           test_replication

[openxs@fc23 rsandbox_10_2_21]$ ls master/
add_option    init_db         restart             show_binlog    status   use
clear         load_grants     sbdescription.json  show_log       stop
data          my              sb_include          show_relaylog  test_sb
grants.mysql  my.sandbox.cnf  send_kill           start          tmp

[openxs@fc23 rsandbox_10_2_21]$ ls ../rsandbox_mariadb-10_1_12/
check_slaves             m       node1        s2             test_replication
clear_all                master  node2        send_kill_all  use_all
connection.json          n1      README       start_all
default_connection.json  n2      restart_all  status_all
initialize_slaves        n3      s1           stop_all

[openxs@fc23 rsandbox_10_2_21]$ ls ../rsandbox_mariadb-10_1_12/master/
add_option       default_connection.json  my              send_kill      tmp
change_paths     grants_5_7_6.mysql       mycli           show_binlog    use
change_ports     grants.mysql             my.sandbox.cnf  show_relaylog  USING
clear            json_in_db               proxy_start     start
connection.json  load_grants              README          status
data             msb                      restart         stop

[openxs@fc23 rsandbox_10_2_21]$ ./status_all
REPLICATION  /home/openxs/sandboxes/rsandbox_10_2_21
master : master on  -  port     23322 (23322)
node1 : node1 on  -  port       23323 (23323)
node2 : node2 on  -  port       23324 (23324)
[openxs@fc23 rsandbox_10_2_21]$ ./use_all "show variables like 'gtid%'"
# master
Variable_name   Value
gtid_binlog_pos 0-100-12
gtid_binlog_state       0-100-12
gtid_current_pos        0-100-12
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos
gtid_strict_mode        OFF
# server: 1
Variable_name   Value
gtid_binlog_pos
gtid_binlog_state
gtid_current_pos        0-100-12
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos  0-100-12
gtid_strict_mode        OFF
# server: 2
Variable_name   Value
gtid_binlog_pos
gtid_binlog_state
gtid_current_pos        0-100-12
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos  0-100-12
gtid_strict_mode        OFF
[openxs@fc23 rsandbox_10_2_21]$ ./m
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 14
Server version: 10.2.21-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

master [localhost:23322] {msandbox} ((none)) > show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |     2835 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

master [localhost:23322] {msandbox} ((none)) > show variables like 'gtid%';
+------------------------+----------+
| Variable_name          | Value    |
+------------------------+----------+
| gtid_binlog_pos        | 0-100-12 |
| gtid_binlog_state      | 0-100-12 |
| gtid_current_pos       | 0-100-12 |
| gtid_domain_id         | 0        |
| gtid_ignore_duplicates | OFF      |
| gtid_seq_no            | 0        |
| gtid_slave_pos         |          |
| gtid_strict_mode       | OFF      |
+------------------------+----------+
8 rows in set (0.00 sec)

master [localhost:23322] {msandbox} ((none)) > exit
Bye
For my further tests I needed the slaves to have log_slave_updates enabled and gtid_strict_mode=ON. So, I added these settings to my.sandbox.cnf in the node1 and node2 subdirectories for both slaves.
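In essence, the appended lines looked like this (placed in the [mysqld] section of each node's my.sandbox.cnf):

# node1/my.sandbox.cnf and node2/my.sandbox.cnf, [mysqld] section
log_slave_updates
gtid_strict_mode = ON

Then I restarted all nodes: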
[openxs@fc23 rsandbox_10_2_21]$ ./restart_all
# executing 'stop' on /home/openxs/sandboxes/rsandbox_10_2_21
stop /home/openxs/sandboxes/rsandbox_10_2_21/node1
stop /home/openxs/sandboxes/rsandbox_10_2_21/node2
stop /home/openxs/sandboxes/rsandbox_10_2_21/master
# executing 'start' on /home/openxs/sandboxes/rsandbox_10_2_21
executing 'start' on master
. sandbox server started
executing 'start' on slave 1
. sandbox server started
executing 'start' on slave 2
. sandbox server started
I need a table to play with and I want to check that slaves are in sync:
[openxs@fc23 rsandbox_10_2_21]$ ./m
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 12
Server version: 10.2.21-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

master [localhost:23322] {msandbox} ((none)) > use test
Database changed
master [localhost:23322] {msandbox} (test) > create table t1(id int primary key, c1 int);
Query OK, 0 rows affected (0.17 sec)

master [localhost:23322] {msandbox} (test) > exit
Bye
[openxs@fc23 rsandbox_10_2_21]$ ./use_all "show variables like 'gtid%'"
# master
Variable_name   Value
gtid_binlog_pos 0-100-13
gtid_binlog_state       0-100-13
gtid_current_pos        0-100-13
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos
gtid_strict_mode        OFF
# server: 1
Variable_name   Value
gtid_binlog_pos 0-100-13
gtid_binlog_state       0-100-13
gtid_current_pos        0-100-13
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos  0-100-13
gtid_strict_mode        ON
# server: 2
Variable_name   Value
gtid_binlog_pos 0-100-13
gtid_binlog_state       0-100-13
gtid_current_pos        0-100-13
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos  0-100-13
gtid_strict_mode        ON
[openxs@fc23 rsandbox_10_2_21]$
Note the value of gtid_current_pos on the master and gtid_slave_pos on each slave. They are the same, so the slaves are in sync. (In MariaDB a GTID has the form domain_id-server_id-sequence_number, so 0-100-13 above means domain 0, server_id 100, sequence number 13.) If you want to find out more about the format of GTIDs in MariaDB or all those gtid% server variables, please check this KB article.

* * *

To summarize, dbdeployer is a nice port of MySQL-Sandbox to Go, with some additional features. It can be easily built from source if you have golang version 1.8 or newer (or just downloaded if you do not). Sandboxes created with dbdeployer may co-exist with older sandboxes in the same default directory (but .tar.gz files are unpacked into a different directory by default). It still works well with MariaDB. I am going to use replication sandboxes built with it for some further testing of various real-life use cases and problems of MariaDB's GTID implementation (which may be presented in further posts).

by Valeriy Kravchuk (noreply@blogger.com) at January 15, 2019 05:57 PM

Peter Zaitsev

Customizing Per-Process Metrics in PMM

Process Memory Usage - a filtered graph in PMM

If you have set up per-process metrics in Percona Monitoring and Management, you may have found yourself in need of tuning it further to not only group processes together, but to display some of them in isolation. In this blogpost we will explore how to modify the rules for grouping processes, so that you can make the most out of this awesome PMM integration.

Let’s say you have followed the link above on how to set up the per-process metrics integration on PMM, and you have imported the dashboard to show these metrics. You will see something like the following:

PMM database and system monitoring and management software

This is an internal testing server we use, in which you can see a high number of VBoxHeadless (29) and mysqld (99) processes running. All the metrics in the dashboard will be grouped by the name of the command used. But, what if we want to see metrics for only one of these processes in isolation? As things stand, we will not be able to do so. It may not make sense to do so in a testing environment, but if you are running multiple mysqld processes (or mongos, postgres, etc) bound to different ports, you may want to see metrics for each of them separately.

Modifying the configuration file

Enter all.yaml!

In the process-exporter documentation on using a configuration file, we can see the following:

The general format of the -config.path YAML file is a top-level process_names section, containing a list of name matchers. […] A process may only belong to one group: even if multiple items would match, the first one listed in the file wins.

This means that even if we have two rules that would match a process, only the first one will be taken into account. This will allow us to both list processes by themselves, and not miss any non-grouped process. How? Let’s imagine we have the following processes running:

mysqld --port=1
mysqld --port=2
mysqld --port=3
mysqld --port=4

If we wanted to be able to tell apart the instances running on ports 1 and 2 from the other ones, we could use the following rules:

- name: "mysqld_port_1"
 cmdline:
 - '.*mysqld.*port=1.*'
- name: "mysqld_port_2"
 cmdline:
 - '.*mysqld.*port=2.*'
- name: "{{.Comm}}"
 cmdline:
 - '.+'

In cmdline we need the regular expression against which to match the running process's command line. In this case, we made use of the fact that they were using different ports, but any difference in the command strings can be used. The last rule is the one that will default to "anything else" (its regular expression matches anything).

The default rule at the end will make sure you don't miss any other process, so unless you want metrics collected for only some processes, you should always have such a rule.

A real life working example of configuring per-process metrics

In case all this generic information didn't make much sense, we will present a concrete example, hoping that it will make everything fit together nicely.

In this example we want to have the mysqld instance using the mysql_sandbox16679.sock socket isolated from all the others, and the VM with the ID ending in 97eafa2795da listed on its own. All other processes are to be grouped together by using the basename of the executable.

You can check the output from ps aux to see the full command used. For instance:

shell> ps aux | grep 97eafa2795da
agustin+ 27785  0.7 0.2 5619280 542536 ?      Sl Nov28 228:24 /usr/lib/virtualbox/VBoxHeadless --comment centos_node1_1543443575974_22181 --startvm a0151e29-35dd-4c14-8e37-97eafa2795da --vrde config

So, we can use the following regular expression for it (we use .* to match any string):

.*VBoxHeadless.*97eafa2795da.*

The same applies to the regular expression for the mysqld process.

The configuration file will end up looking like:

shell>  cat /etc/process-exporter/all.yaml
process_names:
 - name: "Custom VBox"
   cmdline:
   - '.*VBoxHeadless.*97eafa2795da.*'
 - name: "Custom MySQL"
   cmdline:
   - '.*mysqld.*mysql_sandbox16679.sock.*'
 - name: "{{.Comm}}"
   cmdline:
   - '.+'

Let’s restart the service so that the changes apply, and check the graphs after five minutes to see them. Note that you may have to reload the page for the changes to show up.

shell> systemctl restart process-exporter

After refreshing, we will see the new list of processes in the drop-down list:

A new list of processes in PMM after filtering

And after we select them, we will be able to see data for those processes in particular:

Thanks to the default configuration at the end, we are still capturing data from all the other mysqld processes. However, they will have their own group, as mentioned before:

System Processes Metrics graph in PMM

by Agustín at January 15, 2019 04:20 PM

January 14, 2019

Peter Zaitsev

Upcoming Webinar Thurs 1/17: How to Rock with MyRocks

Please join Percona’s Chief Technology Officer, Vadim Tkachenko, as he presents How to Rock with MyRocks on Thursday, January 17th at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Register Now

MyRocks is a new storage engine from Facebook and is available in Percona Server for MySQL. In what cases will you want to use it? We will check different workloads and when MyRocks is most suitable for you. Also, as for any new engine, it’s important to set it up and tune it properly. So, we will review the most important settings to pay attention to.

Register for this webinar to learn How to Rock with MyRocks.

by Vadim Tkachenko at January 14, 2019 09:35 PM

Should You Use ClickHouse as a Main Operational Database?

ClickHouse as a main operational database

First of all, this post is not a recommendation but more like a “what if” story. What if we use ClickHouse (which is a columnar analytical database) as our main datastore? Well, typically, an analytical database is not a replacement for a transactional or key/value datastore. However, ClickHouse is super efficient for timeseries and provides “sharding” out of the box (scalability beyond one node). So can we use it as our main datastore?

Let’s imagine we are running a webservice and provide a public API. Public API-as-a-service has become a good business model: examples include social networks like Facebook/Twitter, messaging as a service like Twilio, and even credit card authorization platforms like Marqeta. Let’s also imagine we need to store all messages (SMS messages, email messages, etc.) we are sending and allow our customers to get various information about a message. This information can be a mix of analytical (OLAP) queries (i.e. how many messages were sent in some time period and how much it cost) and typical key/value queries like: “return 1 message by the message id”.

Using a columnar analytical database can be a big challenge here. Although such databases can be very efficient with counts and averages, some queries will be slow or simply nonexistent. Analytical databases are optimized for a small number of long-running (heavy) queries. The most important limitations of analytical databases are:

  1. Deletes and updates are non-existent or slow
  2. Inserts are efficient for bulk inserts only
  3. No secondary indexes means that point selects (select by ID) tend to be very slow

This is all true for ClickHouse; however, we may be able to live with it for our task.

To simulate text messages I have used ~3 billion reddit comments (10 years, from 2007 to 2017), downloaded from pushshift.io. Vadim published a blog post about analyzing reddit comments with ClickHouse. In my case, I’m using this data as a simulation of text messages, and will show how we can use ClickHouse as a backend for an API.
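
For reference, fetching one month of comments from pushshift.io looked something like this at the time (the URL layout and file naming are from memory and may have changed since, so treat this as a sketch):

$ wget https://files.pushshift.io/reddit/comments/RC_2017-01.bz2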

Loading the JSON data to ClickHouse

I used the following table in ClickHouse to load all the data:

CREATE TABLE reddit.rc(
body String,
score_hidden Nullable(UInt8),
archived Nullable(UInt8),
name String,
author String,
author_flair_text Nullable(String),
downs Nullable(Int32),
created_utc UInt32,
subreddit_id String,
link_id Nullable(String),
parent_id Nullable(String),
score Nullable(Int16),
retrieved_on Nullable(UInt32),
controversiality Nullable(Int8),
gilded Nullable(Int8),
id String,
subreddit String,
ups Nullable(Int16),
distinguished Nullable(String),
author_flair_css_class Nullable(String),
stickied Nullable(UInt8),
edited Nullable(UInt8)
) ENGINE = MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY created_utc ;

Then I used the following command to load the JSON data (downloaded from pushshift.io) to ClickHouse:

$ bzip2 -d -c RC_20*.bz2 | clickhouse-client --input_format_skip_unknown_fields 1 --input_format_allow_errors_num 1000000 -d reddit -n --query="INSERT INTO rc FORMAT JSONEachRow"

The data on disk in ClickHouse is not significantly larger than the compressed files, which is great:

#  du -sh /data/clickhouse/data/reddit/rc/
638G    /data/clickhouse/data/reddit/rc/
# du -sh /data/reddit/
404G    /data/reddit/

We have ~4 billion rows:

SELECT
    toDate(min(created_utc)),
    toDate(max(created_utc)),
    count(*)
FROM rc
┌─toDate(min(created_utc))─┬─toDate(max(created_utc))─┬────count()─┐
│               2006-01-01 │               2018-05-31 │ 4148248585 │
└──────────────────────────┴──────────────────────────┴────────────┘
1 rows in set. Elapsed: 11.554 sec. Processed 4.15 billion rows, 16.59 GB (359.02 million rows/s., 1.44 GB/s.)

The data is partitioned and sorted by created_utc, so queries which include created_utc can use partition pruning and therefore skip the not-needed partitions (a sketch of such a pruned query follows the list below). However, let’s say our API needs to support the following features, which are not common for analytical databases:

  1. Selecting a single comment/message by ID
  2. Retrieving the last 10 or 100 of the messages/comments
  3. Updating a single message in the past (e.g. in the case of messages, we may need to update the final price; in the case of comments, we may need to upvote or downvote a comment)
  4. Deleting messages
  5. Text search
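
As an illustration of the partition pruning mentioned above, a query restricted to a single month only has to read that month’s partition instead of scanning all ~4 billion rows (a sketch reusing the rc table; I have not measured this exact query):

SELECT count()
FROM rc
WHERE (created_utc >= toUnixTimestamp('2017-01-01 00:00:00'))
AND (created_utc < toUnixTimestamp('2017-02-01 00:00:00'))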

With the latest ClickHouse version, all of these features are available, but some of them may not perform fast enough.

Retrieving a single row in ClickHouse

Again, this is not a typical operation in any analytical database; those databases are simply not optimized for it. ClickHouse does not have secondary indexes, and we are using created_utc as the primary key (sort by). So, selecting a message by just its ID will require a full table scan:

SELECT
    id,
    created_utc
FROM rc
WHERE id = 'dbumnpz'
┌─id──────┬─created_utc─┐
│ dbumnpz │  1483228800 │
└─────────┴─────────────┘
1 rows in set. Elapsed: 18.070 sec. Processed 4.15 billion rows, 66.37 GB (229.57 million rows/s., 3.67 GB/s.)

Only if we know the timestamp (created_utc)… then it will be lightning fast, as ClickHouse will use the primary key:

SELECT *
FROM rc
WHERE (id = 'dbumnpz') AND (created_utc = 1483228800)
...
1 rows in set. Elapsed: 0.010 sec. Processed 8.19 thousand rows, 131.32 KB (840.27 thousand rows/s., 13.47 MB/s.)

Actually, we can simulate an additional index by creating a materialized view in ClickHouse:

create materialized view rc_id_v
ENGINE MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY (id)
POPULATE AS SELECT id, created_utc from rc;

Here I’m creating a materialized view and populating it initially from the main (rc) table. The view will be updated automatically when there are any inserts into table reddit.rc. The view is actually another MergeTree table sorted by id. Now we can use this query:

SELECT *
FROM rc
WHERE (id = 'dbumnpz') AND (created_utc =
(
    SELECT created_utc
    FROM rc_id_v
    WHERE id = 'dbumnpz'
))
...
1 rows in set. Elapsed: 0.053 sec. Processed 8.19 thousand rows, 131.32 KB (153.41 thousand rows/s., 2.46 MB/s.)

This is a single query which uses our materialized view (in a subquery) to pass the created_utc (timestamp) to the original table. It is a little bit slower, but still under a 100ms response time.

Using this trick (materialized views) we can potentially simulate other indexes.
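
For example, a lookup by author could be served the same way. A hypothetical sketch reusing the reddit.rc table defined above (I did not run this against the full dataset, and POPULATE over ~4 billion rows would take a while):

create materialized view rc_author_v
ENGINE MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY (author)
POPULATE AS SELECT author, created_utc from rc;

A point query for a given author would then fetch the matching created_utc values from rc_author_v in a subquery and use them against the primary key of rc, just as with rc_id_v above.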

Retrieving the last 10 messages

This is where ClickHouse is not very efficient. Let’s say we want to retrieve the last 10 comments:

SELECT
    id,
    created_utc
FROM rc
ORDER BY created_utc DESC
LIMIT 10
┌─id──────┬─created_utc─┐
│ dzwso7l │  1527811199 │
│ dzwso7j │  1527811199 │
│ dzwso7k │  1527811199 │
│ dzwso7m │  1527811199 │
│ dzwso7h │  1527811199 │
│ dzwso7n │  1527811199 │
│ dzwso7o │  1527811199 │
│ dzwso7p │  1527811199 │
│ dzwso7i │  1527811199 │
│ dzwso7g │  1527811199 │
└─────────┴─────────────┘
10 rows in set. Elapsed: 24.281 sec. Processed 4.15 billion rows, 82.96 GB (170.84 million rows/s., 3.42 GB/s.)

In a conventional relational database (like MySQL) this can be done by reading a btree index sequentially from the end, as the index is sorted (like the “tail” command on Linux). In a partitioned, massively parallel database system, the storage format and sorting algorithm may not be optimized for that operation, as we are reading multiple partitions in parallel. Currently, an issue is open to make such “tailing” based on the primary key much faster: slow order by primary key with small limit on big data. As a temporary workaround we can do something like this:

SELECT count()
FROM rc
WHERE (created_utc > (
(
    SELECT max(created_utc)
    FROM rc
) - ((60 * 60) * 24))) AND (subreddit = 'programming')
┌─count()─┐
│    1248 │
└─────────┘
1 rows in set. Elapsed: 4.510 sec. Processed 3.05 million rows, 56.83 MB (675.38 thousand rows/s., 12.60 MB/s.)

It is still a five-second query. Hopefully, this type of query will become faster in ClickHouse.

Updating / deleting data in ClickHouse

The latest ClickHouse version allows running update/delete in the form of “ALTER TABLE .. UPDATE / DELETE” (these are called mutations in ClickHouse terms). For example, we may want to upvote a specific comment:

SELECT score
FROM rc_2017
WHERE (id = 'dbumnpz') AND (created_utc =
(
    SELECT created_utc
    FROM rc_id_v
    WHERE id = 'dbumnpz'
))
┌─score─┐
│     2 │
└───────┘
1 rows in set. Elapsed: 0.048 sec. Processed 8.19 thousand rows, 131.08 KB (168.93 thousand rows/s., 2.70 MB/s.)
:) alter table rc_2017 update score = score +1 where id =  'dbumnpz' and created_utc = (select created_utc from rc_id_v where id =  'dbumnpz');
ALTER TABLE rc_2017
    UPDATE score = score + 1 WHERE (id = 'dbumnpz') AND (created_utc =
    (
        SELECT created_utc
        FROM rc_id_v
        WHERE id = 'dbumnpz'
    ))
Ok.
0 rows in set. Elapsed: 0.052 sec.

“Mutation” queries will return immediately and will be executed asynchronously. We can see the progress by reading from the system.mutations table:

select * from system.mutations\G
SELECT *
FROM system.mutations
Row 1:
──────
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_857.txt
command:                    UPDATE score = score + 1 WHERE (id = 'dbumnpz') AND (created_utc = (SELECT created_utc FROM reddit.rc_id_v  WHERE id = 'dbumnpz'))
create_time:                2018-12-27 22:22:05
block_numbers.partition_id: ['']
block_numbers.number:       [857]
parts_to_do:                0
is_done:                    1
1 rows in set. Elapsed: 0.002 sec.

Now we can try deleting comments that have been marked for deletion (body showing “[deleted]”):

ALTER TABLE rc_2017
    DELETE WHERE body = '[deleted]'
Ok.
0 rows in set. Elapsed: 0.002 sec.
:) select * from system.mutations\G
SELECT *
FROM system.mutations
...
Row 2:
──────
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_858.txt
command:                    DELETE WHERE body = '[deleted]'
create_time:                2018-12-27 22:41:01
block_numbers.partition_id: ['']
block_numbers.number:       [858]
parts_to_do:                64
is_done:                    0
2 rows in set. Elapsed: 0.017 sec.

After a while, we can check the mutation again:

:) select * from system.mutations\G
SELECT *
FROM system.mutations
...
Row 2:
──────
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_858.txt
command:                    DELETE WHERE body = '[deleted]'
create_time:                2018-12-27 22:41:01
block_numbers.partition_id: ['']
block_numbers.number:       [858]
parts_to_do:                0
is_done:                    1

As we can see, our “mutation” is done.

Text analysis

ClickHouse does not offer full text search, however we can use some text functions. In my previous blog post about ClickHouse I used it to find the most popular wikipedia page of the month. This time I’m trying to find the news keywords of the year using all reddit comments: basically, I’m calculating the most frequently used new words for a specific year (an algorithm based on an article about finding trending topics using Google Books n-grams data). To do that I’m using the ClickHouse function alphaTokens(body), which splits the “body” field into words. From there, I can count the words or use arrayJoin to create a list (similar to MySQL’s group_concat function).
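
As a quick standalone illustration of what alphaTokens does (a trivial query I’d expect to behave as shown, since the function splits on non-alphabetic characters):

SELECT alphaTokens('Hello, ClickHouse world!') AS words
┌─words──────────────────────────┐
│ ['Hello','ClickHouse','world'] │
└────────────────────────────────┘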

First I created a table word_by_year_news:

create table word_by_year_news ENGINE MergeTree() PARTITION BY y ORDER BY (y) as
select a.w as w, b.y as y, sum(a.occurrences)/b.total as ratio from
(
select
 lower(arrayJoin(alphaTokens(body))) as w,
 toYear(toDate(created_utc)) as y,
 count() as occurrences
from rc
where body <> '[deleted]'
and created_utc < toUnixTimestamp('2018-01-01 00:00:00')
and created_utc >= toUnixTimestamp('2007-01-01 00:00:00')
and subreddit in ('news', 'politics', 'worldnews')
group by w, y
having length(w) > 4
) as a
ANY INNER JOIN
(
select
 toYear(toDate(created_utc)) as y,
 sum(length(alphaTokens(body))) as total
from rc
where body <> '[deleted]'
and subreddit in ('news', 'politics', 'worldnews')
and created_utc < toUnixTimestamp('2018-01-01 00:00:00')
and created_utc >= toUnixTimestamp('2007-01-01 00:00:00')
group by y
) AS b
ON a.y = b.y
group by
  a.w,
  b.y,
  b.total;
0 rows in set. Elapsed: 787.032 sec. Processed 7.35 billion rows, 194.32 GB (9.34 million rows/s., 246.90 MB/s.)

This will store all frequent words (I’m filtering by subreddits; the examples are “news, politics and worldnews” or “programming”) as well as their occurrence that year; actually, I want to store the “relative” occurrence, which is called “ratio” above: for each word I divide its occurrence by the total number of words that year (this is needed as the number of comments grows significantly year by year).

Now we can actually calculate the words of the year:

SELECT
    groupArray(w) as words,
    y + 1 as year
FROM
(
    SELECT
        w,
        CAST((y - 1) AS UInt16) AS y,
        ratio AS a_ratio
    FROM word_by_year_news
    WHERE ratio > 0.00001
) AS a
ALL INNER JOIN
(
    SELECT
        w,
        y,
        ratio AS b_ratio
    FROM word_by_year_news
    WHERE ratio > 0.00001
) AS b USING (w, y)
WHERE (y > 0) AND (a_ratio / b_ratio > 3)
GROUP BY y
ORDER BY
    y
LIMIT 100;
10 rows in set. Elapsed: 0.232 sec. Processed 14.61 million rows, 118.82 MB (63.01 million rows/s., 512.29 MB/s.)

And the results are (the self-join compares each word’s relative frequency with the previous year’s, so a word makes the list for a given year when its relative frequency more than tripled; here I’m grouping the words for each year):

For the “programming” subreddit:

┌─year─┬─words─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 2007 │ ['audio','patents','swing','phones','gmail','opera','devices','phone','adobe','vista','backup','mercurial','mobile','passwords','scala','license','copyright','licenses','photoshop'] │
│ 2008 │ ['webkit','twitter','teacher','android','itunes']                                                                                                                                     │
│ 2009 │ ['downvotes','upvote','drupal','android','upvoted']                                                                                                                                   │
│ 2010 │ ['codecs','imgur','floppy','codec','adobe','android']                                                                                                                                 │
│ 2011 │ ['scala','currency','println']                                                                                                                                                        │
│ 2013 │ ['voting','maven']                                                                                                                                                                    │
│ 2014 │ ['compose','xamarin','markdown','scrum','comic']                                                                                                                                      │
│ 2015 │ ['china','sourceforge','subscription','chinese','kotlin']                                                                                                                             │
│ 2016 │ ['systemd','gitlab','autotldr']                                                                                                                                                       │
│ 2017 │ ['offices','electron','vscode','blockchain','flash','collision']                                                                                                                      │
└──────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

For the news subreddits:

┌─year─┬─words──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 2008 │ ['michigan','delegates','obama','alaska','georgia','russians','hamas','biden','hussein','barack','elitist','mccain']                                                                                                                                       │
│ 2009 │ ['stimulus','reform','medicare','franken','healthcare','payer','insurance','downvotes','hospitals','patients','option','health']                                                                                                                           │
│ 2010 │ ['blockade','arizona']                                                                                                                                                                                                                                     │
│ 2011 │ ['protests','occupy','romney','weiner','protesters']                                                                                                                                                                                                       │
│ 2012 │ ['santorum','returns','martin','obamacare','romney']                                                                                                                                                                                                       │
│ 2013 │ ['boston','chemical','surveillance']                                                                                                                                                                                                                       │
│ 2014 │ ['plane','poland','radar','subreddits','palestinians','putin','submission','russia','automoderator','compose','rockets','palestinian','hamas','virus','removal','russians','russian']                                                                      │
│ 2015 │ ['refugees','refugee','sanders','debates','hillary','removal','participating','removed','greece','clinton']                                                                                                                                                │
│ 2016 │ ['morons','emails','opponent','establishment','trump','reply','speeches','presidency','clintons','electoral','donald','trumps','downvote','november','subreddit','shill','domain','johnson','classified','bernie','nominee','users','returns','primaries','foundation','voters','autotldr','clinton','email','supporter','election','feedback','clever','leaks','accuse','candidate','upvote','rulesandregs','convention','conduct','uncommon','server','trolls','supporters','hillary'] │
│ 2017 │ ['impeached','downvotes','monitored','accusations','alabama','violation','treason','nazis','index','submit','impeachment','troll','collusion','bannon','neutrality','permanent','insults','violations']                                                    │
└──────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Conclusion

ClickHouse is a great massively parallel analytical system. It is extremely efficient and can potentially (with some hacks) be used as a main backend database powering a public API gateway serving both realtime and analytical queries. At the same time, it was not originally designed that way. Let me know in the comments if you are using ClickHouse for this or similar projects.


Photo by John Baker on Unsplash


by Alexander Rubin at January 14, 2019 04:34 PM

January 13, 2019

Valeriy Kravchuk

Understanding Status of MariaDB Server JIRA Issues

In my previous blog post on MariaDB's JIRA, for MySQL users who are familiar with the MySQL bugs database (but may be new to JIRA), I presented some details about the statuses that JIRA issues may have. There is no one-to-one correspondence with the MySQL bug statuses that I once described in detail here. In the case of MariaDB Server bugs ("JIRA issues") one may have to check not only the "Status" field, but also the "Resolution" field and even the "Labels" field to quickly understand what the real status is and what MariaDB engineers decided or are waiting for. So, I think some additional clarifications may help MySQL users who check or report MariaDB bugs as well.

Let me present the details of this status correspondence in a simple table, where the first column contains the MySQL bug status, while the three other columns contain the content of the corresponding MariaDB Server JIRA issue's fields: "Status", "Resolution" and "Labels". There is also a "Comment" column with some explanation of what else is usually done in a JIRA issue when it gets this set of values defining its status, or what this may mean in the MySQL bugs database etc. The most important MySQL bug statuses are taken from that post of mine (there are more of them, but others are rarely used, especially since real work on bugs was moved into the internal bugs database by Oracle, or were removed since that post, as happened to "To be fixed later").

MySQL Bug Status | MariaDB JIRA Status | MariaDB JIRA Resolution | MariaDB JIRA Label | Comment
---------------- | ------------------- | ----------------------- | ------------------ | -------
Open             | OPEN                | Unresolved              |                    | Typical status for just reported bug
Closed           | CLOSED              | Fixed                   |                    | You should see list of versions that got the fix in the Fix Version/s field
Duplicate        | CLOSED              | Duplicate               |                    | So, in MariaDB it's "closed as a duplicate"
Analyzing        | OPEN                | Unresolved              |                    | Usually bug is assigned when some engineer is working on it, including analysis stage
Verified         | CONFIRMED           | Unresolved              |                    | CONFIRMED bugs are usually assigned in JIRA while in MySQL "Verified" bugs are usually unassigned
Won't fix        | CLOSED              | Won't Fix               |                    | Usually remains assigned
Can't repeat     | CLOSED              | Cannot reproduce        |                    | Unlike in MySQL, usually means that both engineer and bug reporter are not able to reproduce this
No Feedback      | CLOSED              | Incomplete              | need_feedback      | As in MySQL, bug should stay with "need_feedback" label for some time before it's closed as incomplete
Need Feedback    | OPEN                | Unresolved              | need_feedback      | Usually in the last comment in the bug you can find out what kind of feedback is required. No automatic setting to "No Feedback" in 30 days
Not a Bug        | CLOSED              | Not a Bug               |                    |
Unsupported      | CLOSED              | Won't Fix               |                    | There is no special "Unsupported" status in MariaDB. Most likely when there is a reason NOT to fix it's stated in the comment.

In the table above you can click on some of the links to see the list of MariaDB bugs with the status discussed in that table row. This is how I am going to use this post from now on, as a quick search starting point :) It will also be mentioned on one of the slides of my upcoming FOSDEM 2019 talk.
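
For those who prefer typing searches directly, the same lists can be reproduced with JQL in the JIRA issue search. A sketch of the query behind the "Verified" equivalent (MDEV is the MariaDB Server project key; the field names are standard JIRA):

project = MDEV AND status = Confirmed AND resolution = Unresolved ORDER BY updated DESC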

by Valeriy Kravchuk (noreply@blogger.com) at January 13, 2019 06:03 PM

January 11, 2019

Peter Zaitsev

AWS Aurora MySQL – HA, DR, and Durability Explained in Simple Terms

It’s a few weeks after AWS re:Invent 2018 and my head is still spinning from all of the information released at this year’s conference. This year I was able to enjoy a few sessions focused on Aurora deep dives. In fact, I walked away from the conference realizing that my own understanding of High Availability (HA), Disaster Recovery (DR), and Durability in Aurora had been off for quite a while. Consequently, I decided to put this blog out there, both to collect the ideas in one place for myself, and to share them in general. Unlike some of our previous blogs, I’m not focused on analyzing Aurora performance or examining the architecture behind Aurora. Instead, I want to focus on how HA, DR, and Durability are defined and implemented within the Aurora ecosystem.  We’ll get just deep enough into the weeds to be able to examine these capabilities alone.

Introducing the Aurora storage engine

Aurora MySQL – What is it?

We’ll start with a simplified discussion of what Aurora is from a very high level.  In its simplest description, Aurora MySQL is made up of a MySQL-compatible compute layer and a multi-AZ (multi availability zone) storage layer. In the context of an HA discussion, it is important to start at this level, so we understand the redundancy that is built into the platform versus what is optional, or configurable.

Aurora Storage

The Aurora Storage layer presents a volume to the compute layer. This volume is built out in 10GB increments called protection groups.  Each protection group is built from six storage nodes, two from each of three availability zones (AZs).  These are represented in the diagram above in green.  When the compute layer—represented in blue—sends a write I/O to the storage layer, the data gets replicated six times across three AZs.

Durable by Default

In addition to the six-way replication, Aurora employs a 4-of-6 quorum for all write operations. This means that for each commit that happens at the database compute layer, the database node waits until it receives write acknowledgment from at least four out of six storage nodes. By receiving acknowledgment from four storage nodes, we know that the write has been saved in at least two AZs.  The storage layer itself has intelligence built-in to ensure that each of the six storage nodes has a copy of the data. This does not require any interaction with the compute tier. By ensuring that there are always at least four copies of data, across at least two datacenters (AZs), and ensuring that the storage nodes are self-healing and always maintain six copies, it can be said that the Aurora Storage platform has the characteristic of Durable by Default.  The Aurora storage architecture is the same no matter how large or small your Aurora compute architecture is.
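
In classical quorum terms (a back-of-the-envelope sketch based on the published Aurora design, not an official AWS formula): with V = 6 copies, a write quorum of Vw = 4 and a read quorum of Vr = 3 satisfy both Vr + Vw > V (every read quorum overlaps every write quorum) and Vw > V/2 (two conflicting writes can never both reach quorum). Losing an entire AZ (two copies) still leaves four copies, enough to keep writing; losing an AZ plus one more node leaves three, still enough to read and repair.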

One might think that waiting to receive four acknowledgments represents a lot of I/O time and is therefore an expensive write operation.  However, Aurora database nodes do not behave the way a typical MySQL database instance would. Some of the round-trip execution time is mitigated by the way in which Aurora MySQL nodes write transactions to disk. For more information on exactly how this works, check out Amazon Senior Engineering Manager, Kamal Gupta’s deep-dive into Aurora MySQL from AWS re:Invent 2018.

HA and DR Options

While durability can be said to be a default characteristic to the platform, HA and DR are configurable capabilities. Let’s take a look at some of the HA and DR options available. Aurora databases are deployed as members of an Aurora DB Cluster. The cluster configuration is fairly flexible. Database nodes are given the roles of either Writer or Reader. In most cases, there will only be one Writer node. The Reader nodes are known as Aurora Replicas. A single Aurora Cluster may contain up to 15 Aurora Replicas. We’ll discuss a few common configurations and the associated levels of HA and DR which they provide. This is only a sample of possible configurations: it is not meant to represent an exhaustive list of the possible configuration options available on the Aurora platform.

Single-AZ, Single Instance Deployment

Great durability with Aurora, but DR and HA less so

The most basic implementation of Aurora is a single compute instance in a single availability zone. The compute instance is monitored by the Aurora Cluster service and will be restarted if the database instance or compute VM has a failure. In this architecture, there is no redundancy at the compute level. Therefore, there is no database level HA or DR. The storage tier provides the same high level of durability described in the sections above. The image below is a view of what this configuration looks like in the AWS Console.

Single-AZ, Multi-Instance

Introducing HA into an Amazon Aurora solution

HA can be added to a basic Aurora implementation by adding an Aurora Replica. We increase our HA level by adding Aurora Replicas within the same AZ. If desired, the Aurora Replicas can be used to also service some of the read traffic for the Aurora Cluster. This configuration cannot be said to provide DR because there are no database nodes outside the single datacenter or AZ. If that datacenter were to fail, then database availability would be lost until it was manually restored in another datacenter (AZ). It’s important to note that while Aurora has a lot of built-in automation, you will only benefit from that automation if your base configuration facilitates a path for the automation to follow. If you have a single-AZ base deployment, then you will not have the benefit of automated Multi-AZ availability. However, as in the previous case, durability remains the same. Again, durability is a characteristic of the storage layer. The image below is a view of what this configuration looks like in the AWS Console. Note that the Writer and Reader are in the same AZ.

Multi-AZ Options

Partial disaster recovery with Amazon Aurora

Building on our previous example, we can increase our level of HA and add partial DR capabilities to the configuration by adding more Aurora Replicas. At this point we will add one additional replica in the same AZ, bringing the local AZ replica count to three database instances. We will also add one replica in each of the two remaining regional AZs. Aurora provides the option to configure automated failover priority for the Aurora Replicas. Choosing your failover priority is best defined by the individual business needs. That said, one way to define the priority might be to set the first failover to the local-AZ replicas, and subsequent failover priority to the replicas in the other AZs. It is important to remember that AZs within a region are physical datacenters located within the same metro area. This configuration will provide protection for a disaster localized to the datacenter. It will not, however, provide protection for a city-wide disaster. The image below is a view of what this configuration looks like in the AWS Console. Note that we now have two Readers in the same AZ as the Writer and two Readers in two other AZs.
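
As a sketch of how that priority is expressed in practice: each Aurora Replica carries a promotion tier (0 is the highest priority), settable when creating or modifying the instance. The instance identifier below is hypothetical, and while --promotion-tier is the RDS CLI flag for this as I recall it, verify against the current AWS documentation:

shell> aws rds modify-db-instance --db-instance-identifier my-aurora-replica-1 --promotion-tier 0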

Cross-Region Options

The three configuration types we’ve discussed up to this point represent configuration options available within an AZ or metro area. There are also options available for cross-region replication in the form of both logical and physical replication.

Logical Replication

Aurora supports replication to up to five additional regions with logical replication.  It is important to note that, depending on the workload, logical replication across regions can be notably susceptible to replication lag.

Physical Replication

Durability, High Availability and Disaster Recovery with Amazon Aurora

One of the many announcements to come out of re:Invent 2018 is a product called Aurora Global Database. This is Aurora’s implementation of cross-region physical replication. Amazon’s published details on the solution indicate that it is storage level replication implemented on dedicated cross-region infrastructure with sub-second latency. In general terms, the idea behind a cross-region architecture is that the second region could be an exact duplicate of the primary region. This means that the primary region can have up to 15 Aurora Replicas and the secondary region can also have up to 15 Aurora Replicas. There is one database instance in the secondary region in the role of writer for that region. This instance can be configured to take over as the master for both regions in the case of a regional failure. In this scenario the secondary region becomes primary, and the writer in that region becomes the primary database writer. This configuration provides protection in the case of a regional disaster. It’s going to take some time to test this, but at the moment this architecture appears to provide the most comprehensive combination of Durability, HA, and DR. The trade-offs have yet to be thoroughly explored.

Multi-Master Options

Amazon is in the process of building out a new capability called Aurora Multi-Master. Currently, this feature is in the preview phase and has not been released for general availability. While a number of talks at re:Invent 2018 highlighted components of this feature, there is still no firm release date. Early analysis points to the feature being localized to the AZ. It is not known if cross-region Multi-Master will be supported, but it seems unlikely.

Summary

As a post-re:Invent takeaway, what I learned was that there is an Aurora configuration to fit almost any workload that requires strong performance behind it. Not all heavy workloads also demand HA and DR. If this describes one of your workloads, then there is an Aurora configuration that fits your needs. On the flip side, it is also important to remember that while data durability is an intrinsic quality of Aurora, HA and DR are not: they are completely configurable. This means that the Aurora architect in your organization must put thought and due diligence into the way they design your Aurora deployment. While we all need to be conscious of costs, don’t let cost consciousness become a blinder to reality. Just because your environment is running in Aurora does not mean you automatically have HA and DR for your database. In Aurora, HA and DR are configuration options, and just like in the on-premises world, viable HA and DR have additional costs associated with them.


by Brian Walters at January 11, 2019 07:53 PM

January 10, 2019

Peter Zaitsev

Percona Backup for MongoDB 0.2.0-Alpha Is Now Available


Percona announces the first public release of Percona Backup for MongoDB 0.2.0-Alpha on January 10, 2019.

Percona Backup for MongoDB is a distributed, low-impact solution for consistent backups of MongoDB sharded clusters and replica sets. This is a tool for creating consistent backups across a MongoDB sharded cluster (or a single replica set), and for restoring those backups to a specific point in time. Percona Backup for MongoDB uses a distributed client/server architecture to perform backup/restore actions. The project was inspired by (and intends to replace) the Percona-Lab/mongodb_consistent_backup tool.

This release features:

  • Consistent backup of sharded clusters
  • Compression of oplogs and logical backups
  • Backup and restore from local files
  • Backup to S3
  • Running the backup on a single replica set using the safest node (preferably non-Primary or hidden nodes with the lowest replication priority and smallest replication lag)

Percona Backup for MongoDB supports Percona Server for MongoDB or MongoDB Community Server version 3.6 or higher with MongoDB replication enabled. Binaries for the supported platforms as well as the tarball with source code are available from the GitHub repository (https://github.com/percona/percona-backup-mongodb/releases/tag/v0.2.0). For more information about Percona Backup for MongoDB and the installation steps, see this README file.

Note: Percona doesn’t recommend this release for production, and its API and configuration fields are likely to change in the future. It does not feature any API-level security. You are welcome to report any bugs you encounter in our bug tracking system.

Percona Backup for MongoDB process and interactions between key components.

by Borys Belinsky at January 10, 2019 08:14 PM

ProxySQL 1.4.13 and Updated proxysql-admin Tool


ProxySQL 1.4.13, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.13 source and binary packages available at https://percona.com/downloads/proxysql include ProxySQL Admin – a tool, developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.13 are available as well: https://hub.docker.com/r/percona/proxysql/. You can download the original ProxySQL from https://github.com/sysown/proxysql/releases. GitHub hosts the documentation in the wiki format.

Improvements

  • PSQLADM-53: Improved validation when --write-node is used with proxysql-admin.
  • PSQLADM-122: The galera/node monitor log now reports the count of async slave nodes that are online.

Bugs Fixed

  • PSQLADM-124: If the scheduler is configured with a --config-file that points to a file that doesn’t exist, the ERR_FILE was pointing to /dev/null. As a result, the user would not be notified about the error.
  • PSQLADM-126: proxysql-admin could show an error when --syncusers was used and the mysql_users table was empty.
  • PSQLADM-127: proxysql_galera_checker could corrupt the scheduler configuration after restart.
  • PSQLADM-129: Stopping or restarting ProxySQL could lead to multiple instances of proxysql_galera_checker running at the same time.

ProxySQL is available under Open Source license GPLv3.

by Borys Belinsky at January 10, 2019 05:01 PM

PostgreSQL Updatable Views: Performing Schema Updates With Minimal Downtime


Recently, one of our customers asked us how to minimize downtime when upgrading the database structure with changes that are not backwards-compatible. It’s an interesting question and I would like to visit some alternatives here. I will use PostgreSQL for this series of posts and walk through updatable views, INSTEAD OF Triggers, and the Rule System. Later, we’ll discuss alternatives available for other databases like MySQL.

This first post will give an overview of the problem and also the first implementation of the solution in PostgreSQL using updatable Views.

The Motivation

Software is like a living organism and as such, it evolves. It’s not surprising that database schemas also evolve, and this brings us a problem: how to minimize downtime when performing upgrades? Or, going even further, is it possible to upgrade without activating maintenance mode and making the service unavailable to our customers?

Let’s say that we want to push out an update 2.0. It’s a major update, and in this update there are application code changes and changes to the database such as altered tables, dropped columns, new tables and so on. Checking the changelog, we notice that most of the database changes are backwards-compatible but a few modified tables are not, so we can’t just push out the new database changes without breaking some functionality in the existing codebase. To avoid triggering errors while we upgrade the database, we need to shut down the application servers, update the database, update the codebase, and then get the servers up and running again. That means that we need an unwanted maintenance window!

As per our definition of the problem, we want to get to the point where we don’t have to use this maintenance window, a point where the old and new codebase could coexist for a period of time while we upgrade the system. One solution is to not make changes that the current codebase can’t handle, but, as you may have already assumed, it isn’t really an option when we are constantly trying to optimize and improve our databases. Another option, then, would be to use PostgreSQL updatable views.

Updatable Views

PostgreSQL introduced automatically updatable views in version 9.3. The documentation[1] says that simple views are automatically updatable and the system will allow INSERT, UPDATE or DELETE statements to be used on the view in the same way as on a regular table. A view is automatically updatable if it satisfies all of the following conditions:

  • The view must have exactly one entry in its FROM list, which must be a table or another updatable view.
  • The view definition must not contain WITH, DISTINCT, GROUP BY, HAVING, LIMIT, or OFFSET clauses at the top level.
  • The view definition must not contain set operations (UNION, INTERSECT or EXCEPT) at the top level.
  • The view’s select list must not contain any aggregates, window functions, or set-returning functions.

Note that the idea is to provide a simple mechanism that helps when using views: if the view is automatically updatable, the system will convert any INSERT, UPDATE or DELETE statement on the view into the corresponding statement on the underlying base table. This can also be used to increase security granularity, giving us the power to define privileges that operate at the view level. If the view has a WHERE clause, we can use the CHECK OPTION to prevent the user from inserting or updating rows that fall outside the scope of the view. For example, let’s say we have a view created to limit the user to viewing records from a specific country. If the user changed the country of any record, those records would disappear from the view. The CHECK OPTION helps to prevent this from happening. I recommend reading the documentation for more information about how views work in PostgreSQL.
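To make the country example above concrete, here is a minimal sketch (the customers table, the view name and the data are hypothetical):

CREATE TABLE customers (id INT PRIMARY KEY, name TEXT, country TEXT);

-- The view exposes only Brazilian customers; WITH CHECK OPTION also rejects
-- writes through the view that would produce rows outside its WHERE clause.
CREATE VIEW br_customers AS
    SELECT id, name, country FROM customers WHERE country = 'BR'
    WITH CHECK OPTION;

INSERT INTO br_customers VALUES (1, 'Ana', 'BR');     -- accepted
INSERT INTO br_customers VALUES (2, 'Bob', 'US');     -- rejected by the check option
UPDATE br_customers SET country = 'US' WHERE id = 1;  -- also rejected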

Implementation

Using updatable views makes the implementation as simple as creating views. For our example I will use the table below:

test=# CREATE TABLE t (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, password VARCHAR(300) NOT NULL, date_created TIMESTAMP NOT NULL DEFAULT now());
CREATE TABLE
test=# INSERT INTO t(id, name, password) VALUES (1, 'user_1', 'pwd_1'), (2, 'user_2','pwd_2'),(3,'user_3','pwd_3'),(4,'user_4','pwd_4'),(5,'user_5','pwd_5');
INSERT 0 5
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
(5 rows)

We then changed the schema, renaming the column password to pwd and date_created to dt_created, and added two more columns, pwd_salt and comment. The added columns are not a real problem because they can be either nullable or have a default value, but the column name changes are a problem. The changes are:

test=# create schema v_10;
CREATE SCHEMA
test=# CREATE VIEW v_10.t AS SELECT id, name, password AS password, date_created AS date_created FROM public.t;
CREATE VIEW
test=# ALTER TABLE public.t RENAME COLUMN password TO pwd;
ALTER TABLE
test=# ALTER TABLE public.t RENAME COLUMN date_created TO dt_created;
ALTER TABLE
test=# ALTER TABLE public.t ADD COLUMN pwd_salt VARCHAR(100);
ALTER TABLE
test=# ALTER TABLE public.t ADD COLUMN comment VARCHAR(500);
ALTER TABLE
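By the way, if you want to confirm that a view qualifies as automatically updatable, information_schema reports it directly. For the versioned schema above, a quick check could be:

SELECT table_name, is_updatable
FROM information_schema.views
WHERE table_schema = 'v_10';
-- is_updatable = 'YES' means the view accepts INSERT, UPDATE and DELETE.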

To make sure our application will work properly, we’ve defined that the tables live in a specific main schema (in this example the public schema) and the views live in the versioned schemas. In this case, if a change in one specific version needs a view to guarantee backwards-compatibility, we just create the view inside the versioned schema and apply the changes to the table in the main schema. The application will always define the “search_path” as “versioned_schema,main_schema”, which is “v_10, public” in this example:

test=# SET search_path TO v_10, public;
SET
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
(5 rows)
test=# select * from public.t;
id | name | pwd | dt_created | pwd_salt | comment
----+--------+-------+----------------------------+----------+---------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455 | |
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455 | |
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455 | |
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455 | |
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455 | |
(5 rows)

As we can see, the application still sees the old schema, but does this work? What if someone updates the password of ID #3? Let’s check:

test=# UPDATE t SET password = 'new_pwd_3' WHERE id = 3;
UPDATE 1
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+-----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
3 | user_3 | new_pwd_3 | 2018-12-27 07:50:39.562455
(5 rows)
test=# SELECT * FROM public.t;
id | name | pwd | dt_created | pwd_salt | comment
----+--------+-----------+----------------------------+----------+---------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455 | |
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455 | |
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455 | |
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455 | |
3 | user_3 | new_pwd_3 | 2018-12-27 07:50:39.562455 | |
(5 rows)

As we can see, the updatable view worked like a charm! The new and old application codebases can coexist and work together while we roll out our upgrades. There are some restrictions, as explained in the documentation, like having only one table or view in the FROM list, but for its simplicity, updatable views do a great job. What about more complex cases where we need to split or join tables? We will discuss these in future articles and show how we can solve them with both TRIGGERS and the PostgreSQL Rule System.
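As a final step, once every application server runs the new codebase and no longer puts the versioned schema in its search_path, the compatibility layer can simply be dropped. A sketch, based on the example above:

DROP VIEW v_10.t;
DROP SCHEMA v_10;
-- or, in a single step: DROP SCHEMA v_10 CASCADE;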

References

[1] https://www.postgresql.org/docs/current/sql-createview.html


Photo by Egor Kamelev from Pexels

by Charly Batista at January 10, 2019 09:35 AM

January 09, 2019

Peter Zaitsev

Percona Toolkit 3.0.13 Is Now Available


Percona announces the release of Percona Toolkit 3.0.13 on January 9, 2019.

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL®, MariaDB®, Percona Server for MongoDB and MongoDB.

Percona Toolkit, like all Percona software, is free and open source. You can download packages from the website or install from official repositories.

This release includes the following changes:

Bug fixes:

  • PT-1673: pt-show-grants was incompatible with MariaDB 10+ (thanks Tim Birkett)
  • PT-1638: pt-online-schema-change was erroneously taking MariaDB 10.x for MySQL 8.0 and refusing to work with it, to stay out of the scope of upstream bug #89441.
  • PT-1616: pt-table-checksum failed to resume on large tables with binary strings containing invalid UTF-8 characters.
  • PT-1573: pt-query-digest didn’t work when the log_timestamps = SYSTEM option was set in my.cnf.
  • PT-157: Specifying a non-primary key index with the ‘i’ part of the --source argument made pt-archiver ignore the presence of the --primary-key-only option.

Improvements:

  • PT-1340: pt-stalk no longer calls the mysqladmin debug command by default, to avoid flooding the error log. Setting the CMD_MYSQLADMIN="mysqladmin debug" environment variable reverts pt-stalk to the previous way of operation.
  • PT-1637: A new --fail-on-stopped-replication option allows pt-table-checksum to detect failing slave nodes.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at January 09, 2019 04:49 PM

Amazon Aurora Serverless – The Sleeping Beauty


One of the most exciting features Amazon Aurora Serverless brings to the table is its ability to go to sleep (pause) when idle. This is a fantastic feature for development and test environments. You get access to a powerful database to run tests quickly, but it goes easy on your wallet as you only pay for storage when the instance is paused.

You can configure Amazon RDS Aurora Serverless to go to sleep after a specified period of time. This can be set to anywhere between five minutes and 24 hours.

configure Amazon RDS Aurora Serverless sleep time

For this feature to work, however, inactivity has to be complete. If you have so much as a single query or even maintain an idle open connection, Amazon Aurora Serverless will not be able to pause.

This means, for example, that pretty much any monitoring you may have enabled, including our own Percona Monitoring and Management (PMM) will prevent the instance from pausing. It would be great if Amazon RDS Aurora Serverless would allow us to specify user accounts to ignore, or additional service endpoints which should not prevent it from pausing, but currently you need to get by without such monitoring and diagnostic tools, or else enable them only for duration of the test run.
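If a pause you expected never happens, a quick way to look for the culprit is to check for lingering sessions. This is a standard MySQL command, so it should work against Aurora Serverless as well; just run it from a short-lived connection and disconnect afterwards, or that connection will itself keep the instance awake:

SHOW FULL PROCESSLIST;
-- Any row besides your own connection means a client or monitoring agent
-- is holding a session open and will prevent the pause.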

If you’re using Amazon Aurora Serverless to back very low traffic applications, you might consider disabling the automatic pause function, since waking up currently takes quite a while. Otherwise, your users should be prepared for a 30+ seconds wait while Amazon Aurora Serverless activates.

Having such a high time to activate means you need to be mindful of timeout configuration in your test/dev scripts so you do not have to deal with sporadic failures. Or you can also use something like the mysqladmin ping command to activate the instance before your test run.

Some activation experiments

Let’s now take a closer look at Amazon RDS Aurora Serverless activation times. These times were measured for the MySQL 5.6 based Aurora Serverless – the only one currently available. I expect the numbers could be different in other editions.

Amazon RDS Aurora Serverless activation times

I measured the time it takes to run a trivial query (SELECT 1) after the instance goes to sleep. You’ll see I manually scaled the Amazon RDS Aurora Serverless instance to a desired capacity in ACU (Aurora Compute Units), and then had the script wait for six minutes to allow for pause to happen before running the query. The test was performed 12 times and the Min/Max/Avg times of these test runs for different settings of ACU are presented above.

You can see there is some variation between min and max times. I would expect even higher outliers, so plan for an activation time of more than a minute as a worst-case scenario.

Also note that there is an interesting difference in the activation time between instance sizes. While in my tests the smallest possible size (2 ACU) consistently took longer to activate compared to the medium size (8 ACU), the even bigger size (64 ACU) was the slowest of all.

So make no assumptions about how long an instance of a given size will take to wake up with your workload; rather, test it if activation time is an important consideration for you.

In some (rare) cases I also observed some internal timeouts during the resume process:

[root@ip-172-31-16-160 serverless]# mysqladmin ping -h serverless-test.cluster-XXXX.us-east-2.rds.amazonaws.com -u user -ppassword
mysqladmin: connect to server at 'serverless-test.cluster-XXXX.us-east-2.rds.amazonaws.com' failed
error: 'Database was unable to resume within timeout period.'

What about Autoscaling?

Finally, you may wonder how such Amazon Aurora Serverless pausing plays with Amazon Aurora Serverless Autoscaling?

In my tests, I observed that resume always restores the instance size to the same ACU as it was before it was paused. However, this is where the pausing configuration matters a great deal. According to this document, Amazon Aurora Serverless will not scale down more frequently than once per 900 seconds. While the document does not clarify over what period of time the conditions initiating scale down (CPU usage, connection usage, etc.) have to be met for scale down to be triggered, I can see that if the instance is idle for five minutes the scale down is not performed – it is just put to sleep.

At the same time, if you change this default five-minute period to a longer time, the idle instance will be automatically scaled down a notch every 900 seconds before it finally goes to sleep. Consequently, when it is awakened it will not be at the last stage at which the load was applied, but instead at the stage it was at when it was scaled down. Also, scaling down is considered an event by itself, which resets the idle counter and delays the pause. For example: if the initial instance scale is 8 ACU and the pause timer is set to 1 hour, it takes 1 hour 30 minutes for the pause to actually happen – 30 minutes to scale down twice, plus 1 hour at the minimum size for the pause to trigger.

Here is a graph to illustrate this:

Amazon Aurora Serverless scale down timings

This also shows that when the load is re-applied at about 13:47, it recovers to the last number of ACU it had before the pause.

This means that a pause time of more than 15 minutes makes the pause behavior substantially different from the default.

Summary

  • Amazon Aurora Serverless automatic pause is a great for test/dev environments.
  • Resume time is relatively long, and can reach as much as one minute.
  • Consider disabling automatic pausing for low traffic production applications, or at least let your users know they need to wait when they wake up the application.
  • Pause and Resume behavior is different in practice for a pause timeout of more than 15 minutes. Sticking to the default 5 minutes is recommended unless you really know what you’re doing.

by Peter Zaitsev at January 09, 2019 12:59 PM

January 08, 2019

Peter Zaitsev

Percona Live 2019 Tracks


Percona Live Open Source Database Conference 2019 in North America has moved to Austin, Texas: a cool place to be, and host to many big names in the tech space. Read what Dave Stokes, MySQL Community Manager for Oracle, has to say in favor of Austin.

If you need a conference ticket for Austin, put in your proposal now!

Those who are successful with their presentation or tutorial submissions will receive a pass to the full three days of the event. Closing date for the call for papers is Sunday, January 20.

Percona is adopting an industry trend by organizing the conference into 13 separate tracks with one Percona expert coordinating community input for each one. We believe subject-specific mini-committees of experts should provide better results than a single mega-committee covering everything.

The MySQL track is being led by Alkin Tezuysal, Senior Technical Manager.

MariaDB is the responsibility of Sveta Smirnova, Principal Support Escalation Specialist.

MongoDB is being driven by Consultant Doug Duncan.

PostgreSQL is being pushed forward by Avinash Vallarapu, PostgreSQL Support Tech Lead.

Other Open Source Databases: well, this important challenge has been handed to Senior Support Engineer Agustín Gallego.

Java Development for Open Source Databases might be of interest to developers and is being led by Rodrigo Trindade, Service Delivery Manager.

The Kubernetes track is being headed by Mykola Marzhan, our Kubernetes Technical Lead.

Database Security and Compliance will be overseen by Denis Farar, General Counsel and VP of HR (but make no mistake, this is still a track where tech content is very welcome).

Automation & AI topics, at the leading edge of database technology challenges, are the responsibility of Max Bubenick, Platform Lead.

Observability & Monitoring talk selection will be led by Roma Novikov, Director of Platform Engineering – so get those PMM and other OS monitoring proposals at the ready!

Polyglot Persistence is in the hands of our Senior Software Engineer Ibrar Ahmed who is waiting to hear all about your experiences with cross-database applications, data exchange and how to meet the challenges of a hybrid database world.

Migration to Open Source Databases, a similar-but-different track full of challenges parallel to those of polyglot applications, is being watched over by Marco Tusa, Managing Consultant.

The Business & Enterprise track will be driven by Brian Walters, Director of Solution Engineering, who is keen to hear your case studies and experiences of the impact of open source databases on your processes and organizations.

Cloud is a special case, since it touches on virtually all aspects of open source database technology. If your talk has particular relevance to ‘cloud’ then please add this track with your submission. Similarly, Innovative Technologies can apply across the board, and if you have something to share that is truly new, then add that to your track list. Talks that are most exciting in the context of cloud or innovative in their approach may be selected for their cloud or innovation merit, whichever track they belong to.

Our track champions will engage with community experts to select papers and shape content. If you would like to contribute by taking on talk selection, please let me know.

New speakers, and those with less experience, are welcome; we are here to help. First, check out my community blog post with links to info and video workshops on how to put together a selection-worthy proposal. Even old hands might find some inspiration!

All in all, we think this is a great move, with the track champions contributing their passion, experience and knowledge of contemporary open source issues to the development of excellent content. Although we’re changing several things at once, no one gets a prize for standing still. We hope you’ll continue to support this great, open source, database focused event, and to grow with us! Put a note in your diary to join us from May 28 – 30 in Austin, Texas.

Finally, if you would like to get in touch with any of our track champions, please let me know.

by Lorraine Pocklington, Community Manager at January 08, 2019 08:23 PM

Upcoming Webinar Wed 1/9: Walkthrough of Percona Server MySQL 8.0


Please join Percona’s MySQL Product Manager, Tyler Duzan, as he presents Walkthrough of Percona Server MySQL 8.0 on Wednesday, January 9th at 11:00 AM PDT (UTC-7) / 2:00 PM (UTC-4).

Register Now

Our Percona Server for MySQL 8.0 software is the company’s free, enhanced, drop-in replacement for MySQL Community Edition. The software includes all of the great features in MySQL Community Edition 8.0. Additionally, it includes enterprise-class features from Percona made available free and open source. Thousands of enterprises trust Percona Server for MySQL to deliver excellent performance and reliability for their databases and mission-critical applications. Furthermore, our open source software meets their need for a mature, proven and cost-effective MySQL solution.

In sum, register for this webinar for a walkthrough of Percona Server for MySQL 8.0.

by Tyler Duzan at January 08, 2019 07:13 PM


January 07, 2019

Peter Zaitsev

Understanding MySQL X (All Flavors)


Since 5.7.12, MySQL includes what is called the X Plugin, and along with it the X Protocol and the X DevAPI. But what is all this and how does it work? Let me share a personal short story on how I found myself investigating this feature. In a previous post I wrote about the MySQL Router tool, and our colleague Mr. Lefred pointed out that I was wrong about the X protocol, because I mentioned it was created to be used with JSON docs. Given this input, I wanted to investigate in a bit more depth what all this “X” means and how it can be used in our day to day operations.

The first problem I found is that the documentation is pretty extensive on the how’s, but it was really hard to find the what’s. This is a bit strange, because for people trying to research this new feature the documentation is not very helpful. In fact, I had to go to different websites to get a sense of what X means, how it works, and what it was created for.

Let’s start from the very beginning: what does the X stand for? Basically, it’s a way to name the crossover between relational and document models with extended capabilities, and the X is used for naming the three components we are describing: the plugin, the protocol and the DevAPI.

X Plugin

This is the actual interface between MySQL server and the clients. By clients we can consider a variety of clients, not only the MySQL shell. It has to be installed in MySQL 5.7 versions via the INSTALL PLUGIN command but comes installed by default in MySQL 8. The plugin adds all the functionality, configuration variables, and status counters we need to use it.
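For reference, on a MySQL 5.7 server on Linux the plugin can be loaded at runtime and its settings inspected afterwards:

INSTALL PLUGIN mysqlx SONAME 'mysqlx.so';  -- on Windows the library is mysqlx.dll
SHOW GLOBAL VARIABLES LIKE 'mysqlx%';      -- mysqlx_port defaults to 33060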

It has the ability to work with both traditional SQL and Document objects, and also supports CRUD (Create, Read, Update, Delete) operations,  asynchronous query execution and so on – this provides a great capacity to extend the current way we work with MySQL.

X Protocol

This is a new client protocol created to ‘talk’ between the X Plugin and Clients.  I think it is fair to say this is an eXtended version of the MySQL protocol.
It was designed with the idea of having the capacity for asynchronous calls, meaning that you can send more than one query to the server from the same client without waiting for the first query to finish before sending the second, and so on. This improves overall execution time by saving network round trips between clients and server.

Additionally, the protocol accepts CRUD operations and, of course, the handling of JSON documents and plain SQL. The protocol is fully implemented in MySQL Shell and has several connectors for popular languages (Java and .NET, for example).

X DevAPI

The last piece of this package is the X DevAPI. Probably the best documented of the three pieces, it is the API implemented in MySQL Shell and in the connectors that support the X Protocol. This API is designed to make it easy to write programs from a given client using some popular languages. For example, we can easily create and test a program from MySQL Shell using Python or JavaScript.

The API defines a few interesting concepts to handle sessions. These sessions can handle several connections to a server, so in a specific session we can encapsulate more than one MySQL connection. You can define a basic session connection as follows (in JavaScript) using the MySQL Shell:

MySQL  localhost:33060+ ssl  JS > var mysqlx = require('mysqlx');
 MySQL  localhost:33060+ ssl  JS > var session = mysqlx.getSession({host: 'localhost', user: 'root', password: 'root', port: 33060});

So what’s new here? How does it help, and how can I make use of it? First let’s try to illustrate the architecture:

 

MySQL X the components

As you may notice, the X Plugin adds a new interface that talks the X Protocol, and the protocol in turn can interact with the connectors that support it (as mentioned above). The classic functionality is still present, so we have simply extended it. The good part is that the protocol is capable of operating with both relational data and the document store.

So now let’s check out the fun part by putting all the pieces together in a simple example using MySQL Shell:

[root@data1 ~]# mysqlsh
MySQL Shell 8.0.13
Copyright (c) 2016, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > var test_conn = require('mysqlx');
 MySQL  JS > var session = mysqlx.getSession({host: 'localhost', user: 'root', password: 'root', port: 33060});   #creating session, notice X protocol listen port 33060 by default
 MySQL  JS > test_collection = session.getSchema('test').createCollection("people");
<Collection:people>
 MySQL  JS > test_collection.add({birth:"1988-06-12", Name: "Francisco"});
Query OK, 1 item affected (0.0456 sec)
 MySQL  JS > test_collection.add({birth:"2001-11-03", Name: "Maria", Nickname: "Mary"});
Query OK, 1 item affected (0.0255 sec)
 MySQL  JS > test_collection.find();
[
    {
        "Name": "Francisco",
        "_id": "00005c19099f0000000000000004",
        "birth": "1988-06-12"
    },
    {
        "Name": "Maria",
        "Nickname": "Mary",
        "_id": "00005c19099f0000000000000005",
        "birth": "2001-11-03"
    }
]
2 documents in set (0.0005 sec)
 MySQL  JS > \sql 									#simple command to switch between modes
Switching to SQL mode... Commands end with ;
 MySQL  SQL > \connect root@localhost
Creating a session to 'root@localhost'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 36 (X protocol)
Server version: 8.0.11 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
 MySQL  localhost:33060+ ssl  SQL > use test
Default schema set to `test`.
Fetching table and column names from `test` for auto-completion... Press ^C to stop.
 MySQL  localhost:33060+ ssl  test  SQL >  CREATE TABLE `people2` (
                                       ->   `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
                                       ->   `birth` datetime NOT NULL,
                                       ->   `name` varchar(45) NOT NULL DEFAULT '',
                                       ->   `nickname` varchar(45) NULL DEFAULT '',
                                       ->   PRIMARY KEY (`id`)
                                       -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.1056 sec)
 MySQL  localhost:33060+ ssl  test  SQL > insert into people2(birth, name, nickname) values('2010-05-01', 'Peter', null), ('1999-10-14','Joseph', 'Joe');
Query OK, 2 rows affected (0.0326 sec)
 MySQL  localhost:33060+ ssl  test  SQL > select * from people2;
+----+---------------------+--------+----------+
| id | birth               | name   | nickname |
+----+---------------------+--------+----------+
|  1 | 2010-05-01 00:00:00 | Peter  | NULL     |
|  2 | 1999-10-14 00:00:00 | Joseph | Joe      |
+----+---------------------+--------+----------+
2 rows in set (0.0004 sec)
 MySQL  localhost:33060+ ssl  test  SQL > select * from people;
+-----------------------------------------------------------------------------------------------------+------------------------------+
| doc                                                                                                 | _id                          |
+-----------------------------------------------------------------------------------------------------+------------------------------+
| {"_id": "00005c19099f0000000000000004", "Name": "Francisco", "birth": "1988-06-12"}                 | 00005c19099f0000000000000004 |
| {"_id": "00005c19099f0000000000000005", "Name": "Maria", "birth": "2001-11-03", "Nickname": "Mary"} | 00005c19099f0000000000000005 |
+-----------------------------------------------------------------------------------------------------+------------------------------+
2 rows in set (0.0028 sec)

Interesting, right? Within the same shell, I’ve created a session that runs over the X protocol and handled both document and relational objects, all without quitting the shell.

Is this all? Of course not! We are just scratching the surface; we haven’t used asynchronous calls or CRUD operations yet. In fact, these topics are each enough for a blog post of their own. Hopefully, though, the what’s are answered for now – at least a little – and if that’s the case, I’ll be very happy!


Photo by Deva Darshan on Unsplash

by Francisco Bordenave at January 07, 2019 04:07 PM

MariaDB Foundation

MariaDB 10.3.12 and MariaDB Connector/C 3.0.8 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.3.12, the latest stable release in the MariaDB 10.3 series, as well as MariaDB Connector/C 3.0.8, the latest stable release in the MariaDB Connector/C series. See the release notes and changelogs for details. Download MariaDB 10.3.12 Release Notes Changelog What is MariaDB 10.3? MariaDB […]

The post MariaDB 10.3.12 and MariaDB Connector/C 3.0.8 now available appeared first on MariaDB.org.

by Ian Gilfillan at January 07, 2019 01:39 PM

Jean-Jerome Schmidt

Announcing ClusterControl 1.7.1: Support for PostgreSQL 11 and MongoDB 4.0, Enhanced Monitoring

We are excited to announce the 1.7.1 release of ClusterControl - the only management system you’ll ever need to take control of your open source database infrastructure!

ClusterControl 1.7.1 introduces the next iteration of our agent-based monitoring infrastructure for MySQL, Galera Cluster, PostgreSQL, MongoDB, HAProxy & ProxySQL, a suite of new features to help users fully automate and manage PostgreSQL (including support for PostgreSQL 11), support for MongoDB 4.0 ... and more!

Release Highlights

Performance Management

  • Enhanced performance dashboards for MySQL, Galera Cluster, PostgreSQL, MongoDB, HAProxy & ProxySQL
  • Enhanced query monitoring for PostgreSQL: view query statistics

Deployment & Backup Management

  • Create a cluster from backup for MySQL & PostgreSQL
  • Verify/restore backup on a standalone PostgreSQL host
  • ClusterControl Backup & Restore

Additional Highlights

  • Support for PostgreSQL 11 and MongoDB 4.0

View the ClusterControl ChangeLog for all the details!

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

View Release Details and Resources

Release Details

Performance Management

Enhanced performance dashboards for MySQL, Galera Cluster, PostgreSQL & ProxySQL

Since October 2018, ClusterControl users have access to a set of monitoring dashboards that have Prometheus as the data source with its flexible query language and multi-dimensional data model, where time series data is identified by metric name and key/value pairs.

The advantage of this new agent-based monitoring infrastructure is that users can enable their database clusters to use Prometheus exporters to collect metrics on their nodes and hosts, thus avoiding excessive SSH activity for monitoring and metrics collections and use SSH connectivity only for management operations.

These Prometheus exporters can now be installed or enabled on your nodes and hosts for MySQL, PostgreSQL and MongoDB based clusters. You also have the option to customize collector flags for the exporters, which allows you, for example, to disable collecting from MySQL’s Performance Schema if you experience load issues on your server.

This allows for greater accuracy and customization options while monitoring your database clusters. ClusterControl takes care of installing and maintaining Prometheus as well as exporters on the monitored hosts.

With this 1.7.1 release, ClusterControl now also comes with the next iteration of the following (new) dashboards:

  • System Overview
  • Cluster Overview
  • MySQL Server - General
  • MySQL Server - Caches
  • MySQL InnoDB Metrics
  • Galera Cluster Overview
  • Galera Server Overview
  • PostgreSQL Overview
  • ProxySQL Overview
  • HAProxy Overview
  • MongoDB Cluster Overview
  • MongoDB ReplicaSet
  • MongoDB Server

Do check them out and let us know what you think!

MongoDB Cluster Overview
HAProxy Overview

Performance Management

Advanced query monitoring for PostgreSQL: view query statistics

ClusterControl 1.7.1 now comes with a whole range of new query statistics that can easily be viewed and monitored via the ClusterControl GUI. The following statistics are included in this new release:

  • Access by sequential or index scans
  • Table I/O statistics
  • Index I/O statistics
  • Database Wide Statistics
  • Table Bloat And Index Bloat
  • Top 10 largest tables
  • Database Sizes
  • Last analyzed or vacuumed
  • Unused indexes
  • Duplicate indexes
  • Exclusive lock waits
Table Bloat & Index Bloat

Deployment

Create a cluster from backup for MySQL & PostgreSQL

To be able to deliver database and application changes more quickly, several tasks must be automated. It can be a daunting job to ensure that a development team has the latest database build for testing when there is a proliferation of copies and the production database is in use.

ClusterControl provides a single process to create a new cluster from backup with no impact on the source database system.

With this new release, you can easily create a MySQL Galera or PostgreSQL cluster, including the data from the backup you need.

Backup Management

ClusterControl Backup/Restore

ClusterControl users can use this new feature to migrate a setup from one controller to another controller, and to back up the metadata of an entire controller or of individual clusters from the s9s CLI. The backup can then be restored on a new controller with a new hostname/IP, and the restore process will automatically recreate database access privileges. Check it out!

Additional New Functionalities

View the ClusterControl ChangeLog for all the details!

Download ClusterControl today!

Happy Clustering!

by jj at January 07, 2019 10:48 AM

January 06, 2019

Valeriy Kravchuk

Fun with Bugs #76 - On MySQL Bug Reports I am Subscribed to, Part XIII

The holiday season is almost over here, so it's time to get back to my main topic of MySQL bugs. Proper MySQL bug reporting will be a topic of my FOSDEM 2019 talk in less than 4 weeks (and a few slides with recent examples of bugs are not yet ready), so I have to concentrate on bugs.

Last time in this series I reviewed some interesting bug reports filed in November, 2018. Time to move on and proceed with bugs reported in December, 2018, as I've subscribed to 27 or so of them. As usual, I'll review them briefly starting from the oldest and try to check if MariaDB 10.3 is also affected when the bug report is about common features:
  • Bug #93440 - "Noop UPDATE query is logged to binlog after read_only flag is set". Nice corner case found by Artem Danilov. super_read_only, even if set to ON successfully, may not prevent committing and advancing the GTID value.
  • Bug #93450 - "mysqldump does not wrap SET NAMES into mysql-extension comment". This is a regression bug in MySQL 8.0 that may break compatibility with 3rd party tools not aware of MySQL extensions. This bug was reported by Mattias Jonsson.
  • Bug #93451 - "The table comment is cut down on selecting with ORDER BY". Nice regression in MySQL 8. As one can easily check, MariaDB 10.3.x and older MySQL versions are not affected.
  • Bug #93491 - "Optimizer does not correctly consider attached conditions in planning". Clear and useful bug report from Morgan Tocker.
  • Bug #93544 - "SHOW BINLOG EVENTS FROM <bad offset> is not diagnosed". Yet another regression bug in MySQL 8 found by Laurynas Biveinis from Percona. MariaDB 10.3 does not accept bad offsets:
    MariaDB [test]> show binlog events from 14 limit 1;
    ERROR 1220 (HY000): Error when executing command SHOW BINLOG EVENTS: Wrong offset or I/O error
    MariaDB [test]> show binlog events limit 4;
    +------------------+-----+-------------------+-----------+-------------+-----------------------------------------------+
    | Log_name         | Pos | Event_type        | Server_id | End_log_pos | Info                                          |
    +------------------+-----+-------------------+-----------+-------------+-----------------------------------------------+
    | pc-PC-bin.000001 |   4 | Format_desc       |         1 |         256 | Server ver: 10.3.7-MariaDB-log, Binlog ver: 4 |
    | pc-PC-bin.000001 | 256 | Gtid_list         |         1 |         285 | []                                            |
    | pc-PC-bin.000001 | 285 | Binlog_checkpoint |         1 |         328 | pc-PC-bin.000001                              |
    | pc-PC-bin.000001 | 328 | Gtid              |         1 |         370 | GTID 0-1-1                                    |
    +------------------+-----+-------------------+-----------+-------------+-----------------------------------------------+
    4 rows in set (0.002 sec)

    MariaDB [test]> show binlog events from 256 limit 1;
    +------------------+-----+------------+-----------+-------------+------+
    | Log_name         | Pos | Event_type | Server_id | End_log_pos | Info |
    +------------------+-----+------------+-----------+-------------+------+
    | pc-PC-bin.000001 | 256 | Gtid_list  |         1 |         285 | []   |
    +------------------+-----+------------+-----------+-------------+------+
    1 row in set (0.002 sec)
  • Bug #93572 - "parallel workers+slave_preserve_commit_order+flushtables with read lock deadlock". I subscribed to it as it's just yet another example of improper handling of useful bug reports, as already discussed in my post "Problems with Oracle's Way of MySQL Bugs Database Maintenance". I think Ashe Sun's point is clear, and suggestions like "don't do it" have nothing to do with proper bug processing.
  • Bug #93587 - "Error when creating a table with long partition names". Nice regression bug in MySQL 8 comparing to 5.7 was found by Sergei Glushchenko from Percona.

    MariaDB 10.3.7 on Windows also fails, with an error message that is not clear:
    ERROR 1005 (HY000): Can't create table `mc5noglq9ofy7ym76z1t758ztptj6iplvsldhmsext63mlvhcpew4dnu2opqdrre`.`th6edxfx5d1u5blb3i50ln5dfo415jirp9xkuc0h9o2ionkql3iomfyw4zvocfpp` (errno: 168 "Unknown (generic) error from engine")
    In the error log I see:
    2019-01-06 19:36:46 10 [ERROR] InnoDB: Operating system error number 3 in a file operation.
    2019-01-06 19:36:46 10 [ERROR] InnoDB: The error means the system cannot find the path specified.
    2019-01-06 19:36:46 10 [ERROR] InnoDB: File .\mc5noglq9ofy7ym76z1t758ztptj6iplvsldhmsext63mlvhcpew4dnu2opqdrre\th6edxfx5d1u5blb3i50ln5dfo415jirp9xkuc0h9o2ionkql3iomfyw4zvocfpp#p#o8w7066agxadomywht89twmbjomtfdmdc74wj7iupkd75lvu1enov1j008sjbkkf#sp#ywkq987ztkdj33zbmlw526153x86vxl4x44r15spf8jqs92665mt0qi6bsnkazy5.ibd: 'create' returned OS error 203.
    2019-01-06 19:36:46 10 [ERROR] InnoDB: Cannot create file '.\mc5noglq9ofy7ym76z1t758ztptj6iplvsldhmsext63mlvhcpew4dnu2opqdrre\th6edxfx5d1u5blb3i50ln5dfo415jirp9xkuc0h9o2ionkql3iomfyw4zvocfpp#p#o8w7066agxadomywht89twmbjomtfdmdc74wj7iupkd75lvu1enov1j008sjbkkf#sp#ywkq987ztkdj33zbmlw526153x86vxl4x44r15spf8jqs92665mt0qi6bsnkazy5.ibd'
  • Bug #93600 - "Setting out of range fractional part produces incorrect timestamps". After some arguing this bug reported by Evgeny Firsov was "Verified". In MariaDB 10.3 truncation happens:
    MariaDB [test]> SET SESSION TIMESTAMP=1.9999996;
    Query OK, 0 rows affected (0.039 sec)

    MariaDB [test]> SELECT CURRENT_TIMESTAMP(6);
    +----------------------------+
    | CURRENT_TIMESTAMP(6)       |
    +----------------------------+
    | 1970-01-01 02:00:01.999999 |
    +----------------------------+
    1 row in set (0.010 sec)

    MariaDB [test]> CREATE TABLE t1( ts TIMESTAMP(6), dt DATETIME(6) );
    Query OK, 0 rows affected (0.387 sec)

    MariaDB [test]> INSERT INTO t1 values (CURRENT_TIMESTAMP(6), CURRENT_TIMESTAMP(6));
    Query OK, 1 row affected (0.079 sec)

    MariaDB [test]> select * from t1;
    +----------------------------+----------------------------+
    | ts                         | dt                         |
    +----------------------------+----------------------------+
    | 1970-01-01 02:00:01.999999 | 1970-01-01 02:00:01.999999 |
    +----------------------------+----------------------------+
    1 row in set (0.016 sec)
  • Bug #93603 - "Memory access error with alter table character change." This bug was reported by Ramesh Sivaraman from Percona QA. I've subscribed mostly to find out how bug reports with the new severity level (S6) are going to be processed and fixed. See also his Bug #93701 - "Assertion `maybe_null' failed |Item_func_concat::val_str(String*)".

    I've subscribed to the S7 Bug #93617 - "Conditional jump or depends on uninitialized value(s) in Field_num::Field_num" from Laurynas Biveinis for a similar reason.
  • Bug #93649 - "STOP SLAVE SQL_THREAD deadlocks if done while holding LOCK INSTANCE FOR BACKUP". The new MySQL 8 feature, the LOCK INSTANCE FOR BACKUP statement, is an attempt to introduce backup locks to MySQL. Sergei Glushchenko found that it may cause deadlocks. I am surprised that the bug has been "Open" since December 18, 2018.
  • Bug #93683 - "Got error 155 when reading table './test/t1'". I am not sure if this message in the error log, noted by Roel Van de Paar from Percona, is a bug or a problem. MariaDB 10.3.7 produces the same error message. Let's find out; so far this report is "Verified".
  • Bug #93684 - "mysql innodb dump restore slows down after upgrade mysql 5.7 to 8.0". Florian Kopp reported this potentially notable performance regression of MySQL 8.0.13 vs 5.7, but it is still not verified. I am not sure how this report may end up, but it's not the first report about performance regressions in MySQL 8 :)

One can hardly find any bugs in this winter forest. But they are hiding and will affect everyone there one day, in spring...
To summarize:
  1. Some regression bugs are still not marked with the "regression" tag.
  2. Some MySQL bug reports are still handled wrongly, with a trend of wasting bug reporters' time on irrelevant clarifications and claims that the problem is not clear, even when there is a good enough explanation of the test case.
  3. Percona engineers still contribute a lot to MySQL 8 QA by reporting numerous bugs. No wonder, with their first Percona Server for MySQL 8 GA release having happened in December...
I still have a dozen or so December 2018 bug reports (mostly "Open" at the moment) to review one day, so stay tuned!

by Valeriy Kravchuk (noreply@blogger.com) at January 06, 2019 06:50 PM

Federico Razzoli

My 2019 Database Wishlist


Last year I published my 2018 Database Wishlist, which I recently revisited to check what happened and what didn’t. Time for a 2019 wishlist.

I am not going to list items from my 2018 list, even if they didn’t happen or they partially happened. Not because I changed my mind about their importance. Just because I wrote about them recently, and I don’t want to be more boring than I usually am.

External languages for MySQL and MariaDB

MariaDB 10.3 implemented a parser for PL/SQL stored procedures. This could be good for their business, as it facilitates the migration from Oracle. But it isn’t an answer to the community request of supporting external languages, like C or Python.

Oracle theoretically allows using languages supported by GraalVM in MySQL. But unspecified legal problems seem to stop them from releasing this. In any case, unfortunately, this feature is only available on GraalVM.

External languages are really desirable. Antony Curtis wrote a patch for this years ago, but neither MySQL nor MariaDB included it. Roland Bouman wrote mysqlv8udfs, a UDF to run JavaScript code via Google's V8 JavaScript engine (the one used in Chrome), but it was never included in MySQL or MariaDB.

Please, Oracle and MariaDB: do it. Don’t just start another discussion in a mailing list, do it for real.

ClickHouse without ZooKeeper

ClickHouse needs ZooKeeper to setup a cluster. ZooKeeper is based on the JVM and it’s hard to use. Having to use it because there is no alternative is always quite annoying.

In their repo, there is a feature request to support Consul instead of ZooKeeper. To recap: the main problem is that they use some ZooKeeper-specific features, but they plan to refactor the ZooKeeper client library, and after that it is possible that they will implement this feature.

I wrote about Consul some time ago, when I was a Percona consultant. I consider it a very good solution. The other alternative is etcd. It's worth noting that, despite it being widely used, the CNCF still considers etcd an incubating project.

Sphinx: implement JOINs

Sphinx seems to be forgotten by many. It is not trendy anymore. There have not been cool headlines about Sphinx on the tech sites for years. So, is it slowly dying? No, it's simply stable.

There are more trendy alternatives nowadays, yes. Like Elasticsearch and Solr. So why do I care about Sphinx? Well, first, it's quite KISS (keep it simple, stupid!). It doesn't have lots of features just because it was theoretically possible to implement them. Sorry for repeating myself, but basically… it's stable. Second, it supports a subset of SQL, it's kind of relational, and its language is compatible with MySQL. It could have invented a new model and a new language supported by nothing else in the world, but they (unlike many other modern database vendors) realised that it wouldn't be a good thing.

But it lacks JOINs. I don't want to run complex JOINs on it, it will never be the right tool for that. Still, this would open more opportunities for Sphinx users. They have always mentioned JOINs as something “currently missing”, so I still hope to see them implemented.

PostgreSQL: setup a real bug tracker

As I already mentioned, PostgreSQL doesn’t have a bug tracker. There is a mailing list. A problem that this surely causes is that it’s impossible to do structured searches – for example: get a list of confirmed bugs in version 11 involving foreign keys. I suspect there is another major problem: this could prevent some people from reporting bugs.

PostgreSQL is a great project, but please take bugs seriously.

Percona, be more conservative about your unique features

Every major version removes some features from the previous one. I understand why you do that, and I appreciate it. You want to keep the delta between Percona Server and MySQL as small as possible. If a feature is not used by enough customers, it implies an unjustified difference to maintain. I know that this policy is important to keep your fork's quality high.

Yet, I feel that you tend to remove too much stuff. Suppose I start to use a feature now and after some time I have to stop. If I think about it, I wish you hadn't implemented it at all in the first place. So my point is not necessarily “maintain more features”; it could be “develop fewer features” as well.

Open source databases and cloud providers

We have read about some open source databases going proprietary because they spend money to innovate, while cloud providers simply take their customers away without giving anything in return. In particular, Amazon modifies open source software to create its own products, and sells them without paying the original vendors.

What can I say about Amazon… nothing, just a sarcastic “thank you”.

But I've something to ask database vendors. Do you know what MaxScale is? If not, it's because this strategy of going proprietary is not as smart as you may think. Users will simply move away and forget you. Some of those users are paying customers. But even many of those who never paid you did contribute to your product – and indirectly, to your company's income. How? By reporting bugs, by writing technical content about your products, by talking about how good your software is, etc. I don't know of any estimate of how much these things contribute to a company's economy, but I would be very interested in reading one.

Federico

by Federico at January 06, 2019 02:54 PM

January 04, 2019

Jean-Jerome Schmidt

MySQL Performance Cheat Sheet

MySQL is extensive and has lots of areas to optimize and tweak for the desired performance. Some changes can be performed dynamically, others require a server restart. It is pretty common to find a MySQL installation with the default configuration, which may not be appropriate for your workload and setup.

Here are the key areas in MySQL which I have taken from different expert sources in the MySQL world, as well as our own experiences here at Severalnines. This blog serves as your cheat sheet to tune performance and make your MySQL great again :-)

Let's take a look at these by outlining the key areas in MySQL.

System Variables

MySQL has lots of variables that you can consider changing. Some variables are dynamic, which means they can be set using the SET statement. Others require a server restart after they are set in the configuration file (e.g. /etc/my.cnf, /etc/mysql/my.cnf). I'll go over the variables that are most commonly tuned to optimize the server.

sort_buffer_size

This variable controls how large your filesort buffer is: whenever a query needs to sort rows, the value of this variable limits the size of the buffer that is allocated. Take note that this buffer is allocated per connection, for each query that needs it, which means it can become memory hungry if you set it high and have multiple connections running sorts at the same time. You can monitor your needs by checking the global status variable Sort_merge_passes. If this value is large, you should consider increasing the value of the sort_buffer_size system variable; otherwise, keep it at a moderate limit. If you set it too low, or if you have large queries to process, sorting rows can be slower than expected because data spills to disk and is retrieved with random disk reads. This can cause performance degradation. The best fix, however, is to fix your queries. If your application is designed to pull large result sets that require sorting, it can be more efficient to use tools that handle query caching, like Redis. By default, in MySQL 8.0, the value is 256 KiB. Raise it only when you have queries that are heavily using or calling sorts.
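
For example, a minimal check-and-adjust sketch (the 4 MiB figure is purely illustrative, not a recommendation):

SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';
-- If the counter grows steadily under normal load, raise the buffer
-- for the current session only, rather than server-wide:
SET SESSION sort_buffer_size = 4 * 1024 * 1024;   -- 4 MiB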

read_buffer_size

The MySQL documentation mentions that for each request that performs a sequential scan of a table, the server allocates a read buffer. The read_buffer_size system variable determines the buffer size. It is mostly relevant to MyISAM, but it affects other storage engines as well. For MEMORY tables, it is used to determine the memory block size.

Basically, each thread that does a sequential scan of a MyISAM table allocates a buffer of this size (in bytes) for each table it scans. It applies to all storage engines (including InnoDB) as well, so it's helpful for queries that sort rows using ORDER BY and cache their indexes in a temporary file. If you do many sequential scans, bulk inserts into partitioned tables, or caching of results of nested queries, then consider increasing its value. The value of this variable should be a multiple of 4KB; if it is set to a value that is not a multiple of 4KB, it will be rounded down to the nearest multiple of 4KB. Take into account that setting this to a higher value will consume a large chunk of your server's memory per connection. I suggest not changing this without proper benchmarking and monitoring of your environment.

read_rnd_buffer_size

This variable deals with reading rows in an arbitrary sequence, or from a MyISAM table in sorted order following a key-sorting operation: the rows are read through this buffer (whose size this variable determines) to avoid disk seeks. Setting the variable to a large value can improve ORDER BY performance by quite a lot. However, this is a buffer allocated for each client, so you should not set the global variable to a large value. Instead, change the session variable only from within those clients that need to run large queries. Take into account that this does not apply to MariaDB when taking advantage of MRR: MariaDB uses mrr_buffer_size there, while MySQL uses read_rnd_buffer_size.

join_buffer_size

By default, the value is 256KB. This is the minimum size of the buffer that is used for plain index scans, range index scans, and joins that do not use indexes and thus perform full table scans. It is also used by the BKA optimization (which is disabled by default). Increase its value to get faster full joins when adding indexes is not possible, but beware of memory issues if you set it too high. Remember that one join buffer is allocated for each full join between two tables; for a complex join between several tables for which indexes are not used, multiple join buffers might be necessary. It is best left low globally and set high in sessions (using the SET SESSION syntax) that require large full joins. On 64-bit platforms, values larger than 4GB are permitted, except on 64-bit Windows, which truncates them to 4GB-1 with a warning.
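
A sketch of that session-level approach (the 16 MiB size is illustrative):

-- Raise the join buffer only for this reporting session:
SET SESSION join_buffer_size = 16 * 1024 * 1024;   -- 16 MiB
-- ... run the large index-less join here ...
SET SESSION join_buffer_size = DEFAULT;            -- restore the global value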

max_heap_table_size

This is the maximum size in bytes to which user-created MEMORY tables are permitted to grow. It is helpful when your application deals with MEMORY storage engine tables. Setting the variable while the server is active has no effect on existing tables unless they are recreated or altered. It also works in conjunction with tmp_table_size to limit the size of internal in-memory temporary tables (this differs from tables created explicitly with ENGINE=MEMORY, to which only max_heap_table_size applies): the smaller of the two values is the one enforced.

tmp_table_size

This is the largest size for internal in-memory temporary tables (not explicit MEMORY tables), although if max_heap_table_size is smaller, the lower limit applies. If an in-memory temporary table exceeds the limit, MySQL automatically converts it to an on-disk temporary table. Increase the value of tmp_table_size (and max_heap_table_size if necessary) if you do many advanced GROUP BY queries and you have large available memory space. You can compare the number of internal on-disk temporary tables created to the total number of internal temporary tables created by comparing the values of the Created_tmp_disk_tables and Created_tmp_tables variables. In ClusterControl, you can monitor this via Dashboard -> Temporary Objects graph.
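
For example, a quick check of the on-disk spill ratio (the 64M figure is illustrative):

SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';
-- If Created_tmp_disk_tables is a large fraction of Created_tmp_tables,
-- raise both limits together, since they cap each other:
SET GLOBAL tmp_table_size      = 64 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;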

table_open_cache

You can increase the value of this variable if you have a large number of tables that are frequently accessed in your data set. The value indicates the maximum number of tables the server can keep open in any one table cache instance, across all threads. Increasing this value increases the number of file descriptors that mysqld requires, so consider checking your open_files_limit value, or how large the soft and hard limits are in your *nix operating system. You can determine whether you need to increase the table cache by checking the Opened_tables status variable. If the value of Opened_tables is large and you do not use FLUSH TABLES often (which just forces all tables to be closed and reopened), then you should increase the value of the table_open_cache variable. If you have a small value for table_open_cache and a high number of tables are frequently accessed, this can affect the performance of your server. If you notice many entries in the MySQL processlist with status “Opening tables” or “Closing tables”, then it's time to adjust the value of this variable, but take note of the caveat mentioned earlier. In ClusterControl, you can check this under Dashboards -> Table Open Cache Status or Dashboards -> Open Tables. You can check it here for more info.
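
As a sketch (the value is illustrative; mind open_files_limit before raising it):

SHOW GLOBAL STATUS LIKE 'Opened_tables';
-- A steadily climbing counter under normal load suggests the cache is too small:
SET GLOBAL table_open_cache = 4096;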

table_open_cache_instances

Setting this variable helps improve scalability and, of course, performance, by reducing contention among sessions. The value you set here limits the number of open tables cache instances. The open tables cache can be partitioned into several smaller cache instances of size table_open_cache / table_open_cache_instances. A session needs to lock only one instance to access it for DML statements. This segments cache access among instances, permitting higher performance for operations that use the cache when there are many sessions accessing tables. (DDL statements still require a lock on the entire cache, but such statements are much less frequent than DML statements.) A value of 8 or 16 is recommended on systems that routinely use 16 or more cores.

table_definition_cache

This caches table definitions, i.e. the parsed results of CREATE TABLE statements, to speed up the opening of tables, with only one entry per table. It is reasonable to increase the value if you have a large number of tables. The table definition cache takes less space and does not use file descriptors, unlike the normal table cache. Peter Zaitsev of Percona suggests trying the formula below:

The number of user-defined tables + 10% unless 50K+ tables

But take note that the default value is based on the following formula, capped at a limit of 2000:

MIN(400 + table_open_cache / 2, 2000)

So in case you have a larger number of tables than the default covers, it is reasonable to increase its value. Take into account that with InnoDB, this variable is used as a soft limit on the number of open table instances in the data dictionary cache; the LRU mechanism kicks in once the number exceeds the current value of this variable. The limit helps address situations in which significant amounts of memory would otherwise be used to cache rarely used table instances until the next server restart. However, parent and child table instances with foreign key relationships are not placed on the LRU list, so they can impose a memory footprint higher than the limit defined by table_definition_cache and are not subject to eviction during LRU. Additionally, table_definition_cache defines a soft limit for the number of InnoDB file-per-table tablespaces that can be open at one time, which is also controlled by innodb_open_files; if both are set, the higher setting is used. If neither variable is set, table_definition_cache, which has a higher default value, is used. If the number of open tablespace file handles exceeds the limit defined by table_definition_cache or innodb_open_files, the LRU mechanism searches the tablespace file LRU list for files that are fully flushed and are not currently being extended. This process is performed each time a new tablespace is opened. If there are no “inactive” tablespaces, no tablespace files are closed. So keep this in mind.

max_allowed_packet

This is the per-connection maximum size of an SQL query or returned row. The value was last increased in MySQL 5.6, and in MySQL 8.0 (at least as of 8.0.3) the current default is 64 MiB. You might consider adjusting this if you have large BLOB rows that need to be pulled out (or read); otherwise you can leave the default in 8.0. In older versions the default is 4 MiB, so take care of that in case you encounter the ER_NET_PACKET_TOO_LARGE error. The largest possible packet that can be transmitted to or from a MySQL 8.0 server or client is 1GB.


skip_name_resolve

The MySQL server handles incoming connections via hostname resolution. By default, MySQL does not disable hostname resolution, which means it performs DNS lookups, and if your DNS is slow, it can be the cause of awful database performance. Consider turning skip_name_resolve on if you do not need DNS resolution, and take advantage of the performance improvement from skipping those lookups. Take into account that this variable is not dynamic, therefore a server restart is required if you set it in your MySQL config file. You may optionally start the mysqld daemon with the --skip-name-resolve option to enable it.

max_connections

This is the number of permitted connections for your MySQL server. If you get the MySQL error ‘Too many connections’, you might consider setting it higher. The default value of 151 often isn't enough, especially on a production database where the server has ample resources (do not waste your server resources, especially if it's a dedicated MySQL server). However, you must have enough file descriptors, otherwise you will run out of them. In that case, consider adjusting the soft and hard limits of your *nix operating system and set a higher value of open_files_limit in MySQL (5000 is the default limit). Take into account that it is very common for applications not to close connections to the database correctly, and setting a high max_connections can result in an unresponsive server or high load. Using a connection pool at the application level can help resolve the issue here.
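
For example, a quick sanity check before raising the limit (500 is illustrative):

SHOW GLOBAL STATUS LIKE 'Max_used_connections';
-- If the high-water mark is close to max_connections, raise the limit,
-- making sure open_files_limit and the OS limits can accommodate it:
SET GLOBAL max_connections = 500;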

thread_cache_size

This is the cache to prevent excessive thread creation. When a client disconnects, the client's threads are put in the cache if there are fewer than thread_cache_size threads there. Requests for threads are satisfied by reusing threads taken from the cache if possible, and only when the cache is empty is a new thread created. This variable can be increased to improve performance if you have a lot of new connections. Normally, this does not provide a notable performance improvement if you have a good thread implementation. However, if your server sees hundreds of connections per second you should normally set thread_cache_size high enough so that most new connections use cached threads. By examining the difference between the Connections and Threads_created status variables, you can see how efficient the thread cache is. Using the formula stated in the documentation, 8 + (max_connections / 100) is good enough.
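
For instance, you can gauge the cache's efficiency and apply the documented formula (assuming max_connections = 500) like this:

SHOW GLOBAL STATUS WHERE Variable_name IN ('Connections', 'Threads_created');
-- Thread cache hit rate (%) = 100 - (Threads_created / Connections * 100).
-- With max_connections = 500, the formula gives 8 + 500/100 = 13:
SET GLOBAL thread_cache_size = 13;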

query_cache_size

For some setups, this variable is their worst enemy. For systems experiencing high load and busy with heavy reads, this variable will bog you down. There have been benchmarks that were well-and-tested by e.g. Percona. This variable must be set to 0, along with query_cache_type = 0, to turn it off. The good news in MySQL 8.0 is that the MySQL team has removed it entirely, as this variable can really cause performance issues. I have to agree with their blog that it is unlikely to improve predictability of performance. If you need query caching, I suggest using Redis or ProxySQL.
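
On versions that still have it, disabling is a two-variable change (persist the same values in my.cnf so they survive a restart):

SET GLOBAL query_cache_size = 0;
SET GLOBAL query_cache_type = 0;   -- OFF
-- Note: if the server was started with query_cache_type=0,
-- the type cannot be changed again at runtime.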

Storage Engine - InnoDB

InnoDB is an ACID-compliant storage engine with various features to offer along with foreign key support (Declarative Referential Integrity). This has a lot of things to say here but certain variables to consider for tuning:

innodb_buffer_pool_size

This variable acts like the key buffer of MyISAM, but it has lots more to offer. Since InnoDB relies heavily on the buffer pool, you should consider setting this value typically to 70%-80% of your server's memory. It is favorable also to have more memory than your data set, so the buffer pool can be set large enough to hold your working data, but not much larger. In ClusterControl, this can be monitored using our Dashboards -> InnoDB Metrics -> InnoDB Buffer Pool Pages graph. You may also monitor this with SHOW GLOBAL STATUS using the Innodb_buffer_pool_pages* variables.
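
Since MySQL 5.7 the buffer pool can even be resized online; a sketch (the 24G value is purely illustrative, and the server rounds it to a multiple of the chunk size):

SET GLOBAL innodb_buffer_pool_size = 24 * 1024 * 1024 * 1024;   -- 24 GiB
-- Track the resize operation and read hit behaviour:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';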

innodb_buffer_pool_instances

For highly concurrent workloads, setting this variable can improve concurrency and reduce contention, as different threads read from and write to cached pages. innodb_buffer_pool_instances must lie between 1 (minimum) and 64 (maximum). Each page that is stored in or read from the buffer pool is assigned to one of the buffer pool instances randomly, using a hashing function. Each buffer pool manages its own free lists, flush lists, LRUs, and all other data structures connected to a buffer pool, and is protected by its own buffer pool mutex. Take note that this option takes effect only when innodb_buffer_pool_size >= 1GiB, and the total size is divided among the buffer pool instances.

innodb_log_file_size

This variable sets the size of each log file in a log group. The combined size of the log files (innodb_log_file_size * innodb_log_files_in_group) cannot exceed a maximum value that is slightly less than 512GB. According to Vadim, a bigger log file size is better for performance, but it has a drawback (a significant one) that you need to worry about: the recovery time after a crash. You need to balance recovery time in the rare event of crash recovery versus maximizing throughput during peak operations: an oversized log can translate to a 20x longer crash recovery process!

To elaborate: a larger value is good because the InnoDB transaction logs are crucial for good and stable write performance. The larger the value, the less checkpoint flush activity is required in the buffer pool, saving disk I/O. However, the recovery process is pretty slow once your database has shut down abnormally (crashed or was killed, whether by the OOM killer or accidentally). Ideally, you can have 1-2GiB in production, but of course you can adjust this. Benchmarking these changes can be a great advantage to see how they perform, especially after a crash.

innodb_log_buffer_size

To save disk I/O, InnoDB writes change data into its log buffer, whose size is set by innodb_log_buffer_size, with a default value of 8MiB. This is beneficial especially for large transactions, as InnoDB does not need to write the log of changes to disk before transaction commit. If your write traffic is high (inserts, deletes, updates), making the buffer larger saves disk I/O.

innodb_flush_log_at_trx_commit

When innodb_flush_log_at_trx_commit is set to 1, the log buffer is flushed to the log file on disk on every transaction commit. This provides maximum data integrity, but it also has a performance impact. Setting it to 2 means the log buffer is flushed only to the OS file cache on every transaction commit. The implication of 2 is optimal and improves performance if you can relax your ACID requirements and can afford to lose the transactions of the last second or two in case of an OS crash.
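
The variable is dynamic, so the trade-off can be made at runtime, for instance:

-- Relax durability for throughput; acceptable only if losing up to ~1s of
-- transactions on an OS crash is tolerable:
SET GLOBAL innodb_flush_log_at_trx_commit = 2;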

innodb_thread_concurrency

With improvements to the InnoDB engine, it is recommended to let the engine control concurrency by keeping the default value (which is zero). If you see concurrency issues, you can tune this variable. A recommended value is 2 times the number of CPUs plus the number of disks. It's a dynamic variable, which means it can be set without restarting the MySQL server.

innodb_flush_method

This variable must be tried and tested to determine what fits your hardware best. If you are using a RAID controller with a battery-backed cache, O_DIRECT helps relieve I/O pressure. Direct I/O is not cached, so it avoids double buffering between the buffer pool and the filesystem cache. If your storage is on a SAN, O_DSYNC might be faster for a read-heavy workload with mostly SELECT statements.

innodb_file_per_table

innodb_file_per_table is ON by default from MySQL 5.6. This is usually recommended, as it avoids having a huge shared tablespace and allows you to reclaim space when you drop or truncate a table. Separate tablespaces also benefit Xtrabackup's partial backup scheme.

innodb_max_dirty_pages_pct

This attempts to keep the percentage of dirty pages under control, and before the InnoDB plugin, this was really the only way to tune dirty buffer flushing. However, I have seen servers with 3% dirty buffers hitting their max checkpoint age. The way this increases dirty buffer flushing also doesn't scale well on high-I/O subsystems; it effectively just doubles the dirty buffer flushing per second when the percentage of dirty pages exceeds this amount.

innodb_io_capacity

This setting, in spite of all our grand hopes that it would allow Innodb to make better use of our IO in all operations, simply controls the amount of dirty page flushing per second (and other background tasks like read-ahead). Make this bigger, you flush more per second. This does not adapt, it simply does that many iops every second if there are dirty buffers to flush. It will effectively eliminate any optimization of IO consolidation if you have a low enough write workload (that is, dirty pages get flushed almost immediately, we might be better off without a transaction log in this case). It also can quickly starve data reads and writes to the transaction log if you set this too high.

innodb_write_io_threads

This controls how many threads can have writes in progress to the disk. I'm not sure why this is still useful if you can use Linux native AIO. It can also be rendered useless by filesystems that don't allow parallel writing to the same file by more than one thread (particularly if you have relatively few tables and/or use the global tablespaces).

innodb_adaptive_flushing

Specifies whether to dynamically adjust the rate of flushing dirty pages in the InnoDB buffer pool based on the workload. Adjusting the flush rate dynamically is intended to avoid bursts of I/O activity. This is enabled by default. When enabled, it tries to be smarter about flushing more aggressively based on the number of dirty pages and the rate of transaction log growth.

innodb_dedicated_server

This variable is new in MySQL 8.0; it is applied globally and requires a MySQL restart, since it's not a dynamic variable. However, as the documentation states, this variable should be enabled only if your MySQL is running on a dedicated server. Do not enable it on a shared host or on a host that shares system resources with other applications. When it is enabled, InnoDB automatically sizes innodb_buffer_pool_size, innodb_log_file_size, and innodb_flush_method based on the amount of memory detected. The only downside is that you cannot apply your own desired values to these variables.

MyISAM

key_buffer_size

Since InnoDB is now the default storage engine of MySQL, the default for key_buffer_size can probably be decreased unless you are using MyISAM productively as part of your application (but who uses MyISAM in production now?). I would suggest setting perhaps 1% of RAM, or 256 MiB to start with if you have larger memory, and dedicating the remaining memory to your OS cache and the InnoDB buffer pool.

Other Provisions For Performance

slow_query_log

Of course, this variable does not directly boost your MySQL server's performance. However, it helps you analyze slow-performing queries. The value can be set to 0 or OFF to disable logging, or 1 or ON to enable it. The default value depends on whether the --slow_query_log option is given. The destination for log output is controlled by the log_output system variable; if that value is NONE, no log entries are written even if the log is enabled. You can set the filename or destination of the query log file via the slow_query_log_file variable.

long_query_time

If a query takes longer than this many seconds, the server increments the Slow_queries status variable. If the slow query log is enabled, the query is logged to the slow query log file. This value is measured in real time, not CPU time, so a query that is under the threshold on a lightly loaded system might be above the threshold on a heavily loaded one. The minimum and default values of long_query_time are 0 and 10, respectively. Take note also that if the min_examined_row_limit variable is set > 0, a query is not logged, even if it takes too long, when the number of rows it examines is less than that value.
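
All of these switches are dynamic, so a minimal runtime setup might look like this (thresholds and path are illustrative):

SET GLOBAL slow_query_log = ON;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
SET GLOBAL long_query_time = 1;             -- log queries slower than 1 second
SET GLOBAL min_examined_row_limit = 1000;   -- skip queries examining < 1000 rows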

For more info on tuning your slow query logging, check the documentation here.

sync_binlog

This variable controls how often MySQL will sync binlogs to the disk. By default (>=5.7.7), it is set to 1, which means the binary log is synced to disk before transactions are committed. However, this imposes a negative impact on performance due to the increased number of writes. It is the safest setting if you want strict ACID compliance along with your slaves. Alternatively, you can set it to 0 if you want to disable disk synchronization and just rely on the OS to flush the binary log to disk from time to time. Setting it higher than 1 means the binlog is synced to disk after N binary log commit groups have been collected, where N is > 1.
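
This one is dynamic as well; a sketch of a middle-ground setting:

-- Sync after every 100 binary log commit groups; a compromise between
-- sync_binlog = 1 (safest) and 0 (the OS decides):
SET GLOBAL sync_binlog = 100;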

Dump/Restore Buffer Pool

It is pretty common that your production database needs to warm up from a cold start/restart. By dumping the current buffer pool contents before a restart, and loading them back once the server is up, you avoid the need to warm up your database cache from scratch. Take note that this feature was introduced in 5.6, but Percona Server 5.5 already had it available, just in case you wonder. To enable this feature, set both variables innodb_buffer_pool_dump_at_shutdown = ON and innodb_buffer_pool_load_at_startup = ON.
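
The dump-at-shutdown switch is dynamic, while innodb_buffer_pool_load_at_startup must go in my.cnf; you can also trigger a dump manually at any time, for example:

SET GLOBAL innodb_buffer_pool_dump_at_shutdown = ON;
-- Trigger an immediate dump and check its progress:
SET GLOBAL innodb_buffer_pool_dump_now = ON;
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_dump_status';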

Hardware

We're now in 2019, and there have been a lot of new hardware improvements. Typically, there's no hard requirement that MySQL needs specific hardware; it depends on what you need the database to do. I would expect that you are not reading this blog to test whether it runs on an Intel Pentium 200 MHz.

For the CPU, faster processors with multiple cores will be optimal for MySQL, at least in recent versions since 5.6. Intel's Xeon/Itanium processors can be expensive, but are proven as scalable and reliable computing platforms. Amazon has been shipping EC2 instances running on the ARM architecture. Though I personally haven't tried running MySQL on ARM, nor recall running it, there are benchmarks that were made years ago. Modern CPUs can scale their frequencies up and down based on temperature, load, and OS power-saving policies. However, there's a chance that the CPU settings in your Linux OS are set to a different governor. You can check that, or set the “performance” governor, by doing the following:

echo performance | sudo tee /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor

For memory, it is very important that your memory is large enough to accommodate the size of your dataset. Ensure that you have swappiness = 1. You can check it via sysctl, or by checking the file in procfs. This is achieved by doing the following:

$ sysctl -e vm.swappiness
vm.swappiness = 1

Or setting it to a value of 1 as follows

$ sudo sysctl vm.swappiness=1
vm.swappiness = 1

Another great thing to consider for your memory management is turning off THP (Transparent Huge Pages). In the past, I recall we had some weird issues with CPU utilization and thought they were due to disk I/O. It turned out the problem was with the kernel's khugepaged thread, which allocates memory dynamically during runtime. Moreover, while the kernel performs defragmentation, your memory is quickly allocated as it is passed to THP. Standard HugePages memory is pre-allocated at startup and does not change during runtime. You can verify and disable it by doing the following:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
$ echo "never" > /sys/kernel/mm/transparent_hugepage/enabled

For disk, it is important that you have good throughput. Using RAID 10 with a battery backup unit is the best setup for a database. With the advent of flash drives that offer high throughput and high read/write IOPS, it is important that your setup can sustain high disk utilization and disk I/O.

Operating System

Most production systems running MySQL run on Linux. That is because MySQL has been tested and benchmarked on Linux, and it is the de facto standard for a MySQL installation. However, of course, there's nothing stopping you from using it on Unix or Windows platforms. It is easier if your platform has been tested and there is a wide community to help, in case you experience some trouble. Most setups run on RHEL/CentOS/Fedora and Debian/Ubuntu systems. In AWS, Amazon has its Amazon Linux, which I also see used in production by some.

Most important to consider with your setup is that your file system is either XFS or ext4. For sure, there are pros and cons between these two file systems, but I won't go into the details here. Some say XFS outperforms ext4, but there are reports as well that ext4 outperforms XFS. ZFS is also coming into the picture as a good candidate for an alternative file system. Jervin Real (from Percona) has a great resource on this one; you can check this presentation from the ZFS conference.

External Links

https://developer.okta.com/blog/2015/05/22/tcmalloc

https://www.percona.com/blog/2012/07/05/impact-of-memory-allocators-on-mysql-performance/

https://www.percona.com/live/18/sessions/benchmark-noise-reduction-how-to-configure-your-machines-for-stable-results

https://zfs.datto.com/2018_slides/real.pdf

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/ladbi/disabling-transparent-hugepages.html#GUID-02E9147D-D565-4AF8-B12A-8E6E9F74BEEA

by Paul Namuag at January 04, 2019 05:32 PM

Peter Zaitsev

Amazon RDS Aurora MySQL – Differences Among Editions

Amazon Aurora with MySQL Compatibility comes in three editions which, at the time of writing, have quite a few differences around the features that they support. Make sure you don't assume the newer Aurora 2.x supports everything in Aurora 1.x. On the contrary, right now Aurora 1.x (MySQL 5.6 based) supports most Aurora features. The serverless option was launched for this version, and it's not based on the latest MySQL 5.7. However, the serverless option, too, has its own set of limitations.

I found a concise comparison of what is available in which Amazon Aurora edition hard to come by so I’ve created one.  The table was compiled based mostly on documentation research, so if you spot some mistakes please let me know and I’ll make a correction.

Please keep in mind, this is expected to change over time. For example Amazon Aurora 2.x was initially released without Performance_Schema support, which was enabled in later versions.

There seems to be a lag in porting Aurora features from the MySQL 5.6 compatible edition to the MySQL 5.7 compatible one – the current 2.x release does not include features introduced in Aurora 1.16 or later, as per this document.

A comparison table

Feature | MySQL 5.6 Based | MySQL 5.7 Based | Serverless MySQL 5.6 Based
Compatible to MySQL | MySQL 5.6.10a | MySQL 5.7.12 | MySQL 5.6.10a
Aurora Engine Version | 1.18.0 | 2.03.01 | 1.18.0
Parallel Query | Yes | No | No
Backtrack | Yes | No | No
Aurora Global Database | Yes | No | No
Performance Insights | Yes | No | No
SELECT INTO OUTFILE S3 | Yes | Yes | Yes
Amazon Lambda – Native Function | Yes | No | No
Amazon Lambda – Stored Procedure | Yes | Yes | Yes
Hash Joins | Yes | No | Yes
Fast DDL | Yes | Yes | Yes
LOAD DATA FROM S3 | Yes | Yes | No
Spatial Indexing | Yes | Yes | Yes
Asynchronous Key Prefetch (AKP) | Yes | No | Yes
Scan Batching | Yes | No | Yes
S3 Backed Based Migration | Yes | No | No
Advanced Auditing | Yes | Yes | No
Aurora Replicas | Yes | Yes | No
Database Cloning | Yes | Yes | No
IAM database authentication | Yes | Yes | No
Cross-Region Read Replicas | Yes | Yes | No
Restoring Snapshot from MySQL DB | Yes | Yes | No
Enhanced Monitoring | Yes | Yes | No
Log Export to Cloudwatch | Yes | Yes | No
Minor Version Upgrade Control | Yes | Yes | Always On
Data Encryption Configuration | Yes | Yes | Always On
Maintenance Window Configuration | Yes | Yes | No

Hope this helps with selecting which Amazon Aurora edition is right for you, when it comes to supported features.


Photo by Nathan Dumlao on Unsplash

by Peter Zaitsev at January 04, 2019 03:51 PM

Percona XtraDB Cluster 5.6.42-28.30 Is Now Available

Percona announces the release of Percona XtraDB Cluster 5.6.42-28.30 (PXC) on January 4, 2019. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.42-28.30 is now the current release, based on the following:

All Percona software is open-source and free.

Fixed Bugs

  • PXC-2281: Debug symbols were missing in Debian dbg packages.
  • PXC-2220: Starting two instances of Percona XtraDB Cluster on the same node could cause writing transactions to a page store instead of a galera.cache ring buffer, resulting in huge memory consumption because of retaining already applied write-sets.
  • PXC-2230: gcs.fc_limit=0 (not allowed as a dynamic setting, to avoid generating flow control on every message) could still be set in my.cnf due to an inconsistent check.
  • PXC-2238: Setting read_only=1 caused a race condition.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Dmitriy Kostiuk at January 04, 2019 03:12 PM

Percona XtraDB Cluster 5.7.24-31.33 Is Now Available

Percona is glad to announce the release of Percona XtraDB Cluster 5.7.24-31.33 (PXC) on January 4, 2019. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.7.24-31.33 is now the current release, based on the following:

Deprecated

The following variables are deprecated starting from this release:

  • wsrep_preordered was used to turn on transparent handling of preordered replication events, applied locally first before being replicated to other nodes in the cluster. It is no longer needed due to a performance fix that eliminates the lag between the asynchronous replication channel and cluster replication.
  • innodb_disallow_writes usage to make InnoDB avoid writes during SST was deprecated in favor of the innodb_read_only variable.
  • wsrep_drupal_282555_workaround avoided the duplicate value creation caused by buggy auto-increment logic, but the corresponding bug is already fixed.
  • The session-level variable binlog_format=STATEMENT was enabled only for pt-table-checksum; this will be addressed in upcoming releases of Percona Toolkit.

Fixed Bugs

  • PXC-2220: Starting two instances of Percona XtraDB Cluster on the same node could cause writing transactions to a page store instead of a galera.cache ring buffer, resulting in huge memory consumption because of retaining already applied write-sets.
  • PXC-2230: gcs.fc_limit=0 (not allowed as a dynamic setting, to avoid generating flow control on every message) could still be set in my.cnf due to an inconsistent check.
  • PXC-2238: Setting read_only=1 caused a race condition.
  • PXC-1131: mysqld-systemd threw an error at MySQL restart in the case of a non-existing error log on CentOS/RHEL 7.
  • PXC-2269: being not dynamic, the pxc_encrypt_cluster_traffic variable was erroneously allowed to be changed by a SET GLOBAL statement.
  • PXC-2275: checking wsrep_node_address value in the wsrep_sst_common command line parser caused parsing the wrong variable.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

 

by Dmitriy Kostiuk at January 04, 2019 01:13 PM

January 03, 2019

Peter Zaitsev

Upcoming Webinar Friday 1/4: High-Performance PostgreSQL, Tuning and Optimization Guide

Please join Percona's Senior Software Engineer, Ibrar Ahmed, as he presents his High-Performance PostgreSQL, Tuning and Optimization Guide on Friday, January 4th, at 8:00 AM PDT (UTC-7) / 11:00 AM EDT (UTC-4).

Register Now

PostgreSQL is one of the leading open-source databases. Out of the box, the default PostgreSQL configuration is not tuned for any specific workload; thus, even a system with minimal resources can run it. PostgreSQL does not give optimum performance on high-performance machines because, by default, it does not use all the available resources. PostgreSQL provides a system where you can tune your database according to your workload and machine's specifications. In addition to tuning PostgreSQL, we can also tune our Linux box so that the database load works optimally.

In this webinar on High-Performance PostgreSQL, Tuning and Optimization, we will learn how to tune PostgreSQL and we’ll see the results of that tuning. We will also touch on tuning some Linux kernel parameters.

 

by Ibrar Ahmed at January 03, 2019 04:42 PM

MongoDB Engines: MMAPV1 Vs WiredTiger

In this post, we'll take a look at the differences between the MMAP and WiredTiger engines in MongoDB®. I've been asked this question by customers many times, and this blog is for you! We'll tell you about the key features of these engines, so you can choose the right engine based on your requirements.

In MongoDB, we mainly use the MMAPV1 and WiredTiger engines. Other engines are available too, such as the in-memory engine and RocksDB with Percona Server for MongoDB (PSMDB), and the in-memory engine with the MongoDB Enterprise version. When MongoDB was introduced, MMAPV1 was the default engine, and it's still a part of MongoDB releases, though it will be removed from 4.2 as per MongoDB's plan. Those who remember working with version 1.8 might miss it, even if they don't use MMAP currently! MongoDB acquired WiredTiger Inc. (see here https://www.mongodb.com/press/wired-tiger) and from version 3.2 made WiredTiger the default engine of MongoDB. This engine enabled the introduction of multi-document transactions, and is mainly known for features such as compression and document-level locking. Here we'll see the key features of WiredTiger and MMAPV1, and also present them in a table at the end – who doesn't love a table to quickly check the differences! It reminds me of my school days :-)) My co-author and friend Aayushi feels the same?! 🙂

Some differences in detail

Storage Engines

The MongoDB storage engines manage BSON data in memory and on disk to support read and write operations.

MMAPV1: This is the original storage engine for MongoDB, introduced in the first release, but deprecated as of version 4.0.

WiredTiger: This is the pluggable engine introduced by MongoDB in version 3.0, and it became the default storage engine from version 3.2.

Data compression

MMAPV1: does not support data compression, as it is based on memory-mapped files. It works well when your working set fits in memory, and it excels at workloads with high-volume inserts, reads, and in-place updates.

WiredTiger: supports snappy and zlib compression. Consequently, MongoDB with WiredTiger takes very little space compared with MMAP. It has its own write cache and also uses the filesystem cache.

  • Snappy: This is the default algorithm; it provides efficient computation with reasonable compression. See here.
  • Zlib: higher compression rate at the cost of CPU. See here.

Data Directory

Let's take a look at the data directory contents for the same data and replica set member under each of the engines.

MMAPV1:

total 1.2G
-rw-r--r-- 1 vagrant vagrant    5 Nov 28 04:41 mongod.lock
-rw-rw-r-- 1 vagrant vagrant   69 Nov 28 04:41 storage.bson
-rw------- 1 vagrant vagrant  16M Nov 28 04:58 local.0
drwxrwxr-x 2 vagrant vagrant 4.0K Nov 28 04:58 journal
-rw------- 1 vagrant vagrant  16M Nov 28 04:58 admin.ns
-rw------- 1 vagrant vagrant  16M Nov 28 04:58 admin.0
-rw------- 1 vagrant vagrant 512M Nov 28 04:59 local.2
drwxrwxr-x 2 vagrant vagrant 4.0K Nov 28 04:59 diagnostic.data
drwxrwxr-x 2 vagrant vagrant 4.0K Nov 28 05:16 _tmp
-rw------- 1 vagrant vagrant  16M Nov 28 05:17 test.ns
-rw------- 1 vagrant vagrant  16M Nov 28 05:17 test.0
-rw------- 1 vagrant vagrant  32M Nov 28 05:17 test.1
-rw------- 1 vagrant vagrant  16M Nov 28 09:09 local.ns
-rw------- 1 vagrant vagrant 512M Nov 28 09:09 local.1

WiredTiger:

total 5.4M
-rw-rw-r-- 1 vagrant vagrant   21 Nov 28 07:38 WiredTiger.lock
-rw-rw-r-- 1 vagrant vagrant   49 Nov 28 07:38 WiredTiger
drwxrwxr-x 2 vagrant vagrant 4.0K Nov 28 07:38 journal
-rw-rw-r-- 1 vagrant vagrant 4.0K Nov 28 07:38 WiredTigerLAS.wt
-rw-rw-r-- 1 vagrant vagrant   95 Nov 28 07:38 storage.bson
-rw-r--r-- 1 vagrant vagrant    5 Nov 28 07:38 mongod.lock
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:38 index-7--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:38 index-5--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:38 index-3--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:38 index-1--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:38 collection-4--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:38 collection-2--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:38 collection-0--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:38 index-15--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:38 index-14--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant 1.8M Nov 28 07:38 index-17--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant 3.2M Nov 28 07:39 collection-16--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  16K Nov 28 07:39 collection-13--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  32K Nov 28 07:39 _mdb_catalog.wt
-rw-rw-r-- 1 vagrant vagrant  36K Nov 28 09:09 sizeStorer.wt
-rw-rw-r-- 1 vagrant vagrant  36K Nov 28 09:09 collection-6--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  52K Nov 28 09:09 collection-12--2134189858403062482.wt
-rw-rw-r-- 1 vagrant vagrant  76K Nov 28 09:09 WiredTiger.wt
-rw-rw-r-- 1 vagrant vagrant 1003 Nov 28 09:09 WiredTiger.turtle
drwxrwxr-x 2 vagrant vagrant 4.0K Nov 28 09:09 diagnostic.data

Journaling

MMAPV1: Ensures that writes are atomic. If MongoDB goes down or terminates before committing changes to the data files, MongoDB can use the journal files to apply the write operations to the data files and maintain a consistent state.

WiredTiger: Uses checkpoints between writes, and the journal persists all data modifications between checkpoints. So for any recovery from a database crash or abrupt termination, it uses the journal entries since the last checkpoint. In most cases, the journal is not necessary for this engine; you enable it only if you need to be sure to recover up to the last successful write before the crash. Otherwise, MongoDB can usually recover from the last valid checkpoint. A checkpoint occurs every minute by default.

Journal directory

This is how journal files appear in the data directory for the different engines:

MMAPV1:

vagrant@m103:/data/mongo1/journal$ ls -lrth
total 35M
-rw------- 1 vagrant vagrant  88 Nov 28 09:17 lsn
-rw------- 1 vagrant vagrant 35M Nov 28 09:17 j._0

WiredTiger:

-rw-rw-r-- 1 vagrant vagrant 100M Nov 28 07:38 WiredTigerPreplog.0000000001
-rw-rw-r-- 1 vagrant vagrant 100M Nov 28 07:38 WiredTigerPreplog.0000000002
-rw-rw-r-- 1 vagrant vagrant 100M Nov 28 09:16 WiredTigerLog.0000000001

Locks and concurrency

MMAPV1

  • Up until version 2.6: uses a readers-writer lock that allows concurrent read access to a database, but gives exclusive access to a single write operation. When a read lock exists, many read operations may use this lock. However, when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock.
  • From 3.0: The MMAPv1 storage engine uses collection level locking as of the 3.0 release series, an improvement on earlier versions in which the database lock was the finest-grain lock.

WiredTiger: supports document level locking. For most read and write operations, WiredTiger uses optimistic concurrency control. WiredTiger uses only intent locks at the global, database, and collection levels.

For example: deleting documents matching {x: 1} from the collection “testData” will acquire a write lock at the collection level differently for each of the storage engines.

MMAPV1:

2018-12-17T10:09:46.830+0000 I COMMAND  [conn8] command
testDB.$cmd appName: "MongoDB Shell"
command: delete { delete: "testData",
deletes: [ { q: { x: 1.0 }, limit: 0.0 } ], ordered: true }
numYields:0 reslen:89 locks:{ Global: { acquireCount: { r: 100795, w: 100795 } },
MMAPV1Journal: { acquireCount: { w: 100796 }, acquireWaitCount: { w: 12 },
timeAcquiringMicros: { w: 46212 } }, Database: { acquireCount: { w: 100795 } }
, Collection: { acquireCount: { W: 795 } }

where W represents an Exclusive (X) lock

WiredTiger:

2018-12-17T10:17:38.340+0000 I COMMAND  [conn1] command
testDB.$cmd appName: "MongoDB Shell"
command: delete { delete: "testData",
deletes: [ { q: { x: 1.0 }, limit: 0.0 } ], ordered: true }
numYields:0 reslen:89 locks:{ Global: { acquireCount: { r: 100795, w: 100795 } },
Database: { acquireCount: { w: 100795 } }, Collection: { acquireCount: { w: 795 } }

where w represents an Intent Exclusive (IX) lock

Memory

MMAPv1: MongoDB automatically uses all free memory on the machine as its cache. System resource monitors show that MongoDB uses a lot of memory, but its usage is dynamic. If another process suddenly needs half the server’s RAM, MongoDB will yield cached memory to the other process.

Technically, the operating system’s virtual memory subsystem manages MongoDB’s memory. This means that MongoDB will use as much free memory as it can, swapping to disk as needed. Deployments with enough memory to fit the application’s working data set in RAM will achieve the best performance.

WiredTiger: With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache. Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes. Starting in 3.4, the WiredTiger internal cache, by default, will use the larger of either:

  • 50% of (RAM – 1 GB), or
  • 256 MB.

For example, on a server with 16 GB of RAM, the WiredTiger internal cache defaults to 50% of (16 GB – 1 GB) = 7.5 GB.

Quick reference: MMAPV1 vs WiredTiger

Use this table for a quick reference to the differences between MMAPv1 and WiredTiger

Key Feature | MMAPV1 | WiredTiger
Introduction & Default Engine | Introduced with MongoDB from the start and the default engine until version 3.0. Deprecated in 4.0 and will be removed in the future | Introduced in version 3.0 and the default from version 3.2
Data Compression | Doesn't support compression | Compression with the default snappy method or the zlib method, so it occupies less space than the MMAPV1 engine
Journaling | MongoDB writes the in-memory changes first to on-disk journal files. If MongoDB goes down or terminates before committing the changes to the data files, MongoDB can use the journal files to apply the write operations to the data files and maintain a consistent state. | The WiredTiger journal persists all data modifications between checkpoints. If MongoDB exits between checkpoints, it uses the journal to replay all data modified since the last checkpoint.
Locks & Concurrency | Until 2.6, MongoDB used a readers-writer lock that allows concurrent read access to a database but gives exclusive access to a single write operation. From 3.0, it uses collection-level locking | Supports document-level locking
Transactions | An operation on a single document is atomic | Multi-document transactions are only available for deployments from version 4.0
CPU Performance | Adding CPU cores does not improve performance much | Performs better on multicore systems
Encryption | Encryption is not possible | Encryption at rest is available with MongoDB Enterprise and as BETA in PSMDB 3.6.8
Memory | Automatically uses all free memory on the machine as its cache | Uses an internal cache and the filesystem cache
Updates | Excels at workloads with high-volume inserts, reads, and in-place updates | Does not support in-place updates; the whole document is rewritten
Tuning | Less scope to tune it | Allows more tuning through different variables, e.g. cache size, read/write tickets, checkpoint interval, etc.

Conclusion

The above information does not cover every difference between MMAPV1 and WiredTiger, but it lists the key differences. If you have any key features to add, please feel free to add in the comments! Let’s share and let everyone know about them 🙂


Photo by Mathew Schwartz on Unsplash

by Vinodh Krishnaswamy at January 03, 2019 02:05 PM

January 02, 2019

MariaDB Foundation

MariaDB 10.2.21 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.2.21, the latest stable release in the MariaDB 10.2 series. See the release notes and changelogs for details. Download MariaDB 10.2.21 Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.2.21 Daniel Bartholomew (MariaDB Corporation) Eugene […]

The post MariaDB 10.2.21 now available appeared first on MariaDB.org.

by Ian Gilfillan at January 02, 2019 08:24 PM

Peter Zaitsev

TasksMax: Another Setting That Can Cause MySQL Error Messages

Recently, I encountered a situation where MySQL gave error messages that I had never seen before:

2018-12-12T14:36:45.571440Z 0 [ERROR] Error log throttle: 150 'Can't create thread to handle new connection' error(s) suppressed
2018-12-12T14:36:45.571456Z 0 [ERROR] Can't create thread to handle new connection(errno= 11)
2018-12-12T14:37:47.748575Z 0 [ERROR] Error log throttle: 940 'Can't create thread to handle new connection' error(s) suppressed
2018-12-12T14:37:47.748595Z 0 [ERROR] Can't create thread to handle new connection(errno= 11)

I was thinking maybe we hit some ulimit limitations or similar, but all the usual suspects were set high enough, and we were not even close to them.

After googling and discussing with the customer, I found they had had similar issues in the past, and I learned something new. Actually, it is relatively new: it has been around for a few years but is not that well known. It is called TasksMax:

Specify the maximum number of tasks that may be created in the unit. This ensures that the number of tasks accounted for the unit (see above) stays below a specific limit. This either takes an absolute number of tasks or a percentage value that is taken relative to the configured maximum number of tasks on the system. If assigned the special value “infinity“, no tasks limit is applied. This controls the “pids.max” control group attribute. For details about this control group attribute, see pids.txt.

Source Manual.

It was introduced to systemd in 2015:

I’d like to introduce DefaultTasksMax= that controls the default
value of the per-unit TasksMax= by default, and would like it to
set to some value such 1024 out-of-the-box. This will mean that any
service or scope created will by default be limited to 1024
tasks. This of course is a change from before that has the
potential to break some daemons that maintain an excessive number
of processes or threads. However, I think it’s a much better choice
to raise the limit for them, rather than stay unlimited for all
services by default. I think 1024 is not particularly low, but also
not particularly high. Note that the kernel by default limits the
number of processes to 32K in total anyway.

In the end, we can see in this commit that they chose 512 as the default setting for TasksMax, which means services that are not explicitly configured otherwise will only be able to create at most 512 processes or threads.

Why 512? I have read through the email list and there was some discussion about what should be the default. Eventually, I found this comment from one of the developers:

Anyway, for now I settled for the default TasksMax= setting of 512 for
all units, plus 4096 for the per-user slices and 8192 for each nspawn
instance. Let’s see how this will work out.

So this is how 512 became the default, and no one has touched it since. MySQL is able to reach that limit, which can cause error messages like those we saw above.

You can increase this limit by creating a file called /etc/systemd/system/mysqld.service:

[Service]
TasksMax=infinity

You can use a specific number like 4096 (or any other number based on your workload), or infinity, which means MySQL can start as many processes as it wants.
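
After changing unit files by hand, systemd needs to reload them before the new limit takes effect. A minimal sketch, assuming the unit is named mysqld (some distributions use mysql instead, and a drop-in file under /etc/systemd/system/mysqld.service.d/ serves the same purpose):

#> systemctl daemon-reload
#> systemctl restart mysqld
#> systemctl show -p TasksMax mysqld
   TasksMax=infinity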

Conclusion

Not everyone will reach this limit, but if MySQL is giving error messages like this you should check TasksMax as well as the other usual suspects. The easiest way to verify the current setting is:

#> systemctl show -p TasksMax mysql
   TasksMax=512


Photo by Vlad Tchompalov on Unsplash

by Tibor Korocz at January 02, 2019 01:26 PM

December 31, 2018

Peter Zaitsev

Great things that happened with PostgreSQL in the Year 2018

PostgreSQL in 2018

In this blog post, we’ll look back at what’s been going on in the world of PostgreSQL in 2018.

Before we start talking about the good things that happened in the PostgreSQL world in 2018, we hope you had a wonderful year, and we wish you a happy and prosperous 2019.

PostgreSQL has been a choice for those who are looking for a completely community-driven open source database that is feature-rich and extensible. We have seen tremendously great things happening in PostgreSQL for many years, with 2018 being a prime example. As the DB-Engines rankings show, PostgreSQL topped the chart for growth in popularity in the year 2018 compared to other databases.

PostgreSQL adoption has been growing year over year, and 2018 was no exception.

Let’s start with a recap of some of the great PostgreSQL events, and look at what we should take away from 2018 in the PostgreSQL space.

PostgreSQL 11 Released

PostgreSQL 11 was a release that incorporated a lot of features offered in commercial database software governed by an enterprise license. For example, there are times when you are required to enforce the handling of embedded transactions inside a stored procedure in your application code. There are also times when you wish to partition a table with foreign keys or use hash partitioning. This used to require workarounds. The release of PostgreSQL 11 covers these scenarios.

There were many other add-ons as well, such as Just-In-Time compilation, improved query parallelism, partition elimination, etc. You can find out more in our blog post here, or the PostgreSQL 11 release notes (if you have not seen already). Special thanks to everyone involved in such a vibrant PostgreSQL release.

End of Life for PostgreSQL 9.3

9.3.25 was the last minor release for PostgreSQL 9.3 (made on November 8, 2018). There will be no more minor releases of 9.3 supported by the community. If you are still using PostgreSQL 9.3 (or an earlier major release), it is time to start planning an upgrade of your database to take advantage of additional features and performance improvements.

Watch out for future Percona webinars (dates will be out soon) on PostgreSQL migrations and upgrades. They will help you handle situations such as downtime, and other complexities involved in migrating partitions built using table inheritance, when you move from legacy PostgreSQL versions to the latest ones.

PostgreSQL Minor Releases

For minor PostgreSQL releases, there was nothing new this year compared to previous years. The PostgreSQL community aims for a minor version release for all the supported versions every quarter. However, we may see more minor releases due to critical bug fixes or security fixes. One such release was made on March 3rd, 2018 for the CVE-2018-1058 security fix. This proves that you do not necessarily need to wait for specific release dates when a security vulnerability has been identified. You may see the fix released as a minor version as soon as the development, review and testing are completed for the fix.

There have been five minor releases this year on the following dates.

Security Fixes in All the Supported PostgreSQL Releases This Year

The PostgreSQL Global Development Team and contributors handle security fixes very seriously. There have been several instances where we have received immediate responses after reporting a problem or a bug. Likewise, we have seen many security bug fixes as soon as they have been reported.

Following is a list of security fixes we have seen in the year 2018:

We thank all the Core team, Contributors, Hackers and Users involved in making it another great year for PostgreSQL and a huge WIN for the open source world.

If you would like to participate in sharing your PostgreSQL knowledge to a wider audience, or if you have got a great topic that you would like to talk about, please submit your proposal to one of the world’s biggest open source conferences: Percona Live Open Source Database Conference 2019 in Austin, Texas from May 28-30, 2019. The Call for Papers is open until Jan 20, 2019.

You can also submit blog articles to our Open Source Database Community Blog for publishing to a wider open source database audience.

Please subscribe to our blog posts to hear many more great things that are happening in the PostgreSQL space.

by Avinash Vallarapu at December 31, 2018 10:28 PM

MariaDB Foundation

Arjen’s Last Post

Tonight marks the conclusion of my brief period at the MariaDB Foundation. The MDBF team is a wonderful, diverse and talented group of people, and it has been my absolute honour and pleasure to work with each and every one of them. It was great to meet almost everybody (and some of your families!) at […]

The post Arjen’s Last Post appeared first on MariaDB.org.

by Arjen Lentz at December 31, 2018 09:52 AM

December 30, 2018

Valeriy Kravchuk

MariaDB JIRA for MySQL DBAs

These days several kinds and forks of MySQL are widely used, and while I promised not to write about MySQL bugs till the end of 2018, I think it makes sense to try to explain basic details about bug reporting for at least one of the vendors that use JIRA instances as public bug tracking systems. I work for MariaDB Corporation, and it would be natural for me to write about MariaDB's JIRA that I use every day.

As a side note, Percona also switched to JIRA some time ago, and many of the JIRA-specific details described below (that are different compared to the good old https://bugs.mysql.com/) apply to Percona's bug tracking system as well.

Why would MariaDB bugs be interesting to an average MySQL community member who does not use MariaDB at all most of the time? One of the reasons is that some MySQL bugs are also reported (as "upstream") to MariaDB, and they may be fixed there well before they are fixed in MySQL. Consider MDEV-15953 - "Alter InnoDB Partitioned Table Moves Files (which were originally not in the datadir) to the datadir" (reported by Chris Calender) as an example. It was fixed in MariaDB 5 months ago, while the corresponding Bug #78164 is still "Verified" and got no visible attention for more than 3 years. The fix is 12 rows added in two files (test case aside), so it theoretically can easily be used to modify upstream MySQL by an interested and determined MySQL user who already compiles MySQL from the source code (for whatever reason), if the problem is important in their environment.

Another reason is related to the fact that work on new features of MariaDB server and connectors is performed in an open manner at all stages. You can see current plans, discussions and decision making on new features happening in real time, in JIRA. Existing problems (that often affect both MySQL and MariaDB) are presented and analyzed, and reading related comments may be useful to understand current limitations of MySQL and decide if at some stage switching to MariaDB or using some related patches may help in your production environment. There is no need to wait for some lab preview release. You can also add comments on design decisions and question them before it's too late. Great example of such a useful to read (for anyone interested in InnoDB) feature request and work in progress is MDEV-11424 - "Instant ALTER TABLE of failure-free record format changes".

Yet another reason to use MariaDB JIRA and follow some bug reports and feature requests there is to find some insights on how MySQL, its components (like the optimizer) and storage engines (like InnoDB) really work. Consider my Bug #82127 - "Deadlock with 3 concurrent DELETEs by UNIQUE key". This bug was originally reported to Percona as lp:1598822 (as it was first noticed with Percona's XtraDB engine) and ended up in their JIRA as PS-3479 (still "New"). In the MySQL bugs database it got "Verified" after some discussions. Eventually I gave up waiting for "upstream" to make any progress on it and reported it as MDEV-10962. In that MariaDB bug report you can find explanations of the behavior noticed, multiple comments and ideas on the root cause and on how to improve locking behavior in this case, links to other related bugs etc. It's a useful read. Moreover, we see that there are plans to improve/fix this in MariaDB 10.4.

I also like to check some problematic and interesting test cases, no matter in which bugs database they were reported, on MariaDB Server, Percona Server and MySQL Server alike, as long as it's about some common features. But maybe it's so because I work with all of these as a support engineer.

Anyway, one day following MariaDB Server bugs may help some MySQL DBA to do the job better. So, I suggest all MySQL users check MariaDB's JIRA from time to time. Some basic details about the differences compared to MySQL's bugs database are presented below.

The first thing to notice in the case of MariaDB's JIRA is the domain name. It's jira.mariadb.org, so the bug tracking system formally "belongs" to the MariaDB Foundation - a non-profit entity that supports continuity and open collaboration in the MariaDB ecosystem. MariaDB Foundation employees, MariaDB Corporation employees, developers working for partners (like Codership) and community members (like Olivier Bertrand, author of the CONNECT storage engine I had written about here) all work on the source code (and on bugs processing and fixing) together, at GitHub. Different users have different roles and privileges in JIRA, surely. But there is no other, "internal" bugs database in MariaDB Corporation. All work on bugs and features, time reporting, the code review process, as well as release planning happens (or at least is visible) in an open manner, in JIRA.

Even if you do not have a JIRA account, you can still see the JIRA Road Map, release plans and statuses. You can see all public comments and the history of changes for each bug. If you create and log in to your account (this is needed to report new bugs, vote for them or watch them and get email notifications about any changes, obviously) you'll also see more details on bugs, like links to GitHub commits and pull requests related to the bug.

Unlike the MySQL bugs database, where bugs are split into "Categories" (where both "MySQL Server: Information schema" and "MySQL Workbench" are categories of more or less the same level) but are numbered sequentially across all categories, JIRA instances usually support "Projects", with a separate name and sequential numbering of bugs per project.

At the moment there are 17 or so projects in MariaDB JIRA; of them, the following public ones are most interesting for MySQL community users, I think:
Let's consider one MariaDB Server bug for example:

Unlike in the MySQL bugs database, JIRA issues have a "Type". For our case it's important that feature requests usually end up with the "Task" type, vs the "Bug" type for a bug. Some projects in MariaDB JIRA may also support a separate "New Feature" type to differentiate new features from tasks not related to creating new code. In MySQL, a separate severity (S4, "Feature request") is used instead.

MariaDB JIRA issues have priorities from the following list:
  • Trivial
  • Minor
  • Major
  • Critical
  • Blocker
By default MariaDB bugs are filed with the intermediate, "Major" priority. Priority may be changed by the bug reporter or by JIRA users (mostly developers) who work on the bug, and it often changes with time (priority may increase if more users are affected or if the fix does not happen for a long enough time, or decrease when the problem can be worked around somehow for the affected users). Usually a bug with "Blocker" priority means there should be no next minor release for any major version listed in "Fix Version/s" without the fix.

There are many fields in the MySQL bugs database to define the priority of the fix (including "Priority" itself), but only "Severity" is visible to the public. Usually the "Severity" of a MySQL bug does NOT change with time (except maybe before it's "Verified").

It is normal to list all/many versions affected by the bug in JIRA in "Affected Version/s". If the bug is fixed, in "Fix Version/s" you can find the exact list of all minor MariaDB Server versions that got the fix.

Each JIRA issue has a "Status" and "Resolution". In MySQL bugs database there is just "Status" for both. Valid statuses are:
  • OPEN - this is usually a bug that is just reported or is not yet in the process of fixing.
  • CONFIRMED - this status means that some developer checked the bug report and confirmed it's really a bug, and it's clear how to reproduce it based on the information already present in the report. More or less this status matches a "Verified" MySQL bug. But unlike in MySQL, even an "Open" bug may be assigned to a developer to further work on it.
  • CLOSED - the bug is resolved somehow. See the content of the "Resolution" field for details on how it was resolved.
  • STALLED - this is a real bug and some work on it was performed, but nobody actively works on it now.
  • IN PROGRESS - assignee is currently working on the fix for the bug.
  • IN REVIEW - assignee is currently reviewing the fix for the bug.
The following values are possible for "Resolution" field:
  • Unresolved - every bug that is not "CLOSED" is "Unresolved".
  • Fixed - every bug that was fixed with some change to the source code. If you log in to JIRA you should be able to find links to GitHub commit(s) with the fix in the "Fixed" JIRA issue.
  • Won't Fix - the problem is real, but it was decided not to fix it (because the behavior is expected, or it may be too hard to fix). Consider my MDEV-15213 - "UPDATEs are slow after instant ADD COLUMN" as one of the examples.
  • Duplicate - there is another bug report about the same problem. You can find link to it in the JIRA issue.
  • Incomplete - there is no way to reproduce or understand the problem based on the information provided. See MDEV-17808 for example.
  • Cannot Reproduce - bug reporter himself can not reproduce the problem any more, even after following the same steps that caused the problem before. See MDEV-17667 for example.
  • Not a Bug - the problem described is not a result of any bug. Everything works as designed and probably some misunderstanding caused bug reporter to think it was a bug. See MDEV-17790 as a typical example.
  • Done - this is used for completed tasks (like MDEV-17886) or bugs related to some 3rd-party storage engine where the fix is done, but it's up to MariaDB to merge/use the fixed version of the engine (like MDEV-17212).
  • Won't Do - it was decided NOT to do the task. See MDEV-16418 as one of examples.
In MySQL there are separate bug statuses for (most of) these. There are some tiny differences in the way some statuses like "Cannot Reproduce" are applied by those who process bugs in MySQL vs MariaDB, though.

The explanations above should be enough for any MySQL bugs database user to start using MariaDB's JIRA efficiently, I think. But I am open to any follow-up questions, and I am considering separate blog posts explaining the life cycle of a MariaDB Server bug and some tips on efficient searching in MariaDB JIRA.

by Valeriy Kravchuk (noreply@blogger.com) at December 30, 2018 04:41 PM

December 29, 2018

Federico Razzoli

SQL Common Errors: confusing WHERE and ON

You probably know the ON clause of SQL JOINs. It is a condition that determines which rows from a table are related to a row from another table.

Conceptually, this is extremely different from the WHERE clause, which determines which row combinations will be returned to the client. However, sometimes a condition can be moved from the WHERE clause to the ON clause, because it will still filter out the unwanted rows. But please keep on reading, because the TL;DR of this post is: don’t do that.

Actually, why would one move conditions from WHERE to ON? Because many developers think that this is an optimization. They believe that the ON clause is executed before the WHERE clause, so moving the conditions will cause them to be evaluated at an earlier stage, avoiding some useless work.

Now, one may show some examples where moving a condition actually speeds up a query. This can happen in some uncommon cases, and the root cause is that the optimizer is not able to find the optimal execution plan. Don’t focus on the ON clause. Just focus on building proper indexes or finding out why the optimizer is not using them. But this is a wider topic that will not be covered here.

ON means not WHERE

Let me show you an example. I used MariaDB here, but I expect the same results with all DBMSs I know.

SELECT a.full_name, b.title
    FROM author a
    LEFT JOIN book b
        ON a.id = b.author_id
    WHERE b.title LIKE 'the%'
;
+------------+-----------------------------+
| full_name  | title                       |
+------------+-----------------------------+
| H.G. Wells | The invisible man           |
| H.G. Wells | The Island of Doctor Moreau |
+------------+-----------------------------+
2 rows in set (0.000 sec)

We get a list of the books whose title starts with “the” (case-insensitive), and their authors. As you can see, the ON clause is used to join the rows correctly, and the WHERE clause determines which rows are to be filtered in.

In this database, I inserted another author: Edgar Allan Poe. But he doesn’t appear, because none of his books that I inserted starts with “The”.

But what if I move the WHERE condition to ON?

SELECT a.full_name, b.title
    FROM author a
    LEFT JOIN book b
        ON a.id = b.author_id
        AND b.title LIKE 'The%'
;
+-----------------+-----------------------------+
| full_name       | title                       |
+-----------------+-----------------------------+
| H.G. Wells      | The invisible man           |
| H.G. Wells      | The Island of Doctor Moreau |
| Edgar Allan Poe | NULL                        |
+-----------------+-----------------------------+
3 rows in set (0.000 sec)

Edgar Allan Poe appears! Why? SQL is stupid!? MariaDB is buggy?! These are typical reactions, but… no, the query is wrong.

This is a LEFT JOIN. Rows from the left table are returned even if they never match the ON clause.

With an INNER JOIN, actually, you will not notice any difference. But what if you are looking for books whose title starts with “the” OR whose author name starts with a letter < "M"? Try it yourself, as an exercise. You will see that you cannot always fix the problem by using an INNER JOIN: you need to use ON properly :-)
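
To illustrate the INNER JOIN remark above, here is a quick sketch using the same tables: with an inner join, both placements of the condition return identical rows, because non-matching combinations are discarded either way.

SELECT a.full_name, b.title
    FROM author a
    INNER JOIN book b
        ON a.id = b.author_id
    WHERE b.title LIKE 'The%'
;
-- same result set with an INNER JOIN:
SELECT a.full_name, b.title
    FROM author a
    INNER JOIN book b
        ON a.id = b.author_id
        AND b.title LIKE 'The%'
;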

Federico

by Federico at December 29, 2018 10:04 AM

December 27, 2018

Jean-Jerome Schmidt

Severalnines 2018 Momentum: Raising the Bar on MySQL, MariaDB, PostgreSQL & MongoDB Management

I’d like to take advantage of the quiet days between holidays to look back on 2018 at Severalnines as we continue to advance automation and management of the world’s most popular open source databases: MySQL, MariaDB, PostgreSQL & MongoDB!

And take this opportunity to thank you all for your support in the past 12 months and celebrate some of our successes with you …

2018 Severalnines Momentum Highlights:

For those who don’t know about it yet, ClusterControl helps database users deploy, monitor, manage and scale SQL and NoSQL open source databases such as MySQL, MariaDB, PostgreSQL and MongoDB.

Automation and control of open source database infrastructure across mixed environments makes ClusterControl the ideal polyglot solution to support modern businesses - be they large or small.

The reason for ClusterControl’s popularity is the way it provides full operational visibility and control for open source databases.

But don’t take my word for it: we’ve published a year-end video this week that not only summarises our year’s achievements, but also includes customer and user quotes highlighting why they’ve chosen ClusterControl to help them administer their open source database infrastructure.

As a self-funded (mature) startup, our team’s focus is solely on solving pressing customer and community user needs. We do so with our product of course, but just as importantly also through our content contributions to the open source database community. We publish technical content daily that ranges from blogs to white papers, webinars and more.

These Are Our Top Feature & Content Hits in 2018

Top 3 New ClusterControl Features

SCUMM: agent-based monitoring infrastructure & dashboards

SCUMM - Severalnines CMON Unified Monitoring and Management - introduces a new agent-based monitoring infrastructure, with a server pulling metrics from agents that run on the same hosts as the monitored databases. It uses Prometheus agents for greater accuracy and customization options while monitoring your database clusters.

Cloud database deployment

Introduces tighter integration with AWS, Azure and Google Cloud, so it is now possible to launch new instances and deploy MySQL, MariaDB, MongoDB and PostgreSQL directly from the ClusterControl user interface.

Comprehensive automation and management of PostgreSQL

Throughout the year, we’ve introduced a whole range of new features for PostgreSQL: from full backup and restore encryption for pg_dump and pg_basebackup, continuous archiving and Point-in-Time Recovery (PITR) for PostgreSQL, all the way to a new PostgreSQL performance dashboard.

Top 3 Most Read New Blogs

My Favorite PostgreSQL Queries and Why They Matter

Joshua Otwell presents a combination of eight differing queries or types of queries he has found interesting and engaging to explore, study, learn, or otherwise manipulate data sets.

A Performance Cheat Sheet for PostgreSQL

Sebastian Insausti discusses how one goes about analyzing the workload, or queries, that are running, as well as review some basic configuration parameters to improve the performance of PostgreSQL databases.

Deploying PostgreSQL on a Docker Container

Our team explains how to use Docker to run a PostgreSQL database.

Top 3 Most Downloaded White Papers

MySQL on Docker - How to Containerize the Dolphin

Covers the basics you need to understand when considering to run a MySQL service on top of Docker container virtualization. Although Docker can help automate deployment of MySQL, the database still has to be managed and monitored. ClusterControl can provide a complete operational platform for production database workloads.

PostgreSQL Management & Automation with ClusterControl

Discusses some of the challenges that may arise when administering a PostgreSQL database as well as some of the most important tasks an administrator needs to handle; and how to do so effectively … with ClusterControl. See how much time and effort can be saved, as well as risks mitigated, by the usage of such a unified management platform.

How to Design Highly Available Open Source Database Environments

Discusses the requirements for high availability in database setups, and how to design the system from the ground up for continuous data integrity.

Top 3 Most Watched Webinars

Our Guide to MySQL & MariaDB Performance Tuning

Watch as Krzysztof Książek, Senior Support Engineer at Severalnines, walks you through the ins and outs of performance tuning for MySQL and MariaDB, and share his tips & tricks on how to optimally tune your databases for performance.

Designing Open Source Databases for High Availability

From discussing high availability concepts through to failover or switch over mechanisms, this webinar covers all the need-to-know information when it comes to building highly available database infrastructures.

Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB with ClusterControl

Whether you are looking at rebuilding your existing backup infrastructure or updating it, this webinar is for you: watch the replay of this webinar on Backup Management for MySQL, MariaDB, PostgreSQL and MongoDB with ClusterControl.

 

Thanks again for your support this year and “see you” in 2019!

Happy New Year from everyone at Severalnines!

PS.: To join Severalnines’ growing customer base please click here

by jj at December 27, 2018 11:50 AM

December 26, 2018

Jean-Jerome Schmidt

MySQL in 2018: What’s in 8.0 and Other Observations

With most, if not all of 2018 behind us (depending on when you are reading this post), there is no doubt that it was a fantastic year for open-source SQL databases.

PostgreSQL 11 and MySQL 8 were both released, providing both communities with plenty to 'talk about'. Truth be told, both vendors have introduced many significant changes and additions in their respective releases and deserve their praise and accolades.

I normally guest post about the former here on the Severalnines blog (Many thanks to a great organization!) but I also have an interest in the latter. With many blog posts on my own website (link in my bio section), mostly targeting MySQL version 5.7, it (MySQL) is always in my peripherals.

So what does MySQL 8 have that version 5.7 does not have? What are the improvements? Well, there are many. In fact, too many to cover in just one blog post.

I recently upgraded to version 8 in my current Linux learning/development environment, so I thought to try my hand at pointing some of them out.

I cannot guarantee you an in-depth discussion on your 'favorite' new feature(s). On the other hand, I will visit those that have caught my attention either via a personal interest or by way of the many terrific blog posts published throughout the year on version 8.

MySQL is getting better and better...Terrific improvements in version 8!

Roles

With Roles, DBA's can mitigate redundancy, where many users would share the same privilege or set of privileges.

Roles are a part of the SQL standard.

After creating a specific role with the desired/required privilege(s), you can then assign users that particular role via the GRANT command or likewise, 'taketh away' with REVOKE.

Roles come with numerous benefits and to make life a bit easier, there are a couple of tables to help you keep track of them:

  • mysql.role_edges - Here you find those roles and the users they are assigned.

    mysql> DESC mysql.role_edges;
    +-------------------+---------------+------+-----+---------+-------+
    | Field             | Type          | Null | Key | Default | Extra |
    +-------------------+---------------+------+-----+---------+-------+
    | FROM_HOST         | char(60)      | NO   | PRI |         |       |
    | FROM_USER         | char(32)      | NO   | PRI |         |       |
    | TO_HOST           | char(60)      | NO   | PRI |         |       |
    | TO_USER           | char(32)      | NO   | PRI |         |       |
    | WITH_ADMIN_OPTION | enum('N','Y') | NO   |     | N       |       |
    +-------------------+---------------+------+-----+---------+-------+
    5 rows in set (0.01 sec)
  • mysql.default_roles - Stores any default roles and those users assigned.

    mysql> DESC mysql.default_roles;
    +-------------------+----------+------+-----+---------+-------+
    | Field             | Type     | Null | Key | Default | Extra |
    +-------------------+----------+------+-----+---------+-------+
    | HOST              | char(60) | NO   | PRI |         |       |
    | USER              | char(32) | NO   | PRI |         |       |
    | DEFAULT_ROLE_HOST | char(60) | NO   | PRI | %       |       |
    | DEFAULT_ROLE_USER | char(32) | NO   | PRI |         |       |
    +-------------------+----------+------+-----+---------+-------+
    4 rows in set (0.00 sec)

The combination of both tables (not in the SQL JOIN sense) essentially provides a 'centralized location' where you can: know, monitor, and assess all of your implemented user-role privilege relationships and assignments.

Likely the simplest example role usage scenario would be:

You have several users who need 'read-only access' on a specific table, therefore, requiring at least the SELECT privilege. Instead of granting it (SELECT) individually to each user, you can establish (create) a role having that privilege, then assign that role to those users.

But, roles come with a small 'catch'. Once a role is created and assigned to a user, that user must have an active default role set during authentication at login.
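
A minimal sketch of the scenario above (the role, user, and table names are hypothetical):

CREATE ROLE 'read_only';
GRANT SELECT ON shop.orders TO 'read_only';
GRANT 'read_only' TO 'mary'@'localhost', 'rob'@'localhost';
-- Without an active default role, the privileges are not in effect at login:
SET DEFAULT ROLE 'read_only' TO 'mary'@'localhost', 'rob'@'localhost';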

While on the subject of roles and users, I feel it is important to mention the change implemented in MySQL 8 concerning the validate_password component, which is a variant of the validate_password plugin used in version 5.7.

This component provides various distinct 'categories' of password checking: low, medium (default), and strong. Visit the validate_password component documentation for a full rundown on each level's validation specifics.

NoSQL Mingling with SQL - The Document Store

This feature is one I am still learning about, despite a fleeting interest in MongoDB in early 2016. To date, my interest, study, and learning have been focused solely on 'SQL'. However, I am aware (through much reading on the web) that many are excited about this type of structuring (document-oriented) intertwined with 'relational SQL' now available in the MySQL 8 document store.

Below are many benefits available when using the document store. Be sure and mention your favorites I may have missed in the comments section:

  • The JSON data type has been supported since MySQL version 5.7.8, yet version 8 introduced significant enhancements for working with JSON: new JSON-specific functions, along with 'shorthand' operators that can be used in place of multiple function calls - with equal results/output (see the sketch after this list).
  • Perhaps one of the foremost benefits is you no longer need to implement and work with multiple database solutions since NoSQL, SQL, or a combination of the two are supported in the document store.
  • A "DevAPI", provides seamless workflow capabilities within a NoSQL data context (collections and documents). (Visit the official DevAPI user guide documentation for more information).
  • Powerful command-line sessions using Python, SQL, or Javascript as the 'shell' language.
  • ACID compliant.
  • Quickly explore and discover your data without defining a schema as you would in a relational model.
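
A small sketch of those shorthand operators (the table and column names are hypothetical): -> is shorthand for JSON_EXTRACT(), while ->> additionally unquotes the result, like JSON_UNQUOTE(JSON_EXTRACT()).

SELECT doc->'$.name'  AS quoted_name,   -- JSON_EXTRACT(doc, '$.name')
       doc->>'$.name' AS unquoted_name  -- JSON_UNQUOTE(JSON_EXTRACT(doc, '$.name'))
FROM products;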

Common Table Expressions (CTE's or the WITH clause)

What else can you say about CTE's? These things are a game-changer! For starters, what exactly is a common table expression?

From Wikipedia:

"A common table expression, or CTE, (in SQL) is a temporary named result set, derived from a simple query and defined within the execution scope of a SELECT, INSERT, UPDATE, or DELETE statement."

I'll provide a simple example, demonstrating CTE's. However, their full power is not harnessed in this section, as there are many more complex use-case examples than these.

I have a simple name table with this description and data:

mysql> DESC name;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| f_name | varchar(20) | YES  |     | NULL    |       |
| l_name | varchar(20) | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM name;
+--------+------------+
| f_name | l_name     |
+--------+------------+
| Jim    | Dandy      |
| Johhny | Applesauce |
| Ashley | Zerro      |
| Ashton | Zerra      |
| Ashmon | Zerro      |
+--------+------------+
5 rows in set (0.00 sec)

Let's find out how many last names start with 'Z':

mysql> SELECT *
    -> FROM name
    -> WHERE l_name LIKE 'Z%';
+--------+--------+
| f_name | l_name |
+--------+--------+
| Ashley | Zerro  |
| Ashton | Zerra  |
| Ashmon | Zerro  |
+--------+--------+
3 rows in set (0.00 sec)

Easy enough.

However, using the WITH clause, you can 'access' this same query result set (which can be thought of as a derived table) and refer to it later on within the same statement - or 'scope':

 WITH last_Z AS (
           SELECT *
           FROM name
           WHERE l_name LIKE 'Z%')
   SELECT * FROM last_Z;
+--------+--------+
| f_name | l_name |
+--------+--------+
| Ashley | Zerro  |
| Ashton | Zerra  |
| Ashmon | Zerro  |
+--------+--------+
3 rows in set (0.00 sec)

I basically assign a name to the query, wrapping it in parentheses. Then I just select the data I want from what is now the last_Z CTE.

The last_Z CTE provides a complete result set, so you can filter it even further within the same statement:

WITH last_Z AS ( 
           SELECT *
           FROM name
           WHERE l_name LIKE 'Z%')
   SELECT f_name, l_name FROM last_Z WHERE l_name LIKE '%a';
+--------+--------+
| f_name | l_name |
+--------+--------+
| Ashton | Zerra  |
+--------+--------+
1 row in set (0.00 sec)

A couple of the more powerful features are 'chaining' multiple CTE's together and referencing to other CTE's within CTE's.

Here is an example to give you an idea (although not particularly useful):

WITH last_Z AS ( 
           SELECT *
           FROM name
           WHERE l_name LIKE 'Z%'),
        best_friend AS (
           SELECT f_name, l_name
           FROM last_Z
           WHERE l_name LIKE '%a')
   SELECT * from best_friend;
+--------+--------+
| f_name | l_name |
+--------+--------+
| Ashton | Zerra  |
+--------+--------+
1 row in set (0.00 sec)

In the above query, you can see where I separated the last_Z CTE from the best_friend CTE with a comma, then wrapped that query in parentheses after the AS keyword.

Notice I am then able to refer to (and use) the last_Z CTE to essentially define the best_friend CTE.

Here are a few reasons why CTE's are such a significant improvement in version 8:

  • Other SQL vendors have supported CTE's (many since earlier versions within their individual ecosystems), and now MySQL 8 has closed the gap in this area.
  • A standard SQL inclusion.
  • In some cases (where appropriate), CTE's are a better option than Temporary Tables, Views, Derived Tables (or Inline Views), and some subqueries.
  • CTE's can provide an 'on-the-fly' calculated result set you can query against.
  • A CTE can reference itself - known as a recursive CTE (see the sketch after this list).
  • CTE's can name and use other CTE's.
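
A minimal sketch of the recursive variant, generating the numbers 1 through 5:

WITH RECURSIVE counter AS (
           SELECT 1 AS n
           UNION ALL
           SELECT n + 1 FROM counter WHERE n < 5)
   SELECT n FROM counter;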

Window Functions

Analytic queries are now possible in MySQL 8. As window functions are not my strong suit, I am focused on a more in-depth study and better understanding of them as a whole, moving forward. The next example(s) are mostly elementary according to my understanding. Suggestions, advice, and best practices are welcome from readers.

I have this VIEW that provides a fictitious pipe data result set (something I somewhat understand):

mysql> SELECT * FROM pipe_vw;
+---------+-------------+-----------+-------+-------------+------------+----------------+
| pipe_id | pipe_name   | joint_num | heat  | pipe_length | has_degree | wall_thickness |
+---------+-------------+-----------+-------+-------------+------------+----------------+
|     181 | Joint-278   | 39393A    | 9111  |       17.40 |          1 |          0.393 |
|     182 | Joint-8819  | 19393Y    | 9011  |       16.60 |          0 |          0.427 |
|     183 | Joint-9844  | 39393V    | 8171  |       10.40 |          0 |          0.393 |
|     184 | Joint-2528  | 34493U    | 9100  |       11.50 |          1 |          0.427 |
|     185 | Joint-889   | 18393z    | 9159  |       13.00 |          0 |          0.893 |
|     186 | Joint-98434 | 19293Q    | 8174  |        9.13 |          0 |          0.893 |
|     187 | Joint-78344 | 17QTT     | 179   |       44.40 |          1 |          0.893 |
|     188 | Joint-171C  | 34493U    | 17122 |        9.45 |          1 |          0.893 |
|     189 | Joint-68444 | 17297Q    | 6114  |       11.34 |          0 |          0.893 |
|     190 | Joint-4841R | 19395Q    | 5144  |       25.55 |          0 |          0.115 |
|     191 | Joint-1224C | 34493U    | 8575B |       15.22 |          1 |          0.893 |
|     192 | Joint-2138  | 34493C    | 91    |       13.55 |          1 |          0.893 |
|     193 | Joint-122B  | 34493U    | 9100B |        7.78 |          1 |          0.893 |
+---------+-------------+-----------+-------+-------------+------------+----------------+
13 rows in set (0.00 sec)

Imagine, I need the pipe asset records presented in some sort of row ranking depending on the length of each individual pipe. (E.g., The longest length is 'labeled' the number 1 position, the second longest length is 'labeled' position 2, etc...)

Based on the RANK() Window Function description in the documentation:

"Returns the rank of the current row within its partition, with gaps. Peers are considered ties and receive the same rank. This function does not assign consecutive ranks to peer groups if groups of size greater than one exist; the result is noncontiguous rank numbers."

It looks to be well-suited for this requirement.

mysql> SELECT pipe_name, pipe_length,
    -> RANK() OVER(ORDER BY pipe_length DESC) AS long_to_short
    -> FROM pipe_vw;
+-------------+-------------+---------------+
| pipe_name   | pipe_length | long_to_short |
+-------------+-------------+---------------+
| Joint-78344 |       44.40 |             1 |
| Joint-4841R |       25.55 |             2 |
| Joint-278   |       17.40 |             3 |
| Joint-8819  |       16.60 |             4 |
| Joint-1224C |       15.22 |             5 |
| Joint-2138  |       13.55 |             6 |
| Joint-889   |       13.00 |             7 |
| Joint-2528  |       11.50 |             8 |
| Joint-68444 |       11.34 |             9 |
| Joint-9844  |       10.40 |            10 |
| Joint-171C  |        9.45 |            11 |
| Joint-98434 |        9.13 |            12 |
| Joint-122B  |        7.78 |            13 |
+-------------+-------------+---------------+
13 rows in set (0.01 sec)

In the next scenario, I want to build even further on the previous example by ranking the records from longest to shortest length, but per each individual group of the distinct wall_thickness values.

Perhaps the below query and results will explain better where my prose may not have:

mysql> SELECT pipe_name, pipe_length, wall_thickness,
    -> RANK() OVER(PARTITION BY wall_thickness ORDER BY pipe_length DESC) AS long_to_short
    -> FROM pipe_vw;
+-------------+-------------+----------------+---------------+
| pipe_name   | pipe_length | wall_thickness | long_to_short |
+-------------+-------------+----------------+---------------+
| Joint-4841R |       25.55 |          0.115 |             1 |
| Joint-278   |       17.40 |          0.393 |             1 |
| Joint-9844  |       10.40 |          0.393 |             2 |
| Joint-8819  |       16.60 |          0.427 |             1 |
| Joint-2528  |       11.50 |          0.427 |             2 |
| Joint-78344 |       44.40 |          0.893 |             1 |
| Joint-1224C |       15.22 |          0.893 |             2 |
| Joint-2138  |       13.55 |          0.893 |             3 |
| Joint-889   |       13.00 |          0.893 |             4 |
| Joint-68444 |       11.34 |          0.893 |             5 |
| Joint-171C  |        9.45 |          0.893 |             6 |
| Joint-98434 |        9.13 |          0.893 |             7 |
| Joint-122B  |        7.78 |          0.893 |             8 |
+-------------+-------------+----------------+---------------+
13 rows in set (0.00 sec)

This query uses the PARTITION BY clause on the wall_thickness column because we want the ranking (that ORDER BY pipe_length DESC provides), but we need it in the context of the individual wall_thickness groups.

Each long_to_short column ranking resets back to 1 as you move to a different wall_thickness column value.

Let's concentrate on the results of one single group.

Targeting the records with the wall_thickness value 0.893, the row with pipe_length 44.40 has a corresponding long_to_short 'ranking' of 1 (it's the longest), while the row with pipe_length 7.78 has a corresponding long_to_short 'ranking' of 8 (the shortest), all within that specific group (0.893) of wall_thickness values.
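
The "with gaps" wording in the RANK() description above matters once ties appear. Here is a quick hypothetical illustration (not based on the pipe data) contrasting RANK() with DENSE_RANK():

SELECT len,
       RANK()       OVER (ORDER BY len DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY len DESC) AS dense_rnk
FROM (SELECT 44.4 AS len UNION ALL
      SELECT 25.5 UNION ALL
      SELECT 25.5 UNION ALL
      SELECT 17.4) AS t;
-- rnk:       1, 2, 2, 4  (a gap follows the tie)
-- dense_rnk: 1, 2, 2, 3  (no gap)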

Window functions are quite powerful and their entire scope and breadth could not possibly be covered in one section alone. Be sure and visit the Window Functions supported in MySQL 8 documentation for more information on those currently available.

Improved Spatial Support and Capabilities

This is a tremendous set of features included in MySQL 8. Previous versions’ support, or lack thereof, simply could not compare to other vendor implementation(s) (think PostGIS for PostgreSQL).

For the past 10 plus years, I have worked in the field as a Pipeline Surveyor, collecting GPS and asset data, so this group of changes definitely catches my attention.

Spatial data expertise is a comprehensive subject in its own right and, be assured, I am far from an expert on it. However, I hope to summarize the significant changes between versions 5.7 and 8 and convey them in a clear and concise manner.

Let's familiarize ourselves with 2 key terms (and concepts) for the purposes of this section.

  1. Spatial Reference System or SRS - Here is a partial definition from Wikipedia:

    "A spatial reference system (SRS) or coordinate reference system (CRS) is a coordinate-based local, regional or global system used to locate geographical entities. A spatial reference system defines a specific map projection, as well as transformations between different spatial reference systems."

  2. Spatial Reference System Identifier or SRID - Also, Wikipedia has SRID's defined as such:

    "A Spatial Reference System Identifier (SRID) is a unique value used to unambiguously identify projected, unprojected, and local spatial coordinate system definitions. These coordinate systems form the heart of all GIS applications."

MySQL supports many spatial data types. One of the more common ones is a POINT. If you use your GPS to navigate to your favorite restaurant, then that location is a POINT on a map.

MySQL 5.7 treats pretty much every 'spatial object' as having an SRID of 0, which is significant for computations. Those computations are performed in a Cartesian coordinate system. However, we all know that our globe is a sphere and far from flat. Therefore, in version 8, you have the ability to treat it as either flat or spherical in computations.

Back to those two terms, we defined previously.

Even though 0 is the default SRID in MySQL version 8, many (approximately 5,000+) other SRID's are supported.

But why is that important?

This fantastic explanation via the blog post, Spatial Reference Systems in MySQL 8.0, sums it up nicely:

"By default, if we don’t specify an SRID, MySQL will create geometries in SRID 0. SRID 0 is MySQL’s notion of an abstract, unitless, infinite, Catesian plane. While all other SRSs refer to some surface and defines units for the axes, SRID 0 does not."

Essentially, when performing calculations with SRID's other than SRID 0, the shape of our Earth comes into play, is considered, and affects those calculations. This is crucial for any meaningful/accurate computations. For an in-depth rundown and better extrapolation, see this blog post covering geography in MySQL 8.
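
A small sketch of the difference, assuming point-to-point distances (which MySQL 8 supports for geographic SRSs): the same coordinate pair yields a unitless Cartesian result in SRID 0, but a distance in meters on the WGS 84 ellipsoid in SRID 4326.

-- SRID 0: abstract Cartesian plane, unitless result (~1.4142)
SELECT ST_Distance(ST_GeomFromText('POINT(0 0)', 0),
                   ST_GeomFromText('POINT(1 1)', 0)) AS cartesian;

-- SRID 4326 (WGS 84): result in meters on the ellipsoid
SELECT ST_Distance(ST_GeomFromText('POINT(0 0)', 4326),
                   ST_GeomFromText('POINT(1 1)', 4326)) AS meters;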

I also highly recommend the MySQL Server Team blog post, Geographic Spatial Reference Systems in MySQL 8.0, for clarity on SRS's. Do make sure and give it a read!

Finally, for spatial data upgrade concerns from version 5.7 to 8, visit some of the incompatible changes listed here for more information.

Other Notable Observations

Below are other release enhancements that I must acknowledge, although they are not covered in-depth in this blog post:

  • utf8mb4 is now the default character set (previously latin1) - Better support for those must-have emojis, in addition to some languages...
  • Transactional Data Dictionary - MySQL metadata is now housed in InnoDB tables.
  • Invisible Indexes - Set the visibility of an index for the optimizer, ultimately determining if adding or removing it (the index) is a good or bad thing. Adding an index to an existing large table can be 'expensive' in terms of locking and resources (see the sketch after this list).
  • Descending Indexes - Better performance on indexed values that are stored in descending order.
  • Instant Add Column - For schema changes, specify ALGORITHM=INSTANT in ALTER TABLE statements and (if feasible for the operation) avoid metadata locks. ( For more information, see this great post by the MySQL Server Team, and the ALTER TABLE section from the official docs.)
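
A sketch of two of the items above (the table, index, and column names are hypothetical):

-- Invisible index: hide it from the optimizer to assess the impact before dropping it
ALTER TABLE orders ALTER INDEX idx_customer INVISIBLE;
ALTER TABLE orders ALTER INDEX idx_customer VISIBLE;

-- Instant ADD COLUMN: explicitly request the metadata-only algorithm
ALTER TABLE orders ADD COLUMN note VARCHAR(100), ALGORITHM=INSTANT;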

Bonus Section: Something I Had Hoped to See...

Check constraints have not made their way into the MySQL product yet.

As with previous MySQL versions, check constraint syntax is allowed in your CREATE TABLE commands but it is ignored. To my knowledge, most other SQL vendors support check constraints. Come join the party MySQL!

MySQL has significantly 'stepped up' its offering in version 8. Supporting robust spatial capabilities, convenient user management role options, 'hybrid' SQL/NoSQL data solutions, and analytical functions, among the numerous additional improvements, is truly remarkable.

In my opinion, with version 8, MySQL continues to provide a solid option in the ever-growing, competitive open-source SQL ecosystem, full of relevant and feature-rich solutions.

Thank you for reading.

by Joshua Otwell at December 26, 2018 03:02 PM

December 25, 2018

Oli Sennhauser

Using tmux for MariaDB database support and surveillance

See also our older article: Using screen for support and/or surveillance.

First simple steps

The command tmux starts a tmux server and opens a new session with a (pseudo) terminal:

shell> tmux

tmux1.png

To leave a tmux session again just type Ctrl+d inside your tmux session or:

tmux> exit

If you want to give a tmux session a specific name, you can start tmux as follows to create a named session:

shell> tmux new -s mariadb

or if you are already inside tmux:

tmux> Ctrl+b $

followed by a session name where only the first 9 characters are shown in the overview:

tmux2.png

List available tmux sessions

To list the available tmux sessions we have the tmux list-sessions command:

shell> tmux list-sessions
1: 1 windows (created Sun Dec 23 13:35:37 2018) [117x33]
mariadb-104: 1 windows (created Sun Dec 23 13:13:46 2018) [130x41] (attached)

If there is no session available we will get the following error:

shell> tmux ls
failed to connect to server

tmux help

To get more information about tmux you can run:

shell> man tmux
shell> tmux --help
shell> tmux ls --help
tmux> Ctrl+b ?

Detach and re-attach to a tmux session

With the command:

tmux> Ctrl+b d

you will detach from a tmux session. With tmux ls you can list the available sessions, and to reattach to a tmux session you can type:

shell> tmux attach
shell> tmux attach -t 1
shell> tmux a -t mariadb-104
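
For surveillance purposes it can be handy to start a named session detached, with a monitoring command already running in it (the session name and command are just examples):

shell> tmux new -d -s mariadb-mon 'mysqladmin --user=root --relative --sleep=10 extended-status'
shell> tmux attach -t mariadb-mon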

Split window (session) into different panes

A tmux session uses a window and this window can be split into different panes (pseudo terminals):

  • Ctrl+b % Splits a window into 2 panes vertically.
  • Ctrl+b " Splits a window into 2 panes horizontally.

tmux3.png

To switch between the panes you can use:

tmux> Ctrl+b arrow {up|down|left|right}

If you want to make a pane full-screen you can use Ctrl+b z to toggle.

Ctrl+b Ctrl+Cursor {up|down|left|right} resizes the current pane.

Scroll within a pane

To switch to the scroll mode you have to use the following key combination:

tmux> Ctrl+b [

Then you can navigate with Cursor {up|down|left|right} or {PgUp|PgDown}. To leave the navigation scroll mode you just have to type q.

Another possibility to switch to the scroll mode is to press Ctrl+b PgUp.


by Shinguz at December 25, 2018 03:14 PM

December 24, 2018

MariaDB Foundation

MariaDB 10.2.20 and MariaDB Connector/C 3.0.8 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.2.20, the latest release in the MariaDB 10.2 series, as well as MariaDB Connector/C 3.0.8. Both are stable releases. See the release notes and changelogs for details. Download MariaDB 10.2.20 Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator […]

The post MariaDB 10.2.20 and MariaDB Connector/C 3.0.8 now available appeared first on MariaDB.org.

by Ian Gilfillan at December 24, 2018 03:14 PM

December 23, 2018

Federico Razzoli

Revisiting my 2018 Database Wishlist

It is December and 2018 is going to end. In January, when it just started, I wrote my 2018 Database Wishlist. Probably next January I’ll write another list. But first, it makes sense to review the last one. If some of my wishes actually happened, I really should know that – I don’t want to miss something nice, or forget topics that I considered interesting less than one year ago. Well, let’s do some copy/paste and some googling…

More research on Learned Indexes

I’m not sure if more research actually happened – I hope so, and I hope that we’ll see its results at some point. At least, it seems that the topic was not forgotten. It was mentioned at least at Artificial Intelligence conference in May, and at Stanford Seminar in October.

It’s worth noting that Wikipedia still doesn’t have a page for Learned Index.

Progress on using ML for database tuning

The OtterTune website didn’t publish anything new – it just removed some previously available information. It’s now possible to register to become a tester for a beta version, so it is reasonable to think that there has been some progress. The repository is actually active. No new public articles, too bad. I’ll definitely stay tuned.

More research on stored functions transformation

I can’t find anything newer than the original paper, except for a post from Microsoft Research. I found no evidence that anyone not working at Microsoft considers this research interesting.

Galera 4

MariaDB 10.4 plans mention that Galera 4 will be included. But this could be just another optimistic hypothesis, so basically… I still see nothing new.

Transactional DDL in the MySQL ecosystem

MDEV-4259 – transactional DDL is still open, no fix version was set. They indeed dedicated resources to MDEV-11424 – Instant ALTER TABLE of failure-free record format changes.

Oracle doesn’t say much about what will be in future MySQL versions. However, someone from Oracle said that atomic ALTER TABLE is a foundation for transactional DDL, which could mean that they’re closer to that than MariaDB. So, let’s hope we’ll see this feature in the next version – but there was no claim from them about that.

Engines, engines, engines

The storage engines I mentioned are still available for MySQL and MariaDB, or come with Percona Server. Oracle still didn’t kill MyISAM. However, no, I didn’t see any big change in SPIDER.

More progress on Tarantool and CockroachDB

Apparently, Tarantool 2 (still not stable) fixed a lot of bugs and improved its SQL support. This includes removing ON CONFLICT REPLACE for UNIQUE indexes, which is also problematic for MySQL.

CockroachDB actually added a lot of features. Amongst other things, I want to mention the cost-based optimizer and the CDC. The IMPORT command allows importing dumps from MySQL and PostgreSQL, as well as from CockroachDB itself and CSV files.

Final thoughts

Some things simply didn’t happen.

Learned index structures and machine learning to tune database performance apparently weren’t forgotten, so hopefully we’ll see something interesting in the future.

Tarantool and CockroachDB show interesting enhancements. MySQL third-party storage engines didn’t introduce anything fancy, but keep on doing a good job.

Federico

by Federico at December 23, 2018 10:05 PM

December 22, 2018

Valeriy Kravchuk

How to Get Details About MyRocks Deadlocks in MariaDB and Percona Server

In my previous post on ERROR 1213 I noted that Percona Server does not support the SHOW ENGINE ROCKSDB TRANSACTION STATUS statement to get deadlock details in "text" form. I've got some clarifications in my related feature request, PS-5114. So I decided to write this follow-up post and show how to get deadlock details for ROCKSDB tables in current versions of MariaDB and Percona Server.

First of all, I'd like to check MariaDB's implementation of MyRocks. For this I'll re-create the deadlock scenario from that post with MariaDB 10.3.12 that I have at hand. We should start with installing the ROCKSDB plugin according to this KB article:
openxs@ao756:~/dbs/maria10.3$ bin/mysql --no-defaults --socket=/tmp/mariadb.sock -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.3.12-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


MariaDB [test]> install soname 'ha_rocksdb';
Query OK, 0 rows affected (36,338 sec)

MariaDB [test]> show engines;
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                          | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| ROCKSDB            | YES     | RocksDB storage engine                                                           | YES          | YES  | YES        |
...
| CONNECT            | YES     | Management of External Data (SQL/NOSQL/MED), including many file formats         | NO           | NO   | NO         |
| Aria               | YES     | Crash-safe tables with MyISAM heritage                                           | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                               | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                   | YES          | NO   | YES        |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
10 rows in set (8,753 sec)

MariaDB [test]> show plugins;
+-------------------------------+----------+--------------------+---------------+---------+
| Name                          | Status   | Type               | Library       | License |
+-------------------------------+----------+--------------------+---------------+---------+
| binlog                        | ACTIVE   | STORAGE ENGINE     | NULL          | GPL     |
| mysql_native_password         | ACTIVE   | AUTHENTICATION     | NULL          | GPL     |
| mysql_old_password            | ACTIVE   | AUTHENTICATION     | NULL          | GPL     |
| wsrep                         | ACTIVE   | STORAGE ENGINE     | NULL          | GPL     |
...
| CONNECT                       | ACTIVE   | STORAGE ENGINE     | ha_connect.so | GPL     |
| ROCKSDB                       | ACTIVE   | STORAGE ENGINE     | ha_rocksdb.so | GPL     |
...
| ROCKSDB_LOCKS                 | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_TRX                   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_DEADLOCK              | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
+-------------------------------+----------+--------------------+---------------+---------+
68 rows in set (2,451 sec)
Note that in MariaDB just one simple INSTALL SONAME ... statement is enough to get ROCKSDB with all related plugins loaded. Do not mind the time it takes to execute the statements - I am running them on a netbook that is concurrently busy compiling Percona Server 8.0.13 from GitHub, to post something about it later :)

Now, let me re-create the same deadlock scenario:

MariaDB [test]> create table t1(id int, c1 int, primary key(id)) engine=rocksdb;
Query OK, 0 rows affected (4,163 sec)

MariaDB [test]> insert into t1 values (1,1), (2,2);
Query OK, 2 rows affected (0,641 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> set global rocksdb_lock_wait_timeout=50;
Query OK, 0 rows affected (0,644 sec)

MariaDB [test]> set global rocksdb_deadlock_detect=ON;
Query OK, 0 rows affected (0,037 sec)

MariaDB [test]> show global variables like 'rocksdb%deadlock%';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| rocksdb_deadlock_detect       | ON    |
| rocksdb_deadlock_detect_depth | 50    |
| rocksdb_max_latest_deadlocks  | 5     |
+-------------------------------+-------+
3 rows in set (0,022 sec)
We need two sessions. In the first one:
MariaDB [test]> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|              11 |
+-----------------+
1 row in set (0,117 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0,000 sec)

MariaDB [test]> select * from t1 where id = 1 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0,081 sec)

In the second:
MariaDB [test]> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|              12 |
+-----------------+
1 row in set (0,000 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0,000 sec)

MariaDB [test]> select * from t1 where id = 2 for update;
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (0,001 sec)
Back in the first:

MariaDB [test]> select * from t1 where id=2 for update;
It hangs, waiting for an incompatible lock. In the second:

MariaDB [test]> select * from t1 where id=1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Now, can we get any details about the deadlock that just happened using the upstream SHOW ENGINE statement? Let's try:
MariaDB [test]> SHOW ENGINE ROCKSDB TRANSACTION STATUS\G
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'TRANSACTION STATUS' at line 1
Does not work, same as in Percona Server 5.7.x. Here is a related MariaDB task: MDEV-13859 - "Add SHOW ENGINE ROCKSDB TRANSACTION STATUS or its equivalent?". Still "Open" and without any target version set, not even 10.4.

The idea behind Percona NOT supporting this statement, according to the comments I've got in PS-5114, is to rely on tables in the information_schema. That's what we have in the related tables, rocksdb_locks, rocksdb_trx and rocksdb_deadlock, after the deadlock above was detected:
MariaDB [test]> select * from information_schema.rocksdb_deadlock;
+-------------+------------+----------------+---------+------------------+-----------+------------+------------+-------------+
| DEADLOCK_ID | TIMESTAMP  | TRANSACTION_ID | CF_NAME | WAITING_KEY      | LOCK_TYPE | INDEX_NAME | TABLE_NAME | ROLLED_BACK |
+-------------+------------+----------------+---------+------------------+-----------+------------+------------+-------------+
|           0 | 1545481878 |              6 | default | 0000010080000002 | EXCLUSIVE | PRIMARY    | test.t1    |           0 |
|           0 | 1545481878 |              7 | default | 0000010080000001 | EXCLUSIVE | PRIMARY    | test.t1    |           1 |
+-------------+------------+----------------+---------+------------------+-----------+------------+------------+-------------+
2 rows in set (0,078 sec)

MariaDB [test]> select * from information_schema.rocksdb_trx;
+----------------+---------+------+-------------+------------+-------------+-------------+--------------------------+----------------+--------------+-----------+------------------------+----------------------+-----------+-------+
| TRANSACTION_ID | STATE   | NAME | WRITE_COUNT | LOCK_COUNT | TIMEOUT_SEC | WAITING_KEY | WAITING_COLUMN_FAMILY_ID | IS_REPLICATION | SKIP_TRX_API | READ_ONLY | HAS_DEADLOCK_DETECTION | NUM_ONGOING_BULKLOAD | THREAD_ID | QUERY |
+----------------+---------+------+-------------+------------+-------------+-------------+--------------------------+----------------+--------------+-----------+------------------------+----------------------+-----------+-------+
|              6 | STARTED |      |           0 |          2 |          50 |             |                        0 |              0 |            0 |         0 |                      1 |                    0 |        11 |       |
+----------------+---------+------+-------------+------------+-------------+-------------+--------------------------+----------------+--------------+-----------+------------------------+----------------------+-----------+-------+
1 row in set (0,001 sec)

MariaDB [test]> select * from information_schema.rocksdb_locks;
+------------------+----------------+------------------+------+
| COLUMN_FAMILY_ID | TRANSACTION_ID | KEY              | MODE |
+------------------+----------------+------------------+------+
|                0 |              6 | 0000010080000002 | X    |
|                0 |              6 | 0000010080000001 | X    |
+------------------+----------------+------------------+------+
2 rows in set (0,025 sec)
I was not able to find any really good documentation about these tables (I checked here, there and more), especially rocksdb_deadlock, which is totally undocumented, so let me speculate and explain my ideas on how they are supposed to work and be used together. Information about up to rocksdb_max_latest_deadlocks last deadlocks is stored in the rocksdb_deadlock table. Each deadlock is identified by deadlock_id, and in the case of MariaDB you can find out when it happened using the timestamp column, which is a UNIX timestamp:
MariaDB [test]> select distinct deadlock_id, from_unixtime(timestamp) from information_schema.rocksdb_deadlock;
+-------------+--------------------------+
| deadlock_id | from_unixtime(timestamp) |
+-------------+--------------------------+
|           0 | 2018-12-22 14:31:18      |
+-------------+--------------------------+
1 row in set (0,137 sec)

For each deadlock you have one row per lock wait for each transaction involved, identified by transaction_id. You can see for what key value (waiting_key) in which index (index_name) of which table (table_name) the transaction was waiting. The victim of deadlock detection (the transaction that was rolled back to resolve the deadlock) is identified by the value 1 in the rolled_back column. This is all the information that persists for any notable time, and I don't like it that much, as we can not see what lock(s) the transactions held at the moment. We can guess the conflicting lock based on the waiting_key, but I'd prefer the InnoDB way of showing this clearly with all the details.
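
For example, to list only the victims of recent deadlocks, a query like this against the columns shown above should do (a sketch, not an official example):

MariaDB [test]> select deadlock_id, transaction_id, table_name, index_name from information_schema.rocksdb_deadlock where rolled_back = 1;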

If you hurry up and query the rocksdb_trx table fast enough, you can get more details about those transaction(s) involved in the deadlock that were NOT rolled back (and are not yet committed). Join on the transaction_id column, obviously, to get details down to the currently running query and the processlist connection id (rocksdb_trx.thread_id column) involved.
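
A sketch of such a join, which returns rows only while the surviving transaction is still open:

MariaDB [test]> select d.deadlock_id, d.transaction_id, t.thread_id, t.query from information_schema.rocksdb_deadlock d join information_schema.rocksdb_trx t using (transaction_id) where d.rolled_back = 0;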

If you hurry up to query the rocksdb_locks table, also by the transaction_id of the still active transaction, you can get a list of locks it holds and then guess which one was the blocking lock. If you are not fast enough and the transaction is gone, you can only assume there was some blocking lock. One day gap locks may be added, and then such a guess will no longer be good enough.
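
Again, a sketch of such a lookup while the surviving transaction still exists (note that key is a reserved word and needs quoting):

MariaDB [test]> select l.transaction_id, l.`key`, l.mode from information_schema.rocksdb_locks l join information_schema.rocksdb_deadlock d on l.transaction_id = d.transaction_id where d.rolled_back = 0;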

I miss these rocks at Beaulieu-sur-Mer. MyRocks, in all implementations but Facebook's, misses one useful way to get the details about deadlock(s) that happened in the past.

To summarize, while storing information about a configurable number of recent deadlocks in a table seems to be a good idea, in the case of ROCKSDB in both Percona and MariaDB servers we miss some details (like thread ids of the sessions involved, the exact locks each transaction held, etc.) as soon as all transactions involved in the deadlock are completed one way or the other, compared to the text output provided by the original upstream statement (and by SHOW ENGINE INNODB STATUS\G, surely). Even if we are lucky to query all the tables in time, we still miss a lock waits table (like innodb_lock_waits) and any built-in way to record information in the error log about deadlocks that happened (and all locks involved).

Note also the lack of consistency in naming (rocksdb_locks, plural, vs rocksdb_deadlock, singular, in the case of MariaDB), rocksdb_deadlock.lock_type with value EXCLUSIVE vs rocksdb_locks.mode with value X, etc., and the very limited documentation available. In my opinion the current state is unacceptable if we hope to see wide use of MyRocks by community users and DBAs outside of Facebook.

by Valeriy Kravchuk (noreply@blogger.com) at December 22, 2018 03:40 PM

December 21, 2018

Peter Zaitsev

Announcing General Availability of Percona Server for MySQL 8.0


Percona has released Percona Server for MySQL 8.0 as Generally Available (GA). Our Percona Server for MySQL 8.0 software is the company’s free, enhanced, drop-in replacement for MySQL Community Edition. Percona Server for MySQL 8.0 includes all of the great features in MySQL Community Edition 8.0. It also includes enterprise-class features from Percona made available free and open source. Percona Server for MySQL is trusted by thousands of enterprises to meet their need for a mature, proven, cost-effective MySQL solution that delivers excellent performance and reliability.

Downloads are available on the Percona Website and in the Percona Software Repositories.

Features in Percona Server for MySQL 8.0

Percona Server for MySQL 8.0 includes all of the features available in MySQL 8.0 Community Edition in addition to enterprise-grade features developed by Percona for the community.

MySQL Community Edition 8.0 Features

Some of the highlights from MySQL 8.0 contained in Percona Server for MySQL 8.0 include:

  • MySQL Document Store—Combining NoSQL functionality within the X API along with JSON enhancements, such as new operators and functions, enables developers to use MySQL 8.0 for non-relational data storage without the need for a separate NoSQL database.
  • Stronger SQL—With the addition of Window Functions, Common Table Expressions, Unicode-safe Regular Expressions, and other improvements, MySQL 8.0 provides broader support for the range of SQL standard functionality (see the example after this list).
  • Transactional Data Dictionary—Enables atomic and crash-safe DDL operations, enhancing reliability, and eliminating the need for metadata files.
  • Security—SQL Roles, SHA2 default authentication, fine-grained privileges, and other enhancements make MySQL 8.0 more secure and adaptable to your organization’s compliance needs.
  • Geospatial—New SRS aware spatial data types, spatial indexes, and spatial functions, enabling the use of MySQL 8.0 for complex GIS use-cases.
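
For instance, a window function query of the kind that MySQL 8.0 now accepts looks like this (the table and column names here are hypothetical):

SELECT customer_id, amount,
       SUM(amount) OVER (PARTITION BY customer_id) AS customer_total,
       RANK() OVER (ORDER BY amount DESC) AS overall_rank
FROM payments;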

Percona Server for MySQL 8.0 Features

Building on the upstream MySQL 8.0 Community Edition, Percona Server for MySQL 8.0 brings many great features in this release, including the following:

  • Security and Compliance:
    • Audit Logging Plugin: Provides monitoring and logging of database activity to assist organizations in meeting their compliance objectives. This feature is comparable to MySQL Enterprise Auditing.
    • PAM-based Authentication Plugin: Assists enterprises in integrating Percona Server for MySQL with their single sign-on (SSO) and two-factor authentication (2FA) systems by integrating with standard PAM modules. This feature is comparable to MySQL Enterprise Authentication.
    • Enhanced Encryption: Improves upon Transparent Data Encryption (TDE) present in MySQL Community Edition. Enhanced encryption adds support for binary log encryption, temporary file encryption, encryption support for all InnoDB tablespace types and logs, encryption of the parallel doublewrite buffer, key rotation, and support for centralized key management using Hashicorp Vault. Please Note: Some of the encryption features are still considered experimental and are not yet suitable for production use. These features together are comparable to MySQL Enterprise TDE.
  • Performance and Scalability:
    • Threadpool: Supporting 10000+ connections, this feature provides significant performance benefits under heavy load (see the configuration sketch after this list). This feature is comparable to MySQL Enterprise Scalability.
    • InnoDB Engine Enhancements: Enables highly concurrent IO-bound workloads to see significant performance improvements through parallel doublewrite, multithreaded LRU flushers, and single page eviction. In a simple benchmark, we saw a 60% performance improvement in some workloads when comparing Percona Server for MySQL to MySQL Community Edition.
    • MyRocks Storage Engine: Based on the RocksDB storage library, MyRocks brings MySQL into the 21st century by being optimized for modern hardware such as NVMe SSDs. Utilizing strong compression, MyRocks reduces write-amplification and storage requirements on SSDs compared to InnoDB to lower TCO and increase ROI when working with large datasets. Improved throughput consistency compared to InnoDB enables scaling cloud resources for your databases more strategically.
  • Observability and Usability:
    • Improved Instrumentation: Percona Server for MySQL 8.0 offers more than double the available performance and stats counters compared to MySQL Community Edition, as well as support for gathering per-user and per-thread statistics, and extended slow query logging capabilities. Together with free tools like Percona Monitoring and Management these enhancements enable your DBAs to troubleshoot issues faster and effectively improve your application performance.
    • Reduced Backup Impact: Lighter-weight Backup Locking reduces the impact on performance and availability of performing backups. This feature makes your backups run faster and your applications perform better during long-running backups when used together with Percona XtraBackup 8.0.
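
As a sketch of the Threadpool feature mentioned above, enabling it comes down to a couple of my.cnf settings; the pool size below is an assumed value to be tuned for your hardware:

[mysqld]
thread_handling = pool-of-threads
thread_pool_size = 16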

Features Removed in Percona Server for MySQL 8.0

Some features were not ported forward from Percona Server for MySQL 5.7 to Percona Server for MySQL 8.0.  Features which are unused, have something comparable included upstream, or are no longer relevant in this major release have been removed. For more information see our documentation.

  • Slow Query Log Rotation and Expiration: Not widely used, can be accomplished using logrotate
  • CSV engine mode for standard-compliant quote and comma parsing
  • Expanded program option modifiers
  • The ALL_O_DIRECT InnoDB flush method: it is not compatible with the new redo logging implementation
  • XTRADB_RSEG table removed from INFORMATION_SCHEMA
  • InnoDB memory size information from SHOW ENGINE INNODB STATUS; the same information is available from Performance Schema memory summary tables
  • Query cache enhancements: The query cache is no longer present in MySQL 8.0

Features Being Deprecated in Percona Server for MySQL 8.0

  • TokuDB Storage Engine: TokuDB will be supported throughout the Percona Server for MySQL 8.0 release series, but will not be available in the next major release.  Percona encourages TokuDB users to explore the MyRocks Storage Engine which provides similar benefits for the majority of workloads and has better optimized support for modern hardware.

by Tyler Duzan at December 21, 2018 08:50 PM

Release Notes for Percona Server for MySQL 8.0.13-3 GA


Percona announces the GA release of Percona Server for MySQL 8.0.13-3 on December 21, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 8.0.13, including all the bug fixes in it. Percona Server for MySQL 8.0.13-3 is now the current GA release in the 8.0 series. All of Percona’s software is open-source and free.

Percona Server for MySQL 8.0 includes all the features available in MySQL 8.0 Community Edition in addition to enterprise-grade features developed by Percona. For a list of highlighted features from both MySQL 8.0 and Percona Server for MySQL 8.0, please see the GA release announcement.

Note: If you are upgrading from 5.7 to 8.0, please ensure that you read the upgrade guide and the document Changed in Percona Server for MySQL 8.0.

Features Removed in Percona Server for MySQL 8.0

  • Slow Query Log Rotation and Expiration: Not widely used, can be accomplished using logrotate
  • CSV engine mode for standard-compliant quote and comma parsing
  • Expanded program option modifiers
  • The ALL_O_DIRECT InnoDB flush method: it is not compatible with the new redo logging implementation
  • XTRADB_RSEG table from INFORMATION_SCHEMA
  • InnoDB memory size information from SHOW ENGINE INNODB STATUS; the same information is available from Performance Schema memory summary tables
  • Query cache enhancements: The query cache is no longer present in MySQL 8.0

Features Deprecated in Percona Server for MySQL 8.0

  • TokuDB Storage Engine: the Percona Server for MySQL 8.0 release series supports TokuDB. We are deprecating TokuDB support in the next major release. Percona encourages TokuDB users to explore the MyRocks Storage Engine which provides similar benefits for the majority of workloads and has better-optimized support for modern hardware.

Issues Resolved in Percona Server for MySQL 8.0.13-3

Improvements

  • #5014: Update Percona Backup Locks feature to use the new BACKUP_ADMIN privilege in MySQL 8.0
  • #4805: Re-Implemented Compressed Columns with Dictionaries feature in PS 8.0
  • #4790: Improved accuracy of User Statistics feature

Bugs Fixed Since 8.0.12-2rc1

  • Fixed a crash in mysqldump in the --innodb-optimize-keys functionality #4972
  • Fixed a crash that can occur when system tables are locked by the user due to a lock_wait_timeout #5134
  • Fixed a crash that can occur when system tables are locked by the user from a SELECT FOR UPDATE statement #5027
  • Fixed a bug that caused innodb_buffer_pool_size to be uninitialized after a restart if it was set using SET PERSIST #5069
  • Fixed a crash in TokuDB that can occur when a temporary table experiences an autoincrement rollover #5056
  • Fixed a bug where marking an index as invisible would cause a table rebuild in TokuDB and also in MyRocks #5031
  • Fixed a bug where audit logs could get corrupted if audit_log_rotations was changed at runtime. #4950
  • Fixed a bug where LOCK INSTANCE FOR BACKUP and STOP SLAVE SQL_THREAD would cause replication to be blocked and unable to be restarted. #4758 (Upstream #93649)

Other Bugs Fixed:

#5155, #5139, #5057, #5049, #4999, #4971, #4943, #4918, #4917, #4898, and #4744.

Known Issues in Percona Server for MySQL 8.0.13-3

We have a few features and issues outstanding that should be resolved in the next release.

Pending Feature Re-Implementations and Improvements

  • #4892: Re-Implement Expanded Fast Index Creation feature.
  • #5216: Re-Implement Utility User feature.
  • #5143: Identify Percona features which can make use of dynamic privileges instead of SUPER

Notable Issues in Features

  • #5148: Regression in Compressed Columns Feature when using innodb-force-recovery
  • #4996: Regression in User Statistics feature where the TOTAL_CONNECTIONS field reports incorrect data
  • #4933: Regression in Slow Query Logging Extensions feature where incorrect transaction id accounting can cause an assert during certain DDLs.
  • #5206: TokuDB: A crash can occur in TokuDB when using Native Partitioning and the optimizer has index_merge_union enabled. Workaround by using SET SESSION optimizer_switch="index_merge_union=off";
  • #5174: MyRocks: Attempting to use unsupported features against MyRocks can lead to a crash rather than an error.
  • #5024: MyRocks: Queries can return the wrong results on tables with no primary key, non-unique CHAR/VARCHAR rows, and UTF8MB4 charset.
  • #5045: MyRocks: Altering a column or table comment causes the table to be rebuilt

Find the release notes for Percona Server for MySQL 8.0.13-3 in our online documentation. Report bugs in the Jira bug tracker.

by Tyler Duzan at December 21, 2018 08:49 PM

Backup and Restore a PostgreSQL Cluster With Multiple Tablespaces Using pg_basebackup

pg_basebackup is a widely used PostgreSQL backup tool that allows us to take an ONLINE and CONSISTENT file system level backup. These backups can be used for point-in-time-recovery or to set up a slave/standby. You may want to refer to our previous blog posts, PostgreSQL Backup Strategy, Streaming Replication in PostgreSQL and Faster PITR in PostgreSQL, where we describe how we used pg_basebackup for different purposes. In this post, I’ll demonstrate the steps to restore a backup taken using pg_basebackup when we have many tablespaces that store databases or their underlying objects.

A simple backup can be taken using the following syntax.

Tar and Compressed Format
$ pg_basebackup -h localhost -p 5432 -U postgres -D /backupdir/latest_backup -Ft -z -Xs -P
Plain Format
$ pg_basebackup -h localhost -p 5432 -U postgres -D /backupdir/latest_backup -Fp -Xs -P

Using a tar and compressed format is advantageous when you wish to use less disk space to back up and store all tablespaces, the data directory and WAL segments, with everything in just one directory (the target directory for backup).

Whereas a plain format stores a copy of the data directory as is, in the target directory. When you have one or more non-default tablespaces, the tablespaces may be stored in separate directories. This is usually the same as the original location, unless you use --tablespace-mapping to modify the destination for storing the tablespaces backup.

PostgreSQL supports the concept of tablespaces. In simple words, a tablespace helps us maintain multiple locations to scatter databases or their objects. In this way, we can distribute the IO and balance the load across multiple disks.

To understand what happens when we backup a PostgreSQL cluster that contains multiple tablespaces, let’s consider the following example. We’ll take these steps:

  • Create two tablespaces in an existing master-slave replication setup.
  • Take a backup and see what is inside the backup directory.
  • Restore the backup.
  • Conclude our findings

Create 2 tablespaces and take a backup (tar format) using pg_basebackup

Step 1 :

I set up a replication cluster using PostgreSQL 11.2. You can refer to our blog post Streaming Replication in PostgreSQL to reproduce the same scenario. Here are the steps used to create two tablespaces:

$ sudo mkdir /data_pgbench
$ sudo mkdir /data_pgtest
$ psql -c "CREATE TABLESPACE data_pgbench LOCATION '/data_pgbench'"
$ psql -c "CREATE TABLESPACE data_pgtest LOCATION '/data_pgtest'"
$ psql -c "select oid, spcname, pg_tablespace_location(oid) from pg_tablespace"
oid | spcname | pg_tablespace_location
-------+--------------+------------------------
1663 | pg_default |
1664 | pg_global |
16419 | data_pgbench | /data_pgbench
16420 | data_pgtest | /data_pgtest
(4 rows)

Step 2 :

Now, I create two databases in two different tablespaces, using pgbench to create a few tables and load some data in them.

$ psql -c "CREATE DATABASE pgbench TABLESPACE data_pgbench"
$ psql -c "CREATE DATABASE pgtest TABLESPACE data_pgtest"
$ pgbench -i pgbench
$ pgbench -i pgtest

In a master-slave setup built using streaming replication, you must ensure that the directories exist on the slave before running a "CREATE TABLESPACE ..." on the master. This is because the same statements used to create a tablespace are shipped/applied to the slave through WALs – this is unavoidable. The slave crashes with the following message when these directories do not exist:

2018-12-15 12:00:56.319 UTC [13121] LOG: consistent recovery state reached at 0/80000F8
2018-12-15 12:00:56.319 UTC [13119] LOG: database system is ready to accept read only connections
2018-12-15 12:00:56.327 UTC [13125] LOG: started streaming WAL from primary at 0/9000000 on timeline 1
2018-12-15 12:26:36.310 UTC [13121] FATAL: directory "/data_pgbench" does not exist
2018-12-15 12:26:36.310 UTC [13121] HINT: Create this directory for the tablespace before restarting the server.
2018-12-15 12:26:36.310 UTC [13121] CONTEXT: WAL redo at 0/9000448 for Tablespace/CREATE: 16417 "/data_pgbench"
2018-12-15 12:26:36.311 UTC [13119] LOG: startup process (PID 13121) exited with exit code 1
2018-12-15 12:26:36.311 UTC [13119] LOG: terminating any other active server processes
2018-12-15 12:26:36.314 UTC [13119] LOG: database system is shut down
2018-12-15 12:27:01.906 UTC [13147] LOG: database system was interrupted while in recovery at log time 2018-12-15 12:06:13 UTC
2018-12-15 12:27:01.906 UTC [13147] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

Step 3 :

Let’s now use pg_basebackup to take a backup. In this example, I use a tar format backup.

$ pg_basebackup -h localhost -p 5432 -U postgres -D /backup/latest_backup -Ft -z -Xs -P
94390/94390 kB (100%), 3/3 tablespaces

In the above log, you can see that three tablespaces have been backed up: one default, and the two newly created tablespaces. If we go back and check how the data in the two tablespaces is distributed to the appropriate directories, we see that there are symbolic links created inside the pg_tblspc directory (within the data directory) for the OIDs of both tablespaces. These links point to the actual locations of the tablespaces we specified in Step 1.

$ ls -l $PGDATA/pg_tblspc
total 0
lrwxrwxrwx. 1 postgres postgres 5 Dec 15 12:31 16419 -> /data_pgbench
lrwxrwxrwx. 1 postgres postgres 6 Dec 15 12:31 16420 -> /data_pgtest

Step 4 :

Here are the contents inside the backup directory, that was generated through the backup taken in Step 3.

$ ls -l /backup/latest_backup
total 8520
-rw-------. 1 postgres postgres 1791930 Dec 15 12:54 16419.tar.gz
-rw-------. 1 postgres postgres 1791953 Dec 15 12:54 16420.tar.gz
-rw-------. 1 postgres postgres 5113532 Dec 15 12:54 base.tar.gz
-rw-------. 1 postgres postgres 17097 Dec 15 12:54 pg_wal.tar.gz

Tar Files:

16419.tar.gz and 16420.tar.gz are created as a backup for the two tablespaces. These are created with the same names as the OIDs of their respective tablespaces.

Let’s now take a look how we can restore this backup to completely different locations for data and tablespaces.

Restore a backup with multiple tablespaces

Step 1 :

In order to proceed with the restore, let’s first extract the base.tar.gz file. This archive contains some important files that we need for the next steps.

$ tar xzf /backup/latest_backup/base.tar.gz -C /pgdata
$ ls -larth /pgdata
total 76K
drwx------. 2 postgres postgres 18 Dec 14 14:15 pg_xact
-rw-------. 1 postgres postgres 3 Dec 14 14:15 PG_VERSION
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_twophase
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_subtrans
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_snapshots
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_serial
drwx------. 4 postgres postgres 36 Dec 14 14:15 pg_multixact
-rw-------. 1 postgres postgres 1.6K Dec 14 14:15 pg_ident.conf
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_dynshmem
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_commit_ts
drwx------. 6 postgres postgres 54 Dec 14 14:18 base
-rw-------. 1 postgres postgres 4.5K Dec 14 16:16 pg_hba.conf
-rw-------. 1 postgres postgres 208 Dec 14 16:18 postgresql.auto.conf
drwx------. 2 postgres postgres 6 Dec 14 16:18 pg_stat
drwx------. 2 postgres postgres 58 Dec 15 00:00 log
drwx------. 2 postgres postgres 6 Dec 15 12:54 pg_stat_tmp
drwx------. 2 postgres postgres 6 Dec 15 12:54 pg_replslot
drwx------. 4 postgres postgres 68 Dec 15 12:54 pg_logical
-rw-------. 1 postgres postgres 224 Dec 15 12:54 backup_label
drwx------. 3 postgres postgres 28 Dec 15 12:57 pg_wal
drwx------. 2 postgres postgres 4.0K Dec 15 12:57 global
drwx------. 2 postgres postgres 32 Dec 15 13:01 pg_tblspc
-rw-------. 1 postgres postgres 55 Dec 15 13:01 tablespace_map
-rw-------. 1 postgres postgres 24K Dec 15 13:04 postgresql.conf
-rw-r--r--. 1 postgres postgres 64 Dec 15 13:07 recovery.conf
-rw-------. 1 postgres postgres 44 Dec 15 13:07 postmaster.opts
drwx------. 2 postgres postgres 18 Dec 15 13:07 pg_notify
-rw-------. 1 postgres postgres 30 Dec 15 13:07 current_logfiles

Step 2 :

The files that we need to consider for our recovery are:

  • backup_label
  • tablespace_map

When we open the backup_label file, we see the start WAL location, the backup start time, etc. These details help us perform a point-in-time-recovery.

$ cat backup_label
START WAL LOCATION: 0/B000028 (file 00000001000000000000000B)
CHECKPOINT LOCATION: 0/B000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-12-15 12:54:10 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1

Now, let us see what is inside the tablespace_map file.

$ cat tablespace_map
16419 /data_pgbench
16420 /data_pgtest

In the above log, you can see that there are two entries – one for each tablespace. This is a file that maps a tablespace (OID) to its location. When you start PostgreSQL after extracting the tablespace and WAL tar files, symbolic links are created automatically by postgres – inside the pg_tblspc directory, for each tablespace – to the appropriate tablespace location, using the mapping done in this file.

Step 3 :

Now, in order to restore this backup on the same postgres server from which the backup was taken, you must remove the existing data in the original tablespace directories. This allows you to extract the tar file of each tablespace to the appropriate tablespace location.

The actual commands for extracting tablespaces from the backup in this case were the following:

$ tar xzf 16419.tar.gz -C /data_pgbench (Original tablespace location)
$ tar xzf 16420.tar.gz -C /data_pgtest  (Original tablespace location)

In a scenario where you want to restore the backup to the same machine from which the backup was originally taken, we must use different locations while extracting the data directory and tablespaces from the backup. To achieve that, the tar files for individual tablespaces may be extracted to directories different from the original ones specified in the tablespace_map file, after which we can modify the tablespace_map file with the new tablespace locations. The next two steps should help you see how this works.

Step 3a :

Create two different directories and extract the tablespaces to them.

$ tar xzf 16419.tar.gz -C /pgdata_pgbench (Different location for tablespace than original)
$ tar xzf 16420.tar.gz -C /pgdata_pgtest  (Different location for tablespace than original)

Step 3b :

Edit the tablespace_map file with the new tablespace locations. Replace the original location of each tablespace with the new location where we extracted the tablespaces in the previous step. Here is how it appears after the edit.

$ cat tablespace_map
16419 /pgdata_pgbench
16420 /pgdata_pgtest

Step 4 :

Extract pg_wal.tar.gz from backup to pg_wal directory of the new data directory.

$ tar xzf pg_wal.tar.gz -C /pgdata/pg_wal

Step 5 :

Create recovery.conf to specify the time until which you wish to perform a point-in-time-recovery. Please refer to our previous blog post – Step 3 – to understand recovery.conf for PITR in detail.
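
A minimal recovery.conf sketch for such a PITR could look like the following; the archive path and the target timestamp are assumptions for illustration only:

restore_command = 'cp /wal_archive/%f %p'
recovery_target_time = '2018-12-15 13:00:00'
recovery_target_action = 'promote'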

Step 6 :

Once all of the steps above are complete you can start PostgreSQL.
You should see the following files renamed after recovery.

backup_label   --> backup_label.old
tablespace_map --> tablespace_map.old
recovery.conf  --> recovery.done

To avoid the exercise of manually modifying the tablespace_map file, you can use --tablespace-mapping. This option works when you use a plain format backup, but not with tar. Let’s see why you may still prefer a tar format when compared to plain.

Backup of PostgreSQL cluster with tablespaces using plain format

Consider the same scenario where you have a PostgreSQL cluster with two tablespaces. You might see the following error when you do not use --tablespace-mapping.

$ pg_basebackup -h localhost -p 5432 -U postgres -D /backup/latest_backup -Fp -Xs -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/22000028 on timeline 1
pg_basebackup: directory "/data_pgbench" exists but is not empty
pg_basebackup: removing contents of data directory "/backup/latest_backup"

What the above error means is that pg_basebackup is trying to store the tablespaces in the same location as the original tablespace directory. Here /data_pgbench is the location of the tablespace data_pgbench, and pg_basebackup is trying to store the tablespace backup in that same location. To overcome this error, you can apply tablespace mapping using the following syntax.

$ pg_basebackup -h localhost -p 5432 -U postgres -D /backup/latest_backup -T "/data_pgbench=/pgdata_pgbench" -T "/data_pgtest=/pgdata_pgtest" -Fp -Xs -P

-T is used to specify the tablespace mapping; it can be replaced by --tablespace-mapping.

The advantage of using -T (--tablespace-mapping) is that the tablespaces are stored separately in the mapped directories. In this example with a plain format backup, you must extract all of the following three directories in order to restore/recover the database using the backup.

  • /backup/latest_backup
  • /pgdata_pgtest
  • /pgdata_pgbench

However, you do not need a tablespace_map file in this scenario, as it is automatically taken care of by PostgreSQL.

If you take a backup in tar format, you see all the tar files for the base, tablespaces and WAL segments stored in the same backup directory, and just this directory can be extracted for performing restore/recovery. However, you must manually extract the tablespaces and WAL segments to the appropriate locations and edit the tablespace_map file, as discussed above.


Image based on photos by Alan James Hendry on Unsplash and Tanner Boriack on Unsplash

by Avinash Vallarapu at December 21, 2018 05:35 PM

Jean-Jerome Schmidt

Database High Availability Comparison - MySQL / MariaDB Replication vs Oracle Data Guard

In the “State of the Open-Source DBMS Market, 2018”, Gartner predicts that by 2022, 70 percent of new in-house applications will be developed on an open-source database. And 50% of existing commercial databases will have converted. So, Oracle DBAs, get ready to start deploying and managing new open source databases - along with your legacy Oracle instances. Unless you’re already doing it.

So how does MySQL or MariaDB replication stack up against Oracle Data Guard? In this blog, we’ll compare the two from the standpoint of a high availability database solution.

What To Look For

A modern data replication architecture is built upon flexible designs that enable unidirectional and bidirectional data replication, as well as quick, automated failover to secondary databases in the event of an unplanned service break. Failover should also be easy to execute and reliable, so that no committed transactions are lost. Moreover, switchover or failover should ideally be transparent to applications.

Data replication solutions have to be capable of copying data with very low latency to avoid processing bottlenecks and guarantee real-time access to data. Real-time copies could be deployed on a different database running on low-cost hardware.

When used for disaster recovery, the system must be validated to ensure application access to the secondary system with minimal service interruption. The ideal solution should allow regular testing of the disaster recovery process.

Main Topics of Comparison

  • Data availability and consistency
    • GTID, SCN
    • Replication to multiple standbys, async + sync models
    • Isolation of standby from production faults (e.g. delayed replication for MySQL)
    • Avoiding loss of data (sync replication)
  • Standby systems utilization
    • Usage of the standby
  • Failover, Switchover and automatic recovery
    • Database failover
    • Transparent application failover (TAF vs ProxySQL, MaxScale)
  • Security
  • Ease of use and management (unified management of pre-integrated components)

Data Availability and Consistency

MySQL GTID

MySQL 5.5 replication was based on binary log events, where all a slave knew was the precise event and the exact position it had just read from the master. Any single transaction from a master may have ended up in various binary logs on different slaves, and the transaction would typically have different positions in these logs. It was a simple solution that came with limitations: topology changes could require an admin to stop replication on the instances involved. These changes could cause other issues, e.g., a slave couldn’t be moved down the replication chain without a time-consuming rebuild. Fixing a broken replication link would require manually determining the new binary log file and the position of the last transaction executed on the slave and resuming from there, or a total rebuild. We’ve all had to work around these limitations while dreaming about a global transaction identifier.

MySQL version 5.6 (and MariaDB version 10.0.2) introduced a mechanism to solve this problem. GTID (Global Transaction Identifier) provides better transactions mapping across nodes.

With GTID, slaves can see a unique transaction coming in from several masters and this can easily be mapped into the slave execution list if it needs to restart or resume replication. So, the advice is to always use GTID. Note that MySQL and MariaDB have different GTID implementations.
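
For MySQL, switching to GTID-based replication boils down to a couple of my.cnf settings, sketched below (MariaDB generates its own GTIDs automatically, so no such switch is needed there):

[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON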

Oracle SCN

Back with release 7.3, Oracle introduced a solution to keep a synchronized copy of a database as a standby, known as Data Guard since version 9i release 2. A Data Guard configuration consists of two main components: a single primary database and standby databases (up to 30). Changes on the primary database are passed to the standby databases, where they are applied to keep them synchronized.

Oracle Data Guard is initially created from a backup of the primary database. Data Guard automatically synchronizes the primary database and all standby databases by transmitting primary database redo - the information used by every Oracle Database to protect transactions - and applying it to the standby database. Oracle uses an internal mechanism called the SCN (System Change Number). The system change number (SCN) is Oracle's clock: every time we commit, the clock increments. The SCN marks a consistent point in time in the database, for example at a checkpoint (the act of writing dirty blocks, i.e. modified blocks from the buffer cache, to disk). We can compare it to the GTID in MySQL.
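
You can read the current SCN on a running Oracle database with a simple query, for example:

SQL> SELECT current_scn FROM v$database;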

Data Guard transport services handle all aspects of transmitting redo from a primary to a standby database. As users commit transactions on the primary, redo records are generated and written to a local online log file. Data Guard transport services simultaneously transmit the same redo directly from the primary database log buffer (memory allocated within system global area) to the standby database(s) where it is written to a standby redo log file.

There are a few main differences between MySQL replication and Data Guard. Data Guard’s direct transmission from memory avoids disk I/O overhead on the primary database. This is different from how MySQL works, where the master re-reads binary log events to send them to slaves; shipping redo straight from memory decreases I/O on the primary database.

Data Guard transmits only database redo. It is in stark contrast to storage remote-mirroring which must transmit every changed block in every file to maintain real-time synchronization.

Async + Sync Models

Oracle Data Guard offers three different models for the redo apply, adaptable depending on available hardware, processes, and ultimately business needs.

  • Maximum Performance - default mode of operation, allowing a transaction to commit as soon as the redo data needed to recover that transaction is written to the local redo log on the master.
  • Maximum Protection - no data loss and the maximum level of protection. The redo data needed to recover each transaction must be written to both the local online redo log on the master and the standby redo log on at least one standby database before the transaction commits (Oracle recommends at least two standbys). The primary database will shut down if a fault blocks it from writing its redo stream to at least one synchronized standby database.
  • Maximum Availability - similar to Maximum Protection but the primary database will not shut down if a fault prevents it from writing its redo stream.

When it comes to choosing your MySQL replication setup, you have the choice between Asynchronous replication or Semi-Synchronous replication.

  • Asynchronous binlog apply is the default method for MySQL replication. The master writes events to its binary log and slaves request them when they are ready. There is no guarantee that any event will ever reach any slave.
  • Semi-synchronous: the commit on the primary is delayed until the master receives an acknowledgment from a semi-synchronous slave that the data has been received and written by the slave. Please note that semi-synchronous replication requires an additional plugin to be installed.
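
As a sketch, turning on semi-synchronous replication with the stock plugins takes one command pair on the master and one on each semi-synchronous slave:

mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;

mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;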

Standby Systems Utilization

MySQL is well known for its replication simplicity and flexibility. By default, you can read from or even write to your standby/slave servers. Luckily, MySQL 5.6 and 5.7 brought many significant enhancements to replication, including Global Transaction IDs, event checksums, multi-threaded slaves and crash-safe slaves/masters, to make it even better. DBAs accustomed to MySQL replication reads and writes would expect a similar or even simpler solution from its bigger brother, Oracle. Unfortunately, not by default.

The standard physical standby implementation for Oracle is closed to any read-write operations. In fact, Oracle offers a logical standby variation, but it has many limitations and it's not designed for HA. The solution to this problem is an additional paid feature called Active Data Guard, which you can use to read data from the standby while you apply redo logs.

Active Data Guard is a paid add-on solution to Oracle’s free Data Guard disaster recovery software available only for Oracle Database Enterprise Edition (highest cost license). It delivers read-only access, while continuously applying changes sent from the primary database. As an active standby database, it helps offload read queries, reporting and incremental backups from the primary database. The product’s architecture is designed to allow standby databases to be isolated from failures that may occur at the primary database.

An exciting feature of Oracle Database 12c, and something that an Oracle DBA would miss, is data corruption validation. Oracle Data Guard corruption checks are performed to ensure that data is in exact alignment before it is copied to a standby database. This mechanism can also be used to restore data blocks on the primary directly from the standby database.

Failover, Switchover, and Automatic Recovery

To keep your replication setup stable and running, it is crucial for the system to be resilient to failures. Failures are caused by either software bugs, configuration problems or hardware issues, and can happen at any time. In case a server goes down, you need an alarm notification about the degraded setup. Failover (promotion of a slave to master) can be performed by the admin, who needs to decide which slave to promote.

The admin needs information about the failure, the synchronization status in case any data will be lost, and finally, steps to perform the action. Ideally, all should be automated and visible from a single console.

There are two main approaches to MySQL failover, automatic and manual. Both options have their fans; we describe the concepts in another article.

With the GTID, the manual failover becomes much easier. It consists of steps like:

  • Stop the receiver module (STOP SLAVE IO_THREAD)
  • Switch master (CHANGE MASTER TO <new_master_def>)
  • Start the receiver module (START SLAVE IO_THREAD)
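
On the slave being repointed, those steps could look like the following sketch (the master host name is hypothetical; MASTER_AUTO_POSITION relies on GTID):

mysql> STOP SLAVE IO_THREAD;
mysql> CHANGE MASTER TO MASTER_HOST = 'new-master.example.com', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE IO_THREAD;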

Oracle Data Guard comes with a dedicated failover/switchover solution - Data Guard Broker. The broker is a distributed management framework that automates and centralizes the creation, maintenance, and monitoring of Oracle Data Guard configurations. With access to the DG broker tool, you can perform configuration changes, switchovers, failovers and even a dry test of your high availability setup. The two main actions are:

  • The command SWITCHOVER TO <standby database name> is used to perform the switchover operation. After a successful switchover operation, the database instances switch places and replication continues. It’s not possible to switch over when the standby is not responding or is down.
  • The command FAILOVER TO <standby database name> is used to perform the failover. After the failover operation, the previous primary server requires recreation, but the new primary can take the database workload.

Speaking about failover, we need to consider how seamless your application failover can be: in the event of a planned/unplanned outage, how efficiently can user sessions be directed to a secondary site, with minimal business interruption?

The standard approach for MySQL would be to use one of the available load balancers, starting from HAProxy, which is widely used for HTTP or TCP/IP failover, up to database-aware MaxScale or ProxySQL.

In Oracle, this problem is addressed by TAF (Transparent Application Failover). Once switchover or failover occurs, the application is automatically directed to the new primary. TAF enables the application to automatically and transparently reconnect to a new database, if the database instance to which the connection is made fails.


Security

Data security is a hot issue for many organizations these days. For those who need to implement standards like PCI DSS or HIPAA, database security is a must. Cross-WAN environments might lead to concerns about data privacy and security, especially as more businesses are having to comply with national and international regulations. MySQL binary logs used for replication may contain easy-to-read sensitive data. With the standard configuration, stealing data is a very easy process. MySQL supports SSL as a mechanism to encrypt traffic both between MySQL servers (replication) and between MySQL servers and clients. A typical way of implementing SSL encryption is to use self-signed certificates; most of the time, it is not required to obtain an SSL certificate issued by a Certificate Authority. You can use openssl to create the certificates, for example:

$ openssl genrsa 2048 > ca-key.pem
$ openssl req -new -x509 -nodes -days 3600 -key ca-key.pem > ca-cert.pem
$ openssl req -newkey rsa:2048 -days 3600 -nodes -keyout server-key.pem > server-req.pem
$ openssl x509 -req -in server-req.pem -days 3600 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 > server-cert.pem
$ openssl req -newkey rsa:2048 -days 3600 -nodes -keyout client-key.pem > client-req.pem
$ openssl x509 -req -in client-req.pem -days 1000 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 > client-cert.pem
$ openssl rsa -in client-key.pem -out client-key.pem
$ openssl rsa -in server-key.pem -out server-key.pem

Then modify the replication settings with the SSL parameters:

CHANGE MASTER TO ..., MASTER_SSL=1, MASTER_SSL_CA = '/etc/security/ca.pem', MASTER_SSL_CERT = '/etc/security/client-cert.pem', MASTER_SSL_KEY = '/etc/security/client-key.pem';

For more automated option, you can use ClusterControl to enable encryption and manage SSL keys.

In Oracle 12c, Data Guard redo transport can be integrated with a set of dedicated security features called Oracle Advanced Security (OAS). Advanced Security can be used to enable encryption and authentication services between the primary and standby systems. For example, enabling the Advanced Encryption Standard (AES) encryption algorithm requires only a few parameter changes in the sqlnet.ora file to encrypt the redo (similar to the MySQL binlog). No external certificate setup is required, and it only requires a restart of the standby database. The modifications to sqlnet.ora and the wallet are as simple as:

Create a wallet directory

mkdir /u01/app/wallet

Edit sqlnet.ora

ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=file)
   (METHOD_DATA=
    (DIRECTORY=/u01/app/wallet)))

Create a keystore

ADMINISTER KEY MANAGEMENT CREATE KEYSTORE '/u01/app/wallet' identified by root ;

Open store

ADMINISTER KEY MANAGEMENT set KEYSTORE open identified by root ;

Create a master key

ADMINISTER KEY MANAGEMENT SET KEY IDENTIFIED BY root WITH BACKUP;

On the standby:

Copy the .p12 and .sso files into the wallet directory and update the sqlnet.ora file similarly to the primary node.

For more information please follow Oracle's TDE white paper; from it you can learn how to encrypt data files and keep the wallet always open.

Ease of Use and Management

When you manage or deploy an Oracle Data Guard configuration, you may find out that there are many steps and parameters to look after. To answer that, Oracle created the DG Broker.

You can certainly create a Data Guard configuration without implementing the DG Broker, but it can make your life much more comfortable. When it's implemented, the Broker's command line utility, DGMGRL, is probably the primary choice for the DBA. For those who prefer a GUI, Cloud Control 13c has an option to access the DG Broker via the web interface.

The tasks that the Broker can help with are automatic start of the managed recovery, one command for failover/switchover, monitoring of DG replication, configuration verification and many others.

DGMGRL> show configuration 
Configuration - orcl_s9s_config 

Protection Mode: MaxPerformance
  Members:

s9sp  - Primary database
    s9ss - Physical standby database 

Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS   (status updated 12 seconds ago)

MySQL does not offer a similar solution to Oracle DG Broker. However, you can extend its functionality by using tools like Orchestrator, MHA and load balancers (ProxySQL, HAProxy or MaxScale). The solution to manage databases and load balancers is ClusterControl. The ClusterControl Enterprise Edition gives you a full set of management and scaling features in addition to the deployment and monitoring functions offered as part of the free Community Edition.

by Bart Oles at December 21, 2018 02:34 PM

Peter Zaitsev

Percona Server for MongoDB Authentication Using Active Directory

This article will walk you through using the SASL library to allow your Percona Server for MongoDB instance to authenticate with your company’s Active Directory server. Percona Server for MongoDB includes enterprise-level features such as LDAP authentication, audit logging and, with the 3.6.8 release, a beta version of data encryption at rest, all in its open source offering.

Pre set-up assumptions

In this article we will make a couple of assumptions:

  1. You have an Active Directory server up and running and that it is accessible to the server that you have Percona Server for MongoDB installed on.
  2. These machines are installed behind a firewall, as the communications between the two servers will be in plain text. This is due to the fact that we can only use the SASL mechanism PLAIN when authenticating, so credentials will be sent in plain text.
  3. You have sudo privilege on the server you are going to install Percona Server for MongoDB on.

Installing Percona Server for MongoDB

The first thing you are going to need to do is to install the Percona Server for MongoDB package. You can get this in a couple of different ways. You can either install from the Percona repositories, or you can download the packages and install them manually.

Once you have Percona Server for MongoDB installed, we want to start the mongod service and make sure it is set to run on restart.

sudo systemctl start mongod
sudo systemctl enable mongod

Now that the service is up and running, we want to open the mongo shell and add a database administrator user. This user will be authenticated inside of the MongoDB server itself and will not have any interactions with the Active Directory server.

To start the mongo shell up, type mongo from a terminal window. Once you do this you will see something similar to the following:

Percona Server for MongoDB shell version v3.6.8-2.0
connecting to: mongodb://127.0.0.1:27017
Percona Server for MongoDB server version: v3.6.8-2.0
Server has startup warnings:
2018-12-11T17:48:47.471+0000 I STORAGE [initandlisten]
2018-12-11T17:48:47.471+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-12-11T17:48:47.471+0000 I STORAGE [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-12-11T17:48:48.197+0000 I CONTROL [initandlisten]
2018-12-11T17:48:48.197+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-12-11T17:48:48.197+0000 I CONTROL [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-12-11T17:48:48.197+0000 I CONTROL [initandlisten] **          You can use percona-server-mongodb-enable-auth.sh to fix it.
2018-12-11T17:48:48.197+0000 I CONTROL [initandlisten]

Notice the second warning that access control is not enabled for the database. Percona Server for MongoDB comes with a script that you can run that will enable authentication for you, but we can also do this manually.

We will go ahead and manually add a user in MongoDB that has the root role assigned to it. This user will have permission to do anything on the server, so you will want to make sure to keep the password safe. You will also not want to use this user for doing your day to day work inside of MongoDB.

This user needs to be created in the admin database as it needs to have access to the entire system. To do this run the following commands inside of the mongo shell:

> use admin
switched to db admin
> db.createUser({"user": "admin", "pwd": "$3cr3tP4ssw0rd", "roles": ["root"]})
Successfully added user: { "user" : "admin", "roles" : [ "root" ] }

Now that we have a user created in MongoDB we can go ahead and enable authorization. To do this we need to modify the /etc/mongod.conf file and add the following lines:

security:
  authorization: enabled
setParameter:
  authenticationMechanisms: PLAIN,SCRAM-SHA-1

Notice that we have two mechanisms set up for authentication. The first one, PLAIN, is used for authenticating with Active Directory. The second one, SCRAM-SHA-1 is used for internal authentication inside of MongoDB.

Once you’ve made the changes, you can restart the mongod service by running the following command:

sudo systemctl restart mongod

Now if you were to run the mongo shell again, you wouldn’t see the access control warning any more, and you would need to log in as your new user to be able to run any commands.

If you were to try to get a list of databases before logging in you would get an error:

> show dbs;
2018-12-11T21:50:39.551+0000 E QUERY [thread1] Error: listDatabases failed:{
"ok" : 0,
"errmsg" : "not authorized on admin to execute command { listDatabases: 1.0, $db: \"admin\" }",
"code" : 13,
"codeName" : "Unauthorized"
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
Mongo.prototype.getDBs@src/mongo/shell/mongo.js:65:1
shellHelper.show@src/mongo/shell/utils.js:849:19
shellHelper@src/mongo/shell/utils.js:739:15
@(shellhelp2):1:1

Let’s go ahead and run the mongo shell and then log in with our admin user:

> use admin
switched to db admin
> db.auth("admin", "$3cr3tP4ssw0rd")
1

If you are successful you will get a return value of 1. If authentication fails, you will get a return value of 0. Failure is generally due to a mistyped username or password, but you could also be trying to authenticate in the wrong database. In MongoDB you must be in the database that the user was created in before trying to authenticate.

Now that we’ve logged in as the admin user, we will add a document that will be used to verify that our Active Directory based user can successfully access the data at the end of this post.

> use percona
switched to db percona
> db.test.insert({"message": "Active Directory user success!"})
WriteResult({ "nInserted" : 1 })

Install the Cyrus SASL packages

Now that we have a Percona Server for MongoDB instance set up and secured, we need to add some packages that will allow us to communicate properly with the Active Directory server.

For RedHat use the following command

sudo yum install -y cyrus-sasl cyrus-sasl-plain

For Ubuntu use this command

sudo apt install -y sasl2-bin

Next we need to update the SASL configuration to use LDAP instead of PAM, which is the default. To do this we need to edit the saslauthd configuration file (/etc/sysconfig/saslauthd on RedHat, /etc/default/saslauthd on Ubuntu), remembering to back up your original file first.

For RedHat we use the following commands

sudo cp /etc/sysconfig/saslauthd /etc/sysconfig/saslauthd.bak
sudo sed -i -e 's/^MECH=pam/MECH=ldap/g' /etc/sysconfig/saslauthd

For Ubuntu we use these commands instead

sudo cp /etc/default/saslauthd /etc/default/saslauthd.bak
sudo sed -i -e 's/^MECHANISMS="pam"/MECHANISMS="ldap"/g' /etc/default/saslauthd
sudo sed -i -e 's/^START=no/START=yes/g' /etc/default/saslauthd

We also need to create the file /etc/saslauthd.conf with contents similar to the following (replace values as necessary for your Active Directory installation):

ldap_servers: ldap://LDAP.EXAMPLE.COM
ldap_mech: PLAIN
ldap_search_base: CN=Users,DC=EXAMPLE,DC=COM
ldap_filter: (cn=%u)
ldap_bind_dn: CN=ADADMIN,CN=Users,DC=EXAMPLE,DC=COM
ldap_password: ADADMINPASSWORD

Now that we’ve got SASL set up, we can start the saslauthd process and set it to start automatically at boot.

sudo systemctl start saslauthd
sudo systemctl enable saslauthd

Next we need to allow the mongod process to write to the saslauthd mux socket, which requires the permissions on the owning directory to be 755. This is the default on RedHat, but not on Ubuntu.

On Ubuntu you can either change the permissions on the folder

sudo chmod 755 /run/saslauthd

Or you could add the mongod user to the sasl group

sudo usermod -a -G sasl mongod

Test the users

The SASL installation provides us with a tool to test that our Active Directory users can be logged in from this machine. Let’s go ahead and test to see if we can authenticate with our Active Directory user.

sudo testsaslauthd -u aduser -p ADP@assword1

You should see 0: OK "Success." if authentication worked.

Create a SASL config file for MongoDB

To allow MongoDB to use SASL to communicate with Active Directory, we need to create a configuration file.

Create the requisite directory if it doesn’t exist:

sudo mkdir -p /etc/sasl2

And then we need to create the file /etc/sasl2/mongodb.conf and place the following contents into it:

pwcheck_method: saslauthd
saslauthd_path: /var/run/saslauthd/mux
log_level: 5
mech_list: plain

Add Active Directory user to MongoDB

Now we can finally add our Active Directory user to our MongoDB instance:

$ mongo
Percona Server for MongoDB shell version v3.6.8-2.0
connecting to: mongodb://127.0.0.1:27017
Percona Server for MongoDB server version: v3.6.8-2.0
> use admin
switched to db admin
> db.auth("admin", "$3cr3tP4ssw0rd")
1
> use $external
switched to db $external
> db.createUser({"user": "aduser", "roles": [{"role": "read", "db": "percona"}]})
Successfully added user: {
        "user" : "aduser",
        "roles" : [
                {
                        "role" : "read",
                        "db" : "percona"
                }
        ]
}

As you can see from the above, when we create the user that will be authenticated with Active Directory, we need to be in the special $external database and we don’t supply a password as we would when we create a MongoDB authenticated user.
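
As an aside (a hypothetical extension, not part of the original walkthrough), if you later need to give this user more privileges, the grant must also be issued from the $external database, for example:

> use $external
switched to db $external
> db.grantRolesToUser("aduser", [ { "role": "readWrite", "db": "percona" } ])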

Now let’s try to log in with our Active Directory based user. First we need to exit the current mongo shell and restart it, and then we can log in:

> exit
bye
$ mongo
Percona Server for MongoDB shell version v3.6.8-2.0
connecting to: mongodb://127.0.0.1:27017
Percona Server for MongoDB server version: v3.6.8-2.0
> use $external
switched to db $external
> db.auth({"mechanism": "PLAIN", "user": "aduser", "pwd": "adpassword", "digestPassword": false})
1
> use percona
switched to db percona
> db.test.find()
{ "_id" : ObjectId("5c12a47904a287e45fcb580e"), "message" : "Active Directory user success!" }

As you can see above our Active Directory based user was able to authenticate and then change over to the percona database and see the document we stored earlier.

You will notice that our auth() call above is different than the one we used to log in with MongoDB based users. In this case we need to pass in a document with not only the user and password, but also the mechanism to use. We also want to set digestPassword to false.

You can also log in directly from the command line with the following:

mongo percona --host localhost --port 27017 --authenticationMechanism PLAIN --authenticationDatabase \$external --username aduser --password

There are a few things to note here if you’re not used to using the command line to log in:

  1. We place the --password option at the end of the command line and do not provide a password here. This will cause the application to prompt us for a password.
  2. You will also automatically be placed into the percona database, or whatever database name you provide after mongo.
  3. You need to escape the $external database name with a backslash (\) or the terminal will treat $external as an environment variable and you will most likely get an error.

Conclusion

In conclusion, it is easy to connect your Percona Server for MongoDB instance to your corporate Active Directory server. This allows your MongoDB users to use the same credentials to log into MongoDB as they do for their corporate email and workstations.


Photo by Steve Halama on Unsplash

by Doug Duncan at December 21, 2018 01:09 PM

December 20, 2018

Peter Zaitsev

Benchmark PostgreSQL With Linux HugePages

Benchmarking HugePages and PostgreSQL

The Linux kernel provides a wide range of configuration options that can affect performance. It’s all about getting the right configuration for your application and workload. Just like any other database, PostgreSQL relies on the Linux kernel to be optimally configured. Poorly configured parameters can result in poor performance. Therefore, it is important that you benchmark database performance after each tuning session to avoid performance degradation. In one of my previous posts, Tune Linux Kernel Parameters For PostgreSQL Optimization, I described some of the most useful Linux kernel parameters and how those may help you improve database performance. Now I am going to share my benchmark results with you after configuring Linux HugePages with different PostgreSQL workloads. I have performed a comprehensive set of benchmarks for many different PostgreSQL load sizes and different numbers of concurrent clients.

Benchmark Machine

  • Supermicro server:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic
  • PostgreSQL: version 11

Linux Kernel Settings

I have used default kernel settings without any optimization/tuning except for disabling Transparent HugePages. Transparent HugePages are by default enabled, and allocate a page size that may not be recommended for database usage. For databases generally, fixed sized HugePages are needed, which Transparent HugePages do not provide. Hence, disabling this feature and defaulting to classic HugePages is always recommended.
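
For reference, Transparent HugePages can be checked and disabled at runtime as follows (a sketch; the original notes do not show the exact commands used):

# show the current setting; the active value is displayed in brackets
cat /sys/kernel/mm/transparent_hugepage/enabled
# disable THP until the next reboot; add transparent_hugepage=never
# to the kernel command line to make the change persistent
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled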

PostgreSQL Settings

I have used consistent PostgreSQL settings for all the benchmarks in order to record different PostgreSQL workloads with different settings of Linux HugePages. Here is the PostgreSQL setting used for all benchmarks:

shared_buffers = '64GB'
work_mem = '1GB'
random_page_cost = '1'
maintenance_work_mem = '2GB'
synchronous_commit = 'on'
seq_page_cost = '1'
max_wal_size = '100GB'
checkpoint_timeout = '10min'
checkpoint_completion_target = '0.9'
autovacuum_vacuum_scale_factor = '0.4'
effective_cache_size = '200GB'
min_wal_size = '1GB'
wal_compression = 'on'

Benchmark scheme

The benchmark scheme plays an important role. All the benchmarks were run three times, with a thirty-minute duration for each run, and I took the median value from these three runs. The benchmarks were carried out using the PostgreSQL benchmarking tool pgbench, which works on a scale factor, with one scale factor being approximately 16MB of workload.
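
The exact invocations are not listed in the post, but a typical pgbench session along these lines would look like this (scale factor and client count are illustrative):

# initialize a test database with scale factor 3000 (roughly 48GB)
pgbench -i -s 3000 testdb
# run a thirty-minute benchmark with 32 clients and 32 worker threads
pgbench -c 32 -j 32 -T 1800 testdb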

HugePages

Linux, by default, uses 4K memory pages, with HugePages available in addition. BSD has Super Pages, whereas Windows has Large Pages. PostgreSQL has support for HugePages on Linux only. In cases of high memory usage, smaller page sizes decrease performance. By setting up HugePages, you increase the dedicated memory for the application and therefore reduce the operational overhead incurred during allocation/swapping; i.e. you gain performance by using HugePages.

Here are the HugePages settings when using a HugePages size of 1GB. You can always get this information from /proc/meminfo.

AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     100
HugePages_Free:       97
HugePages_Rsvd:       63
HugePages_Surp:        0
Hugepagesize:    1048576 kB

For more detail about HugePages please read my previous blog post.

https://www.percona.com/blog/2018/08/29/tune-linux-kernel-parameters-for-postgresql-optimization/

Generally, HugePages comes in sizes 2MB and 1GB, so it makes sense to use 1GB size instead of the much smaller 2MB size.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge
https://kerneltalks.com/services/what-is-huge-pages-in-linux/
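
As a rough sketch of the setup (values are illustrative and depend on your shared_buffers setting; 1GB pages usually also have to be enabled at boot time via hugepagesz=1G on the kernel command line):

# reserve 100 HugePages of the default huge page size
sudo sysctl -w vm.nr_hugepages=100

# in postgresql.conf: require HugePages instead of falling back to 4K pages
huge_pages = 'on'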

Benchmark Results

This benchmark shows the overall impact of different sizes of HugePages. The first set of benchmarks was created with the default Linux 4K page size without enabling HugePages. Note that Transparent Hugepages were also disabled, and remained disabled throughout these benchmarks.

Then the second set of benchmarks was performed with 2MB HugePages. Finally, the third set of benchmarks was performed with HugePages set to 1GB in size.

All these benchmarks were executed with PostgreSQL version 11. The sets include a combination of different database sizes and clients. The graph below shows comparative performance results for these benchmarks with TPS (transactions per seconds) on the y-axis, and database size and the number of clients per database size on the x-axis.

 

Clearly, from the graph above, you can see that the performance gain with HugePages increases as the number of clients and the database size increases, as long as the size remains within the pre-allocated shared buffer.

This benchmark shows TPS versus clients. In this case, the database size is set to 48GB. On the y-axis, we have TPS and on the x-axis, we have the number of connected clients. The database size is small enough to fit in the shared buffer, which is set to 64GB.

With HugePages set to 1GB, the higher the number of clients, the higher the comparative performance gain.

The next graph is the same as the one above except for a database size of 96GB. This exceeds the shared buffer size, which is set to 64GB.

 

The key observation here is that the performance with 1GB HugePages improves as the number of clients increases and it eventually gives better performance than 2MB HugePages or the standard 4KB page size.

This benchmark shows the TPS versus database size. In this case, the number of connected clients is set to 32. On the y-axis, we have TPS and on the x-axis, we have database sizes.

As expected, when the database spills over the pre-allocated HugePages, the performance degrades significantly.

Summary

One of my key recommendations is that we must keep Transparent HugePages off. You will see the biggest performance gains when the database fits into the shared buffer with HugePages enabled. Deciding on the size of huge page to use requires a bit of trial and error, but this can potentially lead to a significant TPS gain where the database size is large but remains small enough to fit in the shared buffer.

by Ibrar Ahmed at December 20, 2018 06:13 PM

MariaDB Foundation

MariaDB 10.4.1 and MariaDB Connector/Node.js 2.0.2 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.4.1, the first beta release in the MariaDB 10.4 series, as well as MariaDB Connector/Node.js 2.0.2, the first release candidate of the 100% JavaScript non-blocking MariaDB client for Node.js. See the release notes and changelogs for details. Download MariaDB 10.4.1 Release Notes Changelog What […]

The post MariaDB 10.4.1 and MariaDB Connector/Node.js 2.0.2 now available appeared first on MariaDB.org.

by Ian Gilfillan at December 20, 2018 05:54 PM

Peter Zaitsev

Percona Database Performance Blog 2018 Year in Review: Top Blog Posts

Let’s look at some of the most popular Percona Database Performance Blog posts in 2018.

The closing of a year lends itself to looking back. And making lists. With the Percona Database Performance Blog, Percona staff and leadership work hard to provide the open source community with insights, technical support, predictions and metrics around multiple open source database software technologies. We’ve had nearly 4 million visits to the blog in 2018: thank you! We look forward to providing you with even better articles, news and information in 2019.

As 2018 moves into 2019, let’s take a quick look back at some of the most popular posts on the blog this year.

Top 10 Most Read

These posts had the most views (working down from the highest):

When Should I Use Amazon Aurora and When Should I use RDS MySQL?

Now that Database-as-a-service (DBaaS) is in high demand, there is one question regarding AWS services that cannot always be answered easily: when should I use Aurora and when RDS MySQL?

About ZFS Performance

ZFS has many very interesting features, but I am a bit tired of hearing negative statements on ZFS performance. It feels a bit like people are telling me “Why do you use InnoDB? I have read that MyISAM is faster.” I found the comparison of InnoDB vs. MyISAM quite interesting, and I’ll use it in this post.

Linux OS Tuning for MySQL Database Performance

In this post we will review the most important Linux settings to adjust for performance tuning and optimization of a MySQL database server. We’ll note how some of the Linux parameter settings used in OS tuning may vary according to different system types: physical, virtual or cloud.

A Look at MyRocks Performance

As the MyRocks storage engine (based on the RocksDB key-value store http://rocksdb.org ) is now available as part of Percona Server for MySQL 5.7, I wanted to take a look at how it performs on a relatively high-end server and SSD storage.

How to Restore MySQL Logical Backup at Maximum Speed

The ability to restore MySQL logical backups is a significant part of disaster recovery procedures. It’s a last line of defense.

Why MySQL Stored Procedures, Functions and Triggers Are Bad For Performance

MySQL stored procedures, functions and triggers are tempting constructs for application developers. However, as I discovered, there can be an impact on database performance when using MySQL stored routines. Not being entirely sure of what I was seeing during a customer visit, I set out to create some simple tests to measure the impact of triggers on database performance. The outcome might surprise you.

AMD EPYC Performance Testing… or Don’t get on the wrong side of SystemD

Ever since AMD released their EPYC CPU for servers I wanted to test it, but I did not have the opportunity until recently, when Packet.net started offering bare metal servers for a reasonable price. So I started a couple of instances to test Percona Server for MySQL under this CPU. In this benchmark, I discovered some interesting discrepancies in performance between  AMD and Intel CPUs when running under systemd.

Tuning PostgreSQL Database Parameters to Optimize Performance

Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. Default values are set to ensure that PostgreSQL runs everywhere, with the least resources it can consume and so that it doesn’t cause any vulnerabilities. It is primarily the responsibility of the database administrator or developer to tune PostgreSQL according to their system’s workload. In this blog, we will establish basic guidelines for setting PostgreSQL database parameters to improve database performance according to workload.

Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost

If you are using large EBS GP2 volumes for MySQL (i.e. 10TB+) on AWS EC2, you can increase performance and save a significant amount of money by moving to local SSD (NVMe) instance storage. Interested? Then read on for a more detailed examination of how to achieve cost-benefits and increase performance from this implementation.

Why You Should Avoid Using “CREATE TABLE AS SELECT” Statement

In this blog post, I’ll provide an explanation why you should avoid using the CREATE TABLE AS SELECT statement. The SQL statement “create table <table_name> as select …” is used to create a normal or temporary table and materialize the result of the select. Some applications use this construct to create a copy of the table. This is one statement that will do all the work, so you do not need to create a table structure or use another statement to copy the structure.

Honorable Mention:

Is Serverless Just a New Word for Cloud-Based?

Top 10 Most Commented

These posts generated some healthy discussions (not surprisingly, this list overlaps with the first):

Posts Worth Revisiting

Don’t miss these great posts that have excellent information on important topics:

Have a great end of the year celebration, and we look forward to providing more great blog posts in 2019.

by Dave Avery at December 20, 2018 12:27 PM

December 19, 2018

Peter Zaitsev

Percona Server for MySQL 5.7.24-27 Is Now Available

Percona announces the release of Percona Server for MySQL 5.7.24-27 on December 19, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.7.24, including all the bug fixes in it. Percona Server for MySQL 5.7.24-27 is now the current GA release in the 5.7 series. All of Percona’s software is open-source and free.

If you’re currently using Percona Server for MySQL 5.7, Percona recommends upgrading to this version of 5.7 prior to upgrading to Percona Server for MySQL 8.0.

Bugs Fixed:

  • When uninstalling Percona Server for MySQL packages on CentOS 7, the default configuration file my.cnf was removed as well. The fix makes a backup of the configuration file instead of removing it. Bug fixed #5092.

Find the release notes for Percona Server for MySQL 5.7.24-27 in our online documentation. Report bugs in the Jira bug tracker.

by Hrvoje Matijakovic at December 19, 2018 04:00 PM

Using Partial and Sparse Indexes in MongoDB

In this article I’m going to talk about partial and sparse indexes in MongoDB® and Percona Server for MongoDB®. I’ll show you how to use them, and look at cases where they can be helpful. Prior to discussing these indexes in MongoDB in detail, though, let’s talk about an issue on a relational database like MySQL®.

The boolean issue in MySQL

Consider a very large table in MySQL with a boolean column. Typically you created an ENUM(‘T’,’F’) field to store the boolean information, or a TINYINT column to store only 1s and 0s. This is good so far. But think about what happens if you need to run a lot of queries on the table with a condition on the boolean field, and no other relevant conditions on other indexed columns are available to filter the examined rows.

Why not create an index on the boolean field? Well, yes, you can, but in some cases this solution will be completely useless, and will introduce overhead for the index maintenance.

Think about if you have an even distribution of true and false values in the table, in more or less a 50:50 split. In this situation, the index on the boolean column cannot be used because MySQL will prefer to do a full scan of the large table instead of selecting half of rows using the BTREE entries. We can say that a boolean field like this one has a low cardinality, and it’s not highly selective.

Consider now the case in which you don’t have an even distribution of the values, let’s say 2% of the rows contain false and the remaining 98% contain true. In such a situation, a query to select the false values will most probably use the index. The queries to select the true values won’t use the index, for the same reason we have discussed previously. In this second case the index is very useful, but only for selecting the great minority of rows. The remaining 98% of the entries in the index are completely useless. This represents a great waste of disk space and resources, because the index must be maintained for each write.

It’s not just booleans that can have this problem in relation to index usage, but any field with a low cardinality.

Note: there are several workarounds to deal with this problem, I know. For example, you can create a multi-column index using a more selective field and the boolean. Or you could design your database differently. Here, I’m illustrating the nature of the problem in order to explain a MongoDB feature in a context. 

The boolean issue in MongoDB

How about MongoDB? Does MongoDB have the same problem? The answer is: yes, MongoDB has the same problem. If you have a lot of documents in a collection with a boolean field or a low cardinality field, and you create an index on it, then you will have a very large index that’s not really useful. But more importantly, you will suffer write degradation due to the index maintenance.

The only difference is that MongoDB will tend to use the index anyway, instead of doing the entire collection scan, but the execution time will be of the same magnitude as doing the COLLSCAN. In the case of very large indexes, a COLLSCAN should be preferable.
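
As an aside (not in the original post), when the index loses to a collection scan you can force the COLLSCAN for a specific query with a $natural hint:

// force a full collection scan instead of the index scan
> db.test.find( { flag: true } ).hint( { $natural: 1 } )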

Fortunately MongoDB has an option that you can specify during index creation to define a Partial Index. Let’s see.

Partial Index

A partial index is an index that contains only a subset of values based on a filter rule. So, in the case of the unevenly distributed boolean field, we can create an index on it specifying that we want to consider only the false values. This way we avoid recording the remaining 98% of useless true entries. The index will be smaller, we’ll save disk and memory space, and the most frequent writes – when entering the true values – won’t initiate the index management activity. As a result, we won’t have lots of penalties during writes but we’ll have a useful index when searching the false values.

Let’s say that, when you have an uneven distribution, the most relevant searches are the ones for the minority of the values. This is in general the scenario for real applications.

Let’s see now how to create a Partial Index.

First, let’s create a collection with one million random documents. Each document contains a boolean field generated by the javascript function randomBool(). The function generates a false value in 5% of the documents, in order to have an uneven distribution. Then, test the number of false values in the collection.

> function randomBool() { var bool = true; var random_boolean = Math.random() >= 0.95; if(random_boolean) { bool = false }; return bool; }
> for (var i = 1; i <= 1000000; i++) { db.test.insert( { _id: i, name: "name"+i, flag: randomBool() } ) }
WriteResult({ "nInserted" : 1 })
> db.test.find().count()
1000000
> db.test.find( { flag: false } ).count()
49949

Create the index on the flag field and look at the index size using db.test.stats().

> db.test.createIndex( { flag: 1 } )
{ "createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1 }
> db.test.stats().indexSizes
{ "_id_" : 13103104, "flag_1" : 4575232 }

The index we created is 4575232 bytes.

Test some simple queries to extract the documents based on the flag value and take a look at the index usage and the execution times. (For this purpose, we use an explainable object)

// create the explainable object
> var exp = db.test.explain( "executionStats" )
// explain the complete collection scan
> exp.find( {  } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1000000,
		"executionTimeMillis" : 250,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 1000000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"nReturned" : 1000000,
			"executionTimeMillisEstimate" : 200,
			"works" : 1000002,
			"advanced" : 1000000,
			"needTime" : 1,
			"needYield" : 0,
			"saveState" : 7812,
			"restoreState" : 7812,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 1000000
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// find the documents flag=true
> exp.find( { flag: true } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : true
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[true, true]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 950051,
		"executionTimeMillis" : 1028,
		"totalKeysExamined" : 950051,
		"totalDocsExamined" : 950051,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 950051,
			"executionTimeMillisEstimate" : 990,
			"works" : 950052,
			"advanced" : 950051,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 7422,
			"restoreState" : 7422,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 950051,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 950051,
				"executionTimeMillisEstimate" : 350,
				"works" : 950052,
				"advanced" : 950051,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 7422,
				"restoreState" : 7422,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[true, true]"
					]
				},
				"keysExamined" : 950051,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// find the documents with flag=false
> exp.find( { flag: false } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : false
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 49949,
		"executionTimeMillis" : 83,
		"totalKeysExamined" : 49949,
		"totalDocsExamined" : 49949,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 49949,
			"executionTimeMillisEstimate" : 70,
			"works" : 49950,
			"advanced" : 49949,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 390,
			"restoreState" : 390,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 49949,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 49949,
				"executionTimeMillisEstimate" : 10,
				"works" : 49950,
				"advanced" : 49949,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 390,
				"restoreState" : 390,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				},
				"keysExamined" : 49949,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

As expected, MongoDB does a COLLSCAN when looking for db.test.find( {} ). The important thing here is that it takes 250 milliseconds for the entire collection scan.

In both the other cases – find({flag:true}) and find({flag:false}) – MongoDB uses the index. But let’s have a look at the execution times:

  • for db.test.find({flag:true}) it is 1028 milliseconds. The execution time is longer than the COLLSCAN, so the index in this case is not useful and the COLLSCAN would be preferable.
  • for db.test.find({flag:false}) it is 83 milliseconds. This is good: the index in this case is very useful.

Now, create the partial index on the flag field. To do it we must use the partialFilterExpression option on the createIndex command.

// drop the existing index
> db.test.dropIndex( { flag: 1} )
{ "nIndexesWas" : 2, "ok" : 1 }
// create the partial index only on the false values
> db.test.createIndex( { flag : 1 }, { partialFilterExpression :  { flag: false }  } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
// get the index size
> db.test.stats().indexSizes
{ "_id_" : 13103104, "flag_1" : 278528 }
// create the explainable object
> var exp = db.test.explain( "executionStats" )
// test the query for flag=false
> exp.find({ flag: false  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : false
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 49949,
		"executionTimeMillis" : 80,
		"totalKeysExamined" : 49949,
		"totalDocsExamined" : 49949,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 49949,
			"executionTimeMillisEstimate" : 80,
			"works" : 49950,
			"advanced" : 49949,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 390,
			"restoreState" : 390,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 49949,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 49949,
				"executionTimeMillisEstimate" : 40,
				"works" : 49950,
				"advanced" : 49949,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 390,
				"restoreState" : 390,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				},
				"keysExamined" : 49949,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// test the query for flag=true
> exp.find({ flag: true  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : true
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"flag" : {
					"$eq" : true
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 950051,
		"executionTimeMillis" : 377,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 1000000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"flag" : {
					"$eq" : true
				}
			},
			"nReturned" : 950051,
			"executionTimeMillisEstimate" : 210,
			"works" : 1000002,
			"advanced" : 950051,
			"needTime" : 49950,
			"needYield" : 0,
			"saveState" : 7812,
			"restoreState" : 7812,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 1000000
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

We can notice the following:

  • db.test.find({flag:false}) uses the index and the execution time is more or less the same as before
  • db.test.find({flag:true}) doesn’t use the index. MongoDB does the COLLSCAN and the execution is better than before
  • note that the index size is now only 278528 bytes: a great saving in comparison to the complete index on flag. There won’t be overhead during the writes for the great majority of the documents.

Partial option on other index types

You can use the partialFilterExpression option even with compound indexes or other index types. Let’s see an example of a compound index.

Insert some documents in the students collection

db.students.insert( [
{ _id:1, name: "John", class: "Math", grade: 10 },
{ _id: 2, name: "Peter", class: "English", grade: 6 },
{ _id: 3, name: "Maria" , class: "Geography", grade: 8 },
{ _id: 4, name: "Alex" , class: "Geography", grade: 5},
{ _id: 5, name: "George" , class: "Math", grade: 7 },
{ _id: 6, name: "Tony" , class: "English", grade: 9 },
{ _id: 7, name: "Sam" , class: "Math", grade: 6 },
{ _id: 8, name: "Tom" , class: "English", grade: 5 }
])

Create a partial compound index on the name and class fields for grades greater than or equal to 8.

> db.students.createIndex( { name: 1, class: 1  }, { partialFilterExpression: { grade: { $gte: 8} } } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}

Notice that the grade field doesn’t necessarily need to be part of the index.

Query coverage

Using the students collection, we now want to show when a partial index can be used.

The important thing to remember is that a partial index is “partial”: it doesn’t contain all the entries.

In order for MongoDB to use it, the conditions in the query must include an expression on the filter field, and the selected documents must be a subset of the index.

Let’s see some examples.

The following query can use the index because we are selecting a subset of the partial index.

> db.students.find({name:"Tony", grade:{$gt:8}})
{ "_id" : 6, "name" : "Tony", "class" : "English", "grade" : 9 }
// let's look at the explain
> db.students.find({name:"Tony", grade:{$gt:8}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"name" : {
						"$eq" : "Tony"
					}
				},
				{
					"grade" : {
						"$gt" : 8
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"grade" : {
					"$gt" : 8
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"name" : 1,
					"class" : 1
				},
				"indexName" : "name_1_class_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"name" : [ ],
					"class" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"name" : [
						"[\"Tony\", \"Tony\"]"
					],
					"class" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

The following query cannot use the index because the condition on grade > 5 is not selecting a subset of the partial index. So the COLLSCAN is needed.

> db.students.find({name:"Tony", grade:{$gt:5}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"name" : {
						"$eq" : "Tony"
					}
				},
				{
					"grade" : {
						"$gt" : 5
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"name" : {
							"$eq" : "Tony"
						}
					},
					{
						"grade" : {
							"$gt" : 5
						}
					}
				]
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

Even the following query cannot use the index. As we said, the grade field is not part of the index, so the simple condition on grade is not sufficient.

> db.students.find({grade:{$gt:8}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"grade" : {
				"$gt" : 8
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"grade" : {
					"$gt" : 8
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

Sparse Index

A sparse index is an index that contains entries only for the documents that have the indexed field.

Since MongoDB is a schemaless database, not all the documents in a collection will necessarily contain the same fields. So we have two options when creating an index:

  • create a regular “non-sparse” index
    • the index contains as many entries as there are documents
    • the index contains a null entry for each document without the indexed field
  • create a sparse index
    • the index contains as many entries as there are documents with the indexed field

We call it “sparse” because it doesn’t contain entries for all the documents of the collection.

The main advantage of the sparse option is to reduce the index size.

Here’s how to create a sparse index:

db.people.createIndex( { city: 1 }, { sparse: true } )

Sparse indexes are a subset of partial indexes. In fact, you can emulate a sparse index using the following partial index definition.

db.people.createIndex(
  { city: 1 },
  { partialFilterExpression: { city: { $exists: true } } }
)

For this reason partial indexes are preferred over sparse indexes.
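
Partial indexes are also more expressive: the filter expression can reference a field other than the indexed one, which a sparse index cannot do. A hypothetical sketch (the active field is made up for illustration):

// index city only for documents where active is true;
// a sparse index could never express this condition
db.people.createIndex(
  { city: 1 },
  { partialFilterExpression: { active: true } }
)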

Conclusions

Partial indexing is a great feature in MongoDB. You should consider using it to achieve the following advantages:

  • have smaller indexes
  • save disk and memory space
  • improve write performance

You are strongly encouraged to consider partial indexes if you have one or more of these use cases:

  • you run queries on a boolean field with an uneven distribution, and you look mostly for the less frequent value
  • you have a low cardinality field and the majority of the queries look for a subset of the values
  • the majority of the queries look for a limited subset of the values in a field
  • you don’t have enough memory to store very large indexes – for example, you have a lot of page evictions from the WiredTiger cache

Further readings

Partial indexes: https://docs.mongodb.com/manual/core/index-partial/

Sparse indexes: https://docs.mongodb.com/manual/core/index-sparse/

Articles on query optimization and investigation:


Photo by Mike Greer from Pexels

by Corrado Pandiani at December 19, 2018 01:29 PM

Jean-Jerome Schmidt

How to Improve Replication Performance in a MySQL or MariaDB Galera Cluster

In the comments section of one of our blogs a reader asked about the impact of wsrep_slave_threads on Galera Cluster’s I/O performance and scalability. At that time, we couldn’t easily answer that question and back it up with more data, but finally we managed to set up the environment and run some tests.

Our reader pointed towards benchmarks that showed that increasing wsrep_slave_threads did not have any impact on the performance of the Galera cluster.

To explain what the impact of that setting is, we set up a small cluster of three nodes (m5d.xlarge). This allowed us to utilize directly attached NVMe SSD for the MySQL data directory. By doing this, we minimized the chance of storage becoming the bottleneck in our setup.

We set the InnoDB buffer pool to 8GB and redo logs to two files of 1GB each. We also increased innodb_io_capacity to 2000 and innodb_io_capacity_max to 10000. This was also intended to ensure that neither of those settings would impact our performance.
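
For reference, a minimal sketch of the my.cnf fragment matching the settings described above (everything else was left at its defaults):

[mysqld]
innodb_buffer_pool_size   = 8G
innodb_log_file_size      = 1G
innodb_log_files_in_group = 2
innodb_io_capacity        = 2000
innodb_io_capacity_max    = 10000
# wsrep_slave_threads was varied between 1 and 16 during the tests
wsrep_slave_threads       = 16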

The whole problem with such benchmarks is that there are so many bottlenecks that you have to eliminate them one by one. Only after doing some configuration tuning, and after making sure that the hardware will not be a problem, can one hope that the more subtle limits will show up.

We generated ~90GB of data using sysbench:

sysbench /usr/share/sysbench/oltp_write_only.lua --threads=16 --events=0 --time=600 --mysql-host=172.30.4.245 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=3306 --tables=28 --report-interval=1 --skip-trx=off --table-size=10000000 --db-ps-mode=disable --mysql-db=sbtest_large prepare
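
Then the benchmark was executed. The run command is not shown in the original post, but it would mirror the prepare step, ending in run instead of prepare:

sysbench /usr/share/sysbench/oltp_write_only.lua --threads=16 --events=0 --time=600 --mysql-host=172.30.4.245 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=3306 --tables=28 --report-interval=1 --skip-trx=off --table-size=10000000 --db-ps-mode=disable --mysql-db=sbtest_large run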

We tested two settings: wsrep_slave_threads=1 and wsrep_slave_threads=16. The hardware was not powerful enough to benefit from increasing this variable even further. Please also keep in mind that we did not do detailed benchmarking in order to determine whether wsrep_slave_threads should be set to 16, 8 or maybe 4 for the best performance. We were interested to see whether we could show an impact on the cluster. And yes, the impact was clearly visible. For starters, some flow control graphs.

While running with wsrep_slave_threads=1, on average, nodes were paused due to flow control ~64% of the time.

While running with wsrep_slave_threads=16, on average, nodes were paused due to flow control ~20% of the time.

You can also compare the difference on a single graph. The drop at the end of the first part is the first attempt to run with wsrep_slave_threads=16. Servers ran out of disk space for binary logs and we had to re-run that benchmark once more at a later time.

How did this translate in performance terms? The difference is visible although definitely not that spectacular.

First, the queries per second graph. You can notice that in both cases the results are all over the place. This is mostly related to the unstable performance of the I/O storage and to flow control randomly kicking in. You can still see that the performance of the “red” result (wsrep_slave_threads=1) is quite a bit lower than the “green” one (wsrep_slave_threads=16).

The picture is quite similar when we look at the latency. You can see more (and typically deeper) stalls for the run with wsrep_slave_threads=1.


The difference is even more visible when we calculate the average latency across all the runs: the latency with wsrep_slave_threads=1 is 27% higher than the latency with 16 slave threads, which obviously is not good, as we want latency to be lower, not higher.

The difference in throughput is also visible: an improvement of around 11% when we added more wsrep_slave_threads.

As you can see, the impact is there. It is by no means 16x (even if that’s how we increased the number of slave threads in Galera) but it is definitely prominent enough so that we cannot classify it as just a statistical anomaly.

Please keep in mind that in our case we used quite small nodes. The difference should be even more significant if we are talking about large instances running on EBS volumes with thousands of provisioned IOPS.

Then we would be able to run sysbench even more aggressively, with a higher number of concurrent operations. This should improve parallelization of the writesets, improving the gain from the multithreading even further. Also, faster hardware means that Galera will be able to utilize those 16 threads in a more efficient way.

When running tests like this, you have to keep in mind that you need to push your setup almost to its limits. Single-threaded replication can handle quite a lot of load, and you need to run heavy traffic to actually make it unable to keep up with the task.

We hope this blog post gives you more insight into Galera Cluster’s abilities to apply writesets in parallel and the limiting factors around it.

by krzysztof at December 19, 2018 01:16 PM

December 18, 2018

Peter Zaitsev

Percona Server for MongoDB 4.0.4-1 GA Is Now Available


Percona announces the GA release of Percona Server for MongoDB 4.0.4-1 on December 18, 2018. Download the latest version from the Percona website or the Percona software repositories.

Date: December 18, 2018
Download: Percona website
Installation: Installing Percona Server for MongoDB

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 4.0 Community Edition. It supports MongoDB 4.0 protocols and drivers.

Percona Server for MongoDB extends the functionality of the MongoDB 4.0 Community Edition by including the Percona Memory Engine storage engine, encrypted WiredTiger storage engine, audit logging, SASL authentication, hot backups, and enhanced query profiling. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release includes all features of MongoDB 4.0 Community Edition.

Note that the MMAPv1 storage engine is deprecated in MongoDB 4.0 Community Edition.

In Percona Server for MongoDB 4.0.4-1, data at rest encryption is considered BETA quality. Do not use this feature in a production environment.

Bugs Fixed

  • PSMDB-235: In some cases, hot backup did not back up the keydb directory; mongod could crash after restore.
  • PSMDB-233: When starting Percona Server for MongoDB with WiredTiger encryption options but using a different storage engine, the server started normally and produced no warnings that these options had been ignored.
  • PSMDB-239: The WiredTiger encryption was not disabled when using the Percona Memory Engine storage engine.
  • PSMDB-241: WiredTiger per-database encryption keys were not purged when the database was deleted.
  • PSMDB-243: A log message was added to indicate that the server is running with encryption.
  • PSMDB-245: KeyDB’s WiredTiger logs were not properly rotated without restarting the server.
  • PSMDB-266: When running the server with the --directoryperdb option, the user could add arbitrary collections to the keydb directory which is designated for data encryption.

Due to the fix of bug PSMDB-266, it is not possible to downgrade from version 4.0.4-1 to version 3.6.8-2.0 of  Percona Server for MongoDB if using data at rest encryption (it will be possible to downgrade to PSMDB 3.6 as soon as PSMDB-266 is ported to that version).

by Borys Belinsky at December 18, 2018 06:42 PM

December 17, 2018

Federico Razzoli

On Percona Community Blog

I liked Percona Community Blog from the beginning. First of all, the idea is great. There is no other community blog for the MySQL ecosystem.

Well, Oracle has its own planet.mysql.com – and I have to say, they are correct: as far as I know, they never censored posts about MariaDB and Percona Server, nor opinions that they don’t like. I wrote heavy criticism about Oracle, sometimes using strong terms (“another dirty trick”), but they never censored me. I like to be fair regardless who/what I’m talking about, so this is a good time to spend some good words about them. Not the first time, anyway. That said, their blogroll only advertises a very small number of blogs. Very good ones of course (except for mine?), but this has the inevitable side effect of obfuscating the rest of the world. If John Doe writes an enlightening post about MySQL, I’ll never read it, because everything I need to know appears on Planet MySQL.

Percona Community Blog may have the same side effect… or maybe not, or the effect could be weaker at least. I saw outstanding content by my well-known friend JF, yes. But I also saw articles by people that I don’t know, and never saw on Planet MySQL. So I believe PCB is proving itself quite inclusive.

I started to publish some contents there. First, I used it to promote my talk at Percona Live Europe in Frankfurt, MariaDB System-Versioned Tables. Then I published an article on the same topic, Some notes on MariaDB system-versioned tables. Even if recently I’m not writing as much as I used to do some years ago, I believe that you will see more posts from me in the near future. PCB is a great place to publish stuff.

One could object that PCB contains the name of a private company and is hosted on its own website, so it is not genuinely a community project. Which is absolutely true. But if you want to see something better in the MySQL ecosystem, you will have to create it, because currently it doesn’t exist.

So, is this blog going to die? Absolutely not. This is my personal space. Any third-party website, no matter how good, can disappear or delete our contents, and there is nothing you can do about it. A personal space is there till you want it to be there. I don’t know how I will decide what will go here and what will go on PCB, I’ll have to think more about it.

Furthermore, being in several places is a form of redundancy, if we decide that our presence on the web is important for us. That is why I always keep my profiles on LinkedIn and Facebook a bit active, and some days ago I even created a YouTube playlist with my webinar recordings – only three, two of which in Italian, but still.

Well, enough babbling. Just a final word: if you have something interesting to say about open source databases, you should definitely propose it to PCB. Making it even more interesting is up to us!

Federico

by Federico at December 17, 2018 07:38 PM

Peter Zaitsev

Amazon RDS Aurora Serverless – The Basics

When I attended AWS Re:Invent 2018, I saw there was a lot of attention from both customers and the AWS team on Amazon RDS Aurora Serverless. So I decided to take a deeper look at this technology, and write a series of blog posts on this topic.

In this first post of the series, you will learn about Amazon Aurora Serverless basics and use cases. In later posts, I will share benchmark results and more in-depth findings.

What Amazon Aurora Serverless Is

A great source of information on this topic is How Amazon Aurora Serverless Works from the official AWS documentation. In this article, you learn what Serverless deployment, rather than provisioned deployment, means. Instead of specifying an instance size, you specify the minimum and maximum number of “Aurora Capacity Units” you would like to have:

choose MySQL version on Aurora

Amazon Aurora setup

capacity settings on Amazon Aurora

Once you set up such an instance it will automatically scale between its minimum and maximum capacity points. You also will be able to scale it manually if you like.

One of the most interesting Aurora Serverless properties, in my opinion, is its ability to pause itself if it stays idle for a specified period of time.

pause capacity on Amazon Aurora

This feature can save a lot of money for test/dev environments where load can be intermittent. Be careful, though, about using this for production-size databases, as waking up is far from instant. I’ve seen cases of it taking over 30 seconds in my experiments.
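
The scaling range and the auto-pause behaviour can also be configured from the AWS CLI when creating the cluster; a sketch (identifier and values are illustrative):

aws rds create-db-cluster \
    --db-cluster-identifier test-aurora-serverless \
    --engine aurora --engine-mode serverless \
    --master-username admin --master-user-password S3cretPassw0rd \
    --scaling-configuration MinCapacity=2,MaxCapacity=16,AutoPause=true,SecondsUntilAutoPause=300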

Another thing which may surprise you about Amazon Aurora Serverless, at the time of this writing, is that it is not very well coordinated with other Amazon RDS Aurora products: it is only available as a MySQL 5.6 based edition, it is not compatible with the recent parallel query innovations, and it comes with a list of other significant limitations. I’m sure Amazon will resolve these in due course, but for now you need to be aware of them.

A simple way to think about it is as follows: Amazon Aurora Serverless is a way to deploy Amazon Aurora so it scales automatically with load; can automatically pause when there is no load; and resume automatically when requests come in.

What Amazon Aurora Serverless is not

When I think about Serverless Computing, I think about elastic scalability across multiple servers and resource-usage-based pricing. DynamoDB, another database which is advertised as Serverless by Amazon, fits those criteria, while Amazon Aurora Serverless does not.

With Amazon Aurora Serverless, for better or for worse, you’re still living in the “classical” instance world. Aurora Capacity Units (ACUs) are pretty much CPU and memory capacity. You still need to understand how many database connections you are allowed to have. You still need to monitor your CPU usage on the instance to understand when auto scaling will happen.

Amazon Aurora Serverless also does not have any magic to scale you beyond single-instance performance, which you can get with provisioned Amazon Aurora.

Summary

I’m excited about the new possibilities Amazon Aurora Serverless offers. As long as you do not expect magic, and understand this is one of the newest products in the Amazon Aurora family, you surely should give it a try for applications where it fits.

If you’re hungry for more information about Amazon Aurora Serverless and can’t wait for the next articles in this series, this article by Jeremy Daly contains a lot of great information.


Photo by Emily Hon on Unsplash

by Peter Zaitsev at December 17, 2018 12:59 PM

December 15, 2018

Valeriy Kravchuk

Fun with Bugs #75 - On MySQL Bug Reports I am Subscribed to, Part XII

From the lack of comments on my previous post, it seems everything is clear with ERROR 1213 in different kinds and forks of MySQL. I may still write a post or two about MyRocks or TokuDB deadlocks one day, but let's get back to my main topic of MySQL bugs. Today I continue my series of posts about community bug reports I am subscribed to, with a review of bugs reported in November 2018, starting from the oldest and skipping those MySQL 8 regression ones I've already commented on. I also skip documentation bugs, which should be a topic for a separate post one day (to give more illustration to statements like these).

These are the most interesting bug reports from Community members in November 2018:
  • Bug #93139 - "mysqldump temporary views missing definer". This bug, reported by Nikolai Ikhalainen from Percona, looks like a regression (that can appear in the somewhat unusual case of a missing root user) in all versions starting from 5.6. There is no regression tag, surely. Also, for some reason I do not see 8.0.x listed as an affected version, while from the text it seems MySQL 8 is also affected.
  • Bug #93165 - "Memory leak in sync_latch_meta_init() after mysqld shutdown detected by ASan". This bug was reported by Yura Sorokin from Percona, who also made important statement in his last comment (that I totally agree with):
    "In commit https://github.com/mysql/mysql-server/commit/e93e8db42d89154b37f63772ce68c1efda637609 you literally made 14 MTR test cases ignore ALL memory problems detected by ASan, not only those which you consider 'OK' when you terminate the process with the call to 'exit()'. In other words, new memory leaks introduced in FUTURE commits may not be detected because of those changes. Address Sanitizer is a very powerful tool and its coverage should be constantly extending rather than shrinking."
  • Bug #93196 - "DD crashes on assert if ha_commit_trans() returns error". It seems Vlad Lesin from Percona spent notable time testing everything related to the new MySQL 8 data dictionary (maybe while Percona worked on their Percona Server for MySQL 8.0, which should also support MyRocks, provide native partitioning, and integrate properly with the data dictionary). See also his Bug #93250 - "the result of tc_log->commit() is ignored in trans_commit_stmt()".
  • Bug #93241 - "Query against full text index with ORDER BY silently fails". Nice finding by Jonathan Balinski, with detailed test cases and comments added by Shane Bester. One more confirmation that FULLTEXT indexes in InnoDB are still problematic.
  • Bug #93276 - "Crash when calling mysql_real_connect() in loop". Nice regression in C API (since 8.0.4!) noted by Reggie Burnett and still not fixed.
  • Bug #93321 - "Assertion `rc == TYPE_OK' failed". Last but not least, yet another debug assertion (and error in non-debug builds) found in MySQL 8.0.13 by Roel Van de Paar from Percona. You already know where QA for MySQL happens to a large extent, don't you?
  • Bug #93361 - "memory/performance_schema/table_handles have memory leak!". It's currently in "Need Feedback" status and may end up as not a bug, but I've never seen 9G of memory used for just one Performance Schema table so far. It's impressive.
  • Bug #93365 - "Query on performance_schema.data_locks causes replication issues". Probably the first case where it was proven that a query against some Performance Schema table may block important server activity. Nice finding by Daniël van Eeden.
  • Bug #93395 - "ALTER USER succeeds on master but fails on slave." Yet another way to break replication was found by Jean-François Gagné. See also his Bug #93397 - "Replication does not start if restart MySQL after init without start slave."
  • Bug #93423 - "binlog_row_image=full not always honored for binlog_format=MIXED". For some reason this bug (with a clear test case) reported by James Lawrie is still "Open".
  • Bug #93430 - "Inconsistent output of SHOW STATUS LIKE 'Handler_read_key';". This weird inconsistency was found by Przemysław Skibiński from Percona.
Thinking about the future of MySQL 8 somewhere in Greenwich...
To summarize this review:
  1. I obviously pay a lot of attention to bug reports from Percona engineers.
  2. It seems memory problems detected by ASan in some MTR test cases are deliberately ignored instead of being properly fixed.
  3. There are still many surprises waiting for early adopters of MySQL 8.0 GA :) 
That's all I have to say about specific MySQL bugs in 2018. Next "Fun with Bugs" post, if any, will appear only next year. I am already subscribed to 11 bugs reported in December 2018. Stay tuned!

by Valeriy Kravchuk (noreply@blogger.com) at December 15, 2018 04:15 PM

December 14, 2018

Oli Sennhauser

To NULL, or not to NULL, that is the question!

As we already stated in earlier articles in this blog [1 and 2] it is a good idea to use NULL values properly in MariaDB and MySQL.

One of my mantras in MariaDB performance tuning is: smaller tables lead to faster queries! One consequence of this is to store NULL values instead of dummy values in columns whose value is not known (NULL: undefined/unknown).

To show how this affects the space used by a table, we created a little example:

CREATE TABLE big_null1 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NOT NULL
, c02 VARCHAR(32) NOT NULL
, c03 VARCHAR(32) NOT NULL
, c04 VARCHAR(32) NOT NULL
, c05 VARCHAR(32) NOT NULL
, c06 VARCHAR(32) NOT NULL
, c07 VARCHAR(32) NOT NULL
, c08 VARCHAR(32) NOT NULL
, c09 VARCHAR(32) NOT NULL
, c10 VARCHAR(32) NOT NULL
, c11 VARCHAR(32) NOT NULL
, c12 VARCHAR(32) NOT NULL
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

CREATE TABLE big_null2 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NOT NULL
, c02 VARCHAR(32) NOT NULL
, c03 VARCHAR(32) NOT NULL
, c04 VARCHAR(32) NOT NULL
, c05 VARCHAR(32) NOT NULL
, c06 VARCHAR(32) NOT NULL
, c07 VARCHAR(32) NOT NULL
, c08 VARCHAR(32) NOT NULL
, c09 VARCHAR(32) NOT NULL
, c10 VARCHAR(32) NOT NULL
, c11 VARCHAR(32) NOT NULL
, c12 VARCHAR(32) NOT NULL
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

Now we fill the tables with default values (an empty string or dummy values) because we do not yet know the contents:

INSERT INTO big_null1 VALUES (NULL, '', '', '', '', '', '', '', '', '', '', '', '');
INSERT INTO big_null1 SELECT NULL, '', '', '', '', '', '', '', '', '', '', '', '' FROM big_null1;
... up to 1 Mio rows

INSERT INTO big_null2
VALUES (NULL, 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.'
  , 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.');
INSERT INTO big_null2
SELECT NULL, 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.'
  , 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.'
  FROM big_null2;
... up to 1 Mio rows

ANALYZE TABLE big_null1;
ANALYZE TABLE big_null2;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2')
 ORDER BY table_name;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
+------------+------------+----------------+-------------+--------------+-----------+

The opposite example is a table which allows NULL values for unknown fields:

CREATE TABLE big_null3 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NULL
, c02 VARCHAR(32) NULL
, c03 VARCHAR(32) NULL
, c04 VARCHAR(32) NULL
, c05 VARCHAR(32) NULL
, c06 VARCHAR(32) NULL
, c07 VARCHAR(32) NULL
, c08 VARCHAR(32) NULL
, c09 VARCHAR(32) NULL
, c10 VARCHAR(32) NULL
, c11 VARCHAR(32) NULL
, c12 VARCHAR(32) NULL
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

This table is also filled with unknown values, but this time with NULL instead of an empty string:

INSERT INTO big_null3 (id) VALUES (NULL);
INSERT INTO big_null3 (id) SELECT NULL FROM big_null3;
... up to 1 Mio rows

ANALYZE TABLE big_null3;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3')
 ORDER BY table_name
;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
+------------+------------+----------------+-------------+--------------+-----------+

We see that this table already uses much less space when we make correct use of NULL values...

So let us do some simple query run time tests:

+------------------------------------------+-----------+-----------+-----------+
| Query                                    | big_null1 | big_null2 | big_null3 |
+------------------------------------------+-----------+-----------+-----------+
| SELECT * FROM big_nullx                  |     1.1 s |     1.3 s |     0.9 s |
| SELECT * FROM big_nullx AS t1            |     5.0 s |     5.7 s |     4.2 s |
|   JOIN big_nullx AS t2 ON t2.id = t1.id  |           |           |           |
|   JOIN big_nullx AS t3 ON t1.id = t3.id  |           |           |           |
+------------------------------------------+-----------+-----------+-----------+

Another piece of my advice is to set columns to NULL where possible. So let us try this advice as well, on a table that is already filled with dummy values:

CREATE TABLE big_null4 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c02 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c03 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c04 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c05 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c06 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c07 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c08 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c09 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c10 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c11 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c12 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

INSERT INTO big_null4 (id) VALUES (NULL);
INSERT INTO big_null4 (id) SELECT NULL FROM big_null4;
... up to 1 Mio rows

ANALYZE TABLE big_null4;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3', 'big_null4')
 ORDER BY table_name
;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
| big_null4  |     998533 |            383 |   382599168 |    118358016 |   6291456 |
+------------+------------+----------------+-------------+--------------+-----------+

So, following my advice, we replace the dummy values with NULL:

UPDATE big_null4
   SET c01 = NULL, c02 = NULL, c03 = NULL, c04 = NULL, c05 = NULL, c06 = NULL
     , c07 = NULL, c08 = NULL, c09 = NULL, c10 = NULL, c11 = NULL, c12 = NULL;

ANALYZE TABLE big_null4;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3', 'big_null4')
 ORDER BY table_name;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
| big_null4  |    1047285 |            364 |   381779968 |    126222336 |  33554432 |
+------------+------------+----------------+-------------+--------------+-----------+

It seems we do not see the effect yet. So let's optimize the table to reclaim the space:

OPTIMIZE TABLE big_null4;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3', 'big_null4')
 ORDER BY table_name
;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
| big_null4  |    1047180 |             30 |    32030720 |     39370752 |   4194304 |
+------------+------------+----------------+-------------+--------------+-----------+

And there you see: we get much of the space back... NULL is a good thing!

by Shinguz at December 14, 2018 07:33 AM

December 13, 2018

Peter Zaitsev

MongoDB Backup: How and When To Use PSMDB hotbackup and mongodb_consistent_backup

mongodb backup

We have many methods to back up a MongoDB database, using native mongodump or external tools. In this article, however, we’ll take a look at the backup tools offered by Percona, keeping in mind the restoration scenarios for MongoDB replicaSet and Sharded Cluster environments. We’ll explore how and when to use the tool mongodb-consistent-backup from Percona lab to back up the database consistently in Sharded Cluster/replicaSet environments. We’ll also take a look at hotbackup, a tool that’s available in Percona Server for MongoDB (PSMDB) packages.

Backup is done – What about Restore?

Those who are responsible for data almost always think about how to back up the database and store the backups securely. But they often fail to foresee the scenario where the backup actually needs to be used to restore data. For example, unfortunately, I have seen many companies schedule the backups of config servers and shard servers separately, starting and completing the backups at different times based on data volumes. But can we use that backup when we need to restore and start the cluster with it? The answer is no. Well, maybe yes, if you can tweak the metadata, but data inconsistency may occur. With this backup schedule, the backup is not consistent for the whole cluster: we don’t have a single point to which we can restore the data for all shards/config dbs so that we can start the cluster from there. Consequently, we face a difficult situation just when we really need to use that backup!

Let’s explore the two tools/features available from Percona to back up MongoDB, and look at which method to choose based on your restoration plan.

Hot backup for both replicaSet and Sharded Cluster

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

The main problem with backup is maintaining consistency, as the application still writes to the DB while the backup is going on. So to maintain consistency throughout the backup, and get a reliable full backup of all the data needed to restore the database, the backup tool needs to track changes via the oplog as well. Using the mongodump utility along with an oplog backup achieves this easily in a replicaSet environment, since you only need consistency for that one replicaSet.
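
As a minimal sketch (replica set name, host, credentials, and paths below are placeholders), such a dump and its consistent restore look like this:

# Dump the replicaSet plus the oplog entries written while the dump runs.
mongodump --host rs0/127.0.0.1:27017 -u backup_usr -p backup_pass \
    --authenticationDatabase admin --oplog --out /data/backups/rs0_dump

# Replay those oplog entries on restore for a consistent point in time.
mongorestore --oplogReplay /data/backups/rs0_dump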

But when we need a consistent backup of a Sharded Cluster, total cluster consistency is much harder to achieve, as it involves backing up all shards and config servers together up to a particular point, so the backup can be reused in failover cases. Even if you run mongodump manually on each shard/config replicaSet separately, and try to take a consistent backup of the whole cluster while writes are being made, it is a very tedious job: the backup of each shard ends at a different point depending on factors such as load, data volume, etc.

To remedy this, we can take a consistent hot backup of the Sharded Cluster by using our utility mongodb-consistent-backup – in other words, a point-in-time backup for the Sharded Cluster environment. This utility internally uses mongodump and captures the oplog changes from each node until the backups from all data nodes and configs are complete. This ensures consistency in the backup of the whole Sharded Cluster! You have to make sure you are using a replicaSet for your config servers too. In fact, this tool also helps you take a consistent backup in a replicaSet environment.

This utility is available in our Percona lab, but please note that it is not yet officially supported. To install this package, please make sure you install all the dependency packages, and follow the steps mentioned in this link to complete the installation process.

If you have enabled authentication in your environment, then create a user like below:

db.createUser({
	user: "backup_usr",
	pwd: "backup_pass",
	roles: [
	{ role: "clusterMonitor", db: "admin" }
	]
})

The backup can be taken as follows by connecting to one of the mongos nodes in the Sharded Cluster. Here mongos is running on port 27051, and the cluster has one config replicaSet cfg and two shards s1 and s2.

[root@app mongodb_consistent_backup-master]# ./bin/mongodb-consistent-backup -H localhost \
> -P 27051 \
> -u backup_usr \
> -p backup_pass \
> -a admin \
> -n clusterFullBackup \
> -l backup/mongodb
[2018-12-05 18:57:38,863] [INFO] [MainProcess] [Main:init:144] Starting mongodb-consistent-backup version 1.4.0 
(git commit: unknown)
[2018-12-05 18:57:38,864] [INFO] [MainProcess] [Main:init:145] Loaded config: {"archive": {"method": "tar", "tar": 
{"binary": "tar", "compression": "gzip"}, "zbackup": {"binary": "/usr/bin/zbackup", "cache_mb": 128, "compression": "lzma"}}, 
"authdb": "admin", "backup": {"location": "backup/mongodb", "method": "mongodump", "mongodump": {"binary": "/usr/bin/mongodump", 
"compression": "auto"}, "name": "clusterFullBackup"}, "environment": "production", "host": "localhost", "lock_file": 
"/tmp/mongodb-consistent-backup.lock", "notify": {"method": "none"}, "oplog": {"compression": "none", "flush": {"max_docs": 100, 
"max_secs": 1}, "tailer": {"enabled": "true", "status_interval": 30}}, "password": "******", "port": 27051, "replication": 
{"max_lag_secs": 10, "max_priority": 1000}, "sharding": {"balancer": {"ping_secs": 3, "wait_secs": 300}}, "upload": {"method": 
"none", "retries": 5, "rsync": {"path": "/", "port": 22}, "s3": {"chunk_size_mb": 50, "region": "us-east-1", "secure": true}, 
"threads": 4}, "username": "backup_usr"}
...
...
[2018-12-05 18:57:40,715] [INFO] [MongodumpThread-5] [MongodumpThread:run:204] Starting mongodump backup of s2/127.0.0.1:27043
[2018-12-05 18:57:40,722] [INFO] [MongodumpThread-7] [MongodumpThread:run:204] Starting mongodump backup of cfg/127.0.0.1:27022
[2018-12-05 18:57:40,724] [INFO] [MongodumpThread-6] [MongodumpThread:run:204] Starting mongodump backup of s1/127.0.0.1:27032
[2018-12-05 18:57:40,800] [INFO] [MongodumpThread-5] [MongodumpThread:wait:130] s2/127.0.0.1:27043:	Enter password:
[2018-12-05 18:57:40,804] [INFO] [MongodumpThread-6] [MongodumpThread:wait:130] s1/127.0.0.1:27032:	Enter password:
[2018-12-05 18:57:40,820] [INFO] [MongodumpThread-7] [MongodumpThread:wait:130] cfg/127.0.0.1:27022:	Enter password:
...
...
[2018-12-05 18:57:54,880] [INFO] [MainProcess] [Mongodump:wait:105] All mongodump backups completed successfully
[2018-12-05 18:57:54,892] [INFO] [MainProcess] [Stage:run:95] Completed running stage mongodb_consistent_backup.Backup with task 
Mongodump in 14.21 seconds
[2018-12-05 18:57:54,913] [INFO] [MainProcess] [Tailer:stop:86] Stopping all oplog tailers
[2018-12-05 18:57:55,955] [INFO] [MainProcess] [Tailer:stop:118] Waiting for tailer s2/127.0.0.1:27043 to stop
[2018-12-05 18:57:56,889] [INFO] [TailThread-2] [TailThread:run:177] Done tailing oplog on s2/127.0.0.1:27043, 2 oplog changes, 
end ts: Timestamp(1544036268, 1)
[2018-12-05 18:57:59,967] [INFO] [MainProcess] [Tailer:stop:118] Waiting for tailer s1/127.0.0.1:27032 to stop
[2018-12-05 18:58:00,801] [INFO] [TailThread-3] [TailThread:run:177] Done tailing oplog on s1/127.0.0.1:27032, 3 oplog changes, 
end ts: Timestamp(1544036271, 1)
[2018-12-05 18:58:03,985] [INFO] [MainProcess] [Tailer:stop:118] Waiting for tailer cfg/127.0.0.1:27022 to stop
[2018-12-05 18:58:04,803] [INFO] [TailThread-4] [TailThread:run:177] Done tailing oplog on cfg/127.0.0.1:27022, 8 oplog changes, 
end ts: Timestamp(1544036279, 1)
[2018-12-05 18:58:06,989] [INFO] [MainProcess] [Tailer:stop:125] Oplog tailing completed in 27.85 seconds
...
...
[2018-12-05 18:58:09,478] [INFO] [MainProcess] [Rotate:symlink:83] Updating clusterFullBackup latest symlink to current backup 
path: backup/mongodb/clusterFullBackup/20181205_1857
[2018-12-05 18:58:09,480] [INFO] [MainProcess] [Main:run:461] Completed mongodb-consistent-backup in 30.49 sec

where,
-n – backup directory name to be created
-l – backup directory
-H – hostname
-P – port
-p – password
-u – user
-a – authentication database

The log above shows the backup in progress, capturing the state of the oplog and applying its changes. The same command can be used to connect to a replicaSet by supplying the proper hostname. The tool also identifies by itself whether it is talking to a replicaSet or a Sharded Cluster before proceeding with the backup. This can be seen in the log output, shown below, which the tool writes when running the backup:

For a Sharded Cluster:

[2018-12-05 19:05:02,453] [INFO] [MainProcess] [Main:run:299] Running backup in sharding mode using seed node(s): localhost:27051

For a replicaSet:

[2018-12-05 19:23:05,070] [INFO] [MainProcess] [Main:run:257] Running backup in replset mode using seed node(s): localhost:27041

You can check out a couple of our blogs here and here for more details about the utility.

Hot but Cold backup

You may be wondering about the title Hot but Cold backup. Yes, Percona Server for MongoDB (PSMDB) packages include a feature to take a binary hot backup, called hotbackup. Those who know the MySQL world will already know Percona XtraBackup, our free, open source binary hot backup utility for MySQL. PSMDB hotbackup works in a similar way: when you take a backup with it, you end up with a binary backup from which you can start an instance directly, using the backup directory. You don’t need to worry about restoring from scratch and recreating indices. However, this solution works for replicaSet/standalone MongoDB instances only.
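
For illustration, a hot backup is triggered with a single command against the admin database; the port, credentials, and backup path below are placeholders for your environment:

# backupDir must be an absolute path writable by the mongod process.
mongo admin --port 27017 -u backup_usr -p backup_pass \
    --eval 'db.runCommand({ createBackup: 1, backupDir: "/data/backups/hot_20181213" })'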

If you plan well, you could feasibly use this feature to back up a Sharded Cluster: bring down one secondary from each shard and from the config server replicaSet at the same time (preferably when there is little or no write traffic), then start them on a different port and without the replSet option so that those instances won’t rejoin their replicaSets. Now you can run hotbackup on all of these instances. Once they are finished, revert the changes in the config files and allow them to rejoin their replicaSets.

Cautionary notes: Please make sure you use low-priority or hidden nodes for this purpose, so that no election is triggered when they leave and rejoin the replicaSet, and don’t use SIGKILL (kill -9) to stop the db, as that shuts the database down abruptly. Also, plan to have at least as much free disk space as your shard occupies: a hotbackup takes approximately the same amount of space as your node.

 My colleague Tim Vaillancourt has written a great blogpost on this. See here.  

Conclusion

So, with the two methods above, you now have the option to choose a backup method based on your RTO and RPO, explained here. I hope this helps you! Please share your comments and feedback below, and tell me what you think!

REFERENCES:

https://www.percona.com/doc/percona-server-for-mongodb/LATEST/hot-backup.html
https://www.percona.com/forums/questions-discussions/percona-server-for-mongodb/53006-percona-mongodb-difference-between-hot-backup-and-backup-using-mongo-dump
https://www.percona.com/blog/2016/07/25/mongodb-consistent-backups/
https://www.percona.com/blog/2018/04/06/free-fast-mongodb-hot-backup-with-percona-server-for-mongodb/
https://www.bluelock.com/blog/rpo-rto-pto-and-raas-disaster-recovery-explained/
https://en.wikipedia.org/wiki/Disaster_recovery
https://www.druva.com/blog/understanding-rpo-and-rto/
https://www.percona.com/live/e17/sites/default/files/slides/Running%20MongoDB%20in%20Production%20-%20FileId%20-%20115299.pdf
https://major.io/2010/03/18/sigterm-vs-sigkill/
https://docs.mongodb.com/manual/core/sharded-cluster-config-servers


Photo by Designecologist from Pexels

by Vinodh Krishnaswamy at December 13, 2018 11:43 AM

December 12, 2018

Jean-Jerome Schmidt

Webinar Replay: How to Manage Replication Failover Processes for MySQL, MariaDB & PostgreSQL

If you’re looking to minimize downtime and meet your SLAs through an automated or semi-automated approach, then this webinar replay is for you:

A detailed overview of what failover processes may look like in MySQL, MariaDB and PostgreSQL replication setups.

Failover is the process of moving to a healthy standby component, during a failure or maintenance event, in order to preserve uptime. The quicker it can be done, the faster you can be back online.

However, failover can be tricky for transactional database systems as we strive to preserve data integrity - especially in asynchronous or semi-synchronous topologies.

There are risks associated: from diverging datasets to loss of data. Failing over due to incorrect reasoning, e.g., failed heartbeats in the case of network partitioning, can also cause significant harm.

In this webinar we cover the dangers related to the failover process, and discuss the tradeoffs between failover speed and data integrity. We also find out how to shield applications from database failures with the help of proxies.

Finally, we have a look at how ClusterControl manages the failover process, and how it can be configured for both assisted and automated failover.

Agenda

  • An introduction to failover - what, when, how
    • in MySQL / MariaDB
    • in PostgreSQL
  • To automate or not to automate
  • Understanding the failover process
  • Orchestrating failover across the whole HA stack
  • Difficult problems
    • Network partitioning
    • Missed heartbeats
    • Split brain
  • From assisted to fully automated failover with ClusterControl
    • Demo

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

by jj at December 12, 2018 02:21 PM

Peter Zaitsev

AWS Elastic Block Storage (EBS) – Can We Get It Truly Elastic?

very old disk storage

At AWS Re:Invent 2018 there were many great announcements of new AWS services and features, but one basic feature that I’ve been waiting years to see released is still nowhere to be found.

AWS Elastic Block Storage (EBS) is great, and it has improved over the years, adding different storage types and features like Provisioned IOPS. However, it still has a most basic, inconvenient requirement: I have to decide in advance how much space I need to allocate, and pay for all of that allocated space whether I use it or not.

It would be so much better if AWS would allow true consumption-model pricing with EBS, where you pay for the storage used, not the storage allocated. This is already the case for S3, RDS, and even EC2 instances (with the Unlimited option on T2/T3 instances), not to mention Serverless-focused services.

For example, I would love to be able to create a 1TB EBS volume but only pay for 10GB of storage if I only use this amount of space.

Modern storage subsystems do a good job differentiating between the space available on the block device and what’s being used by user files and filesystem metadata. The space that’s not allocated any more can be TRIMmed. This is a basic requirement for working well on flash storage, and as modern EC2 instances already provision EBS storage as emulated NVMe devices, I would imagine Amazon could hook into such functionality to track space actually used.
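
As a small illustration of that mechanism (the mount point is a placeholder, and this is a general filesystem facility, not an EBS feature):

# Ask the filesystem to discard blocks it no longer uses and report how much
# was trimmed; whether the block layer reclaims the space is up to the device.
sudo fstrim -v /mnt/data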

For us at Percona this would make shipping applications on AWS Marketplace much more convenient. Right now for Percona Monitoring and Management (PMM) we have to choose how much space to allocate to the EBS volume by default, picking between making it expensive to run because of a large, mostly unused EBS volume, or setting a very limited default capacity that requires user action to resize the EBS volume later. Consumption-based EBS pricing would solve this dilemma.

This problem seems to be well recognized and understood. For example, Pure Storage Cloud Block Storage (currently in beta) is expected to have such a feature.

I hope with its insane customer focus AWS will add this feature in the future, but currently we have to get by without it.


Image: Arnold Reinhold [CC BY-SA 2.5], via Wikimedia Commons

by Peter Zaitsev at December 12, 2018 12:28 PM

Percona XtraDB Cluster Operator Is Now Available as an Early Access Release

Percona XtraDB Cluster Operator

Percona announces the early access release of Percona XtraDB Cluster Operator.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

Percona XtraDB Cluster Operator simplifies the deployment and management of Percona XtraDB Cluster in a Kubernetes or OpenShift environment. Kubernetes and the Kubernetes-based OpenShift platform provide users with a distributed orchestration system that automates the deployment, management and scaling of containerized applications.

It extends the Kubernetes API with a new custom resource for deploying, configuring and managing the application through the whole life cycle. You can compare the Kubernetes Operator to a System Administrator who deploys the application and watches the Kubernetes events related to it, taking administrative/operational actions when needed.

The Percona XtraDB Cluster Operator on PerconaLabs is an early access release. It is not recommended for production environments. 

Percona XtraDB Cluster Operator can be installed on Kubernetes or OpenShift. While the operator does not support all the Percona XtraDB Cluster features in this early access release, instructions on how to install and configure it are already available along with the operator source code, hosted in our Github repository.
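
For orientation, a typical Kubernetes operator install flow looks roughly like the sketch below; the file names and namespace are illustrative, so follow the instructions in the repository itself:

# Illustrative only: consult the operator repository for the exact steps.
kubectl create namespace pxc
kubectl apply -f deploy/crd.yaml              # register the custom resource definitions
kubectl apply -n pxc -f deploy/rbac.yaml      # service account and permissions
kubectl apply -n pxc -f deploy/operator.yaml  # the operator deployment itself
kubectl apply -n pxc -f deploy/cr.yaml        # a Percona XtraDB Cluster custom resource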

The operator was developed with high availability in mind, so it will attempt to run ProxySQL and XtraDB Cluster instances on separate worker nodes if possible, deploying the database cluster on at least three member nodes.

Percona XtraDB Cluster is an open source, cost-effective and robust clustering solution for businesses that integrates Percona Server for MySQL with the Galera replication library to produce a highly-available and scalable MySQL® cluster complete with synchronous multi-master replication, zero data loss and automatic node provisioning using Percona XtraBackup.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at December 12, 2018 01:15 AM

December 11, 2018

Peter Zaitsev

Upcoming Webinar Wed 12/12: MySQL 8 for Developers

MySQL 8 for Developers

Please join Percona’s CEO Peter Zaitsev as he presents MySQL 8 for Developers on Wednesday, December 12th, 2018 at 11:00 AM PST (UTC-8) / 2:00 PM EST (UTC-5).

Register Now

There are many great new features in MySQL 8, but how exactly can they help your application? This session takes a practical look at MySQL 8 features, details which limitations of previous MySQL versions are overcome by MySQL 8, and discusses what you can do with MySQL 8 that you could not have done before.

Register for MySQL 8 for Developers to learn how MySQL’s new features can help your application and more.

by Peter Zaitsev at December 11, 2018 05:16 PM

Percona Server for MongoDB Operator Is Now Available as an Early Access Release

Percona Server for MongoDB Operator

Percona announces the early access release of Percona Server for MongoDB Operator.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

Percona Server for MongoDB Operator simplifies the deployment and management of Percona Server for MongoDB in a Kubernetes or OpenShift environment. Kubernetes and the Kubernetes-based OpenShift platform provide users with a distributed orchestration system that automates the deployment, management and scaling of containerized applications.

It extends the Kubernetes API with a new custom resource for deploying, configuring and managing the application through the whole life cycle. You can compare the Kubernetes Operator to a System Administrator who deploys the application and watches the Kubernetes events related to it, taking administrative/operational actions when needed.

The Percona Server for MongoDB Operator on PerconaLabs is an early access release. It is not recommended for production environments. 

Percona Server for MongoDB Operator can be installed on Kubernetes or OpenShift. While the operator does not support all the Percona Server for MongoDB features in this early access release, instructions on how to install and configure it are already available along with the operator source code, which is hosted in our Github repository.

The operator was developed with high availability in mind, so it will attempt to run MongoDB instances on separate worker nodes (if possible) and deploy the database cluster as a single replica set with at least three member nodes.

Percona Server for MongoDB Operator

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at December 11, 2018 09:06 AM

December 10, 2018

Peter Zaitsev

Percona XtraBackup 8.0.4 Is Now Available

Percona XtraBackup 8.0

Percona is glad to announce the release of Percona XtraBackup 8.0.4 on December 10, 2018. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

This release of Percona XtraBackup is a General Availability release, ready for use in a production environment.

Please note the following about this release:

  • The deprecated innobackupex has been removed. Use the xtrabackup command to back up your instances: $ xtrabackup --backup --target-dir=/data/backup (see the sketch after this list).
  • When migrating from earlier database server versions, back up and restore using XtraBackup 2.4, and then use mysql_upgrade from MySQL 8.0.x.
  • If using yum or apt repositories to install Percona XtraBackup 8.0.4, ensure that you have enabled the new tools repository. You can do this with the percona-release enable tools release command, and then install the percona-xtrabackup-80 package.
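
A minimal sketch of the xtrabackup-only workflow; /data/backup is a placeholder, and the server must be stopped (with an empty datadir) before copying back:

xtrabackup --backup --target-dir=/data/backup     # take the backup while the server runs
xtrabackup --prepare --target-dir=/data/backup    # apply the redo log to make it consistent
xtrabackup --copy-back --target-dir=/data/backup  # restore into the (empty) datadir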

All Percona software is open-source and free. We are grateful to the community for the invaluable contributions to Percona XtraBackup. We would especially like to highlight the input of Alexey Kopytov who has been actively offering improvements and submitting bug reports for Percona XtraBackup.

New Features

  • Percona XtraBackup 8.0.4 is based on MySQL 8.0.13 and fully supports the Percona Server for MySQL 8.0 series and the MySQL 8.0 series.

Bugs Fixed

  • PXB-1699: xtrabackup --prepare could fail on backups of MySQL 8.0.13 databases
  • PXB-1704: xtrabackup --prepare could hang while performing an insert buffer merge
  • PXB-1668: When the --throttle option was used, the applied value was different from the one specified by the user (off-by-one error)
  • PXB-1679: PXB could crash when an ALTER TABLE … TRUNCATE PARTITION command was run during a backup without locking DDL

by Borys Belinsky at December 10, 2018 07:05 PM

December 08, 2018

Valeriy Kravchuk

What May Cause MySQL ERROR 1213

Probably all of us, MySQL users, DBAs and developers, have seen error 1213 more than once, in one context or the other:
mysql> select * from t1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
The first thing that comes to mind in this case is: "OK, we have an InnoDB deadlock, let's check the details", followed by the SHOW ENGINE INNODB STATUS check, like this:
mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2018-12-08 17:41:11 0x7f2f8b8db700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 12 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 59 srv_active, 0 srv_shutdown, 14824 srv_idle
srv_master_thread log flush and writes: 14882
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 326
OS WAIT ARRAY INFO: signal count 200
RW-shared spins 0, rounds 396, OS waits 195
RW-excl spins 0, rounds 120, OS waits 4
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 396.00 RW-shared, 120.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 14960
Purge done for trx's n:o < 14954 undo n:o < 0 state: running but idle
History list length 28
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421316960193880, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421316960192752, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
...
Now, what if you get the output like the one above? Without any LATEST DETECTED DEADLOCK section? I've seen people wondering how it is even possible and trying to find some suspicious bug somewhere...

Do not be in a hurry - time to recall that there are actually at least 4 quite common reasons to get error 1213 in modern (5.5+) versions of MySQL, MariaDB and Friends:
  1. InnoDB deadlock happened
  2. Metadata deadlock happened
  3. If you are lucky enough to use Galera cluster, Galera conflict happened
  4. Deadlock happened in some other storage engine (for example, MyRocks)
I am not lucky enough to use MySQL's group replication yet, but I know that conflicts there are also possible. I am just not sure if error 1213 is also reported in that case. Feel free to check with a test case similar to the one I've used for Galera below.

I also suspect deadlocks with other storage engines are possible. As a bonus, I'll demonstrate a deadlock with MyRocks as well.

Let's reproduce the first three cases one by one and check how to get more information on them. In all cases it's enough to have at most 2 InnoDB tables with just two rows each:
mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select * from t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)

mysql> select * from t2;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
We'll need two sessions, surely.

InnoDB Deadlock

With InnoDB and tables above it's really easy to end up with a deadlock. In the first session execute the following:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where id = 1 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0.00 sec)
In the second session execute:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where id = 2 for update;
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (0.02 sec)
Now in the first session try to access the row with id=2 asking for an incompatible lock:
mysql> select * from t1 where id = 2 for update;
This statement hangs waiting for a lock (up to innodb_lock_wait_timeout seconds). Try to access the row with id=1 asking for an incompatible lock in the second session, and you'll get the deadlock error:
mysql> select * from t1 where id = 1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
At this moment the SELECT in the first transaction returns data:
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (5.84 sec)
It's that simple: one table and two rows are enough. We can get the details in the output of SHOW ENGINE INNODB STATUS:
...
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-12-08 18:32:59 0x7f2f8b8db700
*** (1) TRANSACTION:
TRANSACTION 15002, ACTIVE 202 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 8, OS thread handle 139842181244672, query id 8545 localhost root statistics
select * from t1 where id = 2 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 94 page no 3 n bits 72 index PRIMARY of table `test`.`t1` trx id 15002 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 15003, ACTIVE 143 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 9, OS thread handle 139842181510912, query id 8546 localhost root statistics
select * from t1 where id = 1 for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 94 page no 3 n bits 72 index PRIMARY of table `test`.`t1` trx id 15003 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 94 page no 3 n bits 72 index PRIMARY of table `test`.`t1` trx id 15003 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
...
In the case above I've used Percona Server 5.7.24-26 (why not). Details of the output may vary depending on version (and the bugs it has :). If you use MariaDB 5.5+, in case of an InnoDB deadlock the special innodb_deadlocks status variable is also incremented.
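
For example, on MariaDB a quick check may look like this (just a sketch; the counter value obviously depends on how many deadlocks the server had detected so far):
MariaDB [test]> show global status like 'innodb_deadlocks';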

Metadata Deadlock

Unlike with InnoDB deadlocks, chances that you've seen deadlocks with metadata locks involved are low. One may spend notable time trying to reproduce such a deadlock, but (as usual) a quick check of the MySQL bugs database may help to find an easy-to-reproduce case. I mean Bug #65890 - "Deadlock that is not a deadlock with transaction and lock tables".

So, let's try the following scenario with two sessions and our InnoDB tables, t1 and t2. In one session:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t2 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
In another session:
mysql> lock tables t1 write, t2 write;
It hangs, waiting as long as lock_wait_timeout. We can check what happens with metadata locks using the performance_schema.metadata_locks table (as we use MySQL or Percona Server 5.7+; more on the setup and the alternatives for MariaDB here and there). In the first session:
mysql> select * from performance_schema.metadata_locks;
+-------------+--------------------+----------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME    | OBJECT_INSTANCE_BEGIN | LOCK_TYPE            | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+----------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+
| TABLE       | test               | t2             |       139841686765904 | SHARED_WRITE         | TRANSACTION   | GRANTED     |        |              45 |           2850 |
| GLOBAL      | NULL               | NULL           |       139841688088672 | INTENTION_EXCLUSIVE  | STATEMENT     | GRANTED     |        |              46 |            205 |
| SCHEMA      | test               | NULL           |       139841688088912 | INTENTION_EXCLUSIVE  | TRANSACTION   | GRANTED     |        |              46 |            205 |
| TABLE       | test               | t1             |       139841688088992 | SHARED_NO_READ_WRITE | TRANSACTION   | GRANTED     |        |              46 |            207 |
| TABLE       | test               | t2             |       139841688089072 | SHARED_NO_READ_WRITE | TRANSACTION   | PENDING     |        |              46 |            208 |
| TABLE       | performance_schema | metadata_locks |       139841686219040 | SHARED_READ          | TRANSACTION   | GRANTED     |        |              45 |           3003 |
+-------------+--------------------+----------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+
6 rows in set (0.00 sec)
As soon as we try this in the first session:
mysql> select * from t1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
we get the same deadlock error 1213 and LOCK TABLES in the second session completes. We can find nothing about this deadlock in the output of SHOW ENGINE INNODB STATUS (as shared at the beginning of this post). I am also not aware of any status variables to count metadata deadlocks.

You can find some useful information about metadata deadlocks in the manual.
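
Note also that the metadata_locks table is populated only when the corresponding Performance Schema instrument is enabled, and in 5.7 it is disabled by default. A minimal sketch to enable it at runtime (assuming otherwise default Performance Schema settings):
mysql> update performance_schema.setup_instruments set ENABLED = 'YES', TIMED = 'YES' where NAME = 'wait/lock/metadata/sql/mdl';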

Galera Conflict

For simplicity I'll use MariaDB 10.1.x and a simple 2-node setup on the same box as I described here. I'll start the first node as a new cluster and create tables for this test:
openxs@ao756:~/dbs/maria10.1$ bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode1.cnf --wsrep-new-cluster &
[1] 13022
openxs@ao756:~/dbs/maria10.1$ 181208 20:40:52 mysqld_safe Logging to '/tmp/mysql-node1.err'.
181208 20:40:52 mysqld_safe Starting mysqld daemon with databases from /home/openxs/galera/node1

openxs@ao756:~/dbs/maria10.1$ bin/mysql  --socket=/tmp/mysql-node1.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.34-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> drop table t1, t2;
ERROR 1051 (42S02): Unknown table 'test.t2'
MariaDB [test]> create table t1(id int, c1 int, primary key(id));
Query OK, 0 rows affected (0.29 sec)

MariaDB [test]> create table t2(id int, c1 int, primary key(id));
Query OK, 0 rows affected (0.22 sec)

MariaDB [test]> insert into t1 values (1,1), (2,2);
Query OK, 2 rows affected (0.07 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> insert into t2 values (1,1), (2,2);
Query OK, 2 rows affected (0.18 sec)
Records: 2  Duplicates: 0  Warnings: 0
Then I'll start the second node, make sure it joined the cluster and has the same data:
openxs@ao756:~/dbs/maria10.1$ bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode2.cnf &
[2] 15110
openxs@ao756:~/dbs/maria10.1$ 181208 20:46:11 mysqld_safe Logging to '/tmp/mysql-node2.err'.
181208 20:46:11 mysqld_safe Starting mysqld daemon with databases from /home/openxs/galera/node2

openxs@ao756:~/dbs/maria10.1$ bin/mysql --socket=/tmp/mysql-node2.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.34-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [test]> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 4                                    |
| wsrep_cluster_size       | 2                                    |
| wsrep_cluster_state_uuid | b1d227b1-0211-11e6-8ce0-3644ad2b03dc |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
4 rows in set (0.04 sec)

MariaDB [test]> select * from t2;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.02 sec)
Now we are ready to try to provoke a Galera conflict. For this we have to try to update the same data in transactions on two different nodes. In one session connected to node1:
MariaDB [test]> select @@wsrep_node_name;
+-------------------+
| @@wsrep_node_name |
+-------------------+
| node1             |
+-------------------+
1 row in set (0.00 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> update test.t1 set c1 = 5 where id=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
In another session connected to the other node:
MariaDB [test]> select @@wsrep_node_name;
+-------------------+
| @@wsrep_node_name |
+-------------------+
| node2             |
+-------------------+
1 row in set (0.00 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> update test.t1 set c1 = 6 where id=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
Now in the first session we can COMMIT successfully:
MariaDB [test]> commit;
Query OK, 0 rows affected (0.12 sec)
But if we try to COMMIT in the second:
MariaDB [test]> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
We get that same error 1213 about the deadlock. Surely you'll see nothing about this deadlock in the INNODB STATUS output, as it was NOT an InnoDB deadlock, but a Galera conflict. Check these status variables on node2:
MariaDB [test]> show status like 'wsrep_local%';
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_bf_aborts      | 1                                    |
| wsrep_local_cached_downto  | 75                                   |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_commits        | 0                                    |
| wsrep_local_index          | 0                                    |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_avg | 0.000000                             |
| wsrep_local_recv_queue_max | 1                                    |
| wsrep_local_recv_queue_min | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_avg | 0.000000                             |
| wsrep_local_send_queue_max | 1                                    |
| wsrep_local_send_queue_min | 0                                    |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_local_state_uuid     | b1d227b1-0211-11e6-8ce0-3644ad2b03dc |
+----------------------------+--------------------------------------+
17 rows in set (0.01 sec)
If wsrep_local_bf_aborts > 0, you had conflicts and a local transaction was rolled back to prevent them. We can see that the remote one wins; on node2:
MariaDB [test]> select * from t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    5 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
To summarize, in Galera "first commit wins" and the local transaction involved in a conflict is always the loser. You can get a lot of information about conflicts in the error log if you enable conflict logging features through wsrep_log_conflicts and cert.log_conflicts. See this fine manual for details.
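
For example, a quick sketch (both options can usually be changed at runtime, on each node you care about):
MariaDB [test]> set global wsrep_log_conflicts=ON;
MariaDB [test]> set global wsrep_provider_options='cert.log_conflicts=YES';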

MyRocks Deadlock

We can easily check how deadlocks are processed by MyRocks by just loading the plugin for the engine, converting tables to MyRocks and trying the same InnoDB scenario with the same Percona Server we used initially. But first, if you use Percona binaries you have to install a separate package:
openxs@ao756:~$ dpkg -l | grep rocksdb
openxs@ao756:~$ sudo apt-get install percona-server-rocksdb-5.7
[sudo] password for openxs:
Reading package lists... Done
Building dependency tree
...
Unpacking percona-server-rocksdb-5.7 (5.7.24-26-1.trusty) ...
Setting up percona-server-rocksdb-5.7 (5.7.24-26-1.trusty) ...


 * This release of Percona Server is distributed with RocksDB storage engine.
 * Run the following script to enable the RocksDB storage engine in Percona Server:

        ps-admin --enable-rocksdb -u <mysql_admin_user> -p[mysql_admin_pass] [-S <socket>] [-h <host> -P <port>]
Percona's manual has a lot more details and relies on the separate ps-admin script, but basically you have to INSTALL PLUGINs like this (check the script's code):
mysql> INSTALL PLUGIN ROCKSDB SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.86 sec)

mysql> INSTALL PLUGIN ROCKSDB_CFSTATS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)

mysql> INSTALL PLUGIN ROCKSDB_DBSTATS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.08 sec)

mysql> INSTALL PLUGIN ROCKSDB_PERF_CONTEXT SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_PERF_CONTEXT_GLOBAL SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)

mysql> INSTALL PLUGIN ROCKSDB_CF_OPTIONS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_GLOBAL_INFO SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_COMPACTION_STATS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_DDL SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)

mysql> INSTALL PLUGIN ROCKSDB_INDEX_FILE_MAP SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_LOCKS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_TRX SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_DEADLOCK SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)
Then check that the engine is there and convert tables:
mysql> show engines;
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                    | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| ROCKSDB            | YES     | RocksDB storage engine                                                     | YES          | YES  | YES        |
...

mysql> alter table t1 engine=rocksdb;
Query OK, 2 rows affected (0.64 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> alter table t2 engine=rocksdb;
Query OK, 2 rows affected (0.58 sec)
Records: 2  Duplicates: 0  Warnings: 0
Now we are ready to try the same InnoDB scenario. Just note that the lock wait timeout for MyRocks is defined by rocksdb_lock_wait_timeout, which is small by default (1 second), so you have to increase it first. You also have to set rocksdb_deadlock_detect to ON (as it's OFF by default):
mysql> set global rocksdb_lock_wait_timeout=50;
Query OK, 0 rows affected (0.00 sec)

mysql> set global rocksdb_deadlock_detect=ON;
Query OK, 0 rows affected (0.00 sec)

mysql> \r
Connection id:    14
Current database: test

mysql> start transaction;
Query OK, 0 rows affected (0.02 sec)

mysql> select * from t1 where id = 1 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0.00 sec)
Then in the second session:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where id = 2 for update;
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (0.00 sec)
In the first:
mysql> select * from t1 where id = 2 for update;
and in the second we can get the deadlock error:
mysql> select * from t1 where id = 1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
mysql> show global status like '%deadlock%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| rocksdb_row_lock_deadlocks | 1     |
+----------------------------+-------+
1 row in set (0.00 sec)
Note that MyRocks has a status variable to count deadlocks. Also note that Percona Server still does NOT seem to support the SHOW ENGINE ROCKSDB TRANSACTION STATUS statement available upstream:
mysql> show engine rocksdb transaction status\G
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'transaction status' at line 1
I was not able to find a bug about this (sorry if I missed it), and just reported a new task to Percona's JIRA: PS-5114 - "Add support for SHOW ENGINE ROCKSDB TRANSACTION STATUS".
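
In the meantime, as the ROCKSDB_DEADLOCK plugin was installed above, some details on recent deadlocks may be available from the INFORMATION_SCHEMA table it provides (a sketch; I assume rocksdb_max_latest_deadlocks was not set to 0):
mysql> select * from information_schema.ROCKSDB_DEADLOCK;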

That's probably more than enough for a single blog post (that is mostly NOT about bugs). One day I'll refresh my knowledge of MyRocks etc. and maybe write more about deadlock troubleshooting there.

Do not be surprised if you cannot find anything in the INNODB STATUS output when you get error 1213 - just proceed with further steps. There are other reasons to explore. Venice hides a lot early in the morning...

To summarize, do not be surprised that after you got MySQL error 1213 you see no information about recent InnoDB deadlock - there are at least 3 more reasons for this error to be reported, as explained above. You should know your configuration and use several other commands and sources of information to pinpoint what exactly happened and why.

by Valeriy Kravchuk (noreply@blogger.com) at December 08, 2018 07:08 PM

December 07, 2018

Peter Zaitsev

MySQL 8 and The FRM Drop… How To Recover Table DDL

… or what I should keep in mind in case of disaster

Retrieving and maintaining, in SQL format, the definition of all tables in a database is a best practice that we all should adopt. Keeping that under version control is another best practice to keep in mind.

While doing that may seem redundant, it can become a life saver in several situations. From the need to review what has historically changed in a table, to knowing who changed what and why… to when you need to recover your data and your beloved MySQL instance will not start…
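
For example, a minimal sketch with standard tools (the schema name comes from the example later in this post, and I assume the target directory is under git):

[root@master1 ~]# mysqldump --no-data --routines --triggers windmills > windmills_schema.sql
[root@master1 ~]# git add windmills_schema.sql && git commit -m "schema snapshot"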

But let’s be honest, only a few do the right thing, and even fewer keep that information up to date. Given that’s the case, what can we do when we have the need to discover/recover the table structure?

From the beginning, MySQL has used some external files to describe its internal structure.

For instance, if I have a schema named windmills and a table named wmillAUTOINC1, on the file system I will see this:

-rw-r-----. 1 mysql mysql     8838 Mar 14 2018 wmillAUTOINC1.frm
-rw-r-----. 1 mysql mysql   131072 Mar 14 2018 wmillAUTOINC1.ibd

The ibd file contains the data, while the frm file contains the structure information.

Putting aside ANY discussion about whether this is safe, whether it's transactional and more… when we've experienced some major crash and data corruption, this approach has been helpful. Being able to read from the frm file was the easiest way to get the information we needed.
Simple tools like DBSake made the task quite trivial, and allowed us to script table definitions when we needed to run long, complex, tedious data recoveries:

[root@master1 windmills]# /opt/tools/dbsake frmdump wmillAUTOINC1.frm
--
-- Table structure for table `wmillAUTOINC1`
-- Created with MySQL Version 5.7.20
--
CREATE TABLE `wmillAUTOINC1` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `uuid` char(36) COLLATE utf8_bin NOT NULL,
  `millid` smallint(6) NOT NULL,
  `kwatts_s` int(11) NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) COLLATE utf8_bin NOT NULL,
  `active` tinyint(2) NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC;

Of course, if the frm file was also corrupt, then we could try to get the information from the ibdata dictionary. If that is corrupted too (trust me, I've seen all of these situations) … well, a last resort was hoping the customer had a recent table definition stored somewhere, but as mentioned before, we are not so diligent, are we?

Now, though, in MySQL 8 we do not have FRM files: they were dropped. Even more interesting is that we do not have the same dictionary; most of the things that we knew have changed, including the dictionary location. So what can be done?

Well, Oracle has moved the FRM information—and more—to what is called Serialized Dictionary Information (SDI). The SDI is written INSIDE the ibd file, and represents a redundant copy of the information contained in the data dictionary.

The SDI is updated/modified by DDL operations on tables that reside in that tablespace. That is: if you normally have one file per table, then that file will contain ONLY the SDI for that table, but if you have multiple tables in a tablespace, the SDI information will refer to ALL of the tables.

To extract this information from the IBD files, Oracle provides a utility called ibd2sdi. This application parses the SDI information and reports a JSON file that can be easily manipulated to extract and build the table definition.

One exception is represented by Partitioned tables. The SDI information is contained ONLY in the first partition, and if you drop it, it is moved to the next one. I will show that later.

But let’s see how it works. In the next examples I will look for the table’s name, attributes, and datatype starting from the dictionary tables.

To obtain the info I will do this:

/opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/mysql.ibd |jq  '.[]?|.[]?|.dd_object?|("------------------------------------"?,"TABLE NAME = ",.name?,"****",(.columns?|.[]?|(.name?,.column_type_utf8?)))'

The result will be something like:

"------------------------------------"
"TABLE NAME = "
"tables"
"****"
"id"
"bigint(20) unsigned"
"schema_id"
"bigint(20) unsigned"
"name"
"varchar(64)"
"type"
"enum('BASE TABLE','VIEW','SYSTEM VIEW')"
"engine"
"varchar(64)"
"mysql_version_id"
"int(10) unsigned"
"row_format"
"enum('Fixed','Dynamic','Compressed','Redundant','Compact','Paged')"
"collation_id"
"bigint(20) unsigned"
"comment"
"varchar(2048)"
<snip>
"------------------------------------"
"TABLE NAME = "
"tablespaces"
"****"
"id"
"bigint(20) unsigned"
"name"
"varchar(259)"
"options"
"mediumtext"
"se_private_data"
"mediumtext"
"comment"
"varchar(2048)"
"engine"
"varchar(64)"
"DB_TRX_ID"
""
"DB_ROLL_PTR"
""

I cut the output for brevity, but if you run the above command yourself you’ll be able to see that this retrieves the information for ALL the tables residing in the IBD.

The other thing I hope you noticed is that I am NOT parsing ibdata, but mysql.ibd. Why? Because the dictionary was moved out from ibdata and is now in mysql.ibd.

Look what happens if I try to parse ibdata:

[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/ibdata1 |jq '.'
[INFO] ibd2sdi: SDI is empty.

Be very careful here to not mess up your mysql.ibd file.

Now what can I do to get information about my wmillAUTOINC1 table in MySQL 8?

That is quite simple:

/opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINC.ibd |jq '.'
[
  "ibd2sdi",
  {
    "type": 1,
    "id": 1068,
    "object": {
      "mysqld_version_id": 80013,
      "dd_version": 80013,
      "sdi_version": 1,
      "dd_object_type": "Table",
      "dd_object": {
        "name": "wmillAUTOINC",
        "mysql_version_id": 80011,
        "created": 20180925095853,
        "last_altered": 20180925095853,
        "hidden": 1,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=1;row_type=2;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [
          {
            "name": "id",
            "type": 9,
            "is_nullable": false,
            "is_zerofill": false,
            "is_unsigned": false,
            "is_auto_increment": true,
            "is_virtual": false,
            "hidden": 1,
            "ordinal_position": 1,
            "char_length": 11,
            "numeric_precision": 19,
            "numeric_scale": 0,
            "numeric_scale_null": false,
            "datetime_precision": 0,
            "datetime_precision_null": 1,
            "has_no_default": false,
            "default_value_null": false,
            "srs_id_null": true,
            "srs_id": 0,
            "default_value": "AAAAAAAAAAA=",
            "default_value_utf8_null": true,
            "default_value_utf8": "",
            "default_option": "",
            "update_option": "",
            "comment": "",
            "generation_expression": "",
            "generation_expression_utf8": "",
            "options": "interval_count=0;",
            "se_private_data": "table_id=1838;",
            "column_key": 2,
            "column_type_utf8": "bigint(11)",
            "elements": [],
            "collation_id": 83,
            "is_explicit_collation": false
          },
<SNIP>
        "indexes": [
          {
            "name": "PRIMARY",
            "hidden": false,
            "is_generated": false,
            "ordinal_position": 1,
            "comment": "",
            "options": "flags=0;",
            "se_private_data": "id=2261;root=4;space_id=775;table_id=1838;trx_id=6585972;",
            "type": 1,
            "algorithm": 2,
            "is_algorithm_explicit": false,
            "is_visible": true,
            "engine": "InnoDB",
<Snip>
        ],
        "foreign_keys": [],
        "partitions": [],
        "collation_id": 83
      }
    }
  },
  {
    "type": 2,
    "id": 780,
    "object": {
      "mysqld_version_id": 80011,
      "dd_version": 80011,
      "sdi_version": 1,
      "dd_object_type": "Tablespace",
      "dd_object": {
        "name": "windmills/wmillAUTOINC",
        "comment": "",
        "options": "",
        "se_private_data": "flags=16417;id=775;server_version=80011;space_version=1;",
        "engine": "InnoDB",
        "files": [
          {
            "ordinal_position": 1,
            "filename": "./windmills/wmillAUTOINC.ibd",
            "se_private_data": "id=775;"
          }
        ]
      }
    }
  }
]

The JSON will contain:

  • A section describing the DB object at high level
  • Array of columns and related information
  • Array of indexes
  • Partition information (not here but in the next example)
  • Table space information

That is a lot more detail compared to what we had in the FRM, and it is quite relevant and interesting information as well.

Once you have extracted the SDI, any JSON parser tool script can generate the information for the SQL DDL.
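
For example, here is a minimal sketch of such a script (by no means a complete DDL generator: indexes, character sets and partition clauses would still need to be handled; the select(.hidden == 1) filter is my assumption for keeping only the user-visible columns, based on the output shown above):

/opt/mysql_templates/mysql-8P/bin/./ibd2sdi /opt/mysql_instances/master8/data/windmills/wmillAUTOINC.ibd | \
  jq -r '.[]? | objects
         | select(.object.dd_object_type? == "Table")
         | .object.dd_object
         | "-- table: \(.name)",
           (.columns[] | select(.hidden == 1) | "  `\(.name)` \(.column_type_utf8)")'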

I mention partitions, so let’s look at this a bit more, given they can be tricky.

As mentioned, the SDI information is present ONLY in the first partition. All other partitions hold ONLY the tablespace information. Given that, the first thing to do is to identify which partition is the first… OR simply to try to access all partitions and, when you are able to get the details, extract them.

The process is the same:

[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170301.ibd |jq '.'
[
  "ibd2sdi",
  {
    "type": 1,
    "id": 1460,
    "object": {
      "mysqld_version_id": 80013,
      "dd_version": 80013,
      "sdi_version": 1,
      "dd_object_type": "Table",
      "dd_object": {
        "name": "wmillAUTOINCPART",
        "mysql_version_id": 80013,
        "created": 20181125110300,
        "last_altered": 20181125110300,
        "hidden": 1,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=1;row_type=2;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [<snip>
    	  "schema_ref": "windmills",
        "se_private_id": 18446744073709552000,
        "engine": "InnoDB",
        "last_checked_for_upgrade_version_id": 80013,
        "comment": "",
        "se_private_data": "autoinc=31080;version=2;",
        "row_format": 2,
        "partition_type": 7,
        "partition_expression": "to_days(`date`)",
        "partition_expression_utf8": "to_days(`date`)",
        "default_partitioning": 1,
        "subpartition_type": 0,
        "subpartition_expression": "",
        "subpartition_expression_utf8": "",
        "default_subpartitioning": 0,
       ],
<snip>
        "foreign_keys": [],
        "partitions": [
          {
            "name": "PT20170301",
            "parent_partition_id": 18446744073709552000,
            "number": 0,
            "se_private_id": 1847,
            "description_utf8": "736754",
            "engine": "InnoDB",
            "comment": "",
            "options": "",
            "se_private_data": "autoinc=0;version=0;",
            "values": [
              {
                "max_value": false,
                "null_value": false,
                "list_num": 0,
                "column_num": 0,
                "value_utf8": "736754"
              }
            ],

The difference, as you can see, is that the section related to partitions and sub partitions will be filled with all the details you might need to recreate the partitions.

We will have:

  • Partition type
  • Partition expression
  • Partition values
  • …more

Same for sub partitions.

Now again see what happens if I parse the second partition:

[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd |jq '.'
[
  "ibd2sdi",
  {
    "type": 2,
    "id": 790,
    "object": {
      "mysqld_version_id": 80011,
      "dd_version": 80011,
      "sdi_version": 1,
      "dd_object_type": "Tablespace",
      "dd_object": {
        "name": "windmills/wmillAUTOINCPART#P#PT20170401",
        "comment": "",
        "options": "",
        "se_private_data": "flags=16417;id=785;server_version=80011;space_version=1;",
        "engine": "InnoDB",
        "files": [
          {
            "ordinal_position": 1,
            "filename": "./windmills/wmillAUTOINCPART#P#PT20170401.ibd",
            "se_private_data": "id=785;"
          }
        ]
      }
    }
  }
]

I will get only the information about the tablespace, not the table.

As promised let me show you now what happens if I delete the first partition, and the second partition becomes the first:

(root@localhost) [windmills]>alter table wmillAUTOINCPART drop partition PT20170301;
Query OK, 0 rows affected (1.84 sec)
Records: 0  Duplicates: 0  Warnings: 0
[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd |jq '.'|more
[
  "ibd2sdi",
  {
    "type": 1,
    "id": 1461,
    "object": {
      "mysqld_version_id": 80013,
      "dd_version": 80013,
      "sdi_version": 1,
      "dd_object_type": "Table",
      "dd_object": {
        "name": "wmillAUTOINCPART",
        "mysql_version_id": 80013,
        "created": 20181129130834,
        "last_altered": 20181129130834,
        "hidden": 1,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=1;row_type=2;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [
          {
            "name": "id",
            "type": 9,
            "is_nullable": false,
            "is_zerofill": false,
            "is_unsigned": false,
            "is_auto_increment": true,
            "is_virtual": false,
            "hidden": 1,
            "ordinal_position": 1,

As I mentioned before, each DDL operation updates the SDI, and here we go: I will have all the information on what's NOW the FIRST partition. Please note how the value of the "created" attribute differs between the first time I queried the other partition and what I have now:

/opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170301.ibd
       "created": 20181125110300,
/opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd
       "created": 20181129130834,

To be clear, the second "created" timestamp (on PT20170401) is from NOW, when I dropped the other partition (PT20170301).

Conclusions

In the end, this solution is definitely more powerful than the FRM files. It will allow us to parse the file and identify the table definition more easily, providing us with much more detail and information.

The problems will arise if and when the IBD file becomes corrupt.

As per the manual: For InnoDB, an SDI record requires a single index page, which is 16KB in size by default. However, SDI data is compressed to reduce the storage footprint.

This means that for each table I have a page, if I associate one record with one table. Which means that in case of IBD corruption I should (likely) be able to read those pages. Unless I have bad (very bad) luck.

I still wonder how the size of an IBD file affects SDI retrieval, but given that I have not tried it yet, I will have to let you know.

As an aside, I am working on a script to facilitate the generation of the SQL; it's not yet ready, but you can find it here

One last note, but keep this in mind! It is stated in the manual, but in a hidden place and in small letters:
DDL operations take longer due to writing to storage, undo logs, and redo logs instead of .frm files.

References

https://stedolan.github.io/jq/

https://dev.mysql.com/doc/refman/8.0/en/ibd2sdi.html

https://dev.mysql.com/doc/refman/8.0/en/serialized-dictionary-information.html

https://dev.mysql.com/doc/refman/8.0/en/data-dictionary-limitations.html


Photo by chuttersnap on Unsplash

by Marco Tusa at December 07, 2018 01:31 PM

December 06, 2018

Jean-Jerome Schmidt

MySQL & MariaDB Query Caching with ProxySQL & ClusterControl

Queries have to be cached in every heavily loaded database; there is simply no way for a database to handle all traffic with reasonable performance otherwise. There are various mechanisms in which a query cache can be implemented. Starting from the MySQL query cache, which used to work just fine for mostly read-only, low-concurrency workloads and which has no place in highly concurrent workloads (to the extent that Oracle removed it in MySQL 8.0), to external key-value stores like Redis, memcached or CouchBase.

The main problem with using an external dedicated data store (we would not recommend the MySQL query cache to anyone) is that this is yet another datastore to manage. It is yet another environment to maintain, with scaling issues to handle, bugs to debug and so on.

So why not kill two birds with one stone by leveraging your proxy? The assumption here is that you are using a proxy in your production environment, as it helps load balance queries across instances, and masks the underlying database topology by providing a simple endpoint to applications. ProxySQL is a great tool for the job, as it can additionally function as a caching layer. In this blog post, we’ll show you how to cache queries in ProxySQL using ClusterControl.

How Does the Query Cache Work in ProxySQL?

First of all, a bit of background. ProxySQL manages traffic through query rules, and it can accomplish query caching using the same mechanism. ProxySQL stores cached queries in a memory structure. Cached data is evicted using a time-to-live (TTL) setting. TTL can be defined for each query rule individually, so it is up to the user to decide whether query rules are to be defined for each individual query, with a distinct TTL, or whether she just needs to create a couple of rules which will match the majority of the traffic.

There are two configuration settings that define how the query cache should be used. First, mysql-query_cache_size_MB defines a soft limit on the query cache size. It is not a hard limit, so ProxySQL may use slightly more memory than that, but it is enough to keep memory utilization under control. The second setting you can tweak is mysql-query_cache_stores_empty_result, which defines whether an empty result set is cached or not.

ProxySQL query cache is designed as a key-value store. The value is the result set of a query and the key is composed from concatenated values like: user, schema and query text. Then a hash is created off that string and that hash is used as the key.

Setting up ProxySQL as a Query Cache Using ClusterControl

As the initial setup, we have a replication cluster with one master and one slave. We also have a single ProxySQL instance.

This is by no means a production-grade setup as we would have to implement some sort of high availability for the proxy layer (for example by deploying more than one ProxySQL instance, and then keepalived on top of them for floating Virtual IP), but it will be more than enough for our tests.

First, we are going to verify the ProxySQL configuration to make sure query cache settings are what we want them to be.

256 MB of query cache should be about right, and we also want to cache empty result sets: sometimes a query which returns no data still has to do a lot of work to verify there’s nothing to return.
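For reference, the same two settings can also be applied by hand through the ProxySQL admin interface (a minimal sketch, assuming the default admin port 6032; ProxySQL stores its settings in the global_variables table):

mysql> UPDATE global_variables SET variable_value='256' WHERE variable_name='mysql-query_cache_size_MB';
mysql> UPDATE global_variables SET variable_value='true' WHERE variable_name='mysql-query_cache_stores_empty_result';
mysql> LOAD MYSQL VARIABLES TO RUNTIME;
mysql> SAVE MYSQL VARIABLES TO DISK;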

The next step is to create query rules which will match the queries you want to cache. There are two ways to do that in ClusterControl.

Manually Adding Query Rules

The first way requires a bit more manual work. Using ClusterControl you can easily create any query rule you want, including query rules that do the caching. First, let’s take a look at the list of the rules:

At this point, we have a set of query rules to perform the read/write split. The first rule has an ID of 100. Our new query rule has to be processed before that one, so we will use a lower rule ID. Let’s create a query rule which will do the caching of queries similar to this one:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN 5041 AND 5140 ORDER BY c

There are three ways of matching the query: Digest, Match Digest and Match Pattern. Let’s talk a bit about them here. First, Match Digest. We can set here a regular expression that will match a generalized query string that represents some query type. For example, for our query:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN 5041 AND 5140 ORDER BY c

The generic representation will be:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN ? AND ? ORDER BY c

As you can see, it stripped the arguments from the WHERE clause, therefore all queries of this type are represented as a single string. This option is quite nice to use because it matches a whole query type and, what’s even more important, it is stripped of any whitespace. This makes it so much easier to write a regular expression, as you don’t have to account for weird line breaks, whitespace at the beginning or end of the string, and so on.

Digest is basically a hash that ProxySQL calculates over the Match Digest form.

Finally, Match Pattern matches against full query text, as it was sent by the client. In our case, the query will have a form of:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN 5041 AND 5140 ORDER BY c

We are going to use Match Digest as we want all of those queries to be covered by the query rule. If we wanted to cache just that particular query, a good option would be to use Match Pattern.

The regular expression that we use is:

SELECT DISTINCT c FROM sbtest[0-9]+ WHERE id BETWEEN \? AND \? ORDER BY c

We are matching literally the exact generalized query string, with one exception: we know that this query hits multiple tables, therefore we adjusted the regular expression to match all of them.
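Creating the equivalent rule straight in the ProxySQL admin interface, rather than through the ClusterControl UI, would look roughly like this (a sketch: rule_id 50 and the 10-second TTL are arbitrary choices of ours, and note that cache_ttl is expressed in milliseconds):

mysql> INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
    -> VALUES (50, 1, 'SELECT DISTINCT c FROM sbtest[0-9]+ WHERE id BETWEEN \? AND \? ORDER BY c', 10000, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
mysql> SAVE MYSQL QUERY RULES TO DISK;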

Once this is done, we can see if the query rule is in effect or not.

We can see that ‘Hits’ are increasing which means that our query rule is being used. Next, we’ll look at another way to create a query rule.

Using ClusterControl to Create Query Rules

ProxySQL has useful functionality for collecting statistics on the queries it routes. You can track data like execution time, how many times a given query was executed, and so on. This data is also present in ClusterControl:

What is even better, if you point at a given query type, you can create a query rule related to it. You can also easily cache this particular query type.

As you can see, some of the data like Rule ID, Cache TTL or Schema Name are already filled in. ClusterControl will also fill in data based on which matching mechanism you decided to use. We can easily use either the hash for a given query type, or we can use Match Digest or Match Pattern if we would like to fine-tune the regular expression (for example, doing the same as we did earlier and extending the regular expression to match all the tables in the sbtest schema).

This is all you need to easily create query cache rules in ProxySQL. Download ClusterControl to try it today.

by krzysztof at December 06, 2018 10:58 AM

December 05, 2018

Oli Sennhauser

UNDO logs in InnoDB system tablespace ibdata1

We sometimes see very big InnoDB system tablespace files (ibdata1) at customer sites, even though innodb_file_per_table = 1 is set.

So we want to know what else is stored in the InnoDB system tablespace file ibdata1, to see what we can do about this unexpected growth.

First let us check the size of the ibdata1 file:

# ll ibdata1 
-rw-rw---- 1 mysql mysql 109064486912 Dez  5 19:10 ibdata1

The InnoDB system tablespace is about 101.6 Gibyte in size. This is exactly 6'656'768 InnoDB blocks of 16 kibyte block size.

So next we want to analyse the InnoDB system tablespace ibdata1 file. For this we can use the tool innochecksum:

# innochecksum --page-type-summary ibdata1 
Error: Unable to lock file:: ibdata1
fcntl: Resource temporarily unavailable

But... the tool innochecksum throws an error. It seems it is not allowed to analyse the InnoDB system tablespace while the database is running. So let us stop the database first and try again. Now we get a useful output:

# innochecksum --page-type-summary ibdata1 
File::ibdata1
================PAGE TYPE SUMMARY==============
#PAGE_COUNT     PAGE_TYPE
===============================================
  349391        Index page                        5.25%
 6076813        Undo log page                    91.29%
   18349        Inode page                        0.28%
  174659        Insert buffer free list page      2.62%
   36639        Freshly allocated page            0.55%
     405        Insert buffer bitmap              0.01%
      98        System page
       1        Transaction system page
       1        File Space Header
     404        Extent descriptor page            0.01%
       0        BLOB page
       8        Compressed BLOB page
       0        Other type of page
-------------------------------------------------------
 6656768        Pages total                     100.00%
===============================================
Additional information:
Undo page type: 3428 insert, 6073385 update, 0 other
Undo page state: 1 active, 67 cached, 249 to_free, 1581634 to_purge, 0 prepared, 4494862 other

So we can see that about 91% (roughly 92 Gibyte) of the InnoDB system tablespace ibdata1 blocks are used by InnoDB UNDO log pages. To avoid this growth of ibdata1 you have to create the database instance with separate InnoDB UNDO tablespaces: Undo Tablespaces.
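A minimal sketch of the relevant my.cnf settings for MySQL 5.7 (note that innodb_undo_tablespaces cannot be changed on an existing datadir, so the instance has to be initialized with it, followed by a dump and restore of the data):

[mysqld]
# store UNDO logs in separate tablespaces instead of ibdata1
innodb_undo_tablespaces  = 2
# allow oversized undo tablespaces to be truncated again
innodb_undo_log_truncate = ON
innodb_max_undo_log_size = 1G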


by Shinguz at December 05, 2018 08:55 PM

Peter Zaitsev

Nondeterministic Functions in MySQL (i.e. rand) Can Surprise You


Working on a test case with sysbench, I encountered this:

mysql> select * from sbtest1 where id = round(rand()*10000, 0);
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k      | c                                                                                                                       | pad                                                         |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|  179 | 499871 | 09833083632-34593445843-98203182724-77632394229-31240034691-22855093589-98577647071-95962909368-34814236148-76937610370 | 62233363025-41327474153-95482195752-11204169522-13131828192 |
| 1606 | 502031 | 81212399253-12831141664-41940957498-63947990218-16408477860-15124776228-42269003436-07293216458-45216889819-75452278174 | 25423822623-32136209218-60113604068-17409951653-00581045257 |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
2 rows in set (0.30 sec)

I was really surprised. First, and most importantly, id is a primary key and the rand() function should produce just one value. How come it returns two rows? Second, why is the response time 0.30 sec? That seems really high for a primary key lookup.

Looking further:

CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1
mysql> explain select * from sbtest1 where id = round(rand()*10000, 0);
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | sbtest1 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 986400 |    10.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+

So it is a primary key, but MySQL does not use an index, and it returns two rows. Is this a bug?

Deterministic vs nondeterministic functions

It turned out this is not a bug at all. It is pretty logical behavior from MySQL, but it is not what we would expect. First, why a full table scan? Well, rand() is a nondeterministic function. That means we do not know what it will return ahead of time; in fact, that is exactly the purpose of rand(): to return a random value. In this case, it is only logical to evaluate the function for each row, each time, and compare the results, i.e. in our case:

  1. Read row 1, get the value of id, evaluate the value of RAND(), compare
  2. Proceed using the same algorithm with the remaining rows.

In other words, as the value of rand() is not known (not evaluated) beforehand, we can’t use an index.

And in this case of the rand() function we have another interesting consequence. With an auto_increment primary key, every row whose id falls into the range of rand()*10000 matches with a probability of roughly 1/10000 on its own roll of the dice, so the number of returned rows is approximately Poisson-distributed with a mean close to 1: getting zero, one or several rows back are all likely outcomes. In fact, if we read the whole table from the beginning and keep comparing the auto_inc sequence with “the roll of the dice”, we can get many rows back.

That behavior is totally counter-intuitive. Nevertheless, to me, it’s also the only correct behavior.

We expect to have the rand() function evaluated before running the query.  This can actually be achieved by assigning rand() to a variable:

mysql> set @id=round(rand()*10000, 0); select @id; select * from sbtest1 where id = @id;
Query OK, 0 rows affected (0.00 sec)
+------+
| @id  |
+------+
| 6068 |
+------+
1 row in set (0.00 sec)
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k      | c                                                                                                                       | pad                                                         |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 6068 | 502782 | 84971025350-12845068791-61736600622-38249796467-85706778555-74134284808-24438972515-17848828748-86869270666-01547789681 | 17507194006-70651503059-23792945260-94159543806-65683812344 |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain select * from sbtest1 where id = @id;
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | sbtest1 | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.01 sec)

This would meet our expectations.

There are (at least) two bug reports filed, with very interesting discussion:

  1. rand() used in scalar functions returns multiple rows
  2. SELECT on PK with ROUND(RAND()) give wrong errors

Other databases

I wanted to see how this works in other SQL databases. In PostgreSQL, the behavior is exactly the same as in MySQL:

postgres=# select * from t2 where id = cast(random()*10000 as int);
  id  |    c
------+---------
 4093 | asdasda
 9378 | asdasda
(2 rows)
postgres=# select * from t2 where id = cast(random()*10000 as int);
  id  |    c
------+---------
 5988 | asdasda
 6674 | asdasda
(2 rows)
postgres=# explain select * from t2 where id = cast(random()*10000 as int);
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..159837.60 rows=1 width=12)
   Filter: (id = ((random() * '10000'::double precision))::integer)
(2 rows)

And SQLite seems different, evaluating the random() function beforehand:

sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
16239|asdsadasdsa
sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
32910|asdsadasdsa
sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
58658|asdsadasdsa
sqlite> explain select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     12    0                    00  Start at 12
1     OpenRead       0     30182  0     2              00  root=30182 iDb=0; t2
2     Function0      0     0     3     random(0)      00  r[3]=func(r[0])
3     Cast           3     69    0                    00  affinity(r[3])
4     Function0      0     3     2     abs(1)         01  r[2]=func(r[3])
5     Divide         4     2     1                    00  r[1]=r[2]/r[4]
6     Cast           1     68    0                    00  affinity(r[1])
7     SeekRowid      0     11    1                    00  intkey=r[1]; pk
8     Copy           1     5     0                    00  r[5]=r[1]
9     Column         0     1     6                    00  r[6]=t2.c
10    ResultRow      5     2     0                    00  output=r[5..6]
11    Halt           0     0     0                    00
12    Transaction    0     0     2     0              01  usesStmtJournal=0
13    Int64          0     4     0     92233720368547  00 r[4]=92233720368547
14    Goto           0     1     0                    00

Conclusion

Be careful when using MySQL nondeterministic functions in a WHERE condition (rand() is the most interesting example) as their behavior may surprise you. Many people believe this to be a bug that should be fixed. Let me know in the comments: do you think it is a bug or not (and why)? I would also be interested to know how it works in other, non-open-source databases (Microsoft SQL Server, Oracle, etc.).

PS: Finally, I’ve got a “clever” idea – what if I “trick” MySQL by using the deterministic keyword…

MySQL stored functions: deterministic vs not deterministic

So, I wanted to see how it works with MySQL stored functions when they are declared with the DETERMINISTIC or NOT DETERMINISTIC keyword. First, I wanted to “trick” MySQL by passing the DETERMINISTIC keyword to a stored function that uses rand() inside. OK, this is not what you really want to do!

DELIMITER $$
CREATE FUNCTION myrand() RETURNS INT
    DETERMINISTIC
BEGIN
 RETURN round(rand()*10000, 0);
END$$
DELIMITER ;

From MySQL manual about MySQL stored routines we can read:

Assessment of the nature of a routine is based on the “honesty” of the creator: MySQL does not check that a routine declared DETERMINISTIC is free of statements that produce nondeterministic results. However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used.

The result is interesting:

mysql> select myrand();
+----------+
| myrand() |
+----------+
|     4202 |
+----------+
1 row in set (0.00 sec)
mysql> select myrand();
+----------+
| myrand() |
+----------+
|     7548 |
+----------+
1 row in set (0.00 sec)
mysql> explain select * from t2 where id = myrand()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Impossible WHERE noticed after reading const tables
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------+
| Level | Code | Message                                                                        |
+-------+------+--------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select '2745' AS `id`,'asasdas' AS `c` from `test`.`t2` where 0 |
+-------+------+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from t2 where id = 4202;
+------+---------+
| id   | c       |
+------+---------+
| 4202 | asasdas |
+------+---------+
1 row in set (0.00 sec)
mysql> select * from t2 where id = 2745;
+------+---------+
| id   | c       |
+------+---------+
| 2745 | asasdas |
+------+---------+
1 row in set (0.00 sec)

So the MySQL optimizer detected the problem (somehow), and, as the manual warns, the misdeclared DETERMINISTIC routine led it to a surprising, incorrect execution plan.

If I use the NOT DETERMINISTIC keyword, then MySQL works the same way as when using the rand() function directly:

DELIMITER $$
CREATE FUNCTION myrand2() RETURNS INT
   NOT DETERMINISTIC
BEGIN
 RETURN round(rand()*10000, 0);
END$$
DELIMITER ;
mysql> explain select * from t2 where id = myrand2()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 262208
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)



Photo by dylan nolte on Unsplash

by Alexander Rubin at December 05, 2018 06:44 PM

Jean-Jerome Schmidt

MySQL on Docker: Multiple Delayed Replication Slaves for Disaster Recovery with Low RTO

Delayed replication allows a replication slave to deliberately lag behind the master by at least a specified amount of time. Before executing an event, the slave will first wait, if necessary, until the given time has passed since the event was created on the master. The result is that the slave will reflect the state of the master some time back in the past. This feature is supported since MySQL 5.6 and MariaDB 10.2.3. It can come in handy in case of accidental data deletion, and should be part of your disaster recovery plan.

The problem when setting up a delayed replication slave is deciding how much delay to configure. Too short, and you risk the bad query reaching your delayed slave before you can react, defeating the point of having a delayed slave. Alternatively, the delay can be so long that it takes hours for the delayed slave to catch up to where the master was at the time of the error.

Luckily, process isolation is one of Docker’s strengths, and running multiple MySQL instances is pretty convenient with it. Docker allows us to have multiple delayed slaves within a single physical host to improve our recovery time and save hardware resources. If you think a 15-minute delay is too short, we can have another instance with a 1-hour or even a 6-hour delay, for an even older snapshot of our database.

In this blog post, we are going to deploy multiple MySQL delayed slaves on one single physical host with Docker, and show some recovery scenarios. The following diagram illustrates our final architecture that we want to build:

Our architecture consists of an already deployed 2-node MySQL Replication setup running on physical servers (blue), and we would like to set up another three MySQL slaves (green) with the following behaviour:

  • 15 minutes delay
  • 1 hour delay
  • 6 hours delay

Take note that we are going to have 3 copies of the exact same data on the same physical server. Make sure the Docker host has the required storage, and allocate sufficient disk space beforehand.

MySQL Master Preparation

Firstly, log in to the master server and create the replication user:

mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%' IDENTIFIED BY 'YlgSH6bLLy';

Then, create a PITR-compatible backup on the master:

$ mysqldump -uroot -p --flush-privileges --hex-blob --opt --master-data=1 --single-transaction --skip-lock-tables --triggers --routines --events --all-databases | gzip -6 -c > mysqldump_complete.sql.gz

If you are using ClusterControl, you can make a PITR-compatible backup easily. Go to Backups -> Create Backup and pick "Complete PITR-compatible" under the "Dump Type" dropdown:

Finally, transfer this backup to the Docker host:

$ scp mysqldump_complete.sql.gz root@192.168.55.200:~

This backup file will be used by the MySQL slave containers during the slave bootstrapping process, as shown in the next section.

Delayed Slave Deployment

Prepare our Docker container directories. Create 3 directories (mysql.conf.d, datadir and sql) for every MySQL container that we are going to launch (you can use a loop to simplify the commands below, as shown right after the listing):

$ mkdir -p /storage/mysql-slave-15m/mysql.conf.d
$ mkdir -p /storage/mysql-slave-15m/datadir
$ mkdir -p /storage/mysql-slave-15m/sql
$ mkdir -p /storage/mysql-slave-1h/mysql.conf.d
$ mkdir -p /storage/mysql-slave-1h/datadir
$ mkdir -p /storage/mysql-slave-1h/sql
$ mkdir -p /storage/mysql-slave-6h/mysql.conf.d
$ mkdir -p /storage/mysql-slave-6h/datadir
$ mkdir -p /storage/mysql-slave-6h/sql
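The same directories can be created in one go with a loop (a sketch, assuming bash brace expansion):

$ for d in 15m 1h 6h; do mkdir -p /storage/mysql-slave-${d}/{mysql.conf.d,datadir,sql}; done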

"mysql.conf.d" directory will store our custom MySQL configuration file and will be mapped into the container under /etc/mysql.conf.d. "datadir" is where we want Docker to store the MySQL data directory, which maps to /var/lib/mysql of the container and "sql" directory stores our SQL files - backup files in .sql or .sql.gz format to stage the slave before replicating and also .sql files to automate the replication configuration and startup.

15-minute Delayed Slave

Prepare the MySQL configuration file for our 15-minute delayed slave:

$ vim /storage/mysql-slave-15m/mysql.conf.d/my.cnf

And add the following lines:

[mysqld]
server_id=10015
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
gtid_mode=ON
enforce_gtid_consistency=1
relay_log=relay-bin
expire_logs_days=7
read_only=ON

** The server-id value we used for this slave is 10015.

Next, under /storage/mysql-slave-15m/sql directory, create two SQL files, one to RESET MASTER (1reset_master.sql) and another one to establish the replication link using CHANGE MASTER statement (3setup_slave.sql).

Create a text file 1reset_master.sql and add the following line:

RESET MASTER;

Create a text file 3setup_slave.sql and add the following lines:

CHANGE MASTER TO MASTER_HOST = '192.168.55.171', MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'YlgSH6bLLy', MASTER_AUTO_POSITION = 1, MASTER_DELAY=900;
START SLAVE;

MASTER_DELAY=900 is equal to 15 minutes (in seconds). Then copy the backup file taken from our master (which has been transferred to our Docker host) to the "sql" directory and rename it to 2mysqldump_complete.sql.gz:

$ cp ~/mysqldump_complete.sql.gz /storage/mysql-slave-15m/sql/2mysqldump_complete.sql.gz

The final look of our "sql" directory should be something like this:

$ pwd
/storage/mysql-slave-15m/sql
$ ls -1
1reset_master.sql
2mysqldump_complete.sql.gz
3setup_slave.sql

Take note that we prefix the SQL filename with an integer to determine the execution order when Docker initializes the MySQL container.

Once everything is in place, run the MySQL container for our 15-minute delayed slave:

$ docker run -d \
--name mysql-slave-15m \
-e MYSQL_ROOT_PASSWORD=password \
--mount type=bind,source=/storage/mysql-slave-15m/datadir,target=/var/lib/mysql \
--mount type=bind,source=/storage/mysql-slave-15m/mysql.conf.d,target=/etc/mysql/mysql.conf.d \
--mount type=bind,source=/storage/mysql-slave-15m/sql,target=/docker-entrypoint-initdb.d \
mysql:5.7

** The MYSQL_ROOT_PASSWORD value must be the same as the MySQL root password on the master.

The following lines are what we are looking for to verify if MySQL is running correctly and connected as a slave to our master (192.168.55.171):

$ docker logs -f mysql-slave-15m
...
2018-12-04T04:05:24.890244Z 0 [Note] mysqld: ready for connections.
Version: '5.7.24-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
2018-12-04T04:05:25.010032Z 2 [Note] Slave I/O thread for channel '': connected to master 'rpl_user@192.168.55.171:3306',replication started in log 'FIRST' at position 4

You can then verify the replication status with the following statement:

$ docker exec -it mysql-slave-15m mysql -uroot -p -e 'show slave status\G'
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                    SQL_Delay: 900
                Auto_Position: 1
...

At this point, our 15-minute delayed slave container is replicating correctly and our architecture looks something like this:

1-hour Delayed Slave

Prepare the MySQL configuration file for our 1-hour delayed slave:

$ vim /storage/mysql-slave-1h/mysql.conf.d/my.cnf

And add the following lines:

[mysqld]
server_id=10060
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
gtid_mode=ON
enforce_gtid_consistency=1
relay_log=relay-bin
expire_logs_days=7
read_only=ON

** The server-id value we used for this slave is 10060.

Next, under /storage/mysql-slave-1h/sql directory, create two SQL files, one to RESET MASTER (1reset_master.sql) and another one to establish the replication link using CHANGE MASTER statement (3setup_slave.sql).

Create a text file 1reset_master.sql and add the following line:

RESET MASTER;

Create a text file 3setup_slave.sql and add the following lines:

CHANGE MASTER TO MASTER_HOST = '192.168.55.171', MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'YlgSH6bLLy', MASTER_AUTO_POSITION = 1, MASTER_DELAY=3600;
START SLAVE;

MASTER_DELAY=3600 is equal to 1 hour (in seconds). Then copy the backup file taken from our master (which has been transferred to our Docker host) to the "sql" directory and rename it to 2mysqldump_complete.sql.gz:

$ cp ~/mysqldump_complete.sql.gz /storage/mysql-slave-1h/sql/2mysqldump_complete.sql.gz

The final look of our "sql" directory should be something like this:

$ pwd
/storage/mysql-slave-1h/sql
$ ls -1
1reset_master.sql
2mysqldump_complete.sql.gz
3setup_slave.sql

Take note that we prefix the SQL filename with an integer to determine the execution order when Docker initializes the MySQL container.

Once everything is in place, run the MySQL container for our 1-hour delayed slave:

$ docker run -d \
--name mysql-slave-1h \
-e MYSQL_ROOT_PASSWORD=password \
--mount type=bind,source=/storage/mysql-slave-1h/datadir,target=/var/lib/mysql \
--mount type=bind,source=/storage/mysql-slave-1h/mysql.conf.d,target=/etc/mysql/mysql.conf.d \
--mount type=bind,source=/storage/mysql-slave-1h/sql,target=/docker-entrypoint-initdb.d \
mysql:5.7

** The MYSQL_ROOT_PASSWORD value must be the same as the MySQL root password on the master.

The following lines are what we are looking for to verify if MySQL is running correctly and connected as a slave to our master (192.168.55.171):

$ docker logs -f mysql-slave-1h
...
2018-12-04T04:05:24.890244Z 0 [Note] mysqld: ready for connections.
Version: '5.7.24-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
2018-12-04T04:05:25.010032Z 2 [Note] Slave I/O thread for channel '': connected to master 'rpl_user@192.168.55.171:3306',replication started in log 'FIRST' at position 4

You can then verify the replication status with the following statement:

$ docker exec -it mysql-slave-1h mysql -uroot -p -e 'show slave status\G'
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                    SQL_Delay: 3600
                Auto_Position: 1
...

At this point, our 15-minute and 1-hour MySQL delayed slave containers are replicating from the master and our architecture looks something like this:

6-hour Delayed Slave

Prepare the MySQL configuration file for our 6-hour delayed slave:

$ vim /storage/mysql-slave-6h/mysql.conf.d/my.cnf

And add the following lines:

[mysqld]
server_id=10006
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
gtid_mode=ON
enforce_gtid_consistency=1
relay_log=relay-bin
expire_logs_days=7
read_only=ON

** The server-id value we used for this slave is 10006.

Next, under /storage/mysql-slave-6h/sql directory, create two SQL files, one to RESET MASTER (1reset_master.sql) and another one to establish the replication link using CHANGE MASTER statement (3setup_slave.sql).

Create a text file 1reset_master.sql and add the following line:

RESET MASTER;

Create a text file 3setup_slave.sql and add the following lines:

CHANGE MASTER TO MASTER_HOST = '192.168.55.171', MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'YlgSH6bLLy', MASTER_AUTO_POSITION = 1, MASTER_DELAY=21600;
START SLAVE;

MASTER_DELAY=21600 is equal to 6 hours (in seconds). Then copy the backup file taken from our master (which has been transferred to our Docker host) to the "sql" directory and rename it to 2mysqldump_complete.sql.gz:

$ cp ~/mysqldump_complete.sql.gz /storage/mysql-slave-6h/sql/2mysqldump_complete.sql.gz

The final look of our "sql" directory should be something like this:

$ pwd
/storage/mysql-slave-6h/sql
$ ls -1
1reset_master.sql
2mysqldump_complete.sql.gz
3setup_slave.sql

Take note that we prefix the SQL filename with an integer to determine the execution order when Docker initializes the MySQL container.

Once everything is in place, run the MySQL container for our 6-hour delayed slave:

$ docker run -d \
--name mysql-slave-6h \
-e MYSQL_ROOT_PASSWORD=password \
--mount type=bind,source=/storage/mysql-slave-6h/datadir,target=/var/lib/mysql \
--mount type=bind,source=/storage/mysql-slave-6h/mysql.conf.d,target=/etc/mysql/mysql.conf.d \
--mount type=bind,source=/storage/mysql-slave-6h/sql,target=/docker-entrypoint-initdb.d \
mysql:5.7

** The MYSQL_ROOT_PASSWORD value must be the same as the MySQL root password on the master.

The following lines are what we are looking for to verify if MySQL is running correctly and connected as a slave to our master (192.168.55.171):

$ docker logs -f mysql-slave-6h
...
2018-12-04T04:05:24.890244Z 0 [Note] mysqld: ready for connections.
Version: '5.7.24-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
2018-12-04T04:05:25.010032Z 2 [Note] Slave I/O thread for channel '': connected to master 'rpl_user@192.168.55.171:3306',replication started in log 'FIRST' at position 4

You can then verify the replication status with the following statement:

$ docker exec -it mysql-slave-6h mysql -uroot -p -e 'show slave status\G'
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                    SQL_Delay: 21600
                Auto_Position: 1
...

At this point, our 15-minute, 1-hour and 6-hour delayed slave containers are replicating correctly and our architecture looks something like this:
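As a final sanity check, a small loop can confirm the running state and configured delay of all three containers at once (a sketch; passing the root password on the command line is for brevity only):

$ for c in mysql-slave-15m mysql-slave-1h mysql-slave-6h; do
    echo "== ${c}"
    docker exec ${c} mysql -uroot -ppassword -e 'SHOW SLAVE STATUS\G' | grep -E 'Slave_(IO|SQL)_Running:|SQL_Delay'
  done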

Disaster Recovery Scenario

Let's say a user has accidentally dropped a wrong column on a big table. Consider the following statement was executed on the master:

mysql> USE shop;
mysql> ALTER TABLE settings DROP COLUMN status;

If you are lucky enough to realize it immediately, you could use the 15-minute delayed slave to catch up to the moment right before the disaster and promote it to become the new master, or export the missing data and restore it on the master.

Firstly, we have to find the binary log position right before the disaster happened. Grab the current time with now() on the master:

mysql> SELECT now();
+---------------------+
| now()               |
+---------------------+
| 2018-12-04 14:55:41 |
+---------------------+

Then, get the active binary log file on the master:

mysql> SHOW MASTER STATUS;
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                                                                                                                     |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| binlog.000004 | 20260658 |              |                  | 1560665e-ed2b-11e8-93fa-000c29b7f985:1-12031,
1b235f7a-d37b-11e8-9c3e-000c29bafe8f:1-62519,
1d8dc60a-e817-11e8-82ff-000c29bafe8f:1-326575,
791748b3-d37a-11e8-b03a-000c29b7f985:1-374 |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Using the same date format, extract the information that we want from the binary log, binlog.000004. We estimate the start time to read from the binlog around 20 minutes ago (2018-12-04 14:35:00) and filter the output to show 25 lines before the "drop column" statement:

$ mysqlbinlog --start-datetime="2018-12-04 14:35:00" --stop-datetime="2018-12-04 14:55:41" /var/lib/mysql/binlog.000004 | grep -i -B 25 "drop column"
'/*!*/;
# at 19379172
#181204 14:54:45 server id 1  end_log_pos 19379232 CRC32 0x0716e7a2     Table_map: `shop`.`settings` mapped to number 766
# at 19379232
#181204 14:54:45 server id 1  end_log_pos 19379460 CRC32 0xa6187edd     Write_rows: table id 766 flags: STMT_END_F

BINLOG '
tSQGXBMBAAAAPAAAACC0JwEAAP4CAAAAAAEABnNidGVzdAAHc2J0ZXN0MgAFAwP+/gME/nj+PBCi
5xYH
tSQGXB4BAAAA5AAAAAS1JwEAAP4CAAAAAAEAAgAF/+AYwwAAysYAAHc0ODYyMjI0NjI5OC0zNDE2
OTY3MjY5OS02MDQ1NTQwOTY1Ny01MjY2MDQ0MDcwOC05NDA0NzQzOTUwMS00OTA2MTAxNzgwNC05
OTIyMzM3NzEwOS05NzIwMzc5NTA4OC0yODAzOTU2NjQ2MC0zNzY0ODg3MTYzOTswMTM0MjAwNTcw
Ni02Mjk1ODMzMzExNi00NzQ1MjMxODA1OS0zODk4MDQwMjk5MS03OTc4MTA3OTkwNQEAAADdfhim
'/*!*/;
# at 19379460
#181204 14:54:45 server id 1  end_log_pos 19379491 CRC32 0x71f00e63     Xid = 622405
COMMIT/*!*/;
# at 19379491
#181204 14:54:46 server id 1  end_log_pos 19379556 CRC32 0x62b78c9e     GTID    last_committed=11507    sequence_number=11508   rbr_only=no
SET @@SESSION.GTID_NEXT= '1560665e-ed2b-11e8-93fa-000c29b7f985:11508'/*!*/;
# at 19379556
#181204 14:54:46 server id 1  end_log_pos 19379672 CRC32 0xc222542a     Query   thread_id=3162  exec_time=1     error_code=0
SET TIMESTAMP=1543906486/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
ALTER TABLE settings DROP COLUMN status

In the bottom few lines of the mysqlbinlog output, you can see the erroneous command that was executed at position 19379556. The position we should restore to is one step before this, which is position 19379491. This is the binlog position we want our delayed slave to catch up to.

Then, on the chosen delayed slave, stop the slave threads and start the slave again with a fixed end position, the one we figured out above:

$ docker exec -it mysql-slave-15m mysql -uroot -p
mysql> STOP SLAVE;
mysql> START SLAVE UNTIL MASTER_LOG_FILE = 'binlog.000004', MASTER_LOG_POS = 19379491;

Monitor the replication status and wait until Exec_Master_Log_Pos is equal to the Until_Log_Pos value. This could take some time. Once caught up, you should see the following:

$ docker exec -it mysql-slave-15m mysql -uroot -p -e 'SHOW SLAVE STATUS\G'
... 
          Exec_Master_Log_Pos: 19379491
              Relay_Log_Space: 50552186
              Until_Condition: Master
               Until_Log_File: binlog.000004
                Until_Log_Pos: 19379491
...

Finally, verify that the missing data we were looking for is there (the column "status" still exists):

mysql> DESCRIBE shop.settings;
+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| id     | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| sid    | int(10) unsigned | NO   | MUL | 0       |                |
| param  | varchar(100)     | NO   |     |         |                |
| value  | varchar(255)     | NO   |     |         |                |
| status | int(11)          | YES  |     | 1       |                |
+--------+------------------+------+-----+---------+----------------+

Then export the table from our slave container and transfer it to the master server:

$ docker exec -it mysql-slave-15m mysqldump -uroot -ppassword --single-transaction shop settings > shop_settings.sql

Drop the problematic table and restore it back on the master:

$ mysql -uroot -p -e 'DROP TABLE shop.settings'
$ mysql -uroot -p shop < shop_settings.sql

We have now recovered our table back to its original state before the disastrous event. To summarize, delayed replication can be used for several purposes:

  • To protect against user mistakes on the master. A DBA can roll back a delayed slave to the time just before the disaster.
  • To test how the system behaves when there is a lag. For example, in an application, a lag might be caused by a heavy load on the slave. However, it can be difficult to generate this load level. Delayed replication can simulate the lag without having to simulate the load. It can also be used to debug conditions related to a lagging slave.
  • To inspect what the database looked like in the past, without having to reload a backup. For example, if the delay is one week and the DBA needs to see what the database looked like before the last few days' worth of development, the delayed slave can be inspected.

Final Thoughts

With Docker, running multiple MySQL instances on the same physical host can be done efficiently. You may use Docker orchestration tools like Docker Compose and Swarm to simplify the multi-container deployment, as opposed to the manual steps shown in this blog post; one possible Compose layout is sketched below.
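As an illustration, the 15-minute slave from this post could be described in a docker-compose.yml like this (our own untested sketch; the other two slaves would be analogous services in the same file):

version: '3'
services:
  mysql-slave-15m:
    image: mysql:5.7
    container_name: mysql-slave-15m
    environment:
      MYSQL_ROOT_PASSWORD: password
    volumes:
      - /storage/mysql-slave-15m/datadir:/var/lib/mysql
      - /storage/mysql-slave-15m/mysql.conf.d:/etc/mysql/mysql.conf.d
      - /storage/mysql-slave-15m/sql:/docker-entrypoint-initdb.d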

by ashraf at December 05, 2018 09:50 AM

December 04, 2018

Peter Zaitsev

Percona Server for MySQL 5.7.24-26 Is Now Available


Percona announces the release of Percona Server for MySQL 5.7.24-26 on December 4, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.7.24, including all the bug fixes in it. Percona Server for MySQL 5.7.24-26 is now the current GA release in the 5.7 series. All of Percona’s software is open-source and free.

This release includes fixes to the following upstream CVEs (Common Vulnerabilities and Exposures): CVE-2016-9843, CVE-2018-3155, CVE-2018-3143, CVE-2018-3156, CVE-2018-3251, CVE-2018-3133, CVE-2018-3144, CVE-2018-3185, CVE-2018-3247, CVE-2018-3187, CVE-2018-3174, CVE-2018-3171. For more information, see Oracle Critical Patch Update Advisory – October 2018.

Improvements

  • PS-4790: Improve user statistics accuracy

Bugs Fixed

  • Slave replication could break if upstream bug #74145 (FLUSH LOGS improperly disables the logging if the log file cannot be accessed) occurred in master. Bug fixed PS-1017 (Upstream #83232).
  • Setting the tokudb_last_lock_timeout variable via the command line could cause the server to stop working when the actual timeout took place. Bug fixed PS-4943.
  • Dropping a TokuDB table with non-alphanumeric characters could lead to a crash. Bug fixed PS-4979.
  • When using the MyRocks storage engine, the server could crash after running ALTER TABLE DROP INDEX on a slave. Bug fixed PS-4744.
  • The audit log could be corrupted when the audit_log_rotations variable was changed at runtime. Bug fixed PS-4950.

Other Bugs Fixed

  • PS-4781: sql_yacc.yy uses SQLCOM_SELECT instead of SQLCOM_SHOW_XXXX_STATS
  • PS-4881: Add LLVM/clang 7 to Travis-CI
  • PS-4825: Backport MTR fixes from 8.0
  • PS-4998: Valgrind: compilation fails with: writing to ‘struct buf_buddy_free_t’ with no trivial copy-assignment
  • PS-4980: Valgrind: Syscall param write(buf) points to uninitialised byte(s): Event_encrypter::encrypt_and_write()
  • PS-4982: Valgrind: Syscall param io_submit(PWRITE) points to uninitialised byte(s): buf_dblwr_write_block_to_datafile()
  • PS-4983: Valgrind: Syscall param io_submit(PWRITE) points to uninitialised byte(s): buf_flush_write_block_low()
  • PS-4951: Many libc-related Valgrind errors on CentOS7
  • PS-5012: Valgrind: misused UNIV_MEM_ALLOC after ut_zalloc_nokey
  • PS-4908: UBSan and valgrind errors with encrypted temporary files
  • PS-4532: Replace obsolete HAVE_purify with HAVE_VALGRIND in ha_rocksdb.cc
  • PS-4955: Backport mysqld fixes for valgrind warnings from 8.0
  • PS-4529: MTR: index_merge_rocksdb2 inadvertently tests InnoDB instead of MyRocks
  • PS-5056: handle_fatal_signal (sig=11) in ha_tokudb::write_row
  • PS-5084: innodb_buffer_pool_size is an uninitialized variable
  • PS-4836: Missing PFS signed variable aggregation
  • PS-5033: rocksdb.show_engine: Result content mismatch
  • PS-5034: rocksdb.rocksdb: Result content mismatch
  • PS-5035: rocksdb.show_table_status: 1051: Unknown table ‘db_new’

Find the release notes for Percona Server for MySQL 5.7.24-26 in our online documentation. Report bugs in the Jira bug tracker.


by Borys Belinsky at December 04, 2018 05:54 PM

MongoDB 4.0: Using ACID Multi-Document Transactions


MongoDB 4.0 is around, and there are a lot of new features and improvements. In this article we’re going to focus on the major feature, which is, undoubtedly, the support for multi-document ACID transactions. This novelty for a NoSQL database could be seen as a way to get closer to the relational world. Well, it’s not that, or maybe not just that. It’s a way to add to the document-based model a new, important, and often requested feature that addresses a wider range of use cases. The document model and its flexibility should remain the best way to start building an application on MongoDB. At this stage, transactions should be used in specific cases, when you absolutely need them: for example, when your application requires data consistency and atomicity. Transactions incur a greater performance cost than single-document writes, so the denormalized data model will continue to be optimal in many cases, and this helps to minimize the need for transactions.

Single writes are atomic by design: as long as you are able to embed documents in your collections you absolutely don’t need to use a transaction. Even so, transaction support is a very good and interesting feature that you can rely on in MongoDB from now on.

MongoDB 4.0 provides fully ACID transactions support but remember:

  • multi-document transactions are available for replica set deployments only
    • you can use transactions even on a standalone server but you need to configure it as a replica set (with just one node)
  • multi-document transactions are not available for sharded clusters
    • hopefully transactions will be available from version 4.2
  • multi-document transactions are available for the WiredTiger storage engine only

ACID transactions in MongoDB 4.0

ACID properties are well known in the world of relational databases, but let’s recap what the acronym means.

  • Atomicity: a group of commands inside the transaction must follow the “all or nothing” paradigm. If only one of the commands fails for any reason, the complete transaction fails as well.
  • Consistency: if a transaction successfully executes, it will take the database from one state that is consistent to another state that is also consistent.
  • Isolation: multiple transactions can run at the same time in the system. Isolation guarantees that each transaction is not able to view partial results of the others. Executing multiple transactions in parallel must have the same results as running them sequentially.
  • Durability: it guarantees that a transaction that has committed will remain persistent, even in the case of a system failure.

Limitations of transactions

The support for transactions introduced some limitations:

  • a collection MUST exist in order to use transactions
  • a collection cannot be created or dropped inside a transaction
  • an index cannot be created or dropped inside a transaction
  • non-CRUD operations are not permitted inside a transaction (for example, administrative commands like createUser are not permitted)
  • a transaction cannot read or write in config, admin, and local databases
  • a transaction cannot write to system.* collections
  • the size of a transaction is limited to 16MB
    • a single oplog entry is generated during the commit: the writes inside the transaction don’t have single oplog entries as in regular queries
    • the limitation is a consequence of the 16MB maximum size of any BSON document in the oplog
    • in case of larger transactions, you should consider splitting these into smaller transactions
  • by default, a transaction that executes for longer than 60 seconds will automatically expire
    • you can change this using the configuration parameter transactionLifetimeLimitSeconds, as shown in the sketch after this list
    • transactions rely on WiredTiger snapshot capability, and having a long running transaction can result in high pressure on WiredTiger’s cache to maintain snapshots, and lead to the retention of a lot of unflushed operations in memory
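For example, raising the limit to five minutes can be done at runtime (a sketch; 300 seconds is an arbitrary value):

foo:PRIMARY> db.adminCommand( { setParameter: 1, transactionLifetimeLimitSeconds: 300 } )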

Sessions

Sessions were introduced in version 3.6 in order to support retryable writes (for example), but they are very important for transactions, too. In fact, any transaction is associated with an open session. Prior to starting a transaction, a session must be created. A transaction cannot be run outside a session.

At any given time you may have multiple running sessions in the system, but each session may run only a single transaction at a time. You can run transactions in parallel according to how many open sessions you have.

Three new commands were introduced for creating, committing, and aborting transactions:

  • session.startTransaction()
    • starts a new transaction in the current session
  • session.commitTransaction()
    • saves consistently and durably the changes made by the operations in the transaction
  • session.abortTransaction()
    • the transaction ends without saving any of the changes made by the operations in the transaction

Note: in the following examples, we use two different connections to create two sessions. We do this for the sake of simplicity, but remember that you can create multiple sessions even inside a single connection, assigning each session to a different variable, as in the sketch below.
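A quick sketch of what that looks like from a single connection:

foo:PRIMARY> var session1 = db.getMongo().startSession()
foo:PRIMARY> var session2 = db.getMongo().startSession()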

Our first transaction

To test our first transaction, if you don’t have a replica set already configured, let’s start a standalone server as a single-node replica set like this (on first start you also need to run rs.initiate() once, so that the node becomes PRIMARY):

#> mongod --dbpath /data/db --logpath /data/mongo.log --fork --replSet foo

Create a new collection, and insert some data.

foo:PRIMARY> use percona
switched to db percona
foo:PRIMARY> db.createCollection('people')
{
   "ok" : 1,
   "operationTime" : Timestamp(1538483120, 1),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1538483120, 1),
      "signature" : {
         "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
         "keyId" : NumberLong(0)
       }
    }
}
foo:PRIMARY> db.people.insert([{_id:1, name:"Corrado"},{_id:2, name:"Peter"},{_id:3,name:"Heidi"}])

Create a session

foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("dcfa7de5-527d-4b1c-a890-53c9a355920d") }

Start a transaction and insert some new documents

foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.insert([{_id: 4 , name : "George"},{_id: 5, name: "Tom"}])
WriteResult({ "nInserted" : 2 })

Now read the collection from inside and outside the session and see what happens:

foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }

As you might notice, since the transaction is not yet committed, you can see the modifications only from inside the session. You cannot see any of the modifications outside of the session, even in the same connection. If you try to open a new connection to the database, then you will not be able to see any of the modifications either.

Now, commit the transaction and see that you can now read the same data both inside and outside the session, as well as from any other connection.

foo:PRIMARY> session.commitTransaction()
foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

When the transaction is committed, all the data is written consistently and durably to the database, just like any typical write. So, writing to the journal file and to the oplog takes place in the same way as for any single write that’s not inside a transaction. As long as the transaction is open, any modification is stored in memory.

Isolation test

Let’s now test the isolation between two concurrent transactions.

Open the first connection, create a session and start a transaction:

//Connection #1
foo:PRIMARY> var session1 = db.getMongo().startSession()
foo:PRIMARY> session1.startTransaction()

Do the same on the second connection:

//Connection #2
foo:PRIMARY> var session2 = db.getMongo().startSession()
foo:PRIMARY> session2.startTransaction()

On connection #1, update Heidi’s document by adding the gender field to it.

//Connection #1
foo:PRIMARY> session1.getDatabase("percona").people.update({_id:3},{$set:{ gender: "F" }})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

Update the same collection on connection #2 to add the same gender field to all the males:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.update({_id:{$in:[1,2,4,5]}},{$set:{ gender: "M" }},{multi:"true"})
WriteResult({ "nMatched" : 4, "nUpserted" : 0, "nModified" : 4 })
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

The two transactions are isolated, each one can see only the ongoing modifications that it has made itself.

Commit the transaction in connection #1:

//Connection #1
foo:PRIMARY> session1.commitTransaction()
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

In the connection #2 read the collection:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M"  }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M"  }
{ "_id" : 5, "name" : "Tom", "gender" : "M"  }

As you can see, the second transaction still sees its own modifications, and cannot see the already committed updates of the other transaction. This kind of isolation works the same way as the “REPEATABLE READ” isolation level of MySQL and other relational databases.

Now commit the transaction in connection #2 and see the new values of the collection:

//Connection #2
foo:PRIMARY> session2.commitTransaction()
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

Conflicts

When two (or more) concurrent transactions modify the same documents, we may have a conflict. MongoDB can detect a conflict immediately, even while the transactions are not yet committed. The first transaction to acquire the lock on a document will continue; the second one will receive the conflict error message and fail. The failed transaction can then be retried later.

Let’s see an example.

Create a new transaction in connection #1 to update Heidi’s document. We want to change the name to Luise.

//Connection #1
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Luise"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Let’s try to modify the same document in a concurrent transaction in connection #2. Modify the name from Heidi to Marie in this case.

//Connection #2
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Marie"}})
WriteCommandError({
    "errorLabels" : [
       "TransientTransactionError"
    ],
    "operationTime" : Timestamp(1538495683, 1),
    "ok" : 0,
    "errmsg" : "WriteConflict",
    "code" : 112,
    "codeName" : "WriteConflict",
    "$clusterTime" : {
       "clusterTime" : Timestamp(1538495683, 1),
       "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
       }
     }
})

We received an error and the transaction failed. We can retry it later, for example with a wrapper like the sketch below.
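A minimal retry helper for the mongo shell could look like this (a sketch: runWithRetry and txnFunc are names of our own invention, and we assume the operations inside throw on a write conflict, as the WriteCommandError above did):

function runWithRetry(session, txnFunc) {
    while (true) {
        session.startTransaction();
        try {
            txnFunc(session);             // all the writes of the transaction
            session.commitTransaction();  // persist the changes
            break;                        // success, leave the loop
        } catch (error) {
            session.abortTransaction();   // discard this attempt
            if (error.hasOwnProperty("errorLabels") &&
                error.errorLabels.indexOf("TransientTransactionError") >= 0) {
                print("TransientTransactionError, retrying ...");
                continue;                 // start the transaction over
            }
            throw error;                  // not retryable, propagate it
        }
    }
}

It could then be invoked, for example, as runWithRetry(session, function(s) { s.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Marie"}}); }).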

Other details

  • The individual writes inside the transaction are not retry-able, even if retryWrites is set to true.
  • Each commit operation is a retry-able write operation, regardless of whether retryWrites is set to true. The drivers retry the commit a single time in case of an error.
  • Read Concern supports the snapshot, local and majority values.
  • Write Concern can be set at the transaction level; the individual operations inside the transaction ignore it, and it is evaluated during the commit (see the sketch after this list).
  • Read Preference supports only the primary value.
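
For example, both concerns can be passed as options when the transaction is started; a minimal sketch, reusing session1 from the examples above:

//Set read and write concern for the whole transaction at start time
foo:PRIMARY> session1.startTransaction({readConcern: {level: "snapshot"}, writeConcern: {w: "majority"}})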

Conclusions

Transaction support in MongoDB 4.0 is a very interesting new feature, but it isn’t fully mature yet; there are strong limitations at this stage: a transaction cannot be larger than 16MB, you cannot use it on sharded clusters, and others. If you absolutely need a transaction in your application, use it. But don’t use transactions only because they are cool: in some cases a proper data model, based on embedding documents in collections and denormalizing your data, could be the best solution. MongoDB isn’t by its nature a relational database; as long as you are able to model your data keeping in mind that it’s a NoSQL database, you should avoid using transactions. In specific cases, or if you already have a database with strong “informal relations” between the collections that you cannot change, you could choose to rely on transactions.

Image modified from original photo: by Annie Spratt on Unsplash

by Corrado Pandiani at December 04, 2018 01:13 PM

December 03, 2018

Peter Zaitsev

Percona Live 2019 Call for Papers is Now Open!

Announcing the opening of the Percona Live 2019 Open Source Database Conference call for papers. It will be open from now until January 20, 2019. The Percona Live Open Source Database Conference 2019 takes place May 28-30 in Austin, Texas.

Our theme this year is CONNECT. ACCELERATE. INNOVATE.

As a speaker at Percona Live, you’ll have the opportunity to CONNECT with your peers—open source database experts and enthusiasts who share your commitment to improving knowledge and exchanging ideas. ACCELERATE your projects and career by presenting at the premier open source database event, a great way to build your personal and company brands. And influence the evolution of the open source software movement by demonstrating how you INNOVATE!

Community initiatives remain core to the open source ethos, and we are proud of the contribution we make with Percona Live in showcasing thought leading practices in the open source database world.

With a nod to innovation, this year we are introducing a business track to benefit those business leaders who are exploring the use of open source and are interested in learning more about its costs and benefits.

Speaking Opportunities

The Percona Live Open Source Database Conference 2019 Call for Papers is open until January 20, 2019. We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions. Classes and talks are invited for Foundation (either entry-level or of general interest to all), Core (intermediate), and Masterclass (advanced) levels.

  • Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
  • Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. We encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
  • Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

If your proposal is selected for breakout or tutorial sessions, you will receive a complimentary full conference pass.

Topics and Themes

We want proposals that cover the many aspects of application development using all open source databases, as well as new and interesting ways to monitor and manage database environments. Did you just embrace open source databases this year? What are the technical and business values of moving to or using open source databases? How did you convince your company to make the move? Was there tangible ROI?

Best practices and current trends, including design, application development, performance optimization, HA and clustering, cloud, containers and new technologies –  what’s holding your focus? Share your case studies, experiences and technical knowledge with an engaged audience of open source peers.

In the submission entry, indicate which of these themes your proposal best fits: tutorial, business needs; case studies/use cases; operations; or development. Also include which track(s) from the list below would be best suited to your talk.

Tracks

The conference committee is looking for proposals that cover the many aspects of using, deploying and managing open source databases, including:

  • MySQL. Do you have an opinion on what is new and exciting in MySQL? With the release of MySQL 8.0, are you using the latest features? How and why? Are they helping you solve any business issues, or making deployment of applications and websites easier, faster or more efficient? Did the new release influence you to change to MySQL? What do you see as the biggest impact of the MySQL 8.0 release? Do you use MySQL in conjunction with other databases in your environment?
  • MariaDB. Talks highlighting MariaDB and MariaDB compatible databases and related tools. Discuss the latest features, how to optimize performance, and demonstrate the best practices you’ve adopted from real production use cases and applications.
  • PostgreSQL. Why do you use PostgreSQL as opposed to other SQL options? Have you done a comparison or benchmark of PostgreSQL vs. other types of databases related to your applications? Why, and what were the results? How does PostgreSQL help you with application performance or deployment? How do you use PostgreSQL in conjunction with other databases in your environment?
  • MongoDB. Has the 4.0 release improved your experience in application development or time-to-market? How are the new features making your database environment better? What is it about MongoDB 4.0 that excites you? What are your experiences with Atlas? Have you moved to it, and has it lived up to its promises? Do you use MongoDB in conjunction with other databases in your environment?
  • Polyglot Persistence. How are you using multiple open source databases together? What tools and technologies are helping you to get them interacting efficiently? In what ways are multiple databases working together helping to solve critical business issues? What are the best practices you’ve discovered in your production environments?
  • Observability and Monitoring. How are you designing your database-powered applications for observability? What monitoring tools and methods are providing you with the best application and database insights for running your business? How are you using tools to troubleshoot issues and bottlenecks? How are you observing your production environment in order to understand the critical aspects of your deployments? 
  • Kubernetes. How are you running open source databases on the Kubernetes, OpenShift and other container platforms? What software are you using to facilitate their use? What best practices and processes are making containers a vital part of your business strategy? 
  • Automation and AI. How are you using automation to run databases at scale? Are you using automation to create self-running, self-healing, and self-tuning databases? Is machine learning and artificial intelligence (AI) helping you create a new generation of database automation?
  • Migration to Open Source Databases. How are you migrating to open source databases? Are you migrating on-premises or to the cloud? What are the tools and strategies you’ve used that have been successful, and what have you learned during and after the migration? Do you have real-world migration stories that illustrate how best to migrate?
  • Database Security and Compliance. All of us have experienced security and compliance challenges. From new legislation like GDPR, PCI and HIPAA, exploited software bugs, or new threats such as ransomware attacks, when is enough “enough”? What are your best practices for preventing incursions? How do you maintain compliance as you move to the cloud? Are you finding that security and compliance requirements are preventing your ability to be agile?
  • Other Open Source Databases. There are many, many great open source database software and solutions we can learn about. Submit other open source database talk ideas – we welcome talks for both established database technologies as well as the emerging new ones that no one has yet heard about (but should).
  • Business and Enterprise. Has your company seen big improvements in ROI from using Open Source Databases? Are there efficiency levels or interesting case studies you want to share? How did you convince your company to move to Open Source?

How to Respond to the Call for Papers

For information on how to submit your proposal, visit our call for papers page.

Sponsorship

If you would like to obtain a sponsor pack for Percona Live Open Source Database Conference 2019, you will find more information including a prospectus on our sponsorship page. You are welcome to contact me, Bronwyn Campbell, directly.

by Bronwyn Campbell at December 03, 2018 05:43 PM

December 02, 2018

Valeriy Kravchuk

Fun with Bugs #74 - On MySQL Bug Reports I am Subscribed to, Part XI

For some reason, of all my submitted talks, the Committee of the FOSDEM 2019 MySQL, MariaDB & Friends Devroom picked the one on how to create a useful MySQL bug report, so I have no option but to continue writing about MySQL bugs, as long as the MySQL Community wants and even prefers to listen and read about them... That's what I do, with pleasure.

Today I'll continue my series of posts about community bug reports I am subscribed to with the review of bugs reported since October 1, 2018, starting from the oldest and skipping those MySQL 8 regression ones I've already commented on:
  • Bug #92609 - "Upgrade to 8.0.12 fails". This bug reported by Frederic Steinfels is about an upgrade from MySQL 5.7 that leads to a crash. A nice workaround was found:
    "The work around is delete all .TRG file (or move them to out side of mysql data folder) then update. After success we can re-create the trigger."
    The bug is really fixed in MySQL 8.0.14 according to the last comment, but for some reason it is still "Verified". It will probably be closed when MySQL 8.0.14 is released.
  • Bug #92631 - "importing dump from mysqldump --all-databases breaks SYS schema due to routines". This bug affecting MySQL 5.7 (and not 8.0) was reported by Shane Bester. Workaround is actually documented in the manual - add sys schema explicitly while dumping and dump it separately, then re-create.
  • Bug #92661 - "SELECT on key partitioned enum reading all partitions instead of 1". Interesting corner case found by Frederic Steinfels. MariaDB 10.3.7 also seems affected:
    MariaDB [test]> explain partitions select id from product where outdated='0';
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    | id   | select_type | table   | partitions | type | possible_keys | key      | key_len | ref   | rows | Extra                    |
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    |    1 | SIMPLE      | product | p0,p1      | ref  | outdated      | outdated | 1       | const |    2 | Using where; Using index |
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    1 row in set (0.002 sec)

    MariaDB [test]> explain partitions select id from product where outdated='1';
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    | id   | select_type | table   | partitions | type | possible_keys | key      | key_len | ref   | rows | Extra                    |
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    |    1 | SIMPLE      | product | p0,p1      | ref  | outdated      | outdated | 1       | const | 2048 | Using where; Using index |
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    1 row in set (0.002 sec)
  • Bug #92809 - "Inconsistent ResultSet for different Execution Plans". The full test case is not public and it took a lot of arguing until this bug (reported by Juan Arruti) was finally "Verified". Based on the workaround, setting optimizer_switch='materialization=off', this feature of MySQL optimizer is still problematic.
  • Bug #92850 - "Bad select+order by+limit performance in 5.7". As Sveta Smirnova demonstrated, there are still cases when FORCE INDEX hints are really needed to help the optimizer use a proper plan, even in the somewhat obvious case of a single table with a single proper index... What a surprise!
  • Bug #92882 - "MTS not replication crash-safe with GTID and all the right parameters." As  Jean-François Gagné proved, at least statement-based multi-threaded replication with GTIDs is not safe in case of OS crash. Good that there is an easy enough workaround: stop slave; reset slave; start slave; See also his Bug #93081 - "Please implement a better relay log recovery." that refers to several more known problems with relay log recovery.
  • Bug #92949 - "add auto_increment column as PK cause RBR replication inconsistent". This probably should never happen in production (adding a primary key while the data is actively being changed concurrently), but still this is a nice corner case reported by Fungo Wang.
  • Bug #92996 - "ANALYZE TABLE still locks tables 10 years later". Domas Mituzas is trying hard to escalate this well known problem of blocking queries if ANALYZE TABLE was executed at some wrong time (when long running query against the table was in progress). The problem was resolved this year in Percona Server, see this blog post. See also my MDEV-15101 (fix is planned for version 10.4 in MariaDB).
  • Bug #93033 - "Missing info on partitioned tables in I_S.INNODB_COLUMNS after upgrade to 8.0". Yet another regression in MySQL 8 vs 5.7 was reported by Alexey Kopytov.
  • Bug #93049 - "ORDER BY pk, otherdata should just use PK". There is no reason to use filesort, as Domas Mituzas kindly noted. MariaDB is also, unfortunately, affected.
  • Bug #93083 - "InnoDB: Failing assertion: srv_read_only_mode || trx->in_depth > 0". This Severity 6 bug (only debug binaries are directly affected) was reported by Ramesh Sivaraman from Percona QA. At least it was verified instantly (I've subscribed to double check what happens to bugs with low severity levels).
I'll stop for now. More detailed review of remaining bugs reported in November is coming soon.

Sheep are everywhere in East Sussex and bugs are everywhere in MySQL. Not that many, but still they are visible.
To summarize my conclusions from this list:
  1. Sometimes it takes too much effort to force proper bug report processing. I have written more about this here.
  2. Having materialization=on in optimizer_switch in MySQL 5.7+ may cause wrong results. Take care to double check.
  3. There are still cases of single-table SELECTs where the optimizer can do a much better job. FORCE INDEX helps, sometimes.
  4. Multi-threaded statement-based replication in MySQL 5.6 and 5.7 is not crash safe, even with GTIDs, relay_log_info_repository=TABLE and relay_log_recovery=ON. A lot of improvements in relay log recovery are needed.
  5. Oracle engineers still care to document workarounds in active public bug reports. This is great!
  6. Percona still fixes some really important and annoying bugs way before other vendors.
  7. MySQL 8 is different in many small details, so regressions are to be expected.

by Valeriy Kravchuk (noreply@blogger.com) at December 02, 2018 04:30 PM

November 30, 2018

Peter Zaitsev

PostgreSQL Streaming Physical Replication With Slots

PostgreSQL streaming physical replication with slots simplifies setup and maintenance procedures. Usually, you would have to estimate disk usage for the Write Ahead Log (WAL), limit the number of retained segments appropriately, and set up a WAL archive procedure. In this article, you will see how to use replication with slots and understand what problems it can solve.

Introduction

PostgreSQL physical replication is based on WAL. The Write Ahead Log contains all database changes, saved in 16MB segment files. Normally Postgres keeps only the segments needed between checkpoints, so with default settings just 1GB of WAL segment files is available.

Replication requires all WAL files created after backup and up until the current time. Previously, it was necessary to keep a huge archive directory (usually mounted by NFS to all slave servers). The slots feature introduced in 9.4 allows Postgres to track the latest segment downloaded by a slave server. Now, PostgreSQL can keep all segments on disk, even without archiving, if a slave is seriously behind its master due to downtime or networking issues. The drawback: the disk space could be consumed infinitely in the case of configuration error. Before continuing, if you need a better understanding of physical replication and streaming replication, I recommend you read “Streaming Replication with PostgreSQL“.

Create a sandbox with two PostgreSQL servers

To set up replication, you need at least two PostgreSQL servers. I’m using pgcli (pgc) to set up both servers on the same host. It’s easy to install on Linux, Windows, and OS X, and it provides the ability to download and run any version of PostgreSQL on your staging server or even on your laptop.

python -c "$(curl -fsSL https://s3.amazonaws.com/pgcentral/install.py)"
mv bigsql master
cp -r master slave
$ cd master
master$ ./pgc install pg10
master$ ./pgc start pg10
$ cd ../slave
slave$ ./pgc install pg10
slave$ ./pgc start pg10

First of all you should allow the replication user to connect:

master$ echo "host replication replicator 127.0.0.1/32 md5" >> ./data/pg10/pg_hba.conf

If you are running master and slave on different servers, please replace 127.0.0.1 with the slave’s address.

Next pgc creates a shell environment file with PATH and all the other variables required for PostgreSQL:

master$ source ./pg10/pg10.env

Allow connections from the remote host, and create a replication user and slot on master:

master$ psql
postgres=# CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'replicator';
CREATE ROLE
postgres=# ALTER SYSTEM SET listen_addresses TO '*';
ALTER SYSTEM
postgres=# SELECT pg_create_physical_replication_slot('slot1');
pg_create_physical_replication_slot
-------------------------------------
(slot1,)

To apply system variables changes and hba.conf, restart the Postgres server:

master$ ./pgc stop ; ./pgc start
pg10 stopping
pg10 starting on port 5432

Test table

Create a table with lots of padding on the master:

master$ psql
psql (10.6)
Type "help" for help.
postgres=# CREATE TABLE t(id INT, pad CHAR(200));
postgres=# CREATE INDEX t_id ON t (id);
postgres=# INSERT INTO t SELECT generate_series(1,1000000) AS id, md5((random()*1000000)::text) AS pad;

Filling WAL with random data

To see the benefits of slots, we should fill the WAL with some data by running transactions. Repeat the update statement below to generate a huge amount of WAL data:

UPDATE t SET pad = md5((random()*1000000)::text);
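
To avoid pasting the statement by hand, a small shell sketch can run it repeatedly (assuming the pg10.env environment from above is loaded, so psql connects to the master):

master$ for i in 1 2 3 4; do psql -c "UPDATE t SET pad = md5((random()*1000000)::text);" ; done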

Checking the current WAL size

You can check total size for all WAL segments from the shell or from psql:

master$ du -sh data/pg10/pg_wal
17M data/pg10/pg_wal
master$ source ./pg10/pg10.env
master$ psql
postgres=# \! du -sh data/pg10/pg_wal
17M data/pg10/pg_wal

Check maximum WAL size without slots activated

Before replication configuration, we can fill the WAL with random data and find that after 1.1G, the data/pg10/pg_wal directory size does not increase regardless of the number of update queries.

postgres=# UPDATE t SET pad = md5((random()*1000000)::text); -- repeat 4 times
postgres=# \! du -sh data/pg10/pg_wal
1.1G data/pg10/pg_wal
postgres=# UPDATE t SET pad = md5((random()*1000000)::text);
postgres=# \! du -sh data/pg10/pg_wal
1.1G data/pg10/pg_wal

Backup master from the slave server

Next, let’s make a backup for our slot1:

slave$ source ./pg10/pg10.env
slave$ ./pgc stop pg10
slave$ rm -rf data/pg10/*
# If you are running master and slave on different servers, replace 127.0.0.1 with master's IP address.
slave$ PGPASSWORD=replicator pg_basebackup -S slot1 -h 127.0.0.1 -U replicator -p 5432 -D $PGDATA -Fp -P -Xs -Rv

Unfortunately, pg_basebackup hangs with: initiating base backup, waiting for checkpoint to complete.
We can wait for the next checkpoint, or force a checkpoint on the master. A checkpoint happens every checkpoint_timeout seconds (five minutes by default).

Forcing checkpoint on master:

master$ psql
postgres=# CHECKPOINT;

The backup continues on the slave side:

pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/92000148 on timeline 1
pg_basebackup: starting background WAL receiver
1073986/1073986 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/927FDDE8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed

The backup copies settings from the master, including its TCP port value. I’m running both master and slave on the same host, so I should change the port in the slave .conf file:

slave$ vim data/pg10/postgresql.conf
# old value port = 5432
port = 5433

Now we can return to the master and run some queries:

slave$ cd ../master
master$ source pg10/pg10.env
master$ psql
postgres=# UPDATE t SET pad = md5((random()*1000000)::text);
UPDATE t SET pad = md5((random()*1000000)::text);

After running these queries, the WAL size is now 1.4G, bigger than the 1.1G we saw earlier! Repeat this update query three times and the WAL grows to 2.8GB:

master$ du -sh data/pg10/pg_wal
2.8G data/pg10/pg_wal

Certainly, the WAL could grow infinitely until the whole disk is consumed.
How do we find out the reason for this?

postgres=# SELECT redo_lsn, slot_name,restart_lsn,
round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind
FROM pg_control_checkpoint(), pg_replication_slots;
redo_lsn    | slot_name | restart_lsn | gb_behind
------------+-----------+-------------+-----------
1/2A400630  | slot1     |  0/92000000 | 2.38

We have one slot that is 2.38GB behind the master.

Let’s repeat the update and check again. The gap has increased:

postgres=# SELECT redo_lsn, slot_name,restart_lsn,
round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind
FROM pg_control_checkpoint(), pg_replication_slots;
redo_lsn    | slot_name | restart_lsn | gb_behind
------------+-----------+-------------+-----------
1/8D400238  |     slot1 | 0/92000000  | 3.93

Wait, though: we have already used slot1 for backup! Let’s start the slave:

master$ cd ../slave
slave$ ./pgc start pg10

Replication started without any additional change to recovery.conf:

slave$ cat data/pg10/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=replicator password=replicator passfile=''/home/pguser/.pgpass'' host=127.0.0.1 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
primary_slot_name = 'slot1'

The pg_basebackup -R option instructs the backup to write a recovery.conf file with all the required options, including primary_slot_name.

WAL size, all slots connected

The gap reduced several seconds after the slave started:

postgres=# SELECT redo_lsn, slot_name,restart_lsn,
round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind
FROM pg_control_checkpoint(), pg_replication_slots;
redo_lsn    | slot_name | restart_lsn | gb_behind
------------+-----------+-------------+-----------
 1/8D400238 |     slot1 |  0/9A000000 | 3.80

And a few minutes later:

postgres=# SELECT redo_lsn, slot_name,restart_lsn,
round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind
FROM pg_control_checkpoint(), pg_replication_slots;
redo_lsn    | slot_name | restart_lsn | gb_behind
------------+-----------+-------------+-----------
 1/9E5DECE0 |     slot1 |  1/9EB17250 | -0.01
postgres=# \! du -sh data/pg10/pg_wal
1.3G data/pg10/pg_wal

Slave server maintenance

Let’s simulate slave server maintenance with ./pgc stop pg10 executed on the slave. We’ll push some data onto the master again (execute the UPDATE query 4 times).

Now, “slot1” is again 2.36GB behind.

Removing unused slots

By now, you might realize that the problematic slot is not in use. In such cases, you can drop it to allow the retained WAL segments to be recycled:

master$ psql
postgres=# SELECT pg_drop_replication_slot('slot1');

Finally the disk space is released:

master$ du -sh data/pg10/pg_wal
1.1G data/pg10/pg_wal

Important system variables

  • archive_mode – not required for streaming replication with slots
  • wal_level – replica by default
  • max_wal_senders – 10 by default; a minimum of three is needed for one slave, plus two for each additional slave
  • wal_keep_segments – 32 by default; not important, because PostgreSQL will keep all segments required by a slot
  • archive_command – not important for streaming replication with slots
  • listen_addresses – the only option that is necessary to change, to allow remote slaves to connect
  • hot_standby – on by default; important to enable reads on the slave
  • max_replication_slots – 10 by default (https://www.postgresql.org/docs/10/static/runtime-config-replication.html); a combined sketch of these settings follows below
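
As a minimal sketch, the relevant master settings in postgresql.conf could look like this (only listen_addresses differs from the pg10 defaults):

listen_addresses = '*'       # the only setting that has to be changed
wal_level = replica          # default
max_wal_senders = 10         # default
max_replication_slots = 10   # default
hot_standby = on             # default; enables reads on the slave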

Summary

  • Physical replication setup is really easy with slots. By default in pg10, all settings are already prepared for replication setup.
  • Be careful with orphaned slots. PostgreSQL will not remove WAL segments for inactive slots with initialized restart_lsn.
  • Check pg_replication_slots restart_lsn value and compare it with current redo_lsn.
  • Avoid long downtime for slave servers with slots configured.
  • Please use meaningful names for slots, as that will simplify debugging.

by Nickolay Ihalainen at November 30, 2018 05:05 PM

Jean-Jerome Schmidt

Database Backup Encryption - Best Practices

Offsite backup storage should be a critical part of any organisation’s disaster recovery plan. The ability to store data in a separate physical location, where it can survive a catastrophic event that destroys all the data in your primary data center, ensures the survival of your data and the continuity of your organisation. A cloud storage service is quite a good method to store offsite backups. No matter if you are using a cloud provider or just copying data to an external data center, backup encryption is a must in such cases. In one of our previous blogs, we discussed several methods of encrypting your backups. Today we will focus on some best practices around backup encryption.

Ensure that your secrets are safe

To encrypt and decrypt your data you have to use some sort of password or key. Depending on the encryption method (symmetric or asymmetric), it can be one secret for both encryption and decryption, or a public key for encryption and a private key for decryption. Whatever the method, you should keep those secrets safe. If you happen to use asymmetric encryption, you should focus on the private key, the one you will use for decrypting backups.

You can store keys in a key management system or a vault - there are numerous options on the market to pick from, like Amazon’s KMS or Hashicorp’s Vault. Even if you decide not to use those solutions, you should still apply generic security practices, like ensuring that only the correct users can access your keys and passwords. You should also prepare your backup scripts in a way that does not expose keys or passwords in the list of running processes. Ideally, put them in a file instead of passing them as an argument to some command, as in the sketch below.
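
A minimal sketch of the difference, using openssl (the file names and passphrase are assumptions for illustration):

# Bad: the passphrase is visible in the process list while openssl runs
$ openssl enc -aes-256-cbc -salt -in backup.tar.gz -out backup.tar.gz.enc -k MySecretPass

# Better: keep the passphrase in a root-only file and reference it
$ chmod 600 /etc/keys/backup.pass
$ openssl enc -aes-256-cbc -salt -in backup.tar.gz -out backup.tar.gz.enc -pass file:/etc/keys/backup.pass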

Consider asymmetric encryption

The main difference between symmetric and asymmetric encryption is that with symmetric encryption you use a single key or password for both encryption and decryption. This requires higher security standards on both ends of the process: you have to make sure that the host on which you encrypt the data is very secure, as a leak of the symmetric encryption key will allow access to all of your encrypted backups.

On the other hand, if you use asymmetric encryption, you have two keys: the public key for encrypting the data and the private key for decryption. This makes things so much easier - you don’t really have to care about the public key. Even if it were compromised, it would still not allow any kind of access to the data in the backups. You have to focus on the security of the private key only. It is easier - you are most likely encrypting backups on a daily basis (if not more frequently), while restores happen from time to time, making it feasible to store the private key in a more secure location (even on a dedicated physical device). Below is a very quick example of how you can use gpg to generate a key pair and use it to encrypt data.

First, you have to generate the keys:

root@vagrant:~# gpg --gen-key
gpg (GnuPG) 1.4.20; Copyright (C) 2015 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

gpg: directory `/root/.gnupg' created
gpg: new configuration file `/root/.gnupg/gpg.conf' created
gpg: WARNING: options in `/root/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/root/.gnupg/secring.gpg' created
gpg: keyring `/root/.gnupg/pubring.gpg' created
Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection?
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 4096
Requested keysize is 4096 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Krzysztof Ksiazek
Email address: my@backups.cc
Comment: Backup key
You selected this USER-ID:
    "Krzysztof Ksiazek (Backup key) <my@backups.cc>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o
You need a Passphrase to protect your secret key.

This created both public and private keys. Next, you want to export your public key to use for encrypting the data:

gpg --armor --export my@backups.cc > mybackupkey.asc

Next, you can use it to encrypt your backup.

root@vagrant:~# xtrabackup  --backup --stream=xbstream  | gzip | gpg -e --armor -r my@backups.cc -o /backup/pgp_encrypted.backup

Finally, an example of how you can use your private key (in this case it’s stored in the local key ring) to decrypt your backups:

root@vagrant:/backup# gpg -d /backup/pgp_encrypted.backup | gunzip | xbstream -x
encryption: using gcrypt 1.6.5

You need a passphrase to unlock the secret key for
user: "Krzysztof Ksiazek (Backup key) <my@backups.cc>"
4096-bit RSA key, ID E047CD69, created 2018-11-19 (main key ID BC341551)

gpg: gpg-agent is not available in this session
gpg: encrypted with 4096-bit RSA key, ID E047CD69, created 2018-11-19
      "Krzysztof Ksiazek (Backup key) <my@backups.cc>"

Rotate your encryption keys

No matter what kind of encryption you implemented, symmetric or asymmetric, you have to think about key rotation. First of all, it is very important to have a mechanism in place to rotate the keys. This is useful in case of a security breach, when you would have to quickly change the keys that you use for backup encryption and decryption. Of course, in case of a security breach you need to consider what is going to happen with the old backups which were encrypted using the compromised keys. They have been compromised, although they still may be useful and required as per your Recovery Point Objective. There are a couple of options, including re-encrypting them or moving them to a non-compromised location.
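
A minimal re-encryption sketch with gpg, reusing the dump from the earlier example (the new recipient address is an assumption):

# Decrypt with the old, compromised key and immediately re-encrypt
# with the new one, without writing the plain backup to disk
$ gpg --decrypt db1.tar.gz.gpg | gpg --encrypt -r 'newkey@backups.cc' -o db1.tar.gz.gpg.new
$ mv db1.tar.gz.gpg.new db1.tar.gz.gpg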

Speed up the encryption process by parallelizing it

If you have an option to parallelize the encryption process, consider it. Encryption performance mostly depends on CPU power, so allowing more CPU cores to work in parallel on encrypting the file should result in much shorter encryption times. Some encryption tools provide such an option. One of them is xtrabackup, which has an option to use embedded encryption and parallelize the process.

What you are looking for is either the “--encrypt-key” or the “--encrypt-key-file” option, which enables embedded encryption. While doing that, you can also define “--encrypt-threads” and “--encrypt-chunk-size”: the former defines how many threads should be used for encryption, the latter increases the working buffer for encryption. A sketch follows below.
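
A sketch of a parallel, embedded-encryption backup (the key file and target directory are assumptions):

$ xtrabackup --backup --target-dir=/backup/encrypted \
    --encrypt=AES256 --encrypt-key-file=/etc/keys/xtrabackup.key \
    --encrypt-threads=4 --encrypt-chunk-size=512K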

Of course, this is just one of the solutions you can implement. You can also achieve parallelization using shell tools. An example below:

root@vagrant:~# files=2 ; mariabackup --user=root --backup --pass=pass --stream=xbstream  |split -b 60M - backup ; ls backup* |  parallel -j ${files} --workdir "$(pwd)" 'echo "encrypting {}" ; openssl  enc -aes-256-cbc -salt -in "{}" -k mypass > "111{}"'

This is by no means a perfect solution, as you have to know in advance, more or less, how big the backup will be, to split it into a predefined number of files matching the parallelization level you want to achieve (if you want to use 2 CPU cores, you should have two files; if you want to use 4 cores, 4 files, etc). It also requires disk space that is twice the size of the backup: at first it generates multiple files using split, and then encryption creates another set of encrypted files. On the other hand, if your data set size is acceptable and you would like to improve encryption performance, that’s an option you can consider. To decrypt the backup, you will have to decrypt each of the individual files and then use ‘cat’ to join them together.
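
A decryption sketch matching the naming scheme of the example above (the passphrase and file names follow that example):

# Decrypt each encrypted chunk, stripping the "111" prefix again,
# then join the chunks back into a single xbstream file
$ for f in 111backup*; do openssl enc -d -aes-256-cbc -in "$f" -k mypass > "${f#111}.dec" ; done
$ cat backup*.dec > backup_restored.xbstream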

Test your backups

No matter how you are going to implement backup encryption, you have to test it. First of all, all backups have to be tested, encrypted or not. Backups may not be complete, or may suffer from some type of corruption. You cannot be sure that your backup can be restored until you actually perform a restore. That’s why regular backup verification is a must. Encryption adds more complexity to the backup process. Issues may show up at encryption time; again, bugs or glitches may corrupt the encrypted files. Once encrypted, the question is whether it is still possible to decrypt the backup and restore it.

You should have a restore test process in place. Ideally, the restore test would be executed after each backup. As a minimum, you should test your backups a couple of times per year. Definitely, you have to test them as soon as a change in the backup process has been introduced. Have you added compression to the backup? Did you change the encryption method? Did you rotate the encryption key? All of those actions may have some impact on your ability to actually restore your backup. Therefore, you should make sure you test the whole process after every change.

ClusterControl can automate the verification process, either on demand or scheduled after every backup.

To verify an existing backup, you just need to pick the one from the list, click on “Restore” option and then go through the restore wizard. First, you need to verify which backup you want to restore.

Then, on the next step, you should pick the restore and verify option.

You need to pass some information about the host on which you want to test the restore. It has to be accessible via SSH from the ClusterControl instance. You may decide to keep the restore test server up and running (and then dump some partial data from it if you wanted to go for a partial restore) or shut it down.

The final step is all about verifying if you made the correct choices. If yes, you can start the backup verification job.

If the verification completed successfully, you will see that the backup is marked as verified on the list of the backups.

If you want to automate this process, it is also possible with ClusterControl. When scheduling the backup you can enable backup verification:

This adds another step in the backup scheduling wizard.

Here you again have to define the host which you want to use for backup restore tests, decide if you want to install the software on it (or maybe you already have it done), if you want to keep the restore server up and whether you want to test the backup immediately after it is completed or maybe you want to wait a bit.

by krzysztof at November 30, 2018 10:29 AM

November 29, 2018

Oli Sennhauser

MariaDB indexing of NULL values

In the recent MariaDB DBA advanced training class the question came up if MariaDB can make use of an index when searching for NULL values... And to be honest I was not sure any more. So instead of reading boring documentation I did some little tests:

Search for NULL

First I started with a little test data set. Some of you might already know it:

CREATE TABLE null_test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, data VARCHAR(32) DEFAULT NULL
, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() 
);

INSERT INTO null_test VALUES (NULL, 'Some data to show if null works', NULL);
INSERT INTO null_test SELECT NULL, 'Some data to show if null works', NULL FROM null_test;
... up to 1 Mio rows

Then I modified the data according to my needs to see if the MariaDB Optimizer can make use of the index:

-- Set 0.1% of the rows to NULL
UPDATE null_test SET data = NULL WHERE ID % 1000 = 0;

ALTER TABLE null_test ADD INDEX (data);

ANALYZE TABLE null_test;

and finally I run the test (MariaDB 10.3.11):

EXPLAIN EXTENDED
SELECT * FROM null_test WHERE data IS NULL;
+------+-------------+-----------+------+---------------+------+---------+-------+------+----------+-----------------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra                 |
+------+-------------+-----------+------+---------------+------+---------+-------+------+----------+-----------------------+
|    1 | SIMPLE      | null_test | ref  | data          | data | 35      | const | 1047 |   100.00 | Using index condition |
+------+-------------+-----------+------+---------------+------+---------+-------+------+----------+-----------------------+

We can clearly see that the MariaDB Optimizer considers and uses the index and its estimation of about 1047 rows is quite appropriate.

Unfortunately the optimizer chooses the completely wrong strategy (3 times slower) for the opposite query:

EXPLAIN EXTENDED
SELECT * FROM null_test WHERE data = 'Some data to show if null works';
+------+-------------+-----------+------+---------------+------+---------+-------+--------+----------+-----------------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra                 |
+------+-------------+-----------+------+---------------+------+---------+-------+--------+----------+-----------------------+
|    1 | SIMPLE      | null_test | ref  | data          | data | 35      | const | 522351 |   100.00 | Using index condition |
+------+-------------+-----------+------+---------------+------+---------+-------+--------+----------+-----------------------+

Search for NOT NULL

Now let us try to test the opposite problem:

CREATE TABLE anti_null_test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, data VARCHAR(32) DEFAULT NULL
, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
);

INSERT INTO anti_null_test VALUES (NULL, 'Some data to show if null works', NULL);
INSERT INTO anti_null_test SELECT NULL, 'Some data to show if null works', NULL FROM anti_null_test;
... up to 1 Mio rows

Then I modified the data as well but this time in the opposite direction:

-- Set 99.9% of the rows to NULL
UPDATE anti_null_test SET data = NULL WHERE ID % 1000 != 0;

ALTER TABLE anti_null_test ADD INDEX (data);

ANALYZE TABLE anti_null_test;

and then we have to test again the query:

EXPLAIN EXTENDED
SELECT * FROM anti_null_test WHERE data IS NOT NULL;
+------+-------------+----------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| id   | select_type | table          | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                 |
+------+-------------+----------------+-------+---------------+------+---------+------+------+----------+-----------------------+
|    1 | SIMPLE      | anti_null_test | range | data          | data | 35      | NULL | 1047 |   100.00 | Using index condition |
+------+-------------+----------------+-------+---------------+------+---------+------+------+----------+-----------------------+

Also in this case the MariaDB Optimizer considers and uses the index and produces a quite fast Query Execution Plan.

Also in this case the optimizer chooses the wrong strategy for the opposite query:

EXPLAIN EXTENDED
SELECT * FROM anti_null_test WHERE data IS NULL;
+------+-------------+----------------+------+---------------+------+---------+-------+--------+----------+-----------------------+
| id   | select_type | table          | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra                 |
+------+-------------+----------------+------+---------------+------+---------+-------+--------+----------+-----------------------+
|    1 | SIMPLE      | anti_null_test | ref  | data          | data | 35      | const | 523506 |   100.00 | Using index condition |
+------+-------------+----------------+------+---------------+------+---------+-------+--------+----------+-----------------------+

by Shinguz at November 29, 2018 07:10 PM

Peter Zaitsev

Percona Server for MySQL 5.6.42-84.2 Is Now Available

Percona announces the release of Percona Server 5.6.42-84.2 on November 29, 2018 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.6.42, including all the bug fixes in it, Percona Server 5.6.42-84.2 is the current GA release in the Percona Server 5.6 series. All of Percona‘s software is open-source and free.

Improvements

  • PS-4790: Improve user statistics accuracy

Bugs Fixed

  • Slave replication could break if upstream bug #74145 (FLUSH LOGS improperly disables the logging if the log file cannot be accessed) occurred on the master. Bug fixed PS-1017 (Upstream #83232).
  • The binary log could be corrupted when the disk partition used for temporary files (tmpdir system variable) had little free space. Bug fixed PS-1107 (Upstream #72457).
  • PURGE CHANGED_PAGE_BITMAPS did not work when the innodb_data_home_dir system variable was used. Bug fixed PS-4723.
  • Setting the tokudb_last_lock_timeout variable via the command line could cause the server to stop working when the actual timeout took place. Bug fixed PS-4943.
  • Dropping TokuDB table with non-alphanumeric characters could lead to a crash. Bug fixed PS-4979.

Other bugs fixed

  • PS-4781: sql_yacc.yy uses SQLCOM_SELECT instead of SQLCOM_SHOW_XXXX_STATS
  • PS-4529: MTR: index_merge_rocksdb2 inadvertently tests InnoDB instead of MyRocks
  • PS-4746: Revert our fix for PS-3851 (Percona Ver 5.6.39-83.1 Failing assertion: sym_node->table != NULL)
  • PS-4773: Percona Server sources can’t be compiled without server
  • PS-4785: Setting version_suffix to NULL leads to handle_fatal_signal (sig=11) in Sys_var_version::global_value_ptr
  • PS-4813: Using flush_caches leads to SELinux denial errors
  • PS-4881: Add LLVM/clang 7 to Travis-CI

Find the release notes for Percona Server for MySQL 5.6.42-84.2 in our online documentation. Report bugs in the Jira bug tracker.

 

by Borys Belinsky at November 29, 2018 05:21 PM

Jean-Jerome Schmidt

Cloud Backup Options for MySQL & MariaDB Databases

The principal objective of backing up your data is, of course, the ability to roll back and access your archives in case of hardware failure. To do business today, you need the certainty of knowing that in the case of disaster, your data will be protected and accessible. You would need to store your backups offsite, in case your datacenter goes down in flames.

Data protection remains a challenge for small and medium-sized businesses, which tend to archive their company data using direct-attached storage, with the majority of firms planning to make offsite backup copies. Such a local storage approach can lead to one of the most severe dilemmas a modern company can face - loss of data in case of disaster.

Many factors come into deliberation when deciding whether to allow a business-critical database to be transferred offsite, and when choosing a suitable vendor to do so. Traditional methods, like writing to tape and shipping to a remote location, can be a complicated process that requires special hardware, adequately trained staff, and procedures to ensure that backups are regularly produced, protected, and verified for integrity. Small businesses usually have small IT budgets; often they cannot afford a secondary datacenter, even if they have a dedicated one. Nevertheless, it is still important to keep a copy of your backup files offsite. Disasters like hurricane, flood, fire or theft can destroy your servers and storage. Keeping backed-up data in a separate datacenter ensures data is safe, no matter what is going on in your primary datacenter. Cloud storage is a great way of addressing this problem.
With the cloud backup approach, there are a number of factors to consider. Some of the questions you may have are:

  • Is backed-up data secured at rest in the external data center?
  • Is transfer to or from the external data center through the public internet network safe?
  • Is there an effect on RTO (Recovery Time Objective)?
  • Is the backup and recovery process easy enough for our IT staff?
  • Are there any changes required to existing processes?
  • Are the 3rd party backup tools needed?
  • What are the additional costs in terms of required software or data transfer?
  • What are the storage costs?

Backup features when doing a backup to the cloud

If your MySQL server or backup destination is located in an exposed infrastructure like a public cloud, a hosting provider, or connected through an untrusted WAN network, you need to think about additional actions in your backup policy. There are a few different ways to perform database backups for MySQL, and depending on the type of backup, the recovery time, size, and infrastructure options will vary. Since many of the cloud storage solutions are simply storage with different API front ends, any backup solution can be implemented with a bit of scripting. So what are the options we have to make the process smooth and secure?

Encryption

It is always a good idea to enforce encryption to enhance the security of backup data. A simple use case to implement encryption is where you want to push the backup to an offsite backup storage located in the public cloud.

When creating an encrypted backup, one thing to keep in mind is that it usually takes more time to recover. The backup has to be decrypted before any recovery activities. With a big dataset, this could introduce some delays to the RTO.

On the other hand, if you are using a private key for encryption, make sure to store the key in a safe place. If the private key is missing, the backup will be useless and unrecoverable. If the key is stolen, all created backups that use the same key are compromised, as they are no longer secured. You can use the popular GnuPG or OpenSSL to generate private or public keys.
To perform mysqldump encryption using GnuPG, generate a private key and follow the wizard accordingly:

$ gpg --gen-key

Create a plain mysqldump backup as usual:

$ mysqldump --routines --events --triggers --single-transaction db1 | gzip > db1.tar.gz

Encrypt the dump file and remove the older plain backup:

$ gpg --encrypt -r ‘admin@email.com’ db1.tar.gz
$ rm -f db1.tar.gz

GnuPG will automatically append the .gpg extension to the encrypted file. To decrypt, simply run the gpg command with the --decrypt flag:

$ gpg --output db1.tar.gz --decrypt db1.tar.gz.gpg

To create an encrypted mysqldump using OpenSSL, one has to generate a private key and a public key:

openssl req -x509 -nodes -newkey rsa:2048 -keyout dump.priv.pem -out dump.pub.pem

This private key (dump.priv.pem) must be kept in a safe place for future decryption. For mysqldump, an encrypted backup can be created by piping the content to openssl, for example:

mysqldump --routines --events --triggers --single-transaction database | openssl smime -encrypt -binary -text -aes256 -out database.sql.enc -outform DER dump.pub.pem

To decrypt, simply use the private key (dump.priv.pem) alongside the -decrypt flag:

openssl smime -decrypt -in database.sql.enc -binary -inform DER -inkey dump.priv.pem -out database.sql

Percona XtraBackup can be used to encrypt or decrypt local or streaming backups with the xbstream option to add another layer of protection to the backups. Encryption is done with the libgcrypt library. Both the --encrypt-key option and the --encrypt-key-file option can be used to specify the encryption key. Encryption keys can be generated with commands like:

$ openssl rand -base64 24
$ bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1

This value then can be used as the encryption key. Example of the innobackupex command using the --encrypt-key:

$ innobackupex --encrypt=AES256 --encrypt-key=”bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1” /storage/backups/encrypted

The output of the above OpenSSL command can also be redirected to a file and can be treated as a key file:

openssl rand -base64 24 > /etc/keys/pxb.key

Use it with the --encrypt-key-file option instead:

innobackupex --encrypt=AES256 --encrypt-key-file=/etc/keys/pxb.key /storage/backups/encrypted

To decrypt, simply use the --decrypt option with appropriate --encrypt-key or --encrypt-key-file:

$ innobackupex --decrypt=AES256 --encrypt-key=”bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1”
/storage/backups/encrypted/2018-11-18_11-10-09/

For more information about MySQL and MariaDB encryption, please check our another blog post.

Compression

Within the database cloud backup world, compression is one of your best friends. It can not only save storage space, but it can also significantly reduce the time required to download/upload data.
There are lots of compression tools available out there, namely gzip, bzip2, zip, rar, and 7z.
Normally, mysqldump achieves the best compression rates as it produces a flat text file. Depending on the compression tool and ratio, a compressed mysqldump can be up to 6 times smaller than the original backup size. To compress the backup, you can pipe the mysqldump output to a compression tool and redirect it to a destination file. You can also skip several things like comments, the lock tables statement (if InnoDB), GTID purged and triggers:

mysqldump --single-transaction --skip-comments --skip-triggers --skip-lock-tables --set-gtid-purged OFF --all-databases | gzip > /storage/backups/all-databases.sql.gz

With Percona Xtrabackup, you can use the streaming mode (innobackupex), which sends the backup to STDOUT in the special tar or xbstream format instead of copying files to the backup directory. Having a compressed backup could save you up to 50% of the original backup size, depending on the dataset. Append the --compress option to the backup command. By using xbstream in streaming backups, you can speed up the compression process with the --compress-threads option. This option specifies the number of threads created by xtrabackup for parallel data compression. The default value for this option is 1. To use this feature, simply add the option to a local backup. An example backup with compression:

innobackupex --stream=xbstream --compress --compress-threads=4 > /storage/backups/backup.xbstream

Before applying logs during the preparation stage, compressed files will need to be unpacked using xbstream:

$ xbstream -x < /storage/backups/backup.xbstream

Then, use qpress to extract each file ending with .qp in its respective directory before running the --apply-log command to prepare the MySQL data.
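
The qpress step is commonly scripted; a sketch (it assumes the qpress binary is installed and is run from the backup directory):

$ for f in $(find . -iname "*.qp"); do qpress -d "$f" "$(dirname "$f")" && rm -f "$f"; done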

Limit network throughput

A great option for cloud backups is to limit the network streaming bandwidth (Mb/s) when doing a backup. You can achieve that with the pv tool. The pv utility comes with the data modifier option -L RATE, --rate-limit RATE, which limits the transfer to a maximum of RATE bytes per second. The example below will restrict it to 2MB/s.

$ pv -q -L 2m

In the example below, you can see xtrabackup with parallel compression (pigz), throttled with pv, and encrypted with openssl:

/usr/bin/innobackupex --defaults-file=/etc/mysql/my.cnf --galera-info --parallel 4 --stream=xbstream --no-timestamp . | pv -q -L 2m | pigz -9 - | openssl enc -aes-256-cbc -pass file:/var/tmp/cmon-008688-19992-72450efc3b6e9e4f.tmp > /home/ubuntu/backups/BACKUP-3445/backup-full-2018-11-28_213540.xbstream.gz.aes256

Transfer backup to Cloud

Now when your backup is compressed and encrypted, it is ready for transfer.

Google cloud

The gsutil command line tool is used to manage, monitor and use your storage buckets on Google Cloud Storage. If you already installed the gcloud util, you already have the gsutil installed. Otherwise, follow the instructions for your Linux distribution from here.

To install the gcloud CLI you can follow below procedure:

curl https://sdk.cloud.google.com | bash

Restart your shell:

exec -l $SHELL

Run gcloud init to initialize the gcloud environment:

gcloud init

With the gsutil command line tool installed and authenticated, create a regional storage bucket named mysql-backups-storage in your current project:

gsutil mb -c regional -l europe-west1 gs://mysql-backups-storage/
Creating gs://mysql-backups-storage/

Amazon S3

If you are not using RDS to host your databases, it is very probable that you are doing your own backups. Amazon’s AWS platform, S3 (Amazon Simple Storage Service), is a data storage service that can be used to store database backups or other business-critical files. Whether it’s an Amazon EC2 instance or your on-prem environment, you can use the service to secure your data.

While backups can be uploaded through the web interface, the dedicated s3 command line interface can be used to do it from the command line and through backup automation scripts. If backups are to be kept for a very long time and recovery time isn’t a concern, backups can be transferred to the Amazon Glacier service, providing much cheaper long-term storage. Files (Amazon objects) are logically stored in a huge flat container named a bucket. S3 presents a REST interface to its internals. You can use this API to perform CRUD operations on buckets and objects, as well as to change permissions and configurations on both.

The primary distribution method for the AWS CLI on Linux, Windows, and macOS is pip, a package manager for Python. Instructions can be found here.

aws s3 cp severalnines.sql s3://severalnine-sbucket/mysql_backups

By default, S3 provides eleven 9s of object durability. It means that if you store 1,000,000,000 (1 billion) objects in it, you can expect to lose about 1 object every 100 years on average. The way S3 achieves that impressive number of 9s is by replicating the object automatically across multiple Availability Zones, which we’ll talk about in another post. Amazon has regional datacenters all around the world.

Microsoft Azure Storage

Microsoft’s public cloud platform, Azure, has storage options with their command line interface. Information can be found here. The open-source, cross-platform Azure CLI provides a set of commands for working with the Azure platform. It gives much of the functionality seen in the Azure portal, including rich data access.

The installation of the Azure CLI is fairly simple; you can find instructions here. Below you can find how to transfer your backup to Microsoft storage.

az storage blob upload --container-name severalnines --file severalnines.sql --name severalnines_backup

Hybrid Storage for MySQL and MariaDB backups

With the growing public and private cloud storage industry, we have a new category called hybrid storage. This technology allows files to be stored locally, with changes automatically synced to a remote copy in the cloud. Such an approach stems from the need to have recent backups stored locally for fast restore (lower RTO), combined with business continuity objectives.
An important aspect of efficient resource usage is having separate backup retention periods. Data that is stored locally, on redundant disk drives, would be kept for a shorter period, while cloud backup storage would be held for a longer time. Many times the requirement for longer backup retention comes from legal obligations in different industries (like telecoms having to store connection metadata). Cloud providers like Google Cloud Services, Microsoft Azure and Amazon S3 each offer virtually unlimited storage, decreasing local space needs. This allows you to retain your backup files for as long as you like, without concerns about local disk space.
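
To illustrate the concept (this is not ClusterControl's internal implementation), a minimal shell sketch of dual retention could look like the following; the paths, bucket name and retention period are assumptions:

#!/bin/bash
# Hypothetical paths and retention values, for illustration only
LOCAL_DIR=/backups/mysql
BUCKET=s3://mysql-backups-storage
LOCAL_RETENTION_DAYS=7

# Push the latest backup files to cloud storage
aws s3 sync "$LOCAL_DIR" "$BUCKET"

# Keep only the last week locally; cloud retention is handled
# separately, e.g. by an S3 lifecycle rule as shown earlier
find "$LOCAL_DIR" -type f -mtime +"$LOCAL_RETENTION_DAYS" -delete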

ClusterControl backup management - hybrid storage

When scheduling a backup with ClusterControl, each backup method is configurable with a set of options for how you want the backup to be executed. The most important ones for hybrid cloud storage are:

  • Network throttling
  • Encryption with the built-in key management
  • Compression
  • Retention period for the local backups
  • Retention period for the cloud backups
ClusterControl dual backup retention
ClusterControl advanced backup features for cloud: parallel compression, network bandwidth limit, encryption, etc.

Conclusion

The cloud has changed the data backup industry. Because of its affordable price point, even smaller businesses now have an offsite solution that backs up all of their data.

Your company can take advantage of cloud scalability and pay-as-you-go pricing for growing storage needs. You can design a backup strategy to provide both local copies in the datacenter for immediate restoration, and a seamless gateway to cloud storage services from AWS, Google and Azure.

Advanced TLS and AES 256-bit encryption and compression features support secure backups that take up significantly less space in the cloud.

by Bart Oles at November 29, 2018 03:26 PM

Peter Zaitsev

MySQL High Availability: Stale Reads and How to Fix Them

solutions for MySQL Stale Reads

Continuing the series of blog posts about MySQL High Availability, today we will talk about stale reads and how to overcome this issue.

The Problem

A stale read is a read operation that fetches an incorrect value from a source that has not synchronized an update operation to the value (source: Wiktionary).

A practical scenario is when your application writes data (INSERT or UPDATE) to your master/writer node, and has to read it back immediately afterwards. If this particular read is served from another server in the replication/cluster topology, the data is either not there yet (in the case of an INSERT) or it still has the old value (in the case of an UPDATE).

If your application or part of your application is sensitive to stale reads, then this is something to consider when implementing HA/load balancing.

How NOT to fix stale reads

While working with customers, we have seen a few incorrect attempts to fix the issue:

SELECT SLEEP(X)

The most common incorrect approach that we see in Percona support is when customers add a sleep between the write and the read. This may work in some cases, but it’s not 100% reliable for all scenarios, and it can add latency when there is no need.

Let's review an example where, by the time you query your slave, the data is already applied, yet you have configured your transaction to start with a SELECT SLEEP(1). In this case, you just added 1000ms of latency when there was no need for it.

Another example could be when the slave is lagging behind for longer than whatever you configured as the parameter of the sleep command. In this case, you will have to create a loop to keep retrying the sleep until the slave has received the data: potentially it could take several seconds.

Reference: SELECT SLEEP.

Semisync replication

By default, MySQL replication is asynchronous, and this is exactly what causes the stale read. However, MySQL distributes a plugin that can make the replication semi-synchronous. We have seen customers enable it hoping the stale reads problem will go away. In fact, that is not the case. The semi-synchronous plugin only ensures that at least one slave has received the transaction (the IO Thread has streamed the binlog event to the relay log), but the action of applying the event is done asynchronously. In other words, stale reads are still a problem with semi-sync replication.

Reference: Semisync replication.

How to PROPERLY fix stale reads

There are several ways to fix/overcome this situation, and each one has its pros and cons:

1) MASTER_POS_WAIT

This consists of executing SHOW MASTER STATUS right after your write, getting the binlog file and position, connecting to a slave, and executing SELECT MASTER_POS_WAIT, passing the binlog file and position as parameters. Execution will block until the slave has applied events up to that position. You can optionally pass a timeout to exit the function in case the wait exceeds it.
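
A minimal sketch of the flow, with illustrative binlog coordinates and a 5-second timeout:

mysql_master> SHOW MASTER STATUS;  -- note File and Position, e.g. mysql-bin.000002 / 1234

mysql_slave> SELECT MASTER_POS_WAIT('mysql-bin.000002', 1234, 5);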

Pros:

  • Works on all MySQL versions
  • No prerequisites

Cons:

  • Requires an application code rewrite.
  • It’s a blocking operation, and can add significant latency to queries in cases where a slave/node is too far behind.

Reference: MASTER_POS_WAIT.

2) WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS

Requires GTID: this is similar to the previous approach, but in this case, we need to track the GTID executed on the master (also available in SHOW MASTER STATUS).
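
A minimal sketch, using an illustrative GTID set taken from the master's Executed_Gtid_Set and a 5-second timeout:

mysql_slave> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5', 5);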

Pros:

  • As GTIDs are globally unique, the same GTID set works on any slave in the topology; there is no need to track binlog file and position per slave.

Cons:

  • Requires an application code rewrite.
  • It’s a blocking operation, and can add significant latency to queries in cases where a slave/node is too far behind.
  • As it requires GTID, it only works on versions from 5.6 onwards.

Reference: WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS

3) Querying slave_relay_log_info

This consists of enabling relay_log_info_repository=TABLE and sync_relay_log_info=1 on the slave, and using a similar approach to option 1. After the write, execute SHOW MASTER STATUS, connect to the slave, and query mysql.slave_relay_log_info, comparing the binlog name and position to verify whether the slave has already applied a position at or after the one you got from SHOW MASTER STATUS.
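
A sketch of the check on the slave side; the comparison against the coordinates from SHOW MASTER STATUS would be done in your application code:

mysql_slave> SELECT Master_log_name, Master_log_pos FROM mysql.slave_relay_log_info;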

Pros:

  • This is not a blocking operation.
  • In cases where the slave is missing the position you require, you can try to connect to another slave and repeat the process, or even fall back to reading from the master if none of the slaves has the said position.

Cons:

  • Requires an application code rewrite.
  • In cases of checking multiple slaves, this can add significant latency.

Reference: slave_relay_log_info.

4) wsrep-sync-wait

Requires Galera/Percona XtraDB Cluster: consists of setting a global/session variable to enforce consistency. This will block execution of subsequent queries until the node has applied all write-sets from its applier queue. It can be configured to trigger on multiple commands, such as SELECT, INSERT, and so on.
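
For example, to enforce the causality check for read statements only in the current session (the value is a bitmask; 1 covers SELECT):

mysql> SET SESSION wsrep_sync_wait = 1;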

Pros:

  • Easy to implement. Built-in as a SESSION variable.

Cons:

  • Requires an application code rewrite in the event that you want to implement the solution on a per-session basis.
  • It’s a blocking operation, and can add significant latency to queries if a slave/node is too far behind.

Reference: wsrep-sync-wait

5) ProxySQL 2.0 GTID consistent reads

Requires MySQL 5.7 and GTID: MySQL 5.7 returns the GTID generated by a commit as part of the OK packet. ProxySQL, with the help of binlog readers installed on the MySQL servers, can keep track of which GTIDs each slave has already applied. With this information, combined with the GTID received in the OK packet at the moment of the write, ProxySQL will decide whether it will route a subsequent read to one of the slaves/read nodes, or whether the master/write node will serve the read.
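
On the ProxySQL side, this amounts to telling ProxySQL on which port each backend's binlog reader listens. A hedged sketch of the admin commands, assuming the reader runs on port 6020 of host 10.0.0.2 (both values are illustrative):

Admin> UPDATE mysql_servers SET gtid_port = 6020 WHERE hostname = '10.0.0.2';
Admin> LOAD MYSQL SERVERS TO RUNTIME;
Admin> SAVE MYSQL SERVERS TO DISK;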

Pros:

  • Transparent to the application – no code changes are required.
  • Adds minimal latency.

Cons:

  • This is still a new feature of ProxySQL 2.0, which is not yet GA.

Reference: GTID consistent reads.

Conclusions

Undesirable issues can arise from adding HA and distributing the load across multiple servers. Stale reads can impact applications that are sensitive to them. We have demonstrated various approaches you can use to overcome them.


Photo by Tim Evans on Unsplash

by Marcelo Altmann at November 29, 2018 02:51 PM