Planet MariaDB

June 22, 2017

Peter Zaitsev

ClickHouse in a General Analytical Workload (Based on a Star Schema Benchmark)

ClickHouse

ClickHouseIn this blog post, we’ll look at how ClickHouse performs in a general analytical workload using the star schema benchmark test.

We have mentioned ClickHouse in some recent posts (ClickHouse: New Open Source Columnar Database, Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark), where it showed excellent results. ClickHouse by itself seems to be event-oriented RDBMS, as its name suggests (clicks). Its primary purpose, using Yandex Metrica (the system similar to Google Analytics), also points to an event-based nature. We also can see there is a requirement for date-stamped columns.

It is possible, however, to use ClickHouse in a general analytical workload. This blog post shares my findings. For these tests, I used a Star Schema benchmark — slightly-modified so that able to handle ClickHouse specifics.

First, let’s talk about schemas. We need to adjust to ClickHouse data types. For example, the biggest fact table in SSB is “lineorder”. Below is how it is defined for Amazon RedShift (as taken from https://docs.aws.amazon.com/redshift/latest/dg/tutorial-tuning-tables-create-test-data.html):

CREATE TABLE lineorder
(
  lo_orderkey          INTEGER NOT NULL,
  lo_linenumber        INTEGER NOT NULL,
  lo_custkey           INTEGER NOT NULL,
  lo_partkey           INTEGER NOT NULL,
  lo_suppkey           INTEGER NOT NULL,
  lo_orderdate         INTEGER NOT NULL,
  lo_orderpriority     VARCHAR(15) NOT NULL,
  lo_shippriority      VARCHAR(1) NOT NULL,
  lo_quantity          INTEGER NOT NULL,
  lo_extendedprice     INTEGER NOT NULL,
  lo_ordertotalprice   INTEGER NOT NULL,
  lo_discount          INTEGER NOT NULL,
  lo_revenue           INTEGER NOT NULL,
  lo_supplycost        INTEGER NOT NULL,
  lo_tax               INTEGER NOT NULL,
  lo_commitdate        INTEGER NOT NULL,
  lo_shipmode          VARCHAR(10) NOT NULL
);

For ClickHouse, the table definition looks like this:

CREATE TABLE lineorderfull (
        LO_ORDERKEY             UInt32,
        LO_LINENUMBER           UInt8,
        LO_CUSTKEY              UInt32,
        LO_PARTKEY              UInt32,
        LO_SUPPKEY              UInt32,
        LO_ORDERDATE            Date,
        LO_ORDERPRIORITY        String,
        LO_SHIPPRIORITY         UInt8,
        LO_QUANTITY             UInt8,
        LO_EXTENDEDPRICE        UInt32,
        LO_ORDTOTALPRICE        UInt32,
        LO_DISCOUNT             UInt8,
        LO_REVENUE              UInt32,
        LO_SUPPLYCOST           UInt32,
        LO_TAX                  UInt8,
        LO_COMMITDATE           Date,
        LO_SHIPMODE             String
)Engine=MergeTree(LO_ORDERDATE,(LO_ORDERKEY,LO_LINENUMBER),8192);

From this we can see we need to use datatypes like UInt8 and UInt32, which are somewhat unusual for database world datatypes.

The second table (RedShift definition):

CREATE TABLE customer
(
  c_custkey      INTEGER NOT NULL,
  c_name         VARCHAR(25) NOT NULL,
  c_address      VARCHAR(25) NOT NULL,
  c_city         VARCHAR(10) NOT NULL,
  c_nation       VARCHAR(15) NOT NULL,
  c_region       VARCHAR(12) NOT NULL,
  c_phone        VARCHAR(15) NOT NULL,
  c_mktsegment   VARCHAR(10) NOT NULL
);

For ClickHouse, I defined as:

CREATE TABLE customerfull (
        C_CUSTKEY       UInt32,
        C_NAME          String,
        C_ADDRESS       String,
        C_CITY          String,
        C_NATION        String,
        C_REGION        String,
        C_PHONE         String,
        C_MKTSEGMENT    String,
        C_FAKEDATE      Date
)Engine=MergeTree(C_FAKEDATE,(C_CUSTKEY),8192);

For reference, the full schema for the benchmark is here: https://github.com/vadimtk/ssb-clickhouse/blob/master/create.sql.

For this table, we need to define a rudimentary column C_FAKEDATE Date in order to use ClickHouse’s most advanced engine (MergeTree). I was told by the ClickHouse team that they plan to remove this limitation in the future.

To generate data acceptable by ClickHouse, I made modifications to ssb-dbgen. You can find my version here: https://github.com/vadimtk/ssb-dbgen. The most notable change is that ClickHouse can’t accept dates in CSV files formatted as “19971125”. It has to be “1997-11-25”. This is something to keep in mind when loading data into ClickHouse.

It is possible to do some preformating on the load, but I don’t have experience with that. A common approach is to create the staging table with datatypes that match loaded data, and then convert them using SQL functions when inserting to the main table.

Hardware Setup

One of the goals of this benchmark to see how ClickHouse scales on multiple nodes. I used a setup of one node, and then compared to a setup of three nodes. Each node is 24 cores of “Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz” CPUs, and the data is located on a very fast PCIe Flash storage.

For the SSB benchmark I use a scale factor of 2500, which provides (in raw data):

Table lineorder – 15 bln rows, raw size 1.7TB, Table customer – 75 mln rows

When loaded into ClickHouse, the table lineorder takes 464GB, which corresponds to a 3.7x compression ratio.

We compare a one-node (table names lineorderfull, customerfull) setup vs. a three-node (table names lineorderd, customerd) setup.

Single Table Operations

Query:

SELECT
    toYear(LO_ORDERDATE) AS yod,
    sum(LO_REVENUE)
FROM lineorderfull
GROUP BY yod

One node:

7 rows in set. Elapsed: 9.741 sec. Processed 15.00 billion rows, 90.00 GB (1.54 billion rows/s., 9.24 GB/s.)

Three nodes:

7 rows in set. Elapsed: 3.258 sec. Processed 15.00 billion rows, 90.00 GB (4.60 billion rows/s., 27.63 GB/s.)

We see a speed up of practically three times. Handling 4.6 billion rows/s is blazingly fast!

One Table with Filtering

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorderfull
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)

One node:

1 rows in set. Elapsed: 3.175 sec. Processed 2.28 billion rows, 18.20 GB (716.60 million rows/s., 5.73 GB/s.)

Three nodes:

1 rows in set. Elapsed: 1.295 sec. Processed 2.28 billion rows, 18.20 GB (1.76 billion rows/s., 14.06 GB/s.)

It’s worth mentioning that during the execution of this query, ClickHouse was able to use ALL 24 cores on each box. This confirms that ClickHouse is a massively parallel processing system.

Two Tables (Independent Subquery)

In this case, I want to show how Clickhouse handles independent subqueries:

SELECT sum(LO_REVENUE)
FROM lineorderfull
WHERE LO_CUSTKEY IN
(
    SELECT C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
    WHERE C_REGION = 'ASIA'
)

One node:

1 rows in set. Elapsed: 28.934 sec. Processed 15.00 billion rows, 120.00 GB (518.43 million rows/s., 4.15 GB/s.)

Three nodes:

1 rows in set. Elapsed: 14.189 sec. Processed 15.12 billion rows, 121.67 GB (1.07 billion rows/s., 8.57 GB/s.)

We  do not see, however, the close to 3x speedup on three nodes, because of the required data transfer to perform the match LO_CUSTKEY with C_CUSTKEY

Two Tables JOIN

With a subquery using columns to return results, or for GROUP BY, things get more complicated. In this case we want to GROUP BY the column from the second table.

First, ClickHouse doesn’t support traditional subquery syntax, so we need to use JOIN. For JOINs, ClickHouse also strictly prescribes how it must be written (a limitation that will also get changed in the future). Our JOIN should look like:

SELECT
    C_REGION,
    sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorderfull
ANY INNER JOIN
(
    SELECT
        C_REGION,
        C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
) USING (LO_CUSTKEY)
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
GROUP BY C_REGION

One node:

5 rows in set. Elapsed: 31.443 sec. Processed 2.35 billion rows, 28.79 GB (74.75 million rows/s., 915.65 MB/s.)

Three nodes:

5 rows in set. Elapsed: 25.160 sec. Processed 2.58 billion rows, 33.25 GB (102.36 million rows/s., 1.32 GB/s.)

In this case the speedup is not even two times. This corresponds to the fact of the random data distribution for the tables lineorderd and customerd. Both tables were defines as:

CREATE TABLE lineorderd AS lineorder ENGINE = Distributed(3shards, default, lineorder, rand());
CREATE TABLE customerd AS customer ENGINE = Distributed(3shards, default, customer, rand());

Where  rand() defines that records are distributed randomly across three nodes. When we perform a JOIN by LO_CUSTKEY=C_CUSTKEY, records might be located on different nodes. One way to deal with this is to define data locally. For example:

CREATE TABLE lineorderLD AS lineorderL ENGINE = Distributed(3shards, default, lineorderL, LO_CUSTKEY);
CREATE TABLE customerLD AS customerL ENGINE = Distributed(3shards, default, customerL, C_CUSTKEY);

Three Tables JOIN

This is where it becomes very complicated. Let’s consider the query that you would normally write:

SELECT sum(LO_REVENUE),P_MFGR, toYear(LO_ORDERDATE) yod FROM lineorderfull ,customerfull,partfull WHERE C_REGION = 'ASIA' and
LO_CUSTKEY=C_CUSTKEY and P_PARTKEY=LO_PARTKEY GROUP BY P_MFGR,yod ORDER BY P_MFGR,yod;

With Clickhouse’s limitations on JOINs syntax, the query becomes:

SELECT
    sum(LO_REVENUE),
    P_MFGR,
    toYear(LO_ORDERDATE) AS yod
FROM
(
    SELECT
        LO_PARTKEY,
        LO_ORDERDATE,
        LO_REVENUE
    FROM lineorderfull
    ALL INNER JOIN
    (
        SELECT
            C_REGION,
            C_CUSTKEY AS LO_CUSTKEY
        FROM customerfull
    ) USING (LO_CUSTKEY)
    WHERE C_REGION = 'ASIA'
)
ALL INNER JOIN
(
    SELECT
        P_MFGR,
        P_PARTKEY AS LO_PARTKEY
    FROM partfull
) USING (LO_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC

By writing queries this way, we force ClickHouse to use the prescribed JOIN order — at this moment there is no optimizer in ClickHouse and it is totally unaware of data distribution.

There is also not much speedup when we compare one node vs. three nodes:

One node execution time:

35 rows in set. Elapsed: 697.806 sec. Processed 15.08 billion rows, 211.53 GB (21.61 million rows/s., 303.14 MB/s.)

Three nodes execution time:

35 rows in set. Elapsed: 622.536 sec. Processed 15.12 billion rows, 211.71 GB (24.29 million rows/s., 340.08 MB/s.)

There is a way to make the query faster for this 3-way JOIN, however. (Thanks to Alexander Zaytsev from https://www.altinity.com/ for help!)

Optimized query:

SELECT
    sum(revenue),
    P_MFGR,
    yod
FROM
(
    SELECT
        LO_PARTKEY AS P_PARTKEY,
        toYear(LO_ORDERDATE) AS yod,
        SUM(LO_REVENUE) AS revenue
    FROM lineorderfull
    WHERE LO_CUSTKEY IN
    (
        SELECT C_CUSTKEY
        FROM customerfull
        WHERE C_REGION = 'ASIA'
    )
    GROUP BY
        P_PARTKEY,
        yod
)
ANY INNER JOIN partfull USING (P_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC

Optimized query time:

One node:

35 rows in set. Elapsed: 106.732 sec. Processed 15.00 billion rows, 210.05 GB (140.56 million rows/s., 1.97 GB/s.)

Three nodes:

35 rows in set. Elapsed: 75.854 sec. Processed 15.12 billion rows, 211.71 GB (199.36 million rows/s., 2.79 GB/s.

That’s an improvement of about 6.5 times compared to the original query. This shows the importance of understanding data distribution, and writing the optimal query to process the data.

Another option for dealing with JOIN complexity, and to improve performance, is to use ClickHouse’s dictionaries. These dictionaries are described here: https://www.altinity.com/blog/2017/4/12/dictionaries-explained.

I will review dictionary performance in future posts.

Another traditional way to deal with JOIN complexity in an analytics workload is to use denormalization. We can move some columns (for example, P_MFGR from the last query) to the facts table (lineorder).

Observations

  • ClickHouse can handle general analytical queries (it requires special schema design and considerations, however)
  • Linear speedup is possible, but it depends on query design and requires advanced planning — proper speedup depends on data locality
  • ClickHouse is blazingly fast (beyond what I’ve seen before) because it can use all available CPU cores for query, as shown above using 24 cores for single server and 72 cores for three nodes
  • Multi-table JOINs are cumbersome and require manual work to achieve better performance, so consider using dictionaries or denormalization

by Vadim Tkachenko at June 22, 2017 07:20 PM

June 21, 2017

MariaDB AB

Secure Binlog Server: Encrypted binary Logs and SSL Communication

Secure Binlog Server: Encrypted binary Logs and SSL Communication massimiliano_pinto_g Wed, 06/21/2017 - 19:23

The 2.1.3 GA release of MariaDB MaxScale, introduces the following key features for the secure setup of MaxScale Binlog Server:

  • The binlog cache files in the MaxScale host can now be encrypted.

  • MaxScale binlog server also uses SSL in communication with the master and the slave servers.

The MaxScale binlog server can optionally encrypt the events received from the master server: the setup requires a MariaDB (from 10.1.7) master server with encryption active and the mariadb10-compatibility=On option set in maxscale.cnf. This way both master and MaxScale will have encrypted events stored in the binlog files. How does the Binary log encryption work in MariaDB Server and in MaxScale?

Let’s look at MariaDB Server 10.1 or 10.2 implementation first.

  • The encryption is related to stored events and not to their transmission over the network: SSL must be enabled in order to secure the network layer.

  • Each binlog file holds a new special Binlog Event, type 0xa4 (164), called START_ENCRIPTION event: it's written to disk but never sent to connected slave servers.

  • Each binlog event is encrypted/decrypted using an AES Key and an Initialization Vector (IV) built from the "nonce" data in START_ENCRIPTION event and the current event position in the file.

  • The encrypted event has the same size as that of a non encrypted event.

Let’s start with MariaDB Server 10.1.7 configuration: my.cnf.

[mysqld]
…
encrypt-binlog=1
plugin-load-add=file_key_management.so
file_key_management_encryption_algorithm=aes_cbc
file_key_management_filename = /some_path/keys.txt

Binlog files can be optionally encrypted by specifying 

encrypt-binlog=1

The encryption key facility has to be added as well. The easiest one to setup is the Key File:

plugin-load-add=file_key_management.so

With its key file:

file_key_management_filename = /some_path/keys.txt

The keys listed in the key file can be specified per table but the system XtraDB/InnoDB tablespace and binary log files always use the key number 1, so it must always exists.

The last parameter is the AES encryption algorithm. AES_CBC (default) or AES_CTR.

file_key_management_encryption_algorithm=aes_cbc | aes_ctr

There is another key management solution (Eperi Gateway for Databases), which allows storing keys in a key server, preventing an attacker with file system access from unauthorized database file reading.

For additional information, please read more about MariaDB data-at-rest encryption.

MariaDB MaxScale Implementation:

The implementation follows the same MariaDB Server implementation above.

  • MaxScale adds its own START_ENCRIPTION event (which has same size but different content from the master one).

  • The encryption algorithm can be selected: AES_CBC or AES_CTR

  • The event size on disk is the same as the “clear” event.

  • Only key file management is currently supported: no key rotation.

A closer look at the new event might be interesting for most readers:

START_ENCRIPTION size is 36 or 40 bytes in size depending on CRC32 being used or not:

  • Replication header 19 bytes +

  • 1 byte encryption schema           // 1 is for system files

  • 4 bytes Encryption Key Version  // It allows key rotation

  • 12 bytes NONCE random bytes // first part of IV

As the saved event on disk has the same size of the clear event, the event size is in “clear” in the replication header.

Moving some bytes in the replication event header is required in order to encrypt/decrypt the event header and content.

MariaDB MaxScale configuration

Encrypt_binlog = On|Off
Encryption_algorithm = aes_ctr OR aes_cbc
Encryption_key_file =/maxscale_path/enc_key.txt

The specified key file must have this format: a line with 1;HEX(KEY)

Id is the scheme identifier, which must have the value 1 for binlog encryption, the ';' is a separator and HEX(KEY) contains the hex representation of the KEY. The KEY must have exact 16, 24 or 32 bytes size and the selected algorithm (aes_ctr or aes_cbc) with 128, 192 or 256 ciphers will be used.

Note: the key file has the same format as MariaDB Server 10.1 so it's possible to use an existing key file (not encrypted) which could contain several scheme; keys: only key id with value 1 will be parsed, and if not found, an error will be reported.

Example:

#
# This is the Encryption Key File
# key id 1 is for binlog files encryption: it's mandatory
# The keys come from a 32bytes value, 64 bytes with HEX format
#
2;abcdef1234567890abcdef12345678901234567890abcdefabcdef1234567890
1;5132bbabcde33ffffff12345ffffaaabbbbbbaacccddeee11299000111992aaa
3;bbbbbbbbbaaaaaaabbbbbccccceeeddddd3333333ddddaaaaffffffeeeeecccd
 

See all the encryption options in “router options”

[BinlogServer]
type=service
router=binlogrouter
version_string=10.1.17-log
router_options=server-id=93,mariadb10-compatibility=On,encrypt_binlog=On,encryption_key_file=/home/maxscale/binlog/keys.txt,encryption_algorithm=aes_cbc

Enable SSL communication to Master and from connecting Slaves

We have seen how to secure data-at-rest and we would like to show how to secure all the components in the Binlog Server setup:

  • Master

  • MaxScale

  • Slaves

  • Network connections

SSL between the Master and MaxScale is required, along with SSL communications with the slave servers.

How to use SSL in the master connection?

We have to add some new options to CHANGE MASTER TO command issued to MySQL Listener of Binlog Server, the options are used in the following examples:

All options.

MySQL [(none)]> CHANGE MASTER TO MASTER_SSL = 1, MASTER_SSL_CERT='/home/maxscale/packages/certificates/client/client-cert.pem', MASTER_SSL_CA='/home/maxscale/packages/certificates/client/ca.pem', MASTER_SSL_KEY='/home/maxscale/packages/certificates/client/client-key.pem', MASTER_TLS_VERSION='TLSv12';

Or just use some of them:

MySQL [(none)]> CHANGE MASTER TO MASTER_TLS_VERSION='TLSv12';
MySQL [(none)]> CHANGE MASTER TO MASTER_SSL = 0;

Some constraints:

  • In order to enable/re-enable Master SSL communication the MASTER_SSL=1 option is required and all certificate options must be explicitly set in the same CHANGE MASTER TO command.

  • New certificate options changes take effect after MaxScale restart or after MASTER_SSL=1 with the new options.

MySQL> SHOW SLAVE STATUS\G
Master_SSL_Allowed: Yes
       Master_SSL_CA_File: /home/mpinto/packages/certificates/client/ca.pem
       Master_SSL_CA_Path:
          Master_SSL_Cert: /home/mpinto/packages/certificates/client/client-cert.pem
        Master_SSL_Cipher:
           Master_SSL_Key: /home/mpinto/packages/certificates/client/client-key.pem

Setting Up Binlog Server Listener for Slave Server

[Binlog_server_listener]
type=listener
service=BinlogServer
protocol=MySQLClient
address=192.168.100.10
port=8808
authenticator=MySQL
ssl=required
ssl_cert=/usr/local/mariadb/maxscale/ssl/crt.maxscale.pem
ssl_key=/usr/local/mariadb/maxscale/ssl/key.csr.maxscale.pem
ssl_ca_cert=/usr/local/mariadb/maxscale/ssl/crt.ca.maxscale.pem
ssl_version=TLSv12

The final step is for each slave that connects to Binlog Server:

The slave itself should connect to MaxScale using SSL options in the CHANGE MASTER TO … SQL command

The setup is done! Encryption for data-at-rest and for data-in-transit is now complete!

The 2.1.3 GA release of MariaDB MaxScale, introduces the following key features for the secure setup of MariaDB MaxScale Binlog Server:

  • The binlog cache files in the MaxScale host can now be encrypted.

  • MaxScale binlog server also uses SSL in communication with the master and the slave servers.

This blog covers how the binary log encryption works in MariaDB Server and in MariaDB MaxScale.

Login or Register to post comments

by massimiliano_pinto_g at June 21, 2017 11:23 PM

Peter Zaitsev

Percona Monitoring and Management 1.1.5 is Now Available

Percona Monitoring and Management (PMM)

Percona announces the release of Percona Monitoring and Management 1.1.5 on June 21, 2017.

For installation instructions, see the Deployment Guide.


Changes in PMM Server

  • PMM-667: Fixed the Latency graph in the ProxySQL Overview dashboard to plot microsecond values instead of milliseconds.

  • PMM-800: Fixed the InnoDB Page Splits graph in the MySQL InnoDB Metrics Advanced dashboard to show correct page merge success ratio.

  • PMM-1007: Added links to Query Analytics from MySQL Overview and MongoDB Overview dashboards. The links also pass selected host and time period values.

    NOTE: These links currently open QAN2, which is still considered experimental.

Changes in PMM Client

  • PMM-931: Fixed pmm-admin script when adding MongoDB metrics monitoring for secondary in a replica set.

About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

by Alexey Zhebel at June 21, 2017 05:58 PM

Tracing MongoDB Queries to Code with Cursor Comments

Tracing MongoDB Queries

Tracing MongoDB QueriesIn this short blog post, we will discuss a helpful feature for tracing MongoDB queries: Cursor Comments.

Cursor Comments

Much like other database systems, MongoDB supports the ability for application developers to set comment strings on their database queries using the Cursor Comment feature. This feature is very useful for both DBAs and developers for quickly and efficiently tying a MongoDB query found on the database server to a line of code in the application source.

Once Cursor Comments are set in application code, they can be seen in the following areas on the server:

  1. The
    db.currentOp()
     shell command. If Auth is enabled, this requires a role that has the ‘inprog’ privilege.
  2. Profiles in the system.profile collection (per-db) if profiling is enabled.
  3. The QUERY log component.

Note: the Cursor Comment string shows as the field “query.comment” in the Database Profiler output, and as the field originatingCommand.comment in the output of the db.currentOp() command.

This is fantastic because this makes comments visible in the areas commonly used to find performance issues!

Often it is very easy to find a slow query on the database server, but it is difficult to target the exact area of a large application that triggers the slow query. This can all be changed with Cursor Comments!

Python Example

Below is a snippet of Python code implementing a cursor comment on a simple query to the collection “test.test”. (Most other languages and MongoDB drivers should work similarly if you do not use Python.)

My goal in this example is to get the MongoDB Profiler to log a custom comment, and then we will read it back manually from the server afterward to confirm it worked.

In this example, I include the following pieces of data in my comment:

  1. The Python class
  2. The Python method that executed the query
  3. The file Python was executing
  4. The line of the file Python was executing

Unfortunately, three of the four useful details above are not built-in variables in Python, so the “inspect” module is required to fetch those details. Using the “inspect” module and setting a cursor comment for every query in an application is a bit clunky, so it is best to create a method to do this. I made a class-method named “find_with_comment” in this example code to do this. This method performs a MongoDB query and sets the cursor comment automagically, finally returning a regular pymongo cursor object.

Below is the simple Python example script. It connects to a Mongod on localhost:27017, and demonstrates everything for us. You can run this script yourself if you have the “pymongo” Python package installed.

Script:

from inspect import currentframe, getframeinfo
from pymongo import MongoClient
class TestClass:
    def find_with_comment(self, conn, query, db, coll):
        frame      = getframeinfo(currentframe().f_back)
        comment    = "%s:%s;%s:%i" % (self.__class__.__name__, frame.function, frame.filename, frame.lineno)
        collection = conn[db][coll]
        return collection.find(query).comment(comment)
    def run(self):
        uri   = "localhost:27017"
        conn  = MongoClient(uri)
        query = {'user.name': 'John Doe'}
        for doc in self.find_with_comment(conn, query, 'test', 'test'):
            print doc
        conn.close()
if __name__  == "__main__":
    t = TestClass()
    t.run()

There are a few things to explain in this code:

  1. Line #6-10: The “find_with_comment” method runs a pymongo query and handles adding our special cursor comment string. This method takes-in the connection, query and db+collection name as variables.
  2. Line #7: is using the “inspect” module to read the last Python “frame” so we can fetch the file, line number, that called the query.
  3. Line #12-18: The “run” method makes a database connection, runs the “find_with_comment” method with a query, prints the results and closes the connection. This method is just boilerplate to run the example.
  4. Line #20-21: This code initiates the TestClass and calls the “run” method to run our test.

Trying It Out

Before running this script, enable database profiling mode “2” on the “test” database. This is the database the script will query. The profiling mode “2” causes MongoDB to profile all queries:

$ mongo --port=27017
> use test
switched to db test
> db.setProfilingLevel(2)
{ "was" : 1, "slowms" : 100, "ratelimit" : 1, "ok" : 1 }
> quit()

Now let’s run the script. There should be no output from the script, it is only going to do a find query to generate a Profile.

I saved the script as cursor-comment.py and ran it like this from my Linux terminal:

$ python cursor-comment.py
$

Now, let’s see if we can find any Profiles containing the “query.comment” field:

$ mongo --port=27017
> use test
> db.system.profile.find({ "query.comment": {$exists: true} }, { query: 1 }).pretty()
{
	"query" : {
		"find" : "test",
		"filter" : {
			"user.name" : "John Doe"
		},
		"comment" : "TestClass:run;cursor-comment.py:16"
	}
}

Now we know the exact class, method, file and line number that ran this profiled query! Great!

From this Profile we can conclude that the class-method “TestClass:run” initiated this MongoDB query from Line #16 of cursor-comment.py. Imagine this was a query that slowed down your production system and you need to know the source quickly. The usefulness of this feature/workflow becomes obvious, fast.

More on Python “inspect”

Instead of constructing a custom comment like the example above, you can also use Python “inspect” to collect the Python source code comment that precedes the code that is running. This might be useful for projects that have code comments that would be more useful than class/method/file/line number. As the comment is a string, the sky is the limit on what you can set!

Read about the

.getcomments()
  method of “inspect” here: https://docs.python.org/2/library/inspect.html#inspect.getcomments

Aggregation Comments

MongoDB 3.5/3.6 added support for comments in aggregations. This is a great new feature, as aggregations are often heavy operations that would be useful to tie to a line of code as well!

This can be used by adding a “comment” field to your “aggregate” server command, like so:

db.runCommand({
  aggregate: "myCollection",
  pipeline: [
    { $match: { _id: "foo" } }
  ],
  comment: "fooMatch"
})

See more about this new feature in the following MongoDB tickets: SERVER-28128 and DOCS-10020.

Conclusion

Hopefully this blog gives you some ideas on how this feature can be useful in your application. Start adding comments to your application today!

by Tim Vaillancourt at June 21, 2017 05:16 PM

Jean-Jerome Schmidt

Docker: All the Severalnines Resources

While the idea of containers have been around since the early days of Unix, Docker made waves in 2013 when it hit the market with its innovative solution. What began as an open source project, Docker allows you to add your stacks and applications to containers where they share a common operating system kernel. This lets you have a lightweight virtualized system with almost zero overhead. Docker also lets you bring up or down containers in seconds, making for rapid deployment of your stack.

Severalnines, like many other companies, got excited early on about Docker and began experimenting and developing ways to deploy advanced open source database configurations using Docker containers. We also released, early on, a docker image of ClusterControl that lets you utilize the management and monitoring functionalities of ClusterControl with your existing database deployments.

Here are just some of the great resources we’ve developed for Docker over the last few years...

Severalnines on Docker Hub

In addition to the ClusterControl Docker Image, we have also provided a series of images to help you get started on Docker with other open source database technologies like Percona XtraDB Cluster and MariaDB.

Check Out the Docker Images

ClusterControl on Docker Documentation

For detailed instructions on how to install ClusterControl utilizing the Docker Image click on the link below.

Read More

Top Blogs

MySQL on Docker: Running Galera Cluster on Kubernetes

In our previous posts, we showed how one can run Galera Cluster on Docker Swarm, and discussed some of the limitations with regards to production environments. Kubernetes is widely used as orchestration tool, and we’ll see whether we can leverage it to achieve production-grade Galera Cluster on Docker.

Read More

MySQL on Docker: Swarm Mode Limitations for Galera Cluster in Production Setups

This blog post explains some of the Docker Swarm Mode limitations in handling Galera Cluster natively in production environments.

Read More

MySQL on Docker: Composing the Stack

Docker 1.13 introduces a long-awaited feature called Compose-file support. Compose-file defines everything about an application - services, databases, volumes, networks, and dependencies can all be defined in one place. In this blog, we’ll show you how to use Compose-file to simplify the Docker deployment of MySQL containers.

Read More

MySQL on Docker: Deploy a Homogeneous Galera Cluster with etcd

Our journey to make Galera Cluster run smoothly on Docker containers continues. Deploying Galera Cluster on Docker is tricky when using orchestration tools. With this blog, find out how to deploy a homogeneous Galera Cluster with etcd.

Read More

MySQL on Docker: Introduction to Docker Swarm Mode and Multi-Host Networking

This blog post covers the basics of managing MySQL containers on top of Docker swarm mode and overlay network.

Read More

MySQL on Docker: Single Host Networking for MySQL Containers

This blog covers the basics of how Docker handles single-host networking, and how MySQL containers can leverage that.

Read More

MySQL on Docker: Building the Container Image

In this post, we will show you two ways how to build a MySQL Docker image - changing a base image and committing, or using Dockerfile. We’ll show you how to extend the Docker team’s MySQL image, and add Percona XtraBackup to it.

Read More

MySQL Docker Containers: Understanding the basics

In this post, we will cover some basics around running MySQL in a Docker container. It walks you through how to properly fire up a MySQL container, change configuration parameters, how to connect to the container, and how the data is stored.

Read More

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

ClusterControl on Docker

ClusterControl provides advanced management and monitoring functionality to get your MySQL replication and clustered instances up-and-running using proven methodologies that you can depend on to work. Used in conjunction with other orchestration tools for deployment to the containers, ClusterControl makes managing your open source databases easy with point-and-click interfaces and no need to have specialized knowledge about the technology.

ClusterControl delivers on an array of features to help manage and monitor your open source database environments:

  • Management & Monitoring: ClusterControl provides management features to repair and recover broken nodes, as well as test and automate MySQL upgrades.
  • Advanced Monitoring: ClusterControl provides a unified view of all MySQL nodes and clusters across all your data centers and lets you drill down into individual nodes for more detailed statistics.
  • Automatic Failure Detection and Handling: ClusterControl takes care of your replication cluster’s health. If a master failure is detected, ClusterControl automatically promotes one of the available slaves to ensure your cluster is always up.

Learn more about how ClusterControl can enhance performance here or pull the Docker Image here.

We hope that these resources prove useful!

Happy Clustering!

by Severalnines at June 21, 2017 09:59 AM

June 20, 2017

Peter Zaitsev

Webinar Thursday June 22, 2017: Deploying MySQL in Production

Deploying MySQL

Join Percona’s Senior Operations Engineer, Daniel Kowalewski as he presents Deploying MySQL in Production on Thursday, June 22, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7).

 MySQL is famous for being something you can install and get going in less than five minutes in terms of development. But normally you want to run MySQL in production, and at scale. This requires some planning and knowledge. So why not learn the best practices around installation, configuration, deployment and backup?

This webinar is a soup-to-nuts talk that will have you going from zero to hero in no time. It includes discussion of the best practices for installation, configuration, taking backups, monitoring, etc.

Register for the webinar here.

Deploying MySQLDaniel Kowalewski, Senior Technical Operations Engineer

Daniel has been designing and deploying solutions around MySQL for over ten years. He lives for those magic moments where response time drops by 90%, and loves adding more “nines” to everything.

by Dave Avery at June 20, 2017 10:42 PM

The MySQL High Availability Landscape in 2017 (The Elders)

High Availability

In this blog, we’ll look at different MySQL high availability options.

The dynamic MySQL ecosystem is rapidly evolving many technologies built around MySQL. This is especially true for the technologies involved with the high availability (HA) aspects of MySQL. When I joined Percona back in 2009, some of these HA technologies were very popular – but have since been almost forgotten. During the same interval, new technologies have emerged. In order to give some perspective to the reader, and hopefully help to make better choices, I’ll review the MySQL HA landscape as it is in 2017. This review will be in three parts. The first part (this post) will cover the technologies that have been around for a long time: the elders. The second part will focus on the technologies that are very popular today: the adults. Finally, the last part will try to extrapolate which technologies could become popular in the upcoming years: the babies.

Quick disclaimer, I am reporting on the technologies I see the most. There are likely many other solutions not covered here, but I can’t talk about technologies I have barely or never used. Apart from the RDS-related technologies, all the technologies covered are open-source. The target audience for this post are people relatively new to MySQL.

The Elders

Let’s define the technologies in the elders group. These are technologies that anyone involved with MySQL for last ten years is sure to be aware of. I could have called this group the “classics”.  I include the following technologies in this group:

  • Replication
  • Shared storage
  • NDB cluster

Let’s review these technologies in the following sections.

Replication

Simple replication topology

 

MySQL replication is very well known. It is one of the main features behind the wide adoption of MySQL. Replication gets used almost everywhere. The reasons for that are numerous:

  • Replication is simple to setup. There are tons of how-to guides and scripts available to add a slave to a MySQL server. With Amazon RDS, adding a slave is just a few clicks.
  • Slaves allow you to easily scale reads. The slaves are accessible and can be used for reads. This is the most common way of scaling up a MySQL database.
  • Slaves have little impact on the master. Apart from the added network traffic, the presence of slaves does not impact the master performance significantly.
  • It is well known. No surprises here.
  • Used for failover. Your master died, promote a slave and use it as your new master.
  • Used for backups. You don’t want to overload your master with the backups, run them off a slave.

Of course, replication also has some issues:

  • Replication can lag. Replication used to be single-threaded. That means a master with a concurrent load could easily outpace a slave. MySQL 5.6 and MariaDB 10.0 have introduced some parallelism to the slave. Newer versions have further improved to a point where today’s slaves are many times faster than they were.
  • Slaves can diverge. When you modify data on the master, the slave must perform the exact same update. That seems easy, but there are many ways an update can be non-deterministic with statement-based replication. They fixed many issues, and the introduction of row-based replication has been another big step forward. Still, if you write directly to a slave you are asking for trouble. There is a read_only setting, but if the MySQL user has the “SUPER” privilege it is just ignored. That’s why there is now the “super_read_only” setting. Tools like pt-table-checksum and pt-table-sync from the Percona toolkit exist to solve this problem.
  • Replication can impact the master. I wrote above that the presence of slaves does not affect the master, but logging changes are more problematic. The most common issue is the InnoDB table-level locking for auto_increment values with statement-based replication. Only one thread can insert new rows at a time. You can avoid this issue with row-based replication and properly configuring settings.
  • Data gets lost. Replication is asynchronous. That means the master will reply “done” after a commit statement even though the slaves have not received updates yet. Some transactions can get lost if the master crashes.

Although an old technology, a lot of work has been done on replication. It is miles away from the replication implementation of 5.0.x. Here’s a list, likely incomplete, of the evolution of replication:

  • Row based replication (since 5.1). The binary internal representation of the rows is sent instead of the SQL statements. This makes replication more robust against slave divergence.
  • Global transaction ID (since 5.6). Transactions are uniquely identified. Replication can be setup without knowing the binlog file and offset.
  • Checksum (since 5.6). Binlog events have checksum values to validate their integrity.
  • Semi-sync replication (since 5.5). An addition to the replication protocol to make the master aware of the reception of events by the slaves. This helps to avoid losing data when a master crashes.
  • Multi-source replication (since 5.7). Allows a slave to have more than one master.
  • Multi-threaded replication (since 5.6). Allows a slave to use multiple threads. This helps to limit the slave lag.

Managing replication is a tedious job. The community has written many tools to manage replication:

  • MMM. An old Perl tool that used to be quite popular, but had many issues. Now rarely used.
  • MHA. The most popular tool to manage replication. It excels at reconfiguring replication without losing data, and does a decent at handling failover.  It is also simple. No wonder it is popular.
  • PRM. A Pacemaker-based solution developed to replace MMM. It’s quite good at failover, but not as good as MHA at reconfiguring replication. It’s also quite complex, thanks to Pacemaker. Not used much.
  • Orchestrator. The new cool tool. It can manage complex topologies and has a nice web-based interface to monitor and control the topology.

 

Shared Storage

Simple shared storage topology

 

Back when I was working for MySQL ten years ago, shared storage HA setups were very common. A shared storage HA cluster uses one copy of the database files between one of two servers. One server is active, the other one is passive. In order to be shared, the database files reside on a device that can be mounted by both servers. The device can be physical (like a SAN), or logical (like a Linux DRBD device). On top of that, you need a cluster manager (like Pacemaker) to handle the resources and failovers. This solution is very popular because it allows for failover without losing any transactions.

The main drawback of this setup is the need for an idle standby server. The standby server cannot have any other assigned duties since it must always be ready to take over the MySQL server. A shared storage solution is also obviously not resilient to file-level corruption (but that situation is exceptional). Finally, it doesn’t play well with a cloud-based environment.

Today, newly-deployed shared storage HA setups are rare. The only ones I encountered over the last year were either old implementations needing support, or new setups that deployed because of existing corporate technology stacks. That should tell you about the technology’s loss of popularity.

NDB Cluster

A simple NDB Cluster topology

 

An NDB Cluster is a distributed clustering solution that has been around for a long time. I personally started working with this technology back in 2008. An NDB Cluster has three types of nodes: SQL, management and data. A full HA cluster requires a minimum of four nodes.

An NDB Cluster is not a general purpose database due to its distributed nature. For suitable workloads, it is extraordinary good. For unsuitable workloads, it is miserable. A suitable workload for an NDB Cluster contains high concurrency, with a high rate of small primary key oriented transactions. Reaching one million trx/s on an NDB Cluster is nothing exceptional.

At the other end of the spectrum, a poor workload for an NDB Cluster is a single-threaded report query on a star-like schema. I have seen some extreme cases where just the network time of a reporting query amounted to more than 20 minutes.

Although NDB Clusters have improved, and are still improving, their usage has been pushed toward niche-type applications. Overall, the technology is losing ground and is now mostly used for Telcos and online gaming applications.

by Yves Trudeau at June 20, 2017 10:36 PM

Jean-Jerome Schmidt

ClusterControl on Docker

(This blog was updated on June 20, 2017)

We’re excited to announce our first step towards dockerizing our products. Please welcome the official ClusterControl Docker image, available on Docker Hub. This will allow you to evaluate ClusterControl with a couple of commands:

$ docker pull severalnines/clustercontrol

The Docker image comes with ClusterControl installed and configured with all of its components, so you can immediately use it to manage and monitor your existing databases. Supported database servers/clusters:

  • Galera Cluster for MySQL
  • Percona XtraDB Cluster
  • MariaDB Galera Cluster
  • MySQL/MariaDB Replication
  • MySQL/MariaDB single instance
  • MongoDB Replica Set
  • PostgreSQL single instance

As more and more people will know, Docker is based on the concept of so called application containers and is much faster or lightweight than full stack virtual machines such as VMWare or VirtualBox. It's a very nice way to isolate applications and services to run in a completely isolated environment, which a user can launch and tear down the whole stack within seconds.

Having a Docker image for ClusterControl at the moment is convenient in terms of how quickly it is to get it up and running and it's 100% reproducible. Docker users can now start testing ClusterControl, since we have an image that everyone can pull down and then launch the tool.

ClusterControl Docker Image

The image is available on Docker Hub and the code is hosted in our Github repository. Please refer to the Docker Hub page or our Github repository for the latest instructions.

The image consists of ClusterControl and all of its components:

  • ClusterControl controller, CLI, cmonapi, UI and NodeJS packages installed via Severalnines repository.
  • Percona Server 5.6 installed via Percona repository.
  • Apache 2.4 (mod_ssl and mod_rewrite configured)
  • PHP 5.4 (gd, mysqli, ldap, cli, common, curl)
  • SSH key for root user.
  • Deployment script: deploy-container.sh

The core of automatic deployment is inside a shell script called “deploy-container.sh”. This script monitors a custom table inside CMON database called “cmon.containers”. The database container created by run command or some orchestration tool shall report and register itself into this table. The script will look for new entries and perform the necessary actions using s9s, the ClusterControl CLI. You can monitor the progress directly from the ClusterControl UI or using “docker logs” command.

New Deployment - ClusterControl and Galera Cluster on Docker

There are several ways to deploy a new database cluster stack on containers, especially with respect to the container orchestration platform used. The currently supported methods are:

  • Docker (standalone)
  • Docker Swarm
  • Kubernetes

A new database cluster stack would usually consist of a ClusterControl server with a three-node Galera Cluster. From there, you can scale up or down according to your desired setup. We have built another Docker image to serve as database base image called “centos-ssh”, which can be used with ClusterControl for automatic deployment. This image comes with a simple bootstrapping logic to download the public key from ClusterControl container (automatic passwordless SSH setup), register itself to ClusterControl and pass the environment variables for the chosen cluster setup.

Docker (Standalone)

To run on standalone Docker host, one would do the following:

  1. Run the ClusterControl container:

    $ docker run -d --name clustercontrol -p 5000:80 severalnines/clustercontrol
  2. Then, run the DB containers. Define the CC_HOST (the ClusterControl container's IP) environment variable or simply use container linking. Assuming the ClusterControl container name is ‘clustercontrol’:

    $ docker run -d --name galera1 -p 6661:3306 --link clustercontrol:clustercontrol -e CLUSTER_TYPE=galera -e CLUSTER_NAME=mygalera -e INITIAL_CLUSTER_SIZE=3 severalnines/centos-ssh
    $ docker run -d --name galera2 -p 6662:3306 --link clustercontrol:clustercontrol -e CLUSTER_TYPE=galera -e CLUSTER_NAME=mygalera -e INITIAL_CLUSTER_SIZE=3 severalnines/centos-ssh
    $ docker run -d --name galera3 -p 6663:3306 --link clustercontrol:clustercontrol -e CLUSTER_TYPE=galera -e CLUSTER_NAME=mygalera -e INITIAL_CLUSTER_SIZE=3 severalnines/centos-ssh

ClusterControl will automatically pick the new containers to deploy. If it finds the number of containers is equal or greater than INITIAL_CLUSTER_SIZE, the cluster deployment shall begin. You can verify that with:

$ docker logs -f clustercontrol

Or, open the ClusterControl UI at http://{Docker_host}:5000/clustercontrol and look under Activity -> Jobs.

To scale up, just run new containers and ClusterControl will add them into the cluster automatically:

$ docker run -d --name galera4 -p 6664:3306 --link clustercontrol:clustercontrol -e CLUSTER_TYPE=galera -e CLUSTER_NAME=mygalera -e INITIAL_CLUSTER_SIZE=3 severalnines/centos-ssh

Docker Swarm

Docker Swarm allows container deployment on multiple hosts. It manages a cluster of Docker Engines called a swarm.

Take note of some prerequisites for running ClusterControl and Galera Cluster on Docker Swarm:

  • Docker Engine version 1.12 and later.
  • Docker Swarm Mode is initialized.
  • ClusterControl must be connected to the same overlay network as the database containers.

In Docker Swarm mode, centos-ssh will default to look for 'cc_clustercontrol' as the CC_HOST. If you create the ClusterControl container with 'cc_clustercontrol' as the service name, you can skip defining CC_HOST variable.

We have explained the deployment steps in this blog post.

Kubernetes

Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, and provides a container-centric infrastructure. In Kubernetes, centos-ssh will default to look for “clustercontrol” service as the CC_HOST. If you use other name, please specify this variable accordingly.

Example YAML definition of ClusterControl deployment on Kubernetes is available in our Github repository under Kubernetes directory. To deploy a ClusterControl pod using a ReplicaSet, the recommended way is:

$ kubectl create -f cc-pv-pvc.yml
$ kubectl create -f cc-rs.yml

Kubernetes will then create the necessary PersistentVolume (PV) and PersistentVolumeClaim (PVC) by connecting to the NFS server. The deployment definition (cc-rs.yml) will then run a pod and use the created PV resources to map the datadir and cmon configuration directory for persistency. This allows the ClusterControl pod to be relocated to other Kubernetes nodes if it is rescheduled by the scheduler.

Import Existing DB Containers into ClusterControl

If you already have a Galera Cluster running on Docker and you would like to have ClusterControl manage it, you can simply run the ClusterControl container in the same Docker network as the database containers. The only requirement is to ensure the target containers have SSH related packages installed (openssh-server, openssh-clients). Then allow passwordless SSH from ClusterControl to the database containers. Once done, use “Add Existing Server/Cluster” feature and the cluster should be imported into ClusterControl.

Example of importing the existing DB containers

We have a physical host, 192.168.50.130 installed with Docker, and assume that you already have a three-node Galera Cluster in containers, running under the standard Docker bridge network. We are going to import the cluster into ClusterControl, which is running in another container. The following is the high-level architecture diagram:

Adding into ClusterControl

  1. Install OpenSSH related packages on the database containers, allow the root login, start it up and set the root password:

    $ docker exec -ti [db-container] apt-get update
    $ docker exec -ti [db-container] apt-get install -y openssh-server openssh-client
    $ docker exec -it [db-container] sed -i 's|^PermitRootLogin.*|PermitRootLogin yes|g' /etc/ssh/sshd_config
    $ docker exec -ti [db-container] service ssh start
    $ docker exec -it [db-container] passwd
  2. Start the ClusterControl container as daemon and forward port 80 on the container to port 5000 on the host:

    $ docker run -d --name clustercontrol -p 5000:80 severalnines/clustercontrol
  3. Verify the ClusterControl container is up:

    $ docker ps | grep clustercontrol
    59134c17fe5a        severalnines/clustercontrol   "/entrypoint.sh"       2 minutes ago       Up 2 minutes        22/tcp, 3306/tcp, 9500/tcp, 9600/tcp, 9999/tcp, 0.0.0.0:5000->80/tcp   clustercontrol
  4. Open a browser, go to http://{docker_host}:5000/clustercontrol and create a default admin user and password. You should now see the ClusterControl landing page.

  5. The last step is setting up the passwordless SSH to all database containers. Attach to ClusterControl container interactive console:

    $ docker exec -it clustercontrol /bin/bash
  6. Copy the SSH key to all database containers:

    $ ssh-copy-id 172.17.0.2
    $ ssh-copy-id 172.17.0.3
    $ ssh-copy-id 172.17.0.4
  7. Start importing the cluster into ClusterControl. Open a web browser and go to Docker’s physical host IP address with the mapped port e.g, http://192.168.50.130:5000/clustercontrol and click “Add Existing Cluster/Server” and specify following information:

    Ensure you got the green tick when entering the hostname or IP address, indicating ClusterControl is able to communicate with the node. Then, click Import button and wait until ClusterControl finishes its job. The database cluster will be listed under the ClusterControl dashboard once imported.

Happy containerizing!

by ashraf at June 20, 2017 11:09 AM

Shlomi Noach

Observations on the hashicorp/raft library, and notes on RDBMS

The hashicorp/raft library is a Go library to provide consensus via Raft protocol implementation. It is the underlying library behind Hashicorp's Consul.

I've had the opportunity to work with this library a couple projects, namely freno and orchestrator. Here are a few observations on working with this library:

  • TL;DR on Raft: a group communication protocol; multiple nodes communicate, elect a leader. A leader leads a consensus (any subgroup of more than half the nodes of the original group, or hopefully all of them). Nodes may leave and rejoin, and will remain consistent with consensus.
  • The hashicorp/raft library is an implementation of the Raft protocol. There are other implementations, and different implementations support different features.
  • The most basic premise is leader election. This is pretty straightforward to implement; you set up nodes to communicate to each other, and they elect a leader. You may query for the leader identity via Leader(), VerifyLeader(), or observing LeaderCh.
  • You have no control over the identity of the leader. You cannot "prefer" one node to be the leader. You cannot grab leadership from an elected leader, and you cannot demote a leader unless by killing it.
  • The next premise is gossip, sending messages between the raft nodes. With hashicorp/raft, only the leader may send messages to the group. This is done via the Apply() function.
  • Messages are nothing but blobs. Your app encodes the messages into []byte and ships it via raft. Receiving ends need to decode the bytes into a meaningful message.
  • You will check the result of Apply(), an ApplyFuture. The call to Error() will wait for consensus.
  • Just what is a message consensus? It's a guarantee that the consensus of nodes has received and registered the message.
  • Messages form the raft log.
  • Messages are guaranteed to be handled in-order across all nodes.
  • The leader is satisfied when the followers receive the messages/log, but it cares not for their interpretation of the log.
  • The leader does not collect the output, or return value, of the followers applying of the log.
  • Consequently, your followers may not abort the message. They may not cast an opinion. They must adhere to the instruction received from the leader.
  • hashicorp/raft uses either an LMDB-based store or BoltDB for persisting your messages. Both are transactional stores.
  • Messages are expected to be idempotent: a node that, say, happens to restart, will request to join back the consensus (or to form a consensus with some other node). To do that, it will have to reapply historical messages that it may have applied in the past.
  • Number of messages (log entries) will grow infinitely. Snapshots are taken so as to truncate the log history. You will implement the snapshot dump & load.
  • A snapshot includes the log index up to which it covers.
  • Upon startup, your node will look for the most recent snapshot. It will read it, then resume replication from the aforementioned log index.
  • hashicorp/raft provides a file-system based snapshot implementation.

One of my use cases is completely satisfied with the existing implementations of BoltDB and of the filesystem snapshot.

However in another (orchestrator), my app stores its state in a relational backend. To that effect, I've modified the logstore and snapshot store. I'm using either MySQL or sqlite as backend stores for my app. How does that affect my raft use?

  • My backend RDBMS is the de-facto state of my orchestrator app. Anything written to this DB is persisted and durable.
  • When orchestrator applies a raft log/message, it runs some app logic which ends with a write to the backend DB. At that time, the raft log is effectively not required anymore to persist. I care not for the history of logs.
  • Moreover, I care not for snapshotting. To elaborate, I care not for snapshot data. My backend RDBMS is the snapshot data.
  • Since I'm running a RDBMS, I find BoltDB to be wasteful, an additional transaction store on top a transaction store I already have.
  • Likewise, the filesystem snapshots are yet another form of store.
  • Log Store (including Stable Store) are easily re-implemented on top of RDBMS. The log is a classic relational entity.
  • Snapshot is also implemented on top of RDBMS,  however I only care for the snapshot metadata (what log entry is covered by a snapshot) and completely discard storing/loading snapshot state or content.
  • With all these in place, I have a single entity that defines:
    • What my data looks like
    • Where my node fares in the group gossip
  • A single RDBMS restore returns a dataset that will catch up with raft log correctly. However my restore window is limited by the number of snapshots I store and their frequency.

by shlomi at June 20, 2017 04:05 AM

June 19, 2017

Peter Zaitsev

Upcoming HA Webinar Wed 6/21: Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication

High Availability

High AvailabilityJoin Percona’s MySQL Practice Manager Kenny Gryp and QA Engineer, Ramesh Sivaraman as they present a high availability webinar around Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication on Wednesday, June 21, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

What are the implementation differences between Percona XtraDB Cluster 5.7, Galera Cluster 5.7 and MySQL Group Replication?

  • How do they work?
  • How do they behave differently?
  • Do these methods have any major issues?

This webinar will describe the differences and shed some light on how QA is done for each of the different technologies.

Register for the webinar here.

High AvailabilityRamesh Sivaraman, QA Engineer

Ramesh joined the Percona QA Team in March 2014. He has almost six years of experience in database administration and, before joining Percona, was giving MySQL database support to various service and product based internet companies. Ramesh’s professional interests include writing shell/Perl script to automate routine tasks and new technology. Ramesh lives in Kerala, the southern part of India, close to his family.

High AvailabilityKenny Gryp, MySQL Practice Manager

Kenny is currently MySQL Practice Manager at Percona.

by Dave Avery at June 19, 2017 06:53 PM

MariaDB Server 10.2 GA Release Overview

MariaDB Server 10.2

MariaDB Server 10.2This blog post looks at the recent MariaDB Server 10.2 GA release.

Congratulations to the MariaDB Foundation for releasing a generally available (GA) stable version of MariaDB Server 10.2! We’ll definitely spend the next few weeks talking about MariaDB Server 10.2, but here’s a quick overview in the meantime. Keep in mind that when thinking about compatibility, this is meant to be the equivalent of MySQL 5.7 (GA: October 21, 2015, with Percona Server for MySQL 5.7 GA available February 23, 2016).

Some of the highlights include:

  • Window functions – this is the first release in the MySQL ecosystem that includes Window functions and Recursive Common Table Expression. At the time of this writing, MariaDB hasn’t completed the documentation. It is worth noting that the implementation of Window functions in MariaDB Server 10.2 differs from what you see in MariaDB ColumnStore.
  • JSON functions – Many JSON functions to query, update, index and validate JSON. It’s worth noting that MariaDB Server 10.2 does not include a JSON data type as compared to MySQL 5.7). This means you can’t do CREATE TABLE t1 (jdoc JSON) – instead you need to use a VARCHAR or TEXT column. There are also other differences that produce different result sets, and seemingly no column path operator.
  • There is also support for GeoJSON functionality, but when we tried ST_AsGeoJSON (yes, documentation needs work), we noticed that the output could vary from MySQL 5.7.
  • MyRocks – MariaDB added the hot new storage engine MyRocks as an alpha. You will have to install the MyRocks engine package separably. It isn’t fully merged yet. Watch the umbrella task MDEV-9658.
  • SHOW CREATE USER – A new SHOW CREATE USER statement allows you to look at user limitation, as you can now limit users to a maximum number of queries, updates and connections per hour, as well as a maximum number of connections permitted by the user (see setting account resource limits for the MySQL 5.7 equivalent). You’ll want to read the updated documentation around CREATE USER. Don’t be surprised when you see something like “ERROR 1226 (42000): User ‘foo’ has exceeded the ‘max_user_connections’ resource (current value: 1)”. This also is an extension to ALTER USER.
  • Flashback – binary log based rollback, aka flashback, can rollback tables and databases to an older snapshot. This should help when the DBA or a user makes an error. This tool works well as long as its a DML statement. This feature came from Alibaba’s AliSQL tree.
  • Time delayed replication – new in MariaDB Server 10.2.
  • OpenSSL 1.1 – now there is support for OpenSSL 1.1, LibreSSL
  • MariaDB Connector/C – most importantly, MariaDB Connector/C replaces libmysql (see: MDEV-9055). This should be API and ABI compatible, but naturally there are some teething problems (see: MDEV-12950).
  • Amazon Key Management plugin – from a key management standpoint, the Amazon Key Management plugin is now available to use for encryption. It’s compiled and available as a package. Previously, you had to compile it yourself.

Some of the important things to take note of are:

  • As of this release, MariaDB now ships with InnoDB as the default storage engine (as opposed to Percona XtraDB). This means that from here on out, the improvements and fixes to Percona XtraDB won’t necessarily be available in MariaDB. This also means that Percona XtraDB parameters might get ignored (as reported in MDEV-12472).
  • In MySQL 5.6+, you can use SHA-256 pluggable authentication. However, this features is still not implemented in MariaDB Server 10.2 (see: MDEV-9804). You can use the ed25519 authentication plugin as a replacement, however.
  • When it comes to replication, MySQL 5.7 defaults to row-based replication. MariaDB Server 10.2 defaults to mixed-mode replication (see the discussion around this at MDEV-7635).
  • It is worth noting that in order to make MariaDB Server more “Oracle compatible,” DECIMAL now goes up to 38 decimals instead of 30 decimals. MDEV-10138 tells you what happens when you migrate from a long decimal to a default decimal type install (i.e., if you’re moving to another variant in the MySQL ecosystem).
  • If you’re familiar with how MySQL 5.7 manages passwords and a new install, the MariaDB Server 10.2 method hasn’t changed.

All in all, this release took a little over a year to make (Alpha was 18 April 2016, GA was 23 May 2017). It is extremely important to read the release notes and the changelogs of each and every release, as MariaDB Server diverges from MySQL quite a bit. At Percona, we will monitor Jira closely to ensure that you always stay informed of the latest changes.

by Colin Charles at June 19, 2017 06:36 PM

MariaDB AB

MariaDB Server Default in Debian 9

MariaDB Server Default in Debian 9 rasmusjohansson Mon, 06/19/2017 - 12:15

One way of distributing Linux based software is to make the software so popular and easily consumable that distributions want to include it. By consumable, I mean that the software should be packaged correctly, processes around the software need to be transparent and any issues found with the software should be addressed swiftly.Debian logo large.jpg

MariaDB Server has become a software package distributed widely by the Linux distributions. There are a lot of distros including MariaDB Server nowadays. We try to keep an updated list of distributions that include MariaDB but let us know if there is something missing.

Since MariaDB started as a fork of MySQL, users evaluate between MariaDB or MySQL when choosing the right open source database. The same question is relevant for the distros as well, which variant should be installed by default? Red Hat, SUSE, CentOS, Fedora and many others already choose to have MariaDB Server as the default MySQL variant.

Now Debian joins by making MariaDB Server 10.1 the default in Debian 9.

With MariaDB Server as the default in Debian it’s included in the Stable repository for Debian 9 and available to all users of Debian 9. To read Debian’s update on this, please refer to the what’s new in Debian 9 article.  

From a MariaDB Server perspective, to be the default MySQL variant in Debian provides many benefits:

  • There will be more users running MariaDB Server. Whenever there is a dependency on MySQL or MariaDB, MariaDB Server will be installed by default.

  • Popular software will make sure that they are compatible with MariaDB since MariaDB Server will be installed by default.

  • Being the default means that MariaDB Server is part of Debian’s Stable repository and is being built by the Debian build system on all possible platforms from IBM System z to ARM-based platforms.

There are of course benefits for the users as well:

  • MariaDB follows good open source practices such as being open about security issues and providing immediate information about them. Also, the open development of MariaDB Server is hugely important. Users can see what’s going on, can participate in development and will know when they can expect new features and versions.

  • Debian users can benefit from all features provided in MariaDB Server 10.1 such as multi-master support through the inclusion of Galera, Data at Rest Encryption and improved replication.

Let’s look at what all this means in practice. Let’s say you have installed Debian 9 as your web server on which you want to install Wordpress for your blog. Looking at the details of the Wordpress package you’ll see a couple of dependencies and suggested installations together with installing Wordpress itself:

  • The first dependency of interest in this case is on the default-mysql-client. This is a metapackage that points to the latest version available in the repository for the chosen MySQL compatible client library that should be used with Wordpress. The metapackage points to mariadb-client-10.1.

  • The other dependency, which is of type suggestion is the package default-mysql-server. This is again a metapackage that points to the default MySQL variant in Debian, which is the package mariadb-server-10.1.

When installing Wordpress you’ll end up with the MariaDB 10.1 client library installed and you’ll also install MariaDB Server 10.1 (if you accept the suggestion).

A lot of work went into the packaging of MariaDB to support the concept of metapackages, and making the installation and upgrading process as smooth as possible as well as to also coordinate any package compatibility issues when making MariaDB the default MySQL variant. A special thanks goes to the package maintainers Otto Kekäläinen, Arnaud Fontaine and Ondřej Surý for making this happen! Thanks also to the many other developers  addressing issues found by the package maintainers and the Debian build system.

Especially if you’re upgrading from an earlier version of Debian and you already have MySQL installed on your system, we recommend that you read through our guidance in the article Moving from MySQL to MariaDB in Debian 9.

Testing has been done on both the Debian side as well as the MariaDB side. If you run into any problems when installing, upgrading or migrating to MariaDB Server, we’re of course here to help.

Red Hat, SUSE, CentOS, Fedora and many others already choose to have MariaDB Server as the default MySQL variant. Now Debian joins by making MariaDB Server 10.1 the default in Debian 9.

Login or Register to post comments

by rasmusjohansson at June 19, 2017 04:15 PM

MariaDB Foundation

Duel: gdb vs. linked lists, trees, and hash tables

My first encounter with the gdb command duel was on some old IRIX about 15 years ago. I immediately loved how convenient it was for displaying various data structures during MySQL debugging, and I wished Linux had something similar. Later I found out that Duel was not something IRIX specific, but a public domain patch […]

The post Duel: gdb vs. linked lists, trees, and hash tables appeared first on MariaDB.org.

by Sergei at June 19, 2017 09:01 AM

June 18, 2017

MariaDB Foundation

Debian 9 released with MariaDB as the only MySQL variant

The Debian project has today announced their 9th release, code named “Stretch”. This is a big milestone for MariaDB, because the release team decided to ship and support only one MySQL variant in this release, and MariaDB was chosen over MySQL. This is prominently mentioned in the press release about Debian 9 and more information […]

The post Debian 9 released with MariaDB as the only MySQL variant appeared first on MariaDB.org.

by Otto Kekäläinen at June 18, 2017 04:24 PM

Valeriy Kravchuk

MySQL Support Engineer's Chronicles, Issue #6

Previous post in series was published almost 4 months ago, but I do not plan to end it. So, let me quickly discuss some of problems I worked on or was interested in so far in June, and provide some useful links.

Back on June 2 I had to find out what exact files are created by MariaDB's ColumnStore when I create a table in this storage engine. Actually in recent versions one can check the tables in the INFORMATION_SCHEMA, but if still wonders why are all these directories with numbers in the names (/usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/193.dir/000.dir/FILE000.cdf), please, check also this storage architecture review and take into account the following hint by the famous LinuxJedi:
partA: oid>>24
partB: (oid&0x00ff0000)>>16
partC: (oid&0x0000ff00) >> 8
partD: oid&0x000000ff
partE: partition
fName: segment
Those numbers in the directory names are decimal integers, 3 digits, zero-padded. oid is object_id.

If you plan to try to work with MariaDB ColumnStore, please, make sure to check also this KB article, MariaDB ColumnStore System Usage.

Later I had to work on some out-of-memory issues, and found this article on how to get the details on transparent huge pages usage on RedHat systems very useful.

No matter how much I hate clusters, I have to deal at least with Galera-based ones. One of the problems to deal with, recently, was about many threads "hanging" in "wsrep in pre-commit stage" status. In the process of investigation I had found this bug report, lp:1197771, and I am still wondering what is the proper status for it ("Fix Committed" to nothing does not impress me). Check also this discussion.

The most important discussions for MySQL Community, IMHO, happened around June 9 in relation to Bug #86607, "InnoDB crashed when master thread evict dict_table_t object". Great finding and detailed analysis by Alibaba's engineer, Jianwei Zhao, as well as his detailed MTR test case (that worked on a bit modified source code with a couple of lines like os_thread_sleep(2000000); added here and there), were all rejected by my deal old friend Sinisa... It caused a lot of reactions and comments on Facebook. Further references to magic "Oracle policies"  had not helped. What really helped is a real verification by Umesh Shastry later. As many colleagues noted, one had just to try, and test fails even on a non-modified code, quite often. It's quite possible that MariaDB will fix this bug faster, so you may want to monitor this related bug report, MDEV-13051. To summarize, it's not about any policies, but all about the proper care and attitude, and had always been like that...

Another great by to follow, Bug #86215, was reported back in May by Mark Callaghan. He spent a lot of time checking how well MySQL performs, historically, on different versions, for 4.0 to 8.0. Recent findings are summarized in this blog post. Based on the results, it seems that for most tests huge drop in performance happened in MySQL 5.7, and so far version 8.0.x had made few improvements on top of that. Check also this summary of tests and related blog posts. I wonder what would be the results of analysis for this bug and related posts by the great Dimitri Kravtchuk, whom the bug is assigned to currently.

Those who may have to work with MariaDB or MySQL on Windows and deal with crashes or hangs, may find this Micsoft's article useful. It explains how to get call stacks for threads with Visual Studio 2017. Thanks to Vladislav Vaintroub, for the link. I badly need new box with Windows 10 it seems, to check all this and other cool stuff, like shadow backups...

MariaDB's MaxScale 2.1 GA version had appeared recently, with a new cache filter feature. You can find the documentation here and there, and some comparison of it's initial implementation with ProxySQL's cache in this great blog post.

Yet another MySQL bug, reported by my colleague Hartmut Holzgraefe long time ago, attracted my attention while working on issues. This was Bug #76793, "Different row size limitations in Anaconda and Barracuda file formats". It was verified more than 2 years ago, but still nothing happens, neither in the code, nor in the manual to describe and explain the differences. Sad, but true.

The last but not the least, Support team of MariaDB is hiring now. We need one more engineer for our APAC team. If you want to work on complex and interesting issues from reasonable customers with properly set expectations, in a wisely managed team, and get a chance to work with me for 3-4 hours on a daily basis (yes, I am around/working during some APAC hours often enough, at least this summer) - please, get in touch and send your CV!

by Valeriy Kravchuk (noreply@blogger.com) at June 18, 2017 02:19 PM

June 16, 2017

MariaDB AB

This is What it’s Like When Data Collides (Relational + JSON)

This is What it’s Like When Data Collides (Relational + JSON) Shane Johnson Fri, 06/16/2017 - 13:46

In the past, a benefit to using non-relational databases (i.e., NoSQL) was its simple and flexible structure. If the data was structured, a relational database was deployed. If it was semi-structured (e.g., JSON), a NoSQL database was deployed.

Today, a relational database like MariaDB Server (part of MariaDB TX) can read, write and query both structured and semi-structured data, together. MariaDB Server supports semi-structured data via dynamic columns and JSON functions. This blog post will focus on JSON functions with MariaDB Server, using examples to highlight one of they key benefits: data integrity.

MariaDB JSON example #1 – Table constraints with JSON

In this example, there is a single table for products, and every product has an id, type, name, format (e.g., Blu-ray) and price. This data will be stored in the id, name, format and price columns. However, every movie has audio and video properties and this data will be stored in the attributes column, in a JSON document.

id

type

name

format

price

attr

1

M

Aliens

Blu-ray

13.99

{

  "video": {

     "resolution": "1080p",

     "aspectRatio": "1.85:1"

  },

  "cuts": [{

     "name": "Theatrical",

     "runtime": 138

  },

  {

     "name": "Special Edition",

     "runtime": 155

  }

  ],

  "audio": ["DTS HD", "Dolby Surround"]

}

We can’t have anyone inserting a movie record without audio and video properties.

To ensure data integrity, the JSON document for audio and video attributes must have:

  • a video object

    • with a resolution

    • with an aspect ratio

  • an audio array

    • with at least one element

  • a cuts array

    • with at least one element

  • a disks integer

And while you may not be able to enforce the data integrity of JSON documents in a NoSQL database, you can with MariaDB Server – create a table constraint with JSON functions.

ALTER TABLE products ADD CONSTRAINT check_movies
	CHECK (
type = 'M' and 
	JSON_TYPE(JSON_QUERY(attr, '$.video')) = 'OBJECT' and
	JSON_TYPE(JSON_QUERY(attr, '$.cuts')) = 'ARRAY' and
	JSON_TYPE(JSON_QUERY(attr, '$.audio')) = 'ARRAY' and
	JSON_TYPE(JSON_VALUE(attr, '$.disks')) = 'INTEGER' and
	JSON_EXISTS(attr, '$.video.resolution') = 1 and
	JSON_EXISTS(attr, '$.video.aspectRatio') = 1 and
JSON_LENGTH(JSON_QUERY(attr, '$.cuts')) > 0 and
JSON_LENGTH(JSON_QUERY(attr, '$.audio')) > 0);

With MariaDB you get the best of both worlds, the data integrity of a relational database and the flexibility of semi-structured data.

MariaDB JSON example #2 – Multi-statement transactions

In this example, there is one table for inventory and one table for shopping carts. In the inventory table, there are columns for the SKU, the in-stock quantity and the in-cart quantity. In the shopping cart table, there are columns for the session id and the cart.

Inventory

sku

in_stock

in_cart

123456

1

99

 

Shopping carts

session_id

cart

aldkjdi8i8jdlkjd

{

  "items": {

     "123456": {

        "name": "Aliens",

        "quantity": 1

     }

  }

}

Let’s assume this is the SKU of Aliens on Blu-ray (one of the best movies ever), and let’s assume we have to update the inventory when a customer adds a copy of it to their shopping cart.

It’s easy to update the inventory table.

UPDATE products SET 
in_stock = in_stock = in_stock - 1,
In_cart = in_cart + 1
WHERE sku = '123456';

What about the JSON document for the shopping cart?

UPDATE products SET
	cart = JSON_SET(cart, '$.items.123456.quantity', 2)
WHERE session_id = 'aldkjdi8i8jdlkjd';

This is a problem for NoSQL databases, which do not support multi-statement transactions like relational databases do. You don’t know what’s going to be updated: both the inventory and shopping cart, either the inventory or the shopping cart, or neither.

MariaDB Server will ensure both the inventory and the shopping cart are updated using multi-statement transactions. On top of that, you get to use both structured and semi-structured data in the same database, together – relational and JSON.

Learn more

To learn more about using JSON functions and semi-structured data with MariaDB Server, join us for a webinar on June 29, 2017. Also, read two new white papers: dynamic columns and JSON functions.

Today, a relational database like MariaDB Server (part of MariaDB TX) can read, write and query both structured and semi-structured data, together. MariaDB Server supports semi-structured data via dynamic columns and JSON functions. This blog post will focus on JSON functions with MariaDB Server, using examples to highlight one of they key benefits: data integrity.

Manjot Singh

Manjot Singh

Fri, 06/16/2017 - 15:50

semi structured

I can see the usefulness of this, but in the case above, why wouldn't you just create multiple JSON columns with their own objects, ie video, cuts, audio

Login or Register to post comments

by Shane Johnson at June 16, 2017 05:46 PM

Peter Zaitsev

Peter Zaitsev’s Speaking Schedule: Percona University Belgium / PG Day / Meetups

Peter Zaitsev Speaking Schedule

This blog shows Peter Zaitsev’s speaking schedule for this summer.

Summer 2017 Speaking Engagements

This week I spoke at the DB Tech Showcase OSS conference in Japan and am now heading to Europe. I have a busy schedule in June and early July, but there are events and places where we can cross paths and have a quick conversation. Let’s meet at these events if you need anything from Percona (or me personally). 

Below is a full list of places I’ll be at this summer:

Amsterdam, Netherlands

On June 20 I am speaking at the In-Memory Computing Summit 2017 with Denis Magda (Product Manager, Gridgain Systems). Our talk “Accelerate MySQL® for Demanding OLAP and OLTP Use Cases with Apache® Ignite™” starts at 2:35 pm.

On the same day in Amsterdam, Denis and I will speak at the local MySQL User Group meetupI will share some how-tos for MySQL monitoring with Percona Monitoring and Management (PMM), along with a PMM demo.

Ghent, Belgium

On June 22 we are organizing a Percona University event in Ghent, Belgium, which is a widely known tech hub in the region. I will give several talks there on MySQL, MongoDB and PMM monitoring. Dimitri Vanoverbeke from Percona will discuss MySQL in the Cloud. We have also invited guest speakers: Frederic Descamps from Oracle, and Julien Pivotto from Inuits.

Percona University technical events are 100% free to attend, and so far we are getting very positive attendee feedback on them. To check the full agenda for the Belgium edition, and to register, please use this link.

St. Petersburg, Russia

Percona is sponsoring PG Day’17 Russia, the PostgreSQL conference. This year they added a track on open source databases (and I was happy to be their Committee member for the OSDB track). The conference starts on July 5, and on that day I will give a tutorial on InnoDB Architecture and Performance Optimization. Sveta Smirnova will also present a tutorial on MySQL Performance Troubleshooting.

On July 6-7, you can expect four more talks given by Perconians at PG Day. We invite you to stop by our booth (“Percona”) and ask us any tough questions you might have.

Moscow, Russia

On July 11 I will speak at a Moscow MySQL User Group meetup at the Mail.Ru Group office. While we’re still locking down the agenda, we always have a great selection of speakers at the MMUG meetups. Make sure you don’t miss this gathering!

Thank you, and I hope to see many of you at these events.

by Peter Zaitsev at June 16, 2017 03:05 PM

June 15, 2017

Peter Zaitsev

Three Methods of Installing Percona Monitoring and Management

Installing Percona Monitoring and Management

Installing Percona Monitoring and ManagementIn this blog post, we’ll look at three different methods for installing Percona Monitoring and Management (PMM).

Percona offers multiple methods of installing Percona Monitoring and Management, depending on your environment and scale. I’ll also share comments on which installation methods we’ve decided to forego for now. Let’s begin by reviewing the three supported methods:

  1. Virtual Appliance
  2. Amazon Machine Image
  3. Docker

Virtual Appliance

We ship an OVF/OVA method to make installation as simple as possible, with the least amount of effort required and at the lowest cost to you. You can leverage the investment in your virtualization deployment platform. OVF is an open standard for packaging and distributing virtual appliances, designed to be run in virtual machines.

Using OVA with VirtualBox as a first step is common in order to quickly play with a working PMM system, and get right to adding clients and observing activity within your own environment against your MySQL and MongoDB instances. But you can also use the OVA file for enterprise deployments. It is a flexible file format that can be imported into other popular hypervisor systems such as VMware, Red Hat Virtualization, XenServer, Microsoft System Centre Virtual Machine Manager and others.

We’d love to hear your feedback on this installation method!

AWS AMI

We also have an AWS AMI in order to provide easy scaling of PMM Server in AWS, so that you can deploy onto any instance size required for your monitoring instance. Depending on the AWS region you’re in, you’ll need to choose from the appropriate AMI Instance ID. Soon we’ll be moving to the AWS Marketplace for even easier deployment. When this is implemented, you will no longer need to clone an existing AMI ID.

Docker

Docker is our most common production deployment method. It is easy (three commands) and scalable (tuning passed on the command line to Docker run). While we recognize that Docker is still a relatively new deployment system for many users, it is dramatically gaining adoption. It is also where Percona is investing the bulk of our development efforts. We deploy PMM Server as two Docker containers: one for storing the data that persists across restarts/upgrades, and the other for running the actual PMM Server binaries (Grafana, Prometheus, consul, Orchestrator, QAN, etc.).

Where are the RPM/DEB/tar.gz packages?!

A common question I hear is why doesn’t Percona support binary-based installation?

We hear you: RPM/DEB/tar.gz methods are commonly used today for many of your own applications. Percona is striving for simplicity in our deployment of PMM Server, and we spend considerable development and QA effort validating the specific versions of Grafana/Prometheus/QAN/consul/Orchestrator all work seamlessly together.

Percona wants to ensure OS compatibility and long-term support of PMM, and to do binary distribution “right” means it can quickly get expensive to build and QA across all the popular Linux distributions available today. We’re in no way against binary distributions. For example, see our list of the nine supported platforms for which we provide bug fix support.

Percona decided to focus our development efforts on stability and features, and less on the number of supported platforms. Hence the hyper-focus on Docker. We don’t have any current plans to move to a binary deployment method for PMM, but we are always open to hearing your feedback. If there is considerable interest, then please let me know via the comments below. We’ll take these thoughts into consideration for PMM planning in the second half of 2017.

Which other methods of installing Percona Monitoring and Management would you like to see?

by Michael Coburn at June 15, 2017 06:54 PM

MariaDB AB

Introduction to Window Functions in MariaDB Server 10.2

Introduction to Window Functions in MariaDB Server 10.2 guest Thu, 06/15/2017 - 14:52

This is a guest post by Vicențiu Ciorbaru, software engineer for the MariaDB Foundation. 

Practical Examples Part 1

MariaDB Server 10.2 has now reached GA status with its latest 10.2.6 release. One of the main focuses of Server 10.2 was analytical use cases. With the surge of Big Data and Business Intelligence, the need for sifting and aggregating through lots of data creates new challenges for database engines. For really large datasets, tools like Hadoop are the go-to solution. However that is not always the most practical or easy-to-use solution, especially when you’re in a position where you need to generate on-the-fly reports. Window functions are meant to bridge that gap and provide a nice “middle of the road” solution for when you need to go through your production or data warehouse database.

The MariaDB Foundation has worked in collaboration with the MariaDB Corporation to get this feature out the door, as it was an area where the MariaDB Server was lagging behind other modern DBMS. Being an advanced querying feature, it is not trivial to explain. The best way to provide an introduction to window functions is through examples. Let us see what window functions can do with a few common use cases.

Generating an increasing row number

One of the simplest use cases (and one of the most frequent) is to generate a monotonically increasing sequence of numbers for a set of rows. Let us see how we can do that.

Our initial data looks like this:

SELECT
email, first_name,
last_name, account_type
FROM users
ORDER BY email;
 
+------------------------+------------+-----------+--------------+
| email                  | first_name | last_name | account_type |
+------------------------+------------+-----------+--------------+
| admin@boss.org         | Admin      | Boss      | admin        |
| bob.carlsen@foo.bar    | Bob        | Carlsen   | regular      |
| eddie.stevens@data.org | Eddie      | Stevens   | regular      |
| john.smith@xyz.org     | John       | Smith     | regular      |
| root@boss.org          | Root       | Chief     | admin        |
+------------------------+------------+-----------+--------------+

Now let’s add our window function.

SELECT
row_number() OVER (ORDER BY email) AS rnum,
email, first_name,
last_name, account_type
FROM users
ORDER BY email;
 
+------+------------------------+------------+-----------+--------------+
| rnum | email                  | first_name | last_name | account_type |
+------+------------------------+------------+-----------+--------------+
|    1 | admin@boss.org         | Admin      | Boss      | admin        |
|    2 | bob.carlsen@foo.bar    | Bob        | Carlsen   | regular      |
|    3 | eddie.stevens@data.org | Eddie      | Stevens   | regular      |
|    4 | john.smith@xyz.org     | John       | Smith     | regular      |
|    5 | root@boss.org          | Root       | Chief     | admin        |
+------+------------------------+------------+-----------+--------------+

So, a new column has been generated, with values always increasing. Time to dissect the syntax.

Notice the OVER keyword as well as the additional ORDER BY statement after it. The OVER keyword is required to make use of window functions. row_number is a special function, only available as a “window function”, it can not be used without the OVER clause. Inside the OVER clause, the ORDER BY statement is required to ensure that the values are assigned to each row deterministically. With this ordering, the first email alphabetically will have the rnum value of 1, the second email will have rnum value of 2, and so on.

Can we do more with this?

Yes! Notice the account_type column. We can generate separate sequences based on account type. The OVER clause supports another keyword called PARTITION BY, which works very similar to how GROUP BY works in a regular select. With PARTITION BY we split our data in “partitions” and compute the row number separately for each partition.

SELECT
row_number() OVER (PARTITION BY account_type ORDER BY email) AS rnum,
email, first_name,
last_name, account_type
FROM users
ORDER BY account_type,email;
 
+------+------------------------+------------+-----------+--------------+
| rnum | email                  | first_name | last_name | account_type |
+------+------------------------+------------+-----------+--------------+
|    1 | admin@boss.org         | Admin      | Boss      | admin        |
|    2 | root@boss.org          | Root       | Chief     | admin        |
|    1 | bob.carlsen@foo.bar    | Bob        | Carlsen   | regular      |
|    2 | eddie.stevens@data.org | Eddie      | Stevens   | regular      |
|    3 | john.smith@xyz.org     | John       | Smith     | regular      |
+------+------------------------+------------+-----------+--------------+

Looks good! Now that we have an understanding of the basic syntax supported by window functions (there’s more, but I won’t go into all the details at once) let’s look at another similar example.

Top-N Queries

The request is: “Find the top 5 salaries from each department.” Our data looks like this:

describe employee_salaries;

+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| Pk     | int(11)     | YES  |     |    NULL |       |
| name   | varchar(20) | YES  |     |    NULL |       |
| dept   | varchar(20) | YES  |     |    NULL |       |
| salary | int(11)     | YES  |     |    NULL |       |
+--------+-------------+------+-----+---------+-------+

If we don’t have access to window functions, we can do this query using regular SQL like so:

select dept, name, salary
from employee_salaries as t1
where (select count(t2.salary)
       from employee_salaries as t2
       where t1.name != t2.name and
             t1.dept = t2.dept and
             t2.salary > t1.salary) < 5
order by dept, salary desc;

The idea here is to use a subquery to count the number of employees (via count()) from the same department as the current row (where t1.dept = t2.dept) that have a salary greater than the current row (where t2.salary > t1.salary). Afterwards, we accept only the employees which have at most 4 other salaries greater than theirs.

The problem with using this subquery is that if there is no index in the table, the query takes a very long time to execute the larger the employee_salaries table gets. Of course, you can add an index to speed it up, but there are multiple problems that come up when you do it. First, creating the index is costly and maintaining it afterwards adds overhead for each transaction. This gets worse if the table you are using is a frequently updated one. Even if there is an index on the dept and salary column, each subquery execution needs to perform a lookup through the index. This adds overhead as well. There must be a better way!

Window functions to the rescue. For such queries there is a dedicated window function called rank. It is basically a counter that increments whenever we encounter a different salary, while also keeping track of identical values. Let’s see how we would get the same result using the rank function:

Let’s start with generating our ranks for all our employees.

select
rank() over (partition by dept order by salary desc) as ranking,
dept, name, salary
from employee_salaries
order by dept, ranking;

+----------------+----------+--------+
|ranking | dept  | name     | salary |
+--------+-------+----------+--------+
|      1 | Eng   | Kristian |   3500 |
|      2 | Eng   | Sergei   |   3000 |
|      3 | Eng   | Sami     |   2800 |
|      4 | Eng   | Arnold   |   2500 |
|      5 | Eng   | Scarlett |   2200 |
|      6 | Eng   | Michael  |   2000 |
|      7 | Eng   | Andrew   |   1500 |
|      8 | Eng   | Tim      |   1000 |
|      1 | Sales | Bob      |    500 |
|      2 | Sales | Jill     |    400 |
|      3 | Sales | Tom      |    300 |
|      3 | Sales | Lucy     |    300 |
|      5 | Sales | Axel     |    250 |
|      6 | Sales | John     |    200 |
|      7 | Sales | Bill     |    150 |
+--------+-------+----------+--------+

We can see that each department has a separate sequence of ranks, this is due to our partition by clause. This particular sequence of values for rank() is given by our ORDER BY clause inside the window function’s OVER clause. Finally, to get our results in a readable format we order our data by dept and the newly generated ranking column.

Given that we have the result set as is, we might be tempted to put a where clause in this statement and filter only the first 5 values per department. Let’s try it!

select
rank() over (partition by dept order by salary desc) as ranking,
dept, name, salary
from employee_salaries
where ranking <= 5
order by dept, ranking;

Unfortunately, we get the error message:

Unknown column 'ranking' in 'where clause'

There is a reason for this! This has to do with how window functions are computed. The computation of window functions happens after all WHERE, GROUP BY and HAVING clauses have been completed, right before ORDER BY. Basically, the WHERE clause has no idea that the ranking column exists. It is only present after we have filtered and grouped all the rows. To counteract this problem, we need to wrap our query into a derived table. We can then attach a where clause to it. We end up with:

select *
from (select 
        rank() over (partition by dept order by salary desc)
               as ranking,
        dept, name, salary
      from employee_salaries) as salary_ranks
where (salary_ranks.ranking <= 5)
order by dept, ranking;

This query does not return an error. Here’s how the result set looks like with a few sample data points.

+----------------+----------+--------+
|ranking | dept  | name     | salary |
+--------+-------+----------+--------+
|      1 | Eng   | Kristian |   3500 |
|      2 | Eng   | Sergei   |   3000 |
|      3 | Eng   | Sami     |   2800 |
|      4 | Eng   | Arnold   |   2500 |
|      5 | Eng   | Scarlett |   2200 |
|      1 | Sales | Bob      |    500 |
|      2 | Sales | Jill     |    400 |
|      3 | Sales | Tom      |    300 |
|      3 | Sales | Lucy     |    300 |
|      5 | Sales | Axel     |    250 |
+--------+-------+----------+--------+

We have our result with window functions, but how do the two queries compare performance wise? Disclaimer, this is done on a development machine to give a rough idea of the performance differences. This is not a “go-to” benchmark result.
 

#Rows

Regular SQL (seconds)

Regular SQL + Index (seconds)

Window Functions (seconds)

2 000

1.31

0.14

    0.00

20 000

123.6

    12.6

    0.02

200 000

14326.34

1539.79

    0.21

2 000 000

...

...

5.61

20 000 000

...

...

76.04

The #Rows represent the number of entries in the table. Each department has 100 employees. For our queries, we can see the computation time grows in a polynomial fashion. For each 10x increase in the number of rows, for regular SQL (with or without an index), the computation time increases by a factor of 100x. On the other hand, for each 10x increase in the number of rows, for regular Window Functions, the computation takes only 10x amount of time. This is due to eliminating an entire subquery evaluation for each row in our final result set.

I’ll dive in more detail about the window function implementation in further articles, but for now this should give you a taste of what window functions can do.

This is a guest post by Vicențiu Ciorbaru, software engineer for the MariaDB Foundation. 

MariaDB Server 10.2 has now reached GA status with its latest 10.2.6 release. One of the main focuses of Server 10.2 was analytical use cases. With the surge of Big Data and Business Intelligence, the need for sifting and aggregating through lots of data creates new challenges for database engines. For really large datasets, tools like Hadoop are the go-to solution. However that is not always the most practical or easy-to-use solution, especially when you’re in a position where you need to generate on-the-fly reports. Window functions are meant to bridge that gap and provide a nice “middle of the road” solution for when you need to go through your production or data warehouse database.

Login or Register to post comments

by guest at June 15, 2017 06:52 PM

Jean-Jerome Schmidt

Watch the tutorial: backup best practices for MySQL, MariaDB and Galera Cluster

Many thanks to everyone who registered and/or participated in Tuesday’s webinar on backup strategies and best practices for MySQL, MariaDB and Galera clusters led by Krzysztof Książek, Senior Support Engineer at Severalnines. If you missed the session, would like to watch it again or browse through the slides, they’re now online for viewing. Also check out the transcript of the Q&A session below.

Watch the webinar replay

Whether you’re a SysAdmin, DBA or DevOps professional operating MySQL, MariaDB or Galera clusters in production, you should make sure that your backups are scheduled, executed and regularly tested. Krzysztof shared some of his key best practice tips & tricks yesterday on how to do just that; including a live demo with ClusterControl. In short, this webinar replay shows you the pros and cons of different backup options and helps you pick the one that goes best with your environment.

Happy backuping!

Questions & Answers

Q. Can we control I/O while taking the backups with mysqldump and mysqldumper (I’ve used nice before, but it wasn’t helpful).

A. Theoretically it might be possible, although we haven’t really tested that. If you really want to apply some throttling then you may want to look into cgroups - it should help you to throttle I/O activity on a per-process basis.

Q. Can we take mydumper with ClusterControl and is ClusterControl is free software?

A. We don't currently support it, but you can always use it manually; ClusterControl doesn't prevent you from using this tool. There is a free community version of ClusterControl, yes, though its backup features are part of the commercial version. With the free community version you can deploy and monitor your database (clusters) as well as develop your own custom database advisors. You also have a one-month trial period that gives you access to all of ClusterControl’s features. You can find all the feature details here: https://severalnines.com/pricing

Q. Can xtrabackup work with data-at-rest encryption?

A. It can work with encrypted data in MySQL or Percona Server - it is because they encrypt only tablespaces, which xtrabackup just copies - it doesn’t have to access contents of tablespaces. MariaDB encrypts not only tablespaces but also, for example, InnoDB redo logs, which have to be accessed by xtrabackup - therefore xtrabackup cannot work with data-at-rest encryption as implemented in MariaDB. Because of this MariaDB Corporation forked xtrabackup into MariaDB Backup. This tool supports encryption done by MariaDB.

Q. Can you use mydumper for point-in-time recovery?

A. Yes, it is possible. mydumper can store GTID data so you can identify last applied transaction and use it as a starting position for processing binary logs.

Q. Is it a problem if we use binary logs with xtrabackup with start-datetime and end-datetime instead of start-position and end-position? We make a full backup on Fridays and every other day an incremental backup. When we need to recover we take the last full and all incremental backups and the binary logs from this day starting from 00:00 to NOW ... could there be a problem with apply-log?

A. In general, you should not use --start-datetime or --end-datetime when you want to reply binary log on the database. It’s not granular enough - it has a resolution of one second and there could be many transactions that happened during that second. You can use it to minimize timeframe to look for manually, but that’s all. If you want to replay binary logs, you should use --start-position and --end-position. Only this will precisely define from which event you will replay binlogs and on which event it’ll end up.

Q. Should I run the dump software on load balancer or one of the MySQL nodes?

A. Typically you’ll use it on MySQL nodes. Some of the tools can only do just that. For example, Xtrabackup - you have to run it locally, on the database host. You can stream output to another location, but it has to be started locally.

Q. Can we take partial backups with ClusterControl? And if yes, how can we restore a backup on a running instance?

A. Yes, you can take a partial backup using ClusterControl (you can backup separate schema using xtrabackup) but, as of now, you cannot restore a partial backup on a running instance. This is caused by the fact that the schema you’d recover will not be consistent with the rest of the cluster. To make it consistent, the cluster has to be bootstrapped from the node on which you restore a backup. So, technically, the node runs all the time but it’s a fairly heavy and invasive operation. This will change in the next version of ClusterControl in which you’d be able to restore backups on a separate host. From that host you could then dump contents of a restored schema using mysqldump (or mydumper) and restore it on a production cluster.

Q. Can you please share the mysqldumper command?

A. It’s rather hard to answer this question without doing copy and paste from the documentation, so we think it will be the best if we’d point you to the documentation: https://github.com/maxbube/mydumper/tree/master/docs

Watch the webinar replay

by jj at June 15, 2017 11:33 AM

MariaDB Foundation

Tencent Cloud Becomes a Platinum Sponsor of the MariaDB Foundation

The MariaDB Foundation today announced that Tencent Cloud, the cloud computing service provided by Tencent, has become its platinum sponsor. The sponsorship will help the Foundation ensure continuous collaboration amongst stakeholders within the MariaDB ecosystem, to drive adoption and serve the ever-growing community of users and developers. Tencent Cloud has already sponsored the MariaDB Foundation […]

The post Tencent Cloud Becomes a Platinum Sponsor of the MariaDB Foundation appeared first on MariaDB.org.

by Otto Kekäläinen at June 15, 2017 02:00 AM

June 14, 2017

Peter Zaitsev

MySQL Triggers and Updatable Views

MySQL Triggers

MySQL TriggersIn this post we’ll review how MySQL triggers can affect queries.

Contrary to what the documentation states, we can activate triggers even while operating on views:

https://dev.mysql.com/doc/refman/5.7/en/triggers.html

Important: MySQL triggers activate only for changes made to tables by SQL statements. They do not activate for changes in views, nor by changes to tables made by APIs that do not transmit SQL statements to the MySQL server.

Be on the lookout if you use and depend on triggers, since it’s not the case for updatable views! We have reported a documentation bug for this but figured it wouldn’t hurt to mention this as a short blog post, too. 🙂 The link to the bug in question is here:

https://bugs.mysql.com/bug.php?id=86575

Now, we’ll go through the steps we took to test this, and their outputs. These are for the latest MySQL version (5.7.18), but the same results were seen in 5.5.54, 5.6.35, and MariaDB 10.2.5.

First, we create the schema, tables and view needed:

mysql> CREATE SCHEMA view_test;
Query OK, 1 row affected (0.00 sec)
mysql> USE view_test;
Database changed
mysql> CREATE TABLE `main_table` (
   ->   `id` int(11) NOT NULL AUTO_INCREMENT,
   ->   `letters` varchar(64) DEFAULT NULL,
   ->   `numbers` int(11) NOT NULL,
   ->   `time` time NOT NULL,
   ->   PRIMARY KEY (`id`),
   ->   INDEX col_b (`letters`),
   ->   INDEX cols_c_d (`numbers`,`letters`)
   -> ) ENGINE=InnoDB  DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.31 sec)
mysql> CREATE TABLE `table_trigger_control` (
   ->   `id` int(11),
   ->   `description` varchar(255)
   -> ) ENGINE=InnoDB  DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.25 sec)
mysql> CREATE VIEW view_main_table AS SELECT * FROM main_table;
Query OK, 0 rows affected (0.02 sec)

Indexes are not really needed to prove the point, but were initially added to the tests for completeness. They make no difference in the results.

Then, we create the triggers for all possible combinations of [BEFORE|AFTER] and [INSERT|UPDATE|DELETE]. We will use the control table to have the triggers insert rows, so we can check if they were actually called by our queries.

mysql> CREATE TRIGGER trigger_before_insert BEFORE INSERT ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "BEFORE INSERT");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_after_insert AFTER INSERT ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "AFTER INSERT");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_before_update BEFORE UPDATE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "BEFORE UPDATE");
Query OK, 0 rows affected (0.19 sec)
mysql> CREATE TRIGGER trigger_after_update AFTER UPDATE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "AFTER UPDATE");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_before_delete BEFORE DELETE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (OLD.id, "BEFORE DELETE");
Query OK, 0 rows affected (0.18 sec)
mysql> CREATE TRIGGER trigger_after_delete AFTER DELETE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (OLD.id, "AFTER DELETE");
Query OK, 0 rows affected (0.05 sec)

As you can see, they will insert the ID of the row in question, and the combination of time/action appropriate for each one. Next, we will proceed in the following manner:

  1. INSERT three rows in the main table
  2. UPDATE the second
  3. DELETE the third

The reasoning behind doing it against the base table is to check that the triggers are working correctly, and doing what we expect them to do.

mysql> INSERT INTO main_table VALUES (1, 'A', 10, time(NOW()));
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO main_table VALUES (2, 'B', 20, time(NOW()));
Query OK, 1 row affected (0.14 sec)
mysql> INSERT INTO main_table VALUES (3, 'C', 30, time(NOW()));
Query OK, 1 row affected (0.17 sec)
mysql> UPDATE main_table SET letters = 'MOD' WHERE id = 2;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> DELETE FROM main_table WHERE id = 3;
Query OK, 1 row affected (0.10 sec)

And we check our results:

mysql> SELECT * FROM main_table;
+----+---------+---------+----------+
| id | letters | numbers | time     |
+----+---------+---------+----------+
|  1 | A       |      10 | 15:19:14 |
|  2 | MOD     |      20 | 15:19:14 |
+----+---------+---------+----------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM table_trigger_control;
+------+---------------+
| id   | description   |
+------+---------------+
|    1 | BEFORE INSERT |
|    1 | AFTER INSERT  |
|    2 | BEFORE INSERT |
|    2 | AFTER INSERT  |
|    3 | BEFORE INSERT |
|    3 | AFTER INSERT  |
|    2 | BEFORE UPDATE |
|    2 | AFTER UPDATE  |
|    3 | BEFORE DELETE |
|    3 | AFTER DELETE  |
+------+---------------+
10 rows in set (0.00 sec)

Everything is working as it should, so let’s move on with the tests that we really care about. We will again take the three steps mentioned above, but this time directly on the view.

mysql> INSERT INTO view_main_table VALUES (4, 'VIEW_D', 40, time(NOW()));
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO view_main_table VALUES (5, 'VIEW_E', 50, time(NOW()));
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO view_main_table VALUES (6, 'VIEW_F', 60, time(NOW()));
Query OK, 1 row affected (0.11 sec)
mysql> UPDATE view_main_table SET letters = 'VIEW_MOD' WHERE id = 5;
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> DELETE FROM view_main_table WHERE id = 6;
Query OK, 1 row affected (0.01 sec)

And we check our tables:

mysql> SELECT * FROM main_table;
+----+----------+---------+----------+
| id | letters  | numbers | time     |
+----+----------+---------+----------+
|  1 | A        |      10 | 15:19:14 |
|  2 | MOD      |      20 | 15:19:14 |
|  4 | VIEW_D   |      40 | 15:19:34 |
|  5 | VIEW_MOD |      50 | 15:19:34 |
+----+----------+---------+----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM view_main_table;
+----+----------+---------+----------+
| id | letters  | numbers | time     |
+----+----------+---------+----------+
|  1 | A        |      10 | 15:19:14 |
|  2 | MOD      |      20 | 15:19:14 |
|  4 | VIEW_D   |      40 | 15:19:34 |
|  5 | VIEW_MOD |      50 | 15:19:34 |
+----+----------+---------+----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM table_trigger_control;
+------+---------------+
| id   | description   |
+------+---------------+
|    1 | BEFORE INSERT |
|    1 | AFTER INSERT  |
|    2 | BEFORE INSERT |
|    2 | AFTER INSERT  |
|    3 | BEFORE INSERT |
|    3 | AFTER INSERT  |
|    2 | BEFORE UPDATE |
|    2 | AFTER UPDATE  |
|    3 | BEFORE DELETE |
|    3 | AFTER DELETE  |
|    4 | BEFORE INSERT |
|    4 | AFTER INSERT  |
|    5 | BEFORE INSERT |
|    5 | AFTER INSERT  |
|    6 | BEFORE INSERT |
|    6 | AFTER INSERT  |
|    5 | BEFORE UPDATE |
|    5 | AFTER UPDATE  |
|    6 | BEFORE DELETE |
|    6 | AFTER DELETE  |
+------+---------------+
20 rows in set (0.00 sec)

As seen in the results, all triggers were executed, even when the queries were run against the view. Since this was an updatable view, it worked. On the contrary, if we try on a non-updatable view it fails (we can force ALGORITHM = TEMPTABLE to test it).

mysql> CREATE ALGORITHM=TEMPTABLE VIEW view_main_table_temp AS SELECT * FROM main_table;
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO view_main_table_temp VALUES (7, 'VIEW_H', 70, time(NOW()));
ERROR 1471 (HY000): The target table view_main_table_temp of the INSERT is not insertable-into
mysql> UPDATE view_main_table_temp SET letters = 'VIEW_MOD' WHERE id = 5;
ERROR 1288 (HY000): The target table view_main_table_temp of the UPDATE is not updatable
mysql> DELETE FROM view_main_table_temp WHERE id = 5;
ERROR 1288 (HY000): The target table view_main_table_temp of the DELETE is not updatable

As mentioned before, MariaDB shows the same behavior. The difference, however, is that the documentation is correct in mentioning the limitations, since it only shows the following:

https://mariadb.com/kb/en/mariadb/trigger-limitations/

Triggers cannot operate on any tables in the mysql, information_schema or performance_schema database.

Corollary to the Discussion

It’s always good to thoroughly check the documentation, but it’s also necessary to test things and prove the documentation is showing the real case (bugs can be found everywhere, not just in the code :)).

by Agustín at June 14, 2017 06:54 PM

June 13, 2017

Peter Zaitsev

Webinar Thursday, June 15, 2017: Demystifying Postgres Logical Replication

Postgres Logical Replication

Postgres Logical ReplicationJoin Percona’s Senior Technical Services Engineer Emanuel Calvo as he presents Demystifying Postgres Logical Replication on Thursday, June 15, 2017 at 7 am PDT / 10 am EDT (UTC-7).

The Postgres logical decoding feature was added in version 9.4, and thankfully it is continuously improving due to the vibrant open source community. In this webinar, we are going to walk through its concepts, usage and some of the new things coming up in future releases.

Logical decoding is one of the features under the BDR implementation, allowing bidirectional streams of data between Postgres instances. It also allows you to stream data outside Postgres into many other data systems.

Register for the webinar here.

Emanuel CalvoEmanuel Calvo, Percona Sr. Technical Services

Emanuel has worked with MySQL for more than eight years. He is originally from Argentina, but also lived and worked in Spain and other Latin American countries. He lectures and presents at universities and local events. Emanuel currently works as a Sr. Technical Services at Percona, focusing primarily on MySQL. His professional background includes experience at telecommunication companies, educational institutions and data warehousing solutions. In his career, he has worked as a developer, SysAdmin and DBA in companies like Pythian, Blackbird.io/PalominoDB, Siemens IT Solutions, Correo Argentino (Argentinian Postal Services), Globant-EA, SIU – Government Educational Institution and Aedgency among others. As a community member he has lectured and given talks in Argentina, Brazil, United States, Paraguay, Spain and Belgium as well as written several technical papers.

by Dave Avery at June 13, 2017 07:21 PM

MariaDB AB

Performance Improvements in MariaDB MaxScale 2.1 GA

Performance Improvements in MariaDB MaxScale 2.1 GA Johan Tue, 06/13/2017 - 03:19

Performance is important and perhaps doubly so for a database proxy such as MariaDB MaxScale. From the start, the goal of MaxScale’s architecture has been that the performance should scale with the number of cpu-cores and the means for achieving that is the use of asynchronous I/O and Linux' epoll, combined with a fixed number of worker threads.

However, benchmarking showed that there were some bottlenecks, as the throughput did not continuously increase as more threads were added, but rather quickly it would start to decrease. The throughput also hit a relatively low ceiling, above which it would not grow, irrespective of the configuration of MaxScale.

When we started working with MariaDB MaxScale 2.1 we decided to make some significant changes to the internal architecture, with the aim of improving the overall performance. In this blog post, I will describe what we have done and show how the performance has improved.

The benchmark results have all been obtained using the same setup; two computers (connected with a GBE LAN), both of which have 16 cores / 32 hyperthreads, 128GB RAM and an SSD drive. One of the computers was used for running 4 MariaDB-10.1.13 servers and the other for running MariaDB MaxScale and sysbench. The used sysbench is a version slightly modified by MariaDB and it is available here. The configurations used and the raw data can be found here.

In all test runs, the MaxScale configuration was basically the same and 16 worker threads were used. As can be seen here. 16 threads is not optimal for MaxScale 2.0, but that does not distort the general picture too much. The used workload is in all cases the same; an OLTP read-only benchmark with 100 simple selects per iteration, no transaction boundaries, and autocommit enabled.

In the figures

  • direct means that sysbench itself distributes the load to the servers in a round-robin fashion (i.e. not through MaxScale),

  • rcr means that MaxScale with the readconnroute router is used, and

  • rws means that MaxScale with the readwritesplit router is used.

The rcr and rws results are meaningful to compare and the difference reflects the overhead caused by the additional complexity and versatility of the readwritesplit router.

As a baseline, in the following figure is shown the result of MaxScale 2.0.5 GA.

MaxScale-2.0.5.png

As can be seen, a saturation point is reached at 64 client connections, at which point the throughput is less than half of that of a direct connection.

Threading Architecture

Before MaxScale 2.1, all threads were used in a completely symmetric fashion; every thread listened on the same epoll instance, which in practice meant that every thread could accept an incoming connection, read from and write to any client and read from and write to any server related to that client. Without going into the specifics, that implied not only a lot of locking inside MaxScale but also made it quite hard to ensure that there could no data races.

For 2.1 we changed the architecture so that when a client has connected, a particular thread is used for handling all communication with and behalf of that client. That is, it is the thread that reads a request from a client that also writes that request to one or more servers, reads the responses from those servers and finally writes a response to the client. With this change quite a lot of locking could be removed, while at the same time the possibilities for data races were reduced.

As can be seen in the following figure, that had quite an impact on the throughput, as far as readconnroute is concerned. The behaviour of readwritesplit did not really change.

MaxScale-2.1.0.png

In 2.1.0 Beta we also introduced a caching filter and as long as the number of client connections is low, the impact is quite significant. However, as the number of client connections grows, the benefit shrinks dramatically.

The cache can be configured to be shared between all threads or to be specific for each thread and in the benchmark a thread specific cache was used. Otherwise the cache was not configured in any way, which means that all SELECT statements, but for those that are considered non-cacheable, are cached and that the cache uses as much memory as it needs (no LRU based cache eviction).

It is not shown in the figure, but readconnroute + cache performed roughly equivalently with readwritesplit + cache. That is, from 32 client connections onwards readconnroute + cache performed much worse than readconnroute alone.

Increasing Query Classification Concurrency

The change in the threading architecture improved the raw throughput of MaxScale but it was still evident that the throughput did not increase in all contexts in the expected way. Investigations showed that there was some unnecessary serialization going on when statements were classified, something that readwritesplit needs to do, but readconnroute does not.

Statements are classified inside MaxScale using a custom sqlite based parser. An in memory database is created for each thread so that there can be no concurrency issues between classification performed by different threads. However, it turned out that sqlite was not built in a way that would have turned off all locking inside sqlite, but some locking was still taking place. For 2.1.1 Beta this issue was fixed and now no locking takes place when statements are classified.

More Aggressive Caching

Initially the cache worked so that caching was taking place only if no transaction was in progress or if an explicitly read-only transaction was in progress. In 2.1.1 Beta we changed that so that caching takes place also in transactions that are not explicitly read-only, but only until the first updating statement.

Reduce Data Collection

Up until 2.1.1 Beta the query classifier always collected of a statement all information that is accessible via the query classifier API. Profiling showed that the cost for the allocation of memory for storing the accessed fields and the used functions was significant. In most cases paying that cost is completely unnecessary, as it primarily is only the database firewall filter that is interested in that information.

For 2.1.2 Beta the internal API was changed so that filters and routers can express what information they need and only that information is subsequently collected during query classification.

Custom Parser

Many routers and filters need to know whether a transaction is in progress or not. Up until 2.1.1 Beta that information was obtained using the parser of the query classifier. That imposed a significant cost if classification otherwise was not needed. For 2.1.2 Beta a custom parser for only detecting statements affecting the transaction state and autocommit mode was written. Now the detection carries almost no cost at all.

Optional Cache Safety Checking

By default, the cache filter verifies that a SELECT statement is cacheable. For instance, a SELECT that refers to user variables or uses functions such as CURRENT_DATE(), is not cacheable. In order to do that verification, the cache uses the query classifier, which means that the statement will be parsed. In 2.1.2 Beta we introduced the possibility to specify in the configuration file that all SELECT statements can be assumed to be cacheable, which means that no parsing needs to be performed. In practice this behaviour is enabled by adding the following entry to the cache section in the MaxScale configuration file.

    [Cache]
    ...
    selects=assume_cacheable

2.1.3 GA

The impact of the changes above is shown in the figure below.

MaxScale-2.1.3.png

Compared to MaxScale 2.1.0, the performance of readconnroute is roughly the same. However, the performance of readwritesplit has increased by more than 100% and is no longer significantly different from readconnroute. Also the cache performance has improved quite dramatically, as can be seen in the following figure.

MaxScale-2.1.3-rwsplit.png

In the figure, cacheable means that the cache has been configured so that it can assume that all SELECT statements can be cached.

From the figure can be seen that if there are less than 64 concurrent client connections, the cache improves the throughput, even if is verified that each SELECT statement is cacheable. With more concurrent connections than that, the verification constitutes a significant cost that diminishes the benefit of the cache. Best performance is always obtained if the verification can be turned off.

Compared to 2.0.5 the overall improvement is quite evident and although the shown results reflect a particular benchmark, it is to expected that 2.1 consistently performs better than 2.0, irrespective of the workload.

Looking Forward

Performance wise 2.1.3 GA is the best MaxScale version there has ever been, but there are still improvements that can be made.

In 2.1 all threads listen on new clients but once the connection has been accepted, it is assigned to a particular thread. That causes cross-thread activity that requires locking. In the next version, MaxScale 2.2 will change this so that the thread that accepts the connection is also the one that will handle it. That way there will be no cross-thread activity when a client connects.

A significant amount of the remaining locking inside MaxScale is actually related to the collection of statistics. Namely, as the MaxScale component for handling maxadmin requests is a regular MaxScale router, a maxadmin connection will run on some thread, yet it needs to access data belonging to the other threads. In 2.2 we introduce an internal epoll-based cross-thread messaging mechanism and have a dedicated admin thread, which communicates with the worker threads using that messaging mechanism. The result is that all statistics related locking can be removed.

Try it out today. Download MariaDB MaxScale.

Performance is important and perhaps doubly so for a database proxy such as MariaDB MaxScale. When we started working with MariaDB MaxScale 2.1 we decided to make some significant changes to the internal architecture, with the aim of improving the overall performance. In this blog post, I will describe what we have done and show how the performance has improved.

Login or Register to post comments

by Johan at June 13, 2017 07:19 AM

June 09, 2017

Peter Zaitsev

Q & A: MySQL In the Cloud – Migration, Best Practices, High Availability, Scaling

MySQL in the Cloud

MySQL in the CloudIn this blog, we will provide answers to the Q & A for the MySQL In the Cloud: Migration, Best Practices, High Availability, Scaling webinar.

First, we want to thank everybody for attending the June 7, 2017 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that we were unable to answer during the webinar:

How does Percona XtraDB cluster work with AWS for MySQL clustering?

Percona XtraDB Cluster works especially well in cloud environments, including Amazon EC2. Since Percona XtraDB Cluster only requires one network round trip per transaction for write transactions commits, and keeps all reads local, allows it to deploy high performance multi AZ and even multi region clusters. The fact that each Percona XtraDB Cluster node contains all the data allows it to avoid reliance on the EBS storage. You can run Percona XtraDB Cluster on NVMe storage based i3 EC2 nodes to achieve high performance even with very IO-intensive workloads. Automatic provisioning and cluster self healing allows you to easily scale the cluster. We have simple tutorial on how to deploy Percona XtraDB Cluster on AWS – check it out here.

How do you approach master-master model? Are there enough reasons to use the model to implement multi-site scaling?

There are two distinct multi-master modes in existence. A synchronous Master-Master solution, like the one offered by Percona XtraDB Cluster (virtually synchronous to be exact), guarantees there are no data conflicts as you connect to the nodes located at different sites. The downside of this model is that writes can be expensive. As such, it works well in environments with low latency between the different sites, or when high latency for updates can be tolerated. Percona XtraDB Cluster is greatly optimized in that it requires only one network roundtrip to complete a commit transaction. This significantly reduces the added latency compared to many other solutions.

In contrast, asynchronous Master-Master means you can perform writes locally, without waiting on a network round trip.  It comes with the downside of possible data conflicts. In MySQL, it can be implemented using MySQL Replication. MySQL Replication only detects conflicts at this point, however, and stops if it detects a conflict. It has no good built-in conflict resolution. Ensuring conflicts do not happen on the application level is hard and error prone, and only recommended in rare cases. Most applications out there do not use Active Master-Master, but rather design an architecture where each database replication set operates with a only a single writable node.

Do the Percona tools work in the cloud, like in Amazon Aurora?

We try to make Percona software in the cloud when it makes sense. For example, Percona Toolkit and Percona Monitoring Management support Amazon RDS and Amazon Aurora. Percona XtraBackup does not, as it requires physical access to the database files (Amazon RDS and Aurora don’t provide that).  Having said that, Amazon recently updated its Aurora migration documentation to include the use of XtraBackup. Amazon Aurora supports backups taken by Percona XtraBackup as a way to import data.

What is the fastest way to verify and validate backups created by XtraBackup for databases around 2-3TB?

In the big picture, you test backups by doing some sort of restore and validation. This can be done manually, but is much better if automated. There are three levels of such validation:

  • Basic Validation. Run –apply-log and ensure it completes successfully. Start the MySQL instance and run some basic queries to ensure it works. Often running some queries to see that recent data is present is a good idea.  
  • Consistency Validation.  Additionally, run Check Table on all tables to ensure there is no corruption. This way, tables and indexes data structures are validated.   
  • Full Validation. Restore the backup and connect the restored backup as a MySQL slave (possibly to one of the existing slaves). Let it catch up and then run pt-table-checksum to validate consistency and ensure that the data in backup matches what is in the source.

Running a checktable on databases on AWS IO optimized instances takes up to eight hours. Any other suggestions on how to replace checktable in validation?”

Without knowing the table size, it is hard for me to assess whether eight hours is reasonable for your environment. However, generally speaking you should not run a Full Validation on every backup. Full Validation first and foremost validates the backup and restore pipeline. If you’re not seeing issues, doing it once per month is plenty. You want to do lighter checks on a daily and weekly basis. 

What approach would you recommend for a data warehouse needing about 80,000IOPS, currently on FusionIO bare metal? Which cloud solution would be my best bet?

This is complicated question. To answer it properly requires more information. We need to know what type of operations your database performs. Working with a Percona Consultant to do an A&D for your environment would give you best answer. In general though, EBS (even with a large number of provisioned IOPs) would not match FusionIO in IO request latency. I3 high IO instances with NVMe storage is closer match. If budget is not a concern, you can look into X1 instances. These can have up to 2TB of memory and often allow getting all (or a large portion) of the database in memory for even higher performance.

Thanks for attending the MySQL In the Cloud: Migration, Best Practices, High Availability, Scaling webinar! Post any more MySQL in the cloud comments below.

by Peter Zaitsev at June 09, 2017 08:36 PM

Jean-Jerome Schmidt

MySQL on Docker: Running Galera Cluster on Kubernetes

In the last couple of blogs, we covered how to run a Galera Cluster on Docker, whether on standalone Docker or on multi-host Docker Swarm with overlay network. In this blog post, we’ll look into running Galera Cluster on Kubernetes, an orchestration tool to run containers at scale. Some parts are different, such as how the application should connect to the cluster, how Kubernetes handles failover and how the load balancing works in Kubernetes.

Kubernetes vs Docker Swarm

Our ultimate target is to ensure Galera Cluster runs reliably in a container environment. We previously covered Docker Swarm, and it turned out that running Galera Cluster on it has a number of blockers, which prevent it from being production ready. Our journey now continues with Kubernetes, a production-grade container orchestration tool. Let’s see which level of “production-readiness” it can support when running a stateful service like Galera Cluster.

Before we move further, let us highlight some of key differences between Kubernetes (1.6) and Docker Swarm (17.03) when running Galera Cluster on containers:

  • Kubernetes supports two health check probes - liveness and readiness. This is important when running a Galera Cluster on containers, because a live Galera container does not mean it is ready to serve and should be included in the load balancing set (think of a joiner/donor state). Docker Swarm only supports one health check probe similar to Kubernetes’ liveness, a container is either healthy and keeps running or unhealthy and gets rescheduled. Read here for details.
  • Kubernetes has a UI dashboard accessible via “kubectl proxy”.
  • Docker Swarm only supports round-robin load balancing (ingress), while Kubernetes uses least connection.
  • Docker Swarm supports routing mesh to publish a service to the external network, while Kubernetes supports something similar called NodePort, as well as external load balancers (GCE GLB/AWS ELB) and external DNS names (as for v1.7)

Installing Kubernetes using kubeadm

We are going to use kubeadm to install a 3-node Kubernetes cluster on CentOS 7. It consists of 1 master and 2 nodes (minions). Our physical architecture looks like this:

1. Install kubelet and Docker on all nodes:

$ ARCH=x86_64
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-${ARCH}
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
        https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
$ setenforce 0
$ yum install -y docker kubelet kubeadm kubectl kubernetes-cni
$ systemctl enable docker && systemctl start docker
$ systemctl enable kubelet && systemctl start kubelet

2. On the master, initialize the master, copy the configuration file, setup the Pod network using Weave and install Kubernetes Dashboard:

$ kubeadm init
$ cp /etc/kubernetes/admin.conf $HOME/
$ export KUBECONFIG=$HOME/admin.conf
$ kubectl apply -f https://git.io/weave-kube-1.6
$ kubectl create -f https://git.io/kube-dashboard

3. Then on the other remaining nodes:

$ kubeadm join --token 091d2a.e4862a6224454fd6 192.168.55.140:6443

4. Verify the nodes are ready:

$ kubectl get nodes
NAME          STATUS    AGE       VERSION
kube1.local   Ready     1h        v1.6.3
kube2.local   Ready     1h        v1.6.3
kube3.local   Ready     1h        v1.6.3

We now have a Kubernetes cluster for Galera Cluster deployment.

Galera Cluster on Kubernetes

In this example, we are going to deploy a MariaDB Galera Cluster 10.1 using Docker image pulled from our DockerHub repository. The YAML definition files used in this deployment can be found under example-kubernetes directory in the Github repository.

Kubernetes supports a number of deployment controllers. To deploy a Galera Cluster, one can use:

  • ReplicaSet
  • StatefulSet

Each of them has their own pro and cons. We are going to look into each one of them and see what’s the difference.

Prerequisites

The image that we built requires an etcd (standalone or cluster) for service discovery. To run an etcd cluster requires each etcd instance to be running with different commands so we are going to use Pods controller instead of Deployment and create a service called “etcd-client” as endpoint to etcd Pods. The etcd-cluster.yaml definition file tells it all.

To deploy a 3-pod etcd cluster, simply run:

$ kubectl create -f etcd-cluster.yaml

Verify if the etcd cluster is ready:

$ kubectl get po,svc
NAME                        READY     STATUS    RESTARTS   AGE
po/etcd0                    1/1       Running   0          1d
po/etcd1                    1/1       Running   0          1d
po/etcd2                    1/1       Running   0          1d
 
NAME              CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
svc/etcd-client   10.104.244.200   <none>        2379/TCP            1d
svc/etcd0         10.100.24.171    <none>        2379/TCP,2380/TCP   1d
svc/etcd1         10.108.207.7     <none>        2379/TCP,2380/TCP   1d
svc/etcd2         10.101.9.115     <none>        2379/TCP,2380/TCP   1d

Our architecture is now looking something like this:

Using ReplicaSet

A ReplicaSet ensures that a specified number of pod “replicas” are running at any given time. However, a Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to pods along with a lot of other useful features. Therefore, it’s recommended to use Deployments instead of directly using ReplicaSets, unless you require custom update orchestration or don’t require updates at all. When you use Deployments, you don’t have to worry about managing the ReplicaSets that they create. Deployments own and manage their ReplicaSets.

In our case, we are going to use Deployment as the workload controller, as shown in this YAML definition. We can directly create the Galera Cluster ReplicaSet and Service by running the following command:

$ kubectl create -f mariadb-rs.yml

Verify if the cluster is ready by looking at the ReplicaSet (rs), pods (po) and services (svc):

$ kubectl get rs,po,svc
NAME                  DESIRED   CURRENT   READY     AGE
rs/galera-251551564   3         3         3         5h
 
NAME                        READY     STATUS    RESTARTS   AGE
po/etcd0                    1/1       Running   0          1d
po/etcd1                    1/1       Running   0          1d
po/etcd2                    1/1       Running   0          1d
po/galera-251551564-8c238   1/1       Running   0          5h
po/galera-251551564-swjjl   1/1       Running   1          5h
po/galera-251551564-z4sgx   1/1       Running   1          5h
 
NAME              CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
svc/etcd-client   10.104.244.200   <none>        2379/TCP            1d
svc/etcd0         10.100.24.171    <none>        2379/TCP,2380/TCP   1d
svc/etcd1         10.108.207.7     <none>        2379/TCP,2380/TCP   1d
svc/etcd2         10.101.9.115     <none>        2379/TCP,2380/TCP   1d
svc/galera-rs     10.107.89.109    <nodes>       3306:30000/TCP      5h
svc/kubernetes    10.96.0.1        <none>        443/TCP             1d

From the output above, we can illustrate our Pods and Service as below:

Running Galera Cluster on ReplicaSet is similar to treating it as a stateless application. It orchestrates pod creation, deletion and updates and can be targeted for Horizontal Pod Autoscales (HPA), i.e. a ReplicaSet can be auto-scaled if it meets certain thresholds or targets (CPU usage, packets-per-second, request-per-second etc).

If one of the Kubernetes nodes goes down, new Pods will be scheduled on an available node to meet the desired replicas. Volumes associated with the Pod will be deleted, if the Pod is deleted or rescheduled. The Pod hostname will be randomly generated, making it harder to track where the container belongs by simply looking at the hostname.

All this works pretty well in test and staging environments, where you can perform a full container lifecycle like deploy, scale, update and destroy without any dependencies. Scaling up and down is straightforward, by updating the YAML file and posting it to Kubernetes cluster or by using the scale command:

$ kubectl scale replicaset galera-rs --replicas=5

Using StatefulSet

Known as PetSet on pre 1.6 version, StatefulSet is the best way to deploy Galera Cluster in production, because:

  • Deleting and/or scaling down a StatefulSet will not delete the volumes associated with the StatefulSet. This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources.
  • For a StatefulSet with N replicas, when Pods are being deployed, they are created sequentially, in order from {0 .. N-1}.
  • When Pods are being deleted, they are terminated in reverse order, from {N-1 .. 0}.
  • Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready.
  • Before a Pod is terminated, all of its successors must be completely shut down.

StatefulSet provides first-class support for stateful containers. It provides a deployment and scaling guarantee. When a three-node Galera Cluster is created, three Pods will be deployed in the order db-0, db-1, db-2. db-1 will not be deployed before db-0 is “Running and Ready”, and db-2 will not be deployed until db-1 is “Running and Ready”. If db-0 should fail, after db-1 is “Running and Ready”, but before db-2 is launched, db-2 will not be launched until db-0 is successfully relaunched and becomes “Running and Ready”.

We are going to use Kubernetes implementation of persistent storage called PersistentVolume and PersistentVolumeClaim. This to ensure data persistency if the pod got rescheduled to the other node. Even though Galera Cluster provides the exact copy of data on each replica, having the data persistent in every pod is good for troubleshooting and recovery purposes.

To create a persistent storage, first we have to create PersistentVolume for every pod. PVs are volume plugins like Volumes in Docker, but have a lifecycle independent of any individual pod that uses the PV. Since we are going to deploy a 3-node Galera Cluster, we need to create 3 PVs:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: datadir-galera-0
  labels:
    app: galera-ss
    podindex: "0"
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  hostPath:
    path: /data/pods/galera-0/datadir
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: datadir-galera-1
  labels:
    app: galera-ss
    podindex: "1"
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  hostPath:
    path: /data/pods/galera-1/datadir
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: datadir-galera-2
  labels:
    app: galera-ss
    podindex: "2"
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  hostPath:
    path: /data/pods/galera-2/datadir

The above definition shows we are going to create 3 PV, mapped to the Kubernetes nodes’ physical path with 10GB of storage space. We defined ReadWriteOnce, which means the volume can be mounted as read-write by only a single node. Save the above lines into mariadb-pv.yml and post it to Kubernetes:

$ kubectl create -f mariadb-pv.yml
persistentvolume "datadir-galera-0" created
persistentvolume "datadir-galera-1" created
persistentvolume "datadir-galera-2" created

Next, define the PersistentVolumeClaim resources:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mysql-datadir-galera-ss-0
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      app: galera-ss
      podindex: "0"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mysql-datadir-galera-ss-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      app: galera-ss
      podindex: "1"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mysql-datadir-galera-ss-2
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      app: galera-ss
      podindex: "2"

The above definition shows that we would like to claim the PV resources and use the spec.selector.matchLabels to look for our PV (metadata.labels.app: galera-ss) based on the respective pod index (metadata.labels.podindex) assigned by Kubernetes. The metadata.name resource must use the format “{volumeMounts.name}-{pod}-{ordinal index}” defined under the spec.templates.containers so Kubernetes knows which mount point to map the claim into the pod.

Save the above lines into mariadb-pvc.yml and post it to Kubernetes:

$ kubectl create -f mariadb-pvc.yml
persistentvolumeclaim "mysql-datadir-galera-ss-0" created
persistentvolumeclaim "mysql-datadir-galera-ss-1" created
persistentvolumeclaim "mysql-datadir-galera-ss-2" created

Our persistent storage is now ready. We can then start the Galera Cluster deployment by creating a StatefulSet resource together with Headless service resource as shown in mariadb-ss.yml:

$ kubectl create -f mariadb-ss.yml
service "galera-ss" created
statefulset "galera-ss" created

Now, retrieve the summary of our StatefulSet deployment:

$ kubectl get statefulsets,po,pv,pvc -o wide
NAME                     DESIRED   CURRENT   AGE
statefulsets/galera-ss   3         3         1d        galera    severalnines/mariadb:10.1   app=galera-ss
 
NAME                        READY     STATUS    RESTARTS   AGE       IP          NODE
po/etcd0                    1/1       Running   0          7d        10.36.0.1   kube3.local
po/etcd1                    1/1       Running   0          7d        10.44.0.2   kube2.local
po/etcd2                    1/1       Running   0          7d        10.36.0.2   kube3.local
po/galera-ss-0              1/1       Running   0          1d        10.44.0.4   kube2.local
po/galera-ss-1              1/1       Running   1          1d        10.36.0.5   kube3.local
po/galera-ss-2              1/1       Running   0          1d        10.44.0.5   kube2.local
 
NAME                  CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                               STORAGECLASS   REASON    AGE
pv/datadir-galera-0   10Gi       RWO           Retain          Bound     default/mysql-datadir-galera-ss-0                            4d
pv/datadir-galera-1   10Gi       RWO           Retain          Bound     default/mysql-datadir-galera-ss-1                            4d
pv/datadir-galera-2   10Gi       RWO           Retain          Bound     default/mysql-datadir-galera-ss-2                            4d
 
NAME                            STATUS    VOLUME             CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc/mysql-datadir-galera-ss-0   Bound     datadir-galera-0   10Gi       RWO                          4d
pvc/mysql-datadir-galera-ss-1   Bound     datadir-galera-1   10Gi       RWO                          4d
pvc/mysql-datadir-galera-ss-2   Bound     datadir-galera-2   10Gi       RWO                          4d

At this point, our Galera Cluster running on StatefulSet can be illustrated as in the following diagram:

Running on StatefulSet guarantees consistent identifiers like hostname, IP address, network ID, cluster domain, Pod DNS and storage. This allows the Pod to easily distinguish itself from others in a group of Pods. The volume will be retained on the host and will not get deleted if the Pod is deleted or rescheduled onto another node. This allows for data recovery and reduces the risk of total data loss.

On the negative side, the deployment time will be N-1 times (N = replicas) longer because Kubernetes will obey the ordinal sequence when deploying, rescheduling or deleting the resources. It would be a bit of a hassle to prepare the PV and claims before thinking about scaling your cluster. Take note that updating an existing StatefulSet is currently a manual process, where you can only update spec.replicas at the moment.

Connecting to Galera Cluster Service and Pods

There are a couple of ways you can connect to the database cluster. You can connect directly to the port. In the “galera-rs” service example, we use NodePort, exposing the service on each Node’s IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You’ll be able to contact the NodePort service, from outside the cluster, by requesting {NodeIP}:{NodePort}.

Example to connect to the Galera Cluster externally:

(external)$ mysql -udb_user -ppassword -h192.168.55.141 -P30000
(external)$ mysql -udb_user -ppassword -h192.168.55.142 -P30000
(external)$ mysql -udb_user -ppassword -h192.168.55.143 -P30000

Within the Kubernetes network space, Pods can connect via cluster IP or service name internally which is retrievable by using the following command:

$ kubectl get services -o wide
NAME          CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE       SELECTOR
etcd-client   10.104.244.200   <none>        2379/TCP            1d        app=etcd
etcd0         10.100.24.171    <none>        2379/TCP,2380/TCP   1d        etcd_node=etcd0
etcd1         10.108.207.7     <none>        2379/TCP,2380/TCP   1d        etcd_node=etcd1
etcd2         10.101.9.115     <none>        2379/TCP,2380/TCP   1d        etcd_node=etcd2
galera-rs     10.107.89.109    <nodes>       3306:30000/TCP      4h        app=galera-rs
galera-ss     None             <none>        3306/TCP            3m        app=galera-ss
kubernetes    10.96.0.1        <none>        443/TCP             1d        <none>

From the service list, we can see that the Galera Cluster ReplicaSet Cluster-IP is 10.107.89.109. Internally, another pod can access the database through this IP address or service name using the exposed port, 3306:

(etcd0 pod)$ mysql -udb_user -ppassword -hgalera-rs -P3306 -e 'select @@hostname'
+------------------------+
| @@hostname             |
+------------------------+
| galera-251551564-z4sgx |
+------------------------+

You can also connect to the external NodePort from inside any pod on port 30000:

(etcd0 pod)$ mysql -udb_user -ppassword -h192.168.55.143 -P30000 -e 'select @@hostname'
+------------------------+
| @@hostname             |
+------------------------+
| galera-251551564-z4sgx |
+------------------------+

Connection to the backend Pods will be load balanced accordingly based on least connection algorithm.

Summary

At this point, running Galera Cluster on Kubernetes in production seems much more promising as compared to Docker Swarm. As discussed in the last blog post, the concerns raised are tackled differently with the way Kubernetes orchestrates containers in StatefulSet, (although it’s still a beta feature in v1.6). We do hope that the suggested approach is going to help run Galera Cluster on containers at scale in production.

by ashraf at June 09, 2017 08:54 AM

June 08, 2017

Peter Zaitsev

Blog Poll: What Operating System Do You Run Your Production Database On?

blog poll

blog pollIn this post, we’ll use a blog poll to find out what operating system you use to run your production database servers.

As databases grow to meet more challenges and expanding application demands, they must try and get the maximum amount of performance out of available resources. How they work with an operating system can affect many variables, and help or hinder performance. The operating system you use for your database can impact consumable choices (such as hardware and memory). The operation system you use can also impact your choice of database engine as well (or vice versa).

Please let us know what operating system you use to run your database. For this poll, we’re asking which operating system you use to actually run your production database server (not the base operating system).

If you’re running virtualized Linux on Windows, please select Linux as the OS used for development. Pick up to three that apply. Add any thoughts or other options in the comments section:

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.
Thanks in advance for your responses – they will help the open source community determine how database environments are being deployed.

by Dave Avery at June 08, 2017 08:24 PM

Jean-Jerome Schmidt

Video: Interview with Krzysztof Książek on the Upcoming Webinar: MySQL Tutorial - Backup Tips for MySQL

We sat down with Severalnines Senior Support Engineer Krzysztof Książek to discuss the upcoming webinar MySQL Tutorial - Backup Tips for MySQL, MariaDB & Galera Cluster.

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

ClusterControl for MySQL Backup

ClusterControl provides you with sophisticated backup and failover features with a point-and-click interface to easily restore your data if something goes wrong. These advanced automated failover and backup technologies ensure your mission critical applications achieve high availability with zero downtime.

ClusterControl allows you to...

  • Create Backups
  • Schedule Backups
  • Set backup configuration method
    • Enable compression
    • Use mysqldump, xtrabackup, or NBD backup
    • Use PIGZ for parallel gzip
  • Backup to multiple locations (including the cloud)
  • Enable automated failover
  • View logs

ClusterControl also offers backup support for MongoDB & PostgreSQL

Learn more about the backup features in ClusterControl for MySQL here.

by Severalnines at June 08, 2017 01:15 PM

MariaDB Foundation

New MariaDB Foundation staff member sponsored by Tencent Cloud: Bin Cheng

The MariaDB Foundation is happy to announce that a new staff member, Bin Cheng, has started working as a Senior Developer of MariaDB. Bin’s work is sponsored by Tencent Cloud, and he also holds the title of Expert Engineer at Tencent Cloud. Tencent is one of the largest IT companies in China and the world, […]

The post New MariaDB Foundation staff member sponsored by Tencent Cloud: Bin Cheng appeared first on MariaDB.org.

by Otto Kekäläinen at June 08, 2017 09:47 AM

June 07, 2017

Peter Zaitsev

ProxySQL Admin Interface Is Not Your Typical MySQL Server!

ProxySQL Admin

ProxySQL AdminIn this blog post, I’ll look at how ProxySQL Admin behaves in some unusual and unexpected ways from a MySQL perspective.

ProxySQL allows you to connect to its admin interface using the MySQL protocol and use familiar tools, like the MySQL command line client, to manage its configuration as a set of configuration tables. This ability may trick you into thinking that you’re working with a stripped-down MySQL server – and expect it to behave like MySQL. 

It would be a mistake to think this! In fact, ProxySQL embeds the SQLite database to store its configuration. As such, it behaves much closer to SQLite!

Below, I’ll show you a few things that confused me at first. All of these are as of ProxySQL 1.3.6 (in case behavior changes in the future).

Fake support for Use command

mysql> show databases;
+-----+---------+-------------------------------+
| seq | name    | file                          |
+-----+---------+-------------------------------+
| 0   | main    |                               |
| 2   | disk    | /var/lib/proxysql/proxysql.db |
| 3   | stats   |                               |
| 4   | monitor |                               |
+-----+---------+-------------------------------+
4 rows in set (0.00 sec)
mysql> select database();
+------------+
| DATABASE() |
+------------+
| admin      |
+------------+
1 row in set (0.00 sec)
mysql> use stats;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select database();
+------------+
| DATABASE() |
+------------+
| admin      |
+------------+
1 row in set (0.00 sec)
mysql> use funkydatabase;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

So here we can see that:

  • There is a concept of multiple databases in the ProxySQL admin interface
  • The ProxySQL admin interface supports the 
    select database();
     function, which is always same value independent of the database you tried to set. Typically it will be “admin” or “stats”, depending on what user you use to connect to the database.
  • You can use the “use” command to change the database – but it does not really change the database. This is a required command, because if you don’t support it many MySQL clients will not connect.

Invisible tables

mysql> show tables;
+--------------------------------------+
| tables                               |
+--------------------------------------+
| global_variables                     |
| mysql_collations                     |
| mysql_query_rules                    |
| mysql_replication_hostgroups         |
| mysql_servers                        |
| mysql_users                          |
| runtime_global_variables             |
| runtime_mysql_query_rules            |
| runtime_mysql_replication_hostgroups |
| runtime_mysql_servers                |
| runtime_mysql_users                  |
| runtime_scheduler                    |
| scheduler                            |
+--------------------------------------+
13 rows in set (0.00 sec)
mysql> show tables from stats;
+--------------------------------+
| tables                         |
+--------------------------------+
| global_variables               |
| stats_mysql_commands_counters  |
| stats_mysql_connection_pool    |
| stats_mysql_global             |
| stats_mysql_processlist        |
| stats_mysql_query_digest       |
| stats_mysql_query_digest_reset |
| stats_mysql_query_rules        |
+--------------------------------+
8 rows in set (0.00 sec)
mysql> select count(*) from stats_mysql_commands_counters;
+----------+
| count(*) |
+----------+
| 52       |
+----------+
1 row in set (0.00 sec)

We can query a list of tables in our default database (which can’t change), and we also get lists of tables in the “stats” database with very familiar MySQL syntax. But we can also query the “stats” table directly without specifying the “stats” database, even if it is not shown in “show tables” for our current database.

Again this is SQLite behavior! 🙂

Strange Create Table syntax

mysql> show create table scheduler G
*************************** 1. row ***************************
      table: scheduler
Create Table: CREATE TABLE scheduler (
   id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
   active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
   interval_ms INTEGER CHECK (interval_ms>=100 AND interval_ms<=100000000) NOT NULL,
   filename VARCHAR NOT NULL,
   arg1 VARCHAR,
   arg2 VARCHAR,
   arg3 VARCHAR,
   arg4 VARCHAR,
   arg5 VARCHAR,
   comment VARCHAR NOT NULL DEFAULT '')
1 row in set (0.00 sec)

If we look into the ProxySQL Admin interface table structure, we see it is not quite MySQL. It uses CHECK constraints and doesn’t specify the length for VARCHAR. This is because it is SQLite table definition. 

SHOW command nuances

The ProxySQL Admin interface supports SHOW PROCESSLIST and even SHOW FULL PROCESSLIST commands, but not all the commands match the MySQL server output:

mysql> show processlist;
+-----------+---------------+--------+-----------+---------+---------+--------+
| SessionID | user          | db     | hostgroup | command | time_ms | info   |
+-----------+---------------+--------+-----------+---------+---------+--------+
| 129       | proxysql_user | sbtest | 10        | Query   | 14      | COMMIT |
| 130       | proxysql_user | sbtest | 10        | Query   | 16      | COMMIT |
| 131       | proxysql_user | sbtest | 10        | Query   | 9       | COMMIT |
| 133       | proxysql_user | sbtest | 10        | Query   | 0       | COMMIT |
| 134       | proxysql_user | sbtest | 10        | Query   | 5       | COMMIT |
….
| 191       | proxysql_user | sbtest | 10        | Query   | 4       | COMMIT |
| 192       | proxysql_user | sbtest | 10        | Query   | 1       | COMMIT |
+-----------+---------------+--------+-----------+---------+---------+--------+
62 rows in set (0.01 sec)

SHOW VARIABLES works, as does SHOW GLOBAL VARIABLES, but not SHOW SESSION VARIABLES.

SHOW STATUS doesn’t work as expected:

mysql> show status;
ERROR 1045 (#2800): near "show": syntax error

As you can see, while some typical MySQL commands and constructs work, others don’t. This is by design: ProxySQL implemented some of the commands to make it easy and familiar for MySQL users to navigate the ProxySQL interface. But don’t get fooled! It is not MySQL, and doesn’t always behave as you would expect.

You’ve been warned!

by Peter Zaitsev at June 07, 2017 05:20 PM

MariaDB Foundation

MariaDB Galera Cluster 10.0.31 and Connector/Java 2.0.2 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Galera Cluster 10.0.31, MariaDB Connector/J 2.0.2, and MariaDB Connector/J 1.6.1. See the release notes and changelogs for details. Download MariaDB Galera Cluster 10.0.31 Release Notes Changelog What is MariaDB Galera Cluster? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Connector/J 2.0.2 Release […]

The post MariaDB Galera Cluster 10.0.31 and Connector/Java 2.0.2 now available appeared first on MariaDB.org.

by Daniel Bartholomew at June 07, 2017 02:37 PM

June 06, 2017

Peter Zaitsev

MySQL Encryption at Rest – Part 1 (LUKS)

MySQL Encryption at Rest

MySQL Encryption at RestIn this first of a series of blog posts, we’ll look at MySQL encryption at rest.

At Percona, we work with a number of clients that require strong security measures for PCI, HIPAA and PHI compliance, where data managed by MySQL needs to be encrypted “at rest.” As with all things open source, there several options for meeting the MySQL encryption at rest requirement. In this three-part series, we cover several popular options of encrypting data and present the various pros and cons to each solution. You may want to evaluate which parts of these tutorials work best for your situation before using them in production.

Part one of this series is implementing disk-level encryption using crypt+LUKS.

In MySQL 5.7, InnoDB has built-in encryption features. This solution has some cons, however. Specifically, InnoDB tablespace encryption doesn’t cover undo logs, redo logs or the main ibdata1 tablespace. Additionally, binary-logs and slow-query-logs are not covered under InnoDB encryption.

Using crypt+LUKS, we can encrypt everything (data + logs) under one umbrella – provided that all files reside on the same disk. If you separate the various logs on to different partitions, you will have to repeat the tutorial below for each partition.

LUKS Tutorial

The Linux Unified Key Setup (LUKS) is the current standard for disk encryption. In the examples below, the block device /dev/sda4 on CentOS 7 is encrypted using a generated key, and then mounted as the default MySQL data directory at /var/lib/mysql.

WARNING! Loss of the key means complete loss of data! Be sure to have a backup of the key.

Install the necessary utilities:

# yum install cryptsetup

Creating, Formatting and Mounting an Encrypted Disk

The cryptsetup command initializes the volume and sets an initial key/passphrase. Please note that the key is not recoverable, so do not forget it. Take the time now to decide where you will securely store a copy of this key. LastPass Secure Notes are a good option, as they allow file attachments. This enhances our backup later on.

Create a passphrase for encryption. Choose something with high entropy (i.e., lots of randomness). Here are two options (pick one):

# openssl rand -base64 32
# date | md5 | rev | head -c 24 | md5 | tail -c 32

Next, we need to initialize and format our partition for use with LUKS. Any mounted points using this block device must be unmounted beforehand.

WARNING! This command will delete ALL DATA ON THE DEVICE! BE SURE TO COMPLETE ANY BACKUPS BEFORE YOU RUN THIS!

# cryptsetup -c aes-xts-plain -v luksFormat /dev/sda4

You will be prompted for a passphrase. Provide the phrase you generated above. After you provide a passphrase, you now need to “open” the encrypted disk and provide a device mapper name (i.e., an alias). It can be anything, but for our purposes, we will call it “mysqldata”:

# cryptsetup luksOpen /dev/sda4 mysqldata

You will be prompted for the passphrase you used above. On success, you should see the device show up:

# ls /dev/mapper/
lrwxrwxrwx  1 root root      7 Jun  2 11:50 mysqldata -> ../dm-0

You can now format this encrypted block device and create a filesystem:

# mkfs.ext4 /dev/mapper/mysqldata

Now you can mount the encrypted block device you just formatted:

# mount /dev/mapper/mysqldata /var/lib/mysql

Unfortunately you cannot add this to /etc/fstab to automount on a server reboot, since the key is needed to “open” the device. Please keep this in mind that if your server ever reboots MySQL will not start since the data directory is unavailable until opened and mounted (we will look at how to make this work using scripts in Part Two of this series).

Creating a Backup of Encryption Information

The header of a LUKS block device contains information regarding the current encryption key(s). Should this ever get damaged, or if you need to recover because you forgot the new passphrase, you can restore this header information:

# cryptsetup luksHeaderBackup --header-backup-file ${HOSTNAME}_`date +%Y%m%d`_header.dat /dev/sda4

Go ahead and make a SHA1 of this file now to verify that it doesn’t get corrupted later on in storage:

# sha1sum ${HOSTNAME}_`date +%Y%m%d`_header.dat

GZip the header file. Store the SHA1 and the .gz file in a secure location (for example, attach it to the secure note created above). Now you have a backup of the key you used and a backup of the header which uses that key.

Unmounting and Closing a Disk

If you know you will be storing a disk, or just want to make sure the contents are not visible (i.e., mounted), you can unmount and “close” the encrypted device:

# umount /var/lib/mysql/
# cryptsetup luksClose mysqldata

In order to mount this device again, you must “open” it and provide one of the keys.

Rotating Keys (Adding / Removing Keys)

Various compliance and enforcement rules dictate how often you need to rotate keys. You cannot rotate or change a key directly. LUKS supports up to eight keys per device. You must first add a new key to any slot (other than the slot currently occupying the key you are trying to remove), and then remove the older key.

Take a look at the existing header information:

# cryptsetup luksDump /dev/sda4
LUKS header information for /dev/sda4
Version: 1
Cipher name: aes
Cipher mode: cbc-essiv:sha256
Hash spec: sha1
Payload offset: 4096
MK bits: 256
MK digest: 81 37 51 6c d5 c8 32 f1 7a 2d 47 7c 83 62 70 d9 f7 ce 5a 6e
MK salt: ae 4b e8 09 c8 7a 5d 89 b0 f0 da 85 7e ce 7b 7f
47 c7 ed 51 c1 71 bb b5 77 18 0d 9d e2 95 98 bf
MK iterations: 44500
UUID: 92ed3e8e-a9ac-4e59-afc3-39cc7c63e7f6
Key Slot 0: ENABLED
Iterations: 181059
Salt: 9c a9 f6 12 d2 a4 2a 3d a4 08 b2 32 b0 b4 20 3b
69 13 8d 36 99 47 42 9c d5 41 35 8c b3 d0 ff 0e
Key material offset: 8
AF stripes: 4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

Here we can see a key is currently occupying “Key Slot 0”. We can add a key to any DISABLED key slot. Let’s use slot #1:

# cryptsetup luksAddKey --key-slot 1 -v /dev/sda4
Enter any passphrase:
Key slot 0 unlocked.
Enter new passphrase for key slot:
Verify passphrase:
Command successful.

LUKS asks for “any” passphrase to authenticate us. Had there been keys in other slots, we could have used any one of them. As only one is currently saved, we have to use it. We can then add a new passphrase for slot 1.

Now that we have saved the new key in slot 1, we can remove the key in slot 0.

# cryptsetup luksKillSlot /dev/sda4 0
Enter any remaining LUKS passphrase:
No key available with this passphrase.

In the example above, the existing passphrase stored in slot 0 was used. This is not allowed. You cannot provide the passphrase for the same slot you are attempting to remove.

Repeat this command and provide the passphrase for slot 1, which was added above. We are now able to remove the passphrase stored in slot 0:

# cryptsetup luksKillSlot /dev/sda4 0
Enter any remaining LUKS passphrase:
# cryptsetup luksDump /dev/sda4
LUKS header information for /dev/sda4
Version: 1
Cipher name: aes
Cipher mode: cbc-essiv:sha256
Hash spec: sha1
Payload offset: 4096
MK bits: 256
MK digest: 81 37 51 6c d5 c8 32 f1 7a 2d 47 7c 83 62 70 d9 f7 ce 5a 6e
MK salt: ae 4b e8 09 c8 7a 5d 89 b0 f0 da 85 7e ce 7b 7f
47 c7 ed 51 c1 71 bb b5 77 18 0d 9d e2 95 98 bf
MK iterations: 44500
UUID: 92ed3e8e-a9ac-4e59-afc3-39cc7c63e7f6
Key Slot 0: DISABLED
Key Slot 1: ENABLED
Iterations: 229712
Salt: 5d 71 b2 3a 58 d7 f8 6a 36 4f 32 d1 23 1a df df
cd 2b 68 ee 18 f7 90 cf 58 32 37 b9 02 e1 42 d6
Key material offset: 264
AF stripes: 4000
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

After you change the passphrase, it’s a good idea to repeat the header dump steps we performed above and store the new passphrase in your vault.

Conclusion

Congratulations, you have now learned how to encrypt and mount a partition using LUKS! You can now use this mounted device just like any other. You can also restore a backup and start MySQL.

In Part Two, we will cover using InnoDB tablespace encryption.

by Manjot Singh at June 06, 2017 07:00 PM

Upcoming Webinar Thursday June 8, 2017: MongoDB Shell – A Primer

MongoDB Shell

MongoDB ShellJoin Percona’s Solutions Engineer, Rick Golba as he presents MongoDB Shell: A Primer on Thursday, June 8, 2017, at 11 am PDT / 2 pm EDT (UTC-7).

Every good DBA should be a master of the database shell. In this webinar, we will help you understand how to structure shell commands and discuss all the advanced functions and ways to chain commands in the mongo shell.

This webinar will teach you how to:

  • Limit the number of documents, or skip documents, when running a query
  • Work with the MongoDB aggregation pipeline
  • View an explain plan for a MongoDB query
  • Understand the MongoDB write concerns
  • Validate the contents of a database on various nodes in a replica set
  • Understand the MongoDB read preference

We will touch on CRUD functions, but a great deal more time will be spent on the areas above. We will have a dedicated webinar for mastering CRUD operations in MongoDB in the future.

Register for the webinar here.

MongoDB ShellRick Golba, Solutions Engineer

Rick Golba is a Solutions Engineer at Percona. Rick has over 20 years of experience working with databases. Prior to Percona, he worked as a Technical Trainer for HP/Vertica.

 

by Dave Avery at June 06, 2017 06:35 PM

MariaDB Foundation

Position open for an experienced open source C/C++ developer

The MariaDB Foundation is looking to expand its staff with one or two part-time or full-time C/C++ developers. We are looking in particular for people with a deep understanding of how open source works, how collaboration is fostered, how licenses and copyright work, how tools like git and test suites are best used and so […]

The post Position open for an experienced open source C/C++ developer appeared first on MariaDB.org.

by Otto Kekäläinen at June 06, 2017 08:55 AM

Peter Zaitsev

Percona Live Open Source Database Conference Europe 2017 in Dublin, Ireland Call for Papers is Open!

Percona Live Call for Papers

Percona Live Call for PapersAnnouncing the opening of the Percona Live Open Source Database Conference Europe 2017 in Dublin, Ireland call for papers. It will be open from now until July 17, 2017.*

Do you have a big idea to explain, use case to share or skill to teach? Submit your speaking proposal for either breakout or tutorial sessions. This is your chance to put your developer ideas, business and case studies, and operational expertise in front of an intelligent, engaged audience of open source technology users.

The theme of Percona Live Europe 2017 is “Championing Open Source Databases,” with sessions in MySQL, MariaDB, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Are you:

  • Working with MongoDB as a developer?
  • Creating a new MySQL-variant time series database?
  • Deploying MariaDB in a novel way?
  • Using open source database technology to solve a particular business issue?

We are looking for topics that address a variety of open source issues. The tracks at this year’s conference are:

  • Developers
  • Business / Case Studies
  • Operations

We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions. Share your open source database experiences with peers and professionals in the open source community by presenting a:

  • Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
  • Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. Encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
  • Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

Speaking at Percona Live Europe is a great way to build your personal and company brands. If selected, you will receive a complimentary full conference pass!

Submit your talks now.

*NOTE: We have changed our registration platform this year, so you will need to register before submitting a talk idea (even if you have previously registered).

Tips for Submitting

Include presentation details, but be concise. Clearly state:

  • Purpose of the talk (problem, solution, action format, etc.)
  • Covered technologies
  • Target audience
  • Audience takeaway

Keep proposals free of sales pitches. The Committee is looking for in-depth technical talks, not ones that sound like a commercial.

Be original! Make your presentation stand out by submitting a proposal that focuses on real-world scenarios, relevant examples, and knowledge transfer.

Submit your proposals as soon as you can – the call for papers closes July 17, 2017!

by Dave Avery at June 06, 2017 01:08 AM

June 05, 2017

Peter Zaitsev

Webinar June 7, 2017: MySQL In the Cloud – Migration, Best Practices, High Availability, Scaling

MySQL in the Cloud

MySQL in the CloudJoin Percona’s CEO and Founder Peter Zaitsev as he presents MySQL In the Cloud: Migration, Best Practices, High Availability, Scaling on Wednesday, June 7, 2017, at 10 am PDT / 1:00 pm EDT (UTC-7).

Businesses are moving many of the systems and processes they once owned to offsite “service” models: Platform as a Service (PaaS), Software as a Service (SaaS), Infrastructure as a Service (IaaS), etc. These services are usually referred to as being “in the cloud” – meaning that the infrastructure and management of the service in question are not maintained by the enterprise using the service.

When it comes to database environment and infrastructure, more and more enterprises are moving to MySQL in the cloud to manage this vital part of their business organization. We often refer to database services provided in the cloud as Database as a Service (DBaaS). The next question after deciding to move your database to the cloud is “How to I plan properly to as to avoid a disaster?”

Before moving to the cloud, it is important to carefully define your database needs, plan for the migration and understand what putting a solution into production entails. This webinar discusses the following subjects on moving to the cloud:

  • Public and private cloud
  • Migration to the cloud
  • Best practices
  • High availability
  • Scaling

Register for the webinar here.

Peter ZaitsevPeter Zaitsev, Percona CEO and Founder

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 150 professionals in 20+ countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014 and 2015.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. Fortune and DZone often tap Peter as a contributor, and his recent ebook Practical MySQL Performance Optimization is one of percona.com’s most popular downloads.

by Dave Avery at June 05, 2017 06:09 PM

June 04, 2017

Colin Charles

Speaking in June 2017

I will be at several events in June 2017:

  • db tech showcase 2017 – 16-17 June 2017 – Tokyo, Japan. I’m giving a talk about best practices around MySQL High Availability.
  • O’Reilly Velocity 2017 – 19-22 June 2017 – San Jose, California, USA. I’m giving a tutorial about best practices around MySQL High Availability. Use code CC20 for a 20% discount.

I look forward to meeting with you at either of these events, to discuss all things MySQL (High Availability, security, cloud, etc.), and how Percona can help you.

As I write this, I’m in Budva, Montenegro, for the Percona engineering meeting.

by Colin Charles at June 04, 2017 08:46 AM

June 02, 2017

Peter Zaitsev

Percona XtraDB Cluster 5.7.18-29.20 is now available

Percona XtraDB Cluster 5.7

Percona XtraDB Cluster 5.7Percona announces the release of Percona XtraDB Cluster 5.7.18-29.20 on June 2, 2017. Binaries are available from the downloads section or our software repositories.

NOTE: You can also run Docker containers from the images in the Docker Hub repository.

NOTE: Due to new package dependency, Ubuntu/Debian users should use apt-get dist-upgrade or apt-get installpercona-xtradb-cluster-57 to upgrade.

Percona XtraDB Cluster 5.7.18-29.20 is now the current release, based on the following:

All Percona software is open-source and free.

Fixed Bugs

  • PXC-749: Fixed memory leak when running INSERT on a table without primary key defined and wsrep_certify_nonPK disabled (set to 0).

    NOTE: We recommend you define primary keys on all tables for correct write-set replication.

  • PXC-812: Fixed SST script to leave the DONOR keyring when JOINER clears the datadir.

  • PXC-813: Fixed SST script to use UTC time format.

  • PXC-816: Fixed hook for caching GTID events in asynchronous replication. For more information, see #1681831.

  • PXC-820: Enabled querying of pxc_maint_mode by another client during the transition period.

  • PXC-823: Fixed SST flow to gracefully shut down JOINER node if SST fails because DONOR leaves the cluster due to network failure. This ensures that the DONOR is then able to recover to synced state when network connectivity is restored For more information, see #1684810.

  • PXC-824: Fixed graceful shutdown of Percona XtraDB Cluster node to wait until applier thread finishes.

Other Improvements

  • PXC-819: Added five new status variables to expose required values from wsrep_ist_receive_status and wsrep_flow_control_interval as numbers, rather than strings that need to be parsed:

    • wsrep_flow_control_interval_low
    • wsrep_flow_control_interval_high
    • wsrep_ist_receive_seqno_start
    • wsrep_ist_receive_seqno_current
    • wsrep_ist_receive_seqno_end

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Alexey Zhebel at June 02, 2017 07:41 PM

Percona XtraDB Cluster 5.6.36-26.20 is Now Available

Percona XtraDB Cluster 5.7

Percona XtraDB Cluster 5.6.34-26.19Percona announces the release of Percona XtraDB Cluster 5.6.36-26.20 on June 2, 2017. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.36-26.20 is now the current release, based on the following:

All Percona software is open-source and free.

NOTE: Due to end of life, Percona will stop producing packages for the following distributions after July 31, 2017:

  • Red Hat Enterprise Linux 5 (Tikanga)
  • Ubuntu 12.04 LTS (Precise Pangolin)

You are strongly advised to upgrade to latest stable versions if you want to continue using Percona software.

Fixed Bugs

  • PXC-749: Fixed memory leak when running INSERT on a table without primary key defined and wsrep_certify_nonPK disabled (set to 0).

    NOTE: We recommended you define primary keys on all tables for correct write set replication.

  • PXC-813: Fixed SST script to use UTC time format.

  • PXC-823: Fixed SST flow to gracefully shut down JOINER node if SST fails because DONOR leaves the cluster due to network failure. This ensures that the DONOR is then able to recover to synced state when network connectivity is restored For more information, see #1684810.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Alexey Zhebel at June 02, 2017 07:40 PM

Jean-Jerome Schmidt

Comparing Oracle MySQL, Percona Server and MariaDB

Back in the days when somebody said MySQL, there was only the MySQL. You could pick different versions (4.0, 4.1) but the vendor was the same. This changed around MySQL 5.0/5.1 when Percona decided to release their own flavour of MySQL - Percona Server. A bit later, MariaDB joined with MariaDB 5.1 and the fun (or confusion) increased. What version should I use? What is the difference between MySQL 5.1, Percona Server 5.1 and MariaDB 5.1? Which one is faster? Which one is more stable? Which one has superior functionality? With time, this got worse as more and more changes were introduced in each of flavours. This blog post will be our attempt to summarize the key features that differentiate them. We will also try to give you some suggestions about which flavour may be the best for a given type of project. Let’s get started.

Oracle MySQL

It used to be the MySQL, now it’s the upstream. Most of the development starts here, each version starting from 5.6 resolves some internal contentions and brings better performance. New features are being added too on a regular basis. MySQL 5.6 brought us (among others) GTID and an initial implementation of parallel replication. It also gave us ability to execute most of the ALTERs in an online manner. Let’s take a look at the features of the latest MySQL version - MySQL 5.7

Features of MySQL 5.7

One of the major changes are changes in the deployment process - instead of different scripts, you can just run mysqld --initialize to set MySQL up from scratch. Another very important change - parallel replication based on logical clock. Finally, we can use parallel replication in all cases - no matter if you use multiple schemas or not. Another replication improvement is multi-source replication - a 5.7 slave can have multiple masters - it’s great feature if you want to build an aggregation slave and, let’s say, combine data from multiple, separate clusters.

InnoDB started to support spatial types, InnoDB buffer pool can finally be resized at runtime, online ALTERs have been improved to support more cases like partitioning and no-op ALTERs.

MySQL started to support JSON data types natively along with several new functions which are focused on adding functionality around JSON. Security of your data is very important these days, MySQL 5.7 supports data-at-rest encryption for file-per-table tablespaces. Some improvements have also been added to SSL support (SSL will be configured if keys are in place, a script is included which can be used to create certificates). From a user management perspective, password lifetime setup has been added which should make the design of password expiration policies a bit easier.

Another feature which is intended to assist DBAs is the ‘sys’ schema, a set of views designed to make it easier to use Performance Schema. It’s been included by default in MySQL 5.7.

Finally, MySQL Group Replication (and eventually MySQL InnoDB Cluster) has been added to MySQL 5.7. It works as a plugin and is included in recent versions of the 5.7 branch, but that is a topic of its own. In short, Group Replication allows you to build a “virtually” synchronous cluster.

This is definitely not a full list of features - if you are interested in all of them, you may want to consult MySQL 5.7 documentation.

Percona Server

At the beginning, there was a set of patches to apply to the MySQL source code which added some performance improvements and functionality. At some point, Percona decided to release their own build of MySQL which included these patches. In time, more development resources became available, so more and more features have been added.

In general, you can view Percona Server as the latest MySQL version with multiple patches/improvements. With time, some of the Percona Server feature improvements are replaced by features from the upstream - whenever Oracle develops a feature which supersedes one of functionalities added in Percona Server. As long as the implementation is on par, Percona removes their own code in favor of code from the upstream. This makes Percona Server basically a drop-in replacement for Oracle’s MySQL. One of the areas in which major performance improvements have been made is InnoDB. It’s been modified significantly enough to brand it as XtraDB. Currently it is fully compatible with InnoDB but it hasn’t always been like that. For instance, some features in Percona Server 5.5 were not compatible with MySQL 5.5. It is also true for more recent Percona Server versions. What’s important is that, by default, Percona Server comes with all incompatible features disabled - it makes easy to test it and roll back to Oracle’s MySQL if needed. Taking into consideration all above,  you still should exercise caution when you plan to migrate from Percona Server to MySQL - someone might have enabled one of incompatible features.

What’s worth highlighting is that Percona strives to reimplement enterprise features of the upstream. In case of MySQL, examples are implementation of a thread pool or PAM authentication plugin. Let’s take a quick look at some of the features of the Percona Server.

Features of Percona Server 5.7

One of the main features of XtraDB is improved buffer pool scalability - even though there’s less and less contention due to the work Oracle does on every MySQL version, Percona’s engineering team strives to push performance even further and remove additional mutexes which may limit performance. Additionally, more data is written into InnoDB monitor (accessible via SHOW ENGINE INNODB STATUS) regarding contentions within InnoDB - e.g., a section on semaphores has been added.

Another set of improvements have been made in the area of I/O. In InnoDB, you can set a flush method only for InnoDB tablespaces and this causes double-buffering for InnoDB redo logs. XtraDB makes it possible to use O_DIRECT also for those files. It also adds more data regarding checkpointing to the output of SHOW ENGINE INNODB STATUS. Additionally to that, parallel doublewrite buffer and multi-threaded LRU flusher have been implemented to reduce contention in I/O operations within InnoDB.

Thread pool is another feature made available by Percona Server. In Oracle MySQL it’s available only in the Enterprise edition. Here you can use Percona’s implementation for free. In general, thread pool reduces contentions while serving high number of connections from the application by reusing existing connections to the database.

Two more features are direct replacements of features from the Enterprise version of MySQL. One of them is the PAM authentication plugin which has been developed by Percona and is designed to allow the use of plenty of different authentication options like LDAP, RSA SecurID or any other methods supported by PAM. Second feature is also related to security - audit log plugin. It will create a file with a record of actions taken on the database server.

From time to time, Percona introduces significant improvements to other storage engines like the changes they made in MEMORY engine which allowed VARCHAR or BLOB type of data to be used.

Introduction of backup locks was also a rather significant improvement. In Oracle and MariaDB the only method of locking the table to get a consistent backup was to use FLUSH TABLES WITH READ LOCK (FTWRL). It’s rather heavy and it forces MySQL to close all of the opened tables. Backup locks, on the other hand, use a more lightweight approach of metadata locks. In many cases of heavy loaded server running FTWRL takes too long (and locks the server for too long) to be considered feasible while backup locks make it possible to take a backup using mysqldump or xtrabackup.

Percona is open also on porting features from other vendors. One such example is port of MariaDB’s START TRANSACTION WITH CONSISTENT SNAPSHOTS. This feature is also related to backups - with its help, you can take a consistent logical backup (using mysqldump) without running FLUSH TABLE WITH READ LOCK.

Finally, three features which improve observability - first: user statistics. This is fairly light-weight feature which collects data about users, indexes, tables and threads. It allows you to find unused indexes or determine which user is responsible for the load on the server. Currently it’s partially redundant to performance_schema but it’s a bit lighter and it was created in the days of MySQL 5.0 - 5.1, where no one even dreamt about the performance_schema.

Second - enhanced slow query log. Again, it was added in times where the highest granularity of long_query_time was 1 second. With this addition you had microsecond granularity and a bunch of additional data about InnoDB stats per query or its overall performance characteristics. Did it create a temporary table? Did it use an index? Has it been cached in the MySQL query cache?

Third feature we mentioned couple of times above - Percona Server exposes significantly more data in SHOW ENGINE INNODB STATUS than upstream. It definitely helps to further understand the workload and catch more issues before they unfold.

Of course, this is not a full list - if you are interested in more details, you may want to check Percona Server’s documentation.

MariaDB

MariaDB started as a fork of MySQL but, with part of original MySQL development team joining MariaDB, it quickly focused on adding features. In MariaDB 5.3, lots of features had been added to the optimizer: Batch Key Access, MultiRange Read, Index Condition Pushdown to name a few. This allowed MariaDB to excel in some workloads where MySQL or Percona Server would struggle. Until now, some of those features have been added to MySQL (mostly in MySQL 5.6) but some are still unique to MariaDB.

Another important feature which was introduced by MariaDB was Global Transaction ID. Not too much later Oracle released its own implementation but MariaDB was the first to have it. Similar story is with another replication feature - multisource replication: MariaDB had it before Oracle. Now, recently released MariaDB 10.2 also contains features which will be made available in MySQL 8.0, which is still under development. We are talking about, for example, recursive common table expressions or window functions.

Features of MariaDB 10.2

As we mentioned, MariaDB 10.2 introduces window functions and recursive common table expressions - enhancements in SQL which should help developers to write more efficient SQL queries.

Very important change is that MariaDB 10.2 uses InnoDB. Up to 10.1, XtraDB has been used as the default storage. This, unfortunately, makes features added in latest XtraDB unavailable for MariaDB 10.2 users.

Improvements have been made in virtual columns - more limitations have been lifted in 10.2.

Finally, support for multiple triggers for the same event has been added - now you can create several, for example, ON UPDATE triggers on the same table.

Developers should benefit from JSON support, along with a couple of functions which are related to it. They should also like new functions which allows them to export spatial data into GeoJSON format. Talking about JSON, improvements have been made in EXPLAIN FORMAT=JSON output - more data is shown.

On security front, support for OpenSSL 1.1 and LibreSSL has been added.

Of course, this list is not complete and if you are interested into what has been added to MySQL 10.2, you may want to consult documentation.

Additionally to new features MariaDB 10.2 benefits from features implemented in previous versions. We’ll go through the most important ones.

The most important features of MariaDB 10.1

First of all, MariaDB since 10.1 comes bundled with Galera cluster - no need to install additional libraries, everything is ready to use.

MariaDB 10.1 brought implementation of data-at-rest encryption. Compared to the feature implemented in Oracle’s MySQL, MariaDB has it more extended. Not only it encrypts tablespaces but it also encrypts redo logs, temporary files and binary logs. This comes with some issues - CLI tools like mysqldump or xtrabackup can’t access binary logs and may have problems backing up encrypted instances (this is especially true for xtrabackup - quite recently MariaDB created xtrabackup fork called MariaDB Backup which supports data-at-rest encryption).

Ok, so which flavor I should use?

As it usually is, the correct answer would be: “It depends” :-) . We will share couple of our observations which may or may not help you to decide, but, at the end, it’s up to you to run benchmarks and find whatever option will work best for your environment and application.

First of all, let’s talk about the flow. Oracle releases new version - let’s say MySQL 5.7. Performance-wise, at that moment, this is the fastest MySQL flavor on the market. This is because only Oracle has enough resources to work on improving InnoDB to that extent. Within a couple of months (in case of 5.7, it was 4 months) Percona releases Percona Server 5.7 with their set of improvements - depending on the type of workload, it may deliver even better performance than the upstream. Finally, MariaDB adopts new upstream version and builds its new version on top of it.

This is how it looked like in a calendar (we are still talking about MySQL 5.7).

MySQL 5.7 GA: October 21st, 2015

Percona Server 5.7 GA: February 23rd, 2016

MariaDB 10.2 GA: May 23rd, 2017

Please note how long it took MariaDB to release a version based on MySQL 5.7 - all previous versions have been based on MySQL 5.6 and, obviously, delivered performance lower than MySQL 5.7. On the other hand, MariaDB 10.2 has been released with InnoDB replacing XtraDB. While it’s true that Oracle mostly closed the performance gap between MySQL and Percona Server, it’s still just “mostly”. As a result, MariaDB 10.2 may deliver performance lower than the one of Percona Server in some cases (and better in some other cases - due to optimizer work done in MariaDB 5.3, some of which hasn’t been recreated yet in MySQL).

Feature-wise, it’s more complex. MariaDB has been adding lots of features, so if you are interested into some of them, you may definitely consider using MariaDB. There is a downside of it too. Percona Server had a great deal of features differentiating it from upstream MySQL but when Oracle started to implement them in MySQL, Percona decided to depreciate their implementations in favor of using the implementation from upstream. This reduced the amount of code that’s different between MySQL and Percona Server, makes it easier to maintain Percona Server’s code and, what’s the most important, makes Percona Server 100% compatible with MySQL.

This is, unfortunately, not true for MariaDB. MariaDB introduced GTID first, that’s true, but after Oracle developed their version of GTID, MariaDB decided to stick to their own implementation. This blog is not the place to decide which implementation is better, but as a result we have to manage two different, incompatible GTID systems - it adds burden on any tool that manages replication and reduces interoperability. Sticking to replication - group commit and parallel replication: both Oracle and MariaDB have their own implementation and if you work with both of them, you need to learn them both to apply the required tuning - the knobs are different and work in a different way. Similar case is with virtual columns support - two different, not 100% compatible implementations which, as a result, made it not possible to easily dump data from MariaDB and load into Oracle’s MySQL (and vice versa) because the syntax is slightly different. So, should you decide to use a version of MariaDB for some brand new feature, you may end up getting stuck with it even if you’d like to migrate back to MySQL to use Oracle’s implementation. At best, migration would require much more effort to execute. Of course, if you stay in the one environment all the time, it may not affect you severely. But even then, a lack of compatibility will be noticeable for you, if only while you read blogs on the internet and find solutions that are not really applicable to your flavour of MySQL.

So, to sum it up - if you are interested in maintaining compatibility with MySQL, Percona Server (or MySQL itself, of course) would probably be the way to go. If you are interested in performance, as long as there is Percona Server built on top of the latest MySQL, it may be the way to go. Of course, you may want to benchmark MariaDB and see if your workload can’t benefit from some of optimizations which are still unique to MariaDB. Operational-wise, it’s probably good to stick to one of the environments (Oracle/Percona or MariaDB), whichever would work for you better. MySQL or Percona Server have an advantage in a way that they are more commonly used and it’s slightly easier to integrate them with external tools (because not all of the tools support all of the MariaDB features). If you would benefit from a new and shiny feature which has been just implemented in MariaDB, you should consider it, keeping in mind any potential compatibility issues and possible lower performance.

We hope this blog post gave you some ideas about different choices we have in MySQL world and different angles from which you can compare them. At the end of the day, it will be your task to decide what’s best for your setup. It may not be easy but then we still should be grateful that we have a choice and we can pick what works best for us.

by krzysztof at June 02, 2017 01:23 PM

May 31, 2017

MariaDB Foundation

MariaDB 10.1.24 and Connector/C 2.3.3 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.24, and MariaDB Connector/C 2.3.3. See the release notes and changelogs for details. Download MariaDB 10.1.24 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Connector/C 2.3.3 Release Notes Changelog About MariaDB Connector/C Thanks, and enjoy […]

The post MariaDB 10.1.24 and Connector/C 2.3.3 now available appeared first on MariaDB.org.

by Daniel Bartholomew at May 31, 2017 07:28 PM

Peter Zaitsev

ProxySQL-Assisted Percona XtraDB Cluster Maintenance Mode

Percona XtraDB Cluster Maintenance Mode

Percona XtraDB Cluster Maintenance ModeIn this blog post, we’ll look at how Percona XtraDB Cluster maintenance mode uses ProxySQL to take cluster nodes offline without impacting workloads.

Percona XtraDB Cluster Maintenance Mode

Since Percona XtraDB Cluster offers a high availability solution, it must consider a data flow where a cluster node gets taken down for maintenance (through isolation from a cluster or complete shutdown).

Percona XtraDB Cluster facilitated this by introducing a maintenance mode. Percona XtraDB Cluster maintenance mode reduces the number of abrupt workload failures if a node is taken down using ProxySQL (as a load balancer).

The central idea is delaying the core node action and allowing ProxySQL to divert the workload.

How ProxySQL Manages Percona XtraDB Cluster Maintenance Mode

With Percona XtraDB Cluster maintenance mode, ProxySQL marks the node as OFFLINE when a user triggers a shutdown signal (or wants to put a specific node into maintenance mode):

  • When a user triggers a shutdown, Percona XtraDB Cluster node sets
    pxc_maint_mode
     to SHUTDOWN (from the DISABLED default) and sleep for x seconds (dictated by
    pxc_maint_transition_period
      — 10 secs by default). ProxySQL
    auto detects this change and marks the node as OFFLINE. With this change, ProxySQL avoids opening new connections for any DML transactions, but continues to service existing queries until
    pxc_maint_transition_period
    . Once the sleep period is complete, Percona XtraDB Cluster delivers a real shutdown signal — thereby giving ProxySQL enough time to transition the workload.
  • If the user needs to take a node into maintenance mode, the user can simply set
    pxc_maint_mode
     to MAINTENANCE. With that, 
    pxc_maint_mode
     is updated and the client connection updating it goes into sleep for x seconds (as dictated by
    pxc_maint_transition_period
    ) before giving back control to the user. ProxySQL 
    auto-detects this change and marks the node as OFFLINE. With this change ProxySQL avoids opening new connections for any DML transactions but continues to service existing queries.
  • ProxySQL auto-detects this change in maintenance state and then automatically re-routes traffic, thereby reducing abrupt workload failures.

Technical Details:

  • The ProxySQL Galera checker script continuously monitors the state of individual nodes by checking the
    pxc_maint_mode
     
    variable status (in addition to the existing
    wsrep_local_state
    ) using the ProxySQL scheduler feature
  • Scheduler is a Cron-like implementation integrated inside ProxySQL, with millisecond granularity.
  • If
    proxysql_galera_checker
     
    detects
    pxc_maint_mode = SHUTDOWN | MAINTENANCE
    , then it marks the node as OFFLINE_SOFT.  This avoids the creation of new connections (or workloads) on the node.

Sample

proxysql_galera_checker
 log:

Thu Dec  8 11:21:11 GMT 2016 Enabling config
Thu Dec  8 11:21:17 GMT 2016 Check server 10:127.0.0.1:25000 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:17 GMT 2016 Check server 10:127.0.0.1:25100 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:17 GMT 2016 Check server 10:127.0.0.1:25200 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:17 GMT 2016 Changing server 10:127.0.0.1:25200 to status OFFLINE_SOFT due to SHUTDOWN
Thu Dec  8 11:21:17 GMT 2016 Number of writers online: 2 : hostgroup: 10
Thu Dec  8 11:21:17 GMT 2016 Enabling config
Thu Dec  8 11:21:22 GMT 2016 Check server 10:127.0.0.1:25000 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:22 GMT 2016 Check server 10:127.0.0.1:25100 , status ONLINE , wsrep_local_state 4
Thu Dec  8 11:21:22 GMT 2016 Check server 10:127.0.0.1:25200 , status OFFLINE_SOFT , wsrep_local_state 4

Ping us below with any questions or comments.

by Ramesh Sivaraman at May 31, 2017 06:34 PM

May 30, 2017

Peter Zaitsev

Percona XtraBackup 2.4.7-2 is Now Available

Percona XtraBackup 2.4.7-2

Percona XtraBackup 2.4.7-2Percona announces the GA release of Percona XtraBackup 2.4.7-2 on May 29, 2017. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

Bug Fixed:
  • Fixed build failure on Debian 9.0 (Stretch). Bug fixed #1678947.

Release notes with all the bugfixes for Percona XtraBackup 2.4.7-2 are available in our online documentation. Please report any bugs to the launchpad bug tracker.

by Hrvoje Matijakovic at May 30, 2017 04:53 PM

Webinar May 31, 2017: Online MySQL Backups with Percona XtraBackup

Percona XtraBackup 2.4.7-2

Percona XtraBackupPlease join Percona’s solution engineer, Dimitri Vanoverbeke as he presents Online MySQL Backups with Percona XtraBackup on Wednesday, May 31, 2017 at 7:00 am PDT / 10:00 am EDT (UTC-7).

Percona XtraBackup is a free, open source, complete online backup solution for all versions of Percona Server, MySQL® and MariaDB®. Percona XtraBackup provides:

  • Fast and reliable backups
  • Uninterrupted transaction processing during backups
  • Savings on disk space and network bandwidth with better compression
  • Automatic backup verification
  • Higher uptime due to faster restore time

This webinar will discuss the different features of Percona XtraBackup, including:

  • Full and incremental backups
  • Compression, streaming, and encryption of backups
  • Backing up to the cloud (swift)
  • Percona XtraDB Cluster / Galera Cluster
  • Percona Server specific features
  • MySQL 5.7 support
  • Tools that use Percona XtraBackup
  • Limitations

Register for the webinar here.

DimitriDimitri Vanoverbeke, Solution Engineer

At the age of seven, Dimitri received his first computer. Since then he has felt addicted to anything with a digital pulse. Dimitri has been active in IT professionally since 2003, when he took various roles from internal system engineering to consulting. Prior to joining Percona, Dimitri worked as a consultant for a leading open source software consulting firm in Belgium. During his career, Dimitri has become familiar with a broad range of open source solutions and with the devops philosophy. Whenever he’s not glued to his computer screen, he enjoys traveling, cultural activities, basketball and the great outdoors.

 

by Dave Avery at May 30, 2017 04:40 PM

Jean-Jerome Schmidt

MySQL on Docker: Swarm Mode Limitations for Galera Cluster in Production Setups

In the last couple of blog posts on Docker, we have looked into understanding and running Galera Cluster on Docker Swarm. It scales and fails over pretty well, but there are still some limitations that prevent it from running smoothly in a production environment. We will be discussing about these limitations, and see how we can overcome them. Hopefully, this will clear some of the questions that might be circling around in your head.

Docker Swarm Mode Limitations

Docker Swarm Mode is tremendous at orchestrating and handling stateless applications. However, since our focus is on trying to make Galera Cluster (a stateful service) to run smoothly on Docker Swarm, we have to make some adaptations to bring the two together. Running Galera Cluster in containers in production requires at least:

  • Health check - Each of the stateful containers must pass the Docker health checks, to ensure it achieves the correct state before being included into the active load balancing set.
  • Data persistency - Whenever a container is replaced, it has to be started from the last known good configuration. Else you might lose data.
  • Load balancing algorithm - Since Galera Cluster can handle read/write simultaneously, each node can be treated equally. A recommended balancing algorithm for Galera Cluster is least connection. This algorithm takes into consideration the number of current connections each server has. When a client attempts to connect, the load balancer will try to determine which server has the least number of connections and then assign the new connection to that server.

We are going to discuss all the points mentioned above with a great detail, plus possible workarounds on how to tackle those problems.

Health Check

HEALTHCHECK is a command to tell Docker how to test a container, to check that it is still working. In Galera, the fact that mysqld is running does not mean it is healthy and ready to serve. Without a proper health check, Galera could be wrongly diagnosed when something goes wrong, and by default, Docker Swarm’s ingress network will include the “STARTED” container into the load balancing set regardless of the Galera state. On the other hand, you have to manually attach to a MySQL container to check for various MySQL statuses to determine if the container is healthy.

With HEALTHCHECK configured, container healthiness can be retrieved directly from the standard “docker ps” command:

$ docker ps
CONTAINER ID        IMAGE                       COMMAND             CREATED             STATUS                    PORTS                 
42f98c8e0934        severalnines/mariadb:10.1   "/entrypoint.sh "   13 minutes ago      Up 13 minutes (healthy)   3306/tcp, 4567-4568/tcp

Plus, Docker Swarm’s ingress network will include only the healthy container right after the health check output starts to return 0 after startup. The following table shows the comparison of these two behaviours:

Options Sample output Description
Without HEALTHCHECK Hostname: db_mariadb_galera.2
Hostname: db_mariadb_galera.3
Hostname: ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.1.100' (111)
Hostname: db_mariadb_galera.1
Hostname: db_mariadb_galera.2
Hostname: db_mariadb_galera.3
Hostname: ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.1.100' (111)
Hostname: db_mariadb_galera.1
Hostname: db_mariadb_galera.2
Hostname: db_mariadb_galera.3
Hostname: db_mariadb_galera.4
Hostname: db_mariadb_galera.1
Applications will see an error because container db_mariadb_galera.4 is introduced incorrectly into the load balancing set. Without HEALTHCHECK, the STARTED container will be part of the “active” tasks in the service.
With HEALTHCHECK Hostname: db_mariadb_galera.1
Hostname: db_mariadb_galera.2
Hostname: db_mariadb_galera.3
Hostname: db_mariadb_galera.1
Hostname: db_mariadb_galera.2
Hostname: db_mariadb_galera.3
Hostname: db_mariadb_galera.4
Hostname: db_mariadb_galera.1
Hostname: db_mariadb_galera.2
Container db_mariadb_galera.4 is introduced correctly into the load balancing set. With proper HEALTHCHECK, the new container will be part of the “active” tasks in the service if it’s marked as healthy.

The only problem with Docker health check is it only supports two exit codes - either 1 (unhealthy) or 0 (healthy). This is enough for a stateless application, where containers can come and go without caring much about the state itself and other containers. With a stateful service like Galera Cluster or MySQL Replication, another exit code is required to represent a staging phase. For example, when a joiner node comes into the picture, syncing is required from a donor node (by SST or IST). This process is automatically started by Galera and probably requires minutes or hours to complete, and the current workaround for this is to configure [--update-delay] and [--health-interval * --health-retires] to higher than the SST/IST time.

For a clearer perspective, consider the following “service create” command example:

$ docker service create \
--replicas=3 \
--health-interval=30s \
--health-retries=20 \
--update-delay=600s \
--name=galera \
--network=galera_net \
severalnines/mariadb:10.1

The container will be destroyed if the SST process has taken more than 600 seconds. While in this state, the health check script will return “exit 1 (unhealthy)” in both joiner and donor containers because both are not supposed to be included by Docker Swarm’s load balancer since they are in syncing stage. After failures for 20 consecutive times at every 30 seconds (equal to 600 seconds), the joiner and donor containers will be removed by Docker Swarm and will be replaced by new containers.

It would be perfect if Docker’s HEALTHCHECK could accept more than exit "0" or "1" to signal Swarm’s load balancer. For example:

  • exit 0 => healthy => load balanced and running
  • exit 1 => unhealthy => no balancing and failed
  • exit 2 => unhealthy but ignore => no balancing but running

Thus, we don’t have to determine SST time for containers to survive the Galera Cluster startup operation, because:

  • Joiner/Joined/Donor/Desynced == exit 2
  • Synced == exit 0
  • Others == exit 1

Another workaround apart setting up [--update-delay] and [--health-interval * --health-retires] to be higher than SST time is you could use HAProxy as the load balancer endpoints, instead of relying on Docker Swarm’s load balancer. More discussion further on.

Data Persistency

Stateless doesn’t really care about persistency. It shows up, serves and get destroyed if the job is done or it is unhealthy. The problem with this behaviour is there is a chance of a total data loss in Galera Cluster, which is something that cannot be afforded by a database service. Take a look at the following example:

$ docker service create \
--replicas=3 \
--health-interval=30s \
--health-retries=20 \
--update-delay=600s \
--name=galera \
--network=galera_net \
severalnines/mariadb:10.1

So, what happens if the switch connecting the three Docker Swarm nodes goes down? A network partition, which will split a three-node Galera Cluster into 'single-node' components. The cluster state will get demoted into Non-Primary and the Galera node state will turn to Initialized. This situation turns the containers into an unhealthy state according to the health check. After a period 600 seconds if the network is still down, those database containers will be destroyed and replaced with new containers by Docker Swarm according to the "docker service create" command. You will end up having a new cluster starting from scratch, and the existing data is removed.

There is a workaround to protect from this, by using mode global with placement constraints. This is the preferred way when you are running your database containers on Docker Swarm with persistent storage in mind. Consider the following example:

$ docker service create \
--mode=global \
--constraints='node.labels.type == galera' \
--health-interval=30s \
--health-retries=20 \
--update-delay=600s \
--name=galera \
--network=galera_net \
severalnines/mariadb:10.1

The cluster size is limited to the number of available Docker Swarm node labelled with "type=galera". Dynamic scaling is not an option here. Scaling up or down is only possible if you introduce or remove a Swarm node with the correct label. The following diagram shows a 3-node Galera Cluster container with persistent volumes, constrained by custom node label "type=galera":

It would also be great if Docker Swarm supported more options to handle container failures:

  • Don’t delete the last X failed containers, for troubleshooting purposes.
  • Don’t delete the last X volumes, for recovery purposes.
  • Notify users if a container is recreated, deleted, rescheduled.

Load Balancing Algorithm

Docker Swarm comes with a load balancer, based on IPVS module in Linux kernel, to distribute traffic to all containers in round-robin fashion. It lacks several useful configurable options to handle routing of stateful applications, for example persistent connection (so source will always reach the same destination) and support for other balancing algorithm, like least connection, weighted round-robin or random. Despite IPVS being capable of handling persistent connections via option "-p", it doesn’t seem to be configurable in Docker Swarm.

In MySQL, some connections might take a bit longer time to process before it returns the output back to the clients. Thus, Galera Cluster load distribution should use "least connection" algorithm, so the load is equally distributed to all database containers. The load balancer would ideally monitor the number of open connections for each server, and sends to the least busy server. Kubernetes defaults to least connection when distributing traffic to the backend containers.

As a workaround, relying on other load balancers in front of the service is still the recommended way. HAProxy, ProxySQL and MaxScale excel in this area. However, you have to make sure these load balancers are aware of the dynamic changes of the backend database containers especially during scaling and failover.

Summary

Galera Cluster on Docker Swarm fits well in development, test and staging environments, but it needs some more work when running in production. The technology still needs some time to mature, but as we saw in this blog, there are ways to work around the current limitations.

by ashraf at May 30, 2017 09:59 AM

May 29, 2017

Peter Zaitsev

Percona Monitoring and Management 1.1.4 is Now Available

Percona Monitoring and Management

Percona Monitoring and ManagementPercona announces the release of Percona Monitoring and Management 1.1.4 on May 29, 2017.

For installation instructions, see the Deployment Guide.

This release includes experimental support for MongoDB in Query Analytics, including updated QAN interface.

Query Analytics for MongoDB

To enable MongoDB query analytics, use the mongodb:queries alias when adding the service. As an experimental feature, it also requires the --dev-enable option:

sudo pmm-admin add --dev-enable mongodb:queries

NOTE: Currently, it monitors only collections that are present when you enable MongoDB query analytics. Query data for collections that you add later is not gathered. This is a known issue and it will be fixed in the future.

Query Analytics Redesign

The QAN web interface was updated for better usability and functionality (including the new MongoDB query analytics data). The new UI is experimental and available by specifying /qan2 after the URL of PMM Server.

New Query Analytics web interface

NOTE: The button on the main landing page still points to the old QAN interface.

You can check out the new QAN web UI at https://pmmdemo.percona.com/qan2

New in PMM Server

  • PMM-724: Added the Index Condition Pushdown (ICP) graph to the MySQL InnoDB Metrics dashboard.
  • PMM-734: Fixed the MySQL Active Threads graph in the MySQL Overview dashboard.
  • PMM-807: Fixed the MySQL Connections graph in the MySQL Overview dashboard.
  • PMM-850: Updated the MongoDB RocksDB and MongoDB WiredTiger dashboards.
  • Removed the InnoDB Deadlocks and Index Collection Pushdown graphs from the MariaDB dashboard.
  • Added tooltips with descriptions for graphs in the MySQL Query Response Time dashboard.Similar tooltips will be gradually added to all graphs.

New in PMM Client

  • PMM-801: Improved PMM Client upgrade process to preserve credentials that are used by services.
  • Added options for pmm-admin to enable MongoDB cluster connections.

About Percona Monitoring and Management

Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

by Alexey Zhebel at May 29, 2017 07:28 PM

May 26, 2017

Peter Zaitsev

Percona Server for MongoDB 3.0.15-1.10 is Now Available

Percona Server for MongoDB 3.2

Percona Server for MongoDB 3.0Percona announces the release of Percona Server for MongoDB 3.0.15-1.10 on May 26, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.0 protocol and drivers. It extends MongoDB with PerconaFT and MongoRocks storage engines, as well as several enterprise-grade features:

NOTE: PerconaFT was deprecated and is not available in later versions. TokuBackup was replaced with Hot Backup for WiredTiger and MongoRocks storage engines.

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.0.15 and includes the following additional change:

Percona Server for MongoDB 3.0.15-1.10 release notes are available in the official documentation.

by Alexey Zhebel at May 26, 2017 05:28 PM

Percona Server for MySQL 5.7.18-15 is Now Available

Percona Server for MySQL 5.7.18-15

Percona Server for MySQL 5.7.18-15Percona announces the GA release of Percona Server for MySQL 5.7.18-15 on May 26, 2017. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.7.18, including all the bug fixes in it, Percona Server for MySQL 5.7.18-15 is the current GA release in the Percona Server for MySQL 5.7 series. Percona’s provides completely open-source and free software. Find release details in the 5.7.18-15 milestone at Launchpad.

Bugs Fixed:
  • The server would crash when querying partitioning table with a single partition. Bug fixed #1657941 (upstream #76418).
  • Running a query on InnoDB table with ngram full-text parser and a LIMIT clause could lead to a server crash. Bug fixed #1679025 (upstream #85835).

The release notes for Percona Server for MySQL 5.7.18-15 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at May 26, 2017 05:05 PM

May 25, 2017

Peter Zaitsev

What About ProxySQL and Mirroring?

ProxySQL and Mirroring

In this blog post, we’ll look at how ProxySQL and mirroring go together.

Overview

Let me be clear: I love ProxySQL, and I think it is a great component for expanding architecture flexibility and high availability. But not all that shines is gold! In this post, I want to correctly set some expectations, and avoid selling carbon for gold (carbon has it’s own uses, while gold has others).

First of all, we need to cover the basics of how ProxySQL manages traffic dispatch (I don’t want to call it mirroring, and I’ll explain further below).

ProxySQL receives a connection from the application, and through it we can have a simple SELECT or a more complex transaction. ProxySQL gets each query, passes them to the Query Processor, processes them, identifies if a query is mirrored, duplicates the whole MySQL session ProxySQL internal object and associates it to a mirror queue (which refer to a mirror threads pool). If the pool is free (has an available active slot in the concurrent active threads set) then the query is processed right away. If not, it will stay in the queue. If the queue is full, the query is lost.

Whatever is returned from the query goes to /dev/null, and as such no result set is passed back to the client.

The whole process is not free for a server. If you check the CPU utilization, you will see that the “mirroring” in ProxySQL actually doubles the CPU utilization. This means that the traffic on server A is impacted because of resource contention.

Summarizing, ProxySQL will:

  1. Send the query for execution in different order
  2. Completely ignore any transaction isolation
  3. Have different number of query executed on B with respect to A
  4. Add significant load on the server resources

This point, coupled with the expectations I mention in the reasoning at the end of this article, it is quite clear to me that at the moment we cannot consider ProxySQL as a valid mechanism to duplicate a consistent load from server A to server B.

Personally, I don’t think that the ProxySQL development team (Rene :D) should waste time on fixing this issue, as there are so many other things to cover and improve on in ProxySQL.

After working extensively with ProxySQL, and doing a deep QA on mirroring, I think that either we keep it as basic blind traffic dispatcher. Otherwise, a full re-conceptualization is required. But once we have clarified that, ProxySQL “traffic dispatch” (still don’t want to call it mirroring) remains a very interesting feature that can have useful applications – especially since it is easy to setup.

The following test results should help set the correct expectations.

The tests were simple: load data in a Percona XtraDB Cluster and use ProxySQL to replicate the load on a MySQL master-slave environment.

  • Machines for MySQL/Percona XtraDB Cluster: VM with CentOS 7, 4 CPU 3 GB RAM, attached storage
  • Machine for ProxySQL: VM CentOS 7, 8 CPU 8GB RAM

Why did I choose to give ProxySQL a higher volume of resources? I knew in advance I could need to play a bit with a couple of settings that required more memory and CPU cycles. I wanted to be sure I didn’t get any problems from ProxySQL in relation to CPU and memory.

The application that I was using to add load is a Java application I develop to perform my tests. The app is at https://github.com/Tusamarco/blogs/blob/master/stresstool_base_app.tar.gz, and the whole set I used to do the tests are here:  https://github.com/Tusamarco/blogs/tree/master/proxymirror.

I used four different tables:

+------------------+
| Tables_in_mirror |
+------------------+
| mirtabAUTOINC    |
| mirtabMID        |
| mirtabMIDPart    |
| mirtabMIDUUID    |

Ok so let start. Note that the meaningful tests are the ones below. For the whole set, refer to the whole set package. First setup ProxySQL:

First setup ProxySQL:

delete from mysql_servers where hostgroup_id in (500,501,700,701);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.5',500,3306,60000,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.5',501,3306,100,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.21',501,3306,20000,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.231',501,3306,20000,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.7',700,3306,1,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.7',701,3306,1,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.25',701,3306,1,400);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('192.168.0.43',701,3306,1,400);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
delete from mysql_users where username='load_RW';
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('load_RW','test',1,500,'test',1);
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
delete from mysql_query_rules where rule_id=202;
insert into mysql_query_rules (rule_id,username,destination_hostgroup,mirror_hostgroup,active,retries,apply) values(202,'load_RW',500,700,1,3,1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

Test 1

The first test is mainly a simple functional test during which I insert records using one single thread in Percona XtraDB Cluster and MySQL. No surprise, here I have 3000 loops and at the end of the test I have 3000 records on both platforms.

To have a baseline we can see that the ProxySQL CPU utilization is quite low:

ProxySQL and Mirroring

At the same time, the number of “questions” against Percona XtraDB Cluster and MySQL very similar:

Percona XtraDB Cluster

ProxySQL and Mirroring

MySQL

ProxySQL and Mirroring

The other two metrics we want to keep an eye on are Mirror_concurrency and Mirror_queue_length. These two refer respectively to mysql-mirror_max_concurrency and mysql-mirror_max_queue_length:

ProxySQL and Mirroring

These two new variables and metrics were introduced in ProxySQL 1.4.0, with the intent to control and manage the load ProxySQL generates internally related to the mirroring feature. In this case, you can see we have a max of three concurrent connections and zero queue entries (all good).

Now that we have a baseline, and that we know at functional level “it works,” let see what happens when increasing the load.

Test 2

The scope of the test was identifying how ProxySQL behaves with a standard configuration and increasing load. It comes up that as soon as ProxySQL has a little bit more load, it starts to lose some queries along the way.

Executing 3000 loops for 40 threads only results in 120,000 rows inserted in all the four tables in Percona XtraDB Cluster. But the table in the secondary (mirrored) platform only has a variable number or inserted rows, between 101,359 and 104,072. This demonstrates consistent loss of data.

After reviewing and comparing the connections running in Percona XtraDB Cluster and the secondary, we can see that (as expected) Percona XtraDB Cluster’s number of connections is scaling and serving the number of incoming requests, while the connections on the secondary are limited by the default value of mysql-mirror_max_concurrency=16.

ProxySQL and Mirroring

Is also interesting to note that the ProxySQL transaction process queue maintains its connection to the Secondary longer than the connection to Percona XtraDB Cluster.

ProxySQL and Mirroring

As we can see above, the queue is an evident bell curve that reaches 6K entries (which is quite below the mysql-mirror_max_queue_length limit (32K)). Yet queries were dropped by ProxySQL, which indicates the queue is not really enough to accommodate the pending work.

ProxySQL and Mirroring

CPU-wise, ProxySQL (as expected) take a few more cycles, but nothing crazy. The overhead for the simple mirroring queue processing can be seen when the main load stops around 12:47.

Another interesting graph to keep an eye on is the one describing the executed commands inside Percona XtraDB Cluster and the secondary:

Percona XtraDB Cluster

ProxySQL and Mirroring

Secondary

ProxySQL and Mirroring

As you can see, the traffic on the secondary was significantly less (669 on average, compared to Percona XtraDB Cluster’s 1.17K). Then it spikes when the main load on the Percona XtraDB Cluster node terminates. In short it is quite clear that ProxySQL is not able to scale following the traffic existing in Percona XtraDB Cluster, and actually loses a significant amount of data on the secondary.

Doubling the load in Test 3 shows the same behavior, with ProxySQL reaches its limit for traffic duplication.

But can this be optimized?

The answer is, of course, yes! This is what the mysql-mirror_max_concurrency is for, so let;’s see what happens if we increase the value from 16 to 100 (just to make it crazy high).

Test 4 (two app node writing)

The first thing that comes to attention is that both Percona XtraDB Cluster and secondary report the same number of rows in the tables (240,000). That is a good first win.

Second, note the number of running connections:

ProxySQL and Mirroring

The graphs are now are much closer, and the queue drops to just a few entries.

Commands executed in Percona XtraDB Cluster:

And commands executed in the secondary:

Average execution reports the same value, and very similar trends.

Finally, what was the CPU cost and effect?

Percona XtraDB Cluster and secondary CPU utilization:

     

As expected, some difference in the CPU usage distribution exists. But the trend is consistent between the two nodes, and the operations.

The ProxySQL CPU utilization is definitely higher than before:

But it’s absolutely manageable, and still reflects the initial distribution.

What about CRUD? So far I’ve only tested the insert operation, but what happen if we run a full CRUD set of tests?

Test 7 (CRUD)

First of all, let’s review the executed commands in Percona XtraDB Cluster:

And the secondary:

While in appearance we have very similar workloads, selects aside the behavior will significantly diverge. This is because in the secondary the different operations are not encapsulated by the transaction. They are executed as they are received. We can see a significant difference in update and delete operations between the two.

Also, the threads in the execution show a different picture between the two platforms:

Percona XtraDB Cluster

Secondary

It appears quite clear that Percona XtraDB Cluster is constantly running more threads and more connections. Nevertheless, both platforms process a similar total number of questions:

Percona XtraDB Cluster

Secondary

Both have an average or around 1.17K/second questions.

This is also another indication of how much the impact of concurrent operation on behavior, with no respect to the isolation or execution order. Below we can clearly see different behavior by reviewing the CPU utilization:

Percona XtraDB Cluster

Secondary

Conclusions

To close this article, I want to go back to the start. We cannot consider the mirror function in ProxySQL as a real mirroring, but more as traffic redirection (check here for more reasoning on mirroring from my side).

Using ProxySQL with this approach is still partially effective in testing the load and the effect it has on a secondary platform. As we know, data consistency is not guaranteed in this scenario, and Selects, Updates and Deletes are affected (given the different data-set and result-set they manage).

The server behaviors change between the original and mirror, if not in the quantity or the quality.

I am convinced that when we need a tool able to test our production load on a different or new platform, we would do better to look to something else. Possibly query Playback, recently reviewed and significantly patched by DropBox (https://github.com/Percona-Lab/query-playback).

In the end, ProxySQL is already a cool tool. If it doesn’t cover mirroring well, I can live with that. I am interested in having it working as it should (and it does in many other functionalities).

Acknowledgments

As usual, to Rene, who worked on fixing and introducing new functionalities associated with mirroring, like queue and concurrency control.

To the Percona team who developed Percona Monitoring and Management (PMM): all the graphs here (except 3) come from PMM (some of them I customized).

by Marco Tusa at May 25, 2017 10:45 PM

MariaDB AB

Using MariaDB MaxScale 2.1 Regex Filter for Migrations

Using MariaDB MaxScale 2.1 Regex Filter for Migrations anderskarlsson4 Thu, 05/25/2017 - 13:13

Migrating applications from one database system to another is sometimes easy and sometimes not. But they are hardly ever effortless. Among the obvious issues are schema and data, migrating from one datatype to another, with slightly different behavior and semantics is one thing and another is migrating the actual data, is it UTF8 and if so how many bytes? What is the collation? What is the required accuracy of numeric types?

And on top of this are things such as triggers, stored procedures and such. Not to mention performance tuning and the optimal way to construct SQL statements.

Speaking of SQL statements, we have application code also. Yes, most databases have some kind of application running on them, often more than one, and these access the database using SQL over some kind of API such as JDBC, ODBC or some proprietary driver. And application code, even simple SQL tends to have one or two database specific constructs in them, and that is what this blog is about.

Before moving on to that though, a few words on MariaDB Server 10.2.6 which is GA since May 23. MariaDB Server 10.2 does contain more than a few things that make migration from other database systems to MariaDB a lot easier. Among these features are:

  • CHECK constraints. The syntax for these has been supported before, but in MariaDB Server 10.2 these are actually implementing proper constraints.
  • DEFAULT values. In MariaDB Server before version 10.2, these were several restrictions around what DEFAULT values could be used, and how, but in 10.2 these are lifted.
  • Multiple triggers per event. In MariaDB Server before 10.2 you could only have one trigger per DML eevent, i.e. several BEFORE INSERT triggers. This has two advantages, one is the obvious one that the database you are migrating from might be supporting multiple triggers per event. Another is that sometimes you want to add a trigger to implement some compatibility with the database system you are migrating from, and this feature makes doing this a lot easier.

With that said, let’s say you have migrated the schema and the procedures and what have you not, and also the data. Then you have replaced that ODBC driver from the existing database system with one from MariaDB, which means we are all set to try the application. And the application falls over on the first SQL statement because it uses some proprietary feature of the database we are migrating from. There are some ways of getting around that, with MariaDB there are two ways that have been used in the past:

  • Use MariaDB compatibility features. As stated above, there are many new compatibility features in MariaDB Server 10.2.6 GA. In addition there are some settings for the SQL_MODE parameter for compatibility, such as the PIPES_AS_CONCAT that ensures that the ANSI SQL concatenation operator, two pipes (||), is interpreted as a MariaDB CONCAT.
  • Develop procedures, functions and user defined functions that mimic procedures and functions in other database system.

There is nothing wrong with the above means of migration, but they don’t cover all aspect of a migration. One more tool that is available now is the new MariaDB MaxScale 2.1.3 GA and there is a plugin that is particularly useful, the Regex one. What this allows us to do is to replace text in the SQL statement so that it matches something that MariaDB Server can work with, and a good example is the Oracle DECODE() function. This function is rather special, in a few ways:

  • It takes a variable number of arguments, from 3 and up.
  • The type of the return value depends on the type of the arguments.

The SQL Standard construct for this is the CASE statement, which has the same attributes as above. We cannot solve the use of the DECODE function by adding a STORED FUNCTION. A UDF (User Defined Function) is possible as this can take any number and type of arguments. Also even though a UDF can only return a predefined type, this is not a big issue as MariaDB is loosely typed, so we can always, for numeric results, return a numeric string. A bigger issue though is that MariaDB already has a DECODE function that does something else.

Also, we would really like to use the CASE function and a way to deal with that is to use the MariaDB MaxScale Regex filter. Let me show you how. To begin with, we need to set up the Regex filter itself, and the way I do it here, I will use multiple filters, one for each of the number of arguments I pass to DECODE. I guess there is some way of doing this in a smarter way, but here I am just showing the principle. Also note that the Regex filter use the PCRE2 regular expressions, not the Posix one. Let’s start with a couple of filter specification for a DECODE with 3 and 4 arguments and define them in our MariaDB MaxScale configuration file:

[DecodeFilter3]
type=filter
module=regexfilter
options=ignorecase
match=DECODE\(([^,)]*),([^,)]*),([^,)]*)\)
replace=CASE $1 WHEN $2 THEN $3 END

[DecodeFilter4]
type=filter
module=regexfilter
options=ignorecase
match=DECODE\(([^,)]*),([^,)]*),([^,)]*),([^,)]*)\)
replace=CASE $1 WHEN $2 THEN $3 ELSE $4 END

As anyone can see, the above really isn’t perfect, things like strings with embedded comas and what have you not will not work, but in the general case, this should work reasonable well, which is not to say you would want to use this in production, but for a test or a proof-of-concept this is good enough. For DECODE with 5, 6 or more arguments, you add these following the pattern above.
Before we show this in action, let me add one more useful filter for the Oracle SYSDATE psedocolumn. In Oracle SQL, SYSDATE is the same as NOW() in MariaDB, so this is a simple replacement, but as SYSDATE is a pseudocolumn and not a function, like NOW(), we cannot write a simple Stored Function to handle it, but using a MaraiaDB MaxScale filter should do the trick, like this:

[sysdate]
type=filter
module=regexfilter
options=ignorecase
match=([^[:alpha:]])SYSDATE
replace=$1NOW()


With this, it is now time to enable these filters, and that is done by adding them to the Service in MariaDB MaxScale which we will use:

[Read-Write Service]
type=service
router=readwritesplit
servers=srv1
user=rwuser
passwd=rwpwd
max_slave_connections=100%
filters=DecodeFilter3|DecodeFilter4|sysdate


Assuming you have your MariaDB MaxScale correctly configured in any other place, let’s see if this works as expected. First, we have to restart MariaDB MaxScale and then when we connect to MariaDB and do a call to DECODE the way it looks like in Oracle and see what is returned:

$ mysql -h moe -P 4008 -u theuser -pthepassword
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 21205
Server version: 10.0.0 2.1.3-maxscale MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB> SELECT DECODE(1, 2, 3, 4) FROM dual;
+------------------------------------+
| CASE 1 WHEN  2 THEN  3 ELSE  4 END |
+------------------------------------+
|                                  4 |
+------------------------------------+
MariaDB> SELECT DECODE('Str1', 'Str1', 'Was str1', 'Not str1') FROM dual;
+----------------------------------------------------------------+
| CASE 'Str1' WHEN  'Str1' THEN  'Was str1' ELSE  'Not str1' END |
+----------------------------------------------------------------+
| Was str1                                                       |
+----------------------------------------------------------------+’
MariaDB> SELECT DECODE('Str1', 'Str2', 'Was str1') FROM dual;
+-----------------------------------------------+
| CASE 'Str1' WHEN  'Str2' THEN  'Was str1' END |
+-----------------------------------------------+
| NULL                                          |
+-----------------------------------------------+


As can be seen, the translation from DECODE to a CASE statement seems to work as expected. Let’s also try with SYSDATE

MariaDB> SELECT DECODE('Today', 'Today', SYSDATE, 'Some other day') FROM dual;
+-------------------------------------------------------------------+
| CASE 'Today' WHEN  'Today' THEN  NOW() ELSE  'Some other day' END |
+-------------------------------------------------------------------+
| 2017-05-19 18:51:22                                               |
+-------------------------------------------------------------------+


As we see here, not only does SYSDATE work as expected, we can handle both DECODE and SYSDATE conversions as the filters are piped to each other. Using MariaDB MaxScale with the Regex filter is yet another tool for migrating applications.

Happy SQL’ing
/Karlsson

MariaDB MaxScale includes multiple filters, and one of the most useful, flexible and easiest to use is the regex filter and in this blog we will look at how this can be used to transform SQL statements that aren't 100% compatible with MariaDB.

Login or Register to post comments

by anderskarlsson4 at May 25, 2017 05:13 PM

May 24, 2017

Peter Zaitsev

Percona Software and Roadmap Update with CEO Peter Zaitsev: Q2 2017

Percona Software and Services

This blog post is a summary of the Percona Software and Roadmap Update – Q2 2017 webinar given by Peter Zaitsev on May 4, 2017. This webinar reflects changes and updates since the last update (Q1 2017).

A full recording of this webinar, along with the presentation slide deck, can be found here.

Percona Software

Below are the latest and upcoming features in Percona’s software. All of Percona’s software is 100% free and open source, with no restricted “Enterprise” version. Percona doesn’t restrict users with open core or “open source, eventually” (BSL) licenses.

Percona Server for MySQL 5.7

Latest Improvements

Features About To Be Released 

  • Integration of TokuDB and Performance Schema
  • MyRocks integration in Percona Server
  • Starting to look towards MySQL 8

Percona XtraBackup 2.4

Latest Improvements

Percona Toolkit

Latest Improvements

Percona Server for MongoDB 3.4

Latest Improvements

Percona XtraDB Cluster 5.7

Latest Improvements

Performance Improvement Benchmarks

Below, you can see the benchmarks for improvements to Percona XtraDB Cluster 5.7 performance. You can read about the improvements and benchmark tests in more detail here and here.

Percona Software and Roadmap Update

Percona XtraDB Cluster 5.7 Integrated with ProxySQL 1.3

Percona Monitoring and Management

New in Percona Monitoring and Management

Advanced MariaDB Dashboards in PMM (Links go to PMM Demo)

Percona Q217 Roadmap 4

Improved MongoDB Dashboards in PMM (Links go to PMM Demo)

Percona Q217 Roadmap 7

Percona Q217 Roadmap 9

Percona Q217 Roadmap 10

Check out the PMM Demo

Thanks for tuning in for an update on Percona Software and Roadmap Update – Q2 2017.

New Percona Online Store – Easy to Buy, Pay Monthly

by Dave Avery at May 24, 2017 10:24 PM

May 23, 2017

MariaDB AB

What's New in MariaDB MaxScale 2.1

What's New in MariaDB MaxScale 2.1 Dipti Joshi Tue, 05/23/2017 - 19:44

We are happy to announce the 2.1 GA release of MariaDB MaxScale, the next-generation database proxy for MariaDB.

MariaDB MaxScale 2.1 introduces the following key new capabilities:

Dynamic Configuration

  • Server, monitor and listeners: MaxScale 2.1 supports dynamic configuration of servers, monitors and listeners, which can be added, modified or removed during runtime. A set of new commands were added to maxadmin.

  • Database firewall filter: Rules can now be modified during runtime using the new module commands introduced in this release.

  • Persistent configuration changes: The runtime configuration changes are immediately applied to the running MaxScale as well as persisted using the new hierarchical configuration architecture.

Security

  • Selective data masking: Meet your HIPAA and PCI compliance needs by obfuscating sensitive data using the new masking filter.

  • Result set limiting: Prevent access to large sets of data with a single query by using maxrows filter - securing your database servers against malicious or accidental DoS attack.

  • Secured single sign-on: MariaDB MaxScale now supports LDAP/GSSAPI authentication support.

  • Prepared statement filtering by database firewall: The database firewall filter now applies the filtering rules to prepared statements as well.

  • Function filtering by database firewall: Now the database firewall filter adds a rule to whitelist or blacklist a query based on presence of a function.

  • Secure binlog server: The binlog cache files on MaxScale can now be encrypted. MaxScale binlog server also uses SSL in communication with master and slave.

Query Performance

  • Query cache filter: MariaDB MaxScale 2.1 now allows caching of query results in MaxScale for a configurable timeout. If a query is in cache, MaxScale will return results from cache before going to server to fetch query results. Our internal testing has shown 2.8x performance improvement from 2.0 to 2.1.3 using cache filter.

  • Streaming insert plugin: A new plugin in MariaDB MaxScale 2.1 converts all INSERT statements done inside an explicit transaction into LOAD DATA LOCAL INFILE.

Scalability

  • Aurora Cluster support: MariaDB MaxScale can now be used as a proxy for Amazon Aurora Cluster. Newly added monitor detects read replicas and write node in Aurora Cluster, and supports launchable scripts on monitored events like other monitors.

  • Multi-master for MySQL monitor: Now MariaDB MaxScale can detect complex multi-master replication topologies for MariaDB and MySQL environment.

  • Failover mode for MySQL monitor: For a two-node master-slave cluster, MariaDB MaxScale now allows slave to act as a master in case the original master fails.

  • Read-write splitting with master pinning: MariaDB MaxScale 2.1 introduces a new “Consistent Critical Read Filter.” This filter detects a statement that would modify the database and route all subsequent statements to the master server where data is guaranteed to be in an up-to-date state.

Thanks to community members and especially OutboundEngine who beta tested MariaDB MaxScale 2.1.0 to MariaDB MaxScale 2.1.2 in order for us to reach GA with MariaDB MaxScale 2.1.3.

Links:

Please post your questions in our Knowledge Base or email me at dipti.joshi@mariadb.com.

We are happy to announce the 2.1 GA release of MariaDB MaxScale, the next-generation database proxy for MariaDB.

Login or Register to post comments

by Dipti Joshi at May 23, 2017 11:44 PM

Monty Says

MariaDB 10.2 GA released with several advanced features

MariaDB 10.2.6 GA is now released. It's a release where we have concentrated on adding new advanced features to MariaDB

The most noteworthy ones are:
  • Windows Functions gives you the ability to do advanced calculation over a sliding window.
  • Common table expressions allows you to do more complex SQL statements without having to do explicit temporary tables.
  • We finally have a DEFAULT clause that can take expressions and also CHECK CONSTRAINT.
  • Multiple triggers for the same event. This is important for anyone trying to use tools, like pt-online-schema-change, which requires multiple triggers for the same table.
  • A new storage engine, MyRocks, that gives you high compression of your data without sacrificing speed. It has been developed in cooperation with Facebook and MariaDB to allow you to handle more data with less resources.
  • flashback, a feature that can rollback instances/databases/tables to an old snapshot. The version in MariaDB 10.2 is DML only. In MariaDB 10.3 we will also allow rollback over DML (like DROP TABLE).
  • Compression of events in the binary log.
  • JSON functions added. In 10.2.7 we will also add support for CREATE TABLE ... (a JSON).
A few smaller but still noteworthy new features:
  • Connection setup was made faster by moving creation of THD to a new thread. This, in addition with better thread caching, can give a connection speedup for up to 85 % in some cases.
  • Table cache can automatically partition itself as needed to reduce the contention.
  • NO PAD collations, which means that end space are significant in comparisons.
  • InnoDB is now the default storage engine. Until MariaDB 10.1, MariaDB used the XtraDB storage engine as default. XtraDB in 10.2 is not up to date with the latest features of InnoDB and cannot be used. The main reason for this change is that most of the important features of XtraDB are nowadays implemented in InnoDB . As the MariaDB team is doing a lot more InnoDB development than ever before, we can't anymore manage updating two almost identical engines. The InnoDB version in MariaDB contains the best features of MySQL InnoDB and XtraDB and a lot more. As the InnoDB on disk format is identical to XtraDB's this will not cause any problems when upgrading to MariaDB 10.2
  • The old GPL client library is gone; now MariaDB Server comes with the LGPL Connector/C client library.

There are a lot of other new features, performance enhancements and variables in MariaDB 10.2 for you to explore!

I am happy to see that a lot of the new features have come from the MariadB community! (Note to myself; This list doesn't include all contributors to MariadB 10.2, needs to be update.)

Thanks a lot to everyone that has contributed to MariaDB!

by Michael "Monty" Widenius (noreply@blogger.com) at May 23, 2017 10:35 PM

Peter Zaitsev

How to Save and Load Docker Images to Offline Servers

Docker Images

Docker ImagesIn this post, we’ll see how to make Docker images available to servers that don’t have access to the Internet (i.e., machines where docker pull <image_name> does not work).

As a specific example, we will do this with the latest Percona Monitoring and Management Docker images, since we had requests for this from users and customers. With the following steps, you’ll be able to deploy PMM within your secure network, without access to the Internet. Additionally, the same steps can be used when you need to upgrade the containers’ version in future releases.

There are two ways in which we can do this:

  • the easy way, by using docker save and docker load, or
  • the not-so-easy way, by setting up our own registry

We’ll focus on the first option, since the latter is a bit more convoluted. If you need your own registry, you are probably looking into something else rather than simply avoiding a firewall to pull one image to a server. Check out the Docker online docs in case option two fits your needs better.

As of this writing, 1.1.3 is the latest PMM version, so this is what we’ll use in the example. An image name is comprised of three parts, namely:

  • user_account/ (note the ‘/’ at the end); or empty string (and no ‘/’) for the official Docker repo
  • image_name
  • :tag (note the ‘:’ at the beginning)

The PMM Docker images have the following syntax: percona/pmm-server:1.1.3, but you can change this in the following examples to whatever image name you want, and it will work just the same. Before moving on to the commands needed, let’s imagine that serverA is the machine that has access to the Internet and serverB is the machine behind the firewall.

The steps are simple enough. On serverA, get the image, and save it to a file:

serverA> docker pull percona/pmm-server:1.1.3
1.1.3: Pulling from percona/pmm-server
45a2e645736c: Pull complete
7a3c6f252004: Pull complete
2cc1d8878ff1: Pull complete
6c49ea4e9955: Pull complete
bc4630d3a194: Pull complete
75f0952c00bd: Pull complete
79d583a1689c: Pull complete
5a820193ac79: Pull complete
927a0614b164: Pull complete
Digest: sha256:5310b23066d00be418a7522c957b2da4155a63c3e7b08663327aef075674bc2e
Status: Downloaded newer image for percona/pmm-server:1.1.3
serverA> docker save percona/pmm-server:1.1.3 > ~/pmm-server_1.1.3.tar

Now, all you need to do is move the generated tar file to serverB (by using “scp” or any other means), and execute the following:

serverB> docker load < ~/pmm-server_1.1.3.tar
serverB> docker images
REPOSITORY           TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
percona/pmm-server   1.1.3               acc9af2459a4        3 weeks ago         1.146 GB

Now you’ll be able to use the image as if you had used docker pull percona/pmm-server:1.1.3​:

serverB> docker create ... percona/pmm-server:1.1.3 /bin/true
301a9e89ee95886f497482038aa6601d6cb2e21c0532e1077fa44213ef597f38
serverB> docker run -d ... percona/pmm-server:1.1.3
dbaffa80f62bc0b80239b922bbc746d828fbbeb212a638cfafea92b827141abb
serverB> curl http://localhost | grep "Percona Monitoring and Management"
...
                   <p>Percona Monitoring and Management (PMM) is a free and open-source
solution for managing and monitoring performance on MySQL and MongoDB, and provides
time-based analysis of performance to ensure that your data works as efficiently as
possible.</p>
...

Lastly, let me add the relevant documentation links, so you have them at hand, if needed:

https://www.percona.com/doc/percona-monitoring-and-management/deploy/server/docker.html

https://docs.docker.com/engine/reference/commandline/save/

https://docs.docker.com/engine/reference/commandline/load/

by Agustín at May 23, 2017 10:16 PM

MariaDB AB

What's New in MariaDB Server 10.2

What's New in MariaDB Server 10.2 RalfGebhardt Tue, 05/23/2017 - 17:03

We are happy to announce the general availability (GA) of MariaDB Server 10.2! MariaDB Server 10.2 is the newest major version of MariaDB Server, the fastest growing open source relational database.

MariaDB Server 10.2 is the next evolution after MariaDB Server 10.1. In 10.1 the integration of Galera Cluster as a high availability solution, data-at-rest encryption and other security features like the password validation API have been the key enhancements of MariaDB Server.

Now, with MariaDB Server 10.2.6 GA, new significant enhancements are available for our users and customers, including:

  • SQL enhancements like window functions, common table expressions and JSON functions allow new use cases for MariaDB Server

  • Standard MariaDB Server replication has further optimizations

  • Many area limitations have been removed, which allows easier use and there is no need for limitation handling on the application level

  • MyRocks, a new storage engine developed by Facebook, has been introduced, which will further enrich the use cases for MariaDB Server

Window Functions

Window functions are popular in Business Intelligence (BI) where more complex report generation is needed based on a subset of the data, like country or sales team metrics. Another common use case is where time-series based data should be aggregated based on a time window instead of just a current record, like all rows inside a certain time span.

As analytics is becoming more and more important to end users, window functions deliver a new way of writing performance optimized analytical SQL queries, which are easy to read and maintain, and eliminates the need to write expensive subqueries and self-joins.

Common Table Expressions

Hierarchical and recursive queries are usually implemented using common table expressions (CTEs). They are similar to derived tables in a FROM clause, but by having an identification keyword WITH, the optimizer can produce more efficient query plans. Acting as an automatically created temporary and named result set, which is only valid for the time of the query, it can be used for recursive and hierarchical execution, and also allows for reuse of the temporary dataset. Having a dedicated method also helps to create more expressive and cleaner SQL code.

JSON Functions

JSON (JavaScript Object Notation), a text-based and platform independent data exchange format, is used not only to exchange data, but also as a format to store unstructured data. MariaDB Server 10.2 offers more than 24 JSON functions to allow querying, modification, validation and indexing of JSON formated data, which is stored in a text-based field of a database. As a result, the powerful relational model of MariaDB can be enriched by working with unstructured data, where required.

Through the use of virtual columns, the JSON function, JSON_VALUE and the newest indexing feature of MariaDB Server 10.2 on virtual columns, JSON values will be automatically extracted from the JSON string, stored in a virtual column and indexed providing the fastest access to the JSON string.

Using the JSON function JSON_VALID, the new CHECK CONSTRAINTS in MariaDB Server 10.2 guarantee that only JSON strings of the correct JSON format can be added into a field.

Binary Log Based Rollback

The enhanced mysqlbinlog utility delivered with MariaDB Server 10.2 includes a new point-in-time rollback function, which allows a database or table to revert to an earlier state, and delivers binary log based rollback of already committed data. The tool mysqlbinlog is not directly modifying any data, it is generating an “export file” including the reverted statements of the transactions, logged in a binary log file. The created file can be used with the command line client or other SQL tool to execute the included SQL statements. This way all committed transactions up to a given timestamp will be rolled back.

In the case of addressing logical mistakes like adding, changing or deleting data, so far the only possible way has been to use mysqlbinlog to review transactions and fix the problems manually. However, this often leads to data inconsistency because corrections typically only address the wrong statement, thereby ignoring other data dependencies.

Typically caused by DBA or user error, restoring a huge database can result in a significant outage of service. Rolling back the last transactions using point-in-time roll back takes only the time of the extract, a short review and the execution of the reverted transactions – saving valuable time, resources and service.

 

Start now and learn about the newest evolution of MariaDB Server 10.2

We are happy to announce the general availability (GA) of MariaDB Server 10.2! MariaDB Server 10.2 is the newest major version of MariaDB Server, the fastest growing open source relational database.

Login or Register to post comments

by RalfGebhardt at May 23, 2017 09:03 PM

Say Hello to MariaDB TX

Say Hello to MariaDB TX Shane Johnson Tue, 05/23/2017 - 15:49

We believe an enterprise database solution requires technology, tools and services, and that it should be easy to buy, easy to deploy and easy to manage – providing a great customer and user experience from beginning to end. 

In fact, we've made ease of use a central theme in our roadmap, and with the introduction of MariaDB TX today, we're taking another first step – packaging MariaDB technology, tools and services into a unified offering to help customers succeed with MariaDB infrastructure: MariaDB Server, MariaDB MaxScale and MariaDB Cluster.

In addition to introducing MariaDB TX, we are releasing MariaDB Server 10.2 and MariaDB MaxScale 2.1.

MariaDB-Product Overview 17 May 2017 (1).png

MariaDB Server, MariaDB MaxScale and MariaDB Cluster (Galera Cluster for MariaDB), along with MariaDB connectors and drivers, form the base technology in MariaDB TX. By bringing them together, we are building a modular, integrated platform rather than separate, independent products – making them easier to deploy, easier to use and easier to manage ... together.

In addition, MariaDB TX includes tools for administration (SQLyog for MariaDB), monitoring (Monyog for MariaDB) and backup/restore (MariaDB Backup) as well as notification services (e.g., security alerts). In addition, we're creating and investing in more innovative tools for future release, including MariaDB Replication Manager (MRM). 

And finally, to ensure customer success and satisfaction with MariaDB TX, we provide expert services for everything from database administration to enterprise architecture to migration management. In addition to technical support, customers can choose to extend their team with a remote DBA, a dedicated enterprise architect or migration project manager – resources with expert knowledge and experience.   

A great way to learn more about MariaDB TX 2.0, including the new features in MariaDB Server 10.2 and MariaDB MaxScale 2.1, is to join our upcoming launch webinar

If you want to dive right in, keep reading!

 

MariaDB TX 2.0

Product highlights

  • A comprehensive set of JSON functions in a relational database
  • An SSD optimized storage engine with unrivaled performance and efficiency
  • An advanced database firewall, complete with data masking

With MariaDB Server 10.2 and MaxScale 2.1, MariaDB TX 2.0 raises the completeness, compatibility, performance, scalability, security and disaster recovery of MariaDB – setting a new standard for open source database solutions in the enterprise. 

Completeness and compatibility

First, we wanted to increase SQL completeness and schema compatibility. If you choose MariaDB TX over Oracle Database or Microsoft SQL Server, you should be able to perform similar queries on similar schemas. And while many of these features have long been available in proprietary databases, we're excited to make them available in an open source database. 

Completeness

Compatibility

  • Common table expressions

  • Window functions

  • JSON and GeoJSON functions

  • EXECUTE IMMEDIATE statements

  • Subqueries within views

  • Multiple temp tables per query

  • CHECK constraints with expressions

  • DEFAULT values for BLOB/TEXT

  • DECIMAL columns up to 38 places

  • Multiple triggers per type per table

Performance and scalability

Next, we wanted to increase performance and scalability, focusing on storage, replication, querying and routing. At scale, improving storage efficiency and reducing disk IO not only improves performance, it reduces costs. Today, that means optimizing for SSDs, which is why we're introducing MyRocks, a storage engine developed by Facebook for web-scale use cases.

Storage

Querying

  • MyRocks storage engine

  • InnoDB enhancements

  • InnoDB NUMA interleave

  • Virtual Column indexes

  • Fast connections

  • Optimizer enhancements

  • Query caching

  • Streaming inserts

Replication

Routing

  • Binary Log read throttling

  • Binary Log compression

  • Dynamic server configuration

  • Read-write splitting + master pinning

  • Multi-statement routing + master pinning

Security and disaster recovery

Finally, we wanted to increase security and disaster recovery. In particular, by expanding the capabilities of the database firewall by ensuring sensitive data was protected, prepared statements were examined and denial of service queries were prevented. For disaster recovery, we introduce point-in-time rollback a la Oracle Flashback.

Security

Disaster recovery

  • Per user resource limits

  • Enforced TLS connections

  • Data masking

  • Prepared Statement filtering

  • Result Set limiting

  • Dynamic firewall rule configuration

  • Delayed replication
  • Binary Log based rollback

Resources

We believe an enterprise database solution requires technology, tools and services, and that it should be easy to buy, easy to deploy and easy to manage – providing a great customer and user experience from beginning to end. That's why we're introducing MariaDB TX.

Login or Register to post comments

by Shane Johnson at May 23, 2017 07:49 PM

MariaDB Foundation

MariaDB 10.2.6 Stable now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.2.6 Stable (GA). This is the first stable (GA) release in the MariaDB 10.2 series. See the What is MariaDB 10.2? page for more details on the new features in MariaDB 10.2. Download MariaDB 10.2.6 Release Notes Changelog What is MariaDB 10.2? MariaDB […]

The post MariaDB 10.2.6 Stable now available appeared first on MariaDB.org.

by Daniel Bartholomew at May 23, 2017 02:37 PM

May 22, 2017

Peter Zaitsev

ICP Counters in information_schema.INNODB_METRICS

ICP Counters

ICP CountersIn this blog, we’ll look at ICP counters in the information_schema.INNODB_METRICS. This is part two of the Index Condition Pushdown (ICP) counters blog post series. 

As mentioned in the previous post, in this blog we will look at how to check on ICP counters on MySQL and Percona Server for MySQL. This also applies to MariaDB, since the INNODB_METRICS table is also available for MariaDB (as opposed to the Handler_icp_% counters being MariaDB-specific). We will use the same table and data set as in the previous post.

For simplicity we’ll show the examples on MySQL 5.7.18, but they also apply to the latest Percona Server for MySQL (5.7.18) and MariaDB Server (10.2.5):

mysql [localhost] {msandbox} (test) > SELECT @@version, @@version_comment;
+-----------+------------------------------+
| @@version | @@version_comment            |
+-----------+------------------------------+
| 5.7.18    | MySQL Community Server (GPL) |
+-----------+------------------------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SHOW CREATE TABLE t1G
*************************** 1. row ***************************
      Table: t1
Create Table: CREATE TABLE `t1` (
 `f1` int(11) DEFAULT NULL,
 `f2` int(11) DEFAULT NULL,
 `f3` int(11) DEFAULT NULL,
 KEY `idx_f1_f2` (`f1`,`f2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|  3999996 |
+----------+
1 row in set (3.98 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 LIMIT 12;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    1 |    2 |    1 |
|    1 |    3 |    1 |
|    1 |    4 |    1 |
|    2 |    1 |    1 |
|    2 |    2 |    1 |
|    2 |    3 |    1 |
|    2 |    4 |    1 |
|    3 |    1 |    1 |
|    3 |    2 |    1 |
|    3 |    3 |    1 |
|    3 |    4 |    1 |
+------+------+------+
12 rows in set (0.00 sec)

Before proceeding with the examples, let’s see what counters we have available and how to enable and query them. The documentation page is at the following link: https://dev.mysql.com/doc/refman/5.7/en/innodb-information-schema-metrics-table.html.

The first thing to notice is that we are advised to check the validity of the counters for each version where we want to use them. The counters represented in the INNODB_METRICS table are subject to change, so for the most up-to-date list it’s best to query the running MySQL server:

mysql [localhost] {msandbox} (test) > SELECT NAME, SUBSYSTEM, STATUS FROM information_schema.INNODB_METRICS WHERE NAME LIKE '%icp%';
+------------------+-----------+----------+
| NAME             | SUBSYSTEM | STATUS   |
+------------------+-----------+----------+
| icp_attempts     | icp       | disabled |
| icp_no_match     | icp       | disabled |
| icp_out_of_range | icp       | disabled |
| icp_match        | icp       | disabled |
+------------------+-----------+----------+
4 rows in set (0.00 sec)

Looking good! We have all the counters we expected, which are:

  • icp_attempts: the number of rows where ICP was evaluated
  • icp_no_match: the number of rows that did not completely match the pushed WHERE conditions
  • icp_out_of_range: the number of rows that were checked that were not in a valid scanning range
  • icp_match: the number of rows that completely matched the pushed WHERE conditions

This link to the code can be used for reference: https://github.com/mysql/mysql-server/blob/5.7/include/my_icp.h.

After checking which counters we have at our disposal, you need to enable them (they are not enabled by default). For this, we can use the “modules” provided by MySQL to group similar counters for ease of use. This is also explained in detail in the documentation link above, under the “Counter Modules” section. INNODB_METRICS counters are quite inexpensive to maintain, as you can see in this post by Peter Z.

mysql [localhost] {msandbox} (test) > SET GLOBAL innodb_monitor_enable = module_icp;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT NAME, SUBSYSTEM, STATUS FROM information_schema.INNODB_METRICS WHERE NAME LIKE '%icp%';
+------------------+-----------+---------+
| NAME             | SUBSYSTEM | STATUS  |
+------------------+-----------+---------+
| icp_attempts     | icp       | enabled |
| icp_no_match     | icp       | enabled |
| icp_out_of_range | icp       | enabled |
| icp_match        | icp       | enabled |
+------------------+-----------+---------+
4 rows in set (0.00 sec)

Perfect, we now know what counters we need, and how to enable them. We just need to know how to query them, and we can move on to the examples. However, before rushing into saying that a simple SELECT against the INNODB_METRICS table will do, let’s step back a bit and see what columns we have available that can be of use:

mysql [localhost] {msandbox} (test) > DESCRIBE information_schema.INNODB_METRICS;
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| NAME            | varchar(193) | NO   |     |         |       |
| SUBSYSTEM       | varchar(193) | NO   |     |         |       |
| COUNT           | bigint(21)   | NO   |     | 0       |       |
| MAX_COUNT       | bigint(21)   | YES  |     | NULL    |       |
| MIN_COUNT       | bigint(21)   | YES  |     | NULL    |       |
| AVG_COUNT       | double       | YES  |     | NULL    |       |
| COUNT_RESET     | bigint(21)   | NO   |     | 0       |       |
| MAX_COUNT_RESET | bigint(21)   | YES  |     | NULL    |       |
| MIN_COUNT_RESET | bigint(21)   | YES  |     | NULL    |       |
| AVG_COUNT_RESET | double       | YES  |     | NULL    |       |
| TIME_ENABLED    | datetime     | YES  |     | NULL    |       |
| TIME_DISABLED   | datetime     | YES  |     | NULL    |       |
| TIME_ELAPSED    | bigint(21)   | YES  |     | NULL    |       |
| TIME_RESET      | datetime     | YES  |     | NULL    |       |
| STATUS          | varchar(193) | NO   |     |         |       |
| TYPE            | varchar(193) | NO   |     |         |       |
| COMMENT         | varchar(193) | NO   |     |         |       |
+-----------------+--------------+------+-----+---------+-------+
17 rows in set (0.00 sec)

There are two types: %COUNT and %COUNT_RESET. The former counts since the corresponding counters were enabled, and the latter since they were last reset (we have the TIME_% columns to check when any of these were done). This is why in our examples we are going to check the %COUNT_RESET counters, so we can reset them before running each query (as we did with FLUSH STATUS in the previous post).

Without further ado, let’s check how this all works together:

mysql [localhost] {msandbox} (test) > SET GLOBAL innodb_monitor_reset = module_icp;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE f1 < 3 AND (f2 % 4) = 1;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    2 |    1 |    1 |
+------+------+------+
2 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT NAME, COUNT_RESET FROM information_schema.INNODB_METRICS WHERE NAME LIKE 'icp%';
+------------------+-------------+
| NAME             | COUNT_RESET |
+------------------+-------------+
| icp_attempts     |           9 |
| icp_no_match     |           6 |
| icp_out_of_range |           1
| icp_match        |           2 |
+------------------+-------------+
4 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 AND (f2 % 4) = 1;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | t1    | NULL       | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    8 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

If you checked the GitHub link above, you might have noted that the header file only contains three of the counters. This is because icp_attempts is computed as the sum of the rest. As expected, icp_match equals the number of returned rows, which makes sense. icp_no_match should also make sense if we check the amount of rows present without the WHERE conditions on f2.

mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE f1 < 3;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    1 |    2 |    1 |
|    1 |    3 |    1 |
|    1 |    4 |    1 |
|    2 |    1 |    1 |
|    2 |    2 |    1 |
|    2 |    3 |    1 |
|    2 |    4 |    1 |
+------+------+------+
8 rows in set (0.00 sec)

So, 8 – 2 = 6, which is exactly icp_no_match‘s value. Finally, we are left with icp_out_of_range. For each end of range the ICP scan detects, this counter is incremented by one. We only scanned one range in the previous query, so let’s try something more interesting (scanning three ranges):

mysql [localhost] {msandbox} (test) > SET GLOBAL innodb_monitor_reset = module_icp;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE ((f1 < 2) OR (f1 > 4 AND f1 < 6) OR (f1 > 8 AND f1 < 12)) AND (f2 % 4) = 1;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    5 |    1 |    1 |
|    9 |    1 |    1 |
|   10 |    1 |    1 |
|   11 |    1 |    1 |
+------+------+------+
5 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT NAME, COUNT_RESET FROM information_schema.INNODB_METRICS WHERE NAME LIKE 'icp%';
+------------------+-------------+
| NAME             | COUNT_RESET |
+------------------+-------------+
| icp_attempts     |          23 |
| icp_no_match     |          15 |
| icp_out_of_range |           3 |
| icp_match        |           5 |
+------------------+-------------+
4 rows in set (0.01 sec)

We have now scanned three ranges on f1, namely: (f1 < 2), (4 < f1 < 6) and (8 < f1 < 12). This is correctly reflected in the corresponding counter. Remember that the MariaDB Handler_icp_attempts status counter we looked at in the previous post does not take into account the out-of-range counts. This means the two “attempts” counters will not be the same!

mysql [localhost] {msandbox} (test) > SET GLOBAL innodb_monitor_reset = module_icp; SET GLOBAL innodb_monitor_reset = dml_reads; FLUSH STATUS;
...
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE ((f1 < 2) OR (f1 > 4 AND f1 < 6) OR (f1 > 8 AND f1 < 12)) AND (f2 % 4) = 1;
...
5 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT NAME, COUNT_RESET FROM information_schema.INNODB_METRICS WHERE NAME LIKE 'icp_attempts';
+--------------+-------------+
| NAME         | COUNT_RESET |
+--------------+-------------+
| icp_attempts |          23 |
+--------------+-------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SHOW STATUS LIKE 'Handler_icp_attempts';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Handler_icp_attempts | 20    |
+----------------------+-------+
1 row in set (0.00 sec)

It can be a bit confusing to have two counters that supposedly measure the same counts yielding different values, so watch this if you use MariaDB.

ICP Counters in PMM

Today you can find an ICP counters graph for MariaDB (Handler_icp_attempts) in PMM 1.1.3.

Additionally, in release 1.1.4 you’ll find graphs for ICP metrics from information_schema.INNODB_METRICS: just look for the INNODB_METRICS-based graph on the InnoDB Metrics dashboard!

I hope you found this blog post series useful! Let me know if you have any questions or comments below.

by Agustín at May 22, 2017 08:53 PM

Webinar May 23, 2017: MongoDB Monitoring and Performance for the Savvy DBA

MongoDB Monitoring

MongoDB MonitoringJoin Percona’s Senior Technical Services Engineer Bimal Kharel on Tuesday, May 23, 2017, as he presents a webinar on MongoDB monitoring called How to Help Your DBA’s Sleep Better at Night at 10:00 am PDT / 1:00 pm EDT (UTC-7).

Are you trying to stay on top of your database before things turn ugly? Between metrics for throughput, database performance, resource utilization, resource saturation, errors (asserts) and many others, how do you know which one needs to be looked at NOW (and which can wait)?

Both DBAs and system admins must stay on top of the systems they manage. But filtering between metrics that need immediate attention and those that should be watched over time is challenging. In this webinar, Bimal narrows down the list of metrics that help you decide whether the on-call DBA gets their recommended eight hours of shuteye, or gets to run on caffeine with no sleep.

Bimal also discusses which graphs relate to each other, with examples from Percona’s Monitoring and Management (PMM) tool, to help you understand how things in MongoDB can impact other areas.

Please register for the webinar here.

MongoDB MonitoringBimal Kharel, Senior Technical Services Engineer, Percona

Bimal is a MongoDB support engineer at Percona. Before Percona he worked as a MongoDB DBA at EA and Charles Schwab. He has been in various roles throughout his career, from graphics to web developer to systems administration. MongoDB was the first database Bimal got into (he used MySQL for some websites but never other relational databases).

by Dave Avery at May 22, 2017 04:54 PM

May 19, 2017

Peter Zaitsev

Percona Toolkit 3.0.3 is Now Available

Percona Server for MongoDB

Percona ToolkitPercona announces the release of Percona Toolkit 3.0.3 on May 19, 2017.

Percona Toolkit is a collection of advanced command-line tools that perform a variety of MySQL and MongoDB server and system tasks too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

You download Percona Toolkit packages from the web site or install from official repositories.

This release includes the following changes:

New Features

  • Added the --skip-check-slave-lag option for pt-table-checksum, pt-online-schema-change, and pt-archiverdp.This option can be used to specify a list of servers where to skip checking for slave lag.
  • 1642754: Added support for collecting replication slave information in pt-stalk.
  • PT-111: Added support for collecting information about variables from Performance Schema in pt-stalk. For more information, see 1642753.
  • PT-116: Added the --[no]use-insert-ignore option for pt-online-schema-change to force or prevent using IGNORE on INSERT statements. For more information, see 1545129.

Bug Fixes

  • PT-115: Fixed OptionParser to accept repeatable DSNs.
  • PT-126: Fixed pt-online-schema-change to correctly parse comments. For more information, see 1592072.
  • PT-128: Fixed pt-stalk to include memory usage information. For more information, see 1510809.
  • PT-130: Fixed pt-mext to work with non-empty RSA public key. For more information, see 1587404.
  • PT-132: Fixed pt-online-schema-change to enable --no-drop-new-table when --no-swap-tables and --no-drop-triggers are used.

You can find release details in the release notes. Report bugs in Toolkit’s launchpad bug tracker.

by Alexey Zhebel at May 19, 2017 04:23 PM

May 17, 2017

Peter Zaitsev

MongoDB Authentication and Roles: Creating Your First Personalized Role

MongoDB Authentication and Roles

MongoDB Authentication and RolesIn this blog post, we’ll walk through the native MongoDB authentication and roles, and learn how to create personalized roles. It is a continuation of Securing MongoDB instances.

As said before, MongoDB features a few authentication methods and built-in roles that offer great control of both who is connecting to the database and what they are allowed to do. However, some companies have their own security policies that are often not covered by default roles. This blog post explains not only how to create personalized roles, but also how to grant minimum access to a user.

Authentication Methods

SCRAM-SHA-1 and MONGODB-CR are challenge-response protocols. All the users and passwords are saved encrypted in the MongoDB instance. Challenge-response authentication methods are widely used on the internet in several server-client software. These authentication methods do not send passwords as plain text to the server when the client is starting an authentication. Each new session has a different hash/code, which stops people from getting the password when sniffing the network.

The MONGODB-CR method was deprecated in version 3.0.

The x.509 authentication is an internal authentication that allows instances and clients to communicate to each other. All certificates are signed by the same Certificate Authority and must be valid. All the network traffic is encrypted by a given key, and it is only possible to read data with a valid certificate signed by such key.

MongoDB also offers external authentications such as LDAP and Kerberos. When using LDAP, users can log in to MongoDB using their centralized passwords. The LDAP application is commonly used to manage users and passwords in wide networks. Kerberos is a service that allows users to login only once, and then generates access tickets so that the users are allowed to access other services. Some configuration is necessary to use external authentication.

Built in roles

  • read: collStats,dbHash,dbStats,find,killCursors,listIndexes,listCollections,
  • readWrite: all read privileges + convertToCapped, createCollection,dbStats, dropCollection, createIndex, dropIndex, emptycapped, insert, listIndexes,remove, renameCollectionSameDB, update.
  • readAnyDatabase: allows the user to perform read in any database except the local and the config databases.

And so on…

In this tutorial, we are going to give specific privileges to a user who is allowed to only read the database, although he is allowed to write in a specific collection.

For this tutorial, we are using MongoDB 3.4 with previously configured authentication.

Steps:

  1. Create the database:
    mongo --authenticationDatbase admin -u superAdmin -p
    use percona
    db.foo.insert({x : 1})
    db.foo2.insert({x : 1})
  2. Create a new user:
    > db.createUser({user : 'client_read', pwd : '123', roles : ['read']})
    Successfully added user: { "user" : "client_read", "roles" : [ "read" ] }
  3. Log in with the user that has just been created and check the user access:
    ./mongo localhost/percona -u client_read -p
    MongoDB shell version v3.4.0-rc5
    Enter password:
    db.foo.find()
    { "_id" : ObjectId("586bc2e9cac0bbb93f325d11"), "x" : 1 }
    db.foo2.find().count()
    1
    // If user try to insert documents will receive an error:
    > db.foo.insert({x : 2})
    WriteResult({
                "writeError" : {
                "code" : 13,
                "errmsg" : "not authorized on percona to execute command
                     { insert: "foo", documents: [ { _id: ObjectId('586bc36e7b114fb2517462f3'), x: 2.0 } ], ordered: true }"
                }
    })
  4. Log out and log in again with administrator user to create a new role for this user:
    mongo --authenticationDatabase admin -u superAdmin -p
    db.createRole({
    role : 'write_foo2_Collection',
    privileges : [ {resource : {db : "percona", collection : "foo2"}, actions : ["insert","remove"]}
    ],
    roles : ["read"]
    })
    db.updateUser('client_read', roles : ['write_foo2_Collection'])
  5. Check the new access:
    ./mongo
    db.auth('client_read','123')
    1
    > show collections
    foo
    foo2
    > db.foo.find()
    { "_id" : ObjectId("586bc2e9cac0bbb93f325d11"), "x" : 1 }
    > db.foo2.insert({y : 2})
    WriteResult({ "nInserted" : 1 })
    > db.foo.insert({y : 2}) //does not have permission.
    WriteResult({
          "writeError" : {
                "code" : 13,
                "errmsg" : "not authorized on percona to execute command { insert: "foo", documents: [ { _id: ObjectId('586bc5e26f05b3a5db849359'), y: 2.0 } ], ordered: true }"
                         }
    })
  6. We can also add access to other database resources. Let’s suppose we would like to grant this just created user permission to execute a getLog command. This command is available in the clusterAdmin role, but we do not want to give all this role’s access to him. See https://docs.mongodb.com/v3.0/reference/privilege-actions/#authr.getLog.

    There is a caveat/detail/observation here. If we want to grant cluster privileges to a user, we should create the role in the admin database. Otherwise, the command will fail:

    db.grantPrivilegesToRole(
         "write_foo2_Collection",
               [
                      {resource : {cluster : true}, actions : ["getLog"] }
               ]
    )
    Roles on the 'percona' database cannot be granted privileges that target other databases or the cluster :

  7. We are creating the same role in the admin database. This user only works properly if the admin database is present in a possible restore. Otherwise, the privileges fail:

    use admin
    db.createRole({
         role : 'write_foo2_Collection_getLogs',
         privileges : [
                           {resource : {db : "percona", collection : "foo2"}, actions : ["insert","remove"]},
                           {resource : {cluster : true}, actions : ["getLog"]}],  
         roles : [ {role : "read", db: "percona"}]
    })
    use percona
    db.updateUser( "client_read",
    {
         roles : [
              { role : "write_foo2_Collection_getLogs", db : "admin" }
                    ]
    }
    )

  8. Now the user has the same privileges as before, plus the getLog permission. We can test this user new access with:

    mongo --authenticationDatabase percona -u read_user -p
    db.adminCommand({getLog : 'global'})
    {
              "totalLinesWritten" : 287,
              "log" : [....
    ….
    }

I hope you find this post useful. Please feel free to ping me on twitter @AdamoTonete or @percona and let us know your thoughts.

by Adamo Tonete at May 17, 2017 06:53 PM

May 16, 2017

Peter Zaitsev

Percona Live Open Source Database Conference 2017 Slides and Videos Available

Percona Live

Percona LiveThe slides and videos from the Percona Live Open Source Database Conference 2017 are available for viewing and download. The videos and slides cover the keynotes, breakout sessions and MySQL and MongoDB 101 sessions.

To view slides, go to the Percona Live agenda, and select the talk you want slides for from the schedule, and click through to the talk web page. The slides are available below the talk description. There is also a page with all the slides that is searchable by topic, talk title, speaker, company or keywords.

To view videos, go to the Percona Live 2017 video page. The available videos are searchable by topic, talk title, speaker, company or keywords.

There are a few slides and videos outstanding due to unforeseen circumstances. However, we will upload those as they become available.

Some examples of videos and slide decks from the Percona Live conference:

MongoDB 101: Efficient CRUD Queries in MongoDB
Adamo Tonete, Senior Technical Engineer, Percona
Video: https://www.percona.com/live/17/content/efficient-crud-queries-mongodb
Slides: https://www.percona.com/live/17/sessions/efficient-crud-queries-mongodb

MySQL 101: Choosing a MySQL High Availability Solution
Marcos Albe, Principal Technical Services Engineer, Percona
Video: https://www.percona.com/live/17/content/choosing-mysql-high-availability-solution
Slides: https://www.percona.com/live/17/sessions/choosing-mysql-high-availability-solution

Breakout Session: Using the MySQL Document Store
Mike Zinner, Sr. Software Development Director and Alfredo Kojima, Sr. Software Development Manager, Oracle
Video: https://www.percona.com/live/17/content/using-mysql-document-store
Slides: https://www.percona.com/live/17/sessions/using-mysql-document-store

Keynote: Continuent is Back! But What Does Continuent Do Anyway?
Eero Teerikorpi, Founder and CEO and MC Brown, VP Products, Continuent
Video: https://www.percona.com/live/17/content/continuent-back-what-does-continuent-do-anyway
Slides: https://www.percona.com/live/17/sessions/continuent-back-what-does-continuent-do-anyway

Please let us know if you have any issues. Enjoy the videos!

Percona Live Europe 2017
Percona Live Europe 2017: Dublin, Ireland!

This year’s Percona Live Europe will take place September 25th-27th, 2017, in Dublin, Ireland. Put it on your calendar now! Information on speakers, talks, sponsorship and registration will be available in the coming months.

We have developed multiple sponsorship options to allow participation at a level that best meets your partnering needs. Our goal is to create a significant opportunity for our partners to interact with Percona customers, other partners and community members. Sponsorship opportunities are available for Percona Live Europe 2017.

Download a prospectus here.

We look forward to seeing you there!

by Dave Avery at May 16, 2017 10:13 PM

Jean-Jerome Schmidt

Command Line Aficionados: Introducing s9s for ClusterControl

Easily Integrate ClusterControl With Your Existing DevOps Tools via s9s - Our New Command Line Interface

We’ve heard your call (and, selfishly, our own): please meet s9s - the new command line interface (CLI) for ClusterControl, our all-inclusive open source database management system.

At every conference we’ve attended so far, visitors have been asking us whether there is a command line interface for ClusterControl. And, we’re not afraid to admit, some of us at Severalnines have always wanted to have one as well. So those same colleagues have gone and created the s9s CLI for ClusterControl, which we’re happy to present today.

In fact, Johan Andersson, our CTO, is one of our command line aficionados and he describes the new CLI as follows:

What’s the ClusterControl CLI all about?

The ClusterControl CLI, is an open source project and optional package introduced with ClusterControl version 1.4.1. It is a command line tool to interact, control and manage your entire database infrastructure using ClusterControl. The s9s command line project is open source and is located on GitHub.

The ClusterControl CLI opens a new door for cluster automation where you can easily integrate it with existing deployment automation tools like Ansible, Puppet, Chef or Salt. This allows you to easily integrate scripts from your orchestration tools inside the CLI.

Users who have downloaded ClusterControl can use the CLI for all the ClusterControl features while they’re on the Enterprise trial of ClusterControl. Community users can then use the deployment and monitoring functionalities of ClusterControl. Existing customers can use the CLI to the full extent of ClusterControl.

Usage and Installation

The CLI can be installed by adding the s9s tools repository and using a package manager, as well as be compiled from source. The current installation script to install ClusterControl, install-cc, will automatically install the command line client. The command line client can also be installed on another computer or workstation for remote management. Finally, the CLI requires ClusterControl 1.4.1 or later.

Moreover, all communication between the client and the controller is encrypted and secured using TLS.

The ClusterControl CLI allows you to deploy and manage open source databases and load balancers in a way that is fully integrated and aligned with the ClusterControl core and GUI.

The s9s command line project is open source and located on GitHub: https://github.com/severalnines/s9s-tools

For examples and additional information, e.g, how to setup users and authentication, please visit
https://severalnines.com/docs/components.html#clustercontrol-cli

Before you get started, you need to have ClusterControl version 1.4.1 or later installed, see https://severalnines.com/download-clustercontrol-database-management-system

Some of the things you can do from the CLI in ClusterControl

  • Deploy and manage database clusters
    • MySQL
    • PostgreSQL
    • MongoDB to be added soon
  • Monitor your databases
    • Status of nodes and clusters
    • Cluster properties can be extracted
    • Gives detailed enough information about your clusters
  • Manage your systems and integrate with DevOps tools
    • Create, stop or start clusters
    • Add, remove, or restart nodes in the cluster
    • Create database users (CREATE USER, GRANT privileges to user)
      • Users created in the CLI are traceable through the system
    • Create load balancers (HAProxy, ProxySQL)
    • Create and Restore backups
    • Use maintenance mode
    • Conduct configuration changes of db nodes
    • Integrate with existing deployment automation
      • Ansible, Puppet, Chef or Salt, ...

Actions you take from the CLI will be visible in the ClusterControl Web UI and vice versa.

How to contribute

The CLI project (aka s9s-tools) can be accessed via GitHub. We encourage users to contribute to the project by:

  • Trying out the CLI and give us feedback
  • Letting us know about missing features, wishes, or problems by opening issues on GitHub
  • Contributing patches to the project

To sum things up

The ClusterControl CLI and GUI are fully integrated and synced to allow you to utilize the CLI for deployment and management of your databases and load balancers, whilst using the advanced graphs in the GUI for monitoring and troubleshooting. The CLI offers detailed information about node stats and cluster stats, enabling scripts and other tools to benefit from those.

In our experience, System Administrators and DevOps professionals are the mostly likely to benefit from a CLI for ClusterControl as they are accustomed to using scripts to perform their daily tasks.

Happy command-line-clustering!

by jj at May 16, 2017 01:26 PM

May 15, 2017

Peter Zaitsev

Percona Server for MongoDB 3.2.13-3.3 is Now Available

Percona Server for MongoDB 3.2

Percona Server for MongoDB 3.2Percona announces the release of Percona Server for MongoDB 3.2.13-3.3 on May 15, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.2 protocol and drivers. It extends MongoDB with MongoRocks, Percona Memory Engine, and PerconaFT storage engine, as well as enterprise-grade features like External Authentication, Audit Logging, Profiling Rate Limiting, and Hot Backup at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: We deprecated the PerconaFT storage engine. It will not be available in future releases.

This release is based on MongoDB 3.2.13 and includes the following additional changes:

  • #PSMDB-127: Fixed cleanup of deleted documents and indexes for MongoRocks. When you upgrade to this release, deferred compaction may occur and cause database size to decrease significantly.
  • #PSMDB-133: Added the wiredTigerCheckpointSizeMB variable, set to 1000 in the configuration template for WiredTiger. Valid values are 32 to 2048 (2GB), with the latter being default.
  • #PSMDB-138: Implemented SERVER-23418 for MongoRocks.

Percona Server for MongoDB 3.2.13-3.3 release notes are available in the official documentation.

by Alexey Zhebel at May 15, 2017 05:55 PM

May 12, 2017

Peter Zaitsev

Percona Server for MySQL 5.7.18-14 is Now Available

Percona Server for MySQL 5.7.18-14

Percona Server for MySQL 5.7.18-14Percona announces the GA release of Percona Server for MySQL 5.7.18-14 on May 12, 2017. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.7.18, including all the bug fixes in it, Percona Server for MySQL 5.7.18-14 is the current GA release in the Percona Server for MySQL 5.7 series. Percona’s provides completely open-source and free software. Find release details in the 5.7.18-14 milestone at Launchpad.

New Features:
Bugs Fixed:
  • Deadlock could occur in I/O-bound workloads when server was using several small buffer pool instances in combination with small redo log files and variable innodb_empty_free_list_algorithm set to backoff algorithm. Bug fixed #1651657.
  • Fixed a memory leak in Percona TokuBackup. Bug fixed #1669005.
  • Compressed columns with dictionaries could not be added to a partitioned table by using ALTER TABLE. Bug fixed #1671492.
  • Fixed a memory leak that happened in case of failure to create a multi-threaded slave worker thread. Bug fixed #1675716.
  • In-place upgrade from Percona Server 5.6 to 5.7 by using standalone packages would fail if /var/lib/mysql wasn’t defined as the datadir. Bug fixed #1687276.
  • Combination of using any audit API-using plugin, like Audit Log Plugin and Response Time Distribution, with multi-byte collation connection and PREPARE statement with a parse error could lead to a server crash. Bug fixed #1688698 (upstream #86209).
  • Fix for a #1433432 bug caused a performance regression due to suboptimal LRU manager thread flushing heuristics. Bug fixed #1631309.
  • Creating Compressed columns with dictionaries in MyISAM tables by specifying partition engines would not result in error. Bug fixed #1631954.
  • It was not possible to configure basedir as a symlink. Bug fixed #1639735.
  • Replication slave did not report Seconds_Behind_Master correctly when running in multi-threaded slave mode. Bug fixed #1654091 (upstream #84415).
  • DROP TEMPORARY TABLE would create a transaction in binary log on a read-only server. Bug fixed #1668602 (upstream #85258).
  • Processing GTIDs in the relay log that were already been executed were causing write/fsync amplification. Bug fixed #1669928 (upstream #85141).
  • Text/BLOB fields were not handling sorting of the empty string consistently between InnoDB and filesort. Bug fixed #1674867 (upstream #81810) by porting a Facebook patch for MySQL.
  • InnoDB adaptive hash index was using a partitioning algorithm which would produce uneven distribution when the server contained many tables with an identical schema. Bug fixed #1679155 (upstream #81814).
  • For plugin variables that are signed numbers, doing a SHOW VARIABLES would always show an unsigned number. Fixed by porting a Facebook patch for MySQL.

Other bugs fixed: #1629250 (upstream #83245), #1660828 (upstream #84786), #1664519 (upstream #84940), #1674299, #1670588 (upstream #84173), #1672389, #1674507, #1675623, #1650294, #1659224, #1662908, #1669002, #1671473, #1673800, #1674284, #1676441, #1676705, #1676847 (upstream #85671), #1677130 (upstream #85678), #1677162, #1677943, #1678692, #1680510 (upstream #85838), #1683993, #1684012, #1684078, #1684264, #1687386, #1687432, #1687600, and #1674281.

The release notes for Percona Server for MySQL 5.7.18-14 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at May 12, 2017 06:00 PM

Percona Server for MySQL 5.6.36-82.0 is Now Available

Percona Server for MySQL 5.7.18-14

Percona Server for MySQL 5.6.36-82.0Percona announces the release of Percona Server for MySQL 5.6.36-82.0 on May 12, 2017. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.6.36, and including all the bug fixes in it, Percona Server for MySQL 5.6.36-82.0 is the current GA release in the Percona Server for MySQL 5.6 series. Percona Server for MySQL is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.36-82.0 milestone on Launchpad.

New Features:
Bugs Fixed:
  • Deadlock could occur in I/O-bound workloads when server was using several small buffer pool instances in combination with small redo log files and variable innodb_empty_free_list_algorithm set to backoff algorithm. Bug fixed #1651657.
  • Querying TABLE_STATISTICS in combination with a stored function could lead to a server crash. Bug fixed #1659992.
  • tokubackup_slave_info file was created for a master server after taking the backup with Percona TokuBackup. Bug fixed #135.
  • Fixed a memory leak in Percona TokuBackup. Bug fixed #1669005.
  • Compressed columns with dictionaries could not be added to a partitioned table by using ALTER TABLE. Bug fixed #1671492.
  • Fixed a memory leak that happened in case of failure to create a multi-threaded slave worker thread. Bug fixed #1675716.
  • The combination of using any audit API-using plugin, like Audit Log Plugin and Response Time Distribution, with multi-byte collation connection and PREPARE statement with a parse error could lead to a server crash. Bug fixed #1688698 (upstream #86209).
  • Fix for a #1433432 bug in Percona Server 5.6.28-76.1 caused a performance regression due to suboptimal LRU manager thread flushing heuristics. Bug fixed #1631309.
  • Creating Compressed columns with dictionaries in MyISAM tables by specifying partition engines would not result in error. Bug fixed #1631954.
  • It was not possible to configure basedir as a symlink. Bug fixed #1639735.
  • Replication slave did not report Seconds_Behind_Master correctly when running in multi-threaded slave mode. Bug fixed #1654091 (upstream #84415).
  • DROP TEMPORARY TABLE would create a transaction in binary log on a read-only server. Bug fixed #1668602 (upstream #85258).
  • Creating a compression dictionary with innodb_fake_changes enabled could lead to a server crash. Bug fixed #1629257.

Other bugs fixed: #1660828 (upstream #84786), #1664519 (upstream #84940), #1674299, #1683456, #1670588 (upstream #84173), #1672389, #1674507, #1674867, #1675623, #1650294, #1659224, #1660565, #1662908, #1669002, #1671473, #1673800, #1674284, #1676441, #1676705, #1676847 (upstream #85671), #1677130 (upstream #85678), #1677162, #1678692, #1678792, #1680510 (upstream #85838), #1683993, #1684012, #1684078, #1684264, and #1674281.

Release notes for Percona Server for MySQL 5.6.36-82.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at May 12, 2017 05:43 PM

Kristian Nielsen

Improving replication with multiple storage engines

New MariaDB/MySQL storage engines such as MyRocks and TokuDB have renewed interest in using engines other than InnoDB. This is great, but also presents new challenges. In this article, I will describe work that I am currently finishing, and which addresses one such challenge.

For example, the left bar in the figure shows what happens to MyRocks replication performance when used with a default install where the replication state table uses InnoDB. The middle bar shows the performance improvement from my patch.

Current MariaDB and MySQL replication uses tables to transactionally record the replication state (eg mysql.gtid_slave_pos). When non-InnoDB storage engines are introduced the question becomes: What engine should be used for the replication table? Any choice will penalise other engines heavily by injecting a cross-engine transaction with every replicated change. Unless all tables can be migrated to the other engine at once, this is an unavoidable problem with current MariaDB / MySQL code.

To solve this I have implemented MDEV-12179, per-engine mysql.gtid_slave_pos tables, which should hopefully be in MariaDB 10.3 soon. This patch makes the server able to use multiple replication state tables, one for each engine used. This way, InnoDB transactions can update the InnoDB replication table, and eg. MyRocks transactions can update the MyRocks table. No cross-engine transactions are needed (unless the application itself uses both InnoDB and MyRocks in a single transaction).

The feature is enabled with the new option --gtid-pos-auto-engines=innodb,rocksdb. The server will automatically create the new replication tables when/if needed, and will read any such tables present at server start to restore the replication state.

Performance test

To test the impact of the new feature, I ran a sysbench write-only load on a master, and measured the time for a slave to apply the full load. The workload is using MyRocks tables, while the default mysql.gtid_slave_pos table is stored in InnoDB. The performance was compared with and without --gtid-pos-auto-engines=innodb,rocksdb. Full details of test options are available following the link at the end of the article.

Replication injects an update into a small table as part of each commit. The performance impact of this will be most noticeable for fast transactions, where the commit overhead is relatively larger. It will be particularly noticeable when durability is enabled (--innodb-flush-log-at-trx-commit=1 and similar). If another storage engine is added into a transaction, extra fsync() calls are needed in the commit step, which can be very expensive.

I tested the performance in two scenarios:

  1. A "worst case" scenario with durability/fsync enabled for binlog, InnoDB, and MyRocks, on hardware with slow fsync.
  2. A "best case" scenario with all durability/fsync disabled.
In the "worst case" we would hope to see substantial improvement due to reducing the number of fsync operations. In the "best case" improvements will be expected to be small, if any, though there may still be some improvement due to avoiding CPU and some I/O overhead from running the commits through two engines.

The figure at the start of the article shows the results from the "worst case". The left bar is the time for the slave to catch up when the replication state table is using the default InnoDB storage engine. The middle bar is the time when using --gtid-pos-auto-engines=innodb,myrocks and the right bar is when the state table is changed to MyRocks (MyRocks-only load).

We see a huge speed penalty from the cross-engine transactions introduced by the InnoDB state table, the slave is twice as slow. However, with the patch, all the performance is recovered compared to MyRocks-only setup.

The test was run on consumer-grade hardware with limited I/O capabilities. I ran a small script to test the speed of fdatasync() (see link at end of article). This SATA-attached SSD can do around 120 fdatasync() calls per second, writing 16 KB blocks at random round-robin among five 1MB data files. In the "worst case" test, the load is completely disk-bound. Thus, the absolute transactions-per-second numbers are low, and the impact of the new feature is very big.

The results of the "best case" is in the following figure. The "best case" workload is CPU-bound, disk utilisation is low. Sysbench write-only does several queries in each transaction, so commit overhead is relatively lower. Still, we see a substantial cost of replication introduced cross-engine, it runs 18% slower than the MyRocks-only case. And again, the patch is able to fully recover the performance.

So I think these are really good results for the new feature. The impact for the user is low - just set a server option, and the server will handle the rest. We could eventually make InnoDB, TokuDB, and MyRocks default for --gtid-pos-auto-engines to make it fully automatic. The actual performance gain will depend completely on the workload, and the absolute numbers from these performance tests mean little, perhaps. But they do show that there should be significant potential gain in many cases, and enourmous gains in some cases.

Conclusions

I hope this feature will help experiments with, and eventual migration to, the new storage engines such as TokuDB and MyRocks. The ability to have good replication performance when different storage engines are used in different transactions (but not within a single transaction) should make it easier to experiment, without committing everything on a server to a new and unknown engine. There might even be use cases for deploying permanently on a mixed-engine setup, with different parts of the data utilising different performance characteristics of each engine.

The present work here is implemented for MariaDB only. However, there is some discussions on porting it to other MySQL variants. While the details of the implementation will differ somewhat, due to code differences in MariaDB replication, I believe a similar approach should work well in the other MySQL variants also. So it is definitely a possibility, if there is interest.

Links:

May 12, 2017 10:01 AM

May 11, 2017

MariaDB AB

MariaDB Analytics Tutorial: 5 Steps to Get Started in 10 Minutes

MariaDB Analytics Tutorial: 5 Steps to Get Started in 10 Minutes Amy Krishnamohan Thu, 05/11/2017 - 16:40

Looking for an easy way to get started with analytics? MariaDB ColumnStore provides a simple, open and scalable analytics solution. It leverages a pluggable storage engine to handle analytic workloads while keeping the same ANSI SQL interface that is used across the MariaDB portfolio. This blog provides a quick 5-step tutorial to help you get started with MariaDB ColumnStore.

Before you begin, please download the sample dataset, including:

 

Step 1: MariaDB ColumnStore Installation and Configuration

In this step, you will learn how to download and install MariaDB ColumnStore.

Step 2: Create Table and Load Data

MariaDB ColumnStore does not require you to set up index and partitioning. It provides an easy way to create a table and load data without help from DBAs. In addition, when ColumnStore loads data, it uses cpimport which leverages parallel query loading capability. To learn more about cpimport, watch this presentation by our solutions engineer, Anders Karlsson.

Step 3: Create Dimension Table / Cross Engine Join

Leveraging the MariaDB Server interface, we can use "Dimension Tables" from the InnoDB storage engine and join those with the "Fact Table" data in ColumnStore. In this demo, we join a loan stats fact table and dimension table to create a sample quarterly report on loan amount.

Step 4: Window Function

Another benefit of ColumnStore is built-in analytics queries like window functions. Without writing complex code, users can run window functions in SQL to run time series analysis or run averages on a certain dataset. In this example, with one SQL query, you can report on the top ranked delinquent loan amounts in five specific states.

Step 5: Data Visualization: Tableau integration

ColumnStore provides an easy way to connect to third-party BI tools like Tableau using a generic ODBC driver, enabling you to better visualize your data.

CS blog.png

Hope you enjoyed the tutorial! Here are some additional resources to help you along the way:

Looking for an easy way to get started with analytics? MariaDB ColumnStore provides a simple, open and scalable analytics solution. It leverages a pluggable storage engine to handle analytic workloads while keeping the same ANSI SQL interface that is used across the MariaDB portfolio. This blog provides a quick 5-step tutorial to help you get started with MariaDB ColumnStore.

Login or Register to post comments

by Amy Krishnamohan at May 11, 2017 08:40 PM

Peter Zaitsev

MyRocks and LOCK IN SHARE MODE

LOCK IN SHARE MODE

LOCK IN SHARE MODEIn this blog post, we’ll look at MyRocks and the

LOCK IN SHARE MODE
.

Those who attended the March 30th webinar “MyRocks Troubleshooting” might remember our discussion with Yoshinori on 

LOCK IN SHARE MODE
.

I did more tests, and I can confirm that his words are true:

LOCK IN SHARE MODE
 works in MyRocks.

This quick example demonstrates this. The initial setup:

CREATE TABLE t (
id int(11) NOT NULL,
f varchar(100) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=ROCKSDB;
insert into t values(12345, 'value1'), (54321, 'value2');

In session 1:

session 1> begin;
Query OK, 0 rows affected (0.00 sec)
session 1> select * from t where id=12345 lock in share mode;
+-------+--------+
| id | f |
+-------+--------+
| 12345 | value1 |
+-------+--------+
1 row in set (0.01 sec)

In session 2:

session 2> begin;
Query OK, 0 rows affected (0.00 sec)
session 2> update t set f='value3' where id=12345;
ERROR HY000: Lock wait timeout exceeded; try restarting transaction

However, in the webinar I wanted to remind everyone about the differences between

LOCK IN SHARE MODE
  and
FOR UPDATE
. To do so, I added the former to my “session 2” test for the webinar. Once I did, it ignores the lock set in “session 1”. I can update a row and commit:

session 2> select * from t where id=12345 lock in share mode;
+-------+--------+
| id    | f      |
+-------+--------+
| 12345 | value1 |
+-------+--------+
1 row in set (0.00 sec)
session 2> update t set f='value3' where id=12345;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
session 2> commit;
Query OK, 0 rows affected (0.02 sec)

I reported this behavior here, and also at Percona Jira bugs database: MYR-107. In Facebook, this bug is already fixed.

This test clearly demonstrates that it is fixed in Facebook. In “session 1”:

session1> CREATE TABLE `t` (
    -> `id` int(11) NOT NULL,
    -> `f` varchar(100) DEFAULT NULL,
    -> PRIMARY KEY (`id`)
    -> ) ENGINE=ROCKSDB;
Query OK, 0 rows affected (0.00 sec)
session1> insert into t values(12345, 'value1'), (54321, 'value2');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0
session1> begin;
Query OK, 0 rows affected (0.00 sec)
session1>  select * from t where id=12345 lock in share mode;
+-------+--------+
| id    | f      |
+-------+--------+
| 12345 | value1 |
+-------+--------+
1 row in set (0.00 sec)

And now in another session:

session2> begin;
Query OK, 0 rows affected (0.00 sec)
session2> select * from t where id=12345 lock in share mode;
+-------+--------+
| id    | f      |
+-------+--------+
| 12345 | value1 |
+-------+--------+
1 row in set (0.00 sec)
session2> update t set f='value3' where id=12345;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction: Timeout on index: test.t.PRIMARY

If you want to test the fix with the Facebook MySQL build, you need to update submodules to download the patch:

git submodule update
.

by Sveta Smirnova at May 11, 2017 08:36 PM

Oli Sennhauser

MySQL Enterprise Backup Incremental Cumulative and Differential Backup

Preparing the MySQL Enterprise Administrator Training I found that the MySQL Enterprise Backup Incremental Backup is not described very well. Thus I tried it out and wrote down this how-to:

Incremental Differential Backup

incremental_backup_diff.png

Full Backup

mysqlbackup --user=root --backup-dir=/tape/full backup-and-apply-log
grep end_lsn /tape/full/meta/backup_variables.txt
end_lsn=2583666

Incremental Backups

mysqlbackup --user=root --incremental-backup-dir=/tape/inc1 --start-lsn=2583666 --incremental backup
grep end_lsn /tape/inc1/meta/backup_variables.txt
end_lsn=2586138

mysqlbackup --user=root --incremental-backup-dir=/tape/inc2 --start-lsn=2586138 --incremental backup
grep end_lsn /tape/inc2/meta/backup_variables.txt
end_lsn=2589328

mysqlbackup --user=root --incremental-backup-dir=/tape/inc3 --start-lsn=2589328 --incremental backup
grep end_lsn /tape/inc3/meta/backup_variables.txt
end_lsn=2592519

Binary Log Backups

cp /var/lib/binlog/binlog.* /tape/binlog/

Restore

This step will modify the original full backup!

mysqlbackup --incremental-backup-dir=/tape/inc1 --backup-dir=/tape/full apply-incremental-backup

mysqlbackup --incremental-backup-dir=/tape/inc2 --backup-dir=/tape/full apply-incremental-backup

mysqlbackup --incremental-backup-dir=/tape/inc3 --backup-dir=/tape/full apply-incremental-backup

mysqlbackup --user=root --datadir=/var/lib/mysql --backup-dir=/tape/full copy-back

Point-in-Time-Recovery

grep binlog_position /tape/inc3/meta/backup_variables.txt
/tape/inc3/meta/backup_variables.txt:binlog_position=binlog.000001:7731

cd /tape/binlog
mysqlbinlog --disable-log-bin --start-position=7731 binlog.000001 | mysql -uroot

Incremental Cumulative Backup

incremental_backup_cum.png

Full Backup

mysqlbackup --user=root --backup-dir=/tape/full backup-and-apply-log
grep end_lsn /tape/full/meta/backup_variables.txt
end_lsn=2602954

Incremental Backups

mysqlbackup --user=root --incremental-backup-dir=/tape/inc1 --start-lsn=2602954 --incremental backup

mysqlbackup --user=root --incremental-backup-dir=/tape/inc2 --start-lsn=2602954 --incremental backup

mysqlbackup --user=root --incremental-backup-dir=/tape/inc3 --start-lsn=2602954 --incremental backup

Binary Log Backups

cp /home/mysql/database/mysql-5.7/binlog/binlog.* /tape/binlog/

Restore

This step will modify the original full backup!

mysqlbackup --incremental-backup-dir=/tape/inc3 --backup-dir=/tape/full apply-incremental-backup

mysqlbackup --user=root --datadir=/var/lib/mysql --backup-dir=/tape/full copy-back

Point-in-Time-Recovery

grep binlog_position /tape/*/meta/backup_variables.txt
/tape/inc3/meta/backup_variables.txt:binlog_position=binlog.000001:7731

cd /tape/binlog
mysqlbinlog --disable-log-bin --start-position=7731 binlog.000001 | mysql -uroot

I very much dislike that during my restore the backup is modified. So if I do a mistake during restore my backup is gone and I am doomed.

by Shinguz at May 11, 2017 03:20 PM

May 10, 2017

Peter Zaitsev

Percona Server for MySQL 5.5.55-38.8 is Now Available

Percona Server for MySQL 5.7.18-14

Percona Server for MySQL 5.5.55-38.8Percona announces the release of Percona Server for MySQL 5.5.55-38.8 on May 10, 2017. Based on MySQL 5.5.55, including all the bug fixes in it, Percona Server for MySQL 5.5.55-38.8 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. You can find release details in the 5.5.55-38.8 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

New Features:
  • Percona Server 5.5 packages are now available for Ubuntu 17.04 (Zesty Zapus).
Bugs Fixed:
  • If a bitmap write I/O errors happened in the background log tracking thread while a FLUSH CHANGED_PAGE_BITMAPS is executing concurrently it could cause a server crash. Bug fixed #1651656.
  • Querying TABLE_STATISTICS in combination with a stored function could lead to a server crash. Bug fixed #1659992.
  • Queries from the INNODB_CHANGED_PAGES table would needlessly read potentially incomplete bitmap data past the needed LSN range. Bug fixed #1625466.
  • It was not possible to configure basedir as a symlink. Bug fixed #1639735.

Other bugs fixed: #1688161, #1683456, #1670588 (upstream #84173), #1672389, #1675623, #1660243, #1677156, #1680061, #1680510 (upstream #85838), #1683993, #1684012, #1684025, and #1674281.

Find the release notes for Percona Server for MySQL 5.5.55-38.8 in our online documentation. Report bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at May 10, 2017 05:43 PM

Percona-Lab/mongodb_consistent_backup: 1.0 Release Explained

mongodb_consistent_backup

In this blog post, I will cover the Percona-Lab/mongodb_consistent_backup tool and the improvements in the 1.0.1 release of the tool.

Percona-Lab/mongodb_consistent_backup

mongodb_consistent_backup is a tool for performing cluster consistent backups on MongoDB clusters or single-replica sets. This tool is open source Python code, developed by Percona and published under our Percona-Lab GitHub repository. Percona-Lab is a place for code projects maintained and supported with only best-effort from Percona.

By considering the entire MongoDB cluster’s shards and individual shard members, mongodb_consistent_backup can backup a cluster with one or many shards to a single point in time. Single-point-in-time consistency of cluster backups is critical to data integrity for any “sharded” database technology, and is a topic often overlooked in database deployments.

This topic is explained in detail by David Murphy in this Percona blog: https://www.percona.com/blog/2016/07/25/mongodb-consistent-backups/.

1.0 Release

mongodb_consistent_backup originally was a single replica set backup script internal to Percona, which morphed into a large multi-threaded/concurrent Python project. It was released to the public (Percona-Lab) with some rough edges.

This release focuses on the efficiency and reliability of the existing components, many of the pain-points in extending, deploying and troubleshooting the tool and adding some small features.

New Features: Config File Overhaul

The tool was moved to use a structured, nested YAML config file instead of the messy config implemented in 0.x.

You can see a full example of this new format at this URL: https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/conf/mongodb-consistent-backup.example.conf

Here’s an example of a very basic config file that’s using 3 x replica-set config servers as “seed hosts” (a new feature in 1.0!), username+password and the optional Nagios NSCA notification method:

production:
  host: csReplSet/config01:27019,config02:27019,config03:27019
  username: mongodb_consistent_password
  password: "correct horse battery staple"
  authdb: admin
  log_dir: /var/log/mongodb_consistent_backup
  backup:
    method: mongodump
    name: production-eu
    location: /var/lib/mongodb_consistent_backup
  archive:
    method: tar
  notify:
    method: nsca
    nsca:
      check_host: mongodb-production-eu
      check_name: "mongodb_consistent_backup"
      server: nagios01.example.com:5667
  upload:
    method: none

New Features: Logging

Overall there is much more logged in this release, both in “regular” mode and “verbose” mode. A highlight for this release is live logging of the output of mongodump, something that was missing from the 0.x versions of the tool.

Now we can see the progress of the backup of each shard/replset in a cluster! Below we can see the backup of csReplset (a config server replica set) dump many collections and complete its backup. After, we can see the replica sets “test1” and “test2” dumping “wikipedia.pages”.

...
[2017-05-05 20:11:05,366] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.settings (1 document)
[2017-05-05 20:11:05,367] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.version to
[2017-05-05 20:11:05,372] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.version (1 document)
[2017-05-05 20:11:05,373] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.locks to
[2017-05-05 20:11:05,377] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.locks (3 documents)
[2017-05-05 20:11:05,378] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.databases to
[2017-05-05 20:11:05,381] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.databases (1 document)
[2017-05-05 20:11:05,383] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.tags to
[2017-05-05 20:11:05,385] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.tags (0 documents)
[2017-05-05 20:11:05,387] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.changelog to
[2017-05-05 20:11:05,399] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.changelog (112 documents)
[2017-05-05 20:11:05,401] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing captured oplog to
[2017-05-05 20:11:05,578] [INFO] [MongodumpThread-7] [MongodumpThread:run:133] Backup csReplSet/172.17.0.1:27019 completed in 0.71 seconds, 0 oplog changes
[2017-05-05 20:11:08,042] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[........................]  wikipedia.pages  636/35080  (1.8%)
[2017-05-05 20:11:08,071] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[........................]  wikipedia.pages  878/35118  (2.5%)
[2017-05-05 20:11:11,041] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[#.......................]  wikipedia.pages  1853/35080  (5.3%)
[2017-05-05 20:11:11,068] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[#.......................]  wikipedia.pages  2063/35118  (5.9%)
[2017-05-05 20:11:14,043] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[##......................]  wikipedia.pages  2983/35080  (8.5%)
[2017-05-05 20:11:14,075] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[##......................]  wikipedia.pages  3357/35118  (9.6%)
[2017-05-05 20:11:17,040] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[##......................]  wikipedia.pages  4253/35080  (12.1%)
[2017-05-05 20:11:17,070] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[###.....................]  wikipedia.pages  4561/35118  (13.0%)
[2017-05-05 20:11:20,038] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[###.....................]  wikipedia.pages  5180/35080  (14.8%)
[2017-05-05 20:11:20,067] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[###.....................]  wikipedia.pages  5824/35118  (16.6%)
[2017-05-05 20:11:23,050] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[####....................]  wikipedia.pages  6216/35080  (17.7%)
[2017-05-05 20:11:23,072] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[####....................]  wikipedia.pages  6964/35118  (19.8%)
...

Also, while backup data is gathered the status output from each Oplog tailing thread is now logged every 30 seconds (by default):

...
[2017-05-05 20:12:09,648] [INFO] [TailThread-2] [TailThread:status:60] Oplog tailer test1/172.17.0.1:27017 status: 256 oplog changes, ts: Timestamp(1494020048, 6)
[2017-05-05 20:12:11,033] [INFO] [TailThread-3] [TailThread:status:60] Oplog tailer test2/172.17.0.1:28017 status: 1588 oplog changes, ts: Timestamp(1494020049, 50)
[2017-05-05 20:12:22,804] [INFO] [TailThread-4] [TailThread:status:60] Oplog tailer csReplSet/172.17.0.1:27019 status: 43 oplog changes, ts: Timestamp(1494020062, 1)
...

You can now write log files to disk by setting the ‘log_dir’ config variable or ‘–log-dir’ command-line flag. One log file per backup is written to this directory, with a symlink pointing to the latest log file. The previous backup’s log file is automatically compressed with gzip.

New Features: ZBackup

ZBackup is an open-source de-duplication, compression and (optional) encryption tool for archive-like data (similar to backups). Files that are fed into ZBackup are organized at a block-level into pieces called “bundles”. When more files are fed into ZBackup, it can re-use the bundles when it notices the same blocks are being backed up. This approach provides significant savings on disk space (required for many database backups). To add to the savings, all data in ZBackup is compressed using LZMA compression, which generally compresses better than gzip/deflate or zip. ZBackup also supports an optional AES-128 encryption at rest. You enable it by providing a key file to ZBackup that allows it to encode/decode the data.

mongodb_consistent_backup 1.0.0 now supports ZBackup as a new archiving method!

Below is an example of ZBackup used on a small database (about 1GB) that is constantly growing.

This graph compares the size added on disk for seven backups taken 10-minutes apart using two methods. The first method is mongodb_consistent_backup, with mongodump built-in gzip compression (available via the –gzip flag since 3.2) enabled. By default mongodump gzip is enabled by mongodb_consistent_backup (if it’s available), so this is a good “baseline”. The second method is mongodb_consistent_backup with mongodump gzip compression disabled and ZBackup used as the mongodb_consistent_backup archiving method, a post-backup stage in our tool. Notice each backup in the graph after the first only adds 14-18mb to the disk usage, meaning ZBackup was able to recognize similarities in the data.

To try out ZBackup as an archive method, use one of these methods:

  1. Set the field “method” under the “archive” section of your mongodb_consistent_backup config file to “zbackup” (example):
    production:
      ...
      archive:
         method: zbackup
      ...
  2. Or, add the command-line flag “archive.method=zbackup” to your command line.

This archive method causes mongodb_consistent_backup to create a subdirectory in the backup location named “mongodb-consistent-backup_zbackup” and import completed backups into ZBackup after the backup stage. This directory contains the ZBackup storage files that it needs to operate, and they should not be modified!

Of course, there are trade-offs. ZBackup adds some additional system resource usage and time to the overall backup AND restore process – both importing and exporting data into ZBackup takes some additional time.

By default ZBackup’s restore uses a very small amount of RAM for cache, so increasing the cache with the “–cache-size” flag may improve restore performance. ZBackup uses threading so more CPUs can also improve performance of backups and restores.

New Features: Docker Container

We now offer a Dockerfile for building mongodb_consistent_backup with all dependencies into a Docker container! The goal for the image is to be as “thin” as possible, and so the build merely downloads a prebuilt binary of the tool and installs dependencies. See: https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/Dockerfile

Some interesting use cases for a Docker-based deployment of the tool come to mind:

  • Running MongoDB backups using ephemeral containers on Apache Mesos or Kubernetes (with persistent volumes or remote upload)
  • Restricting system resources used by mongodb_consistent_backup via Docker/cgroup’s isolation features
  • Simplified deployment or isolated dependencies (e.g., Python, Mongodump, etc.)

Up-to-date images of mongodb_consistent_backup are available at this Dockerhub URL: https://hub.docker.com/r/timvaillancourt/mongodb_consistent_backup/. This image includes mongodb_consistent_backup, gzip-capable mongodump binaries and latest-stable ZBackup binaries.

To run the latest Dockerhub image:

$ docker run -i timvaillancourt/mongodb_consistent_backup:latest <mongodb_consistent_backup-flags here>

To just list the “help” page (all available options):

$ docker run -i timvaillancourt/mongodb_consistent_backup:latest --help
usage: mongodb-consistent-backup [-h] [-c CONFIGPATH]
                                 [-e {production,staging,development}] [-V]
                                 [-v] [-H HOST] [-P PORT] [-u USER]
                                 [-p PASSWORD] [-a AUTHDB] [-n BACKUP.NAME]
                                 [-l BACKUP.LOCATION] [-m {mongodump}]
                                 [-L LOG_DIR] [--lock-file LOCK_FILE]
                                 [--sharding.balancer.wait_secs SHARDING.BALANCER.WAIT_SECS]
                                 [--sharding.balancer.ping_secs SHARDING.BALANCER.PING_SECS]
                                 [--archive.method {tar,zbackup,none}]
                                 [--archive.tar.compression {gzip,none}]
                                 [--archive.tar.threads ARCHIVE.TAR.THREADS]
                                 [--archive.zbackup.binary ARCHIVE.ZBACKUP.BINARY]
                                 [--archive.zbackup.cache_mb ARCHIVE.ZBACKUP.CACHE_MB]
                                 [--archive.zbackup.compression {lzma}]
...
...

An example script for running the container with persistent Docker volumes is available here: https://github.com/Percona-Lab/mongodb_consistent_backup/tree/master/scripts

New Features: Multiple Seed Hosts + Config Servers

mongodb_consistent_backup 1.0 introduces the ability to define a list of multiple “seed” hosts, preventing a potential for a single-point of failure in your backups! If a host in the list is unavailable, it will be skipped.

Multiple hosts should be specified with this replica-set URL format, many hosts separated by commas:

    <replica-set>/<host/ip>:<port>,<host/ip>:<port>,…

Or you can specify a comma-separated list without the replica set name for non-replset nodes (eg: mongos or non-replset config servers):

    <host/ip>:<port>,<host/ip>:<port>,…

Also, the functionality to use cluster Config Servers as seed hosts was added. Before version 1.0 a clustered backup needed to use a single mongos router as a seed host to find all shards and cluster members. Sometimes mongos routers can come and go as you scale, making this design brittle.

With this new functionality, mongodb_consistent_backup can use the Cluster Config Servers to map out the cluster, which are usually three times the fairly-static hosts in an infrastructure. This makes the deployment and operation of the tool a bit simpler and more reliable.

Overall Improvements

As mentioned, a focus in this release was improving the existing code. A major refactoring of the code structure of the project was completed in 1.0, and moves the major “phases” or “stages” in the tool to their own Python sub-modules (e.g., “Backup” and “Archive”) that then auto-load their various “methods” like “mongodump” or “Zbackup”.

The code was broken into these high-level stages:

  1. Backup. The stage that gathers the backup of data. During this stage, Oplog tailing and resolving also occur if the backup is for a cluster. More backup methods are coming soon!
  2. Archive. The stage that archives and optionally compresses the backup data. The new ZBackup method also adds de-duplication and encryption ability to this stage.
  3. Upload. The stage that uploads the resulting data to a remote storage. Currently only AWS S3 is supported with Google Cloud Storage and Rsync being added as we speak.
  4. Notify. The stage that notifies external systems of the success/failure of the backup. Currently, our tool only supports Nagios NSCA, with plans for PagerDuty and others to be added.

Some interesting code enhancements include:

  • Reusing of database connections. This reduces the number of connections on seed hosts.
  • Replication heartbeat time (“operational lag”). This is now considered in replica set lag calculations.
  • Added thread safety for oplog tailing threads. This resolves some issues on extremely-overloaded hosts.

Another focus was efficiency and preventing race conditions. The tool should be much less susceptible to error as a result, although if you see any problems we’d like to hear about them on our GitHub “Issues” page.

Lastly, we encourage the open source community to contribute additional functionality to this tool via our GitHub!

Release Notes:

  • 1.0.0
    • Move to dynamic code “Submodules” and subclassing of repeated components
    • Restructuring of YAML config to nested config
    • Safe start/stopping of oplog tailer threads, additional checking on all thread states
    • File-based logging with gzip of old log
    • Oplog tailer ‘oplogReplay’ performance optimization
    • Fixes to oplog durability to-disk
    • Live mongodump output to stdout in realtime
    • Oplog tailer status logging
    • ZBackup archive method: supporting deduplication, compression and option AES encryption
    • Support for list discovery/seed hosts
    • Support configdb servers as cluster seed hosts
    • Fewer (reused) database connections
    • Database connections to use strong write concern
    • Consider replication operational lag in secondary scoring
    • Backup metadata is written for future functionality and troubleshooting
    • mongodb_consistent_backup.Errors custom exceptions for proper exception handling
    • Python PyPi support added
    • Dockerfile support for running under containers
    • Additional log messages
    • Support for MongoDB 3.4 datatypes
    • Significant reworking of existing code for efficiency, reliability and readability

More about our releases can be seen here: https://github.com/Percona-Lab/mongodb_consistent_backup/releases.

by Tim Vaillancourt at May 10, 2017 04:58 PM

Jean-Jerome Schmidt

Webinar Replay and Q&A: how to deploy and manage ProxySQL, HAProxy and MaxScale

Thanks to everyone who participated in yesterday’s webinar on how to deploy and manage ProxySQL, HAProxy and MaxScale with ClusterControl.

Krzysztof Książek, Senior Support Engineer at Severalnines, discussed support for proxies for MySQL HA setups in ClusterControl: how they differ and what their pros and cons are. And he demonstrated how you can easily deploy and manage HAProxy, MaxScale and ProxySQL from ClusterControl during a live demo.

If you missed the webinar, would like to watch it again or browse through the slides, it’s all available for viewing online now. You’ll also find below a transcript of the Q&A session, which took place at the end of the webinar.

Watch the webinar replay

Webinar Questions & Answers

Q.: I’d like to replace HAProxy by ProxySQL - can I deploy ProxySQL on the same VMs as my current HAProxy ones or do I have to create new VMs? I deploy and manage it all with ClusterControl.

A.: Yes, as long as there is no conflict in ports used by those two proxies, there’s no reason why they couldn’t coexist. By default there is no such conflict, but a user may customize ports when deploying a proxy from ClusterControl, so if you are not sure how HAProxy is configured, it’s better to double-check it.

Q.: Do you know what happened to the admin interface MaxScale 2.0 and why it was removed?

A.: We don’t have detailed knowledge - it’s been deprecated due to security reasons, but what exactly is hidden behind this statement, we don’t know.

Q.: Have you any plans to talk about or support other load balancers in future, such as F5 BigIP, A10 Networks, or Citrix Netscaler? Or do you have any immediate thoughts on them you can share just now?

A.: As of now we don’t have any plans related to these load balancers, but if we get more requests for it, we’ll look into them more.

Q.: How can we sync users across multiple Proxysql instances? Or add existing users automatically to a newly added Proxysql instance?

A.: As of now, it is not possible to do that using ClusterControl - you still can do it manually, accessing ProxySQL through the command line interface. Having said that, we have plans to implement configuration syncing in one of the next ClusterControl releases. For adding users in batches, right now, CLI is the best way - ProxySQL accepts passwords in a form of MySQL password hash, so it’s fairly easy to write a script which will do the import. This is one of the feature requests we got so we will, most likely, implement it at some point. We can’t share any ETA though.

Q.: How does ClusterControl handle configuration changes in ProxySQL?

A.: ClusterControl does not take advantage of multiple configuration levels in ProxySQL - any change introduced via the UI is immediately loaded to runtime configuration.

Q.: Can you describe what the CPU usage is for ProxySQL or MaxScale on read/write split?

A.: Typically you’ll see ProxySQL utilizing less CPU resources compared to MaxScale, but it all depends on your workload and the number of query rules you may have added to ProxySQL.

Watch the webinar replay

by krzysztof at May 10, 2017 01:03 PM

May 09, 2017

Peter Zaitsev

MariaDB Handler_icp_% Counters: What They Are, and How To Use Them

Handler_icp_% counters

Handler_icp_% countersIn this post we’ll see how MariaDB’s Handler_icp_% counters status counters (Handler_icp_attempts and Handler_icp_matches) measure ICP-related work done by the server and storage engine layers, and how to see if our queries are getting any gains by using them.

These counters (as seen in SHOW STATUS output) are MariaDB-specific. In a later post, we will see how we can get this information in MySQL and Percona Server. This investigation spun off from comments in Michael’s post about the new MariaDB dashboard in PMM. Comments are very useful, so keep them coming! 🙂

We can start by checking the corresponding documentation pages:

https://mariadb.com/kb/en/mariadb/server-status-variables/#handler_icp_attempts

Description: Number of times pushed index condition was checked. The smaller the ratio of Handler_icp_attempts to Handler_icp_match the better the filtering. See Index Condition Pushdown.

https://mariadb.com/kb/en/mariadb/server-status-variables/#handler_icp_match

Description: Number of times pushed index condition was matched. The smaller the ratio of Handler_icp_attempts to Handler_icp_match the better the filtering. See See Index Condition Pushdown.

As we’ll see below, “attempts” counts the number of times the server layer sent a WHERE clause down to the storage engine layer to check if it can be filtered out. “Match”, on the other hand, counts whether an attempt ended up in the row being returned (i.e., if the pushed WHERE clause was a complete match).

Now that we understand what they measure, let’s check how to use them for reviewing our queries. Before moving forward with the examples, here are a couple of points to keey in mind:

  • Even if the attempt was not successful, it doesn’t mean that it is bad. However, a high (attempts – match) number is good in this context, since this is a measure of the rows that were “saved” from being checked in the server layer after getting the complete row from the storage engine. (This is explained more thoroughly below in Øystein Grøvlen’s comment – check it out!) On the other hand, a low number is not bad – it just means that most (or all) attempts ended up being a match.
  • From the documentation links above, it is stated that “the smaller the ratio between attempts to match, the better the filtering.”, which I believe is the contrary.

Back to our examples, then. First, let’s review version, table structure and data set.

mysql [localhost] {msandbox} (test) > SELECT @@version, @@version_comment;
+-----------------+-------------------+
| @@version       | @@version_comment |
+-----------------+-------------------+
| 10.1.20-MariaDB | MariaDB Server    |
+-----------------+-------------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SHOW CREATE TABLE t1G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `f1` int(11) DEFAULT NULL,
  `f2` int(11) DEFAULT NULL,
  `f3` int(11) DEFAULT NULL,
  KEY `idx_f1_f2` (`f1`,`f2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)
mysql [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|  3999996 |
+----------+
1 row in set (1.75 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 LIMIT 12;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    1 |    2 |    1 |
|    1 |    3 |    1 |
|    1 |    4 |    1 |
|    2 |    1 |    1 |
|    2 |    2 |    1 |
|    2 |    3 |    1 |
|    2 |    4 |    1 |
|    3 |    1 |    1 |
|    3 |    2 |    1 |
|    3 |    3 |    1 |
|    3 |    4 |    1 |
+------+------+------+
12 rows in set (0.00 sec)

It’s trivial, but it will work well for what we intend to show:

mysql [localhost] {msandbox} (test) > FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE f1 < 3 and f2 < 4;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    1 |    2 |    1 |
|    1 |    3 |    1 |
|    2 |    1 |    1 |
|    2 |    2 |    1 |
|    2 |    3 |    1 |
+------+------+------+
6 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM information_schema.SESSION_STATUS WHERE VARIABLE_NAME LIKE '%icp%' OR VARIABLE_NAME='ROWS_READ' OR VARIABLE_NAME='ROWS_SENT';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| HANDLER_ICP_ATTEMPTS | 8              |
| HANDLER_ICP_MATCH    | 6              |
| ROWS_READ            | 6              |
| ROWS_SENT            | 6              |
+----------------------+----------------+
4 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 AND f2 < 4;
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
|    1 | SIMPLE      | t1    | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    7 | Using index condition |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
1 row in set (0.01 sec)

In this scenario, the server sent a request to the storage engine to check on eight rows, from which six were a complete match. This is the case where a low attempts - match number is seen. The server scanned the index on the f1 column to decide which rows needed a “request for further check”, then the storage engine checked the WHERE condition on the f2 column with the pushed down (f2 < 4) clause.

Now let’s change the condition on f2:

mysql [localhost] {msandbox} (test) > FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE f1 < 3 and (f2 % 4) = 1;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    2 |    1 |    1 |
+------+------+------+
2 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM information_schema.SESSION_STATUS WHERE VARIABLE_NAME LIKE '%icp%' OR VARIABLE_NAME='ROWS_READ' OR VARIABLE_NAME='ROWS_SENT';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| HANDLER_ICP_ATTEMPTS | 8              |
| HANDLER_ICP_MATCH    | 2              |
| ROWS_READ            | 2              |
| ROWS_SENT            | 2              |
+----------------------+----------------+
4 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 and (f2 % 4) = 1;
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
|    1 | SIMPLE      | t1    | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    7 | Using index condition |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
1 row in set (0.00 sec)

In this scenario, the server also sent a request for eight rows, of which only two ended up being a match, due to the changed condition on f2. This is the case where a “high” attempts - match number is seen.

Great, we understand how to see the amount of rows sent between the server and storage engine layers. Now let’s move forward with the “how can I make sense of these numbers?” part. We can use the other counters included in the outputs that haven’t been mentioned yet (ROWS_READ and ROWS_SENT) and compare them when running the same queries with ICP disabled (which can be conveniently done with a simple SET):

mysql [localhost] {msandbox} (test) > SET optimizer_switch='index_condition_pushdown=off';
Query OK, 0 rows affected (0.00 sec)

Let’s run the queries again. For the first query:

mysql [localhost] {msandbox} (test) > FLUSH STATUS; SELECT * FROM t1 WHERE f1 < 3 and f2 < 4;
...
mysql [localhost] {msandbox} (test) > SELECT * FROM information_schema.SESSION_STATUS WHERE VARIABLE_NAME LIKE '%icp%' OR VARIABLE_NAME='ROWS_READ' OR VARIABLE_NAME='ROWS_SENT';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| HANDLER_ICP_ATTEMPTS | 0              |
| HANDLER_ICP_MATCH    | 0              |
| ROWS_READ            | 9              |
| ROWS_SENT            | 6              |
+----------------------+----------------+
4 rows in set (0.01 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 AND f2 < 4;
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
|    1 | SIMPLE      | t1    | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    7 | Using where |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.00 sec)

As we can see by the handler counters, ICP is indeed not being used. Now the server is reading nine rows, as opposed to six when using ICP. Moreover, notice how we now see a Using where in the “Extra” column in the EXPLAIN output. This means that we are doing the filtering on the server layer; and we are using the first column of the composite index for the range scan (f1 < 3).

For the second query:

mysql [localhost] {msandbox} (test) > FLUSH STATUS; SELECT * FROM t1 WHERE f1 < 3 and (f2 % 4) = 1;
...
mysql [localhost] {msandbox} (test) > SELECT * FROM information_schema.SESSION_STATUS WHERE VARIABLE_NAME LIKE '%icp%' OR VARIABLE_NAME='ROWS_READ' OR VARIABLE_NAME='ROWS_SENT';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| HANDLER_ICP_ATTEMPTS | 0              |
| HANDLER_ICP_MATCH    | 0              |
| ROWS_READ            | 9              |
| ROWS_SENT            | 2              |
+----------------------+----------------+
4 rows in set (0.01 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 and (f2 % 4) = 1;
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
|    1 | SIMPLE      | t1    | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    7 | Using where |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.00 sec)

The server is also reading nine rows (because it’s also using only column f1 from the composite index, and we have the same condition on it), with the difference that it used to read only two while using ICP. We could say that this case was much better (and it was), but as with most of the time the answer to the bigger “how much better” question is “it depends”. As stated in the documentation, it has two factors:

  • How many records are filtered out
  • How expensive it is to read them

So it really depends on the size and kind of data set, and if it’s in memory or not. It’s better to benchmark the queries to have more information (like actual response times), but if it’s not possible we can get some valuable information by taking a look at these counters.

Lastly, I wanted to go back to the “attempts to match ratio” mentioned initially. As we saw in these examples, the first one had a 8:6 (or 4:3) ratio and the second a 8:2 (or 4:1) ratio, and the second one filtered more rows. Given this, I believe that the inverse will hold true: “The bigger the ratio of Handler_icp_attempts to Handler_icp_match, the better the filtering.”

Stay tuned for the next part, where we’ll see how to get this information from MySQL and Percona Server, too!

by Agustín at May 09, 2017 07:39 PM

Percona Server for MongoDB 3.4.4-1.4 is Now Available

Percona Server for MongoDB 3.2

Percona Server for MongoDB 3.4.3-1.3Percona announces the release of Percona Server for MongoDB 3.4.4-1.4 on May 9, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.4 and includes the following additional changes:

  • #PSMDB-122: Added the percona-server-mongodb-enable-auth.sh script to binary tarball.
  • #PSMDB-127: Fixed cleanup of deleted documents and indexes for MongoRocks. When you upgrade to this release, deferred compaction may occur and cause database size to decrease significantly.
  • #PSMDB-133: Added the wiredTigerCheckpointSizeMB variable, set to 1000 in the configration template for WiredTiger. Valid values are 32 to 2048 (2GB), with the latter being default.
  • #PSMDB-138: Implemented SERVER-23418 for MongoRocks.

Percona Server for MongoDB 3.4.4-1.4 release notes are available in the official documentation.

by Alexey Zhebel at May 09, 2017 05:18 PM

MariaDB Foundation

2017 MariaDB Developers (Un)Conference New York Presentations

The 2017 MariaDB Developers (Un)Conference was held on April 9 and 10 in New York, and was kindly hosted by BNY Mellon. Below are a list of the sessions with links to slides/Periscope where available. Day One *Welcoming talk: BNY Mellon and MariaDB (Zak Murad), MariaDB Foundation in 2017 (Otto Kekäläinen) – periscope *What’s new […]

The post 2017 MariaDB Developers (Un)Conference New York Presentations appeared first on MariaDB.org.

by Ian Gilfillan at May 09, 2017 12:11 PM

May 08, 2017

Peter Zaitsev

Chasing a Hung MySQL Transaction: InnoDB History Length Strikes Back

Hung MySQL Transaction

In this blog post, I’ll review how a hung MySQL transaction can cause the InnoDB history length to grow and negatively affect MySQL performance.

Recently I was helping a customer discover why SELECT queries were running slower and slower until the server restarts (which got things back to normal). It took some time to get from that symptom to a final diagnosis. Please follow me on the journey of chasing this strange MySQL behavior!

Symptoms

Changes in the query response time can mean tons of things. We can check everything from the query plan to the disk performance. However, fixing it with a restart is less common. After looking at “show engine innodb status”, I noticed some strange lines:

Trx read view will not see trx with id >= 41271309593, sees < 41268384363
---TRANSACTION 41271309586, ACTIVE 766132 sec
2 lock struct(s), heap size 376, 0 row lock(s), undo log entries 1
...

There was a total of 940 transactions like this.

Another insight was the InnoDB transaction history length graph from Percona Monitoring and Management (PMM):

Hung MySQL Transaction

History length of 6 million and growing clearly indicates a problem.

Problem localized

There have been a number of blog posts describing a similar problem: Peter stated in a blog post: “InnoDB transaction history often hides dangerous ‘debt’“. As the InnoDB transaction history grows, SELECTs need to scan more and more previous versions of the rows, and performance suffers. That explains the issue: SELECT queries get slower and slower until restart. Peter also filed this bug: Major regression having many row versions.

But why does the InnoDB transaction history start growing? There are 940 transactions in this state: ACTIVE 766132 sec. MySQL’s process list shows those transactions in “Sleep” state. It turns out that those transactions were “lost” or “hung”. As we can also see, each of those transactions holds two lock structures and one undo record, so they are not committed and not rolled-back. They are sitting there doing nothing. In this case, with the default isolation level REPEATABLE-READ, InnoDB can’t purge the undo records (transaction history) for other transactions until these “hung” transactions are finished.

The quick solution is simple: kill those connections and InnoDB will roll back those transactions and purge transaction history. After killing those 940 transactions, the graph looked like this:

Hung MySQL Transaction

However, several questions remain:

  1. What are the queries inside of this lost transaction? Where are they coming from? The problem is that neither MySQL’s process list nor InnoDB’s status shows the queries for this transaction, as it is not running those queries right now (the process list is a snapshot of what is happening inside MySQL right at this moment)
  2. Can we fix it so that the “hung” transactions don’t affect other SELECT queries and don’t cause the growth of transaction history?

Simulation

As it turns out, it is very easy to simulate this issue with sysbench.

Test preparation

To add some load, I’m using sysbench,16 threads (you can open less, it does not really matter here) and a script for a “write-only” load (running for 120 seconds):

conn=" --db-driver=mysql --mysql-host=localhost --mysql-user=user --mysql-password=password --mysql-db=sbtest "
sysbench --test=/usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-table-engine=InnoDB --oltp-table-size=1000000 $conn prepare
sysbench --num-threads=16 --max-requests=0 --max-time=120 --test=/usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --oltp-table-size=1000000 $conn
--oltp-test-mode=complex --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0
--oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=1 --oltp-non-index-updates=0 run

Simulate a “hung” transaction

While the above sysbench is running, open another connection to MySQL:

use test;
CREATE TABLE `a` (
`i` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
insert into a values(1);
begin; insert into a values(1); select * from a;

Note: we will need to run the SELECT as a part of this transaction. Do not close the connection.

Watch the history

mysql> select name, count from information_schema.INNODB_METRICS where name like '%hist%';
+----------------------+-------+
| name                 | count |
+----------------------+-------+
| trx_rseg_history_len | 34324 |
+----------------------+-------+
1 row in set (0.00 sec)
mysql> select name, count from information_schema.INNODB_METRICS where name like '%hist%';
+----------------------+-------+
| name                 | count |
+----------------------+-------+
| trx_rseg_history_len | 36480 |
+----------------------+-------+
1 row in set (0.01 sec)

We can see it is growing. Now it is time to commit or rollback or even kill our original transaction:

mysql> rollback;
...
mysql> select name, count from information_schema.INNODB_METRICS where name like '%hist%';
+----------------------+-------+
| name                 | count |
+----------------------+-------+
| trx_rseg_history_len | 793   |
+----------------------+-------+
1 row in set (0.00 sec)

As we can see, it has purged the history length.

Finding the queries from the hung transactions

There are a number of options to find the queries from that “hung” transaction. In older MySQL versions, the only way is to enable the general log (or the slow query log). Starting with MySQL 5.6, we can use the Performance Schema. Here are the steps:

  1. Enable performance_schema if not enabled (it is disabled on RDS / Aurora by default).
  2. Enable events_statements_history:
    mysql> update performance_schema.setup_consumers set ENABLED = 'YES' where NAME='events_statements_history';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
  3. Run the query to find all transaction started 10 seconds ago (change the number of seconds to match your workload):
    SELECT ps.id as processlist_id,
                 trx_started, trx_isolation_level,
                 esh.EVENT_ID,
                 esh.TIMER_WAIT,
                 esh.event_name as EVENT_NAME,
                 esh.sql_text as SQL,
                 esh.RETURNED_SQLSTATE, esh.MYSQL_ERRNO, esh.MESSAGE_TEXT, esh.ERRORS, esh.WARNINGS
    FROM information_schema.innodb_trx trx
    JOIN information_schema.processlist ps ON trx.trx_mysql_thread_id = ps.id
    LEFT JOIN performance_schema.threads th ON th.processlist_id = trx.trx_mysql_thread_id
    LEFT JOIN performance_schema.events_statements_history esh ON esh.thread_id = th.thread_id
    WHERE trx.trx_started < CURRENT_TIME - INTERVAL 10 SECOND
      AND ps.USER != 'SYSTEM_USER'
    ORDER BY esh.EVENT_IDG
    ...
             PROCESS ID: 1971
            trx_started: 2017-05-03 17:36:47
    trx_isolation_level: REPEATABLE READ
               EVENT_ID: 79
             TIMER_WAIT: 33767000
             EVENT NAME: statement/sql/begin
                    SQL: begin
      RETURNED_SQLSTATE: 00000
            MYSQL_ERRNO: 0
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0
    *************************** 9. row ***************************
             PROCESS ID: 1971
            trx_started: 2017-05-03 17:36:47
    trx_isolation_level: REPEATABLE READ
               EVENT_ID: 80
             TIMER_WAIT: 2643082000
             EVENT NAME: statement/sql/insert
                    SQL: insert into a values(1)
      RETURNED_SQLSTATE: 00000
            MYSQL_ERRNO: 0
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0
    *************************** 10. row ***************************
             PROCESS ID: 1971
            trx_started: 2017-05-03 17:36:47
    trx_isolation_level: REPEATABLE READ
               EVENT_ID: 81
             TIMER_WAIT: 140305000
             EVENT NAME: statement/sql/select
                    SQL: select * from a
      RETURNED_SQLSTATE: NULL
            MYSQL_ERRNO: 0
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0

    Now we can see the list of queries from the old transaction (the MySQL query used was taken with modifications from this blog post: Tracking MySQL query history in long running transactions).

At this point, we can chase this issue at the application level and find out why this transaction was not committed. The typical causes:

  • There is a heavy, non-database-related process inside the application code. For example, the application starts a transaction to get a list of images for analysis and then starts an external application to process those images (machine learning or similar), which can take a very long time.
  • The application got an uncaught exception and exited, but the connection to MySQL was not closed for some reason (i.e., returned to the connection pool).

We can also try to configure the timeouts on MySQL or the application so that the connections are closed after “N” minutes.

Changing the transaction isolation level to fix the InnoDB transaction history issue

Now that we know which transaction is holding up the purge process of InnoDB history, we can find this transaction and make changes so it will not “hang”. We can change the transaction isolation level from REPEATABLE READ (default) to READ COMMITTED. In READ COMMITTED, InnoDB does not need to maintain history length when other transactions have committed changes. (More details about different isolation methods and how they affect InnoDB transactions.) That will work in MySQL 5.6 and later. However this doesn’t work in Amazon Aurora (as of now): even with READ COMMITTED isolation level, the history length still grows.

Here is the list of MySQL versions where changing the isolation level fixes the issue

MySQL Version  Transaction isolation  InnoDB History Length
MySQL 5.6 repeatable read history is not purged until “hung” transaction finishes
MySQL 5.6 read committed (fixed) history is purged
Aurora repeatable read history is not purged until “hung” transaction finishes
Aurora read committed history is not purged until “hung” transaction finishes


Summary

Hung transactions can cause the InnoDB history length to grow and (surprisingly, on the first glance) affect the performance of other running select queries. We can use the performance schema to chase the “hung” transaction. Changing the MySQL transaction isolation level can potentially help.

by Alexander Rubin at May 08, 2017 07:09 PM

May 07, 2017

Daniël van Eeden

MySQL and SSL/TLS Performance

In conversations about SSL/TLS people often say that they either don't need TLS because they trust their network or they say it is too slow to be used in production.

With TLS the client and server has to do additional work, so some overhead is expected. But the price of this overhead also gives you something in return: more secure communication and more authentication options (client certificates).

SSL and TLS have existed for quite a long time. First they were only used for online banking and during authentication on web sites. But slowly many websites went to full-on SSL/TLS. And with the introduction of Let's encrypt many small websites are now using SSL/TLS. And many non-HTTP protocols either add encryption or move to a HTTP based protocol.

So TLS performance is very important for day-to-day usage. Many people and companies have put a lot of effort into improving TLS performance. This includes browser vendors, hardware vendors and much more.

But instead of just hoping for good performance: Let's try to measure it with a simple benchmark.

There are multiple pieces of a database connection we have to benchmark:
  1. New connections
  2. Reconnecting
  3. Bulk transfer
 And for all of these there are multiple things we can measure:
  1. Connect and/or transfer time (performance)
  2. CPU usage (efficiency)
  3. Concurrency 
The benchmark code can be found here: https://github.com/dveeden/mysql_go_tls

Let's look at connection performance. In this test I connect a number of times to MySQL  and do a "DO 1". This is on a localhost TCP connection, so it should be fast.


This is the connection time in ms for a single connection.
With 5.6.33 Community Edition, which is YaSSL based we see a very noticable overhead. And with 5.7.17 Community Edition this overhead is much smaller, but still very noticable.

Then MySQL 5.7 with OpenSSL (compiled on Fedora 25) shows another very noticable improvement over YaSSL. This can be explained because in this case the AVX2 and AES-NI CPU features can be used.

Also OpenSSL supports TLS tickets and YaSSL doesn't. This is why the yellow bar is much shorter that the orange bar. This is not yet supported in libmysqlclient, see Bug #76921 for details.

So SSL/TLS can be slow, but doesn't have to be slow.

Note that TLS needs multiple roundtrips. When testing this with netem on Linux I see this with MySQL 5.7.18 (YaSSL) and a 5ms delay:
No TLS goes from 0.5ms to 52ms
TLS goes from 8ms to 85ms

The second thing to measure is bulk performance. This is for large result sets including mysqldump.

With mysqldump from MySQL 5.7 it is easy to do:

$ time mysqldump --ssl-mode=disabled -A > /dev/null

real 0m0.145s
user 0m0.021s
sys 0m0.005s
$ time mysqldump --ssl-mode=required --ssl-cipher=AES128-SHA -A > /dev/null

real 0m0.120s
user 0m0.039s
sys 0m0.007s 
 
If you do this with multiple ciphers and put some data in the database you'll see something like this:
No TLS
4.5s
TLS Default
10.4s
RC4-MD5
7.1s
DES-CBC3-SHA
23.2s
 This is with MySQL 5.6.33 with YaSSL. Note that this is without using modern CPU features etc.

To conclude, there are some steps you can take to improve SSL/TLS performance:
  1. Upgrade to 5.7
  2. Compile MySQL with OpenSSL
  3. Use TLS tickets
  4. Use persistent connections
  5. Try different cipher suits for mysqldump and other places where you transfer larger amounts of data.

by Daniël van Eeden (noreply@blogger.com) at May 07, 2017 01:28 PM

May 05, 2017

Colin Charles

Speaking in May 2017

It was a big April if you’re in the MySQL ecosystem, so am looking forward to other events that have different focus and a different base, so to speak. See you at:

  • rootconf – May 11-12 2017 – Bangalore, India. My first Rootconf was last year, and it was a great event; I look forward to going there again this year, to talk about capacity planning for your databases. If you register with this link you get a 10% discount.
  • Open Source Data Center Conference – May 16-18 2017 – Berlin, Germany. I’ve enjoyed my trips to OSDC in the last few years, and they’re on their last tickets now – so register if you plan to go!

by Colin Charles at May 05, 2017 08:28 AM

May 04, 2017

MariaDB AB

How to Create Microservices and Set-up a Microservice Architecture with MariaDB, Docker and Go

How to Create Microservices and Set-up a Microservice Architecture with MariaDB, Docker and Go Bjorge Staijen Thu, 05/04/2017 - 17:34

Intro

In this blogpost we’re going to use the photo-sharing application that has been demoed during MariaDB’s M|17 conference. The sources of the application are available for download on Github. The purpose of this blog post is to demonstrate using a MariaDB database server in your stack when building microservice applications. During this post we’re going to talk about what microservices are, the architecture of the photo-sharing application, bootstrapping the application, and scaling parts of the application. The code demoed in this blogpost could be used as a starting template for building your own microservices. The code is tested against Docker version 17.03, Docker Machine version 0.10.0 and VirtualBox version 5.1.12, and the application has been created and tested on a Mac. Let’s start with talking about what microservices are.

What are Microservices

Creating applications with microservices is an approach that attempts to avoid the problems often found in monolithic applications. It is an approach that makes web-based development more agile and code bases easier to understand. Lots of companies claim to use the microservice approach, but they all have their own definitions and best practises. While there is no mutual agreement on what microservices really are, there are some similarities on how most companies are using microservices. With the microservice approach, developers create the application as a suite of small independent services, each independently deployable, running on a unique process, and communicate with each other using public API’s. The main benefits from this approach are resiliency of services, ease of deployment and independent scaling.

High-Level Architecture of the Application

The photo-sharing application has been created utilizing the microservices architecture. This means that the business logic is divided into multiple smaller services, each responsible for a certain set of tasks and responsible for their own data. The services can communicate by using each other’s API.

We divided the application into the following services: profile-service, photo-service, vote-service, comment-service, authentication-service and web-server. The web-server also functions as Proxy and API gateway. The services are split into five separate services for a couple of reasons. We wanted to make the services small, and the first step in to making them small was by creating a service for each domain. In our application we identified four domains: profiles, votes, comments and photos. This resulted in the first four services: profile, vote, comment and photo-service. The authentication-service is not in this list and was created to separate authentication functionality from profile management. This solution shows its value once authentication traffic starts growing, the container orchestrator can then scale the authentication-service independently. Also the authentication-service can easily be rewritten or replaced when the decision is made to use a third party authentication standard, without touching the profile service.

See figure 1 for a visualization of how the services are placed and how they are connected to each other. Note that not all services are interchanging data. For example the authentication-service doesn’t need to communicate with services like photo-service or vote-service.

 

image4.png

Figure 1. Top level domain of the photo sharing application.

Downloading and starting the application

Now we know a little bit about the architecture of the application it is time to download the code and to bootstrap the application. First we’re going to get the application from Github and after that we’re using Docker Swarm and Docker Stacks to bootstrap the application.

From this point you’ll need a terminal.

Preparing our environment

The photo-sharing application is available on Github and can be downloaded here. After downloading, the first thing we need to do is set-up our environment on top of which the application is running. The sources you’ve just downloaded contain a /scripts directory, and within that directory the create_machines.sh script.

The script is responsible for two things, which are the creation of 5 virtual machines and to configure Docker Swarm. If you can’t run this script on your machine then please follow the instructions in the Docker documentation to create a Swarm on your own machine. In this blogpost we use the following machine names:

  • manager1
  • worker1
  • worker2
  • worker3
  • worker4

If you can run a bash script on your machine then run the following command to setup your environment:

./create_machines.sh

After the machines are created we’re going to deploy a cluster visualizer. For that we need the IP of the cluster manager. Use the following command to retrieve the IP address of the manager:

docker-machine ip manager1

Now replace ‘YOUR_IP’ with the IP from above, and run the following command to deploy the visualizer:

docker-machine ssh manager1 docker run -it -d -p 8080:8080 -e HOST=YOUR_IP -e PORT=8080 -v /var/run/docker.sock:/var/run/docker.sock manomarks/visualizer

After we’ve deployed the visualizer we’re going to deploy an etcd instance on the manager-machine for the service discovery for our database cluster. Again, use the IP address of the manager and then run the following command (replace YOUR_IP in 4 locations):

docker-machine ssh manager1 docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
--name etcd quay.io/coreos/etcd etcd \
-name etcd0 \
-advertise-client-urls http://YOUR_IP:2379,http://YOUR_IP:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
-initial-advertise-peer-urls http://YOUR_IP:2380 \
-listen-peer-urls http://0.0.0.0:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster etcd0=http://YOUR_IP:2380 \
-initial-cluster-state new

We need to verify that our machines are properly working, and to do that we are going to access the visualizer in our webbrowser.

Use the manager’s IP with port 8080 to access the visualizer, e.g. 192.168.99.100:8080. See figure 2 to see what the visualizer looks like.

image3.png

Figure 2. The manomarks/visualizer without any running services.

Starting the stack

To direct the terminal to the Docker Engine of the Swarm-manager we used the command:

eval $(docker-machine env manager1)

Now the first thing that we need to do is we have to pass the location of the etcd service to our database configuration. To do that we need to edit the docker-compose-stacks.yml file and add the IP and port to the “DISCOVERY_SERVICE” environment variable of the database. Get the IP of the manager with:

docker-machine ip manager1

Use the IP and add it to configuration on line 126. Change it so it looks something like this example:

- "DISCOVERY_SERVICE=192.168.99.100:2379

Now we can deploy our application. Before executing the next command, make sure that you’re in the root folder of the photo-sharing application. Run the following command to start the application:

docker stack deploy --compose-file docker-compose-stacks.yml demo

This command deploys our application and distributes the services across our virtual machines. Deploying of the services might take a few minutes, depending on the speed of your internet connection.

Using the photo-sharing application

To check if the photo-sharing application is running you simply have to go to the IP of any of the machines and use the port of the web application, port 4999. We can use any machine because all nodes in Docker Swarm are participating in an ingress routing mesh. To get the IP of the machine use the command:

docker-machine ip manager1

In our case the web application can be access by using the following address: 192.168.99.100:4999.

image5.png

Figure 3. The User Interface of the photo-sharing application.

Now the application is running you can start using it. The application is a photo-sharing and photo-voting application. An anonymous user can visit the website via the browser and can toggle between three different timelines of photos. The first one is a timeline of uploaded photos (ordered by last uploaded), second one is a timeline of hot photos (based on upvotes), and the last one is a timeline of trending photos. The difference between hot photos and trending photos is that the hot photos timeline is showing photos only from the current day; which means that every day there’s a different hot timeline.

People who visit the website can create a free account. An account gives the user the privilege to upload photos, upvote or downvote a photo and leave a comment on photos. If a visitor does not create an account he or she can still see the pictures but they cannot upload a photo, leave a comment or vote.

The photo-sharing application is written in Go and to connect the application with the MariaDB database we use the by the community developed database driver: go-sql-driver/mysql.

Scaling the application

The next thing we want to do is scale our application. See figure 4 to see the current state of the cluster.

image1.png

Figure 4. Cluster overview before the database is scaled.

To scale our application we use docker service scale, with docker service scale you can scale any service you want. For the purpose of demonstration we’re going to scale the database. Scale the database to three instances with the following command:

docker service scale demo_db=3

After scaling we have a three node Galera Cluster. The database containers takes care of setting up the replication. See figure 5 to see the current state of the cluster.

image2.png

Figure 5. Cluster overview after the database is scaled.

The traffic will automatically be spread over all the instances. The swarm manager uses internal load balancing to distribute requests among other services. Load balancing is done based upon DNS name of the service.

Conclusion

In this blogpost we’ve downloaded the photo-sharing application, created a cluster of machines, run the application on the cluster and we ended with scaling the application. The purpose of this blogpost was to merely demonstrate a way of using MariaDB Galera Cluster with microservices. It was a lot to cover for a blogpost and there is still much more to consider before all of this is ready for production. Many areas have not been touched during this blogpost. Things that you should think about before running production are (and not limited to): security, automatic failover, storing data, backups, disaster recovery, using an API gateway, improve overall code, data integrity, data aggregation and service boundaries.

Feel free to ask questions or provide feedback on this blogpost in the comments. Or if you have any suggestions for my next blog post; I’m happy to learn about that as well.

Related Topics

  1. Martin Fowler's article on Microservices
  2. Chris Richardson's website about Microservices
  3. Christian Posta's blogpost about "The Hardest Part About Microservice: Your Data"
  4. MariaDB's Knowledge Base article on Installing and using MariaDB via Docker
  5. MariaDB's Knowledge Base article on MariaDB Galera Cluster

In this blogpost we’re going to use the photo-sharing application that has been demoed during MariaDB’s M|17 conference. The sources of the application are available for download on Github. The purpose of this blog post is to demonstrate using a MariaDB database server in your stack when building microservice applications. During this post we’re going to talk about what microservices are, the architecture of the photo-sharing application, bootstrapping the application, and scaling parts of the application. The code demoed in this blogpost could be used as a starting template for building your own microservices.

Login or Register to post comments

by Bjorge Staijen at May 04, 2017 09:34 PM