Planet MariaDB

December 03, 2016

Valeriy Kravchuk

MySQL Support Engineer's Chronicles, Issue #4

This week I had to deal with some unusual problems. But let me start with Percona's xtrabackup, software that I consider a key component of many current production MySQL setups and use regularly. Recently new minor versions of XtraBackup were released; check the details on 2.4.5, for example. It made a step towards supporting MariaDB 10.2, but there is still a long way to go; see this pull request #200.

My main problem with xtrabackup, though, is not the lack of support for MariaDB 10.2-specific features. Why should they care, after all... The problem is that old, well-known bugs and problems are not resolved, ones that may affect all MySQL versions, forks and environments. Check lp:1272329, "innobackupex dies if a directory is not readable. i.e.: lost+found", for example. Why not read and take into account the ignore_db_dir option (as the server does) and let those poor souls who use a mount point as a datadir make backups? Check an even older problem, passwords that are not hidden in the command line, see lp:907280, "innobackupex script shows the password in the ps output, when it's passed as a command line argument". My colleague Hartmut even suggested a fix recently, see pull request #289.
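
In the meantime, a common way to work around the password exposure is to keep the credentials in an option file rather than on the command line. A minimal sketch, assuming the tool reads the [xtrabackup] group from an extra defaults file (the file name and account are made up for illustration):

$ cat /etc/mysql/backup.cnf
[xtrabackup]
user=backup
password=backup_secret

$ innobackupex --defaults-extra-file=/etc/mysql/backup.cnf /data/backups/

The file still has to be protected with proper permissions, but at least the password no longer shows up in the ps output.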

Because these old, known problems (some of them low-hanging fruit) are not fixed, users still suffer while using xtrabackup way more often than they would like to. One day, as a result, they'll have to switch to some other online backup tools or approaches. One may dream about myrocks_hotbackup being extended to cover InnoDB one day (when MyRocks and InnoDB work together in one instance), or just use Percona TokuBackup (after adding a script to do SST for Galera with it, maybe), or try something totally different. Anyway, I feel that if more bugs (including the low-hanging fruit) in xtrabackup do not get fixed and pull requests are not actively accepted, the tool may become much less relevant and less used soon.

I had to deal with MaxScale-related issues this week, so I'd like to remind those who use Docker for testing about https://github.com/asosso/maxscale-docker/blob/master/Dockerfile. Personally I prefer to build from source. In any case, I'd like us all to remember that in older versions one may have to set the strip_db_esc option explicitly for a service to deal with database names containing an underscore (_). Recent 2.0.x versions have it enabled by default (see MXS-801).
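
For those stuck on older MaxScale versions, a minimal sketch of what that service definition might look like (the service name, router and credentials are purely illustrative; strip_db_esc is the relevant line):

[Read-Service]
type=service
router=readconnroute
servers=server1
user=maxscale
passwd=maxscale_pw
# strip backslash escapes from database names fetched from the backend grants
strip_db_esc=true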

I also had to explain how online ALTER TABLE works, specifically, when it sets exclusive metadata locks in the process. I still do not see this topic properly explained in the manual, so I had to report Bug #84004, "Manual misses details on MDL locks set and released for online ALTER TABLE".
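
In short: even an online ALTER TABLE takes a brief exclusive metadata lock when it starts and again when it finishes, so a long-running transaction that touches the table can block it (and everything queued behind it). A minimal sketch of being explicit about what you expect (table and index names are hypothetical):

mysql> ALTER TABLE orders ADD INDEX idx_created (created_at), ALGORITHM=INPLACE, LOCK=NONE;

ALGORITHM=INPLACE and LOCK=NONE make the statement fail instead of silently falling back to a blocking table copy, but they do not remove those short exclusive MDL phases at the beginning and end of the operation.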


I have not been a developer for 11 years already, and one should expect Java code from me even less. Anyway, I had to explain how to replace Oracle's ref_cursors (a.k.a. cursor variables) in MySQL, both in stored procedures and in Java code that calls them. If you are wondering what this is about, check this fine manual. Note that this feature is missing in MySQL, even though it was suggested to implement it here. In general, MySQL just allows you to run SELECTs in stored procedures, and then in Java you can process each of the returned result sets any way you want. Things get more complicated when more than one result set is produced, and they are even more complicated in Oracle with nested cursor expressions. So, I plan to devote a separate blog post to this topic one day. Stay tuned.
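
To give a taste of the approach: in MySQL the procedure simply ends with one or more SELECT statements, and each of them reaches the caller as a separate result set that JDBC code can walk with execute() and getMoreResults(). A minimal sketch, with a hypothetical table and procedure:

DELIMITER //
CREATE PROCEDURE get_customer_orders(IN p_customer_id INT)
BEGIN
  -- each SELECT is returned to the caller as its own result set,
  -- playing the role an OUT ref_cursor would play in Oracle
  SELECT order_id, total FROM orders WHERE customer_id = p_customer_id;
  SELECT COUNT(*) AS order_count FROM orders WHERE customer_id = p_customer_id;
END//
DELIMITER ;

CALL get_customer_orders(42);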

I answer questions coming not only from customers. Old friends, community users out of nowhere and, even more so, colleagues are welcome to discuss whatever MySQL- or MariaDB-related problem they may have. If I know how to help, I'll do so; otherwise I'll quickly explain that I am of no use. This is how I ended up quickly testing MariaDB's CONNECT storage engine as a replacement for the Federated engine, that is, to link a table to a remote MySQL table. Basic instructions on how to set it up and use the MYSQL table type looked simple, but when I tried to test it on Fedora 23 I hit a problem of a missing libodbc.so.1:

MariaDB [(none)]> INSTALL SONAME 'ha_connect';
ERROR 1126 (HY000): Can't open shared library '/home/openxs/dbs/mariadb10.1/lib/plugin/ha_connect.so'
  (errno: 2, libodbc.so.1: cannot open shared object file: No such file or directory)
the solution was not really straightforward. First of all I had to install the unixODBC.x86_64 2.3.4-1.fc23 RPM, but it does not provide libodbc.so.1 either:

[openxs@fc23 node2]$ find / -name libodbc.* 2>/dev/null
/usr/lib64/libodbc.so.2
/usr/lib64/libodbc.so
/usr/lib64/libodbc.so.2.0.0
So, I had to apply a quick and dirty hack:




[openxs@fc23 node2]$ sudo ln -s /usr/lib64/libodbc.so.2.0.0  /usr/lib64/libodbc.so.1
As a result the CONNECT engine worked as expected, as long as a proper account and IP address were used:
MariaDB [test]> INSTALL SONAME 'ha_connect';
Query OK, 0 rows affected (0.27 sec)


MariaDB [test]> create table r_t2(id int primary key, c1 int) engine=connect table_type=mysql connection='mysql://msandbox:msandbox@127.0.0.1:23532/test/t';
Query OK, 0 rows affected (0.04 sec)

MariaDB [test]> select * from r_t2;                     
+----+------+
| id | c1   |
+----+------+
|  1 |    2 |
|  2 |    3 |
+----+------+
2 rows in set (0.00 sec)
From configuring MaxScale to work with a database having an underscore in its name, to re-writing Java code that used to work with Oracle RDBMS so that it works with MySQL, with many missing details in the manuals and software bugs identified or reported in between, and all that alongside ongoing studies of performance problems and lost quorums, rolling upgrades and failed SSTs in Galera clusters - this is what support engineers here at MariaDB have to deal with during a typical working week.



by Valeriy Kravchuk (noreply@blogger.com) at December 03, 2016 07:39 PM

December 02, 2016

Peter Zaitsev

Business Continuity and MySQL Backups

This blog post discusses the business continuity plan around MySQL backups, and how organizations should think about them.

During the years I’ve worked in IT, I’ve learned that backups are sometimes a conceptual subject in organizations. Many companies have them, but don’t document the associated business continuity plan for them. I experienced this the hard way many years ago, somewhere around when MySQL 5.0 was still widely used.

In most organizations, there are a couple of business continuity subjects that should be described internally. For example, what are the recovery time objective and the recovery point objective? Let’s go a bit deeper into both concepts:

Recovery Point Objective

A recovery point objective describes the upper limit on how much data, measured in time, can be lost during a major incident, for example recovery after a massive data center failure. One of the questions you should ask before such a situation arises is: up to what point in time is losing information tolerable?

If you have a recovery point objective of over a day, your daily backup routines might cover this. However, if your recovery point objective is more stringent, you might be forced to use additional tools like binary log streaming or incremental backups.

Recovery Time Objective

This second term and concept is also essential in building a business continuity plan. Your environment has to remain active to generate traffic and, potentially, revenue.

What are the requirements promised to your customers? Are there any SLAs agreed with the customer, or is it best effort? If it’s best effort, what would be the tipping point for your users to start using an alternative service from a competitor? These are all factors to consider while determining your RTO.

In Short

If the recovery point objective and recovery time objective are stringent, this might mean additional costs when buying hardware, or perhaps a secondary data center becomes mandatory. However, it’s a cost/value discussion: what makes your company lose revenue, and what is acceptable during a crisis?

Based on your business continuity requirements, you can build your DR plans. Make sure your business continuity requirements drive the DR plan, and not vice versa.

What tools do you have at your disposal to create sensible MySQL backups?

Logical backups

mysqldump. Remember mysqldump, the original tool included with MySQL? The good thing about mysqldump is that you can actually read and even edit the output of the backup before restoring the data, which can prove useful during development work.

mysqldump’s biggest negative is that it’s not scalable, nor fast for backing up large amounts of data. Additionally, restoring data is even slower as you must replay the complete dataset on your new MySQL database servers (rebuild indexes, large IO, etc.).

mysqldump’s advantages include the convenience and flexibility of viewing or even editing the output before restoring. It gives you the ability to clone databases for development, and produce slight variations of an existing database for testing.
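
As a reminder of how simple the logical approach is, here is a minimal sketch of a full dump and restore (the flags and file names are illustrative; --single-transaction gives a consistent snapshot for InnoDB tables):

# take a consistent logical backup, including routines and triggers
$ mysqldump --single-transaction --routines --triggers --all-databases > full_dump.sql
# restore by simply replaying the statements
$ mysql < full_dump.sql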

mydumper. This tool is comparable to mysqldump; however, it works in parallel, which provides significant benefits in backup time and restoration time.
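
A minimal sketch of the parallel equivalent (the thread count and directory are arbitrary; check the exact option names of the mydumper/myloader version you have installed):

# dump with four parallel threads into a directory of per-table files
$ mydumper --threads=4 --outputdir=/backups/dump
# reload the dump, also in parallel
$ myloader --threads=4 --directory=/backups/dump --overwrite-tables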

Binary backups

Binary backups refer to copies made of the entire MySQL dataset. Binary backups are typically faster compared to logical backups, especially on larger datasets. Several tools come to mind in these cases.

Percona XtraBackup. An open source binary backup solution for InnoDB. The good thing about XtraBackup is that it is non-locking when using MySQL with the InnoDB storage engine.
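
A minimal sketch of a full binary backup with XtraBackup and the prepare step it needs before a restore (the paths are illustrative, and the timestamped directory is whatever innobackupex created):

# take the backup; a timestamped directory is created under /data/backups
$ innobackupex --user=backup /data/backups/
# prepare the backup (apply the redo log) so it is consistent and restorable
$ innobackupex --apply-log /data/backups/2016-12-02_12-00-00/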

MySQL Enterprise Backup. An InnoDB hot backup solution that is included in the MySQL Enterprise subscription.

These tools can offer you incremental and daily backups; however, they still don’t bring you point-in-time recovery by themselves. If your recovery point objective is very tight, it might mean that you need to store (back up) your binary logs externally and replay them on your restored database. Keep in mind that this potentially impacts your recovery time objective.
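
A minimal sketch of both halves of that idea, streaming binary logs off the master to a backup host and later replaying them on a restored server up to a chosen point in time (the host name, file names and cut-off timestamp are made up for illustration):

# continuously pull binary logs from the master (available since MySQL 5.6)
$ mysqlbinlog --read-from-remote-server --host=master1 --raw --stop-never mysqld-bin.000001

# after restoring the last full backup, replay the logs up to just before the incident
$ mysqlbinlog --stop-datetime="2016-12-02 10:00:00" mysqld-bin.0000* | mysql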

Delayed Slaves

This concept is not a backup, but this technology might help you to recover your database and limit the recovery time significantly.
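
A delayed replica applies changes only after a configured lag, which gives you a window to stop it before a harmful statement (a DROP TABLE, say) reaches it. A minimal sketch on the replica, using MySQL 5.6+ syntax and an arbitrary one-hour delay:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_DELAY = 3600;  -- stay one hour behind the master
mysql> START SLAVE;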

Conclusion

We’ve discussed having a business continuity requirement list, and some potential tools that might assist you in covering it (at least on the MySQL level). One of the last important items is actual testing. The number of companies that need to recover data and only then notice that their backups are corrupted is way too high.

Make sure your organization tests its backups regularly. Are you sure they work properly? Make sure that you perform regression tests for new code – for example, against a restored copy of the backups.

If you make sure you can trust your backups, you might sleep better at night! ;-)

by Dimitri Vanoverbeke at December 02, 2016 10:30 PM

Make MySQL 8.0 Better Through Better Benchmarking

This blog post discusses how better MySQL 8.0 benchmarks can improve MySQL in general.

Like many in the MySQL community, I’m very excited about what MySQL 8.0 offers. There are a lot of great features and architecture improvements. Also like many in the MySQL community, I would like to see MySQL 8.0 perform better. Better performance is what we always want (and expect) from new database software releases.

Rarely do performance improvements happen by accident – they require running benchmarks, finding bottlenecks and eliminating them. This is the area where I think things could use improvement.

If you come to the MySQL Keynote at Oracle OpenWorld, or if you go to the MySQL Benchmarks Page, you find a very limited set of benchmarks. They mostly focus on sysbench performance, with large numbers of connections and large numbers of cores. I’m not convinced this effort is the best use of our time these days.

Don’t get me wrong: as one of the original designers of sysbench, I think it is a great and simple tool that helps spot many bottlenecks. I still use it to find performance issues. But it is only one tool, and it by no means provides full coverage of real-world MySQL workloads.

I agree with Mark Callaghan (see discussion here): we need to run more benchmarks using a wider set of circumstances, to ensure there are no regressions in new releases. This will help move MySQL performance forward for real users.      

Here are some specific ideas on how I think we could benchmark MySQL 8.0 better:

  1. Focus on production-recommended settings. Way too often we see benchmarks run with the doublewrite buffer disabled, InnoDB checksum disabled and no binary log (like in this benchmark). While they produce some excitingly high numbers, they have little practical value for real workloads. At the very least I would very much like to see separate numbers for the “street legal car” versus one designed to set a record on the salt flats.
  2. Go beyond sysbench. Sysbench focuses on PK-only based access for very simple tables, and does not even do JOINs as part of its workload. I would like to see more benchmarks that have tables with many indexes, using secondary key lookups and joins, involving rows with many fields, and medium and large size blobs that are common bottlenecks. We also need more database features covered. Are foreign keys or triggers getting faster or slower? What about stored procedure execution performance? I would love to see these get covered. Mark Callaghan suggests LinkBench, which I think is a fine benchmark to add, but it shouldn’t be the only one.
  3. Do more with sysbench. Sysbench could get more done and cover more important workloads. Workloads with data fitting in memory and not fitting in memory should be shown separately. Testing performance with large numbers of tables is also very important – many MySQL installations for SaaS applications run tens of thousands of tables (sometimes going into millions). I would also suggest running more injection benchmarks with sysbench, as they are more representative of the real world. An example of such a run is sketched after this list.
  4. Look at latency, not just throughput. The benchmark results we commonly see focus on throughput over a long period of time, without looking at latency and how performance changes over time. Stalls and performance dips are well known in the MySQL space – especially the famous InnoDB checkpointing woes (though this issue has gotten a lot better). There are other cases and circumstances where stalls and slowdowns can happen.
  5. Measure resource consumption. I very much like how Mark Callaghan shows the CPU usage and IO usage per transaction/operation, so we can get a better idea of efficiency.
  6. Concurrency. Recently, the focus has been on very high concurrency in terms of connections and active connections, typically on very big iron (using as many as 72 cores). And as much as this is important to “future-proofing” MySQL as we get more and more cores per socket every year, it should not be the only focus. In fact, it is extremely rare for me to see sustained loads of more than 20-40 “threads running” for well-configured systems. With modern solutions like ProxySQL, you can restrict concurrency to the most optimal levels for your server through multiplexing. Not to mention the thread pool, which is available in MySQL Enterprise, Percona Server and MariaDB. I would like to see a much more focused benchmark at medium-to-low concurrency. The fact that single thread performance has gotten slower in every major MySQL version is not a good thing. As MySQL currently runs a single query in a single thread, it impacts query latencies in many real-world situations.
  7. Virtualization. We need more benchmarks in virtualized environments, as virtualization and the cloud are where most workloads are these days (by number). Yes, big iron and bare metal are where you get the best performance, but it’s not where most users are running MySQL. Whenever you are looking at full blown virtualization or containers, the performance profile can be substantially different from bare metal. Virtualized instances often have smaller CPU cores – getting the best performance with 8-16 virtual cores might be a more relevant data set for many than the performance with 100+ cores.
  8. SSL and encryption. MySQL 5.7 was all about security. We’re supposed to be able to enable SSL easily, but was any work done on making it cheap? The benchmark Ernie Souhrada did a few years back showed a pretty high overhead (in MySQL 5.6). We need more focus on SSL performance, and getting it would allow more people to run MySQL with SSL. I would also love to see more benchmarks with encryption enabled, to understand better how much it costs to have your data encrypted “at rest,” and in what cases.
  9. Protocol X and MySQL Doc Store. These were added after MySQL 5.7 GA, so it would be unfair to complain about the lack of benchmarks comparing the performance of those versus previous versions. But if Protocol X is the future, some benchmarks are in order. It would be great to have official numbers on the amount of overhead using MySQL Doc Store has compared to SQL (especially since we know that queries are converted to SQL for execution).
  10. Replication benchmarks. There are a lot of great replication features in newer MySQL versions: statement/row/mixed formats, GTID or no GTID, a choice of formats for row events, various forms of semi-sync replication, two ways of doing parallel replication, and multi-source replication. Additionally, MySQL group replication is on the way. There seem to be very few comprehensive benchmarks for these features, however. We really need to understand how they scale and perform under various workloads.
  11. Mixed workloads.  Perhaps one of the biggest differences between benchmarks and real production environments is that in benchmarks the same workload often is used over and over, while in the real world there is a mix of “application workloads.” The real world also has additional tasks such as backups, reporting or running “online” ALTER TABLE operations. Practical performance is performance you can count on while also serving these types of background activities. Sometimes you can get a big surprise from the severity of impact from such background activities.
  12. Compression benchmarks. There have been some InnoDB compression benchmarks (both for new and old methods), but they are a completely separate set of benchmarks that are hard to put in context with everything else. For example, do they scale well with high numbers of connections and large numbers of cores?
  13. Long-running benchmarks. A lot of the benchmarks run are rather short. Many of the things that affect performance take time to accumulate: memory fragmentation on the process (or OS Kernel) side, disk fragmentation and database fragmentation. For a database that is expected to run many months without restarting, it would be great to see some benchmark runs that last several days/weeks to check long term stability, or if there is a regression or resource leak.
  14. Complex queries. While MySQL is not an analytical database, it is still possible to run complex queries with JOINs, and the MySQL optimizer team provides constant improvements to the optimizer. It would be quite valuable to see how optimizer improvements affect query execution. We want to see how these improvements affect scalability with hardware and concurrency as well.
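
To make item 3 above a bit more concrete, here is a minimal sketch of the kind of run I have in mind – many tables, a data set larger than memory, moderate concurrency. The Lua script path, table counts and credentials are illustrative and depend on the sysbench version installed (this uses 0.5-style options); a prepare step with the same table options has to run first:

$ sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
    --mysql-user=sbtest --mysql-password=sbtest \
    --oltp-tables-count=1000 --oltp-table-size=1000000 \
    --num-threads=32 --max-time=3600 --max-requests=0 run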

These are just some ideas on what could be done. Of course, there are only so many things the performance engineering team can focus on at a time: one can’t boil the ocean! My main suggestion is this: we have done enough deep optimizing of primary key lookups with sysbench on high concurrency and monster hardware, and it’s time to go wider. This ensures that MySQL doesn’t falter with poor performance on commonly run workloads. Benchmarks like these have much more practical value than beating one million primary key selects a second on a single server.

by Peter Zaitsev at December 02, 2016 07:58 PM

Jean-Jerome Schmidt

Planets9s - Eurofunk replaces Oracle with feature-rich Severalnines ClusterControl

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Eurofunk replaces Oracle with feature-rich Severalnines ClusterControl

This week we’re happy to announce Eurofunk, one of the largest European command centre system specialists, as our latest ClusterControl customer. Severalnines was brought on board to help manage the databases used by European blue light services’ command centres who are responsible for dispatching response teams to emergencies. Severalnines’ ClusterControl was preferred to Oracle because database speed was improved at a fraction of Oracle’s licensing costs.

Read the story

Webinar next Tuesday: How to build a stable MySQL Replication environment

If you'd like to learn how to build a stable environment with MySQL replication, this webinar is for you. From OS and DB configuration checklists to schema changes and disaster recovery, you’ll have the information needed. Join us next Tuesday as Krzysztof Książek, Senior Support Engineer at Severalnines, shares his top 9 tips on how to best build a production-ready MySQL Replication environment.

Sign up for the webinar

How to deploy MySQL & MongoDB clusters in the cloud

This blog post describes how you can easily deploy and monitor your favourite open source databases on AWS and DigitalOcean. NinesControl is a service we recently released, which helps you deploy MySQL Galera and MongoDB clusters in the cloud. As a developer, if you want unified and real-time monitoring of your database and server infrastructure, with access to 100+ collected key database and host metrics and custom dashboards providing insight into your operational and historic performance, then NinesControl is for you :-)

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at December 02, 2016 11:07 AM

December 01, 2016

Peter Zaitsev

Managing Replication with Percona XtraDB Cluster

This blog post discusses managing replication with Percona XtraDB Cluster.

Recently a customer asked me to set up replication between two distinct Percona XtraDB Clusters located in geographically separate data centers. The customer’s goal was to use one of the clusters only in case of disaster recovery. They tried extending the cluster, but because of the WAN latency impact on their writes and the requirement of a node in a third data center for quorum, they walked away from that setup. Since they were not concerned about losing a few transactions in case of a major disaster, they were OK with regular MySQL replication using GTIDs.

Easy enough, right? Both clusters are cloud-based, of course, and the provider can stop/restart any node on short notice. This setup caused some concern for the customer around how to handle replication. Since they don’t have dedicated personnel to monitor replication, or at least handle alerts, they asked if we could find a way to automate the process. So, here we go!

We all try to solve the problems with the tools we know. In my case, I like Pacemaker a lot. So using Pacemaker was my first thought. In a cloud environment, a Pacemaker setup is not easy (wouldn’t that be a cluster in a cluster… a bit heavy). But wait! Percona XtraDB Cluster with Galera replication is already handling quorum, and it provides a means of exchanging information between the nodes. Why not use that?

We can detect quorum status the same way the clustercheck scripts do it. To exchange messages, why don’t we simply write to a table? Galera replication will update the other nodes. I went on and wrote a bash script that is called by cron every minute. The script monitors the node state and the content of the table. If all is right, it updates the table to report its presence (and whether it is acting as a slave or not). The script validates the presence of a slave in the cluster. If no reporting slave is found, the script proceeds to the “election” of a new slave, based on the wsrep_local_index value. Basically, the script is a big bunch of “if” statements. The script is here, and the basic documentation on how to set it up is here.
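
To give an idea of the kind of checks involved, here is a simplified sketch of the approach (not the actual script):

# is this node part of the primary component, and is it synced?
$ mysql -N -e "SHOW STATUS LIKE 'wsrep_cluster_status'"        # expect: Primary
$ mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"   # expect: Synced
# used to rank the nodes when "electing" a new slave
$ mysql -N -e "SHOW STATUS LIKE 'wsrep_local_index'"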

Of course, if it works for one cluster, it can work for two. I have configured my customer’s two Percona XtraDB Clusters in a master-to-master relationship using this script. I ran through a bunch of failure scenario cases. The script survived all of them! But of course, this is new. If you are going to implement this solution, run your own set of tests! If you find any problem, file an issue on GitHub. I’ll be happy to fix it!

by Yves Trudeau at December 01, 2016 10:56 PM

Database Daily Ops Series: GTID Replication and Binary Logs Purge

This blog continues the ongoing series on daily operations and GTID replication.

In this blog, I’m going to investigate why the error below has been appearing in a particular environment I’ve been working with over the last few days:

Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log:
'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the
master has purged binary logs containing GTIDs that the slave requires.'

The error provides the right message, and explains what is going on. But sometimes, it can be a bit tricky to solve this issue: you need additional information, discovered after some tests and reading. We try to keep Managed Services scripted, in the sense that our advice and best practices are repeatable and consistent. However, some additional features and practices can be added depending on the customer's situation.

Some time ago one of our customer’s database servers presented the above message. At that point, we could see the binary log files in a compressed form on the master (gzipped). Of course, MySQL can’t identify a compressed file with a .gz extension as a binary log. We uncompressed the file, but replication presented the same problem – even after uncompressing it and making sure the UUID of the current master and the TRX_ID were there. Obviously, I needed to go and investigate the problem to see what was going on.

After some research, I re-read this passage from the manual:

When the server starts, the global value of gtid_purged, which was called before as gtid_lost, is initialized to the set of GTIDs contained by the Previous_gtid_log_event of the oldest binary log. When a binary log is purged, gtid_purged is re-read from the binary log that has now become the oldest one.

=> https://dev.mysql.com/doc/refman/5.6/en/replication-options-gtids.html#sysvar_gtid_purged

That made me think: if something compresses binlogs on the master without purging them the way the GTID mechanism expects, the server is not going to be able to re-read the existing GTIDs on disk. When the slave replication threads restart, or the DBA issues commands like reset slave and reset master (to clean out the accumulated GTID sets in Executed_Gtid_Set from the SHOW SLAVE STATUS command, for example), this error can occur. But if I compress the file:

  • Will the slave get lost and not find all the needed GTIDs on the master after a reset slave/reset master?
  • If I purge the logs correctly, using PURGE BINARY LOGS, will the slave be OK when restarting replication threads?

Test 1: Compressing the oldest binary log file on master, restarting slave threads

I would like to test this very methodically. We’ll create one GTID per binary log, and then I will compress the oldest binary log file in order to make it unavailable for the slaves. I’m working with three virtual machines, one master and two slaves. On the second slave, I’m going to run the following sequence: stop slave, reset slave, reset master, start slave, and then, check the results. Let’s see what happens.

On master (tool01):

tool01 [(none)]:> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| mysqld-bin.000001 |       341 |
| mysqld-bin.000002 |       381 |
| mysqld-bin.000003 |       333 |
+-------------------+-----------+
3 rows in set (0.00 sec)
tool01 [(none)]:> show binlog events in 'mysqld-bin.000001';
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name          | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysqld-bin.000001 |   4 | Format_desc    |         1 |         120 | Server ver: 5.6.32-log, Binlog ver: 4                             |
| mysqld-bin.000001 | 120 | Previous_gtids |         1 |         151 |                                                                   |
| mysqld-bin.000001 | 151 | Gtid           |         1 |         199 | SET @@SESSION.GTID_NEXT= '4fbe2d57-5843-11e6-9268-0800274fb806:1' |
| mysqld-bin.000001 | 199 | Query          |         1 |         293 | create database wb01                                              |
| mysqld-bin.000001 | 293 | Rotate         |         1 |         341 | mysqld-bin.000002;pos=4                                           |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
5 rows in set (0.00 sec)
tool01 [(none)]:> show binlog events in 'mysqld-bin.000002';
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name          | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysqld-bin.000002 |   4 | Format_desc    |         1 |         120 | Server ver: 5.6.32-log, Binlog ver: 4                             |
| mysqld-bin.000002 | 120 | Previous_gtids |         1 |         191 | 4fbe2d57-5843-11e6-9268-0800274fb806:1                            |
| mysqld-bin.000002 | 191 | Gtid           |         1 |         239 | SET @@SESSION.GTID_NEXT= '4fbe2d57-5843-11e6-9268-0800274fb806:2' |
| mysqld-bin.000002 | 239 | Query          |         1 |         333 | create database wb02                                              |
| mysqld-bin.000002 | 333 | Rotate         |         1 |         381 | mysqld-bin.000003;pos=4                                           |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
5 rows in set (0.00 sec)
tool01 [(none)]:> show binlog events in 'mysqld-bin.000003';
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name          | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysqld-bin.000003 |   4 | Format_desc    |         1 |         120 | Server ver: 5.6.32-log, Binlog ver: 4                             |
| mysqld-bin.000003 | 120 | Previous_gtids |         1 |         191 | 4fbe2d57-5843-11e6-9268-0800274fb806:1-2                          |
| mysqld-bin.000003 | 191 | Gtid           |         1 |         239 | SET @@SESSION.GTID_NEXT= '4fbe2d57-5843-11e6-9268-0800274fb806:3' |
| mysqld-bin.000003 | 239 | Query          |         1 |         333 | create database wb03                                              |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
4 rows in set (0.00 sec)

Here we see that each existing binary log file has just one transaction. That will make it easier to compress the oldest binary log, and so make part of the existing GTIDs disappear. When the slave connects to a master, it first sends its Executed_Gtid_Set, and then the master sends all the missing IDs to the slave. As Stephane Combaudon said, we will force it to happen! Both slave database servers are currently in the same position:

tool02 [(none)]:> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.10
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000003
          Read_Master_Log_Pos: 333
               Relay_Log_File: mysqld-relay-bin.000006
                Relay_Log_Pos: 545
        Relay_Master_Log_File: mysqld-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
            ...
           Retrieved_Gtid_Set: 4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            
tool03 [(none)]:> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.10
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000003
          Read_Master_Log_Pos: 333
               Relay_Log_File: mysqld-relay-bin.000008
                Relay_Log_Pos: 451
        Relay_Master_Log_File: mysqld-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
           Retrieved_Gtid_Set: 4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 4fbe2d57-5843-11e6-9268-0800274fb806:1-3

Now, we’ll compress the oldest binary log on master:

[root@tool01 mysql]# ls -lh | grep mysqld-bin.
-rw-rw---- 1 mysql mysql  262 Nov 11 13:55 mysqld-bin.000001.gz #: this is the file containing 4fbe2d57-5843-11e6-9268-0800274fb806:1
-rw-rw---- 1 mysql mysql  381 Nov 11 13:55 mysqld-bin.000002
-rw-rw---- 1 mysql mysql  333 Nov 11 13:55 mysqld-bin.000003
-rw-rw---- 1 mysql mysql   60 Nov 11 13:55 mysqld-bin.index

On tool03, which is the database server we will use for this test, we will reload replication:

tool03 [(none)]:> stop slave; reset slave; reset master; start slave;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.03 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.02 sec)
tool03 [(none)]:> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.0.10
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 4
        Relay_Master_Log_File:
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 151
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 4fbe2d57-5843-11e6-9268-0800274fb806
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 161111 14:47:13
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 1
1 row in set (0.00 sec)

Bingo! We broke the replication stream on the slave. Now we know that the missing GTID on the master was due to the compressed file, and couldn't be passed along to the connecting slave during their negotiation. Additionally, @@GTID_PURGED was not reloaded as the online manual says it should be. The test is done, and we confirmed the theory (if you have additional comments, enter them at the end of the blog).

Test 2: Purge the oldest file on master and reload replication on slave

Let’s make it as straightforward as possible. The purge can be done manually using the PURGE BINARY LOGS command, which does it the proper way: the binary log index file is part of the purge operation as well (the file name entry is removed from the index together with the log file on disk). I’m going to execute the same steps as before, but purge the file with the mentioned command instead of compressing it.

tool01 [(none)]:> show master logs;
+-------------------+-----------+
| Log_name | File_size |
+-------------------+-----------+
| mysqld-bin.000001 | 341 |
| mysqld-bin.000002 | 381 |
| mysqld-bin.000003 | 333 |
+-------------------+-----------+
3 rows in set (0.00 sec)
tool01 [(none)]:> purge binary logs to 'mysqld-bin.000002';
Query OK, 0 rows affected (0.01 sec)
tool01 [(none)]:> show master logs;
+-------------------+-----------+
| Log_name | File_size |
+-------------------+-----------+
| mysqld-bin.000002 | 381 |
| mysqld-bin.000003 | 333 |
+-------------------+-----------+
2 rows in set (0.00 sec)

Now, we’ll execute the commands to check how it goes:

tool03 [(none)]:> stop slave; reset slave; reset master; start slave;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.02 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.02 sec)
tool03 [(none)]:> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.0.10
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 4
        Relay_Master_Log_File:
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 151
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 4fbe2d57-5843-11e6-9268-0800274fb806
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 161111 16:35:02
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 1
1 row in set (0.00 sec)

The GTID in the purged file is needed by the slave. In both cases, we can set @@GTID_PURGED as below to the transaction that we know was purged, and move forward with replication:

tool03 [(none)]:> stop slave; set global gtid_purged='4fbe2d57-5843-11e6-9268-0800274fb806:1';
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
tool03 [(none)]:> start slave;
Query OK, 0 rows affected (0.01 sec)

The above sets @@GTID_PURGED so that the slave only requests GTIDs that still exist on the master: we mark everything up to and including the purged transaction as already purged, which makes the slave start replication from the oldest existing GTID. In our scenario above, the replica restarts replication from 4fbe2d57-5843-11e6-9268-0800274fb806:2, which lives in binary log file mysqld-bin.000002. Replication is fixed, as its threads can resume processing the data stream coming from the master.

You will need to execute additional checksum and sync steps for the set of transactions that were skipped when setting a new value for @@GTID_PURGED. If replication continues to break after restarting, I advise you to rebuild the slave (possibly the subject of a future blog).
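
For that checksum-and-sync step, the Percona Toolkit is the usual choice; a minimal sketch, run against the master (the host name and the default percona.checksums table are assumptions):

# checksum all tables through replication
$ pt-table-checksum --replicate=percona.checksums h=tool01
# print (or, with --execute, apply) the statements needed to bring the slave back in sync
$ pt-table-sync --replicate percona.checksums h=tool01 --print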

Good explanations about this can be found in the bugs below, reported by the Facebook guys and by Laurynas Biveinis, the Percona Server lead (who clarified the issue):

  • MySQL Bugs: #72635: Data inconsistencies when master has truncated binary log with GTID after crash;
  • MySQL Bugs: #73032: Setting gtid_purged may break auto_position and thus slaves;

Conclusion

Be careful when purging binary logs or doing anything with them manually, because @@GTID_PURGED needs to be updated automatically when binary logs are purged. It seems that only happens when expire_logs_days is set to purge binary logs. Yet you need to be careful when trusting this variable: it doesn’t consider fractions of days, and depending on the number of writes on a database server, disks can fill up in minutes. This blog showed that even housekeeping scripts and the PURGE BINARY LOGS command were able to make it happen.

by Wagner Bianchi at December 01, 2016 05:43 PM

Colin Charles

Debian and MariaDB Server

GNU/Linux distributions matter, and Debian is one of the most popular ones out there in terms of user base. It's an interesting time as MariaDB Server becomes more divergent compared to upstream MySQL, and people go about choosing default providers of the database.

MariaDB Server's original goal was to be a drop-in replacement. In fact this is how it's described (“It is an enhanced, drop-in replacement for MySQL”). We all know that it's becoming increasingly hard for that line to be used these days.

Anyhow, in March 2016 Debian's release team made the decision that going forward, MariaDB Server is what people using Debian Stretch get when they ask for MySQL (i.e. MariaDB Server is the default provider for an application that requires the use of port 3306 and provides a MySQL-like protocol).

All this has brought some interesting bug reports and discussions, so here’s a collection of links that interest me (with decisions that will affect Debian users going forward).

Connectors

MariaDB Server

by Colin Charles at December 01, 2016 08:13 AM

November 30, 2016

Peter Zaitsev

Galera Cache (gcache) is finally recoverable on restart

This post describes how to recover Galera Cache (or gcache) on restart.

Recently Codership introduced (with Galera 3.19) a very important and long awaited feature. Now users can recover Galera cache on restart.

Need

If you gracefully shut down cluster nodes one after another, with some lag time between nodes, then the last node to shut down holds the latest data. Next time you restart the cluster, the last node shut down will be the first one to boot. Any follow-up nodes that join the cluster after the first node will demand an SST.

Why SST, when these nodes already have data and only a few write-sets are missing? The DONOR node caches missing write-sets in the Galera cache, but on restart this cache is wiped clean and started fresh. So the DONOR node doesn’t have a Galera cache from which to donate the missing write-sets.

This painful setup made it necessary for users to think and plan before gracefully taking down the cluster. With the introduction of this new feature, the user can retain the Galera cache.

How does this help ?

On restart, the node will revive the galera-cache. This means the node can act as a DONOR and service missing write-sets (facilitating IST instead of SST). Retaining the galera-cache is controlled by an option named gcache.recover=yes/no. The default is no (the Galera cache is not retained). The user can set this option for all nodes, or for selected nodes, based on disk usage.
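
Since it is a Galera provider option, a minimal sketch of enabling it in my.cnf would look like this (keep any other provider options you already have on the same line; gcache.size below is just an example of such a neighbour):

[mysqld]
wsrep_provider_options="gcache.size=1G;gcache.recover=yes"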

gcache.recover in action

The example below demonstrates how to use this option:

  • Let’s say the user has a three node cluster (n1, n2, n3), with all in sync.
  • The user gracefully shutdown n2 and n3.
  • n1 is still up and running, and processes some workload, so now n1 has latest data.
  • n1 is eventually shutdown.
  • Now the user decides to restart the cluster. Obviously, the user needs to start n1 first, followed by n2/n3.
  • n1 boots up, forming an new cluster.
  • n2 boots up, joins the cluster, finds there are missing write-sets and demands IST but given that n1 doesn’t have a gcache, it falls back to SST.

n2 (JOINER node log):

2016-11-18 13:11:06 3277 [Note] WSREP: State transfer required:
 Group state: 839028c7-ad61-11e6-9055-fe766a1886c3:4680
 Local state: 839028c7-ad61-11e6-9055-fe766a1886c3:3893

n1 (DONOR node log), gcache.recover=no:

2016-11-18 13:11:06 3245 [Note] WSREP: IST request: 839028c7-ad61-11e6-9055-fe766a1886c3:3893-4680|tcp://192.168.1.3:5031
2016-11-18 13:11:06 3245 [Note] WSREP: IST first seqno 3894 not found from cache, falling back to SST

Now let’s re-execute this scenario with gcache.recover=yes.

n2 (JOINER node log):

2016-11-18 13:24:38 4603 [Note] WSREP: State transfer required:
 Group state: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:1495
 Local state: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:769
....
2016-11-18 13:24:41 4603 [Note] WSREP: Receiving IST: 726 writesets, seqnos 769-1495
....
2016-11-18 13:24:49 4603 [Note] WSREP: IST received: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:1495

n1 (DONOR node log):

2016-11-18 13:24:38 4573 [Note] WSREP: IST request: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:769-1495|tcp://192.168.1.3:5031
2016-11-18 13:24:38 4573 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

You can also validate this by checking the lowest write-set available in gcache on the DONOR node.

mysql> show status like 'wsrep_local_cached_downto';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| wsrep_local_cached_downto | 1 |
+---------------------------+-------+
1 row in set (0.00 sec)

So as you can see, gcache.recover could restore the cache on restart and help service IST over SST. This is a major resource saver for most of those graceful shutdowns.

gcache revive doesn’t work if...

It doesn’t work if gcache pages are involved. Gcache pages are still removed on shutdown, and the gcache write-sets up to that point also get cleared.

Again, let’s see an example:

  • Let’s assume the same configuration and workflow as mentioned above. We will just change the workload pattern.
  • n1, n2, n3 are in sync and an average-size workload is executed, such that the write-set fits in the gcache. (seqno=1-x)
  • n2 and n3 are shutdown.
  • n1 continues to operate and executes some average size workload followed by a huge transaction that results in the creation of a gcache page. (1-x-a-b-c-h) [h represents the transaction seqno]
  • Now n1 is shut down. During shutdown, gcache pages are purged (irrespective of the keep_page_sizes setting).
  • The purge ensures that all the write-sets that have a seqno smaller than the gcache-page-residing write-set are purged, too. This effectively means everything in (1-h) is removed, including (a,b,c).
  • On restart, even though n1 can revive the gcache, there is nothing to revive, as all the write-sets were purged.
  • When n2 boots up, it requests IST, but n1 can’t service the missing write-set (a,b,c,h). This causes SST to take place.

Summing it up

Needless to say, gcache.recover is a much-needed feature, given that it saves SST pain. (Thanks, Codership.) It would be good to see if the feature can be optimized to work with gcache pages.

And yes, Percona XtraDB Cluster inherits this feature in its upcoming release.

by Krunal Bauskar at November 30, 2016 10:38 PM

Using the InnoDB Buffer Pool Pre-Load Feature in MySQL 5.7

In this blog post, I’ll discuss how to use the InnoDB buffer pool pre-load feature in MySQL 5.7.

Starting with MySQL 5.6, you can configure MySQL to save the contents of your InnoDB buffer pool and load it on startup. Starting with MySQL 5.7, this is the default behavior. Without any special effort, MySQL saves and restores a portion of the buffer pool in the default configuration. We made a similar feature available in Percona Server 5.5 – so the concept has been around for quite a while.

Frankly, time has reduced the need for this feature. Five years ago, we would typically store databases on spinning disks. These disks often took quite a long time to warm up with normal database workloads, which could lead to many hours of poor performance after a restart. With the rise of SSDs, warm up happens faster and reduces the penalty from not having data in the buffer pool. Typically, a system reaches 90% of its fully warmed up performance in 10 minutes or less. But since it takes virtually no effort to use, saving the contents of the InnoDB buffer pool is a great feature to enable by default.

This blog post looks into some issues with this feature that might not be totally obvious from its name or documentation.

#1 

By default, MySQL only saves 25% of the most actively accessed pages (by the LRU) in the InnoDB buffer pool (not the whole buffer pool).

This is a reasonable choice for many use cases: it saves the most valuable pages, which can then be loaded faster than if you try to load every page in the buffer pool (many of which might not be relevant for continuing workload).

You can change this number by setting the innodb_buffer_pool_dump_pct variable. If you’re using InnoDB essentially as an in-memory database, and want to ensure all data is memory resident and can be accessed without a disk read, set it to 100.

Note that this variable is based on the actual amount of data present in memory, not the buffer pool size. For example, if you have a 100GB buffer pool but it only contains 10GB of data, by default only 25% of the 10GB (2.5GB) gets saved. (As the manual explains, it will not take nearly as much space on disk, as only the page identifiers are stored, not the full page contents.)
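
A minimal sketch of changing it at runtime and making the change persistent across restarts:

mysql> SET GLOBAL innodb_buffer_pool_dump_pct = 100;

# and in my.cnf, so it survives a restart
[mysqld]
innodb_buffer_pool_dump_pct = 100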

#2

MySQL starts and becomes accessible through the network before the buffer pool load on startup is complete. Right after the start, a lot of resources are used to fetch the buffer pool contents from disk as quickly as possible, possibly affecting performance. If you have multiple MySQL nodes – like using MySQL Replication or running Percona XtraDB Cluster – you might consider bringing them back for production traffic only after the buffer pool load operation completes. You can monitor the buffer pool load progress by watching the GLOBAL STATUS variable:
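
The check itself is a one-liner (the variable is the same one shown in the excerpts below):

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_load_status';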

Buffer pool load is in progress:

| Innodb_buffer_pool_load_status          | Loaded 403457/419487 pages         |

Buffer pool load is complete:

| Innodb_buffer_pool_load_status          | Buffer pool(s) load completed at 161123  9:18:57 |

As a side note, it would be great if MySQL would provide a clearer concept of the “State” of the node: being UP versus being READY to serve the traffic in an optimal way are often not the same.

#3

InnoDB’s buffer pool preload is not very efficient, at least with fast storage. In my test environment, with rather capable NVMe storage, I get a warmup rate of more than 400MB/sec if I run a read-only sysbench workload. InnoDB’s buffer pool preload warmup rate is around 100MB/sec or so. I would guess the problem is that it doesn’t drive as many parallel IO requests as SSD storage needs to perform optimally. I did not investigate it further.

#4

InnoDB buffer pool save/restore only stores the buffer pool contents on a clean shutdown. If the server crashes, MySQL still does a buffer pool preload, but with the content information saved at the last clean shutdown (stored in the ib_buffer_pool file). This might end up wasting time loading data that is not relevant for the current workload. Periodically running the following ensures a fresh set of pages is available for a quick warmup, even if MySQL crashed:

SET GLOBAL innodb_buffer_pool_dump_now=ON;

This preserves the current list of buffer pool pages.

Note that while you (hopefully) do not see your MySQL crash that often, the same issue exists with backups, MySQL slave cloning with Percona XtraBackup, or LVM snapshots. This causes these operations to be less efficient.

I hope the observations in this blog help you put this feature to better use!

by Peter Zaitsev at November 30, 2016 09:16 PM

Jean-Jerome Schmidt

We’ve answered Eurofunk’s database SOS call

Eurofunk replaces Oracle with feature-rich Severalnines ClusterControl

Today we’re happy to announce Eurofunk, one of the largest European command centre system specialists, as our latest customer. Severalnines was brought on board to help manage the databases used by European blue light services’ command centres who are responsible for dispatching response teams to emergencies. Eurofunk also provides command centres for well-known car manufacturers.

Eurofunk began operations in 1969 as a sole trader with a focus on consumer electronics and radio technology. It evolved into a crucial component of the emergency services in Europe, responsible for planning, implementing and operating command centres.

To provide efficient blue light services, it is crucial for Eurofunk to have an IT infrastructure which is highly available and fast. Unreliability and slow performance is unforgivable in a sector relying so heavily on speed of execution and directness of action.

Severalnines’ ClusterControl was preferred to Oracle because database speed was improved at a fraction of Oracle’s licensing costs. Eurofunk also experienced database downtime caused by prolonged fail-over times of their Oracle databases. With ClusterControl, it was possible to easily deploy an active/active cluster to reduce downtime scenarios. Galera Cluster for MySQL was chosen as a back-end database replication technology; Severalnines provided the platform to deploy, monitor and manage the back-end cluster and associated database load balancers, along with full enterprise support for the operations team.

Severalnines also helped Eurofunk improve end user experience for dispatchers working in the control centres. Rolling updates to the database layer is possible so emergency services have continuous access to up-to-date information to work with.

Stefan Rehlegger, System Architect, Eurofunk, said, “It’s been hard to find a unified feature-rich database cluster management system in today’s market but we’ve found one that has proved invaluable to our projects. With Severalnines’ help we’ve been able to deploy a centralised system across Europe and we’re planning to expand our usage of ClusterControl to other territories. The deployment via a web interface without any background knowledge of database clustering helps us make services available on a 24h basis more easily. Severalnines also provided great support during systems implementation; it is the database management life-saver for a fast-paced business like ours.”

Vinay Joosery, Severalnines CEO, added, “As an outsider who has watched too many TV shows, working in emergency response looks like the coolest thing to do. In reality the pressure command and control centres are under must be unbearable and to do their work effectively, they need the freshest information on accidents and emergencies. I’m happy to see Severalnines’ technology markedly improve the performance of their systems. Eurofunk keeps people safe and if we can keep their database safe and available, it means they can continue doing the great work they do.”

About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skill levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. The company has enabled over 8,000 deployments to date via its popular ClusterControl product, and currently counts BT, Orange, Cisco, CNRS, Technicolor, AVG, Ping Identity and Paytrail among its customers. Severalnines is a private company headquartered in Stockholm, Sweden with offices in Singapore and Tokyo, Japan. To see who is using Severalnines today, visit http://www.severalnines.com/about-us/company.

by Severalnines at November 30, 2016 07:40 AM

November 29, 2016

Peter Zaitsev

Percona XtraBackup 2.4.5 is now available


Percona announces the GA release of Percona XtraBackup 2.4.5 on November 29th, 2016. You can download it from our download site and from apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New features:
  • Percona XtraBackup now supports SHA256 passwords. Using the SHA256 algorithm requires either an SSL-encrypted connection or public key encryption for password exchange, which is only available when both client and server are linked with OpenSSL.
  • Percona XtraBackup now supports Command Options for Secure Connections.
  • NOTE: Due to xbcrypt format changes, backups encrypted with this Percona XtraBackup version will not be recoverable by older versions.
Bugs fixed:
  • Percona XtraBackup would crash while preparing the backup, during the shutdown, when the master thread was performing a checkpoint and the purge thread was expecting that all other threads had completed or were idle. Bug fixed #1618555.
  • The safe slave backup algorithm used too short a delay between retries, which could cause backups to fail on a busy server. Bug fixed #1624473.
  • Percona XtraBackup didn’t check the log block checksums. Bug fixed #1633448.
  • Fixed new compilation warnings with GCC 6. Bug fixed #1641612.
  • xbcrypt was not setting the Initialization Vector (IV) correctly (and thus it was not using an IV). This was causing the same ciphertext to be generated across different runs (for the same message/same key). The IV provides the extra randomness to ensure that the same ciphertext is not generated across runs. Bug fixed #1643949.
  • target-dir was no longer relative to the current directory but to datadir instead. Bug fixed #1611568.
  • A backup would still succeed even if xtrabackup failed to write the metadata. Bug fixed #1623210.
  • xbcloud now supports EMC ECS Swift API Authorization requests. Bugs fixed #1638017 and #1638020 (Txomin Barturen).
  • Some older versions of MySQL did not bother to initialize the page type field for pages which are not index pages (see upstream #76262 for more information). Having this page type uninitialized could cause xtrabackup to crash on prepare. Bug fixed #1641426.
  • Percona XtraBackup would fail to backup MariaDB 10.2 with the unsupported server version error message. Bug fixed #1602842.

Other bugs fixed: #1639764, #1639767, #1641596, and #1641601.

Release notes with all the bugfixes for Percona XtraBackup 2.4.5 are available in our online documentation. Please report any bugs to the launchpad bug tracker.

by Hrvoje Matijakovic at November 29, 2016 06:26 PM

Percona XtraBackup 2.3.6 is now available


Percona announces the release of Percona XtraBackup 2.3.6 on November 29, 2016. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release is the current GA (Generally Available) stable release in the 2.3 series.

New Features
  • Percona XtraBackup now supports SHA256 passwords. Using the SHA256 algorithm requires either an SSL-encrypted connection or public key encryption for password exchange, which is only available when both client and server are linked with OpenSSL.
  • Percona XtraBackup now supports Command Options for Secure Connections.
  • NOTE: Due to xbcrypt format changes, backups encrypted with this Percona XtraBackup version will not be recoverable by older versions.
Bugs Fixed:
  • Fixed intermittent assertion failures that were happening when Percona XtraBackup couldn’t correctly identify the server version. Bug fixed #1568009.
  • The safe slave backup algorithm used too short a delay between retries, which could cause backups to fail on a busy server. Bug fixed #1624473.
  • Fixed new compilation warnings with GCC 6. Bug fixed #1641612.
  • xbcrypt was not setting the Initialization Vector (IV) correctly (and thus it was not using an IV). This was causing the same ciphertext to be generated across different runs (for the same message/same key). The IV provides the extra randomness to ensure that the same ciphertext is not generated across runs. Bug fixed #1643949.
  • A backup would still succeed even if xtrabackup failed to write the metadata. Bug fixed #1623210.
  • xbcloud now supports EMC ECS Swift API Authorization requests. Bugs fixed #1638017 and #1638020 (Txomin Barturen).
  • Percona XtraBackup would fail to backup MariaDB 10.2 with the unsupported server version error message. Bug fixed #1602842.

Other bugs fixed: #1639764 and #1639767.

Release notes with all the bugfixes for Percona XtraBackup 2.3.6 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

by Hrvoje Matijakovic at November 29, 2016 05:55 PM

Jean-Jerome Schmidt

How to build a stable MySQL Replication environment

While straightforward to deploy, a production-ready and hence stable MySQL Replication setup requires a bit of planning and preparation. What does a solid replication configuration look like? What do you do when a topology is broken, and replication will not restart? How do you ensure performance? Or perform schema changes?

If you'd like to learn how to build a stable environment with MySQL replication, this webinar is for you. From OS and DB configuration checklists to schema changes and disaster recovery, you’ll have the information you need.

Join us next Tuesday as Krzysztof Książek, Senior Support Engineer at Severalnines, shares his top 9 tips on how to best build a production-ready MySQL Replication environment.

Top 9 Tips for building a stable MySQL Replication environment

Tuesday, December 6th

Sign up for the webinar

We look forward to “seeing” you there!

Agenda

  1. Sanity checks before migrating into MySQL replication setup
  2. Operating system configuration
  3. Replication
  4. Backup
  5. Provisioning
  6. Performance
  7. Schema changes
  8. Reporting
  9. Disaster recovery

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

by Severalnines at November 29, 2016 08:07 AM

November 28, 2016

Peter Zaitsev

Percona Server for MongoDB 3.0.14-1.9 is now available


Percona announces the release of Percona Server for MongoDB 3.0.14-1.9 on November 28, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.0.14-1.9 is an enhanced, open source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.0 protocol and drivers. Based on MongoDB 3.0.14, it extends MongoDB with MongoRocks and PerconaFT storage engines, as well as features like external authentication and audit logging. Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: PerconaFT has been deprecated and will be removed in the future.

This release includes all changes from MongoDB 3.0.13 and MongoDB 3.0.14. We implemented no additional fixes or features.

You can find the release notes in the official documentation.

by Alexey Zhebel at November 28, 2016 06:18 PM

Percona Server 5.7.16-10 is now available


Percona announces the GA release of Percona Server 5.7.16-10 on November 28, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.7.16, including all the bug fixes in it, Percona Server 5.7.16-10 is the current GA release in the Percona Server 5.7 series. Percona Server is completely open source and free. Find release details in the 5.7.16-10 milestone at Launchpad.

Deprecated Features:
  • The metrics for scalability measurement feature is now deprecated. Users who have installed this plugin but are not using its capability are advised to uninstall the plugin due to known crashing bugs.
Bugs Fixed
  • When a stored routine called an administrative command such as OPTIMIZE TABLE, ANALYZE TABLE, ALTER TABLE, CREATE/DROP INDEX, etc., the effective value of log_slow_sp_statements was overwritten by the value of log_slow_admin_statements. Bug fixed #719368.
  • The server wouldn’t start after a crash with innodb_force_recovery set to 6 if a parallel doublewrite file existed. Bug fixed #1629879.
  • Thread Pool “thread limit reached” and “failed to create thread” messages are now printed on the first occurrence as well. Bug fixed #1636500.
  • INFORMATION_SCHEMA.TABLE_STATISTICS and INFORMATION_SCHEMA.INDEX_STATISTICS tables were not correctly updated for TokuDB. Bug fixed #1629448.

Other bugs fixed: #1633061, #1633430, and #1635184.

The release notes for Percona Server 5.7.16-10 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at November 28, 2016 05:58 PM

Percona Server 5.6.34-79.1 is now available


Percona announces the release of Percona Server 5.6.34-79.1 on November 28, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.6.34, including all the bug fixes in it, Percona Server 5.6.34-79.1 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.34-79.1 milestone on Launchpad.

Deprecated features:
  • The metrics for scalability measurement feature is now deprecated. Users who have installed this plugin but are not using its capability are advised to uninstall the plugin due to known crashing bugs.
Bugs fixed:
  • When a stored routine called an administrative command such as OPTIMIZE TABLE, ANALYZE TABLE, ALTER TABLE, CREATE/DROP INDEX, etc., the effective value of log_slow_sp_statements was overwritten by the value of log_slow_admin_statements. Bug fixed #719368.
  • Thread Pool “thread limit reached” and “failed to create thread” messages are now printed on the first occurrence as well. Bug fixed #1636500.
  • INFORMATION_SCHEMA.TABLE_STATISTICS and INFORMATION_SCHEMA.INDEX_STATISTICS tables were not correctly updated for TokuDB. Bug fixed #1629448.

Other bugs fixed: #1633061, #1633430, and #1635184.

Release notes for Percona Server 5.6.34-79.1 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at November 28, 2016 05:21 PM

November 27, 2016

Valeriy Kravchuk

Upstart Basics for Automatic Restarts of MaxScale After Crash

Historically I have not cared much about MariaDB's MaxScale, at least since I learned how to build it from source when needed. But as a support engineer who works at MariaDB, I sometimes have to deal with MaxScale-related problems, and this week I had to explain how to implement automatic restarts of the MaxScale "daemon" after a crash on RHEL 6.x.

In the process I found out that two of the Linux distributions I use most often, CentOS 6.x and Ubuntu 14.04, actually use Upstart, so good old System V init tricks and scripts work there only partially, and only because somebody cared to integrate them into this "new" approach to starting tasks and services during boot, stopping them during shutdown and supervising them while the system is running. I should have studied this years ago, but a customer's question finally forced me to check some details of how this system actually works.

So, unfortunately, there is no script like mysqld_safe to start and restart MaxScale after installing the official RPM from MariaDB (in this case it was maxscale-2.0.1-2.x86_64). My first idea was to write one, but then I asked myself why it is not there yet, decided to check what's already provided, and took a closer look at the /etc/init.d/maxscale I have on my CentOS 6.8 VM (the closest to what the customer used). It's a typical classic shell init script, and it starts MaxScale like this:

start() {
    echo -n $"Starting MaxScale: "
...
    ulimit -HSn 65535
    daemon --pidfile $MAXSCALE_PIDFILE /usr/bin/maxscale --user=maxscale $MAXSCALE_OPTIONS >& /dev/null

    RETVAL=$?
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$servicename
...
    # Return right code
    if [ $RETVAL -ne 0 ]; then
       failure
       RETVAL=$_RETVAL_NOT_RUNNING
    fi

    echo

    return $RETVAL
}
Basically, it runs /usr/bin/maxscale --user=maxscale; the rest are details (like the location of the PID file that the rest of the script relies on). There is nothing that monitors the status of the process with this PID (like mysqld_safe does) or cares about automatic restarts. You are supposed to just execute chkconfig maxscale on, and then the service starts when the system enters the proper runlevel.
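
In other words, the only supported flow on such a system is something like this (a sketch; it assumes the init script shipped in the RPM implements the usual start/status actions):

# enable the init script and start/check the service the SysV way
chkconfig maxscale on
service maxscale start
service maxscale status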

A simple test shows that when the maxscale process is killed, it's gone and is not restarted. In the good old days I'd just add something like this:

mx:2345:respawn:/usr/bin/maxscale --user=maxscale

to /etc/inittab (as nice articles like this one suggest), but a quick check and then further reading proved that it's not going to work on CentOS 6.8, as it uses Upstart.

So, either I had to write something similar to mysqld_safe for MaxScale, or (keeping in mind the size of that script and the number of bugs we have had in it in the past) I'd better find out the supposed way to respawn processes in Upstart. The basic ideas are, again, simple. One has to create an /etc/init/service_name.conf file and put something like this there (a real quote from one of the Ubuntu files for MySQL):
start on runlevel [2345]
stop on starting rc RUNLEVEL=[016]

respawn
respawn limit 2 5
...
pre-start script
...
end script

exec /usr/sbin/mysqld

post-start script
...
end script

The file is easy to understand even without reading the manual. One has to define when the service starts and stops, add the respawn stanza if we want to restart it after unexpected crashes or kills of the process, optionally limit the number of restarts and the interval between them, and, optionally, do something before and after the start.

I quickly created /etc/init/maxscale.conf based on the above, and it did a great job of starting MaxScale automatically upon system startup. I basically just used exec /usr/bin/maxscale --user=maxscale and decided to deal with options and other details later if needed. But what looked strange from the very beginning is that in /var/log/messages I saw what looked like repeated, failed attempts to start the maxscale process:

Nov 26 17:44:45 centos maxscale[20229]: MaxScale started with 1 server threads.
Nov 26 17:44:45 centos init: maxscale main process ended, respawning
Nov 26 17:44:45 centos maxscale[20229]: Started MaxScale log flusher.
Nov 26 17:44:45 centos maxscale[20235]: Working directory: /var/log/maxscale
Nov 26 17:44:45 centos maxscale[20235]: MariaDB MaxScale 2.0.1 started
Nov 26 17:44:45 centos maxscale[20235]: MaxScale is running in process 20235
...
Nov 26 17:44:45 centos maxscale[20235]: Loaded module qc_sqlite: V1.0.0 from /usr/lib64/maxscale/libqc_sqlite.so
Nov 26 17:44:45 centos maxscale[20235]: MaxScale is already running. Process id: 20229. Use another location for the PID file to run multiple instances of MaxScale on the same machine.
Nov 26 17:44:45 centos init: maxscale main process (20234) terminated with status 4
Nov 26 17:44:45 centos init: maxscale main process ended, respawning
Moreover, when the proper maxscale process was killed, it was NOT respawned as expected.

It was the proper time to read the manual more carefully, specifically the part about the expect stanza (which I had noticed in some of the official Upstart scripts):

"To allow Upstart to determine the final process ID for a job, it needs to know how many times that process will call fork(2). Upstart itself cannot know the answer to this question since once a daemon is running, it could then fork a number of "worker" processes which could themselves fork any number of times. Upstart cannot be expected to know which PID is the "master" in this case, considering it does not know if worker processes will be created at all, let alone how many times, or how many times the process will fork initially. As such, it is necessary to tell Upstart which PID is the "master" or parent PID. This is achieved using the expect stanza.
The syntax is simple, but you do need to know how many times your service forks."
Let's quickly check how many times fork() is called in maxscale (I'd know this already if I had ever cared to study the source code in detail, but I have not checked most of it yet). A test based on that cookbook gives unexpected results:

[root@centos ~]# strace -o /tmp/strace.log -fFv maxscale &
[1] 2369
[root@centos ~]# sleep 10
[root@centos ~]# ps aux | grep strace
root      2369  2.6  0.0   4476   888 pts/0    S    20:28   0:00 strace -o /tmp/strace.log -fFv maxscale
root      2382  0.0  0.0 103312   868 pts/0    S+   20:28   0:00 grep strace
[root@centos ~]# pkill -9 strace
[1]+  Killed                  strace -o /tmp/strace.log -fFv maxscale
[root@centos ~]# ps aux | grep maxscale
root      2375  1.3  0.2 276168  3896 ?        Ssl  20:28   0:00 maxscale
root      2385  0.0  0.0 103312   868 pts/0    S+   20:28   0:00 grep maxscale
[root@centos ~]# egrep "\<(fork|clone)\>\(" /tmp/strace.log | wc | awk '{print $1}'
5
How come we have 5 fork calls? Here they are:
[root@centos ~]# egrep "\<(fork|clone)\>\(" /tmp/strace.log
2374  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb024c08ab0) = 2375
2375  clone(child_stack=0x7fb01f819f10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb01f81a9d0, tls=0x7fb01f81a700, child_tidptr=0x7fb01f81a9d0) = 2376
2375  clone(child_stack=0x7fb01e118f10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb01e1199d0, tls=0x7fb01e119700, child_tidptr=0x7fb01e1199d0) = 2377
2375  clone(child_stack=0x7fb01d10af10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb01d10b9d0, tls=0x7fb01d10b700, child_tidptr=0x7fb01d10b9d0) = 2378
2375  clone(child_stack=0x7fb017ffef10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb017fff9d0, tls=0x7fb017fff700, child_tidptr=0x7fb017fff9d0) = 2379
It seems the process with PID 2374 actually forked just once, to produce PID 2375, and then several threads were started by that process. We can see them:
[root@centos ~]# ps -T -p `pidof maxscale`
  PID  SPID TTY          TIME CMD
 2375  2375 ?        00:00:00 maxscale
 2375  2376 ?        00:00:00 maxscale
 2375  2377 ?        00:00:00 maxscale
 2375  2378 ?        00:00:00 maxscale
 2375  2379 ?        00:00:00 maxscale
[root@centos ~]#
So, it was really one fork() (and I could have noticed that earlier just from studying /var/log/messages), and I had to add the expect fork stanza to my Upstart configuration file:

[root@centos ~]# cat /etc/init/maxscale.conf
# MaxScale service

description "MaxScale"

start on stopped rc RUNLEVEL=[2345]
stop on starting rc runlevel [!2345]

respawn
respawn limit 2 5

expect fork

exec /usr/bin/maxscale --user=maxscale
This way it works as expected, as one may easily check:

[root@centos ~]# initctl status maxscale
maxscale start/running, process 6600
[root@centos ~]# maxadmin
MaxScale> show servers
Server 0x19e47a0 (server1)
        Server:                              127.0.0.1
        Status:                              Master, Running
        Protocol:                            MySQLBackend
        Port:                                3306
        Server Version:                      5.7.15-9-log
        Node Id:                             1
        Master Id:                           -1
        Slave Ids:
        Repl Depth:                          0
        Number of connections:               0
        Current no. of conns:                0
        Current no. of operations:           0
MaxScale> quit
[root@centos ~]# kill -9 6600
[root@centos ~]# ps aux | grep maxscale
maxscale  6627  2.0  0.2 276168  3884 ?        Ssl  20:41   0:00 /usr/bin/maxscale --user=maxscale
root      6633  0.0  0.0 103312   872 pts/0    S+   20:41   0:00 grep maxscale
[root@centos ~]# initctl status maxscale
maxscale start/running, process 6627

In /var/log/messages we clearly see that the process is respawned by init:

...
Nov 26 20:38:15 centos maxscale[6600]: Started MaxScale log flusher.
Nov 26 20:40:33 centos maxscale[6600]: Loaded module MaxAdminAuth: V2.0.0 from /usr/lib64/maxscale/libMaxAdminAuth.so
Nov 26 20:41:52 centos init: maxscale main process (6600) killed by KILL signal
Nov 26 20:41:52 centos init: maxscale main process ended, respawning
Nov 26 20:41:52 centos maxscale[6627]: Working directory: /var/log/maxscale
Nov 26 20:42:07 centos maxscale[6627]: Loaded module MaxAdminAuth: V2.0.0 from /usr/lib64/maxscale/libMaxAdminAuth.so
...
After a few more checks I asked for this to be implemented officially in the packages for the Linux distributions that use Upstart, see MXS-1027.

To summarize, I wish I had cared to find out how Upstart works a long time ago. Now it's probably time to study systemd :) Anyway, after some reading and testing one can use Upstart efficiently to provide automated service starts and restarts for MySQL server and the services used with it.

by Valeriy Kravchuk (noreply@blogger.com) at November 27, 2016 07:04 PM

November 25, 2016

Jean-Jerome Schmidt

Planets9s - Top 9 Tips for MySQL Replication, MongoDB Sharding & NinesControl

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

New webinar December 6th on Top 9 Tips to manage MySQL Replication

Join our new webinar during which Krzysztof Książek, Senior Support Engineer at Severalnines, will share his top 9 tips on how to build a production-ready MySQL Replication environment. From OS and DB configuration checklists to schema changes and disaster recovery, you’ll have the tips you need for a production-ready replication setup.

Sign up for the webinar

Sign up for NinesControl for MySQL & MongoDB in the cloud

Built on the capabilities of ClusterControl, NinesControl is a database management cloud service, with no need to install anything. It enables developers and admins to uniformly and transparently deploy and manage polyglot databases on any cloud, with no vendor lock-in. If you haven’t tested NinesControl yet, do check it out - it’s free :-)

Try NinesControl

Become a MongoDB DBA: Sharding ins- and outs - part 2

Having recently discussed how to enable sharding on a MongoDB database and define the shard key on a collection, as well as explained the theory behind all this, we now focus on the monitoring and management aspects of sharding. Just as any database requires management, shards need to be looked after. Some of the monitoring and management aspects of sharding, like backups, are different from those of ordinary MongoDB replica sets, and some operations may lead to scaling or rebalancing the cluster. Find out more in this new blog post.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at November 25, 2016 01:04 PM

November 23, 2016

Peter Zaitsev

Percona Server for MySQL 5.5.53-38.5 is now available


Percona announces the release of Percona Server for MySQL 5.5.53-38.5 on November 23, 2016. Based on MySQL 5.5.53, including all the bug fixes in it, Percona Server for MySQL 5.5.53-38.5 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. You can find release details in the 5.5.53-38.5 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

The metrics for scalability measurement feature is now built by default but deprecated. Users who have installed this plugin but are not using its capability are advised to uninstall the plugin due to known crashing bugs. This feature was accidentally removed instead of deprecated in the previous release, which could cause issues for users that had this feature enabled.

Find the release notes for Percona Server for MySQL 5.5.53-38.5 in our online documentation. Report bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at November 23, 2016 06:59 PM

Shlomi Noach

Discussing online schema migrations with Oracle's MySQL engineering managers

Last week I had the pleasant opportunity of introducing and discussing the operation of online schema migrations to MySQL's engineering managers, as part of their annual meeting, in London.

Together with Simon J. Mudd of Booking.com, we discussed our perception of what it takes to run online schema migrations on a live, busy system.

While the Oracle/MySQL engineers develop new features or optimize behavior in the MySQL server, we in the industry have the operational expertise and understanding of the flow of working with MySQL. In all topics, and in schema migration in particular, there is a gap between what's perceived to be the use case and what the use case actually is. It is the community's task to provide feedback to Oracle so as to align development with operational needs where possible.

Our meeting included the following:

Need for schema migrations

We presented, based on our experience in current and past companies, and based on the experience of our friends in the community, the case for online schema migrations. At GitHub, at Booking.com and in many other companies I'm familiar with, we continuously deploy to production, and this implies continuous schema migrations to our production databases. We have migrations running daily; sometimes multiple per day, sometimes none.

With continuous deployment, we as Guardians of the Database do not wish to be blockers for the development cycle. On the contrary, we want to be out of the way as soon as possible, other than verifying a requested migration is safe. We wish to be able to deliver a migration at any given time.

Not all companies behave this way; some run a weekly aggregation of migrations. Others yet still use the Thou Shalt Not Pass DBA model. We tried to depict the various approaches with strong emphasis on our own approach, which is the most demanding of schema migration solutions.

The MySQL ALTER

We proceeded to discuss the in-house ALTER statement & InnoDB online DDL, and pointed out the limitations those impose on "online" operations, to the effect of rendering these solutions unused by many. The serialization in the replication stream means losing serving capacity and getting lagging replicas. The lack of an escape path means committing to hours' worth of uninterruptible operation. The lack of resource control means performance is degraded throughout the operation.

We briefly touched on TokuDB's ALTER and how it works.

Replication solutions

We discussed migrating via replication: running migrations on one or more replicas at a time, finally failing over onto a promoted replica once all replicas are updated.

We know this solution to be in use in companies such as DropBox, Etsy and others. We illustrated our own reasoning for not using this solution:

  • Increased clock-time for running a migration: running a one-replica-at-a-time or few-replicas-at-a-time process can double, triple, quadruple and so forth the overall migration time.
  • Concurrent migration complexity: since the runtime increases, so does the likelihood of needing to run an additional migration at the same time, which highly complicates the flow in a one-at-a-time or few-at-a-time model.
  • Serving capacity: in this model some, or up to half, of the servers are non-operational. Serving capacity is reduced and we need more hardware to support that.
  • Failover: the failover is not smooth; it includes either some outage or some blocked time, and in any case is noticeable in production. Having a planned failover once in a while is OK, but having a failover multiple times a day is too much of a hassle in our current setup.
  • Topology complexity: our topologies always have some special cases, such as cross-DC replication with reduced cross-DC network traffic via intermediate masters, testing replicas with newer versions, developer-dedicated servers and others, that make shuffling replicas around difficult to automate.

We have not discussed Galera's Rolling Schema Upgrades as we personally do not have experience working with it. It solves the failover issue above, but given a "normal" replication tree under the cluster, the same problems as above apply.

We concluded with our personal take, that like everything else, we just like to write stuff directly onto our masters, and let the natural replication flow deal with it and get our entire topology to be consistent.

Existing trigger based migrations

We drilled down into the algorithms behind pt-online-schema-change and Facebook's OSC (the latter being rewritten today, not yet released as open source). We elaborated on the pains we saw in trigger-based migrations: being unsuspendable, causing lock spaghetti, impacting write latency on the master to the point of a standstill on busy servers, and being untestable.

gh-ost

I presented gh-ost, our own, triggerless take on schema migrations. I discussed the logic behind gh-ost and how it decouples migration load from production load; the low impact the triggerless migration has on the master and on the entire replication chain, leading to low, subsecond replication lags throughout the migration and eliminating locking contention on the master. Basically, the presentation Tom Krouper and I gave at Percona Live Amsterdam.

Want to Have

We followed up by a list of feature requests we could enjoy. These were largely technical issues gh-ost would benefit from, simplifying its behavior or ensuring its correctness in complex cases. We discussed dropping tables at end of migration, getting more info in the binary logs, GTID issues and more.

Acknowledgements

Thank you to Morgan Tocker for officially inviting us to this gathering. There were quite a few familiar faces in the room, and it was a friendly gathering. Thank you to all the engineering managers with whom we met!

The discussion was lively, friendly and receptive. The Oracle engineers laid out the internals of the online DDL; some of their thoughts on the potential of the JSON format; gave advice on technical issues presented. I'd like to thank them for listening to our take on the subject. There was a discussion on the possible paths Oracle can take to improve online schema operations, and I'd like to thank Oracle for sharing their own thoughts and advice!

by shlomi at November 23, 2016 01:23 PM

November 22, 2016

Peter Zaitsev

Webinar Q/A: MySQL High Availability with Percona XtraDB Cluster 5.7


In this blog post I will provide answers to the questions and queries that some of you raised during the webinar on November 17th.

I would like to say thank you to all of the audience who attended the talk on November 17, 2016. You can also check the recording and slides here.

Q. How is storage distribution done across the node?

A. Each node has independent storage and other resources. There is no sharing of resources; only the write-sets are replicated.

Q. If write-set propagation fails in some manner is there any retry mechanism?

A. Write-sets are written to the group channel and the originating node waits for an ack from all the nodes of the cluster. If a node fails to respond, it may lose its cluster membership. Each node needs to consume all write-sets, and in the given order only.

Q. Normally, we point only to one write node, can we point in Percona XtraDB Cluster 5.7 to two writing nodes balanced ? Or should the solution be ProxySQL ?

A. Percona XtraDB Cluster (PXC) is multi-master, so you can execute writes on multiple nodes (this is possible even with 5.6). ProxySQL will help you load-balance your traffic, but the ability to write to any node is inherent to PXC.

Q. Which service does a joining node have to call to get cluster membership? Is there some kind of registry service?

A. There is no special registry service. This is transparent to the end user and is handled as part of the gcomm communication layer.

Q. Would it be possible to get more information about setting up ProxySQL? We are currently using HAProxy but would like a more aware balancer.

A. These articles should help:

Q. Is there a recommended setup for Cluster (white paper)? I did hear about a lot of conflict issues between nodes, so I would like to see if there is a recommended setup.

A. There is no single way to do this, but there are a lot of blogs covering different use cases. The simplest one is a 3-node cluster in a LAN. Conflicts generally happen if users tend to update the same data through multiple nodes, so a disjoint workload distribution will help avoid conflicts. That said, if conflicts are an inherent part of the application or workload, Percona XtraDB Cluster (PXC) is well armed to handle them.

Q. What is the best way to figure out timeouts for geo clusters?

A. Study the latency and ensure that the timeouts are greater than the latency.
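
As a rough illustration only (the actual values have to be derived from your measured round-trip times), the relevant Galera timeouts can be raised through wsrep_provider_options in my.cnf, for example:

# my.cnf sketch for a WAN/geo cluster; the durations below are placeholders,
# chosen so that each timeout comfortably exceeds the observed latency
[mysqld]
wsrep_provider_options="evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M"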

Q. Let's say we are running Percona XtraDB Cluster 5.6 with a 2-node cluster. Can I join a new node running the latest version of Percona XtraDB Cluster 5.7?

A. This scenario is possible, as Percona XtraDB Cluster (PXC) supports rolling upgrades, but a new node demanding SST from a 5.6 node will surely not work. Also, this should be a temporary state with a plan for a full upgrade, not something you want to keep running with.

Q. Currently I am using Percona XtraDB Cluster 5.6 and I mostly face a deadlock situation: when an insert query is running on a big table and Percona tries to sync with another node, at that time no DML query can be executed. I then need to shut down the other node, after which query execution is fine, and then start the nodes again one by one. I even changed many Galera/Percona wsrep_xx configuration settings, but it did not help. Is this kind of issue solved in Percona XtraDB Cluster 5.7?

A. I am not sure I understood the complete setup, but let me try to summarize my understanding. You have DML running on node-1 that is replicating to node-2, and the node-2 workload is trying to touch the same big table that is receiving the replicated write-sets. A local transaction may be aborted, as a replicated transaction always takes priority over a locally running transaction. There shouldn't be a need to shut down any of the nodes. If you still face this problem, you can file a detailed report on Launchpad or the forum and we can discuss what is going wrong.

Q. I need to build a DR platform. Which replication will be suitable for this? Do I need to upgrade to Percona XtraDB Cluster 5.7 at the DR side, or is a replication manager required?

A. For DR you can either use an extended cluster, so that the DR site gets write-sets instantly, or set up a new cluster and enable cluster-to-cluster replication using MySQL MASTER-MASTER async replication (given that DR is one-way, MASTER-SLAVE should also work). You don't need to upgrade, but it is better to use a consistent and updated version on all nodes, especially since a mix of versions in MASTER-SLAVE replication may have compatibility issues.

Q. What are the major differences/benefits between Percona XtraDB Cluster 5.7 and MariaDB Cluster with Galera ?

A. Percona XtraDB Cluster (PXC) 5.7 is GA. MariaDB 10.2 is proposed to be GA by December 2016. Besides this, PXC is fully Percona Server compatible and uses the XtraDB engine, and there are some small functional/usage and stability differences.

Q. How long can a node be out of the cluster and still rejoin by applying write-sets? How is write-set retention managed?

A. The time a node can be offline without needing SST depends on 2 factors: the rate of replicated transactions (including their size) and the size of the galera cache (gcache) that stores these write-sets. If you think you need a longer offline time, then you should size the gcache accordingly.
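
For example, the gcache size is controlled by the gcache.size provider option, set in my.cnf (changing it requires a node restart); the value below is just a placeholder:

# my.cnf sketch: a larger gcache lets a node stay offline longer and still rejoin via IST
[mysqld]
wsrep_provider_options="gcache.size=2G"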

Q. Can we have a sample config file for geo-clusters?

A. We will try to come up with one in due course in an upcoming blog post. In the meantime, you can look at existing posts on the Percona Database Performance blog.

Q. What are the limits for max rows and max transaction size in Percona XtraDB Cluster (PXC) 5.7, especially for batch data loads across multi-region cluster nodes?

A. wsrep_max_ws_rows (default 0: no limit; maximum: 1048576) and wsrep_max_ws_size (default: 2G; range: 1024 bytes to 2G).
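
A quick way to check and, if needed, adjust these limits before a large batch load (both variables are dynamic; the values below are only examples):

SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_%';
-- example only: lift the row limit and allow write-sets up to ~2G for a bulk load window
SET GLOBAL wsrep_max_ws_rows = 0;
SET GLOBAL wsrep_max_ws_size = 2147483647;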

Q: Does Percona XtraDB Cluster (PXC) support MySQL’s GTIDs?

A. Yes, but for Percona XtraDB Cluster (PXC) replication it uses its own GTIDs. This blog post will help clear up the confusion.

Q. How does Percona XtraDB Cluster (PXC) compare to MySQL’s Group Replication?

A. Both are trying to solve the same problem, except that Percona XtraDB Cluster (PXC) is mature and has been on the market for quite some time, while Group Replication is still being built.

Q. Does Percona XtraDB Cluster (PXC) have size limitations? I recently tried to set up a 2TB PXC cluster; however, during load tests there were a few instances where one node got out of sync. The server did a full copy of the data, but could not complete because the load tests kept filling up the gcache.

A. There is no such known limitation. Generally, if a node's receive queue fills up it will emit a FLOW CONTROL signal, and the receive queue is usually small enough not to fill up the gcache. If you still have the log files, you can share them through Launchpad or the forum and we will try to look at them.

Q. How do you perform a major version upgrade? Per MySQL's documentation, you cannot replicate from a major version back to the previous major version, but it is fine to replicate from one major version to the next. So how would you do this in the cluster?

A. As per MySQL, you may face issues if you try to replicate from a lower version master (5.6) to a higher version slave (5.7), but it is not blocked; some of the semantics may be different. Percona XtraDB Cluster (PXC) write-sets are a different matter, though, as they carry binlog events, and this write-set format has not changed in 5.7.

Q. Does Galera set a max number of nodes that can be part of the cluster?

A. No, there is no realistic limitation.

Q. Are there docker images with this configured? Dockerhub or something?

A. This should help.

Q. What is the maximum latency that would be supported on the LAN before you would say that running a Percona XtraDB Cluster is not a good idea?

A. This is configurable via timeouts, so there is no hard recommended latency threshold for a LAN. The lower the latency, the better.

Q. When you start a cluster you bootstrap Node 1, then start Node 2 and Node 3. If you restart Node 1, it will rejoin the cluster but not in a bootstrap state, which does not matter because it joins a live cluster. If my understanding is correct, bootstrap only matters for the first node started. Is that correct? What happens if node 1 is restarted with the bootstrap option: will it force the other nodes to sync against it, or will it join the running cluster?

A. When you start node-1 for the first time, it will create a new cluster, and node-2 and node-3 will join that existing cluster. Depending on how node-1 is restarted, it can either join the existing cluster or create one more, independent cluster. The recommended way is to use a valid value of wsrep_cluster_address on all nodes and pass the extra parameter --wsrep_new_cluster only to the bootstrap node. If you happen to restart this node, avoid passing this parameter and the node will try to join the existing cluster.
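
A minimal sketch of the difference (service names, startup wrappers and the exact option spelling may vary by distribution and PXC version; the cluster address below is a placeholder):

# /etc/my.cnf on ALL nodes: the same cluster address everywhere, e.g.
#   wsrep_cluster_address=gcomm://10.0.0.1,10.0.0.2,10.0.0.3

# very first start of the very first node only: bootstrap a new cluster
mysqld_safe --wsrep-new-cluster &

# any later (re)start of any node, including node-1: start it normally,
# without --wsrep-new-cluster, and it will join the existing cluster
mysqld_safe &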

Q. What is the overhead of running Percona Monitoring and Management (PMM)?

A. Percona Monitoring and Management (PMM) installs an agent on the node to collect a lot of statistics. From the Percona XtraDB Cluster (PXC) perspective it only runs status queries, so it is pretty lightweight for PXC.

Q. Is it easy (is there any procedure) to move from Codership Galera to Percona XtraDB Cluster (PXC)?

A. I don't think there is a blog post about it, but they are fully compatible, so moving should be easy. I will find out if there is a set process for this.

Q. Where is the documentation for Cluster Safe Mode and other new features discussed here?

A. See pxc_strict_mode. For PFS you can check this out. ProxySQL and Percona Monitoring and Management (PMM) have blog posts too.

Q. Are there integrity issues where a client believes a node is up while that node has lost the cluster?

A. No known issue.

Q. Is there any limit on running a huge number of databases? Say, several million?

A. No known issue.

Q. How does the performance of ProxySQL compare with HAProxy?

A. You can check this out.

Q. We use Nagios for monitoring; will a plug-in be added for monitoring the cluster, or will it be only Percona Monitoring and Management (PMM)?

A. Check this out.

Q. “Cross data center replication”. We have two data centers that have a ping latency of 2ms (consistent) and I would like to replicate between the two for DR (disaster recovery) purposes.

A. 2ms latency between 2 DCs with a consistent network sounds pretty good. Just tune the timeouts and things will work.

Q. Do you guys have sample config files for a quick spin-up of a 3-node cluster?

A. This should help.

Q. I see that there are added features like PAM authentication and thread pool, which are given for free in Percona. Can you elaborate on them?

A. Percona XtraDB Cluster (PXC) is Percona Server compatible, so any feature that is present in Percona Server will be part of Percona XtraDB Cluster (PXC).

Q. In the example that you showed you had a 6-node cluster, with 3 nodes in Site A and 3 in Site B. If the WAN link goes down, how does the cluster determine which data set is the master set once the WAN link comes back up after a few hours?

A. In the example I used 2 DCs. The recommendation is to use 3 DCs to avoid split-brain. If you have 6 nodes in 2 DCs and the WAN link goes down, it will create a split-brain situation and no node will accept the workload unless the user sets node weights to form a quorum or re-bootstraps the primary component.
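
If you do end up in a non-primary state, the usual knobs are the Galera provider options pc.weight (to give one side enough weight to keep quorum) and pc.bootstrap (to force a surviving node back into a primary component). A hedged sketch, with example values only:

-- give this node extra weight so that its side of the cluster keeps quorum
SET GLOBAL wsrep_provider_options = 'pc.weight=2';

-- or, after verifying this side has the most advanced data,
-- re-bootstrap the primary component from one surviving node
SET GLOBAL wsrep_provider_options = 'pc.bootstrap=YES';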

I hope I was able to answer most of the questions and queries. If you have follow-up questions, please post them on the forum.

by Krunal Bauskar at November 22, 2016 08:34 PM

Oli Sennhauser

New Features in MySQL and MariaDB

As you probably know MySQL is an Open Source product licensed under the GPL v2. The GPL grants you the right to not just read and understand the code of the product but also to use, modify AND redistribute the code as long as you follow the GPL rules.

This redistribution has happened various times in the past. But in the western hemisphere only 3 of these branches/forks of MySQL are relevant for the majority of MySQL users: Galera Cluster for MySQL, MariaDB (Server and Galera Cluster) and Percona Server (and XtraDB Cluster).

Now what has to happen in nature has happened: the different branches/forks have started to diverge (following the marketing rule: differentiate yourself from your competitors). The biggest and most important divergence is now happening between MySQL and MariaDB.

Recently a customer of FromDual claimed that there is no more progress in MySQL Server development, whereas MariaDB Server is making significant progress. I was wondering a bit how this statement could have been made. So I will try to summarize the new features which have been added since the beginning of the separation, starting with MySQL 5.1.

It is important to know that some parts of the MySQL code were ported to MariaDB directly or in modified form, whereas some MariaDB features were implemented in MySQL as well. So features missing in MariaDB or improvements in MySQL can, sooner or later, also make it into MariaDB and vice versa. Further, both forks profited significantly from old MySQL 6.0 code which was never really announced broadly.

Further to consider: Sun Microsystems acquired MySQL in January 2008 (MySQL 5.1.23 was out then and MySQL 5.2, 5.4 and 6.0 were in the queue) and Sun was acquired by Oracle in January 2010 (MySQL 5.1.43, MySQL 5.5.1 were out, MySQL 5.2, 5.4 and 6.0 were abandoned and MySQL 5.6 was in the queue).

MySQL 5.1 vs. MariaDB 5.1 (link), 5.2 (link) and 5.3 (link)

MySQL 5.1

  • Partitioning
  • Row-based replication
  • Plug-in API
  • Event scheduler.
  • Server log tables.
  • Upgrade program mysql_upgrade.
  • Improvements to INFORMATION_SCHEMA.
  • XML functions with Xpath support.

MariaDB 5.1

  • Storage Engines
    • Aria (Crash-safe MyISAM)
    • XtraDB plug-in (Branch of InnoDB)
    • PBXT (transactional Storage Engine)
    • Federated-X (replacement for Federated).
  • Performance
    • Faster CHECKSUM TABLE.
    • Character Set conversion improvement/elimination.
    • Speed-up of complex queries using Aria SE for temporary tables.
    • Optimizer: Table elimination.
  • Upgrade from MySQL 5.0 improved.
  • Better testing.
  • Microseconds precision in PROCESSLIST.

MariaDB 5.2

  • Storage Engines
    • OQGRAPH (Graph SE)
    • SphinxSE (Full-text search engine)
  • Performance
    • Segmented MyISAM key cache (instances)
    • Group Commit for Aria SE
  • Security
    • Pluggable Authentication
  • Virtual columns
  • Extended user statistics
  • Storage Engine specific CREATE TABLE
  • Enhancements to INFORMATION_SCHEMA.PLUGINS table

MariaDB 5.3

  • Performance
    • Subquery Optimization
      • Semi-join subquery optimizations
      • Non-semi-join optimizations
      • Subquery Cache
      • Subquery is not materialized any more in EXPLAIN
    • Optimization for derived tables and views
      • No early materialization of derived tables
      • Derived Table Merge optimization
      • Derived Table with Keys optimization
      • Fields of mergeable views and derived tables are involved in optimization
    • Disk access optimization
      • Index Condition Pushdown (ICP)
      • Multi-Range-Read optimization (MRR)
    • Join optimizations
      • Block-based Join Algorithms: Block Nested Loop (BNL) for outer joins, Block Hash Joins, Block Index Joins (Batched Key Access (BKA) Joins)
    • Index Merge improvements
  • Replication
    • Group Commit for Binary Log
    • Annotation of row-based replication events with the original SQL statement
    • Checksum for binlog events
    • Enhancements for START TRANSACTION WITH CONSISTENT SNAPSHOT
    • Performance improvement for row-based replication for tables with no primary key
  • Handler Socket Interface included.
  • HANDLER READ works with prepared statements
  • Dynamic Column support for Handler Interface
  • Microsecond support
  • CAST extended
  • Windows performance improvements
  • New status variables
  • Progress reports for some operations
  • Enhanced KILL command
MySQL 5.5 (link) vs. MariaDB 5.5 (link)

MySQL 5.5

  • InnoDB
    • InnoDB Version 5.5
    • Default storage engine switched to InnoDB.
    • InnoDB fast INDEX DROP/CREATE feature added.
    • Multi-core scalability. Focus on InnoDB, especially locking and memory management.
    • Optimizing InnoDB I/O subsystem to more effective use of available I/O capacity.
  • Performance
    • MySQL Thread Pool plug-in (Enterprise)
  • Security
    • MySQL Audit plug-in (Enterprise)
    • MySQL pluggable authentication (Enterprise) for LDAP, Kerberos, PAM and Windows login
  • Replication
    • Semi-synchronous replication.
  • Partitioning
    • 2 new partition types (RANGE COLUMNS, LIST COLUMNS).
    • TRUNCATE PARTITION.
  • Proxy Users
  • Diagnostic improvements to better access execution and performance information, including PERFORMANCE_SCHEMA, expanded SHOW ENGINE INNODB STATUS output and new status variables.
  • Supplementary Unicode characters (utf16, utf32, utf8mb4).
  • CACHE INDEX and LOAD INDEX INTO CACHE for partitioned MyISAM tables.
  • Condition Handling: SIGNAL and RESIGNAL.
  • Introduction of metadata locking to prevent DDL statements from compromising transaction serializability.
  • IPv6 Support
  • XML enhancement LOAD_XML_INFILE.
  • Build chain switched to CMake to ease build on other platforms including Windows.
  • Deprecation and removal of features.

MariaDB 5.5

  • Storage Engines
    • SphinxSE updated to 2.0.4
    • PBXT Storage Engine is deprecated.
  • XtraDB
    • MariaDB uses XtraDB 5.5 as compiled in SE and InnoDB 5.5 as plug-in.
    • Extended Keys support for XtraDB
  • Performance
    • Thread pool plug-in
    • Non-blocking client API Library
  • Replication
    • Updates on P_S tables are not logged to binary log.
    • replicate_* variables are dynamic.
    • Skip_replication option
  • LIMIT ROWS EXAMINED
  • New status variables for features.
  • New plug-in to log SQL level errors.
MySQL 5.6 (link) vs. MariaDB 10.0 (link)

MySQL 5.6

  • InnoDB
    • InnoDB Version 5.6
    • InnoDB full-text search.
    • InnoDB transportable tablespace support
    • Different InnoDB pages size implementation (4k, 8k, 16k)
    • Improvement of InnoDB adaptive flushing algorithm to make I/O more efficient.
    • NoSQL style Memcached API to access InnoDB data.
    • InnoDB optimizer persistent statistics.
    • InnoDB read-only transactions.
    • Separating InnoDB UNDO tablespace from system tablespace.
    • Maximum InnoDB transaction log size increased from 4G to 512G.
    • InnoDB read-only capability for read-only media (CD, DVD, etc.)
    • InnoDB table compression.
    • New InnoDB meta data table in INFORMATION_SCHEMA.
    • InnoDB internal performance enhancements.
    • Better InnoDB deadlock detection algorithm. Deadlocks can be written to the MySQL error log.
    • InnoDB buffer pool state saving and restoring capabilities.
    • InnoDB Monitor can be dynamically disabled/enabled.
    • Online and inplace DDL operations for normal and partitioned InnoDB Tables to reduce application downtime.
  • Optimizer
    • ORDER BY non-index-column for simple queries and subqueries
    • Disk-Sweep Multi-Range Read (MRR) optimization for secondary index/table access to reduce I/O
    • Index Condition Pushdown (ICP) optimization by pushing down the WHERE filter to the storage engine.
    • EXPLAIN also works for DML statements.
    • Optimization of subqueries in derived tables (FROM (...)) by postponing or indexing derived tables.
    • Implementation of semi-join and materialization strategies to optimize subquery execution.
    • Batched Key Access (BKA) join algorithm to improve join performance during table scanning.
    • Optimizer trace capabilities.
  • Performance Schema (P_S)
    • Instrumentation for Statements and stages
    • Configuration of consumers at server startup
    • Summary tables for table and index I/O and for table locks
    • Event filtering by table
    • Various new instrumentation.
  • Security
    • Encrypted authentication credentials
    • Stronger encryption for passwords (SHA-256 authentication plugin)
    • MySQL User password expiration.
    • Password validation plugin to check password strength
    • mysql_install_db can create secure root password by default
    • cleartext password is not written to any log file any more.
    • MySQL Firewall (Enterprise)
  • Replication
    • Transaction based replication using global transaction identifiers (GTID)
    • Row Image Control to reduce binary log volume.
    • Crash-safe replication with checksumming and verifying.
    • IO and SQL thread information can be stored in a transactional table inside the DB.
    • MySQL binlog streaming with mysqlbinlog possible.
    • Delayed replication
    • Parallel replication on schema level.
  • Partitioning
    • Number of partitions including subpartitions increased to 8192.
    • Exchange partition with a normal table.
    • Explicit selection of a specific partition is possible.
    • Partition lock pruning for DML and DDL statements.
  • Condition handling: GET DIAGNOSTICS and SET DIAGNOSTICS
  • Server defaults changes.
  • Data types TIME, DATETIME and TIMESTAMP with microseconds
  • Host cache exposure and connection error status information for finding connection problems.
  • Improvement in GIS functions.
  • Deprecation and removal of features.

MariaDB 10.0

  • Storage Engines
    • Cassandra Storage Engine
    • Connect Storage Engine
    • Sequence Storage Engine
    • Better table discovery (Federated-X)
    • Spider Storage Engine
    • TokuDB Storage Engine
    • Mroonga fulltext search Storage Engine
  • XtraDB
    • XtraDB Version 5.6
    • Async commit checkpoint in XtraDB and InnoDB
    • Support for atomic writes on FusionIO DirectFS
  • Replication
    • Parallel Replication
    • Global Transaction ID (GTID)
    • Multi Source Replication
  • Performance
    • Subquery Optimization (EXISTS to IN)
    • Faster UNIQUE KEY generation
    • Shutdown performance improvement for MyISAM/Aria tables (adjustable hash size)
  • Security
    • Roles
    • MariaDB Audit Plugin
  • Optimizer
    • EXPLAIN for DML Statements
    • Engine independent table statistics
    • Histogram based statistics
    • QUERY_RESPONSE_TIME plugin
    • SHOW EXPLAIN for running connections
    • EXPLAIN in the Slow Query Log
  • Per thread memory usage statistics
  • SHOW PLUGINS SONAME
  • SHUTDOWN command
  • Killing a query by query id not thread id.
  • Return result set of delete rows with DELETE ... RETURNING
  • ALTER TABLE IF (NOT) EXISTS
  • CREATE OR REPLACE TABLE
  • Dynamic columns referenced by name
  • Multiple use locks (GET_LOCK) in one connection
  • Better error messages
  • New regular expressions (PCRE) REGEXP_REPLACE, REGEXP_INSTR, REGEXP_SUBSTR
  • Metadata lock information in INFORMATION_SCHEMA
  • Priority queue optimization visibility
  • FLUSH TABLE ... FOR EXPORT flushes changes to disk for binary copy
  • CURRENT_TIMESTAMP as DEFAULT for DATETIME
  • Various features backported from MySQL 5.6
MySQL 5.7 (link) MariaDB 10.1 (link)
  • InnoDB
    • InnoDB Version 5.7
    • VARCHAR size increase can be in-place in some cases.
    • DDL performance improvements for temporary InnoDB tables (CREATE, DROP, TRUNCATE, ALTER)
    • Active InnoDB temporary table metadata are exposed in table INNODB_TEMP_TABLE_INFO.
    • InnoDB supports spatial data types (GIS, DATA_GEOMETRY)
    • Separate tablespace for temporary InnoDB tables.
    • Support for InnoDB Full-text parser plugins was added.
    • Multiple page cleaner threads were added.
    • Regular and partitioned InnoDB tables can be rebuilt using online in-place DDL commands (OPTIMIZE, ALTER TABLE FORCE)
    • Automatic detection, support and optimization for Fusion-io NVM file system to support atomic writes.
    • Better support for Transportable Tablespaces to ease backup process.
    • InnoDB Buffer Pool size can be configured dynamically.
    • Multi-threaded page cleaner support for shutdown and recovery phase.
    • InnoDB spatial index support for online in place operation (ADD SPATIAL INDEX)
    • InnoDB sorted index builds to improve bulk loads.
    • Identification of modified tablespaces to increase crash recovery performance.
    • InnoDB UNDO log truncation.
    • InnoDB native partition support.
    • InnoDB general tablespace support for databases with a huge amount of tables.
    • InnoDB data at rest encryption for file-per-table tablespaces.
  • Performance
    • EXPLAIN for running connections (EXPLAIN FOR CONNECTION)
    • Finer Control of optimizer hints.
  • Security
    • Old password support has been removed.
    • Automatic password expiry policies.
    • Lock and unlock of accounts.
    • SSL and RSA certificate and key file generation.
    • SSL enabled automatically if available.
    • MySQL will be initialized secure by default (= hardened)
    • STRICT_TRANS_TABLES sql_mode is now enabled by default.
    • ONLY_FULL_GROUP_BY sql_mode made more sophisticated to only prohibit non-deterministic queries.
  • Replication
    • Master dump thread was refactored to improve throughput.
    • Replication Master change without STOP SLAVE.
    • Multi-source replication introduced.
  • Partitioning
    • The HANDLER statement now works on partitioned tables.
    • Index Condition Pushdown (ICP) works for partitioned InnoDB and MyISAM tables.
    • ALTER TABLE EXCHANGE PARTITION WITHOUT VALIDATION is possible to improve the performance of the exchange.
  • Native JSON support
    • Data type JSON.
    • JSON functions: JSON_ARRAY, JSON_MERGE, JSON_OBJECT, JSON_CONTAINS, JSON_CONTAINS_PATH, JSON_EXTRACT, JSON_KEYS, JSON_SEARCH, JSON_APPEND, JSON_ARRAY_APPEND, JSON_ARRAY_INSERT, JSON_INSERT, JSON_QUOTE, JSON_REMOVE, JSON_REPLACE, JSON_SET, JSON_UNQUOTE, JSON_DEPTH, JSON_LENGTH, JSON_TYPE, JSON_VALID
  • System and status variables moved from INFORMATION_SCHEMA to PERFORMANCE_SCHEMA.
  • Sys Schema created by default.
  • Condition handling: GET STACKED DIAGNOSTICS
  • Multiple triggers per event are possible now.
  • Native logging to syslog possible.
  • Generated Column support.
  • Database rewriting in mysqlbinlog.
  • Control+C in mysql client does not exit any more but interrupts query only.
  • New China National Standard GB18030 character set.
  • RENAME INDEX is online inplace without a table copy.
  • Chinese, Japanese and Korean (CJK) full-text parsers implemented (ngram and MeCab full-text parser plugins).
  • Deprecation and removal of features.
  • XtraDB
    • Allow up to 64K pages in InnoDB (old limit was 16K).
    • Defragmentation of InnoDB tablespaces improved: OPTIMIZE TABLE can be used to defragment InnoDB tablespaces.
    • XtraDB page compression
  • Performance
    • Page compression for FusionIO
    • Do not create .frm files for temporary tables.
    • UNION ALL works without usage of a temporary table.
    • Scalability fixes for Power8.
    • Performance improvements on simple queries.
    • Performance Schema tables no longer use .frm files.
    • xid cache scalability was significantly improved.
  • Replication
    • Optimistic mode of in-order parallel replication
    • domain_id based replication filters
    • Enhanced semisync replication: Wait for at least one slave to acknowledge transaction before committing.
    • Triggers can now be run on the slave for row-based events.
    • Dump Thread Enhancements: Makes multiple slave setups faster by allowing concurrent reading of binary log.
    • Throughput improvements in parallel replication.
    • RESET MASTER is extended with TO.
  • Optimizer
    • ANALYZE statement provides output for how many rows were actually read, etc.
    • EXPLAIN FORMAT=JSON
    • ORDER BY optimization is improved.
    • MAX_STATEMENT_TIME can be used to automatically abort long running queries.
  • Security
    • Password validation plug-in API.
    • Simple password check password validation plugin.
    • Cracklib_password_check password validation plugin.
    • Table, Tablespace and Log at-rest encryption (TDE)
    • SET DEFAULT ROLE
    • New columns for the INFORMATION_SCHEMA.APPLICABLE_ROLES table.
  • Galera Cluster plug-in becomes standard in MariaDB.
  • Wsrep information in INFORMATION_SCHEMA: WSREP_MEMBERSHIP and WSREP_STATUS
  • Consistent support for IF EXISTS and IF NOT EXISTS and OR REPLACE for: CREATE DATABASE, CREATE FUNCTION UDF, CREATE ROLE, CREATE SERVER, CREATE USER, CREATE VIEW, DROP ROLE, DROP USER, CREATE EVENT, DROP EVENT, CREATE INDEX, DROP INDEX, CREATE TRIGGER, DROP TRIGGER
  • Information Schema plugins can now support SHOW and FLUSH statements.
  • GET_LOCK() now supports microseconds in the timeout.
  • The number of rows affected by a slow UPDATE or DELETE is now recorded in the slow query log.
  • Anonymous compound statement blocks are supported.
  • SQL standards-compliant behavior when dealing with Primary Keys with Nullable Columns.
  • Automatic discovery of PERFORMANCE_SCHEMA tables.
  • INFORMATION_SCHEMA.SYSTEM_VARIABLES, enforce_storage_engine, default-tmp-storage-engine, mysql56-temporal-format, Slave_skipped_errors, silent-startup
  • New status variables to show the number of grants on different objects.
  • Set variables per statement: SET STATEMENT
  • Support for Spatial Reference systems for the GIS data.
  • More functions from the OGC standard added: ST_Boundary, ST_ConvexHull, ST_IsRing, ST_PointOnSurface, ST_Relate
  • GIS INFORMATION_SCHEMA tables: GEOMETRY_COLUMNS, SPATIAL_REF_SYS
MySQL 8.0 (link) MariaDB 10.2 (link)
  • InnoDB
    • InnoDB Version 8.0
    • AUTO_INCREMENT values are persisted across server restarts.
    • Index corruption and in-memory corruption detection written persistently to the transaction log.
    • InnoDB Memcached plug-in supports multiple get operations.
    • Deadlock detection can be disabled and leads to a lock timeout to increase performance.
    • Index pages cached in buffer pool are listed in INNODB_CACHED_INDEXES.
    • All InnoDB temporary tables are created in InnoDB shared temporary tablespace.
  • JSON
    • Inline path operator ->> added.
    • Column path operator -> improved.
    • JSON aggregation functions JSON_ARRAYAGG() and JSON_OBJECTAGG() added.
  • Security
    • Account management supports roles.
    • Atomicity in user management DDLs.
  • Transactional data dictionary (DD).
  • Common Table Expressions (CTE, recursive SQL, Series creation)
  • Descending Indexes
  • Scaling and performance of INFORMATION_SCHEMA (the 1 million table problem)
  • Deprecation and removal of features.

MySQL 8.0 is currently in a very early stage (DMR) so this list will increase over time!

  • XtraDB
    • XtraDB Version 5.6
  • Security
    • SHOW CREATE USER
    • CREATE USER and ALTER USER extended for limiting resources and TLS/SSL support.
  • Performance
    • Connection creation speed-up by separate thread.
  • Optimizer
    • EXPLAIN FORMAT=JSON improved.
  • Partition
    • Catch-all partition for LIST partitions.
  • Introduction of Window functions: CUME_DIST, DENSE_RANK, NTILE, PERCENT_RANK, RANK, ROW_NUMBER
  • WITH clause for recursive queries.
  • CHECK CONSTRAINT support.
  • Support for DEFAULT with expression.
  • BLOB and TEXT can now have default values.
  • Virtual computed columns restrictions lifted.
  • Supported decimals in DECIMAL increased from 30 to 38.
  • Multiple triggers for the same event.
  • Oracle style EXECUTE IMMEDIATE.
  • PREPARE statement understands most expressions.
  • I_S.USER_VARIABLES introduced as plug-in.
  • New status information: com_alter_user, com_multi, com_show_create_user.
  • New variables: innodb_tmpdir, read_binlog_speed_limit.
  • To come soon
    • MariaDB ColumnStore (formerly InfiniDB)
    • MyRocks?

MariaDB 10.2 is currently in an early stage (beta release) so this list will increase over time...

MySQL 9.0 MariaDB 10.3 (link) and 10.4

No details are known yet. The MySQL developer meeting took place in November 2016.

  • Suggested features
    • Hidden columns
    • Long unique constraints
    • SQL based CREATE AGGREGATE FUNCTION
    • New data types: IPv6, UUID, pluggable data-type API
    • Better support for CJK (Chinese, Japanese, and Korean) languages, including the ngram and MeCab full-text parsers.
    • Improvement of Spider SE.
    • Support for SEQUENCES
    • Additional PL/SQL parser
    • Support for INTERSECT
    • Support for EXCEPT

MariaDB 10.3 is currently in a very early stage so this list will increase over time!


Please let me know if I got something wrong or forgot any significant feature for these two MySQL branches.

by Shinguz at November 22, 2016 02:45 PM

Jean-Jerome Schmidt

Top 9 Tips for building a production-ready MySQL Replication environment

Join us on Tuesday, December 6th, for our last webinar of the year. Krzysztof Książek, Senior Support Engineer at Severalnines, will be sharing his top 9 tips on how to best build a production-ready MySQL Replication environment.

MySQL replication is a well known and proven solution for building distributed setups of databases, and it has gone through a total transformation with version 5.6 and more recently, 5.7. Although straight-forward to deploy, a production-ready setup requires a bit of planning and preparation. What does a good replication configuration look like? How do you ensure performance? What do you do when a topology is broken, and replication will not restart? How to perform schema changes?

So if you'd like to learn what is needed to build a stable environment using MySQL replication, this webinar is for you!

Top 9 Tips for building a stable MySQL Replication environment

Tuesday, December 6th

Sign up for the webinar

We look forward to “seeing” you there!

Agenda

  1. Sanity checks before migrating into MySQL replication setup
  2. Operating system configuration
  3. Replication
  4. Backup
  5. Provisioning
  6. Performance
  7. Schema changes
  8. Reporting
  9. Disaster recovery

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

by Severalnines at November 22, 2016 01:42 PM

November 18, 2016

Peter Zaitsev

Percona Server 5.5.53-38.4 is now available

Percona announces the release of Percona Server 5.5.53-38.4 on November 18, 2016. Based on MySQL 5.5.53, including all the bug fixes in it, Percona Server 5.5.53-38.4 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. You can find release details in the 5.5.53-38.4 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Removed Features:
Bugs Fixed:
  • When a stored routine would call an “administrative” command such as OPTIMIZE TABLE, ANALYZE TABLE, ALTER TABLE, CREATE/DROP INDEX, etc. the effective value of log_slow_sp_statements was overwritten by the value of log_slow_admin_statements. Bug fixed #719368.
  • Thread Pool thread limit reached and failed to create thread messages are now printed on the first occurrence as well. Bug fixed #1636500.

Other bugs fixed: #1612076, #1633061, #1633430, and #1635184.

Find the release notes for Percona Server 5.5.53-38.4 in our online documentation. Report bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at November 18, 2016 09:33 PM

WiredTiger B-Tree versus WiredTiger In-Memory: Q & A

In this blog, I will provide answers to the Q & A for the WiredTiger B-Tree versus WiredTiger In-Memory webinar.

First, I want to thank everybody for attending the October 13 webinar. The recording and slides for the webinar are available here. Below is the list of questions that I wasn’t able to fully answer during the webinar, with responses:

Q: Does the In-Memory storage engine have an oplog? Do we need more RAM if the oplog is set to be bigger?
Q: So we turn off the oplog?
Q: How is data replicated without oplog? Do you confound it with journaling?

A: Percona Memory Engine for MongoDB can be started with or without the oplog, depending on whether it is started as part of a replica set or standalone (you cannot explicitly turn the oplog on or off). But if created, the oplog will be stored in memory as well. You can still control its size with the --oplogSize option.

The recovery log (journal) is disabled for the Percona Memory Engine.
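
For illustration, here is a minimal sketch of starting the engine as a replica set member, which is what makes the in-memory oplog exist at all; the paths, names and sizes below are placeholders rather than values from the webinar:

$ mongod --storageEngine inMemory --replSet rs0 --oplogSize 2048 \
         --dbpath /var/lib/mongodb --port 27017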

Q: After a crash of the In-Memory storage engine, does it need a complete initial sync? Means, cloning all databases?

A: Yes.

Q: WiredTiger reserves 50% of RAM for de-compression. Is this also true for the In-Memory engine?

A: Where did you find this information? Please point to its location in the docs in the comments section below. I asked Percona developers to confirm or deny this for the Percona Memory Engine, and this was their answer:

WT decompresses data block-wise, and each block is of some reasonable size (usual numbers are couple of Megs, let’s say). Decompressor knows the size of uncompressed data by reading this info from compressed block (this info is stored during compression). It creates an extra buffer of uncompressed block size, decompresses data into this buffer, then uses that decompressed buffer and frees the initial one. So there’s no reserve of memory for either compression or decompression, and no docs stating that.

Please note that this comment applies only to block compression, which is only used during disk I/O when WiredTiger reads and writes blocks, thus not available for Percona Memory Engine.

Q: There is no compression of data in this engine?

A: The Percona Memory Engine uses only prefix compression for indexes. Theoretically, it can use other types of compression: dictionary and Huffman (but both are disabled in MongoDB).

Q: With all the data in memory, is there much benefit to having indexes on the data?

A: Yes, because with index access you will read less data. While reading from memory is much faster than from disk, it is still faster to read just a few rows from memory instead of scanning millions.

Q: Our db is 70g. Will we need 70g memory to use Percona In-Memory?
Q: How much memory should be allocated for 70g db size?

A: What storage engine do you use? How do you calculate the size? If this is WiredTiger and you count the space it allocates, the answer is “yes, you need 70G of RAM to use the Percona Memory Engine.”

Q: What is the difference in size of data between WiredTiger on disks versus WiredTiger In-Memory?

A: There is no difference: the size is the same. Please note that WiredTiger (on which the Percona Memory Engine is based) itself can additionally allocate up to 50% of the amount specified in the --inMemorySize option. You can check db.serverStatus().inMemory.cache to find out how much of the specified memory is used for storing your data. "bytes currently in the cache" shows the total number of bytes occupied by the physical representation of all MongoDB’s databases, and "maximum bytes configured" shows what is passed in the --inMemorySize option. The difference between the two can be used to calculate the amount of memory, in bytes, that is still available.
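
As a quick illustration, here is one way to compare those two numbers from the shell; a minimal sketch that assumes a local Percona Memory Engine instance listening on the default port:

$ mongo --quiet --eval '
    var c = db.serverStatus().inMemory.cache;
    var used = c["bytes currently in the cache"];
    var max  = c["maximum bytes configured"];
    print("used: " + used + ", configured: " + max + ", available: " + (max - used));'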

Q: What is the way to convert data from disk to In-Memory? Using mongodump and rebuild the indexes?

A: Yes
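
A hedged sketch of that approach (hosts, ports and paths are purely illustrative; it assumes the on-disk instance listens on 27017 and the in-memory instance on 27018):

# Dump from the on-disk (WiredTiger) instance...
$ mongodump --host 127.0.0.1 --port 27017 --out /backup/dump
# ...then restore into the Percona Memory Engine instance; indexes are rebuilt during the restore
$ mongorestore --host 127.0.0.1 --port 27018 /backup/dump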

Q: An enhancement request is to enable regular and In-Memory engines on the same MongoDB instance.

A: This is a MongoDB limitation, but noted and reported for Percona at https://jira.percona.com/browse/PSMDB-88.

by Sveta Smirnova at November 18, 2016 06:15 PM

Jean-Jerome Schmidt

Planets9s - NinesControl announcement, scaling & sharding MongoDB - and more!

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Check out the new NinesControl for MySQL & MongoDB in the cloud

This week we were happy to announce NinesControl, which is taking its first steps to offer quick and easy automation and management of databases for the cloud to developers and admins of all skill levels. Built on the capabilities of ClusterControl, NinesControl is a database management cloud service, with no need to install anything. It enables users to uniformly and transparently deploy and manage polyglot databases on any cloud, with no vendor lock-in. If you haven’t seen NinesControl yet, do check it out!

Try NinesControl

Watch the replay: scaling & sharding MongoDB

In this webinar replay, Art van Scheppingen, Senior Support Engineer at Severalnines, shows you how to best plan your MongoDB scaling strategy up front and how to prevent ending up with unusable secondary nodes and shards. Art also demonstrates how to leverage ClusterControl’s MongoDB scaling capabilities and have ClusterControl manage your shards.

Watch the replay

How to deploy & monitor MySQL and MongoDB clusters in the cloud with NinesControl

As part of this week’s NinesControl announcement, we’ve published this handy blog post, which shows you how to deploy and monitor MySQL Galera, MariaDB and MongoDB clusters on DigitalOcean and Amazon Web Services using NinesControl. Before you attempt to deploy, you’ll need to configure access credentials to the cloud you’d like to run on, as per the process described in the blog below.

Read the blog

How to configure access credentials in NinesControl for AWS & Digital Ocean

Once you register for NinesControl and provide your cloud “access key”, the service will launch droplets in your region of choice and provision database nodes on them. In this blog post we show you how to configure that access to DigitalOcean and AWS. You’ll be all set to start deploying and monitoring your database cluster in the cloud of your choice with NinesControl.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at November 18, 2016 11:55 AM

November 17, 2016

Peter Zaitsev

Help Us Shape the Future of Percona

Let us know what you think about Percona, and what we should be thinking about for the future.

Over the last ten years, Percona has grown considerably. We’ve moved from being a strictly MySQL company, to a company that supports MongoDB and other open source databases. Percona Live and Percona Live Europe have become key meeting places for the open source community, and now are important hubs for learning about and discussing open source database solutions.

As we look forward to the next ten years of business, we want to get an idea of what you think of us. As we plan for the future, we’d like to hear about your experience with Percona today and get your input on how we can continue to evolve. 

To achieve that end, we’ve put together a survey of questions about us, our services, our products and the open source community’s perception of us. We would appreciate you taking the time to fill it out so we can know your thoughts. Your feedback helps us shape our company and grow the community.

Take the survey here: http://survey.newkind.com/r/rUkjDHPd

It should take 10-15 minutes to complete and will remain open until Friday, Dec. 2. Thanks again for helping us prepare for the future.

by Peter Zaitsev at November 17, 2016 07:05 PM

MariaDB Foundation

MariaDB Presentations from Percona Live Amsterdam

In October, the MariaDB Foundation attended Percona Live in Amsterdam. Below is a selection of MariaDB-related sessions with links to the slides. There were numerous other sessions that were relevant for MariaDB users, but those listed below are some of the most closely related. The MariaDB Foundation – Ensuring continuity and open collaboration in the […]

The post MariaDB Presentations from Percona Live Amsterdam appeared first on MariaDB.org.

by ian at November 17, 2016 04:34 PM

Jean-Jerome Schmidt

About cloud lock-in and open source databases

The cloud is no longer a question of if, but of when. Many IT leaders, however, find that one consistent barrier to their adoption of the cloud is vendor lock-in. What do you do when you are forced to stay with a provider that no longer meets your needs?

But is cloud lock-in a problem?

While it appears that you can move your workload from one cloud to another without being penalised economically (the utility billing methods of the major pay-as-you-go platforms like Amazon Web Services or Azure ensure that you only pay for the services you use, rather than paying for provisioned resources that may or may not be used), the reality is that the move might not work if the exact services and resources that you are using aren’t available on the cloud you’re migrating to.

Hardware is a commodity, and if Cloud Infrastructure as a Service (IaaS) is just about renting VMs by the hour, then Cloud IaaS has very little lock-in. But cloud lock-in occurs when you adopt services beyond basic IaaS. The major cloud vendors do not support value-added services the same way, and this is especially true for database services. AWS, Google, Microsoft Azure, Oracle and IBM have cloud database services that work differently and are proprietary in nature, in some cases with specific APIs and data models. This means that even an open source database, combined with all of the cloud vendor’s under-the-hood automation, may not easily be migrated to another service.

Data can be the most important asset to the organisation, and is critical to the success of cloud applications. It is also hard to move as it is stateful, meaning that the application keeps track of the state of the interaction with users and other systems. The more data a user has, the harder it is to move. Services and applications also tend to gravitate towards the data. For this reason, the cloud vendors will go to great lengths to run and manage your data. For instance, it is free, and relatively easy, to move any amount of data into an AWS EC2 instance, but you’ll have to pay to transfer data out of AWS. The database services on Amazon are only available on Amazon, so good luck if you want to migrate to a new cloud provider or use multiple hosting providers for your application. This puts you, as customer, in a weak negotiating position and locks you into your current cloud vendor.

So, AWS has RDS, Aurora and DynamoDB. Microsoft has Azure DocumentDB and Azure SQL Database. Google has Cloud BigTable, Cloud Datastore, and Cloud SQL.

Severalnines recently joined the party with the NinesControl cloud service.

There are plenty of cloud databases out there already, so what makes NinesControl different? Well, if you are not prepared to go “all in” with a single cloud provider, then you might want to have a good look at NinesControl. It allows you to separate your data from the underlying cloud infrastructure. It supports multiple clouds; you can even bring it on-prem. The automation and management build upon ClusterControl, a proven product used in production by companies like Cisco, Monster, AVG, BT and Eurovision, amongst others.

If you want to avoid cloud vendor lock-in, then take control of your data.

by vinay at November 17, 2016 10:39 AM

Peter Zaitsev

All You Need to Know About GCache (Galera-Cache)

This blog discusses some important aspects of GCache.

Why do we need GCache?

Percona XtraDB Cluster is a multi-master topology, where a transaction executed on one node is replicated on another node(s) of the cluster. This transaction is then copied over from the group channel to Galera-Cache followed by apply action.

The cache can be discarded immediately once the transaction is applied, but retaining it can help promote a node as a DONOR node serving write-sets for a newly booted node.

So in short, GCache acts as a temporary storage for replicated transactions.

How is GCache managed?

Naturally, the first choice to cache these write-sets is to use memory allocated pool, which is governed by gcache.mem_store. However, this is deprecated and buggy and shouldn’t be used.

Next on the list is on-disk files. Galera has two types of on-disk files to manage write-sets:

  • RingBuffer File:
    • A circular file (aka RingBuffer file). As the name suggests, this file is re-usable in a circular queue fashion, and is pre-created when the server starts. The size of this file is preconfigured and can’t be changed dynamically, so selecting a proper size for this file is important.
    • The user can set the size of this file using gcache.size. (There are multiple blogs about how to estimate size of the Galera Cache, which is generally linked to downtime. If properly planned, the next booting node will find all the missing write-sets in the cache, thereby avoiding need for SST.)
    • Write-sets are appended to this file and, when needed, the file is re-cycled for use.
  • On-demand page store:
    • If the transaction write-set is large enough not to fit in a RingBuffer File (actually large enough not to fit in half of the RingBuffer file) then an independent page (physical disk file) is allocated to cache the write-sets.
    • Again there are two types of pages:
      • Page with standard size: As defined by gcache.page_size (default=128M).
      • Page with non-standard page size: If the transaction is large enough not to fit into a standard page, then a non-standard page is created for the transaction. Let’s say gcache.page_size=1M and the transaction write_set = 1.5M; then a separate page (in turn an on-disk file) will be created with a size of 1.5M.

How long are on-demand pages retained? This is controlled using the following two variables:

  • gcache.keep_pages_size
    • keep_pages_size defines the total size of allocated pages to keep. For example, if keep_pages_size = 10M, then N pages that add up to 10M can be retained. If N pages add up to more than 10M, then pages are removed from the start of the queue until the size falls below the set threshold. A size of 0 means don’t retain any pages.
  • gcache.keep_pages_count (PXC specific)
    • But before pages are actually removed, a second check is done based on page_count. Let’s say keep_page_count = N+M; then even though N pages add up to 10M, they will be retained as the page_count threshold is not yet hit. (The exception to this is non-standard pages at the start of the queue.)

So in short, both conditions must be satisfied. The recommendation is to use whichever condition is applicable in the user environment.

Where are GCache files located?

The default location is the data directory, but this can be changed by setting gcache.dir. Given the temporary nature of the file, and iterative read/write cycle, it may be wise to place these files in a faster IO disk. Also, the default name of the file is gcache.cache. This is configurable by setting gcache.name.
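
For reference, here is a hedged my.cnf sketch that pulls the settings mentioned above together; the values and paths are purely illustrative and should be sized for your own workload and expected downtime:

[mysqld]
# All GCache settings are passed through wsrep_provider_options
wsrep_provider_options="gcache.dir=/fast-disk/galera;gcache.name=gcache.cache;gcache.size=2G;gcache.page_size=128M;gcache.keep_pages_size=0;gcache.keep_pages_count=0"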

What if one of the nodes is DESYNCED and PAUSED?

If a node desyncs, it will continue to receive write-sets and apply them, so there is no major change in gcache handling.

If the node is desynced and paused, that means the node can’t apply write-sets and needs to keep caching them. This will, of course, affect the desynced/paused node, which will continue to create on-demand page stores. Since one of the cluster nodes can’t proceed, it will not emit a “last committed” message. In turn, the other nodes in the cluster (that could otherwise purge the entry) will continue to retain the write-sets, even if these nodes are not desynced and paused.
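
As an illustration, a common way a node ends up desynced and paused is a blocking file-level backup; both statements must come from the same client session, and that session must stay open while the files are copied:

mysql> SET GLOBAL wsrep_desync = ON;   -- node keeps applying write-sets but no longer triggers flow control
mysql> FLUSH TABLES WITH READ LOCK;    -- applier pauses; GCache on this node keeps growing
-- ... copy the data directory from another terminal ...
mysql> UNLOCK TABLES;                  -- applier resumes and catches up
mysql> SET GLOBAL wsrep_desync = OFF;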

by Krunal Bauskar at November 17, 2016 01:21 AM

November 16, 2016

Peter Zaitsev

Is Docker Good for Your Database?

This blog post reviews the appropriateness of Docker and other container solutions for your database environment.

A few weeks back, I wrote a fairly high-level blog post about containers. It covered what you should consider when thinking about using Docker, rkt, LXC, etc. I hope you’ve taken the chance to give it a quick read. It’s a good way to understand some of the disciplines you need to consider before moving to a new technology. However, it sparked a conversation in our Solutions Engineering team. Hopefully, the same one that you’re having in your organization: should customers run their database in containers?

Before we start, I’ll admit that Percona uses containers. Percona Monitoring and Management (PMM for short) presents all of the pretty graphs and query analytics by running in a Docker container. We made that choice because the integration between the components is where we could provide the most value to users. Docker lets us distribute a single ready-to-go unit of awesomeness. In short, it has huge potential on the application side of your environment. 

However, for databases… here are some of our recommendations:

Quick n Dirty

Decision = NOT FOR DBs (as it sits right now)

This is not the case for every environment. It is the default that we think is the best recommendation for the majority of our customers. Please note, that I am only making this recommendation for your database. If you’re using microservices for your application today, then it could make more sense to containerize your database depending on the load characteristics of your database, your scaling needs and the skillset you currently have.

Why?

Lack of Synergy

Before you decide to shoot me, please take some time to understand where we’re coming from. First of all, people designed container solutions to deal with stateless applications that have ephemeral data. Containers spin up a quick microservice and then destroy it. This includes all the components of that container (including its cache and data). The transient nature of containers is because all of the components and services of that container are considered to be part of the container (essentially it’s all or nothing). Serving the container a data volume owned by the underlying OS by punching a hole through the container can be very challenging. Current methods are too unreliable for most databases.

Most of the development efforts put into the various solutions had one goal in mind: statelessness. There are solutions that can help keep your data persistent, but they are very quickly evolving. From what we can tell, they require a high level of complexity that negates any efficiency gains through increased operational complexity (and risk). To further my point, this is precisely the conclusion that we’ve come to time and again when we’ve reviewed any “real world” information about the use of containers (especially Docker).

They’re Just Not Stable Yet

These container solutions are meant for quick development and deployment of applications that are broken into tiny components: microservices. Normally, these applications evolve very quickly in organizations that are very software/developer driven. That seems to be how these container solutions (again, especially Docker) are developed as well. New features are pushed out with little testing and design. The main focus seems to be the latest featureset and being first to market. They “beg for forgiveness” instead of “ask for permission.” On top of that, backward compatibility (from what we can tell) is a distant concern (and even that might be an overstatement). This means that you’re going to have to have a mature Continuous Delivery and testing environment as well as a known and tested image repository for your containers.

These are awesome tools to have for the right use cases, but they take time, money, resources and experience. In speaking with many of our customers, this is just not where they’re at as an organization. Their businesses aren’t designed around software development, and they simply don’t have the checkbooks to support the resources needed to keep this hungry machine fed. Rather, they are looking for something stable and performant that can keep their users happy 24×7. I know that we can give them a performant, highly-available environment that requires much less management if we strip out containers.

Is There Hope?

Absolutely, in fact, there’s a lot more than hope. There are companies running containers (including databases) at massive scale today! These are the types of companies that have very mature processes. Their software development is a core part of their business plan and value proposition. You probably know who I’m talking about: Uber, Google, Facebook (there are more, these are just a few). There’s even a good rundown of how you can get persistence in containers from Joyent. But as I said before, the complexity needed to get the basic features necessary to keep your data alive and available (the most basic use of a database) is much too high. When containers have a better and more stable solution for persistent storage volumes, they will be one step closer to being ready, in my opinion. Even then, containerizing databases in most organizations that aren’t dealing with large scale deployments (50+ nodes) with wildly varying workloads is probably unnecessary.

Don’t Leave Us Hanging…

I realize that the statement “you’re probably not ready to containerize your database” does not constitute a solution. So here it is: the Solutions Engineering team (SolEng for short) has you covered. Dimitri Vanoverbeke is in the process of writing a great blog series on configuration management. Configuration management solutions can greatly increase the repeatability of your infrastructure, and make sure that your IT/App Dev processes are repeatable in the physical configuration of your environment. Automating this process can lead to great gains. However, this should make use of a mature development/testing process as part of your application development lifecycle. The marriage of process and technology creates stable applications and happy customers.

Besides configuration management as an enhanced solution, there are some services that can make the life of your operations team much easier. Service discovery and health checking come to mind. My favorite solution is Consul, which we use extensively in PMM for configuration and service metadata. Consul can make sure that your frontend applications and backend infrastructure are working from a real-time snapshot of the state of your services.

Conclusion

There is a lot to think about when it comes to managing an environment, especially when your application develops at a quick pace. With the crafty use of available solutions, you can reduce the overhead that goes into every release. On top of that, you can increase resiliency and availability. If you need our help, please reach out. We’d love to help you!

by Jon Tobin at November 16, 2016 04:27 PM

November 15, 2016

Peter Zaitsev

Webinar Thursday, November 17: MySQL High Availability with Percona XtraDB Cluster 5.7

Join Percona’s Percona XtraDB Cluster Lead Software Engineer Krunal Bauskar for a webinar on Thursday, November 17, 2016, at 7:30 am PST on MySQL High Availability with Percona XtraDB Cluster 5.7.

Percona XtraDB Cluster 5.7 is our brand new MySQL 5.7 compatible Galera-based high availability (HA) solution. Whether you’re new to MySQL clustering technology, or experienced with Galera-based replication, this tutorial provides great insights into working with the software, including:

  • New and unique features of XtraDB Cluster 5.7, including Cluster Safe Mode, instrumentation with Performance Schema and extended support for encrypted tablespaces in multi-master topologies
  • Seamless integration with ProxySQL for better HA and read/write splitting
  • Improved security with native data at rest encryption and secure networking
  • Native integration with Docker, optimized for Container World
  • Monitoring with Percona Monitoring and Management (PMM)
  • Improved stability with many critical bug fixes and improved error messaging

This tutorial will demonstrate how to set up XtraDB Cluster, complete with High Availability Proxy and Monitoring, as well as perform the most important MySQL high availability management operations.

Register for this webinar here.

Krunal Bauskar, Percona XtraDB Cluster Lead Software Engineer

Krunal joined Percona in September 2015. Before joining Percona, he worked as part of the InnoDB team at MySQL/Oracle. He authored most of the temporary table revamp work, in addition to many other features. In the past, he worked with Yahoo! Labs researching big data issues, as well as working for a database startup that is now part of Teradata. His interests mainly include data management at any scale – which he has been working at for more than a decade now.

by Dave Avery at November 15, 2016 06:37 PM

Percona Monitoring and Management 1.0.6 is now available

Percona announces the release of Percona Monitoring and Management 1.0.6 on November 15, 2016.

The instructions for installing or upgrading Percona Monitoring and Management 1.0.6 are available in the documentation. Detailed release notes are available here.

New in PMM Server:

  • Prometheus 1.2.2
  • External static files are now local for PMM home page
  • Metrics Monitor improvements:
    • Added Amazon RDS OS Metrics dashboard and CloudWatch data source.
    • Added the PMM Server host to metrics monitoring.
    • Refactored MongoDB dashboards.
    • Added File Descriptors graph to System Overview dashboard.
    • Added Mountpoint Usage graph to Disk Space dashboard.
  • Query Analytics improvements:
    • QAN data is now purged correctly.
    • QAN data retention is made configurable with QUERIES_RETENTION option. The default is eight days.
    • Various small fixes to Query Analytics.

New in PMM Client:

  • Fixes for mysql:queries service using Performance Schema as query source:
    • Fixed crash when DIGEST_TEXT is NULL.
    • Removed iteration over all query digests on startup.
    • Added sending of query examples to QAN if available (depends on the workload).
  • Added query source information for mysql:queries service in pmm-admin list output.
  • Added purge command to purge metrics data on the server.
  • Updated mongodb_exporter with RocksDB support and various fixes.
  • Removed --nodetype and --replset flags for mongodb:metrics. The --cluster flag is now optional.
    It is recommended to re-add the mongodb:metrics service and purge existing MongoDB metrics using the purge command.
  • Enabled monitoring of file descriptors (requires re-adding linux:metrics service).
  • Improved full uninstallation when PMM Server is unreachable.
  • Added time drift check between server and client to pmm-admin check-network output.

Live demo of PMM is available at pmmdemo.percona.com.

We welcome your feedback and questions on our PMM forum.

About Percona Monitoring and Management
Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. It is developed by Percona in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

by Alexey Zhebel at November 15, 2016 05:07 PM

Jean-Jerome Schmidt

Announcing NinesControl: helping MongoDB and MySQL developers scale in AWS and DigitalOcean clouds

Today we’re happy to announce our new product, NinesControl. Built on the capabilities of the popular ClusterControl, NinesControl is a database management cloud service that enables developers to easily, uniformly and transparently deploy and manage polyglot databases on any cloud, with no need to install anything.

NinesControl is for developers and admins of all skill levels who do not want to limit themselves to one cloud provider nor use the services that are only available on that cloud. It also removes the complexity and learning curve that typically come with highly-available database clusters. With this initial launch, users of Amazon AWS or DigitalOcean can spin up MySQL or MongoDB clusters within minutes, with more cloud providers and datastores being added soon.

Sign up for NinesControl (free)

Avoids Cloud Lock-in

NinesControl offers developers an easy way to deploy and operate high-availability database setups in any cloud, giving them the flexibility to utilize or migrate to different cloud vendors as they see fit, thus avoiding being locked into a specific cloud provider.

Ensures Full Database Control

In addition to sidestepping cloud lock-in, the new service provides unified and real-time monitoring of the database and server infrastructure, giving access to over 100 collected key database and host metrics, with custom dashboards providing insight into operational and historic performance.

High Availability

With NinesControl’s self-healing and automatic recovery of MongoDB and MySQL clusters, developers are set to achieve high-availability of their databases.

Vinay Joosery, our Co-Founder and CEO, explains why NinesControl was created, “Cloud database products are usually not equivalent or compatible between vendors, which makes it nearly impossible to migrate. We want to give back to developers the control which is rapidly being taken away by cloud vendors. This means now they really can deploy their databases on any cloud. NinesControl is different from the rest of the market in that they have total control of their data. It does not host the database instances. The database is deployed in the cloud of the user's choice. NinesControl delivers on our vision of ‘Your Database, Any Cloud’. The plan for future releases is to add more technologies and hosting providers, giving users an even better range of choice.”

by Severalnines at November 15, 2016 02:19 PM

Deploying and Monitoring MySQL and MongoDB clusters in the cloud with NinesControl

NinesControl is a new service from Severalnines which helps you deploy MySQL Galera and MongoDB clusters in the cloud. In this blog post we will show you how you can easily deploy and monitor your databases on AWS and DigitalOcean.

Deployment

At the moment of writing, NinesControl supports two cloud providers - Amazon Web Services and DigitalOcean. Before you attempt to deploy, you need first to configure access credentials to the cloud you’d like to run on. We covered this topic in a blog post.

Once it’s done, you should see in the “Cloud Accounts” tab the credentials defined for the chosen cloud provider.

You’ll see the screen below, as you do not have any clusters running yet:

You can click on “Deploy your first cluster” to start your first deployment. You will be presented with a screen like the one below - you can pick the cluster type you’d like to deploy and set some configuration settings like port, data directory and password. You can also set the number of nodes in the cluster and which database vendor you’d like to use.

For MongoDB, the deployment screen is fairly similar with some additional settings to configure.

Once you are done here, it’s time to move to the second step - picking the credentials to use to deploy your cluster. You have an option to pick either DigitalOcean or Amazon Web Services. You can also pick whatever credentials you have added to NinesControl. In our example, we just have a single credential, but it’s perfectly ok to have more than one credential per cloud provider.

Once you’ve made your choice, proceed to the third and final step, in which you will pick what kind of VMs you’d like to use. This screen differs between AWS and DigitalOcean.

If you picked AWS, you will have an option to choose the operating system and VM size. You also need to pick the VPC into which you will deploy, and the subnet which will be used by your cluster. If you don’t see anything on the drop-down list, you can click on the “[Add]” buttons to create both VPC and subnet and NinesControl will create these for you. Finally, you need to set the volume size of the VMs. After that, you can trigger the deployment.

DigitalOcean uses a slightly different screen setup, but the idea is similar - you need to pick a region, an operating system and a droplet size.

Once you are done, click on “Deploy cluster” to start deployment.

The status of the deployment will be shown in the cluster list. You can also click on the status bar to see the full log of a deployment. Whenever you’d like to deploy a new cluster, you will have to click on the “Deploy cluster” button.

Monitoring

Once deployment completes, you’ll see a list of your clusters.

When you click on one of them, you’ll see a list of nodes in the cluster and cluster-wide metrics.

Of course, the metrics are cluster-dependent. Above is what you will see on a MySQL/MariaDB Galera cluster. MongoDB will present you with different graphs and metrics:

When you click on a node, you will be redirected to host statistics of that particular node - CPU, network, disk, RAM usage - all of those very important basics which tell you about node health:

As you can see, NinesControl not only allows you to deploy Galera and MongoDB clusters in a fast and efficient way but it also collects important metrics for you and shows them as graphs.

Give it a try and let us know what you think.

by Severalnines at November 15, 2016 01:58 PM

MariaDB AB

M|17 Call for Papers

Please save the date! M|17 is MariaDB’s inaugural annual user conference that will bring together developers, architects, administrators and business people to learn, collaborate and network around the latest innovations at MariaDB and infrastructure modernization.

M|17
April 11 - 12, 2017
The Conrad Hotel | New York City

More details about the conference and community meetup will be coming soon.

M|17 Call for Papers

M|17 will include many sessions dedicated to MariaDB-specific content, but we will also have talks focused on exploring new technologies that reduce costs, and make data management and application development simpler and more efficient.

We are looking for submissions that cover MariaDB as well as exciting new technologies like Docker, Spark, Kafka and Hadoop, that make IT development and operations better, easier and faster. Here are some suggested topic themes to consider:

  • “In real life” success stories using MariaDB technologies: MariaDB Server, MaxScale, ColumnStore
  • Data streaming
  • Securing your data
  • Internet of things
  • Containerization
  • DevOps tools
  • Mobile application development
  • Microservices
  • Building in the cloud
  • Data analytics

All speakers with accepted proposals will receive a full conference pass for M|17.

Ready to participate in a modern infrastructure approach to data management? Submit your speaking proposal for M|17 here by January 10.

by kajarno at November 15, 2016 06:00 AM

Peter Zaitsev

Using Vault with MySQL

In my previous post I discussed using GPG to secure your database credentials. This relies on a local copy of your MySQL client config, but what if you want to keep the credentials stored safely along with other super secret information? Sure, GPG could still be used, but there must be an easier way to do this.

This post will look at a way to use Vault to store your credentials in a central location and use them to access your database. For those of you that have not yet come across Vault, it is a great way to manage your secrets – securing, storing and tightly controlling access. It has the added benefits of being able to handle leasing, key revocation, key rolling and auditing.
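
As a quick taste of what that looks like in practice, here is a minimal sketch of writing and reading a secret with the vault CLI; it assumes an already-unsealed server and an authenticated token, and the address, path and values are purely illustrative:

$ export VAULT_ADDR=https://myfirstdomain.com:8200
$ vault write secret/mysql/percona username=percona password='s3cr3t!'
$ vault read secret/mysql/percona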

During this blog post we’ll accomplish the following tasks:

  1. Download the necessary software
  2. Get a free SAN certificate to use for Vault’s API and automate certificate renewal
  3. Configure Vault to run under a restricted user and secure access to its files and the API
  4. Create a policy for Vault to provide access control
  5. Enable TLS authentication for Vault and create a self-signed client certificate using OpenSSL to use with our client
  6. Add a new secret to Vault and gain access from a client using TLS authentication
  7. Enable automated, expiring MySQL grants

Before continuing onwards, I should drop in a quick note to say that the following is a quick example to show you how you can get Vault up and running and use it with MySQL; it is not a guide to a production setup and does not cover High Availability (HA) implementations, etc.


Download time

We will be using some tools in addition to Vault: Let’s Encrypt, OpenSSL and json_pp (a command line utility using JSON::PP). For this post we’ll be using Ubuntu 16.04 LTS and we’ll presume that these aren’t yet installed.

$ sudo apt-get install letsencrypt openssl libjson-pp-perl

If you haven’t already heard of Let’s Encrypt then it is a free, automated, and open Certificate Authority (CA) enabling you to secure your website or other services without paying for an SSL certificate; you can even create Subject Alternative Name (SAN) certificates to make your life even easier, allowing one certificate to be used for a number of different domains. The Electronic Frontier Foundation (EFF) provides Certbot (the new name for the letsencrypt software), the recommended tool to manage your certificates. If you don’t have letsencrypt/certbot in your package manager then you should be able to use the quick install method. We’ll be using json_pp to prettify the JSON output from the Vault API and openssl to create a client certificate.
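
For example, once Vault is up and running later in this post, you might prettify an API response like this (sys/health is an unauthenticated status endpoint; the hostname is a placeholder):

$ curl -s https://myfirstdomain.com:8200/v1/sys/health | json_pp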

We also need to download Vault, choosing the binary relevant for your Operating System and architecture. At the time of writing this, the latest version of Vault is 0.6.2, so the following steps may need adjusting if you use a different version.

# Download Vault (Linux x86_64), SHA256SUMS and signature
$ wget https://releases.hashicorp.com/vault/0.6.2/vault_0.6.2_linux_amd64.zip \
  https://releases.hashicorp.com/vault/0.6.2/vault_0.6.2_SHA256SUMS.sig \
  https://releases.hashicorp.com/vault/0.6.2/vault_0.6.2_SHA256SUMS
# Import the GPG key
$ gpg --keyserver pgp.mit.edu --recv-keys 51852D87348FFC4C
# Verify the checksums
$ gpg --verify vault_0.6.2_SHA256SUMS.sig
gpg: assuming signed data in `vault_0.6.2_SHA256SUMS'
gpg: Signature made Thu 06 Oct 2016 02:08:16 BST using RSA key ID 348FFC4C
gpg: Good signature from "HashiCorp Security <security@hashicorp.com>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 91A6 E7F8 5D05 C656 30BE  F189 5185 2D87 348F FC4C
# Verify the download
$ sha256sum --check <(fgrep vault_0.6.2_linux_amd64.zip vault_0.6.2_SHA256SUMS)
vault_0.6.2_linux_amd64.zip: OK
# Extract the binary
$ sudo unzip -j vault_0.6.2_linux_amd64.zip -d /usr/local/bin
Archive:  vault_0.6.2_linux_amd64.zip
  inflating: /usr/local/bin/vault


Let’s Encrypt… why not?

We want to be able to access Vault from wherever we are, we can put additional security in place to prevent unauthorised access, so we need to get ourselves encrypted. The following example shows the setup on a public server, allowing the CA to authenticate your request. More information on different methods can be found in the Certbot documentation.

$ sudo letsencrypt --webroot -w /home/www/vhosts/default/public -d myfirstdomain.com -d myseconddomain.com
#IMPORTANT NOTES:
# - Congratulations! Your certificate and chain have been saved at
#   /etc/letsencrypt/live/myfirstdomain.com/fullchain.pem. Your cert will
#   expire on 2017-01-29. To obtain a new or tweaked version of this
#   certificate in the future, simply run certbot again. To
#   non-interactively renew *all* of your certificates, run "certbot
#   renew"
# - If you like Certbot, please consider supporting our work by:
#
#   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
#   Donating to EFF:                    https://eff.org/donate-le
#

That’s all it takes to get a SAN SSL certificate! The server that this was executed on has a public webserver serving the domains that the certificates were requested for. During the request process a file is placed in the specified webroot and is used to authenticate the domain(s) for the request. Essentially, the command said:

myfirstdomain.com and myseconddomain.com use /home/www/vhosts/default/public for the document root, so place your files there

Let’s Encrypt CA issues short-lived certificates (90 days), so you need to keep renewing them, but don’t worry as that is as easy as it was to create them in the first place! You can test that renewal works OK as follows (which will renew all certificates that you have without --dry-run):

$ sudo letsencrypt renew --dry-run
#
#-------------------------------------------------------------------------------
#Processing /etc/letsencrypt/renewal/myfirstdomain.com.conf
#-------------------------------------------------------------------------------
#** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
#**          (The test certificates below have not been saved.)
#
#Congratulations, all renewals succeeded. The following certs have been renewed:
#  /etc/letsencrypt/live/myfirstdomain.com/fullchain.pem (success)
#** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
#**          (The test certificates above have not been saved.)
#
#IMPORTANT NOTES:
# - Your account credentials have been saved in your Certbot
#   configuration directory at /etc/letsencrypt. You should make a
#   secure backup of this folder now. This configuration directory will
#   also contain certificates and private keys obtained by Certbot so
#   making regular backups of this folder is ideal.

Automating renewal

The test run for renewal worked fine, so we can now go and schedule this to take place automatically. I’m using systemd so the following example uses timers, but cron or similar could be used too. Here’s how to make systemd run the scheduled renewal for you, running at 0600 – the renewal process will automatically proceed for any previously-obtained certificates that expire in less than 30 days.

$ cat <<EOF | sudo tee /etc/systemd/system/cert-renewal.service
[Unit]
Description=SSL renewal
[Service]
Type=simple
ExecStart=/usr/bin/letsencrypt renew --quiet
User=root
Group=root
EOF
$ cat <<EOF | sudo tee /etc/systemd/system/cert-renewal.timer
[Unit]
Description=Automatic SSL renewal
[Timer]
OnCalendar=*-*-* 06:00:00
Persistent=true
[Install]
WantedBy=timers.target
EOF
$ sudo systemctl enable cert-renewal.timer
Created symlink from /etc/systemd/system/timers.target.wants/cert-renewal.timer to /etc/systemd/system/cert-renewal.timer.
$ sudo systemctl start cert-renewal.timer
$ sudo systemctl list-timers
NEXT                         LEFT     LAST                         PASSED UNIT                         ACTIVATES
Tue 2016-11-01 06:00:00 UTC  6h left  n/a                          n/a    cert-renewal.timer           cert-renewal.service


Getting started with Vault

Firstly, a quick reminder that this is not an in-depth review, how-to or necessarily best-practice Vault installation as that is beyond the scope of this post. It is just to get you going to test things out, so please read up on the Vault documentation if you want to use it more seriously.

Whilst there is a development server that you can fire up with the command vault server -dev to get yourself testing a little quicker, we’re going to take a little extra time and configure it ourselves and make the data persistent. Vault supports a number of backends for data storage, including Zookeeper, Amazon S3 and MySQL, however the 3 maintained by HashiCorp are consul, file and inmem. The memory storage backend does not provide persistent data, so whilst there could possibly be uses for this it is really only useful for development and testing – it is the storage backend used with the -dev option to the server command. Rather than tackle the installation and configuration of Consul during this post, we’ll use file storage instead.

Before starting the server we’ll create a config, which can be written in one of 2 formats – HCL (HashiCorp Configuration Language) or JSON (JavaScript Object Notation). We’ll use HCL as it is a little cleaner and saves us a little extra typing!

# Create a system user
$ sudo useradd -r -g daemon -d /usr/local/vault -m -s /sbin/nologin -c "Vault user" vault
$ id vault
uid=998(vault) gid=1(daemon) groups=1(daemon)
# Create config directories and remove global access
$ sudo mkdir /etc/vault /etc/ssl/vault
$ sudo chown vault.root /etc/vault /etc/ssl/vault
$ sudo chmod 750 /etc/vault /etc/ssl/vault
$ sudo chmod 700 /usr/local/vault
# Copy the certificates and key
$ sudo cp -v /etc/letsencrypt/live/myfirstdomain.com/*pem /etc/ssl/vault
/etc/letsencrypt/live/myfirstdomain.com/cert.pem -> /etc/ssl/vault/cert.pem
/etc/letsencrypt/live/myfirstdomain.com/chain.pem -> /etc/ssl/vault/chain.pem
/etc/letsencrypt/live/myfirstdomain.com/fullchain.pem -> /etc/ssl/vault/fullchain.pem
/etc/letsencrypt/live/myfirstdomain.com/privkey.pem -> /etc/ssl/vault/privkey.pem
# Create a combined PEM certificate
$ sudo cat /etc/ssl/vault/{cert,fullchain}.pem | sudo tee /etc/ssl/vault/fullcert.pem
# Write the config to file
$ cat <<EOF | sudo tee /etc/vault/demo.hcl
listener "tcp" {
  address = "10.0.1.10:8200"
  tls_disable = 0
  tls_cert_file = "/etc/ssl/vault/fullcert.pem"
  tls_key_file = "/etc/ssl/vault/privkey.pem"
}
backend "file" {
  path = "/usr/local/vault/data"
}
disable_mlock = true
EOF

So, we’ve now set up a user and some directories to store the config, SSL certificate and key, and also the data, restricting access to the vault user. The config that we wrote specifies that we will use the file backend, storing data in /usr/local/vault/data, and a listener that provides TLS encryption using our certificate from Let’s Encrypt. The final setting, disable_mlock, is not recommended for production and is being used to avoid some extra configuration during this post. More details about the other options available for configuration can be found in the Server Configuration section of the online documentation.

Please note that the Vault datadir should be kept secured as it contains all of the keys and secrets. In the example, we have done this by placing it in the vault user’s home directory and only allowing the vault user access. You can take this further by restricting local access (via logins) and access control lists.

Starting Vault

Time to start the server and see if everything is looking good!

$ sudo -su vault vault server -config=/etc/vault/demo.hcl >/tmp/vault-debug.log 2>&1 &
$ jobs
[1]  + running    sudo -su vault vault server -config=/etc/vault/demo.hcl > /tmp/vault-debug.lo
$ VAULT_ADDR=https://myfirstdomain.com:8200 vault status
Error checking seal status: Error making API request.
URL: GET https://myfirstdomain.com:8200/v1/sys/seal-status
Code: 400. Errors:
* server is not yet initialized

Whilst it looks like something is wrong (we need to initialize the server), it does mean that everything is otherwise working as expected. So, we’ll initialize Vault, which is a pretty simple task, but you do need to make a note of (and store) some of the information that the server gives you during initialization – the unseal keys and the initial root token. You should distribute these somewhere safe, but for now we’ll store them with the config.

# Change to vault user
$ sudo su -l vault -s /bin/bash
(vault)$ export VAULT_ADDR=https://myfirstdomain.com:8200 VAULT_SSL=/etc/ssl/vault
# Initialize Vault and save the token and keys
(vault)$ vault init 2>&1 | egrep '^Unseal Key|Initial Root Token' >/etc/vault/keys.txt
(vault)$ chmod 600 /etc/vault/keys.txt
# Unseal Vault
(vault)$ egrep -m3 '^Unseal Key' /etc/vault/keys.txt | cut -f2- -d: | tr -d ' ' |
while read key
do
  vault unseal \
    -ca-cert=${VAULT_SSL}/fullchain.pem \
    -client-cert=${VAULT_SSL}/client.pem \
    -client-key=${VAULT_SSL}/privkey.pem ${key}
done
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 2
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
# Check Vault status
(vault)$ vault status
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Version: 0.6.2
Cluster Name: vault-cluster-ebbd5ec7
Cluster ID: 61ae8f54-f420-09c1-90bb-60c9fbfa18a2
High-Availability Enabled: false

There we go, the vault is initialized and the status command now returns details and confirmation that it is up and running. It is worth noting here that each time you start Vault it will be sealed, which means that it cannot be accessed until 3 unseal keys have been used with vault unseal – for additional security here you would ensure that a single person cannot know any 3 keys, so that it always requires more than one person to (re)start the service.


Setting up a policy

Policies allow you to set access control restrictions to determine the data that authenticated users have access to. Once again the documents used to write policies are in either the HCL or JSON format. They are easy to write and apply, the only catch being that the policies associated with a token cannot be changed (added/removed) once the token has been issued; you need to revoke the token and apply the new policies. However, if you want to change the policy rules then this can be done on-the-fly, as modifications apply on the next call to Vault.

When we initialized the server we were given the initial root key and we now need to use that in order to start configuring the server.

(vault)$ export VAULT_TOKEN=$(egrep '^Initial Root Token:' /etc/vault/keys.txt | cut -f2- -d: | tr -d ' ')

We will create a simple policy that allows us to read the MySQL secrets, but prevents access to the system information and commands:

(vault)$ cat <<EOF > /etc/vault/demo-policy.hcl
path "sys/*" {
  policy = "deny"
}
path "secret/mysql/*" {
  policy = "read"
  capabilities = ["list", "sudo"]
}
EOF
(vault)$ vault policy-write demo /etc/vault/demo-policy.hcl
Policy 'demo' written.

We have only added one policy here, but you should really create as many policies as you need to suitably control access amongst the variety of humans and applications that may be using the service. As with any kind of data storage planning how to store your data is important, as it will help you write more compact policies with the level of granularity that you require. Writing everything in /secrets at the top level will most likely bring you headaches, or long policy definitions!


TLS authentication for MySQL secrets

We’re getting close to adding our first secret to Vault, but first of all we need a way to authenticate our access. Vault provides an API for access to your stored secrets, along with a wealth of commands via direct use of the vault binary as we are doing at the moment. We will now enable the cert authentication backend, which allows authentication using SSL/TLS client certificates.

(vault)$ vault auth-enable cert
Successfully enabled 'cert' at 'cert'!

Generate a client certificate using OpenSSL

The TLS authentication backend accepts certificates that are either signed by a CA or self-signed, so let’s quickly create ourselves a self-signed SSL certificate using openssl to use for authentication.

# Create working directory for SSL management and copy in the config
$ mkdir ~/.ssl && cd $_
$ cp /usr/lib/ssl/openssl.cnf .
# Create a 4096-bit CA
$ openssl genrsa -des3 -out ca.key 4096
Generating RSA private key, 4096 bit long modulus
...........++
..........................................................................++
e is 65537 (0x10001)
Enter pass phrase for ca.key:
Verifying - Enter pass phrase for ca.key:
$ openssl req -config ./openssl.cnf -new -x509 -days 365 -key ca.key -out ca.crt
Enter pass phrase for ca.key:
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) [Some-Place]:
Organization Name (eg, company) [Percona]:
Organizational Unit Name (eg, section) [Demo]:
Common Name (e.g. server FQDN or YOUR name) [ceri]:
Email Address [thisisnotme@myfirstdomain.com]:
# Create a 4096-bit Client Key and CSR
$ openssl genrsa -des3 -out client.key 4096
Generating RSA private key, 4096 bit long modulus
......................++
..................................++
e is 65537 (0x10001)
Enter pass phrase for client.key:
Verifying - Enter pass phrase for client.key:
$ openssl req -config ./openssl.cnf -new -key client.key -out client.csr
Enter pass phrase for client.key:
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) [Some-Place]:
Organization Name (eg, company) [Percona]:
Organizational Unit Name (eg, section) [Demo]:
Common Name (e.g. server FQDN or YOUR name) [ceri]:
Email Address [thisisnotme@myfirstdomain.com]:
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
# Self-sign
$ openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out client.crt
Signature ok
subject=/C=GB/ST=Some-State/L=Some-Place/O=Percona/OU=Demo/CN=ceri/emailAddress=thisisnotme@myfirstdomain.com
Getting CA Private Key
Enter pass phrase for ca.key:
# Create an unencrypted copy of the client key
$ openssl rsa -in client.key -out privkey.pem
Enter pass phrase for client.key:
writing RSA key
# Copy the certificate for Vault access
$ sudo cp client.crt /etc/ssl/vault/user.pem

OK, there was quite a lot of information there. You can edit openssl.cnf to set reasonable defaults for yourself and save time. In brief, we have created our own CA, created a self-signed certificate, and then created a single PEM certificate with a decrypted key (this avoids specifying the password to use it – you may wish to leave the password in place for added security, assuming that your client application can request the password).

Adding an authorisation certificate to Vault

Now that we have created a certificate and a policy we now need to allow authentication to occur using the certificate. We will give the token a 1-hour expiration and allow access to the MySQL secrets via the demo policy that we created in the previous step.

(vault)$ vault write auth/cert/certs/demo \
    display_name=demo \
    policies=demo \
    certificate=@${VAULT_SSL}/user.pem \
    ttl=3600
Success! Data written to: auth/cert/certs/demo
$ curl --cert user.pem --key privkey.pem ${VAULT_ADDR}/v1/auth/cert/login -X POST
{"request_id":"d5715ce1-2c6c-20c8-83ef-ce6259ad9110","lease_id":"","renewable":false,"lease_duration":0,"data":null,"wrap_info":null,"warnings":null,"auth":{"client_token":"e3b98fac-2676-9f44-fdc2-41114360d2fd","accessor":"4c5b4eb5-4faf-0b01-b732-39d309afd216","policies":["default","demo"],"metadata":{"authority_key_id":"","cert_name":"demo","common_name":"thisisnotme@myfirstdomain.com","subject_key_id":""},"lease_duration":600,"renewable":true}}

Awesome! We requested our first client token using an SSL client certificate, we are logged in, and we were given our access token (client_token) in the response, which provides us with a 1-hour lease (lease_duration) to go ahead and make requests as a client without reauthenticating. But there is nothing in the vault right now.


Ssshh!! It’s secret!

“The time has come,” the Vault master said, “to encrypt many things: our keys and passwords and top-secret notes, our MySQL DSNs and strings.”

Perhaps the easiest way to use Vault with your application is to store information there as you would do in a configuration file and read it when the application first requires it. An example of such information is the Data Source Name (DSN) for a MySQL connection, or perhaps the information needed to dynamically generate a .my.cnf. As this is about using Vault with MySQL we will do exactly that and store the user, password and connection method as our first secret, reading it back using the command line tool to check that it looks as expected.

(vault)$ vault write secret/mysql/test password="mysupersecretpassword" user="percona" socket="/var/run/mysqld/mysqld.sock"
Success! Data written to: secret/mysql/test
(vault)$ vault read secret/mysql/test
Key                     Value
---                     -----
refresh_interval        768h0m0s
password                mysupersecretpassword
socket                  /var/run/mysqld/mysqld.sock
user                    percona

A little while back (hopefully less than 1 hour ago!) we authenticated using cURL and gained a token, so now that we have something secret to read we can try it out. Fanfares and trumpets at the ready…

$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' -H 'X-Vault-Token: 2f1fb630-cbe9-a8c9-5931-515a12d79291' ${VAULT_ADDR}/v1/secret/mysql/test -X GET 2>/dev/null | json_pp
{
   "wrap_info" : null,
   "lease_id" : "",
   "request_id" : "c79033b1-f8f7-be89-4208-44d721a55804",
   "auth" : null,
   "data" : {
      "password" : "mysupersecretpassword",
      "socket" : "/var/run/mysqld/mysqld.sock",
      "user" : "percona"
   },
   "lease_duration" : 2764800,
   "renewable" : false,
   "warnings" : null
}

We did it! Now there is no longer any need to store passwords in your code or config files, you can just go and get them from Vault when you need them, such as when your application starts (holding them in memory), or on demand if your application can tolerate the additional latency. You would need to take further steps to make sure that your application is tolerant of Vault going down, as well as providing an HA setup of Vault to minimise the risk of the secrets being unavailable.
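
As an illustration, here is a minimal sketch of how an application might pull that secret at start-up, using the same client certificate and API endpoints as the cURL calls above. It assumes the requests and mysql-connector-python packages and leaves out error handling and the HA concerns just mentioned.

import requests
import mysql.connector

VAULT_ADDR = 'https://myfirstdomain.com:8200'
CLIENT_CERT = ('user.pem', 'privkey.pem')

# Log in with the TLS client certificate to obtain a client token
login = requests.post(VAULT_ADDR + '/v1/auth/cert/login', cert=CLIENT_CERT)
token = login.json()['auth']['client_token']

# Read the secret and use it to open the MySQL connection
secret = requests.get(VAULT_ADDR + '/v1/secret/mysql/test',
                      cert=CLIENT_CERT,
                      headers={'X-Vault-Token': token}).json()['data']

cnx = mysql.connector.connect(user=secret['user'],
                              password=secret['password'],
                              unix_socket=secret['socket'])
cnx.close()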

It doesn’t stop here though…


On-demand MySQL grants

Vault acts like a virtual filesystem and uses the generic storage backend by default, mounted as /secret, but thanks to its powerful abstraction it is possible to use many other backends as mountpoints, such as an SQL database, AWS IAM, HSMs and much more. We have kept things simple and been using the generic backend so far. You can view the available (mounted) backends using the mounts command:

(vault)$ vault mounts
Path        Type       Default TTL  Max TTL  Description
secret/     generic    system       system   generic secret storage
sys/        system     n/a          n/a      system endpoints used for control, policy and debugging

We are now going to enable the MySQL backend, add the management connection (which will use the auth_socket plugin) and then request a new MySQL user that will auto-expire!

# Create a dedicated MySQL user account
$ mysql -Bsse "CREATE USER vault@localhost IDENTIFIED WITH auth_socket; GRANT CREATE USER, SELECT, INSERT, UPDATE ON *.* TO vault@localhost WITH GRANT OPTION;"
# Enable the MySQL backend and set the connection details
(vault)$ vault mount mysql
(vault)$ vault write mysql/config/connection connection_url="vault:vault@unix(/var/run/mysqld/mysqld.sock)/"
Read access to this endpoint should be controlled via ACLs as it will return the connection URL as it is, including passwords, if any.
# Write the template for the readonly role
(vault)$ vault write mysql/roles/readonly \
    sql="CREATE USER '{{name}}'@'%' IDENTIFIED WITH mysql_native_password BY '{{password}}' PASSWORD EXPIRE INTERVAL 1 DAY; GRANT SELECT ON *.* TO '{{name}}'@'%';"
Success! Data written to: mysql/roles/readonly
# Set the lease on MySQL grants
(vault)$ vault write mysql/config/lease lease=1h lease_max=12h
Success! Data written to: mysql/config/lease

Here you can see that a template is created so that you can customise the grants per role. We created a readonly role, so it just has SELECT access. We have set an expiration on the account so that MySQL will automatically mark the password as expired and prevent access. This is not strictly necessary, since Vault will remove the user accounts that it created as it expires the tokens, but adding an extra level in MySQL allows you to set the Vault lease (which seems to be global) a little longer than required and vary the effective lifetime per role using MySQL password expiration. You could also use it as a way of tracking which Vault-generated MySQL accounts are going to expire soon. The important part is that you ensure that the application is tolerant of reauthentication, whether it hands off work whilst doing so, accepts added latency, or perhaps terminates and respawns.

Now we will authenticate and request our user to connect to the database with.

$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' ${VAULT_ADDR}/v1/auth/cert/login -X POST 2>/dev/null | json_pp
{
   "auth" : {
      "policies" : [
         "default",
         "demo"
      ],
      "accessor" : "2e6d4b95-3bf5-f459-cd27-f9e35b9bed16",
      "renewable" : true,
      "lease_duration" : 3600,
      "metadata" : {
         "common_name" : "thisisnotme@myfirstdomain.com",
         "cert_name" : "demo",
         "authority_key_id" : "",
         "subject_key_id" : ""
      },
      "client_token" : "018e6feb-65c4-49f2-ae30-e4fbba81e687"
   },
   "lease_id" : "",
   "wrap_info" : null,
   "renewable" : false,
   "data" : null,
   "request_id" : "f00fe669-4382-3f33-23ae-73cec0d02f39",
   "warnings" : null,
   "lease_duration" : 0
}
$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' -H 'X-Vault-Token: 018e6feb-65c4-49f2-ae30-e4fbba81e687' ${VAULT_ADDR}/v1/mysql/creds/readonly -X GET 2>/dev/null | json_pp
{
   "errors" : [
      "permission denied"
   ]
}

Oh, what happened? Well, remember the policy that we created earlier? We hadn’t allowed access to the MySQL role generator, so we need to update and apply the policy.

(vault)$ cat <<EOF | vault policy-write demo /dev/stdin
path "sys/*" {
  policy = "deny"
}
path "secret/mysql/*" {
  policy = "read"
  capabilities = ["list", "sudo"]
}
path "mysql/creds/readonly" {
  policy = "read"
  capabilities = ["list", "sudo"]
}
EOF
Policy 'demo' written.

Now that we have updated the policy to allow access to the readonly role (requests go via mysql/creds when requesting access) we can check that the policy has applied and whether we get a user account for MySQL.

# Request a user account
$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' -H 'X-Vault-Token: 018e6feb-65c4-49f2-ae30-e4fbba81e687' ${VAULT_ADDR}/v1/mysql/creds/readonly -X GET 2>/dev/null | json_pp
{
   "request_id" : "7b45c9a1-bc46-f410-7af2-18c8e91f43de",
   "lease_id" : "mysql/creds/readonly/c661426c-c739-5bdb-cb7a-f51f74e16634",
   "warnings" : null,
   "lease_duration" : 3600,
   "data" : {
      "password" : "099c8f2e-588d-80be-1e4c-3c2e20756ab4",
      "username" : "read-cert-401f2c"
   },
   "wrap_info" : null,
   "renewable" : true,
   "auth" : null
}
# Test MySQL access
$ mysql -h localhost -u read-cert-401f2c -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 17
Server version: 5.7.14-8-log Percona Server (GPL), Release '8', Revision '1f84ccd'
Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show grants;
+-----------------------------------------------+
| Grants for read-cert-401f2c@%                 |
+-----------------------------------------------+
| GRANT SELECT ON *.* TO 'read-cert-401f2c'@'%' |
+-----------------------------------------------+
1 row in set (0.00 sec)
# Display the full account information
$ pt-show-grants --only='read-cert-401f2c'@'%'
-- Grants dumped by pt-show-grants
-- Dumped from server Localhost via UNIX socket, MySQL 5.7.14-8-log at 2016-11-08 23:28:37
-- Grants for 'read-cert-401f2c'@'%'
CREATE USER IF NOT EXISTS 'read-cert-401f2c'@'%';
ALTER USER 'read-cert-401f2c'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*FF157E33408E1FBE707B5FF89C87A2D14E8430C2' REQUIRE NONE PASSWORD EXPIRE INTERVAL 1 DAY ACCOUNT UNLOCK;
GRANT SELECT ON *.* TO 'read-cert-401f2c'@'%';

Hurrah! Now we don’t even need to go and create a user, the application can get one when it needs one. We’ve made the account auto-expire so that the credentials are only valid for 1 day, regardless of Vault expiration, and we’ve also reduced the amount of time that the token is valid, so we’ve done a pretty good job of limiting the window of opportunity for any rogue activity.
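
For completeness, here is the same flow from application code – a sketch under the same assumptions as the earlier example (user.pem/privkey.pem client certificate, requests and mysql-connector-python installed): authenticate, ask the MySQL backend for a readonly account, and connect with the short-lived credentials.

import requests
import mysql.connector

VAULT_ADDR = 'https://myfirstdomain.com:8200'
CLIENT_CERT = ('user.pem', 'privkey.pem')

token = requests.post(VAULT_ADDR + '/v1/auth/cert/login',
                      cert=CLIENT_CERT).json()['auth']['client_token']

# Vault creates the MySQL account on the fly and returns its credentials
creds = requests.get(VAULT_ADDR + '/v1/mysql/creds/readonly',
                     cert=CLIENT_CERT,
                     headers={'X-Vault-Token': token}).json()

lease = creds['lease_duration']   # seconds before new credentials must be requested
cnx = mysql.connector.connect(host='127.0.0.1',
                              user=creds['data']['username'],
                              password=creds['data']['password'])
# ... run read-only queries, re-requesting credentials before the lease expires ...
cnx.close()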


We’ve covered quite a lot in this post, and some details have been left out to keep us on track. The online documentation for OpenSSL, Let’s Encrypt and Vault is pretty good, so you should be able to take a deeper dive should you wish to. Hopefully, this post has given a good enough introduction to Vault to get you interested and looking to test it out, as well as bringing the great Let’s Encrypt service to your attention so that there’s very little reason not to provide a secure online experience for your readers, customers and services.

by Ceri Williams at November 15, 2016 12:31 AM

November 14, 2016

Peter Zaitsev

MongoDB Through a MySQL Lens


This blog post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.

Delving into NoSQL coming from an exclusively SQL background can seem like a daunting task. I have worked with SQL in both small MySQL environments and large Oracle SQL environments. 

When is it a good choice?

MongoDB is an incredibly robust, scalable, and operator-friendly database solution. MongoDB is a good choice when your developers will also be responsible for the database environment. In small shops and startups, this might be the case. MongoDB stores information in BSON (binary JSON), a binary-encoded serialization of JSON-like documents, and you interact with that data using JSON (JavaScript Object Notation) documents. JSON is easily relatable to other programming languages, and many developers will already have experience with it.

MongoDB is also a good option when you expect a great deal of your traffic to be writes. This is not to say that MySQL does not have good options when dealing with write-heavy environments, but MongoDB handles this with relative ease. Facebook designed the RocksDB storage engine for write-heavy environments, and it performs well (with benchmark testing demonstrating this).

MongoDB is a good choice when you need a schemaless, or schema-flexible, data structure. MongoDB handles changes to your data organization with relative ease and grace. This is the selling point of NoSQL solutions. There have been many improvements in the MySQL world that make online schema changes possible, but the ease at which this is done in MongoDB has not yet been matched. The ability to create records without defining structure gives MongoDB added flexibility.
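
As a quick sketch of that flexibility (using the pymongo driver against a hypothetical local instance), two documents with different shapes can go into the same collection without any prior definition:

from pymongo import MongoClient

db = MongoClient('mongodb://localhost:27017/').test

# No CREATE TABLE equivalent needed; the collection appears on first insert
db.user.insert_one({'_id': 1, 'username': 'alice'})
db.user.insert_one({'_id': 2, 'username': 'bob', 'email': 'bob@example.com'})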

Another reason to choose MongoDB is its functionality with regards to replication setup, built-in sharding, and auto elections. Setting up a replicated environment in MongoDB is easy, and the auto-election process allows a secondary to take over in the event of a primary database failure. Built-in sharding allows for easy horizontal scaling, which can be more complicated to manage, set up and configure in a MySQL environment.

When should you choose something else?

MongoDB is a great choice for some use cases. It is also not a great choice for others. MongoDB might not be the right choice when your data is highly relational and structured. MongoDB does not support transactions, but it does provide atomicity at the document level. There are configuration considerations to make for a replicated environment with regard to write concern, but these come at the cost of performance. Write concern verifies that replicas have written the information. By default, MongoDB requests acknowledgment from the primary only, not from replicas. This can lead to consistency issues if there is a problem with the replicas.
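
For example, here is a sketch (pymongo, with placeholder host names) of requesting acknowledgment from a majority of replica set members instead of just the primary, trading some write latency for stronger durability guarantees:

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient('mongodb://mongo1,mongo2,mongo3/?replicaSet=rs0')
# Writes through this collection handle wait for a majority of members
coll = client.test.get_collection('user',
                                  write_concern=WriteConcern(w='majority'))
coll.insert_one({'username': 'carol'})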

How is the structure different?

Many concepts in the SQL world are relatable to the document structure of MongoDB. Let’s take a look at the high-level structure of a simple MongoDB environment to better understand how MongoDB is laid out.

The chart below, taken from MongoDB’s documentation, relates MySQL terminology to MongoDB terminology.

[Chart: MySQL-to-MongoDB terminology mapping]


By definition, MongoDB is a document store database. This chart gives you some idea of how that relates to the structure of MySQL or any SQL flavor. Instead of building a table and adding data, you can immediately insert documents into a collection without having to define a structure. This is one of the advantages in flexibility that MongoDB offers over MySQL. It is important to note that just because MongoDB offers this flexibility does not mean that organizing a highly functional production MongoDB database is effortless. Similar to choosing any database, thought should be put into the structure and goal of the database to avoid pitfalls down the line.

Another interesting note is the mongod process. This is a daemon that processes data requests, much the same as the mysqld process for MySQL. This is the process that listens for MongoDB requests, and manages access to the database. As with MySQL, there are a number of start-up options for the mongod process. One of the most important configuration options is --config, which specifies a config file to use for your mongod instance. Slightly different from MySQL, this file uses YAML formatting. Below is an example config file for MongoDB. Please note this is to demonstrate formatting. It isn’t optimized for any production database.

# mongod.conf, Percona Server for MongoDB
# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/
# Where and how to store data.
storage:
    dbPath: /var/lib/mongodb
    journal:
        enabled: true
    engine: rocksdb
# where to write logging data.
systemLog:
    destination: file
    logAppend: true
    path: /var/log/mongodb/mongod.log
processManagement:
    fork: true
    pidFilePath: /var/run/mongod.pid
# network interfaces
net:
    port: 27017
    bindIp: 127.0.0.1

NOTE: YAML formatting does not handle tabs. Use spaces to indent.

How is querying different?

Interacting with the database via the shell is also slightly different from SQL: MongoDB is queried using JSON. Again, this should be familiar to web developers, which is one of the appeals of using MongoDB. Below is an example of a query translated from SQL to MongoDB. We have a user table with just usernames and an associated ID.

In SQL:

select username from user where id = 2;

In MongoDB:

db.user.find({_id:2},{"username":1})

In the JSON format, we specify the user collection to query from and then the ID associated with the document we are interested in. Finally, the field is specified from which we want the value. The result of this query would be the username of the user that has an ID of 2.
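
The same lookup works from a driver too; here is a small sketch with pymongo against a hypothetical local instance, filtering on _id and projecting only the username field:

from pymongo import MongoClient

db = MongoClient('mongodb://localhost:27017/').test
doc = db.user.find_one({'_id': 2}, {'username': 1})
print(doc)   # the _id field is returned by default alongside username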

Final thoughts

MongoDB is not a silver bullet for your MySQL woes. As both databases continue to evolve, their weaknesses and strengths slowly start to blend together. Do not let the flexibility of MongoDB’s structure fool you into thinking that you don’t need a plan for your database environment. That will surely lead to headaches down the road. The flexibility should allow for dynamic, fast changes, not thoughtless ones. I encourage any MySQL user to get their hands on a MongoDB instance for testing purposes. MongoDB is a popular option in the e-commerce and gaming world because of its flexibility in schema design and its ability to scale horizontally with large amounts of data.

by Barrett Chambers at November 14, 2016 07:56 PM

November 12, 2016

Valeriy Kravchuk

MySQL Support Engineer's Chronicles, Issue #3

The original idea of this series was to publish one post per week, but it seems every other week I have some special topic that deserves a dedicated post. Last week I had no time to complete my writing because of a long (and, I hope, useful) howto post on replacing a corrupted partition using a non-corrupted one from another server in a replication setup. But I had links and notes collected in a draft that I am going to complete now.

First of all, during the previous week I had time to submit two more talks for the "MySQL and Friends Devroom" at FOSDEM 2017, "Applying profilers to MySQL" and "Background Story of a MySQL Bug". The call for papers is still open, as far as I understand, so I may come up with a few more ideas on what to talk about.

Strange to admit this, but sometimes I can be excited about something MySQL-related. Two weeks ago I added a note to myself about the great, detailed changelogs that MariaDB publishes, with proper links to GitHub commits. Check this one for 10.0.28, for example. I wish Oracle provided the same level of detail for MySQL releases as a part of their official release notes.

Still, sometimes important changes happen in upstream MySQL, get merged, and details about the inherited incompatible change (and its real impact) are still missing from release notes of any kind. Let's consider a recent example. Historically MySQL treated incorrect utf8 byte sequences differently for INSERT than for LOAD DATA: INSERT failed, while LOAD DATA just had the value truncated on the first incorrect character and continued. Eventually (in MySQL 5.6.32) this was fixed by upstream MySQL (it was also fixed in a wider context in MariaDB 10.2 as part of MDEV-9823). The MySQL 5.6.32 release notes say this about the incompatible change:
  • "Incompatible Change: For multibyte character sets, LOAD DATA could fail to allocate space correctly and ignore input rows as a result. (Bug #76237, Bug #20683959, Bug #23080148)
    References: This issue is a regression of: Bug #14653594."
But it surely says nothing about the impact on replication or about another Bug #78758 that is fixed in 5.6.32 (by making LOAD DATA fail with an error). It cost me some time to figure out all the details. Read MDEV-11217 for the historical details, a nice test case and different views on the decisions made. Note also that the error message about the bad utf8 character from LOAD DATA now looks (IMHO) weird enough, as it actually contains only the valid part of the string. See my MDEV-11216 about this.

I am still having fun with InnoDB locks. This week I checked what locks are set when the same DELETE (for the table with the primary key and unique secondary index) is executed twice in the same transaction. Check Bug #83640 and tell me how this sequence of locks set in one transaction may make any sense:
---TRANSACTION 636201, ACTIVE 202 sec
5 lock struct(s), heap size 1184, 4 row lock(s), undo log entries 1
MySQL thread id 1, OS thread handle 0x7f9e513a7700, query id 92 localhost root init
show engine innodb status
TABLE LOCK table `test`.`tu` trx id 636201 lock mode IX
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 3 n bits 72 index `PRIMARY` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks gap before rec
My colleague Jan Lindström was also surprised, so we have MDEV-11215 as well, and a chance to see this studied and maybe changed by MariaDB engineers. Related problems were discussed in the past, see Bug #19762 and Bug #55717.

Some days I keep wondering what is happening to XtraBackup these days at Percona. As far as I remember I was not able to reproduce lp:1461833 while working there, but the fact that the bug is still open and got no comments since I re-opened it gives me no additional confidence.

I report bugs and missing details in MySQL way too often even for my own liking. But the MySQL manual really misses many details needed to explain results that users see in production. This week I'd like to remind you about one of my bug reports about missing details in MySQL documentation, Bug #77390, and my request there:
"Please, explain all metadata and InnoDB locks set by online ALTER, with examples and details enough to explain non-trivial cases..."
Honestly, until this month I never noticed that a DDL log exists in MySQL. This is the ddl_log.log binary file that can be "dumped" into a somewhat readable form using a script by Mattias Jonsson from Bug #47343:
[openxs@fc23 5.7]$ perl ~/ddl_log_dump.pl data/ddl_log.log
Header: Num entries: 4 Name length: 512 Block length 4096
Entry 1 type i action s next 0
  name ./test/trange2
not 'e' entry (i)
Entry 2 type l action d next 0
  name ./test/#sql-trange2
not 'e' entry (l)
Entry 3 type l action d next 2
  name ./test/trange2#P#pmax
not 'e' entry (l)

This file may grow until MySQL server restart completes, but what's worse, when it grows over 4GB in size it becomes unusable and effectively blocks any further concurrent DDL until we get rid of it. I had a lot of fun reading the code and reporting related Bug #83708. Unlucky users who do a lot of partitioning-related DDL may find the situation less funny when they hit this bug.


I plan to describe what I had to work on this week soon, while I still remember all the relevant details and feelings. So, stay tuned!

by Valeriy Kravchuk (noreply@blogger.com) at November 12, 2016 05:10 PM

November 11, 2016

Peter Zaitsev

Amazon AWS Service Tiers

This blog post discusses the differences between the Amazon AWS service tiers.

Many people want to move to an Amazon environment but are unsure which AWS service makes the most sense (EC2, RDS, Aurora). For database services, the tiering at Amazon starts with EC2, then moves up to RDS, and on to Aurora. As you move up the tiers, Amazon takes on more of the implementation and management of the database. This limits the optimization options. Obviously, moving up the tiers increases basic costs, but there are tradeoffs at each level to consider.

  • EC2 (Elastic Compute Cloud) is a basic cloud platform. It provides the user with complete control of the compute environment, while reducing your need to monitor and manage hardware. From a database perspective, you can do almost anything in EC2 that you could do running a database on your own hardware. You can tweak OS and database settings, plus do all of the normal database optimization work you would do in a bare metal environment. In EC2, you can run a single server, master/slave, or a cluster, and you can use MySQL, MongoDB, or any other product. You can use AWS Snapshot Manager to take backups, or you can use another backup tool. This option is ideal if you want all the flexibility of running your own hardware without the hassles of daily hardware maintenance.
  • RDS (Relational Data Service) makes it easy to set up a relational database in the cloud. It offers similar resizing capabilities to EC2, but also automates a lot of tasks. RDS supports Aurora (more on that later), Postgres, MySQL, MariaDB, Oracle, and MSSQL. RDS simplifies deployment and automates some maintenance tasks. This means that you are limited in terms of the tweaks that you can implement at the OS and database configuration level, so you will focus on query and schema changes to optimize a database in this environment. RDS also includes automated backups and provides options for read replicas that you can spread across multiple availability zones. All of these are items you must consider and manage yourself in the EC2 world. This choice is great if you are looking to implement a database but don’t want (or know how) to take on a lot of the tasks, such as backups and replication setup, that are needed for a stable and highly available environment.
  • Aurora is one of the database options available through RDS. You might hear people refer to it either as Aurora or RDS Aurora (they’re both the same). With Aurora, Amazon takes on even more of the configuration and management options. This limits your optimization capabilities even more. It also means that there are far fewer things to worry about since Amazon handles so much of the administration. Aurora is MySQL-compatible, and is great if you want the power and convenience of MySQL with a minimum of effort on the hardware side. Aurora is designed to automatically detect database crashes and restart without the need for crash recovery or to rebuild the database cache. If the entire instance fails, Aurora will automatically failover to one of up to 15 read replicas.

With data in the cloud, security becomes a bigger concern. You continue to govern access to your content, platform, applications, systems, and networks, just like you would with data stored in your own datacenter. Amazon’s cloud offerings also support highly secure environments, like HIPAA and PCI compliance. They have designed the cloud environment to be a secure database environment while maintaining the necessary access for use and administration, even in these more regulated environments.

Storing data in the cloud is becoming more common. Amazon offers multiple platform options and allows for easy scalability, availability, and reliability.

by Rick Golba at November 11, 2016 08:33 PM

Is Your Query Cache Really Disabled?

This blog post was motivated by an internal discussion about how to fully disable query cache in MySQL.

According to the manual, we should be able to disable the “Query Cache” on the fly by changing query_cache_type to 0, but as we will show, this is not fully true. This blog will show you how to properly disable the query cache, and how common practices might not be as good as we think.

Can we just disable it by changing variables, or does it require a restart to avoid the global mutex? Let’s see how it works.

Some Query Cache context

The query cache stores the text of a “Select” statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.

But cacheable queries take out an “exclusive lock” on MySQL’s query cache. In addition, any insert, update, delete or other modifications to a table causes any relevant entries in the query cache to be flushed. If you see many “Waiting for query cache lock” in the processlist, you might be suffering from this exclusive lock. In this blog post, you can see how this global mutex in high concurrency can cause performance degradation.
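
A quick way to spot this contention is to count the threads currently stuck in that state. Below is a small sketch using mysql.connector (connection settings are placeholders):

import mysql.connector

cnx = mysql.connector.connect(user='root', host='127.0.0.1')
cur = cnx.cursor()
cur.execute("SELECT COUNT(*) FROM information_schema.PROCESSLIST "
            "WHERE STATE = 'Waiting for query cache lock'")
print(cur.fetchone()[0])   # threads currently waiting on the query cache lock
cnx.close()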

If we are facing this situation, how can we disable it?

Disabling Query Cache

There are two options that you can change: query_cache_type and query_cache_size.

So if we change query_cache_size to "0", does that mean the cache is disabled? Or do we also have to change query_cache_type? Or both? And does MySQL require a restart to avoid the global mutex?

The source code shows us this:

int Query_cache::send_result_to_client(THD *thd, const LEX_CSTRING &sql)
{
  ulonglong engine_data;
  Query_cache_query *query;
#ifndef EMBEDDED_LIBRARY
  Query_cache_block *first_result_block;
#endif
  Query_cache_block *result_block;
  Query_cache_block_table *block_table, *block_table_end;
  char *cache_key= NULL;
  size_t tot_length;
  Query_cache_query_flags flags;
  DBUG_ENTER("Query_cache::send_result_to_client");
  /*
    Testing 'query_cache_size' without a lock here is safe: the thing
    we may loose is that the query won't be served from cache, but we
    save on mutex locking in the case when query cache is disabled.
    See also a note on double-check locking usage above.
  */
  if (is_disabled() || thd->locked_tables_mode ||
      thd->variables.query_cache_type == 0 || query_cache_size == 0)
    goto err;
...

MySQL is going to check if the query cache is enabled before it locks it. It is checking four conditions, and one of them has to be true. The last three could be obvious, but what is the “is_disabled()” function? Following the source code, we find the following in sql_cache.h:

void disable_query_cache(void) { m_query_cache_is_disabled= TRUE; }
...
bool is_disabled(void) { return m_query_cache_is_disabled; }

sql_cache.cc

void Query_cache::init()
{
  DBUG_ENTER("Query_cache::init");
  mysql_mutex_init(key_structure_guard_mutex,
                   &structure_guard_mutex, MY_MUTEX_INIT_FAST);
  mysql_cond_init(key_COND_cache_status_changed,
                  &COND_cache_status_changed);
  m_cache_lock_status= Query_cache::UNLOCKED;
  initialized = 1;
  /*
    If we explicitly turn off query cache from the command line query cache will
    be disabled for the reminder of the server life time. This is because we
    want to avoid locking the QC specific mutex if query cache isn't going to
    be used.
  */
  if (global_system_variables.query_cache_type == 0)
    query_cache.disable_query_cache();
  DBUG_VOID_RETURN;
}

If the global_system_variables.query_cache_type == 0 condition is true, it is going to call the disable_query_cache function, which sets m_query_cache_is_disabled = True, so is_disabled is going to return “True”. That means that if we set query_cache_type to 0 at runtime, that should eliminate the global mutex. Let’s run some tests to confirm this and see if the global mutex disappears after changing query_cache_type to 0.

Running tests

Context on the tests:

  1. We ran simple OLTP tests using sysbench as follows:

sysbench --test="/usr/share/doc/sysbench/tests/db/oltp.lua" --report-interval=1 --max-time=120 --oltp-read-only=off --max-requests=0 --num-threads=4 --oltp-table-size=2000000 --mysql-host=localhost --mysql-db=test --db-driver=mysql --mysql-user=root run

  2. Important portion of my.cnf file:

query_cache_type =1
query_cache_limit = 1M
query_cache_size =1G
performance_schema_instrument='wait/synch/%Query_cache%=COUNTED'

Disable the Query Cache

So basically the tests were run for two minutes each while playing with query_cache_type and query_cache_size.
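
One way to watch the mutex between test runs is to read the counters collected by the wait/synch/%Query_cache% instrument enabled in my.cnf above – a small sketch (connection settings are placeholders):

import mysql.connector

cnx = mysql.connector.connect(user='root', host='127.0.0.1')
cur = cnx.cursor()
cur.execute("SELECT EVENT_NAME, COUNT_STAR "
            "FROM performance_schema.events_waits_summary_global_by_event_name "
            "WHERE EVENT_NAME LIKE '%Query_cache%'")
for name, count in cur.fetchall():
    print(name, count)     # growing counters mean the query cache mutex is still being hit
cnx.close()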

  1. Started MySQL with query_cache_type = 1 and query_cache_size=1G.
  2. Changed query_cache_type to 0. As we can see, nothing changed: MySQL is still using the query cache.
  3. But when we stopped sysbench and started it again (closing and opening new connections), we can see there are no more inserts going into the query cache. We still see queries counted as “Not Cached”, which means changing query_cache_type applies only to new connections, and we still see some mutex activity.
  4. Restarted MySQL with query_cache_type = 0 and query_cache_size=0. Finally we disabled the query cache and all the mutex activity disappeared.
  5. Restarted MySQL with the query cache enabled.
  6. We changed query_cache_size=0 and it almost worked; we could disable the query cache on the fly, but as we can see there is still some mutex activity.
  7. Changing query_cache_type=0 and restarting sysbench does not have any effect on the mutex.

So the only way to stop any activity around the query cache is to restart MySQL with query_cache_type = 0 and query_cache_size=0. Disabling it or even setting it to “0” at runtime does not completely stop mutex activity.

But why do we still need query_cache_size when, in theory, query_cache_type should be enough?

As referenced above, the manual says if query_cache_type = 0:

Do not cache results in or retrieve results from the query cache. Note that this does not deallocate the query cache buffer. To do that, you should set query_cache_size to 0.

Based on our test, if we change query_cache_type to 0, it still hits the cache.

So you might think “well, I don’t enable the query cache and use defaults to keep it disabled.” Keep reading, because you might be wrong. According to the manual, starting from 5.6.8 query_cache_type=0 is set by default, but query_cache_size=1048576 (1MB). This means that if we keep the default configuration, we will still see activity in the query cache, as follows:

mysql -e "show global status like 'qca%';"
+-------------------------+---------+
| Variable_name | Value |
+-------------------------+---------+
| Qcache_free_blocks | 1 |
| Qcache_free_memory | 1031320 |
| Qcache_hits | 0 |
| Qcache_inserts | 0 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 423294 |
| Qcache_queries_in_cache | 0 |
| Qcache_total_blocks | 1 |
+-------------------------+---------+

But if we just add query_cache_size=0 to my.cnf and check again (of course after restarting the server):

mysql -e "show global status like 'qca%';"
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Qcache_free_blocks | 0 |
| Qcache_free_memory | 0 |
| Qcache_hits | 0 |
| Qcache_inserts | 0 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 0 |
| Qcache_queries_in_cache | 0 |
| Qcache_total_blocks | 0 |
+-------------------------+-------+

We finally get no query-cache-related activity at all. How much overhead is caused by this? We’re not fully sure because we didn’t perform benchmarks, but we like to see no activity when we don’t want any.
Now we’re wondering if this case requires a bug report. Stay tuned, we will publish the results in a post soon.

Digging into more code

Let’s have a look at the store_query function. MySQL uses this function to store queries in the query cache. If we read the code we can find this:

if (thd->locked_tables_mode || query_cache_size == 0)
    DBUG_VOID_RETURN;

It only checks query_cache_size; it does not check the type. store_query is called in handle_query, which also does not check query_cache_type.

Conclusion

There is some contradiction between checking the query cache and storing the data in the query cache, which needs further investigation. But as we can see, it is not possible to fully disable the query cache on the fly by changing query_cache_type and/or query_cache_size to 0. Based on the code and the tests, if you want to make sure the query cache is fully disabled, change query_cache_size and query_cache_type to 0 and restart MySQL.

It is a known fact that the query cache can be a big point of contention, and we are not trying to benchmark the performance overhead since this mostly depends on the workload type. However, we can still see some overhead if the query cache is not fully disabled when MySQL is started.

by Tibor Korocz at November 11, 2016 06:46 PM

November 10, 2016

Peter Zaitsev

Thoughts About Column Compression, with Optional Predefined Dictionary

This blog discusses column compression with an optional predefined dictionary.

Compression, more compression with different algorithms, compress again, compress multiple times! 🙂 Compression is a hot topic in our lives.

In general, testing new things is great if the processes are well-described and easy to follow. Let’s try to think like a QA engineer: the first golden rule of QA is “everything is buggy, life is full of bugs: good night bugs, good morning bugs, hello my old bug friends.”

The second golden rule of QA is “OK, now let’s find a way to catch a bug — but remember that your methods can be buggy, too.”

Remember: always test! No bugs, no happiness!

When you start to test, the first goal is getting an idea of what is going on. This blog will demonstrate a test scenario for column compression with an optional predefined dictionary. For reference on column compression, read “Compressed columns with dictionaries.”

To begin, let’s set up a basic environment:
The installation process requires installing Percona Server, which is already documented here: PS 5.6 installation.

Secondly, find an already existing test: xtradb_compressed_columns_ibd_sizes.test.

Third, write a simple script to get started:

import mysql.connector
cnx = mysql.connector.connect(user='msandbox', password='msandbox',
                              host='127.0.0.1',
                              database='dbtest',
                              port=22896,
                              autocommit=True)
cursor = cnx.cursor()
crt_comp_dic = "CREATE COMPRESSION_DICTIONARY names2 ('Bartholomew')"
cursor.execute(crt_comp_dic)
table_t1 = "CREATE TABLE t1(id INT,a BLOB) ENGINE=InnoDB"
table_t2 = "CREATE TABLE t2(id INT,a BLOB COLUMN_FORMAT COMPRESSED) ENGINE=InnoDB"
table_t3 = "CREATE TABLE t3(id INT,a BLOB COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY names) ENGINE=InnoDB"
cursor.execute(table_t1);
cursor.execute(table_t2);
cursor.execute(table_t3);
insert_stmt = "insert into {} values({},repeat('Bartholomew', 128))"
for i in range(0, 100000):
	cursor.execute(insert_stmt.format('t1', int(i)))
	print insert_stmt.format('t1', int(i))
	cursor.execute(insert_stmt.format('t2', int(i)))
	print insert_stmt.format('t2', int(i))
	cursor.execute(insert_stmt.format('t3', int(i)))
	print insert_stmt.format('t3', int(i))
cursor.close()
cnx.close()

As you might notice, column compression can be used with or without a compression dictionary. The visible difference, of course, is in the size of the tables. If you want to compress columns based on a predefined dictionary, you should create it with frequently used data. It is possible to create an empty dictionary, but it will have no effect. (See here: #1628231.)

The result of running this script is:

Tables with 100,000 rows each:

t1 -> uncompressed
t2 -> compressed column
t3 -> compressed column with a compression dictionary (the ‘names’ dictionary)

Table size difference:

sh@sh-ubuntu:~/sandboxes/rsandbox_percona-server-5_6_31/master/data/dbtest$ ls -lth | grep .ibd
-rw-rw---- 1 sh sh 168M Sep 29 23:43 t1.ibd
-rw-rw---- 1 sh sh  15M Sep 29 23:43 t2.ibd
-rw-rw---- 1 sh sh  14M Sep 29 23:43 t3.ibd

After running an optimize table:

master [localhost] {msandbox} (dbtest) > optimize table t1;
+-----------+----------+----------+-------------------------------------------------------------------+
| Table     | Op       | Msg_type | Msg_text                                                          |
+-----------+----------+----------+-------------------------------------------------------------------+
| dbtest.t1 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| dbtest.t1 | optimize | status   | OK                                                                |
+-----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (1 min 35.88 sec)
master [localhost] {msandbox} (dbtest) > optimize table t2;
+-----------+----------+----------+-------------------------------------------------------------------+
| Table     | Op       | Msg_type | Msg_text                                                          |
+-----------+----------+----------+-------------------------------------------------------------------+
| dbtest.t2 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| dbtest.t2 | optimize | status   | OK                                                                |
+-----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (11.82 sec)
master [localhost] {msandbox} (dbtest) > optimize table t3;
+-----------+----------+----------+-------------------------------------------------------------------+
| Table     | Op       | Msg_type | Msg_text                                                          |
+-----------+----------+----------+-------------------------------------------------------------------+
| dbtest.t3 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| dbtest.t3 | optimize | status   | OK                                                                |
+-----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (7.89 sec)

The resulted size:

sh@sh-ubuntu:~/sandboxes/rsandbox_percona-server-5_6_31/master/data/dbtest$ ls -lh | grep .ibd
-rw-rw---- 1 sh sh 160M Sep 29 23:52 t1.ibd
-rw-rw---- 1 sh sh 8.0M Sep 29 23:52 t2.ibd
-rw-rw---- 1 sh sh 7.0M Sep 29 23:52 t3.ibd

I want more:

master [localhost] {msandbox} (dbtest) > alter table t1 row_format=compressed;
Query OK, 0 rows affected (2 min 38.85 sec)
Records: 0  Duplicates: 0  Warnings: 0
master [localhost] {msandbox} (dbtest) > alter table t2 row_format=compressed;
Query OK, 0 rows affected (14.41 sec)
Records: 0  Duplicates: 0  Warnings: 0
master [localhost] {msandbox} (dbtest) > alter table t3 row_format=compressed;
Query OK, 0 rows affected (10.74 sec)
Records: 0  Duplicates: 0  Warnings: 0

Using ROW_FORMAT=COMPRESSED requires innodb_file_format to be newer than Antelope. But this is not true for COLUMN_FORMAT.

Again, check the size:

sh@sh-ubuntu:~/sandboxes/rsandbox_percona-server-5_6_31/master/data/dbtest$ ls -lh | grep .ibd
-rw-rw---- 1 sh sh  76M Sep 29 23:57 t1.ibd
-rw-rw---- 1 sh sh 4.0M Sep 29 23:58 t2.ibd
-rw-rw---- 1 sh sh 4.0M Sep 29 23:58 t3.ibd

Question: How do I get information about column compression dictionaries and tables? Answer: from these information_schema tables:

master [localhost] {msandbox} ((none)) > SELECT * FROM information_schema.xtradb_zip_dict;
+----+--------+-------------+
| id | name   | zip_dict    |
+----+--------+-------------+
|  1 | names  | Bartholomew |
|  2 | names2 | Bartholomew |
+----+--------+-------------+
2 rows in set (0.00 sec)
master [localhost] {msandbox} ((none)) > SELECT * FROM information_schema.xtradb_zip_dict_cols;
+----------+------------+---------+
| table_id | column_pos | dict_id |
+----------+------------+---------+
|       67 |          1 |       1 |
+----------+------------+---------+
1 row in set (0.00 sec)

Question: How do I drop the compression dictionary? Answer: if it is in use, you will get:

master [localhost] {msandbox} (dbtest) > drop COMPRESSION_DICTIONARY `names`;
ERROR 1894 (HY000): Compression dictionary 'names' is in use

Before dropping it, make sure there are no tables using the dictionary. There is an extreme condition where you are unable to drop the dictionary (see #1628824).
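
One way to check whether a dictionary is still referenced before trying to drop it is to join the two information_schema tables shown above; below is a sketch using the same sandbox connection settings as the script earlier (any rows returned mean the dictionary is still in use):

import mysql.connector

cnx = mysql.connector.connect(user='msandbox', password='msandbox',
                              host='127.0.0.1', port=22896)
cur = cnx.cursor()
cur.execute("SELECT d.name, c.table_id, c.column_pos "
            "FROM information_schema.xtradb_zip_dict d "
            "JOIN information_schema.xtradb_zip_dict_cols c ON c.dict_id = d.id "
            "WHERE d.name = 'names'")
for row in cur.fetchall():
    print(row)
cnx.close()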

Question: Great! How about mysqldump? Answer: read here: mysqldump.

I might make this the topic of a dedicated post. Thanks for reading!

by Shahriyar Rzayev at November 10, 2016 11:05 PM

Jean-Jerome Schmidt

Planets9s - vidaXL chooses ClusterControl, scaling & sharding MongoDB & more!

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

vidaXL chooses ClusterControl to manage its MongoDB & MySQL databases

This week we’re happy to announce that we’re helping our customer vidaXL, a global e-commerce platform, compete with eBay and Amazon - and in doing so, keep its tills ringing. In their own words: “Our back-end is reliant on different MySQL & MongoDB databases to tackle different tasks. Using several different tools, rather than a one-stop shop, was detrimental to our productivity. Severalnines’ ClusterControl is that “one-stop shop” and we haven’t looked back. It’s an awesome solution like no other.”

Read the announcement

Live webinar next Tuesday on scaling & sharding MongoDB

Join us next Tuesday, November 15th, for this webinar, during which we’ll discuss how to plan your MongoDB scaling strategy up front. We’ll cover the differences between read and write scaling with MongoDB, explain read scaling considerations and read preference, and look at how sharding works in MongoDB and how to scale and shard MongoDB using ClusterControl. “See” you there!

Sign up for the webinar

HA on a Shoestring Budget - Deploying a Minimal Two Node MySQL Galera Cluster

As we regularly get questions on how to set up a Galera cluster with just 2 nodes, we published this blog post on why and how to go about that. The general consensus is that users should have at least 3 Galera nodes to avoid network partitioning. Yet there are some valid reasons for considering a 2 node deployment, e.g., if you want to achieve database high availability but have a limited budget to spend on a third database node. Or perhaps you are running Galera in a development/sandbox environment and prefer a minimal setup. Whatever the reasoning, here’s a handy quick-guide on how to go about it.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at November 10, 2016 07:53 PM

Peter Zaitsev

Database Daily Ops Series: GTID Replication

This post discusses ways of fixing broken GTID replication.

This blog series is all about the daily stories we have in Managed Services, dealing with customers’ environments (mostly when we need to quickly restore a service level within the SLA time).

One of the issues we encounter daily is broken replication when using the GTID protocol. While there are a lot of blogs written about this subject, I would like to highlight GTID replication operations and the way you can deal with broken replication.

Most of the time we face way more complex scenarios than the one I’m about to present as an example, but the main goal of this blog is to quickly highlight the tools that can be used to fix issues and resume replication.

After reading this blog, you might ask yourself “Now, we know how to fix replication, but what about consistency?” The next blog will be entirely focused on that matter, data consistency!

Little less talk, little more action…

Replication is broken, and the SHOW SLAVE STATUS command output looks like below:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000005
          Read_Master_Log_Pos: 879
               Relay_Log_File: mysqld-relay-bin.000009
                Relay_Log_Pos: 736
        Relay_Master_Log_File: mysqld-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1062
                   Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 634
              Relay_Log_Space: 1155
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1062
               Last_SQL_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 3
                  Master_UUID: 46fdb7ad-5852-11e6-92c9-0800274fb806
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 161108 16:47:53
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
                Auto_Position: 1
1 row in set (0.00 sec)

When a slave configured to replicate using the GTID protocol breaks, pay attention to the SHOW SLAVE STATUS command output. You will find Retrieved_Gtid_Set and Executed_Gtid_Set among the listed columns. You can see that the last global transaction ID retrieved from the current master was not executed (it appears in Retrieved_Gtid_Set but not in Executed_Gtid_Set, following the GTID format).

That means the slave has retrieved a transaction that, for some reason, it couldn’t execute. That’s the global transaction ID you need if you want to inject a fake transaction and get replication resumed. The fake transaction you inject takes the place of the one whose SQL could not be executed due to the error reported in Last_Error of SHOW SLAVE STATUS.
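
As a quick sanity check, you can let MySQL compute the difference between the two sets for the master's UUID (values taken from the SHOW SLAVE STATUS output above):

mysql> SELECT GTID_SUBTRACT('46fdb7ad-5852-11e6-92c9-0800274fb806:1-4', '46fdb7ad-5852-11e6-92c9-0800274fb806:1-3') AS missing;

This returns 46fdb7ad-5852-11e6-92c9-0800274fb806:4, i.e., exactly the transaction that broke replication.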

Let’s analyze it:
#: replication is broken due to error 1062, when the primary key of a particular table is violated
Last_Errno: 1062
Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'
 
#: here you can identify which global transaction ID is problematic, i.e., the one breaking the replication stream
           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1

As shown above, we can clearly see that the transaction causing issues is global transaction ID number 4, coming from the master with UUID 46fdb7ad-5852-11e6-92c9-0800274fb806. You can use SHOW RELAYLOG EVENTS to verify that this transaction’s query is the one causing problems:

mysql> show relaylog events in 'mysqld-relay-bin.000009' from 736\G
*************************** 1. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 736
 Event_type: Gtid
  Server_id: 3
End_log_pos: 682
       Info: SET @@SESSION.GTID_NEXT= '46fdb7ad-5852-11e6-92c9-0800274fb806:4'
*************************** 2. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 784
 Event_type: Query
  Server_id: 3
End_log_pos: 755
       Info: BEGIN
*************************** 3. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 857
 Event_type: Query
  Server_id: 3
End_log_pos: 848
       Info: insert into wb.t1 set i=1
*************************** 4. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 950
 Event_type: Xid
  Server_id: 3
End_log_pos: 879
       Info: COMMIT /* xid=66 */
4 rows in set (0.00 sec)

Before fixing and resuming the replication stream, we need to check why that INSERT query breaks replication. Let’s SELECT data and check the structure of table wb.t1:

mysql> select * from wb.t1;
+---+
| i |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
 
mysql> show create table wb.t1;
+-------+-----------------------------------------------------+
| Table | Create Table                                                                                                       |
+-------+-----------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `i` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-----------------------------------------------------+
1 row in set (0.01 sec)

It’s clear that something is wrong somewhere other than just in the database. It’s time to fix and resume replication, and catch up with the master’s data. To fix GTID replication you can use the tools below:

  • pt-slave-restart
  • mysqlslavetrx
  • inject a fake/empty transaction

pt-slave-restart

One of the easiest ways of resuming replication on slaves when replication is broken is using pt-slave-restart, which is part of Percona Toolkit. Once you find the above facts (mainly the master UUID of the problematic global transaction ID that broke slave replication), you can move forward using pt-slave-restart with the GTID flag --master-uuid. This passes the master’s UUID and it skips all global transactions breaking replication on a specific slave server, as you can see below:

[root@dbops02 ~]# pt-slave-restart --master-uuid 46fdb7ad-5852-11e6-92c9-0800274fb806 --host=localhost -u root
2016-11-08T17:24:09 h=localhost,u=root mysqld-relay-bin.000009         736 1062
2016-11-08T17:24:25 h=localhost,u=root mysqld-relay-bin.000010         491 1062
2016-11-08T17:24:34 h=localhost,u=root mysqld-relay-bin.000010         736 1062
2016-11-08T17:24:35 h=localhost,u=root mysqld-relay-bin.000010         981 1062
2016-11-08T17:24:36 h=localhost,u=root mysqld-relay-bin.000010        1226 1062

With the resources provided by pt-slave-restart, together with the above info, replication should resume. If you don’t have the Percona Toolkit package set up on your servers, make sure you follow these steps. It’s easier if you add the Percona repository to your servers (you can use the package manager to install it on both Debian-based and RedHat-based systems).

mysqlslavetrx

To use mysqlslavetrx (which is part of MySQL Utilities, developed by Oracle), I recommend you read the article written by Daniel Guzman and install MySQL Utilities on your database servers. Using it to skip problematic transactions and inject fake ones is pretty straightforward as well.

So, find the below on the slave side:

         Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-8,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1

Then use GTID_SUBTRACT, passing the full set you find in Retrieved_Gtid_Set as the first parameter and the subset you grab from Executed_Gtid_Set as the second. Use the UUID of the direct master in the function to find the global transaction IDs that were retrieved but not yet executed on the slave:

#: this function is pretty cool and shows you exactly which GTIDs the slave is lacking
#: compared to the master (and vice-versa) - this helps when using mysqlslavetrx to bring
#: master and slave to a consistent state with the same binary log contents
mysql> SELECT GTID_SUBTRACT('46fdb7ad-5852-11e6-92c9-0800274fb806:1-13','46fdb7ad-5852-11e6-92c9-0800274fb806:1-8') gap;
*************************** 1. row ***************************
gap: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13
1 row in set (0.00 sec)

Now we can use mysqlslavetrx to inject empty transactions on the slave and resume replication, as below:

 [root@dbops02 mysql-utilities-1.6.2]# mysqlslavetrx --gtid-set=46fdb7ad-5852-11e6-92c9-0800274fb806:9-13 --verbose --slaves=wb@localhost:3306
WARNING: Using a password on the command line interface can be insecure.
#
# GTID set to be skipped for each server:
# - localhost@3306: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13
#
# Injecting empty transactions for 'localhost:3306'...
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:9
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:10
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:11
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:12
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:13
#
#...done.
#

When you get back to the MySQL client on the slave, you’ll see that Retrieved_Gtid_Set and Executed_Gtid_Set in SHOW SLAVE STATUS now point to the same position:

           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1

Afterwards, make sure you start the slave (mysqlslavetrx won’t start replication for you, as the previous tool does).

Inject a Fake Transaction

Fake transactions are also called empty transactions. If a global transaction breaks a slave, you can inject an empty transaction, which won’t affect the data, to resume replication and carry on processing the stream of events coming from the master (a.k.a. replication!). We need to make sure it’s not going to affect future slaves, especially if this server becomes a new master after a failover/switchover process. You can get more information about Errant Transactions here and here.

mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)
 
mysql> set gtid_next='46fdb7ad-5852-11e6-92c9-0800274fb806:14';
Query OK, 0 rows affected (0.00 sec)
 
mysql> begin; commit;
Query OK, 0 rows affected (0.00 sec)
 
Query OK, 0 rows affected (0.00 sec)
 
mysql> set gtid_next=automatic;
Query OK, 0 rows affected (0.00 sec)

Now, when you check Retrieved_Gtid_Set and Executed_Gtid_Set in the SHOW SLAVE STATUS output, you can see the below:

           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1

It’s time to start slave (and be happy)!

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000005
          Read_Master_Log_Pos: 3329
               Relay_Log_File: mysqld-relay-bin.000011
                Relay_Log_Pos: 491
        Relay_Master_Log_File: mysqld-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 3329
              Relay_Log_Space: 3486
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 3
                  Master_UUID: 46fdb7ad-5852-11e6-92c9-0800274fb806
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
                Auto_Position: 1
1 row in set (0.00 sec)

Cheers!

by Wagner Bianchi at November 10, 2016 05:19 PM

Colin Charles

CfP for Percona Live Santa Clara closes November 13!

At Percona Live Amsterdam recently, the conference expanded beyond just its focus areas of MySQL & its ecosystem and MongoDB to also include PostgreSQL and other open source databases (just look at the recent poll). The event was a sold out success.

This will continue for Percona Live Santa Clara 2017, happening April 24-27 2017 – and the call for papers is open till November 13 2016, so what are you waiting for? Submit already!

I am on the conference committee and am looking forward to making the best program possible. Looking forward to your submissions!

by Colin Charles at November 10, 2016 04:49 PM

November 09, 2016

Jean-Jerome Schmidt

We’re keeping the tills ringing at eCommerce platform vidaXL

ClusterControl helps vidaXL compete with the world's largest e-commerce platforms by managing its MongoDB & MySQL databases.

Press Release: everywhere around the world, November 9th 2016 - today we announced vidaXL, an international eCommerce platform where you can “live it up for less”, as our latest customer. ClusterControl was deployed to help manage vidaXL’s polyglot database architecture, which consists of SQL and NoSQL database solutions to handle specific tasks within the enterprise.

vidaXL caters to the product hunters, offering items for inside and outside the home at competitive prices. With a catalogue of currently over 20,000 products to choose from and selling directly in 29 countries, it has a huge task of managing and updating the database its consumers rely on to fulfil their orders. With 200,000 orders monthly, vidaXL is one of the largest international e-retailers.

The eCommerce company is growing and aims to expand its product catalogue to over 10,000,000 items within the next 12 months. This extremely large selection of goods creates a wealth of new data; images alone in the catalogue account for roughly 100 terabytes of data, and the product rows for between one and two terabytes. The increase in data originally required vidaXL to hire more database administrators (DBAs), but it searched for a more cost-effective solution instead.

ClusterControl was deployed to manage the database systems. As scaling was an issue for vidaXL, particularly the horizontal scaling of its servers, ClusterControl as a single platform replaced the need for a combination of tools and the sometimes unreliable command line control. The ClusterControl deployment took around one week to implement, with no extra support required from Severalnines.

ClusterControl is easily integrated within a polyglot framework, managing different databases with the same efficiency. vidaXL is using several different databases: MongoDB and MySQL for product and customer listings, along with ElasticSearch for its real-time search capabilities. ClusterControl was plugged in to automate management and give control over scaling of MongoDB and MySQL. The operations team also leveraged it for proactive reporting.

Zeger Knops, Head of Business Technology, vidaXL said, “We’re looking to grow exponentially in the near future with the products we offer and maintain our position as the world’s largest eCommerce operator. This means we cannot suffer any online outages which lead to a loss of revenue. Scaling from thousands to millions of products is a giant leap and that will require us to have a strong infrastructure foundation. Our back-end is reliant on different databases to tackle different tasks. Using several different tools, rather than a one-stop shop, was detrimental to our productivity. Severalnines is that “shop” and we haven’t looked back. It’s an awesome solution like no other.”

Vinay Joosery, Severalnines CEO, added, “As we head towards the busy end of the year for retailers with Cyber Monday just around the corner, a product catalogue of VidaXL’s size requires strong database management skills and technologies. Keeping operations online and supplying people with their required orders is key. We trust that VidaXL will continue to reap the benefits of ClusterControl as it grows.”

About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skill levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. The company has enabled over 8,000 deployments to date via its popular ClusterControl product. Its customers currently include BT, Orange, Cisco, CNRS, Technicolor, AVG, Ping Identity and Paytrail. Severalnines is a private company headquartered in Stockholm, Sweden, with offices in Singapore and Tokyo, Japan. To see who is using Severalnines today, visit http://www.severalnines.com/company.

by Severalnines at November 09, 2016 10:37 PM

Peter Zaitsev

Orchestrator and ProxySQL

In this blog post, I am going to show you how you can use Orchestrator and ProxySQL together.

In my previous blog post, I showed how to use bash scripts and move virtual IPs with Orchestrator. As in that post, I assume you already have Orchestrator working. If not, you can find the installation steps here.

In the case of a failover, Orchestrator changes the MySQL topology and promotes a new master. But who lets the application know about this change? This is where ProxySQL helps us.

ProxySQL

You can find the ProxySQL install steps here. In our test, we use the following topology:

(Topology diagram: screen-shot-2016-11-01-at-14-27-09)

For this topology, we need the following rules in ProxySQL:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.106',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.105',601,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.105',600,3306,1000,0);
INSERT INTO mysql_replication_hostgroups VALUES (600,601,'');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
insert into mysql_query_rules (username,destination_hostgroup,active) values('testuser_w',600,1);
insert into mysql_query_rules (username,destination_hostgroup,active) values('testuser_r',601,1);
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('testuser_rw',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('testuser_w','Testpass1.',1,600,'test',1);
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('testuser_r','Testpass1.',1,601,'test',1);
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('testuser_rw','Testpass1.',1,600,'test',1);
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;

See the connection pool:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host       | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.56.105 | 3306     | ONLINE | 4        | 0        | 4      | 0       | 2833    | 224351          | 0               | 3242       |
| 601       | 192.168.56.107 | 3306     | ONLINE | 1        | 1        | 11     | 0       | 275443  | 11785750        | 766914785       | 431        |
| 601       | 192.168.56.106 | 3306     | ONLINE | 1        | 1        | 10     | 0       | 262509  | 11182777        | 712120599       | 1343       |
| 601       | 192.168.56.105 | 3306     | ONLINE | 1        | 1        | 2      | 0       | 40598   | 1733059         | 111830195       | 3242       |
+-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
4 rows in set (0.00 sec)

It shows us that “192.168.56.105” is in hostgroup 600, which means that server is the master.

How does ProxySQL decide who the new master is?

ProxySQL does not know what the topology looks like, which is really important to keep in mind. ProxySQL monitors the read_only variable on the MySQL servers, and the server where read_only=off is the one that gets the writes. If the old master went down and we changed our topology, we have to change the read_only variables on the new master. Of course, applications like MHA or Orchestrator can do that for us.
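
If you want to see what ProxySQL itself has observed, the monitor module logs the read_only checks it performs (a hedged sketch against the ProxySQL admin interface, assuming the monitor schema of ProxySQL 1.x; names may differ in other versions):

-- last few read_only checks recorded by the ProxySQL monitor
SELECT hostname, port, read_only, error
FROM monitor.mysql_server_read_only_log
ORDER BY time_start_us DESC LIMIT 5;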

We have two possibilities here: the master went down, or we want to promote a new master.

Master is down

If the master goes down, Orchestrator is going to change the topology and set read_only=OFF on the promoted master. ProxySQL is going to realize the master went down and send the write traffic to the server where read_only=OFF.

Let’s do a test. After we stopped MySQL on “192.168.56.105”, Orchestrator promoted “192.168.56.106” as the new master. ProxySQL is using it now as a master:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host       | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.56.106 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 790        |
| 601       | 192.168.56.107 | 3306     | ONLINE  | 0        | 0        | 13     | 0       | 277953  | 11894400        | 774312665       | 445        |
| 601       | 192.168.56.106 | 3306     | ONLINE  | 0        | 0        | 10     | 0       | 265056  | 11290802        | 718935768       | 790        |
| 601       | 192.168.56.105 | 3306     | SHUNNED | 0        | 0        | 2      | 0       | 42961   | 1833016         | 117959313       | 355        |
+-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
4 rows in set (0.00 sec)

This happens quickly and does not require any application, VIP or DNS modification.

Promoting a new Master

When we perform a graceful-master-takeover with Orchestrator, it promotes a slave as the new master, removes the old master from the replicaset and sets read_only=ON on the old master.

From Orchestrator’s point of view, this is great. It promoted a slave as the new master, and the old master is not part of the replicaset anymore. But as I mentioned earlier, ProxySQL does not know what the replicaset looks like.

It only knows we changed the read_only variables on some servers. It is going to keep sending reads to the old master, but that server does not have up-to-date data anymore. This is not good at all.

We have two options to avoid this.

Remove master from read hostgroup

If the master is not part of the read hostgroup, ProxySQL won’t send any read traffic there after we promote a new master. But in this case, if we lose the slaves, ProxySQL cannot redirect the reads to the master. If we have a lot of slaves and replication stopped on them because of an error or mistake, the master probably won’t be able to handle all the read traffic anyway. But if we only have a few slaves, it would be good if the master could also handle reads when there is an issue on the slaves.
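
In practice, this option just means not keeping a row for the master in the reader hostgroup in the admin interface (a minimal sketch, based on the hostgroups used in this post):

-- keep the master only in the writer hostgroup 600
DELETE FROM mysql_servers WHERE hostname='192.168.56.105' AND hostgroup_id=601;
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;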

Using Scheduler

In this great blog post from Marco Tusa, we can see that ProxySQL can use “Schedulers”. We can use the same idea here as well. I wrote a script based on Marco’s that can recognize if the old master is no longer a part of the replicaset.

The script checks the following:

  • read_only=ON – the server is read-only (on the slave servers, this has to be ON)
  • repl_lag is NULL – on the master, this should be NULL (if seconds_behind_master is not defined, ProxySQL will report repl_lag as NULL)

If read_only=ON, it means the server is not the master at the moment. But if repl_lag is NULL, it means the server is not replicating from anywhere, and it probably was a master. It has to be removed from the read hostgroup.
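
When both conditions match, the change the script needs to make boils down to a single admin statement (a simplified sketch; the actual script does more bookkeeping, and 9601 is the hostgroup this setup ends up using, as shown further below):

-- move the old master out of the read hostgroup so it stops receiving reads
UPDATE mysql_servers SET hostgroup_id=9601 WHERE hostname='192.168.56.105' AND hostgroup_id=601;
LOAD MYSQL SERVERS TO RUNTIME;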

Adding a Scheduler

INSERT  INTO scheduler (id,interval_ms,filename,arg1) values (10,2000,"/var/lib/proxysql/server_monitor.pl","-u=admin -p=admin -h=127.0.0.1 -G=601 -P=6032 --debug=0  --log=/var/lib/proxysql/server_check");
LOAD SCHEDULER TO RUNTIME;SAVE SCHEDULER TO DISK;

The script takes parameters like username, password and port, but we also have to define the read hostgroup (-G).

Let’s see what happens with ProxySQL after we run the command orchestrator -c graceful-master-takeover -i rep1 -d rep2:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host       | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.56.106 | 3306     | ONLINE       | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 504        |
| 601       | 192.168.56.107 | 3306     | ONLINE       | 0        | 2        | 2      | 0       | 6784    | 238075          | 2175559         | 454        |
| 601       | 192.168.56.106 | 3306     | ONLINE       | 0        | 0        | 2      | 0       | 6761    | 237409          | 2147005         | 504        |
| 601       | 192.168.56.105 | 3306     | OFFLINE_HARD | 0        | 0        | 2      | 0       | 6170    | 216001          | 0               | 435        |
+-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
4 rows in set (0.00 sec)

As we can see, the status changed to OFFLINE_HARD:

mysql> select * from mysql_servers;
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname       | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 601          | 192.168.56.107 | 3306 | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
| 601          | 192.168.56.106 | 3306 | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
| 9601         | 192.168.56.105 | 3306 | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              |         |
| 600          | 192.168.56.106 | 3306 | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.00 sec)

This is because the script changed the hostgroup_id to 9601. This is what we wanted, so that the old master won’t get more traffic.

Conclusion

Because ProxySQL redirects the traffic based on the read_only variables, it is important to start the servers with read_only=ON (even on the master). In that case, we can avoid getting writes on many servers at the same time.

If we want to use graceful-master-takeover with Orchestrator, we have to use a scheduler that can remove the old master from the read hostgroup.

by Tibor Korocz at November 09, 2016 08:19 PM

November 08, 2016

Jean-Jerome Schmidt

Join our live webinar on how to scale and shard MongoDB

We’re live next Tuesday, November 15th, with our webinar ‘Become a MongoDB DBA - Scaling and Sharding’!

Join us and learn about the three components necessary for MongoDB sharding. We’ll also share a read scaling considerations checklist as well as tips & tricks for finding the right shard key for MongoDB.

Overall, we’ll discuss how to plan your MongoDB scaling strategy up front and how to prevent ending up with unusable secondary nodes and shards. And we’ll look at how to leverage ClusterControl’s MongoDB scaling and shards management capabilities.

Sign up below!

Date, Time & Registration

Europe/MEA/APAC

Tuesday, November 15th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)
Register Now

North America/LatAm

Tuesday, November 15th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

Agenda

  • What are the differences in read and write scaling with MongoDB
  • Read scaling considerations with MongoDB
  • MongoDB read preference explained
  • How sharding works in MongoDB
  • Adding new shards and balancing data
  • How to scale and shard MongoDB using ClusterControl
  • Live Demo

Speaker

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic database expert with over 16 years experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad vision upon the whole database environment: from MySQL to MongoDB, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, MongoDB Open House, FOSDEM) and related meetups.

We look forward to “seeing” you there!

This session is based upon the experience we have using MongoDB and implementing it for our database infrastructure management solution, ClusterControl. For more details, read through our ‘Become a MongoDB DBA’ blog series.

by Severalnines at November 08, 2016 10:42 PM

Peter Zaitsev

Checking if a Slave Has Applied a Transaction from the Master

In this blog post, we will discuss how we can verify if an application transaction executed on the master has been applied to the slaves.

In summary, it is a good practice to alleviate the load on the master by doing reads on the slaves. In most cases it is acceptable to just connect to the slaves and issue SELECTs. But in some cases we need to ensure that the data we just applied on our master has been applied on the slaves before we query it.

One way to do this is using a built-in function called MASTER_POS_WAIT. This function receives a binary log name and position. It will block the query until the slave applies transactions up to that point, or until a timeout is reached. Here is one example of how to use it:

-- insert our data on master
master [localhost] {msandbox} (test) > INSERT INTO test VALUES ();
Query OK, 1 row affected (0.00 sec)
-- get the binlog file and position from master
master [localhost] {msandbox} (test) > SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000005 | 1591 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
-- connect on slave and run MASTER_POS_WAIT passing the binlog name and position
slave [localhost] {msandbox} ((none)) > SELECT NOW(); SELECT MASTER_POS_WAIT('mysql-bin.000005', 1591); SELECT NOW();
+---------------------+
| NOW() |
+---------------------+
| 2016-10-20 18:24:24 |
+---------------------+
1 row in set (0.00 sec)
-- it will wait until the slave apply up to that point
+-------------------------------------------+
| MASTER_POS_WAIT('mysql-bin.000005', 1591) |
+-------------------------------------------+
| 1 |
+-------------------------------------------+
1 row in set (3.82 sec)
+---------------------+
| NOW() |
+---------------------+
| 2016-10-20 18:24:28 |
+---------------------+
1 row in set (0.00 sec)

Blocking the connection until the slave is in sync with the coordinates passed as parameters to MASTER_POS_WAIT might not be acceptable for all applications, however.
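
One mitigation is the optional third argument of MASTER_POS_WAIT, a timeout in seconds: the function then returns -1 instead of blocking indefinitely if the slave does not catch up in time (a small sketch, not part of the original example):

-- give up after 5 seconds instead of waiting forever
slave [localhost] {msandbox} ((none)) > SELECT MASTER_POS_WAIT('mysql-bin.000005', 1591, 5);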

As an alternative, MySQL 5.6+ can make use of the relay_log_info_repository configuration option. If we set this variable to TABLE, MySQL stores the slave status information in the slave_relay_log_info table under the mysql database. We must also configure the sync_relay_log_info variable and set it to 1 in case we use non-transactional tables such as MyISAM. This forces slave_relay_log_info to be synced after each statement. So edit my.cnf on the slaves:

relay_log_info_repository=TABLE
sync_relay_log_info=1

Now we can query slave_relay_log_info directly to see if the slave we are connected to has already applied the transaction we need:

master [localhost] {msandbox} (test) > INSERT INTO test VALUES (NULL);
Query OK, 1 row affected (0.00 sec)
master [localhost] {msandbox} (test) > SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000003 | 366 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
slave1 [localhost] {msandbox} ((none)) > SELECT COUNT(*) FROM mysql.slave_relay_log_info WHERE (Master_log_name > 'mysql-bin.000003') OR ( Master_log_name = 'mysql-bin.000003' AND Master_log_pos >= '366' );
+----------+
| COUNT(*) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)

Conclusion

You can use relay_log_info_repository as a replacement for MASTER_POS_WAIT to check whether a slave has applied a particular transaction. Since it won’t block your thread (in case the slave is not in sync), you will be able to either abort the operation or disconnect and move on to the next slave.

by Marcelo Altmann at November 08, 2016 09:08 PM

Percona Live 2017 Call for Papers Extended Deadline: November 13th

We’ve extended the Percona Live Open Source Database Conference 2017 call for papers deadline until November 13th!

If you haven’t submitted already, please consider doing so. Speaking at Percona Live is a great way to build your personal and company brands, and if selected you will receive a complimentary full conference pass! For Percona Live 2017, we’re not just looking for MySQL and MongoDB topics, but also talks on other open source databases. 

The Percona Live 2017 Call for Papers is open until November 13, 2016. Do you have a MySQL, MongoDB, PostgreSQL or open source database use case to share, a skill to teach, or a big idea to discuss? We invite you to submit your speaking proposal for either breakout or tutorial sessions. This conference provides an opportunity to network with peers and technology professionals. It brings together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience.

Percona Live 2017 is looking for topics for Breakout Sessions, Tutorial Sessions, and Lightning Talks:

  • Breakout Session. Make submissions detailed, and clearly indicate the topic and content of your proposal for the Conference Committee. Sessions should either be 25 minutes or 50 minutes in length, including Q&A.
  • Tutorial Session. Make submissions detailed, and include an agenda for review by the Conference Committee. Tutorial sessions should present immediate and practical applications of in-depth knowledge of MySQL, MongoDB and open source database technologies. They should be presented at a level between a training class and a conference breakout session. Attendees are expected to have their laptops to work through detailed and potentially hands-on presentations. Tutorials will be 3 hours in length including Q&A. If you would like to submit your proposal as a full day, 6-hour tutorial, please indicate this in your submission.
  • Lightning Talks. Lightning talks are five-minute presentations focusing on one key point that will be of interest to the community. Talks can be technical, lighthearted, fun or otherwise entertaining submissions. These can include new ideas, a successful project, a cautionary story, quick tip or demonstration. This session is an opportunity for ideas to get the attention they deserve. The rules for this session are easy: five minutes and only five minutes. Use this time wisely to present the pertinent message of the subject matter and have fun doing so!

Submit your topics as soon as you can, the period closes on November 13, 2016!

Percona Live Open Source Database Conference 2017: Santa Clara, CA

The Percona Live Open Source Database Conference 2017 is the premier event for the diverse and active open source database community, as well as organizations that develop and use open source database software.

The conference will feature one day of tutorials and three days of keynote talks and breakout sessions related to open source databases and software. Learn about the hottest topics, building and maintaining high-performing deployments and what top industry leaders have to say.

The Percona Live Open Source Database Conference 2017 is April 24th – 27th, at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Register for Percona Live 2017 now! Super Saver registration lasts until Nov 13. This year’s Percona Live Europe sold out, and we’re looking to do the same for Percona Live 2017. Don’t miss your chance to get your ticket at its most affordable price. Click here to register.

Percona Live 2017 sponsorship opportunities are available now. Click here to find out how to sponsor.

by Kortney Runyan at November 08, 2016 06:01 PM

Valeriy Kravchuk

How to Recover Corrupted InnoDB Partition Tablespace in Replication Setup

This week I've got a question that sounded basically like this:
"Is it possible to just copy the entire partition from the replicated server?"
Let me share some background story. As sometimes happens, a user had a huge table with many partitions, let's say hundreds of gigabytes in size each, and one of them unfortunately got corrupted. It happened on the master in a replication setup, but luckily they had used innodb_file_per_table=1 and they had a slave that was more or less in sync with the master. This allowed them to reconfigure replication and continue to work, but the task remained to eventually put the master back in use and get correct data into the corrupted partition. Let's assume that dumping and reloading data from one of the instances in the replication setup is not a desired option, as it would take too much time compared to just copying the partition tablespace file. Hence the question above...
Side note: let's assume for simplicity that the corrupted partition is not getting changes at the moment, but even if it were, we could theoretically replay them (by careful processing of the binary logs, for example).
My quick and simple answer was that surely it is possible with InnoDB from MySQL 5.6 on, and there is a great blog post that provides a lot of relevant details and steps, this one by my former Percona colleague Jervin Real. I remember I had to deal with orphaned or corrupted partitions more than once in the past, while working for Percona. It is NOT possible to import them directly in MySQL 5.6, and I've even reported this as Bug #70196, but in 5.7.4+, after the fix for Bug #52422, it is possible. Anyway, the nice partition exchange feature, as Jervin explained, helps to overcome this limitation easily in MySQL 5.6 as well.
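
The core of that trick is MySQL 5.6's transportable tablespaces combined with ALTER TABLE ... EXCHANGE PARTITION, roughly along these lines (a very rough sketch with hypothetical names source_table, damaged_table, clean_copy and pN; see Jervin's post for the full procedure):

-- on the healthy server: swap the good partition into a plain table of the same structure, then export it
ALTER TABLE source_table EXCHANGE PARTITION pN WITH TABLE clean_copy;
FLUSH TABLES clean_copy FOR EXPORT;
-- copy clean_copy.ibd (and the .cfg file) to the server being repaired, then UNLOCK TABLES;
-- (exchange the partition back on the healthy server afterwards so it keeps its data)
-- on the server being repaired:
ALTER TABLE clean_copy DISCARD TABLESPACE;
-- put the copied clean_copy.ibd in place under the schema directory, then:
ALTER TABLE clean_copy IMPORT TABLESPACE;
ALTER TABLE damaged_table EXCHANGE PARTITION pN WITH TABLE clean_copy;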

The user decided to try this, but surely dealing with corrupted InnoDB tables is not as easy as with non-corrupted ones. There are many things that may not go as expected and require additional steps. For example, partition exchange is hardly possible for a table with a corrupted partition - I've suggested doing it on a clean table without any corruption.

I still felt that things might go wrong, so I decided to make a test and show step by step how it works and what may go wrong in the process. To do this I used one of the replication sandboxes at hand, with MySQL 5.6.28, and, after making sure replication works on both slaves, created a simple partitioned table on the master with the following statements:
create table tp(id int, val int, ts datetime, primary key(id, ts)) partition by range (year(ts)) (partition p0 values less than (2006), partition p1 values less than (2015), partition px values less than maxvalue);
insert into tp values(1, 10, '2005-10-01');
insert into tp values(2, 10, '2014-10-02');
insert into tp values(3, 30, now());
This is what I've got on master:
master [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
|  2 |   10 | 2014-10-02 00:00:00 |
|  3 |   30 | 2016-11-05 17:34:30 |
+----+------+---------------------+
3 rows in set (0,00 sec)

master [localhost] {msandbox} (test) > select * from tp partition(px);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  3 |   30 | 2016-11-05 17:34:30 |
+----+------+---------------------+
1 row in set (0,00 sec)

master [localhost] {msandbox} (test) > select * from tp partition(p0);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)

master [localhost] {msandbox} (test) > select * from tp partition(p1);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  2 |   10 | 2014-10-02 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)

Then I checked that the data was replicated to the slave that I am going to corrupt soon:
slave1 [localhost] {msandbox} (test) > select * from tp partition(p1);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  2 |   10 | 2014-10-02 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)
It's time to stop MySQL server and corrupt the data in partition p1:
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./stop
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ls -l data/test
total 152012
...

-rw-rw---- 1 openxs openxs     8610 лис  5 17:34 tp.frm
-rw-rw---- 1 openxs openxs       32 лис  5 17:34 tp.par
-rw-rw---- 1 openxs openxs    98304 лис  5 17:34 tp#P#p0.ibd
-rw-rw---- 1 openxs openxs    98304 лис  5 17:34 tp#P#p1.ibd
-rw-rw---- 1 openxs openxs    98304 лис  5 17:34 tp#P#px.ibd

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ hexdump -C data/test/tp#P#p1.ibd | more
00000000  22 e2 b8 a3 00 00 00 00  00 00 00 00 00 00 00 00  |"...............|
00000010  00 00 00 00 0f f4 ae 86  00 08 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 0c 00 00  00 0c 00 00 00 00 00 00  |................|
00000030  00 06 00 00 00 40 00 00  00 00 00 00 00 04 00 00  |.....@..........|
00000040  00 00 ff ff ff ff 00 00  ff ff ff ff 00 00 00 00  |................|
00000050  00 01 00 00 00 00 00 9e  00 00 00 00 00 9e 00 00  |................|
...

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ dd if=/dev/zero of=data/test/tp#P#p1.ibd bs=1 count=98304
98304+0 records in
98304+0 records out
98304 bytes (98 kB) copied, 0,253177 s, 388 kB/s
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ hexdump -C data/test/tp#P#p1.ibd | more
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00018000
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ls -l data/test/tp*       
-rw-rw---- 1 openxs openxs  8610 лис  5 17:34 data/test/tp.frm
-rw-rw---- 1 openxs openxs    32 лис  5 17:34 data/test/tp.par
-rw-rw---- 1 openxs openxs 98304 лис  5 17:34 data/test/tp#P#p0.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 17:41 data/test/tp#P#p1.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 17:34 data/test/tp#P#px.ibd
I just filled the entire partition file with zeroes and verified its content before and after with hexdump. I think it's a bad enough kind of corruption - there is no way to recover the data, it is gone entirely, and the system areas of the .ibd file will surely not match what the InnoDB data dictionary thinks about the tablespace.

Now I can try to start the slave back up and access the data in the table:
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./start
. sandbox server started
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ tail data/msandbox.err
2016-11-05 18:15:11 8092 [Note] Server hostname (bind-address): '127.0.0.1'; port: 22294
2016-11-05 18:15:11 8092 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2016-11-05 18:15:11 8092 [Note] Server socket created on IP: '127.0.0.1'.
2016-11-05 18:15:11 8092 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql_sandbox22294-relay-bin' to avoid this problem.
2016-11-05 18:15:11 8092 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2016-11-05 18:15:11 8092 [Note] Slave I/O thread: connected to master 'rsandbox@127.0.0.1:22293',replication started in log 'mysql-bin.000005' at position 1015
2016-11-05 18:15:11 8092 [Note] Event Scheduler: Loaded 0 events
2016-11-05 18:15:11 8092 [Note] /home/openxs/5.6.28/bin/mysqld: ready for connections.
Version: '5.6.28-log'  socket: '/tmp/mysql_sandbox22294.sock'  port: 22294  MySQL Community Server (GPL)
2016-11-05 18:15:12 8092 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000005' at position 1015, relay log './mysql_sandbox22294-relay-bin.000236' position: 1178
The table was not accessed during the recovery stage (as the slave was stopped cleanly), so we have no hints in the error log about the corruption. But as soon as we try to access the table:

slave1 [localhost] {msandbox} (test) > select * from tp;
ERROR 2013 (HY000): Lost connection to MySQL server during query
In the error log I see:

InnoDB: Error: tablespace id is 12 in the data dictionary
InnoDB: but in file ./test/tp#P#p1.ibd it is 0!
2016-11-05 18:17:03 7f0e9c257700  InnoDB: Assertion failure in thread 139700725970688 in file fil0fil.cc line 796
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:17:03 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=3
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68108 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x276e860
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f0e9c256e50 thread_stack 0x40000
/home/openxs/5.6.28/bin/mysqld(my_print_stacktrace+0x35)[0x90f695]
/home/openxs/5.6.28/bin/mysqld(handle_fatal_signal+0x3d8)[0x674fc8]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f0eb0d99330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f0eaf993c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f0eaf997028]
/home/openxs/5.6.28/bin/mysqld[0xae994e]
/home/openxs/5.6.28/bin/mysqld[0xae9ade]
/home/openxs/5.6.28/bin/mysqld[0xaf0419]
/home/openxs/5.6.28/bin/mysqld[0xabb7fb]
/home/openxs/5.6.28/bin/mysqld[0xabbe6b]
/home/openxs/5.6.28/bin/mysqld[0xaa996a]
/home/openxs/5.6.28/bin/mysqld[0xa9720f]
/home/openxs/5.6.28/bin/mysqld[0xa40b2e]
/home/openxs/5.6.28/bin/mysqld[0x99fca5]
/home/openxs/5.6.28/bin/mysqld[0x997d89]
/home/openxs/5.6.28/bin/mysqld[0x99f9d9]
/home/openxs/5.6.28/bin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x9c)[0x58c52c]/home/openxs/5.6.28/bin/mysqld(_ZN12ha_partition8rnd_nextEPh+0x41)[0xb48441]
/home/openxs/5.6.28/bin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x64)[0x58c4f4]
/home/openxs/5.6.28/bin/mysqld(_Z13rr_sequentialP11READ_RECORD+0x37)[0x8400a7]
/home/openxs/5.6.28/bin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x181)[0x6d3891]
/home/openxs/5.6.28/bin/mysqld(_ZN4JOIN4execEv+0x391)[0x6d16b1]
/home/openxs/5.6.28/bin/mysqld[0x718349]
/home/openxs/5.6.28/bin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_P10SQL_I_ListI8st_orderESB_S7_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xbc)[0x718e0c]
/home/openxs/5.6.28/bin/mysqld(_Z13handle_selectP3THDP13select_resultm+0x175)[0x719015]
/home/openxs/5.6.28/bin/mysqld[0x6f3769]
/home/openxs/5.6.28/bin/mysqld(_Z21mysql_execute_commandP3THD+0x3575)[0x6f7f25]
/home/openxs/5.6.28/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x338)[0x6fba58]
/home/openxs/5.6.28/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xc60)[0x6fce30]
/home/openxs/5.6.28/bin/mysqld(_Z10do_commandP3THD+0xd7)[0x6fec27]
/home/openxs/5.6.28/bin/mysqld(_Z24do_handle_one_connectionP3THD+0x116)[0x6c5da6]
/home/openxs/5.6.28/bin/mysqld(handle_one_connection+0x45)[0x6c5e85]
/home/openxs/5.6.28/bin/mysqld(pfs_spawn_thread+0x126)[0x989be6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f0eb0d91184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0eafa5737d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f0e74005050): select * from tp
Connection ID (thread ID): 3
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
161105 18:17:04 mysqld_safe Number of processes running now: 0
161105 18:17:04 mysqld_safe mysqld restarted
So, the assertion failure happened because the tablespace id of the deliberately corrupted partition does not match what InnoDB expects based on the data dictionary. The server was restarted automatically, but this time the restart was not successful:

...
2016-11-05 18:17:04 8146 [Note] InnoDB: Database was not shutdown normally!
2016-11-05 18:17:04 8146 [Note] InnoDB: Starting crash recovery.
2016-11-05 18:17:04 8146 [Note] InnoDB: Reading tablespace information from the .ibd files...
2016-11-05 18:17:04 8146 [ERROR] InnoDB: space header page consists of zero bytes in tablespace ./test/tp#P#p1.ibd (table test/tp#P#p1)
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:1024 Pages to analyze:64
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 1024, Possible space_id count:0
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:2048 Pages to analyze:48
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 2048, Possible space_id count:0
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:4096 Pages to analyze:24
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 4096, Possible space_id count:0
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:8192 Pages to analyze:12
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 8192, Possible space_id count:0
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size:16384 Pages to analyze:6
2016-11-05 18:17:04 8146 [Note] InnoDB: Page size: 16384, Possible space_id count:0
2016-11-05 18:17:04 7f0f0f745780  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: Error: could not open single-table tablespace file ./test/tp#P#p1.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
InnoDB: To fix the problem and start mysqld:
InnoDB: 1) If there is a permission problem in the file and mysqld cannot
InnoDB: open the file, you should modify the permissions.
InnoDB: 2) If the table is not needed, or you can restore it from a backup,
InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
InnoDB: crash recovery and ignore that table.
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
161105 18:17:04 mysqld_safe mysqld from pid file /home/openxs/sandboxes/rsandbox_mysql-5_6_28/node1/data/mysql_sandbox22294.pid ended
Now, what shall we do? There is no reason to try to start the server again and again. We should check that the partition .ibd file is there (it is), that it has proper permissions (it does), and that it looks like a proper tablespace. We can check the latter if we have a proper tablespace file for the same partition elsewhere, by searching for known strings of data in the file, etc. In this case hexdump had already shown corrupted data without any chance to restore anything. Time to ask the question that started this post...
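
For reference (this command is not part of the original walkthrough), a quick way to eyeball the header pages of the suspicious file is something like the following; the path is the one used in this sandbox and may differ in your setup:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ hexdump -C 'data/test/tp#P#p1.ibd' | head -n 20

Comparing that output with the same dump of a known-good partition file quickly shows whether the page headers look sane.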

With the answer in mind (yes, we can get the data for a partition from the .ibd file of a non-corrupted table), how do we proceed in practice? First of all, we need a MySQL server that starts and works, so that we can run SQL statements. This one can not be started normally, and if we plan to restore data on it (for whatever reason) we have to start it first. To do so we have to try innodb_force_recovery settings, starting from 1. Depending on how and when the corruption happened, 1 may not be enough. In general I would not recommend proceeding to any value higher than 4. It also makes sense to copy all files related to the corrupted table and, ideally, the shared tablespace to a safe place. Changes made with forced recovery may corrupt data even more, so it's good to have a way to roll back and start again. I copied all .ibd files for the table to the /tmp directory, but in reality you may want to just rename/move them within the same physical filesystem, if the files are huge and copying would take too long:
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ cp data/test/tp*.ibd /tmp/
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ls -l /tmp/tp*.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 18:49 /tmp/tp#P#p0.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 18:49 /tmp/tp#P#p1.ibd
-rw-rw---- 1 openxs openxs 98304 лис  5 18:49 /tmp/tp#P#px.ibd
Then I set innodb_force_recovery to 1 in the [mysqld] section of my configuration file and started the server:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ grep innodb_force my.sandbox.cnf
innodb_force_recovery=1
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./start
. sandbox server started
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ tail data/msandbox.err    
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so that
InnoDB: innodb_force_... is removed.
2016-11-05 18:25:25 8491 [ERROR] Error writing relay log configuration.
2016-11-05 18:25:25 8491 [ERROR] Error reading relay log configuration.
2016-11-05 18:25:25 8491 [ERROR] Failed to initialize the master info structure
2016-11-05 18:25:25 8491 [Note] Check error log for additional messages. You will not be able to start replication until the issue is resolved and the server restarted.
2016-11-05 18:25:25 8491 [Note] Event Scheduler: Loaded 0 events
2016-11-05 18:25:25 8491 [Note] /home/openxs/5.6.28/bin/mysqld: ready for connections.
Version: '5.6.28-log'  socket: '/tmp/mysql_sandbox22294.sock'  port: 22294  MySQL Community Server (GPL)
It seems the server started, but when we try to access the table, a strange thing happens:
slave1 [localhost] {msandbox} (test) > select * from tp;
ERROR 1146 (42S02): Table 'test.tp' doesn't exist
slave1 [localhost] {msandbox} (test) > show tables;
+----------------+
| Tables_in_test |
+----------------+
| t              |
| t1             |
| t2             |
| t_local        |
| tp             |
+----------------+
5 rows in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > show create table tp\G
ERROR 1146 (42S02): Table 'test.tp' doesn't exist
The table is NOT there and is there at the same time. This indicates some problem with the InnoDB data dictionary. In the error log we see:
2016-11-05 18:26:14 8491 [ERROR] InnoDB: Failed to find tablespace for table '"test"."tp" /* Partition "p1" */' in the cache. Attempting to load the tablespace with space id 12.
2016-11-05 18:26:14 8491 [ERROR] InnoDB: In file './test/tp#P#p1.ibd', tablespace id and flags are 0 and 0, but in the InnoDB data dictionary they are 12 and 0.
Have you moved InnoDB .ibd files around without using the commands DISCARD TABLESPACE and IMPORT TABLESPACE? Please refer to http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2016-11-05 18:26:14 7f434af03700  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
2016-11-05 18:26:14 8491 [ERROR] InnoDB: Could not find a valid tablespace file for 'test/tp#P#p1'. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2016-11-05 18:26:14 7f434af03700 InnoDB: cannot calculate statistics for table "test"."tp" /* Partition "p1" */ because the .ibd file is missing. For help, please refer to http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html
2016-11-05 18:26:18 7f434af03700 InnoDB: cannot calculate statistics for table "test"."tp" /* Partition "p1" */ because the .ibd file is missing. For help, please refer to http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html
2016-11-05 18:26:51 7f434af03700 InnoDB: cannot calculate statistics for table "test"."tp" /* Partition "p1" */ because the .ibd file is missing. For help, please refer to http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html
Forced recovery mode is designed to be used for one of two main actions on the problematic table: either to select some remaining data from it (we tried and can not do this, and we don't care anyway, as we have the data on the other server in the replication setup), or to drop it (we also copied the .ibd files elsewhere, if you remember). Nothing works with the table, but DROP does work:
slave1 [localhost] {msandbox} (test) > drop table tp;
Query OK, 0 rows affected (0,32 sec)
Now we can restart without forced recovery, recreate the table, and try to link the orphaned .ibd files back:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./stop
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ vi my.sandbox.cnf
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./start
. sandbox server started
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ grep innodb_force my.sandbox.cnf
#innodb_force_recovery=1
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ tail data/msandbox.err   
2016-11-05 18:51:50 8861 [Note] Server hostname (bind-address): '127.0.0.1'; port: 22294
2016-11-05 18:51:50 8861 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2016-11-05 18:51:50 8861 [Note] Server socket created on IP: '127.0.0.1'.
2016-11-05 18:51:50 8861 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql_sandbox22294-relay-bin' to avoid this problem.
2016-11-05 18:51:50 8861 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2016-11-05 18:51:50 8861 [Note] Slave I/O thread: connected to master 'rsandbox@127.0.0.1:22293',replication started in log 'mysql-bin.000005' at position 1015
2016-11-05 18:51:50 8861 [Note] Event Scheduler: Loaded 0 events
2016-11-05 18:51:50 8861 [Note] /home/openxs/5.6.28/bin/mysqld: ready for connections.
Version: '5.6.28-log'  socket: '/tmp/mysql_sandbox22294.sock'  port: 22294  MySQL Community Server (GPL)
2016-11-05 18:51:51 8861 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000005' at position 1015, relay log './mysql_sandbox22294-relay-bin.000237' position: 4
It looks promising so far. We can get the DDL for the table from the other server in the replication setup (or any other source, like an older dump) and try to create the table in a hurry:
master [localhost] {msandbox} (test) > show create table tp\G
*************************** 1. row ***************************
       Table: tp
Create Table: CREATE TABLE `tp` (
  `id` int(11) NOT NULL DEFAULT '0',
  `val` int(11) DEFAULT NULL,
  `ts` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`,`ts`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (year(ts))
(PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2015) ENGINE = InnoDB,
 PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > show tables;
+----------------+
| Tables_in_test |
+----------------+
| t              |
| t1             |
| t2             |
| t_local        |
+----------------+
4 rows in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > CREATE TABLE `tp` (   `id` int(11) NOT NULL DEFAULT '0',   `val` int(11) DEFAULT NULL,   `ts` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',   PRIMARY KEY (`id`,`ts`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (year(ts)) (PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB,  PARTITION p1 VALUES LESS THAN (2015) ENGINE = InnoDB,  PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
ERROR 1813 (HY000): Tablespace for table '`test`.`tp` /* Partition `p1` */' exists. Please DISCARD the tablespace before IMPORT.
So, it looks like DROP TABLE worked, but the corrupted partition file remained? Yes, it remained:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ls -l data/test/tp*.ibd 
-rw-rw---- 1 openxs openxs 98304 лис  5 17:41 data/test/tp#P#p1.ibd
We have to remove it and then try again:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ rm data/test/tp*.ibd

slave1 [localhost] {msandbox} (test) > CREATE TABLE `tp` (   `id` int(11) NOT NULL DEFAULT '0',   `val` int(11) DEFAULT NULL,   `ts` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',   PRIMARY KEY (`id`,`ts`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (year(ts)) (PARTITION p0 VALUES LESS THAN (2006) ENGINE = InnoDB,  PARTITION p1 VALUES LESS THAN (2015) ENGINE = InnoDB,  PARTITION px VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
Query OK, 0 rows affected (0,75 sec)
Now we can mostly follow Jervin's blog post and create a non-partitioned table of the same structure with the following DDL statements:

create table tp_tmp like tp;
alter table tp_tmp remove partitioning;
alter table tp_tmp discard tablespace;
At this moment we can re-import the partitions one by one. In reality, instead of copying we may rename or move the files of those partitions that were not corrupted.
slave1 [localhost] {msandbox} (test) > \! cp /tmp/tp#P#p0.ibd data/test/tp_tmp.ibd
slave1 [localhost] {msandbox} (test) > \! ls -l data/test/tp_tmp.*             
-rw-rw---- 1 openxs openxs  8610 лис  5 19:05 data/test/tp_tmp.frm
-rw-rw---- 1 openxs openxs 98304 лис  5 19:07 data/test/tp_tmp.ibd
slave1 [localhost] {msandbox} (test) > alter table tp_tmp import tablespace;
Query OK, 0 rows affected, 1 warning (0,34 sec)

slave1 [localhost] {msandbox} (test) > show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1810
Message: InnoDB: IO Read error: (2, No such file or directory) Error opening './test/tp_tmp.cfg', will attempt to import without schema verification
1 row in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)
We get the warning, but it can be safely ignored - we know for sure that the table structure matches the imported .ibd file. Now we are ready to exchange the empty partition with the proper partition data in the table:

slave1 [localhost] {msandbox} (test) > alter table tp exchange partition p0 with table tp_tmp;
Query OK, 0 rows affected (0,51 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
Empty set (0,00 sec)

slave1 [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
+----+------+---------------------+
1 row in set (0,01 sec)
Then we can do the same for the other partition(s) stored on the same server, all but the corrupted one:
slave1 [localhost] {msandbox} (test) > alter table tp_tmp discard tablespace;
Query OK, 0 rows affected (0,05 sec)

slave1 [localhost] {msandbox} (test) > \! cp /tmp/tp#P#px.ibd data/test/tp_tmp.ibd
slave1 [localhost] {msandbox} (test) > alter table tp_tmp import tablespace;
Query OK, 0 rows affected, 1 warning (0,35 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  3 |   30 | 2016-11-05 17:34:30 |
+----+------+---------------------+
1 row in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > alter table tp exchange partition px with table tp_tmp;
Query OK, 0 rows affected (0,53 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;                   
Empty set (0,00 sec)

slave1 [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   10 | 2005-10-01 00:00:00 |
|  3 |   30 | 2016-11-05 17:34:30 |
+----+------+---------------------+
2 rows in set (0,00 sec)
The file for the corrupted partition we have to copy from elsewhere; in my case it's the datadir of the master sandbox instance:
slave1 [localhost] {msandbox} (test) > alter table tp_tmp discard tablespace;  
Query OK, 0 rows affected (0,04 sec)

slave1 [localhost] {msandbox} (test) > \! cp ../master/data/test/tp#P#p1.ibd data/test/tp_tmp.ibd
slave1 [localhost] {msandbox} (test) > alter table tp_tmp import tablespace;   
Query OK, 0 rows affected, 1 warning (0,36 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  2 |   10 | 2014-10-02 00:00:00 |
+----+------+---------------------+
1 row in set (0,00 sec)

slave1 [localhost] {msandbox} (test) > alter table tp exchange partition px with table tp_tmp;
ERROR 1737 (HY000): Found a row that does not match the partition
slave1 [localhost] {msandbox} (test) > alter table tp exchange partition p1 with table tp_tmp;
Query OK, 0 rows affected (0,47 sec)

slave1 [localhost] {msandbox} (test) > select * from tp_tmp;
Empty set (0,00 sec)

slave1 [localhost] {msandbox} (test) > select * from tp partition(p1);
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  2 |   10 | 2014-10-02 00:00:00 |
+----+------+---------------------+
1 row in set (0,01 sec)
As you can see from my small mistake above, if the imported data do not match the range condition of the target partition, the exchange fails and you get an error. We are done with restoring the data, and now we can check if replication works. Let me change all the data on the master:
master [localhost] {msandbox} (test) > update tp set val = val + 1;
Query OK, 3 rows affected (0,03 sec)
Rows matched: 3  Changed: 3  Warnings: 0

master [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   11 | 2005-10-01 00:00:00 |
|  2 |   11 | 2014-10-02 00:00:00 |
|  3 |   31 | 2016-11-05 17:34:30 |
+----+------+---------------------+
3 rows in set (0,00 sec)
and make sure they are replicated to the slave:
slave1 [localhost] {msandbox} (test) > select * from tp;
+----+------+---------------------+
| id | val  | ts                  |
+----+------+---------------------+
|  1 |   11 | 2005-10-01 00:00:00 |
|  2 |   11 | 2014-10-02 00:00:00 |
|  3 |   31 | 2016-11-05 17:34:30 |
+----+------+---------------------+
3 rows in set (0,00 sec)

In the error log we'll see only messages related to importing tablespaces (plus the earlier complaint about the leftover .ibd file), no new errors of any kind:
...
2016-11-05 19:01:02 9201 [ERROR] InnoDB: The file './test/tp#P#p1.ibd' already exists though the corresponding table did not exist in the InnoDB data dictionary. Have you moved InnoDB .ibd files around without using the SQL commands DISCARD TABLESPACE and IMPORT TABLESPACE, or did mysqld crash in the middle of CREATE TABLE? You can resolve the problem by removing the file './test/tp#P#p1.ibd' under the 'datadir' of MySQL.
2016-11-05 19:07:58 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:07:58 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:07:58 9201 [Note] InnoDB: Phase I - Update all pages
2016-11-05 19:07:58 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:07:58 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:07:58 9201 [Note] InnoDB: Phase III - Flush changes to disk
2016-11-05 19:07:58 9201 [Note] InnoDB: Phase IV - Flush complete
2016-11-05 19:10:48 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:10:48 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:10:48 9201 [Note] InnoDB: Phase I - Update all pages
2016-11-05 19:10:48 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:10:48 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:10:48 9201 [Note] InnoDB: Phase III - Flush changes to disk
2016-11-05 19:10:49 9201 [Note] InnoDB: Phase IV - Flush complete
2016-11-05 19:12:42 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:12:42 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:12:42 9201 [Note] InnoDB: Phase I - Update all pages
2016-11-05 19:12:42 9201 [Note] InnoDB: Sync to disk
2016-11-05 19:12:42 9201 [Note] InnoDB: Sync to disk - done!
2016-11-05 19:12:42 9201 [Note] InnoDB: Phase III - Flush changes to disk
2016-11-05 19:12:42 9201 [Note] InnoDB: Phase IV - Flush complete
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$
I am too tired to write a nice summary right now, so I'll write it in some other post. Now I (and you, my readers) have a detailed enough reference for all the boring steps involved in recovering a corrupted partition from an .ibd file taken from another server in a replication setup.

by Valeriy Kravchuk (noreply@blogger.com) at November 08, 2016 05:53 PM

MariaDB AB

Ready for the holiday shopping season? 20 tips to prepare your MariaDB database environment for Black Friday and Cyber Monday!


Setting up a database environment for the loads that Black Friday or Cyber Monday bring can be tricky. The suggestions in this blog center on the principles of Scalability, Capacity, Performance and High Availability (HA). For this post, I define each principle as:

  • Scalability is the ability to add capacity by adding resources

  • Capacity is the ability to handle load

  • Performance is tuning your database environment for increased capacity and scale

  • High Availability is the ability to keep serving requests with good performance, even when parts of the environment fail

This blog post covers how to tune system variables in the MariaDB database environment for capacity, scalability and high availability.

Before deploying, do not accept these suggestions blindly. Each MariaDB environment is unique and requires additional thought before making any changes.  You will most likely need to adjust these settings for your specific use case and environment.

Things you need to know:

  • The MariaDB configuration file is located at /etc/my.cnf. Every time you modify this file, you will need to restart the MySQL service so the new changes can take effect.

Black Friday and Cyber Monday Tuning Recommendations

1. InnoDB Buffer Pool Size

The InnoDB buffer pool should generally be set to 60-80% of the available RAM when using InnoDB exclusively. Ideally, all InnoDB data and indexes, or at least the working data set, should fit in memory.
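
As a rough illustration (the numbers are hypothetical and must be adapted to your host), the setting goes in the [mysqld] section of the configuration file and requires a restart on MariaDB 10.1 and earlier:

[mysqld]
# dedicated database server with 32G of RAM: 60-80% of RAM for the buffer pool
innodb_buffer_pool_size = 24G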

More information:

2. InnoDB Logs

There are two general suggestions for InnoDB log file sizing: set the combined total size of the InnoDB log files to 25-50% of the InnoDB buffer pool size, or set the combined InnoDB log file size equal to one hour's worth of log entries during peak load. Larger log files can lead to slower recovery in the event of a server crash, but they also reduce the number of checkpoints needed and reduce disk I/O.

Evaluate the size of one hour’s worth of binary logs under operational load, then decide whether to increase the size of the InnoDB log files or not.
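
One common way to estimate the hourly log volume (as an alternative to measuring binary log sizes) is to sample the Innodb_os_log_written status counter twice, an hour apart, during peak load; this is only a sketch of the idea:

SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
-- wait roughly one hour of peak traffic, then sample again
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
-- the difference between the two values is the redo volume per hour; size the
-- log files so innodb_log_file_size x innodb_log_files_in_group covers it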

More information:

  • https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_log_file_size

3. InnoDB Log Buffer Size

A larger InnoDB log buffer size means less disk I/O for larger transactions. It is suggested to set this to 64M on all servers.

More information:

  • https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_log_buffer_size

4. InnoDB Log Flush Interval

The innodb_flush_log_at_trx_commit variable controls when flushing of the log buffer to disk occurs. innodb_flush_log_at_trx_commit = 1 (default) flushes the log buffer to disk at each transaction commit. This is the safest, but also the least performant.  

innodb_flush_log_at_trx_commit = 0 flushes the log buffer to disk every second, but not on transaction commit. Up to one second of transactions (possibly more due to process scheduling) could be lost. If either MySQL or the server crashes, you can lose data. This is the fastest, but least safe option.

innodb_flush_log_at_trx_commit = 2 writes the log buffer out to the file on each commit, but flushes it to disk every second. If the disk cache has a battery backup (for instance, a battery-backed RAID controller cache), this is generally the best balance of performance and safety. A crash of MySQL should not lose data. A server crash or power outage could lose up to a second (possibly more due to process scheduling); a battery-backed cache reduces this possibility.

It is suggested to use the first option for safety.
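
The variable is dynamic, so it can be changed at runtime and then persisted in the configuration file; a minimal sketch:

-- safest setting (also the default); 2 or 0 trade durability for speed
SET GLOBAL innodb_flush_log_at_trx_commit = 1;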

More information:

  • https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_flush_log_at_trx_commit

5. InnoDB IO Capacity

innodb_io_capacity should be set to approximately the maximum number of IOPS the underlying storage can handle.

This was set to 400, and has been increased to 1000 in the new configuration. It is suggested to benchmark the storage to determine whether this value can be increased further.
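
innodb_io_capacity is dynamic, so a new value (1000 here is just the illustrative value from this post, not a universal recommendation) can be applied without a restart and then persisted in my.cnf:

SET GLOBAL innodb_io_capacity = 1000;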

More information:

  • https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_io_capacity

6. Thread Cache Size

It is suggested to monitor the value of Threads_created. If it continues increasing at more than a few threads per minute, increase the value of thread_cache_size.

The thread cache size is set to 200 in the new configuration.
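
A minimal sketch of the monitoring and the change (both are standard status/system variables; the value is the one used in this post):

SHOW GLOBAL STATUS LIKE 'Threads_created';
-- if the counter keeps climbing by more than a few per minute:
SET GLOBAL thread_cache_size = 200;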

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#thread_cache_size

7. Table Cache, Table Definition Cache

The table_open_cache and table_definition_cache variables control the number of tables and definitions to keep open for all threads.

Monitor Open_tables, Open_table_definitions, Opened_tables, and Opened_table_definitions to determine the best value. The general suggestion is to set table_open_cache (and subsequently table_definition_cache) only high enough to reduce the rate of increase of the Opened_tables (and, respectively, Opened_table_definitions) status value.

Both table open cache and definition cache have been set to 2048 in the new configuration.
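
A hedged sketch of how to watch these counters and apply the values mentioned above (both variables are dynamic; 2048 is just the value used in this post):

SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Open_tables', 'Opened_tables', 'Open_table_definitions', 'Opened_table_definitions');
SET GLOBAL table_open_cache = 2048;
SET GLOBAL table_definition_cache = 2048;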

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#table_open_cache

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#table_definition_cache

8. Query Cache

Generally, if the query cache hit rate is below 50%, you should investigate whether having the query cache enabled brings any performance benefit at all. The query cache adds overhead to every query.

The query cache is currently disabled. Due to the nature of the application and the ratio of writes to reads, it is unlikely the query cache will offer any performance improvements and could negatively impact performance.
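
If you do run with the query cache enabled, a rough way to estimate the hit rate from status counters (a common approximation, not an exact figure) is:

SHOW GLOBAL STATUS WHERE Variable_name IN ('Qcache_hits', 'Qcache_inserts', 'Com_select');
-- hit rate is roughly Qcache_hits / (Qcache_hits + Com_select)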

More information:

  • https://mariadb.com/kb/en/mariadb/query-cache/

9. Temporary Tables, tmp_table_size, & max_heap_table_size

MySQL uses the lower of max_heap_table_size and tmp_table_size to limit the size of in-memory temporary tables. These are per-client variables. While a large value can help reduce the number of temporary tables created on disk, it also raises the risk of reaching the server's memory capacity, since the limit applies per client. Generally, 32M to 64M is the suggested starting value for both variables; tune as needed.

Temporary tables are often used for GROUP BY, ORDER BY, DISTINCT, UNION, sub queries, etc. Ideally, MySQL should create these in memory, and as few on disk as possible.

It is important to note that queries that do not use joins appropriately and create large temporary tables can be one cause of a higher number of temporary tables on disk. Another reason is that the MEMORY storage engine uses fixed-length columns and assumes the worst-case scenario. If columns are not sized correctly (for example, a VARCHAR(255) for a short string), this influences the size of the table in memory and can cause it to go to disk earlier than it should. Also, temporary tables with BLOB and TEXT columns will immediately go to disk, as the MEMORY storage engine does not support them.

Both have been set to 64M in the new configuration.
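
To see whether temporary tables are spilling to disk, compare the two counters below; a consistently high disk-to-total ratio suggests raising the limits (or fixing the offending queries). The values shown are only the ones suggested in this post:

SHOW GLOBAL STATUS WHERE Variable_name IN ('Created_tmp_tables', 'Created_tmp_disk_tables');
SET GLOBAL tmp_table_size = 64 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;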

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#tmp_table_size

10. Warning Log Level

It is suggested to set this to log_warnings = 2. Doing so logs information about aborted connections and access-denied errors.

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#log_warnings

  • http://www.chriscalender.com/what-exactly-does-log_warnings2-log/

11. Max Connections

Determine an appropriate value for max_connections and change it. A recommended starting value is 500; adjust up or down as needed by monitoring the Max_used_connections status variable.
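
A minimal sketch (max_connections is dynamic; 500 is just the starting point suggested above, remember to persist it in my.cnf):

SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SET GLOBAL max_connections = 500;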

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#max_connections

12. Transaction Isolation

It is suggested to investigate the available transaction isolation levels, and determine the best transaction isolation for this server’s use case.

More information:

  • https://mariadb.com/kb/en/mariadb/set-transaction/

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#tx_isolation

  • http://karlssonondatabases.blogspot.com/2012/08/the-real-differences-betweenread.html

  • https://www.facebook.com/notes/mysql-at-facebook/repeatable-read-versus-readcommitted-for-innodb/244956410932

13. Binary Log Format

It is recommended to use ROW binary log format for master-master replication.

More information:

14. Auto Increment Offsets

To help reduce the chances of collision between two masters being written to simultaneously, the auto_increment_increment and auto_increment_offset values need to be set differently on each master.
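
A hedged example for a two-master setup (configuration-file fragments; the values are illustrative only):

# my.cnf on master 1
auto_increment_increment = 2
auto_increment_offset    = 1

# my.cnf on master 2
auto_increment_increment = 2
auto_increment_offset    = 2

With these settings, one master generates odd auto-increment values and the other even ones, so rows inserted on both masters at the same time cannot collide on the same key.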

15. Sync Binlog

By default, flushing of the binlog to disk is handled by the OS. In the event of a server crash, it is possible to lose transactions from the binary log, leading to replication being out of sync. Setting sync_binlog = 1 causes the binlog file to be flushed on every commit.

This is slower, but the safest option.

More information:

16. Crash Safe(r) Slaves

To help avoid replication errors after a slave crash, enable relay log recovery and the syncing of the relay log and relay log info files to disk.
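
A sketch of the relevant settings in the [mysqld] section (the sync_* = 1 values add fsync overhead, so weigh safety against write performance for your workload):

[mysqld]
relay_log_recovery  = 1
sync_relay_log      = 1
sync_relay_log_info = 1
sync_master_info    = 1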

More information:

17. Log Slave Updates

To have chained replication (master -> slave -> slave), log_slave_updates needs to be enabled. This tells a slave to write replicated transactions to its own binary log, so that they can then be replicated to slaves off of it.

18. Read Only Slaves

Slaves should be read only to avoid data accidentally being written to them.

Note: Users with super privilege can still write when the server is read only.

19. Slave Net Timeout

The slave_net_timeout variable is the number of seconds the slave will wait for a packet from the master before trying to reconnect. The default is 3600 (1 hour). This means if the link goes down and isn’t detected, it could be up to an hour before the slave reconnects. This could lead to the slave suddenly being up to an hour behind the master.

It is suggested to set slave_net_timeout to a more reasonable value such as 30 or 60.
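
slave_net_timeout is dynamic, so a sketch of applying the suggestion (persist it in my.cnf as well):

SET GLOBAL slave_net_timeout = 60;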

More information:

20. Learn More

Join our webinar on Thursday, November 10 at 10am PST and 10am CET on last minute preparations for peak traffic periods like Black Friday and Cyber Monday.

Register now for the 10am PST webinar

Register now for the 10am CET webinar

With all the excitement surrounding the perfect Thanksgiving meal, it's often easy to overlook how to prep your database environment for the biggest online shopping day of the year! Each year, more and more shoppers opt for online holiday deals, instead of the more traditional mall experience, which means that retailers must prepare for multiple days of high online traffic to their e-commerce sites. This year you’ll be prepared as I’m passing along a few tips to tune your database environment for some of the biggest online holiday shopping days - Black Friday and Cyber Monday!


by james_mclaurin_g at November 08, 2016 04:41 AM

November 07, 2016

Peter Zaitsev

Updating Percona XtraDB Cluster from 5.6.24-72.2 to 5.6.32-25.17


This blog describes how to upgrade Percona XtraDB Cluster in place from 5.6.24-72.2 to 5.6.32-25.17.

This very hands-on blog is the result of some questions such as “can I perform an in-place upgrade for Percona XtraDB Cluster” coming in. We have done these minor upgrades for Percona Managed Services customers running Percona XtraDB Cluster with lots of nodes, and I think it’s feasible to smoothly do it – if we pay special attention to some specific points I’ll call out. The main concern you should have is that if you have a big dataset, you should avoid SST (which consumes a lot of time if a node rebuild is needed).

Make sure you have all the steps very clear, in order to avoid spending too much time when updating packages. The crucial point is the size of Galera's GCache. If you're executing this while part of the cluster is online and writes cannot be avoided, first check whether the current GCache configuration can keep nodes from falling back to SST while you shut down Percona Server on each of the nodes, update packages and finally bring Percona Server back online again.

A blog post written by Miguel Angel Nieto provides instructions on how to check the GCache file's size and make sure it covers all the transactions for the time you need to take the node out. After increasing the size of the GCache, if the rejoining node finds all the missing transactions in the donor's GCache, it uses IST. If not, it will need to use SST.

You can read more about the difference between IST and SST in the Galera API documentation.
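
Before taking nodes out, it may help to check how big the GCache currently is and, if needed, enlarge it; a hedged sketch (the 2G value is purely illustrative, size it to cover your maintenance window; gcache.size cannot be changed at runtime):

mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G

# to enlarge the GCache, set this in the [mysqld] section of my.cnf and restart the node
wsrep_provider_options="gcache.size=2G"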

Little less talk, little more action…

At this point, we need to update the packages one cluster node at a time. The cluster needs to stay up. I'm going to use a cluster with three nodes. Node 01 is dedicated to writes, while nodes 02 and 03 are dedicated to scaling the cluster's reads (all are running 5.6.24-72.2). Just for reference, it's running on CentOS 6.5, and I'm going to use yum, but you can adapt that to any other package manager depending on the Linux distro you're running. This is the list of nodes and the packages we need to update:

#: servers are like below
(writes) node01::192.168.50.11:3306, Server version: 5.6.24-72.2 Percona XtraDB Cluster (GPL)
(reads) node02::192.168.50.12:3306, Server version: 5.6.24-72.2 Percona XtraDB Cluster (GPL)
(reads) node03::192.168.50.13:3306, Server version: 5.6.24-72.2 Percona XtraDB Cluster (GPL)
#: packages currently installed
[vagrant@node02 ~]$ sudo rpm -qa | grep Percona
Percona-XtraDB-Cluster-client-56-5.6.24-72.2.el6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.24-72.2.el6.x86_64
Percona-XtraDB-Cluster-galera-3-3.15-1.rhel6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.24-72.2.el6.x86_64
Percona-XtraDB-Cluster-devel-56-5.6.24-72.2.el6.x86_64

Before updating the packages above, make sure you update the XtraBackup package if you have configured the variable wsrep_sst_method as xtrabackup-v2; this avoids the error below:

WSREP_SST: [ERROR] FATAL: The innobackupex version is 2.3.4. Needs xtrabackup-2.3.5 or higher to perform SST (2016102620:47:15.307)
2016-10-26 20:47:15 5227 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.50.12' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '5227'  ''

So, on all three nodes, update percona-xtrabackup to make sure we’re running the latest version:

[root@node02 vagrant]# yum update percona-xtrabackup
Loaded plugins: fastestmirror, versionlock
Determining fastest mirrors
...
--> Running transaction check
---> Package percona-xtrabackup.x86_64 0:2.3.4-1.el6 will be updated
---> Package percona-xtrabackup.x86_64 0:2.3.5-1.el6 will be an update

With that done, take one node at a time out of the cluster, update all the old binaries using yum update, and start mysqld back up. You don't need to run mysql_upgrade in this case. When you start mysqld with the newer binaries in place, depending on the size of the configured cache, it's going to perform either an IST or SST.

As you’re going to take the node out of rotation and out of the cluster, you don’t need to worry about configuring it as read_only. If you can do that in a maintenance window, where no one is writing data to the main node, it’s the best scenario. You won’t need to worry about SST, as in most cases the dataset is too big (TB++) and the SST time can be some hours (an overnight streaming in my experience).

Let’s take out node02 and update the packages:
#: let's take out node02 to update packages
[vagrant@node02 ~]$ sudo /etc/init.d/mysql stop
Shutting down MySQL (Percona XtraDB Cluster).... SUCCESS!
[vagrant@node02 ~]$ sudo yum update Percona-XtraDB-Cluster-client-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-server-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-galera-3-3.15-1.rhel6.x86_64 Percona-XtraDB-Cluster-shared-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-devel-56-5.6.24-72.2.el6.x86_64
...
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package Percona-XtraDB-Cluster-client-56.x86_64 1:5.6.24-72.2.el6 will be updated
---> Package Percona-XtraDB-Cluster-client-56.x86_64 1:5.6.32-25.17.1.el6 will be an update
---> Package Percona-XtraDB-Cluster-devel-56.x86_64 1:5.6.24-72.2.el6 will be updated
---> Package Percona-XtraDB-Cluster-devel-56.x86_64 1:5.6.32-25.17.1.el6 will be an update
---> Package Percona-XtraDB-Cluster-galera-3.x86_64 0:3.15-1.rhel6 will be updated
---> Package Percona-XtraDB-Cluster-galera-3.x86_64 0:3.17-1.rhel6 will be an update
---> Package Percona-XtraDB-Cluster-server-56.x86_64 1:5.6.24-72.2.el6 will be updated
---> Package Percona-XtraDB-Cluster-server-56.x86_64 1:5.6.32-25.17.1.el6 will be an update
---> Package Percona-XtraDB-Cluster-shared-56.x86_64 1:5.6.24-72.2.el6 will be updated
---> Package Percona-XtraDB-Cluster-shared-56.x86_64 1:5.6.32-25.17.1.el6 will be an update
#: new packages in place after yum update - here, make sure you run yum clean all before yum update
[root@node02 ~]# rpm -qa | grep Percona
Percona-XtraDB-Cluster-shared-56-5.6.32-25.17.1.el6.x86_64
Percona-XtraDB-Cluster-galera-3-3.17-1.rhel6.x86_64
Percona-XtraDB-Cluster-devel-56-5.6.32-25.17.1.el6.x86_64
Percona-XtraDB-Cluster-client-56-5.6.32-25.17.1.el6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.32-25.17.1.el6.x86_64

Now start node02, knowing that it’s going to join the cluster, but with updated packages:

[root@node02 vagrant]# /etc/init.d/mysql start
Starting MySQL (Percona XtraDB Cluster)...State transfer in progress, setting sleep higher
.. SUCCESS!
#: here you can see that the state transfer was required due to different states from cluster and current node
#: this is gonna test the wsrep_sst_method to make sure it’s working well after updating percona-xtrabackup
#: to latest version available
2016-10-26 21:51:38 3426 [Note] WSREP: State transfer required:
 Group state: 63788863-1f8c-11e6-a8cc-12f338870ac3:52613
 Local state: 63788863-1f8c-11e6-a8cc-12f338870ac3:52611
2016-10-26 21:51:38 3426 [Note] WSREP: New cluster view: global state: 63788863-1f8c-11e6-a8cc-12f338870ac3:52613, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 3
2016-10-26 21:51:38 3426 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-10-26 21:51:38 3426 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.50.12' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '3426'  '' '
WSREP_SST: [INFO] Streaming with xbstream (20161026 21:51:39.023)
WSREP_SST: [INFO] Using socat as streamer (20161026 21:51:39.025)
WSREP_SST: [INFO] Evaluating timeout -s9 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} )(20161026 21:51:39.100)
2016-10-26 21:51:39 3426 [Note] WSREP: Prepared SST request: xtrabackup-v2|192.168.50.12:4444/xtrabackup_sst//1
...
2016-10-26 21:51:39 3426 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 52613)
2016-10-26 21:51:39 3426 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] Proceeding with SST (20161026 21:51:39.871)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (2016102621:51:39.873)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20161026 21:51:39.876)
...
WSREP_SST: [INFO] Moving the backup to /var/lib/mysql/ (20161026 21:51:55.826)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf  --defaults-group=mysqld --no-version-check  --datadir=/var/lib/mysql/ --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (2016102621:51:55.829)
WSREP_SST: [INFO] Move successful, removing /var/lib/mysql//.sst (20161026 21:51:55.859)
...
Version: '5.6.32-78.1-56'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel78.1,Revision 979409a, WSREP version 25.17, wsrep_25.17
2016-10-26 21:51:56 3426 [Note] WSREP: 0.0 (pxc01): State transfer from 1.0 (pxc01) complete.
2016-10-26 21:51:56 3426 [Note] WSREP: Shifting JOINER -> JOINED (TO: 52613)
2016-10-26 21:51:56 3426 [Note] WSREP: Member 0.0 (pxc01) synced with group.
2016-10-26 21:51:56 3426 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 52613)
2016-10-26 21:51:56 3426 [Note] WSREP: Synchronized with group, ready for connections
2016-10-26 21:51:56 3426 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

As you can see above, node02 is back in the cluster. Additionally, it’s important to see that both the Percona-Server packages and the Galera API packages were updated. When the node is up and part of the cluster, you should see a new API version in the output of a SHOW GLOBAL STATUS LIKE ‘wsrep%’ command:

#: node02, the one we just updated
[root@node02 mysql]# mysql -e "show global status like 'wsrep_provider_version'\G"
*************************** 1. row ***************************
Variable_name: wsrep_provider_version
        Value: 3.17(r447d194)
#: node01 not updated yet
[root@node01 mysql]# mysql -e "show global status like 'wsrep_provider_version'\G"
*************************** 1. row ***************************
Variable_name: wsrep_provider_version
        Value: 3.15(r5c765eb)

Summarizing the procedure until now, the cluster packages update plan is:
  1. Take nodes out of rotation one at a time
  2. Shutdown mysqld on each node in order
  3. Update the below packages (or the ones corresponding to what you’re running):

[vagrant@node02 ~]$ sudo rpm -qa | grep Percona
Percona-XtraDB-Cluster-client-56-5.6.24-72.2.el6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.24-72.2.el6.x86_64
Percona-XtraDB-Cluster-galera-3-3.15-1.rhel6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.24-72.2.el6.x86_64
Percona-XtraDB-Cluster-devel-56-5.6.24-72.2.el6.x86_64

  4. Update percona-xtrabackup on all the cluster’s nodes to avoid issues (as explained above):

WSREP_SST: [ERROR] FATAL: The innobackupex version is 2.3.4. Needs xtrabackup-2.3.5 or higher to perform SST (2016102620:47:15.307)
...
[root@node01 ~]# yum update percona-xtrabackup
...
[root@node02 ~]# xtrabackup --version
xtrabackup version 2.3.5 based on MySQL server 5.6.24 Linux (x86_64) (revision id: 45cda89)

  5. Start mysqld back online to grab the cluster’s current state

After finishing up with each node’s packages update, make sure you check the main node to see if they have joined the cluster. On node01, you can enter the below query to return the main status variables. This checks the current status of node01 and the cluster size:

mysql> SELECT @@HOSTNAME AS HOST, NOW() AS `DATE`, VARIABLE_NAME,VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME IN ('wsrep_cluster_state_uuid','wsrep_cluster_conf_id','wsrep_cluster_size','wsrep_cluster_status','wsrep_local_state_comment')\G
*************************** 1. row ***************************
 HOST: node01
 DATE: 2016-10-27 18:14:42
 VARIABLE_NAME: WSREP_LOCAL_STATE_COMMENT
VARIABLE_VALUE: Synced
*************************** 2. row ***************************
 HOST: node01
 DATE: 2016-10-27 18:14:42
 VARIABLE_NAME: WSREP_CLUSTER_CONF_ID
VARIABLE_VALUE: 10
*************************** 3. row ***************************
 HOST: node01
 DATE: 2016-10-27 18:14:42
 VARIABLE_NAME: WSREP_CLUSTER_SIZE
VARIABLE_VALUE: 3
*************************** 4. row ***************************
 HOST: node01
 DATE: 2016-10-27 18:14:42
 VARIABLE_NAME: WSREP_CLUSTER_STATE_UUID
VARIABLE_VALUE: 1e0b9725-9c5e-11e6-886d-7708872d6aa5
*************************** 5. row ***************************
 HOST: node01
 DATE: 2016-10-27 18:14:42
 VARIABLE_NAME: WSREP_CLUSTER_STATUS
VARIABLE_VALUE: Primary
5 rows in set (0.00 sec)

Check the other nodes as well:

#: node02
[root@node02 mysql]# mysql -e "show global status like 'wsrep_local_state%'\G"
*************************** 1. row ***************************
Variable_name: wsrep_local_state_uuid
Value: 1e0b9725-9c5e-11e6-886d-7708872d6aa5
*************************** 2. row ***************************
Variable_name: wsrep_local_state
Value: 4
*************************** 3. row ***************************
Variable_name: wsrep_local_state_comment
Value: Synced
#: node03
[root@node03 ~]# mysql -e "show global status like 'wsrep_local_state%'\G"
*************************** 1. row ***************************
Variable_name: wsrep_local_state_uuid
Value: 1e0b9725-9c5e-11e6-886d-7708872d6aa5
*************************** 2. row ***************************
Variable_name: wsrep_local_state
Value: 4
*************************** 3. row ***************************
Variable_name: wsrep_local_state_comment
Value: Synced

Cheers!

by Wagner Bianchi at November 07, 2016 07:35 PM

MariaDB Foundation

MariaDB 10.1.19 and other releases now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.19, MariaDB Galera Cluster 10.0.28, and MariaDB Connector/ODBC 2.0.13. These are all Stable (GA) releases. See the release notes and changelogs for details. Download MariaDB 10.1.19 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Galera […]

The post MariaDB 10.1.19 and other releases now available appeared first on MariaDB.org.

by Daniel Bartholomew at November 07, 2016 05:01 PM

Jean-Jerome Schmidt

High Availability on a Shoestring Budget - Deploying a Minimal Two Node MySQL Galera Cluster

We regularly get questions about how to set up a Galera cluster with just 2 nodes. The documentation clearly states you should have at least 3 Galera nodes to avoid network partitioning. But there are some valid reasons for considering a 2 node deployment, e.g., if you want to achieve database high availability but have a limited budget to spend on a third database node. Or perhaps you are running Galera in a development/sandbox environment and prefer a minimal setup.

Galera implements a quorum-based algorithm to select a primary component, through which it enforces consistency. The primary component needs to have a majority of votes, so in a 2 node system a node failure or network partition leaves neither side with a majority, resulting in split brain. Fortunately, it is possible to add a garbd (Galera Arbitrator Daemon), which is a lightweight stateless daemon that can act as the odd node. Arbitrator failure does not affect cluster operations, and a new instance can be reattached to the cluster at any time. There can be several arbitrators in the cluster.
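
If you were to run the arbitrator by hand rather than through ClusterControl, the invocation is roughly the following (cluster name and addresses are placeholders; --group must match wsrep_cluster_name and --address lists the database nodes):

garbd --group my_galera_cluster \
      --address "gcomm://192.168.1.10:4567,192.168.1.11:4567" \
      --daemon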

ClusterControl has support for deploying garbd on non-database hosts.

Normally a Galera cluster needs at least three hosts to be fully functional, however at deploy time two nodes would suffice to create a primary component. Here are the steps:

  1. Deploy a Galera cluster of two nodes,
  2. After the cluster has been deployed by ClusterControl, add garbd on the ClusterControl node.

You should end up with the below setup:

Deploy the Galera Cluster

Go to the ClusterControl deploy wizard to deploy the cluster.

Even though ClusterControl warns you a Galera cluster needs an odd number of nodes, only add two nodes to the cluster.

Deploying a Galera cluster will trigger a ClusterControl job which can be monitored at the Jobs page.

Install Garbd

Once deployment is complete, install garbd on the ClusterControl host. It will be under the Manage -> Load Balancer:

Installing garbd will trigger a ClusterControl job which can be monitored at the Jobs page. Once completed, you can verify garbd is running with a green tick icon at the top bar:

That’s it. Our minimal two-node Galera cluster is now ready!

by Severalnines at November 07, 2016 08:55 AM

November 04, 2016

Peter Zaitsev

Changing the Tablespace Directory with pt-online-schema-change


In this blog, we’ll discuss changing the tablespace directory using pt-online-schema-change.

One of the most annoying situations in the life of a DBA is realizing that the disk where the datadir resides is running out of space. If you’re lucky enough to run over an LVM volume or a RAID (depending on the level, though), it is easy to add disk space. But what if you are not that lucky, and your datadir is running on a single disk? Not so funny!

That is the exact situation we recently faced with a customer, for both the master and slave server. When trying to figure out a solution we saw that:

  • There was enough space on a different partition within the same server.
  • The tables have their own tablespace (innodb_file_per_table = on)
  • The MySQL version was 5.6.

We proceeded to move some of the tables to the other partition to make room in the datadir, using the tablespace placement feature: http://dev.mysql.com/doc/refman/5.6/en/tablespace-placing.html.

One note before we continue: if you are using a version equal to or lower than 5.6.29 with innodb_flush_method = O_DIRECT, there’s a bug that makes CREATE TABLE....DATA DIRECTORY = '/another/directory/' not work. See: https://bugs.mysql.com/bug.php?id=79200. This was fixed in 5.6.30.

In the slave, we were able to stop the replication and move the tables. A problem occurred when we wanted to do the same on the master, since no downtime was allowed.

This is where pt-online-schema-change came to the rescue!

We could use pt-osc to do the table placement without downtime, but there’s a catch: pt-osc only works when what you want to do is possible by using an ALTER TABLE statement, and in order to use CREATE TABLE....DATA DIRECTORY = '/another/directory' you need to use a CREATE TABLE statement.

What to do, then? Add a new feature to pt-online-schema-change: --data-dir="/new/directory"

With the help of the main developer of the Percona Toolkit, Carlos Salguero, adding this new feature was possible in record time. Now moving the tablespace to another place without downtime is possible.

The new feature will be available with version 2.2.20 of Percona Toolkit, but until the release the code is available at the GitHub repository: https://raw.githubusercontent.com/percona/percona-toolkit/2.2/bin/pt-online-schema-change

Moving the table is just a matter of executing

pt-online-schema-change  --data-dir="/new/datadir" --execute

Let’s see an example. The following table resides in the default datadir:

mysql> show create table sbtest5;
*************************** 1. row ***************************
Table: sbtest5
Create Table: CREATE TABLE `sbtest5` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`k` int(10) unsigned NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_5` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=latin1 MAX_ROWS=1000000
1 row in set (0.00 sec)

Now, let’s move it to the directory /opt/datadir, which owner is the MySQL user:

[root@ps56-1 percona]# pt-online-schema-change  --data-dir="/opt/datadir" --execute D=percona,t=sbtest5
No slaves found.  See --recursion-method if host ps56-1 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `percona`.`sbtest5`...
Creating new table...
Created new table percona._sbtest5_new OK.
2016-11-01T19:22:27 Creating triggers...
2016-11-01T19:22:27 Created triggers OK.
2016-11-01T19:22:27 Copying approximately 1000 rows...
2016-11-01T19:22:27 Copied rows OK.
2016-11-01T19:22:27 Analyzing new table...
2016-11-01T19:22:27 Swapping tables...
2016-11-01T19:22:28 Swapped original and new tables OK.
2016-11-01T19:22:28 Dropping old table...
2016-11-01T19:22:28 Dropped old table `percona`.`_sbtest5_old` OK.
2016-11-01T19:22:28 Dropping triggers...
2016-11-01T19:22:28 Dropped triggers OK.
Successfully altered `percona`.`sbtest5`.

Okay, all good. Let’s see the new table definition:

mysql> show create table sbtest5;
*************************** 1. row ***************************
       Table: sbtest5
Create Table: CREATE TABLE `sbtest5` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_5` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=latin1 MAX_ROWS=1000000 DATA DIRECTORY='/opt/datadir/'

DATA DIRECTORY='/opt/datadir/' is in the right place! 🙂

And from the filesystem, the *.ibd file is in the new directory:

[root@ps56-1 opt]# ls -l /opt/datadir/percona/ | grep sbtest5
-rw-rw---- 1 mysql mysql 344064 Nov  1 19:22 sbtest5.ibd

And in the datadir, we can see the *.isl file:

[root@ps56-1 opt]# ls -l /var/lib/mysql/percona/ | grep sbtest5
-rw-rw---- 1 mysql mysql   8632 Nov  1 19:22 sbtest5.frm
-rw-rw---- 1 mysql mysql     32 Nov  1 19:22 sbtest5.isl

And the contents seems fine:

[root@ps56-1 opt]# cat /var/lib/mysql/percona/sbtest5.isl
/opt/datadir/percona/sbtest5.ibd

So, in conclusion, if you need to move a table to another directory without downtime, pt-online-schema-change can do that for you now.

by Daniel Guzmán Burgos at November 04, 2016 07:38 PM

Jean-Jerome Schmidt

Planets9s - MySQL Replication Resources and MongoDB Scaling & Sharding

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

MySQL Replication: All the Severalnines Resources

We’ve just published this handy overview of all the Severalnines resources we produced during the course of this year, which are aimed at helping users to get started with MySQL Replication and/or get more out of their existing setups. From monitoring, management and through to load balancing, with information on the latest features introduced in 5.6 and 5.7 - all important aspects are covered. Do check these out and let us know if you have any questions.

Access the resources

Upcoming Webinar: Become a MongoDB DBA - Scaling and Sharding

In this third webinar of the ‘Become a MongoDB DBA’ series, we will focus on scaling and sharding your MongoDB setup. You’ll learn how to plan your scaling strategy up front and how to prevent ending up with unusable secondary nodes and shards. And we’ll show you how to leverage ClusterControl’s MongoDB scaling and shards management capabilities.

Sign up for the webinar

Become a MongoDB DBA: Sharding ins- and outs - part 1

As some of you will know, MongoDB supports sharding out of the box and it is relatively easy to set up. However, there are important considerations you need to take before sharding your data and with that in mind, we’ve started a three part miniseries about MongoDB and sharding. In this initial post, we cover the basics of sharding with MongoDB by setting up a sharded environment with related recommendations.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at November 04, 2016 02:29 PM

November 03, 2016

Peter Zaitsev

Orchestrator: Moving VIPs During Failover


In this post, I’ll discuss how to move VIPs during a failover using Orchestrator.

In our previous post, we showed you how Orchestrator works. In this post, I am going to give you a proof of concept of how Orchestrator can move VIPs in case of a failover. For this post, I’m assuming Orchestrator is already installed and able to manage the topology.

Hooks

Orchestrator is a topology manager, nothing less and nothing more. In the case of a failover, it will reorganize the topology, promote a new master and connect the slaves to it. But it won’t do any DNS changes, and it won’t move VIPs (or anything else).

However, Orchestrator supports hooks. Hooks are external scripts that can be invoked through the recovery process. There are six different hooks:

  • OnFailureDetectionProcesses
  • PreFailoverProcesses
  • PostIntermediateMasterFailoverProcesses
  • PostMasterFailoverProcesses
  • PostFailoverProcesses
  • PostUnsuccessfulFailoverProcesses

More details are in the Orchestrator manual.

With these hooks, we can call our own external scripts, which fit into our architecture and can make modifications or let the application know who the new master is.

There are different ways to do this:

  • Updating a CNAME: if a CNAME is pointing to the master, an external script can easily do a DNS update after failover.
  • Moving a VIP to the new master: this solution is similar to what the MHA and MHA-helper scripts do (this post discusses this solution).

Parameters

When Orchestrator calls an external script, it can also use parameters. Here is an example using the parameters available with “PostFailoverProcesses”:

{failureType}, {failureDescription}, {failedHost}, {failureCluster}, {failureClusterAlias}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countSlaves}, {slaveHosts}, {isDowntimed}, {isSuccessful}, {lostSlaves}

Without these parameters, we wouldn’t know who the new master is and which host died.

Moving VIPs

As I already mentioned, in this post I am going to show you how you can move VIPs with Orchestrator. I think many people are familiar with MHA. This solution is a bit similar to what MHA and MHA-helper do.

The main requirement is also the main disadvantage: this solution requires SSH access from the Orchestrator node to the MySQL servers.

Adding User

First we have to add a user on the MySQL servers and Orchestrator node (you can change the username):

useradd -m orchuser -s /bin/bash

Adding sudo permissions:

vi /etc/sudoers.d/orch
Defaults !requiretty
orchuser ALL=(ALL) NOPASSWD: /usr/sbin/arping,/sbin/ip,/bin/ping

We have to add the public key from the Orchestrator node on the MySQL servers to the “/home/orchuser/.ssh/authorized_keys” file.
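One simple way to do that (a sketch, assuming password authentication is temporarily available for the new user) is to run ssh-copy-id from the Orchestrator node against each MySQL server:

# Copy the Orchestrator node's public key to a MySQL server (sketch)
ssh-copy-id orchuser@192.168.56.106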

Now we can SSH from the Orchestrator server to the others without a password:

ssh orchuser@192.168.56.106

Failover Script

Now we need a failover script. I wrote two small bash scripts that can do it for us.

The first one is called orch_hook.sh. Orchestrator calls this script like so:

vi /etc/orchestrator.conf.json
...
"PostFailoverProcesses": [
"echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log",
"/usr/local/bin/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
],
...

Because Orchestrator can handle multiple clusters, we have to define some cluster parameters:

vi /usr/local/bin/orch_hook.sh
...
rep=( eth0 "192.168.56.121" orchuser )

Where “rep” is the name of the cluster, “eth0” is the name of the interface where the VIP should be added, “192.168.56.121” is the VIP on this cluster and “orchuser” is the SSH user. If we have multiple clusters, we have to add more arrays like this with the cluster details.
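To make the dispatch logic concrete, here is a minimal sketch of what the body of orch_hook.sh could look like (variable names and structure are my assumptions, not the author's actual script):

#!/bin/bash
# orch_hook.sh (sketch): map a failover event to the right cluster and call orch_vip.sh
# Arguments passed by Orchestrator: failureType failureClusterAlias failedHost successorHost
isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4

# One array per cluster alias: interface, VIP, SSH user
rep=( eth0 "192.168.56.121" orchuser )

if [ "$isitdead" = "DeadMaster" ]; then
    # Expand the array whose name matches the cluster alias
    details=( $(eval echo \${$cluster[@]}) )
    interface=${details[0]}
    vip=${details[1]}
    sshuser=${details[2]}
    /usr/local/bin/orch_vip.sh -d 1 -n "$newmaster" -i "$interface" -I "$vip" -u "$sshuser" -o "$oldmaster"
fi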

Orchestrator executes this script with parameters:

/usr/local/bin/orch_hook.sh DeadMaster rep mysql1 mysql2

After the script recognizes the cluster, it calls the next script.

The next script is named orch_vip.sh. This is called by “orch_hook.sh” and it is going to move the VIP to the new master. It is executed like this:

/usr/local/bin/orch_vip.sh -d 1 -n mysql2 -i eth0 -I 192.168.56.121 -u orchuser -o mysql1

  • -d 1 the master is dead
  • -n mysql2 is the new master
  • -i eth0 the network interface
  • -I 192.168.56.121 is the VIP
  • -u orchuser is the SSH user
  • -o mysql1 is the old master

The script requires the “arping” and “mail” commands.
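For illustration, here is a minimal sketch of the VIP move itself (the option parsing and the exact ip/arping invocations are my assumptions, not the author's script, and the mail notification part is left out):

#!/bin/bash
# orch_vip.sh (sketch): drop the VIP on the old master and bring it up on the new one
while getopts "d:n:i:I:u:o:" opt; do
    case $opt in
        d) isitdead=$OPTARG ;;    # 1 = the old master is dead
        n) newmaster=$OPTARG ;;   # new master hostname
        i) interface=$OPTARG ;;   # network interface carrying the VIP
        I) vip=$OPTARG ;;         # the virtual IP itself
        u) sshuser=$OPTARG ;;     # SSH user with sudo rights for ip/arping
        o) oldmaster=$OPTARG ;;   # old master hostname
    esac
done

# If the old master is still alive, remove the VIP from it first
if [ "$isitdead" != "1" ]; then
    ssh "$sshuser@$oldmaster" "sudo /sbin/ip addr del $vip/32 dev $interface"
fi

# Add the VIP on the new master and announce it with gratuitous ARP
ssh "$sshuser@$newmaster" "sudo /sbin/ip addr add $vip/32 dev $interface" &&
ssh "$sshuser@$newmaster" "sudo /usr/sbin/arping -c 3 -A -I $interface $vip"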

Conclusion

With these two small scripts, Orchestrator is able to move VIPs from the old master to the new master, and the application can work again. However, this script is not production-ready, and there could be cases that it cannot handle. You can test it, but use it at your own risk.

I would appreciate any comments or pull requests so we can make it better. But stay tuned: in my next blog post I am going to show you how Orchestrator can work with “ProxySQL.”

by Tibor Korocz at November 03, 2016 11:00 PM

Jean-Jerome Schmidt

MySQL Replication: All the Severalnines Resources

As many of you will know, MySQL Replication has become an instrumental part of scale-out architectures in LAMP environments. MySQL offers plenty of solutions when there is a need to scale out, the most common being to add read replicas. The major bottleneck for our data is generally not so much oriented around writing our data but more around reading it back. Therefore the easiest way to scale MySQL is to add replicas for reading.

We’ve produced a number of resources during the course of this year aimed at helping users to get started with MySQL Replication and/or get more out of their existing setups.

We’ve summarised these resources here in a handy overview, so that you can pick and choose the ones that might be the most relevant to you.

Do check them out and let us know your feedback!

The White Papers

The MySQL© Replication Blueprint by Severalnines

This is a great resource for anyone wanting to build or optimise a MySQL replication set up. The MySQL Replication Blueprint is about having a complete ops-ready solution from end to end. From monitoring, management and through to load balancing, all important aspects are covered.

Download the whitepaper

MySQL Replication for High Availability

This whitepaper covers MySQL Replication with information on the latest features introduced in 5.6 and 5.7. There is also a hands-on, practical section on how to quickly deploy and manage a replication setup using ClusterControl.

Download the whitepaper

The On-Demand Webinars

Introducing the Severalnines MySQL© Replication Blueprint

The Severalnines Blueprint for MySQL Replication includes all aspects of a MySQL Replication topology with the ins and outs of deployment, setting up replication, monitoring, upgrades, performing backups and managing high availability using proxies such as ProxySQL, MaxScale and HAProxy. This webinar provides an in-depth walk-through of this blueprint and explains how to make best use of it.

Watch the replay!

Managing MySQL Replication for High Availability

This webinar covers deployment and management of MySQL replication topologies using ClusterControl. We show you how to schedule backups, how to promote slaves, and which metrics are the most important ones to keep a close eye on. We also demonstrate how you can deal with schema and topology changes and how to solve the most common replication issues.

Watch the replay!

Become a MySQL DBA: Schema Changes for MySQL Replication & Galera Cluster

Find out how to implement schema changes in the least impacting way to your operations and ensure availability of your database. This webinar also covers some real-life examples and discusses how to handle them.

Watch the replay!

Become a MySQL DBA: Replication Topology Changes for MySQL and MariaDB

Discover how to perform replication topology changes in MySQL / MariaDB, and what the failover process may look like. This webinar also discusses some external tools you may find useful when dealing with these operations.

Watch the replay!

We trust that these resources prove useful!

Happy replicating!

by Severalnines at November 03, 2016 09:14 PM

November 02, 2016

Peter Zaitsev

Percona responds to CVE-2016-6663 and CVE-2016-6664

Percona has addressed CVE-2016-6663 and CVE-2016-6664 in releases of Percona Server for MySQL and Percona XtraDB Cluster.

Percona is happy to announce that the following vulnerabilities are fixed in current releases of Percona Server for MySQL and Percona XtraDB Cluster:

  • CVE-2016-6663: allows a local system user with access to the affected database in the context of a low-privileged account (CREATE/INSERT/SELECT grants) to escalate their privileges and execute arbitrary code as the database system user (typically “mysql”).
  • CVE-2016-6664: can let attackers who have gained access to the mysql system user further escalate their privileges to the root user, allowing them to fully compromise the system.

Users should upgrade to their relevant incremental release.

Percona Server

Percona XtraDB Cluster

Users should update as soon as is practical to ensure protection from these vulnerabilities.

Percona would like to thank Dawid Golunski (https://legalhackers.com) for disclosing this issue.

by David Busby at November 02, 2016 05:21 PM

Webinar Thursday November 3: The MySQL Ecosystem in 2016

Join Percona’s Chief Evangelist Colin Charles on Thursday, November 3, 2016, at 10 am PDT / 1:00 pm EDT (UTC-7) as he presents “The MySQL Ecosystem in 2016.”

MySQL is a unique adult (now 21 years old) in many ways. It supports plugins. It supports storage engines. It is also owned by Oracle, thus birthing two branches of the popular open source database: Percona Server and MariaDB Server. It also once spawned a fork: Drizzle. Lately, a consortium of web scale users (think a chunk of the top ten sites out there) has spawned WebScaleSQL.

You’re a busy DBA having to maintain this mix of technologies. Or you’re a CIO planning to choose one branch. How do you go about picking? Supporting multiple databases? Find out more in this talk. We will also take a deep-dive into what feature differences exist between MySQL/Percona Server/MariaDB/WebScaleSQL, and how distributions package the various databases differently. Within the hour, we’ll inform you about the past, the present and hopefully make you knowledgeable enough to know what to pick in the future.

Note, there will also be coverage of the various trees around WebScaleSQL, like the Facebook tree, the Alibaba tree as well as the Twitter tree.

Register for The MySQL Ecosystem in 2016 webinar here.

Colin Charles, Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team of MariaDB Server in 2009, worked at MySQL since 2005, and has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and OpenOffice.org projects. He’s well known within many open source communities and has spoken on the conference circuit.

by Dave Avery at November 02, 2016 04:06 PM

MariaDB AB

Update on Security Vulnerabilities CVE-2016-6663 and CVE-2016-6664 Related to MariaDB Server


The details about two vulnerabilities affecting MariaDB (and MySQL) have been published. The two vulnerabilities are:

  • CVE-2016-6663, Privilege Escalation / Race Condition (also referred to as CVE-2016-5616)
  • CVE-2016-6664, Root Privilege Escalation (also referred to as CVE-2016-5617)

These vulnerabilities are follow-ups on CVE-2016-6662, which we addressed in a blog post in September, which was about Remote Root Code Execution.

CVE-2016-6663 makes use of a race condition when performing REPAIR TABLE on a MyISAM table. The REPAIR TABLE statement performed unsafe system calls, which made it possible to interfere with them and change permissions on directories and files. This could then be used to obtain a shell with the rights of the user running MariaDB Server.

CVE-2016-6663 is fixed as of the following versions of MariaDB Server:

  • MariaDB Server 10.1.18, released on September 30
  • MariaDB Server 10.0.28, released on October 28
  • MariaDB Server 5.5.52, released on September 13

Please upgrade to these versions (or newer) to be protected against CVE-2016-6663. The latest versions can be downloaded here.

Using a shell obtained through CVE-2016-6663, one can further exploit CVE-2016-6664 to gain root user access.

It’s important to note that CVE-2016-6664 is NOT exploitable by itself. Shell access must first be obtained through a vulnerability like CVE-2016-6663. Because CVE-2016-6663 has been fixed and is no longer exploitable, we’ve determined that CVE-2016-6664 is not critical on its own and doesn’t warrant an immediate fix to be released. A fix will be included in the upcoming maintenance releases of MariaDB Server 5.5, 10.0 and 10.1.

For the complete reports on the vulnerabilities, please refer to the advisories on legalhackers.com by Dawid Golunski who discovered these vulnerabilities.


by rasmusjohansson at November 02, 2016 07:00 AM

November 01, 2016

Peter Zaitsev

Percona Monitoring and Management (PMM) Information Script

This blog post discusses an information script for the Percona Monitoring and Management (PMM) tool.

In recent news, we announced the fresh-off-the-press Percona Monitoring and Management (or PMM for short) platform. Given the interaction of the different components that together make up PMM, I developed a script that helps provide you with information about the status of your PMM installation.

You can use this script yourself, or one of our support engineers might point you to this page to obtain the information they need to troubleshoot an issue you are experiencing.

You will likely want to execute this script once on the PMM server (i.e., the server on which you installed the docker image), and once on the client (i.e., where you installed the PMM client rpm/apt package), if they are not the same (virtual) machine. It provides a different output for each. When sending this information back to us, please ensure to identify which output belongs to which machine (either the server or the client).

To get/run the script, use the following command (please note that this script requires sudo privileges):

wget https://raw.githubusercontent.com/Percona-QA/percona-qa/master/pmm-info.sh && sh ./pmm-info.sh

If you would like to examine the script contents before executing it, you can split the command:

wget https://raw.githubusercontent.com/Percona-QA/percona-qa/master/pmm-info.sh
vi pmm-info.sh
sh ./pmm-info.sh

If you have both the PMM server and the PMM client on a single machine, the output looks similar to the following:

[roel@localhost ~]$ wget https://raw.githubusercontent.com/Percona-QA/percona-qa/master/pmm-info.sh && sh ./pmm-info.sh
--2016-11-01 09:49:22-- https://raw.githubusercontent.com/Percona-QA/percona-qa/master/pmm-info.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1451 (1.4K) [text/plain]
Saving to: ‘pmm-info.sh’
100%[=========================================================================================================================================>] 1,451 --.-K/s in 0s
2016-11-01 09:49:23 (218 MB/s) - ‘pmm-info.sh’ saved [1451/1451]
QA PMM Info Script v0.07
==================== uname -a
  Linux localhost.localdomain 3.10.0-123.13.2.el7.x86_64 #1 SMP Thu Dec 18 14:09:13 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
==================== /proc/version
  Linux version 3.10.0-123.13.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Thu Dec 18 14:09:13 UTC 2014
==================== OS Release (filtered cat /etc/*-release):
  CentOS Linux release 7.2.1511 (Core)
  HOME_URL="https://www.centos.org/"
  ID="centos"
  ID_LIKE="rhel fedora"
  NAME="CentOS Linux"
  PRETTY_NAME="CentOS Linux 7 (Core)"
  VERSION="7 (Core)"
  VERSION_ID="7"
==================== Docker release (docker --version):
  Docker version 1.10.3, build cb079f6-unsupported
==================== SELinux status if present (sestatus):
  SELinux status: enabled
  SELinuxfs mount: /sys/fs/selinux
  SELinux root directory: /etc/selinux
  Loaded policy name: targeted
  Current mode: enforcing
  Mode from config file: enforcing
  Policy MLS status: enabled
  Policy deny_unknown status: allowed
  Max kernel policy version: 28
==================== PMM server images (sudo docker images | grep pmm):
  docker.io/percona/pmm-server 1.0.5 0eade99a1612 2 weeks ago 652.9 MB
  docker.io/percona/pmm-server 1.0.4 1c83d650105e 6 weeks ago 677.3 MB
  docker.io/percona/pmm-server 1.0.4-dev20160908.24845ea 4406c13d0ba3 7 weeks ago 676 MB
==================== PMM server state (sudo docker ps -a | grep pmm):
  fdf5e6adca7e percona/pmm-server:1.0.4 "/opt/entrypoint.sh" 5 days ago Exited (137) 5 days ago pmm-server3
  843a2ee31c96 percona/pmm-server:1.0.4 "/opt/entrypoint.sh" 5 days ago Created pmm-server2
  f075314b529f percona/pmm-server:1.0.4 "/bin/true" 5 days ago Created pmm-data2
  2090c072b56a percona/pmm-server:1.0.5 "/opt/entrypoint.sh" 6 days ago Up 7 minutes 0.0.0.0:80->80/tcp, 443/tcp pmm-server
  653fb58ce723 percona/pmm-server:1.0.5 "/bin/true" 6 days ago Created pmm-data
==================== Exporter status (ps -ef | grep exporter):
  root 2748 1 0 09:44 ? 00:00:00 /bin/sh -c /usr/local/percona/pmm-client/node_exporter -web.listen-address=192.168.0.13:42000 -collectors.enabled=diskstats,filesystem,loadavg,meminfo,netdev,netstat,stat,time,uname,vmstat >> /var/log/pmm-linux-metrics-42000.log 2>&1
  root 2749 1 0 09:44 ? 00:00:00 /bin/sh -c /usr/local/percona/pmm-client/mysqld_exporter -collect.auto_increment.columns=true -collect.binlog_size=true -collect.global_status=true -collect.global_variables=true -collect.info_schema.innodb_metrics=true -collect.info_schema.processlist=true -collect.info_schema.query_response_time=true -collect.info_schema.tables=true -collect.info_schema.tablestats=true -collect.info_schema.userstats=true -collect.perf_schema.eventswaits=true -collect.perf_schema.file_events=true -collect.perf_schema.indexiowaits=true -collect.perf_schema.tableiowaits=true -collect.perf_schema.tablelocks=true -collect.slave_status=true -web.listen-address=192.168.0.13:42002 >> /var/log/pmm-mysql-metrics-42002.log 2>&1
  root 2750 2748 1 09:44 ? 00:00:37 /usr/local/percona/pmm-client/node_exporter -web.listen-address=192.168.0.13:42000 -collectors.enabled=diskstats,filesystem,loadavg,meminfo,netdev,netstat,stat,time,uname,vmstat
  root 2751 2749 0 09:44 ? 00:00:05 /usr/local/percona/pmm-client/mysqld_exporter -collect.auto_increment.columns=true -collect.binlog_size=true -collect.global_status=true -collect.global_variables=true -collect.info_schema.innodb_metrics=true -collect.info_schema.processlist=true -collect.info_schema.query_response_time=true -collect.info_schema.tables=true -collect.info_schema.tablestats=true -collect.info_schema.userstats=true -collect.perf_schema.eventswaits=true -collect.perf_schema.file_events=true -collect.perf_schema.indexiowaits=true -collect.perf_schema.tableiowaits=true -collect.perf_schema.tablelocks=true -collect.slave_status=true -web.listen-address=192.168.0.13:42002
  roel 4445 4392 0 10:24 pts/0 00:00:00 grep exporter
==================== PMM agent (sudo pmm-admin --version):
  1.0.5
==================== PMM info (sudo pmm-admin info):
  pmm-admin 1.0.5
  PMM Server | 192.168.0.13
  Client Name | localhost.localdomain
  Client Address | 192.168.0.13
  Service manager | linux-systemd
==================== PMM network check (sudo pmm-admin check-network):
  PMM Network Status
  Server | 192.168.0.13
  Client | 192.168.0.13
  * Client --> Server
  --------------- -------
  SERVER SERVICE STATUS
  --------------- -------
  Consul API OK
  QAN API OK
  Prometheus API OK
  Connection duration | 102.017µs
  Request duration | 211.962µs
  Full round trip | 313.979µs
  * Client <-- Server
  -------------- ----- ---------------------- -------
  SERVICE TYPE NAME REMOTE ENDPOINT STATUS
  -------------- ----- ---------------------- -------
  linux:metrics test 192.168.0.13:42000 OK
  mysql:metrics test 192.168.0.13:42002 OK
==================== PMM list (sudo pmm-admin list):
  pmm-admin 1.0.5
  PMM Server | 192.168.0.13
  Client Name | localhost.localdomain
  Client Address | 192.168.0.13
  Service manager | linux-systemd
  -------------- ----- ------------ -------- --------------------------------------------------------------------------------------------- ------------------------
  SERVICE TYPE NAME CLIENT PORT RUNNING DATA SOURCE OPTIONS
  -------------- ----- ------------ -------- --------------------------------------------------------------------------------------------- ------------------------
  linux:metrics test 42000 YES -
  mysql:queries test 42001 YES root:***@unix(/sda/COMP8-PS131016-percona-server-5.6.33-78.0-linux-x86_64-debug/socket.sock) query_source=perfschema
  mysql:metrics test 42002 YES root:***@unix(/sda/COMP8-PS131016-percona-server-5.6.33-78.0-linux-x86_64-debug/socket.sock)

Support might also ask you to run the extended version of the script. This produces a lot of output, and it is easier to send the output to a log file:

wget https://raw.githubusercontent.com/Percona-QA/percona-qa/master/pmm-info.sh && sh ./pmm-info.sh ext > pmm-server.log  # Please execute this on the PMM server, and send us pmm-server.log
wget https://raw.githubusercontent.com/Percona-QA/percona-qa/master/pmm-info.sh && sh ./pmm-info.sh > pmm-client.log      # Please execute this on the PMM client, and send us pmm-client.log

by Roel Van de Paar at November 01, 2016 08:30 PM

The Future of TokuDB at Percona

In this blog post, I’ll discuss the future of TokuDB at Percona. Spoiler: solid.

As soon as we announced the fact that MyRocks was coming to Percona Server for MySQL at Percona Live Europe, rumors appeared that this means we’re going to phase out TokuDB at the same time.  

I can understand why those rumors would start: just a few months ago we deprecated Fractal Trees technology (called PerconaFT) in favor of MongoRocks and RocksDB for Percona Server for MongoDB.

As much as this might look like the same situation as with PerconaFT, TokuDB is very different. PerconaFT was actually a new port of Fractal Trees technology for MongoDB. It used the MongoDB storage engine API and focused on replacing TokuMX (a full fork of an earlier version of MongoDB). PerconaFT was new, and MongoRocks was already well-established and battle-tested by Parse. The MongoDB internal design was also much more friendly towards the RocksDB storage engine than Fractal Trees technology (as David Murphy explains here).    

Our research revealed that PerconaFT had very few current users, and MongoRocks was a superior choice for new users. This is why we deprecated it.

For MySQL, TokuDB is a rather mature storage engine. It has been in production for more than five years and is trusted by many businesses and large enterprises (in industries ranging from banking and finance to gaming) for use with critical applications.

MyRocks, however, for all the promise it shows, has not been extensively battle-tested outside of Facebook. And as hard as it is to beat “Facebook scale,” that scale alone has not exposed its performance under a large variety of workloads. There are also some known limitations in MyRocks that need work.

Ultimately though, I believe MyRocks has a great future, and we at Percona are going to take part in making this future reality. But this does not mean we plan to immediately abandon TokuDB, or place it on “life support”.    

You can see new development going on for TokuDB: we have dramatically improved block allocation performance and added a new, more convenient TokuDB file layout option. We are also working on adding support for new compression algorithms in TokuDB, support for Performance Schema and refactoring the checkpoint process to ensure uniform performance on all workloads. It is true TokuDB is not changing as rapidly as RocksDB, but that is to be expected – as the more mature storage engine, it should embrace stability first and foremost.

Our overall principle at Percona is to keep our ego in check. We want to provide the best open source ecosystem solutions available to our customers, whether they were developed in-house or not. This means that within 3-5 years MyRocks could possibly be superior to TokuDB in functionality and performance for all imaginable workloads. Or an even better open source storage engine technology might get developed and come to market. If or when this happens, we will consider deprecating TokuDB in favor of such technology – while providing the customers with enough notice, tools and support to ensure they can successfully migrate.

What this really means is that TokuDB is still being actively supported and developed for at least Percona Server 5.7 and Percona Server 8.0 (and longer). We will continue this as long as a reasonable use case remains for this technology.

Please feel free to contact us with any questions, or add them in the comments below.

by Peter Zaitsev at November 01, 2016 04:42 PM

October 31, 2016

Peter Zaitsev

Percona Server for MongoDB 3.2.10-3.0 is now available

Percona announces the release of Percona Server for MongoDB 3.2.10-3.0 on October 31, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.2.10-3.0 is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.2 protocol and drivers. It extends MongoDB with MongoRocks, Percona Memory Engine, and PerconaFT storage engine, as well as enterprise-grade features like external authentication and audit logging at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.

Note:

We deprecated the PerconaFT storage engine. It will not be available in future releases.


This release is based on MongoDB 3.2.10 and includes the following additional new features and improvements:

New Features

  • Universal Hot Backup
    This release introduces an integrated hot backup system for the default WiredTiger and the alternative MongoRocks engine. It creates a physical data backup on a running server without performance degradation (see the sketch after this list).
  • Profiling Rate Limit
    Rate limiting lets you log only a sample of the queries, and thus decreases the impact of profiling on database performance.
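As an illustration of the hot backup feature, the backup is triggered with an admin command on the running server; the exact command form below is drawn from the Percona Server for MongoDB documentation of the time and the backup directory is hypothetical, so treat it as a sketch:

# Trigger an integrated hot backup into an existing, empty directory (sketch)
mongo admin --eval 'db.runCommand({ createBackup: 1, backupDir: "/data/backups/psmdb" })'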

Bug Fixes

  • Fixed crash when running out of WiredTiger cache under Percona Memory Engine.

The release notes are available in the official documentation.

 

by Alexey Zhebel at October 31, 2016 05:58 PM

Webinar Wednesday November 2: MongoDB Backups, All Grown Up!

Please join us on Wednesday, November 2, 2016 at 10:00 am PDT / 1:00 pm EDT (UTC-7) for the webinar MongoDB Backups, All Grown Up, featuring David Murphy, Percona’s Mongo Practice Manager.

It has been a long road to stable and dependable backups in the MongoDB space. This webinar covers the current types of backups and their limitations when it comes to sharding. From there we will move into why you can’t be consistent with a single node, and how you can take sharded or unsharded consistent backups. 

The webinar also covers more about the “mongodb_consistent_backup.py” tool, and the features it offers to the open source community: how to use it, what it looks like and why it’s our preferred backup methodology.

Click here to register for the webinar MongoDB Backups, All Grown Up!

David Murphy, MongoDB Practice Manager

David joined Percona in October 2015 as Practice Manager for MongoDB. Prior to that, David joined the ObjectRocket by Rackspace team as the Lead DBA in Sept 2013. With the growth involved with any recently acquired startup, David’s role covered a wide range of responsibilities, from evangelism, research, run book development, knowledgebase design, consulting, technical account management, mentoring and much more.

Prior to the world of MongoDB, David was a MySQL and NoSQL architect at Electronic Arts. There, he worked with some of the largest titles in the world, like FIFA, SimCity, and Battlefield, providing tuning, design, and technology choice responsibilities. David maintains an active interest in database speaking and exploring new technologies.

by Dave Avery at October 31, 2016 05:42 PM

October 29, 2016

Valeriy Kravchuk

MySQL Support Engineer's Chronicles, Issue #2

It's time to continue my new series that I started 2 weeks ago. I'd like to start with a reminder that it's time to send your talks for the "MySQL and Friends Devroom" at FOSDEM 2017 - the only MySQL-related event next year that I plan to attend in any case. It seems we have one more week to submit, but I've already filled in all the details for the "main" talk, "Understanding MyRocks locks and deadlocks". I'd like to apply my usual source code reading and gdb breakpoints approach in case, by the end of January 2017, the official documentation still misses important details. The official MySQL manual is still in the process of improving with regards to explaining InnoDB locks, and I am happy to admit that this week yet another of my related documentation requests, Bug #71735, "Manual does not explain locks set by SELECT ... FOR UPDATE properly", was closed.

I am really excited by the community and developer activity around MyRocks these days, so it was easy to decide to spend more time on MyRocks while preparing for FOSDEM. Just take a look at recent issues reported by Justin Swanhart, for example: Issue #365 and Issue #369. How often do you see such active, almost real-time discussion in bug reports about anything else MySQL-related these days?

I had to spend a lot of time recently trying to understand and prove the root cause of one performance problem that happens once in a while on a node of a Galera cluster, and for this I used Performance Schema, maybe for the first time (since reporting Bug #68079) on a real-life case. It turned out that properly sizing the tables there, so that under high load and concurrency we still get relevant data a few seconds after the problem happened, is not a trivial task. Moreover, it seems the Galera code is not instrumented in enough detail, so it is hard to measure the impact it may have. While trying to find a proper trade-off between the amount of data collected, memory usage and performance impact, I decided to use profilers, from PMP to ideas of replacing gdb there with quickstack, to perf (which I used to make some really good observations). I'd like to summarize some recent (and upcoming) profiling experience in a talk that I am still trying to submit for FOSDEM, titled "Applying profilers to MySQL". In the meantime, check the comments on this Facebook post of mine for many useful details.

I'd also like to submit a talk on how a single bug report may influence MySQL development and the development of the bug reporter's skills. It's just an idea for now, and I plan to concentrate on a story around Bug #68079, which will be 4 years old by the time of FOSDEM 2017. I have new interesting results to share that I got while trying to check how the fix helped and how this same use case performs and scales in MyRocks now. I am not sure if this is a good idea and if anything but a blog post may come out of it. Some of my previous attempts to build a talk around this bug were not really well accepted...

This week I had to explain how to get rid of huge ibdata* file(s) while moving each InnoDB table to a separate .ibd file. I've found a great summary of options in this old blog post by Shlomi Noach. Very useful in case you cannot just dump everything and reload it into a new instance, for whatever reason.

Another problem I had to work on was related to a bad query plan on a slave where the table was partitioned. We tried to fight bad cardinality estimations (see Bug #67351 etc.), but even after getting them closer to reality, the optimizer still uses the wrong index sometimes. My next hope is the engine-independent table statistics of MariaDB.

I've noted that customers recently are actively trying to use PAM and LDAP with MySQL. I consider this blog post by my colleague Geoff Montee very useful for them (and myself).

Finally, I've found this great blog post by Jervin Real very useful while working with colleagues on continuous InnoDB crashes caused by a corrupted partition .ibd file. The idea was to restore the partition data from backup on a separate instance and also recover data for all non-corrupted partitions of a huge table there, while the problematic instance is starting up in forced recovery mode to drop the problematic table and check for any further problems.

As I already mentioned several bugs in this post, I'd like to share a link to the list of bugs recently reported by my colleague from MariaDB Foundation, Sergey Vojtovich. Dear Oracle MySQL engineers, please, pay proper attention to them, especially those with "bs" in the "Summary"!

by Valeriy Kravchuk (noreply@blogger.com) at October 29, 2016 07:10 PM

October 28, 2016

Peter Zaitsev

Blog Series: MySQL Configuration Management

MySQL configuration management remains a hot topic, as I’ve noticed on numerous occasions during my conversations with customers.

I thought it might be a good idea to start a blog series that goes deeper in detail into some of the different options, and what modules potentially might be used for managing your MySQL database infrastructure.

Configuration management has been around since way before the beginning of my professional career. I, myself, originally began working on integrating an infrastructure with my colleagues using Puppet.

Why is configuration management important?
  • Reproducibility: It gives us the ability to provision any environment in an automated way, and to feel sure that the new environment will contain the same configuration.
  • Fast restoration: Thanks to reproducibility, you can quickly provision machines in case of disasters. This makes sure you can focus on restoring your actual data instead of worrying about the deployment and configuration of your machines.
  • Integral part of continuous deployment: Continuous deployment is a term everyone loves: being able to deploy changes rapidly and automatically after automated regression testing requires a configuration management solution.
  • Compliance and security: Solutions like Puppet and Chef maintain and enforce configuration parameters on your infrastructure. This can sound bothersome at first, but it’s essential for maintaining a well-configured environment.
  • Documented environment: Although reading someone’s Puppet code can potentially harm you beyond insanity, it provides you with the real truth about your infrastructure.
  • Efficiency and manageability: Configuration management can automate repetitive tasks (for example, user grants, database creation, configuration variables), as well as security updates, service restarts, etc. These can potentially bring you less work and faster rollouts.

Which players are active in this field?

The most popular open source solutions are Puppet, Chef, Ansible, and CFengine (among others). In this series, we will go deeper into the first three of them.

Let’s first start by giving you a quick, high-level introduction.

Puppet

Puppet is a language used to describe the desired state of an environment. The Puppet client reads the catalog of the expected state from the server and enforces these changes on the client. The system works based on a client/server principle.

By default, Puppet has four essential components:

  • Puppet Server: A Java virtual machine offering Puppet’s core services.
  • Puppet Agent: A client library that requests configuration catalog info from the puppet-server.
  • Hiera: A key-value lookup database, which can store and modify values for specific hosts.
  • Facter: An application that keeps an inventory of the local node variables.

How can you integrate Puppet in your MySQL infrastructure?

Using a Puppet MySQL module (such as the Puppet Forge MySQL module covered later in this series) will allow you and your team to create users and databases, and to install and configure MySQL.
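A minimal sketch, assuming the puppetlabs-mysql module from the Puppet Forge (the module name here is an assumption, not something named in the original post): pulling the module onto your Puppet server is a one-liner.

# Install the Puppet Forge MySQL module on the Puppet server (sketch)
puppet module install puppetlabs-mysql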

Probably my old “code from hell” module is still somewhere out there.

Chef

Chef also uses a declarative language (like Puppet), based on Ruby, which allows you to write cookbooks for the technologies you want to integrate. Chef is also based on a server/client solution: the clients are the Chef nodes, and the server manages the cookbooks, catalogs and recipes.

In short, Chef consists of:

  • Chef server: Manages the multiple cookbooks and the catalog
  • Chef clients (nodes): The actual system requesting the catalog information from the chef server.
  • Workstations: This is a system that is configured to run Chef command-line tools that synchronize with a Chef-repository or the Chef server. You could also describe this as a Chef development and tooling environment.

How can you integrate Chef in your MySQL infrastructure?
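The community cookbooks for MySQL live on the Chef Supermarket. As a sketch (the knife subcommand shown is from the Chef 12 era and is an assumption, not something named in the original post), pulling the community mysql cookbook onto a workstation looks roughly like this:

# Download the community mysql cookbook into the local chef-repo (sketch)
knife cookbook site install mysql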

Ansible

Ansible originated with something different in mind. System engineers typically chose to write their own management scripts, which can be troublesome and hard to maintain. Why wouldn’t you use something easy, automated and standardized? Ansible fills in these gaps and simplifies the management of Ansible targets.

Ansible works by connecting to your nodes (by default over SSH) and pushing out Ansible modules to them. These modules represent the desired state of the node, and are used to execute the commands needed to reach that state.
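As a rough sketch (the module names are the stock Ansible MySQL modules of that era; the host group, credentials and the need for the Python MySQL driver on the targets are my assumptions), an ad-hoc run against a group of database hosts could look like this:

# Create a database and a user on every host in the "db" group (sketch)
ansible db -m mysql_db   -a "name=appdb state=present"
ansible db -m mysql_user -a "name=appuser password=secret priv='appdb.*:ALL' state=present"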

This procedure is different from Puppet and Chef, which are primarily client/server solutions.

Some pre-made modules for MySQL are:

Conclusion and Next Steps

Choose your poison (or magical medicine, you pick the wording); every solution has its perks.

Keep in mind that in some situations running a complicated Puppet or Chef infrastructure could be overkill. At this moment, a solution like Ansible might be a quick and easily integrable answer for you.

The next blog post will go over the Puppet Forge MySQL module, so stay tuned!

by Dimitri Vanoverbeke at October 28, 2016 10:45 PM

New TokuDB and PerconaFT database file management feature in Percona Server 5.6.33-79.0 and Percona Server 5.7.15-9

This blog post discusses a new TokuDB and PerconaFT database file management feature in two Percona Server releases.

By now you have hopefully read through Peter’s post and my two prior posts on the TokuDB/PerconaFT file set. If you have not, it is probably a good idea to run through them now before we get into the details of this new feature.

We introduced a new server option beginning in Percona Server 5.6.33-79.0 and Percona Server 5.7.15-9, called tokudb_dir_per_db, that addresses two shortcomings within the current TokuDB implementation:

  • The renaming of data files on table/index rename
  • The ability to group data files together within a directory that represents a single database.

The new option is disabled by default in 5.6.33-79.0, but will be enabled by default beginning in 5.7.15-9.


New table renaming functionality

When you rename a TokuDB table via SQL, the data files on disk keep their original names. Only the mapping in the PerconaFT directory file is changed to map the new dictionary name to the original internal file names. This makes it difficult to quickly match database/table/index names to their actual files on disk, requiring you to use the INFORMATION_SCHEMA.TOKUDB_FILE_MAP to cross reference.
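For example (a minimal sketch using the columns shown in the output later in this post; the schema and table names are just placeholders), you can map a single table to its files on disk like this:

# Cross reference a TokuDB table with its on-disk files (sketch)
mysql -e "SELECT dictionary_name, internal_file_name
          FROM INFORMATION_SCHEMA.TOKUDB_FILE_MAP
          WHERE table_schema='test' AND table_name='t1'\G"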

When tokudb_dir_per_db is enabled, this is no longer the case. When you rename a table, the mapping in the PerconaFT directory file will be updated, and the files will be renamed on disk to reflect the new table name. This was a lot more difficult to implement than we originally thought due to the non-transactional nature of the underlying file systems. We had to “invent” a transactional file system rename functionality within PerconaFT to ensure that a crash during a file system mv src dst was recoverable.


New directory layout functionality

Many users have had issues with managing the huge volume of individual files used by TokuDB and PerconaFT. We are beginning to take some steps to help improve the manageability of these files, and potentially even reduce the number of files present.

When you enable tokudb_dir_per_db, all new tables and indices are placed within their corresponding database directory within the tokudb_data_dir or system datadir. Existing table files will not be automatically relocated to their corresponding database directory.

You can easily move a table’s data files into the new scheme and proper database directory with a few steps:

mysql> SET GLOBAL tokudb_dir_per_db=true;
mysql> RENAME TABLE <table> TO <tmp_table>;
mysql> RENAME TABLE <tmp_table> TO <table>;

Two renames are needed because MySQL will not allow you to rename a table to itself. The first rename renames the table to the temporary name and moves the table’s files into the owning database directory. The second rename sets the table name back to the original name. Tables can, of course, also be renamed/moved across databases and will be placed correctly into the corresponding database directory.

You must be careful with renaming tables! If you have used any tricks to create symlinks of the database directories on different storage volumes, the mv is not a simple directory mv on the same volume, but a physical copy across volumes. This can take quite some time and prevents access to the table being moved during the copy.

NOTE: If you have tokudb_data_dir set to something other than the system datadir, TokuDB creates a directory matching the name of the database. Upon dropping of the database, this directory remains behind.


Example:

While running Percona Server 5.7.15-9 with tokudb_dir_per_db=false to illustrate the old behavior, create a table t1, show the file map, and list the data directory:

mysql> SET GLOBAL tokudb_dir_per_db=false;
Query OK, 0 rows affected (0.00 sec)
mysql> DROP DATABASE IF EXISTS test; CREATE DATABASE test; USE test;
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Database changed
mysql> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c VARCHAR(200), KEY kab(a, b)) ENGINE=TOKUDB;
Query OK, 0 rows affected (0.07 sec)
mysql> SELECT * FROM INFORMATION_SCHEMA.TOKUDB_FILE_MAP;
*************************** 1. row ***************************
      dictionary_name: ./test/t1-key-kab
   internal_file_name: ./_test_t1_key_kab_de_3_1d.tokudb
         table_schema: test
           table_name: t1
table_dictionary_name: key-kab
*************************** 2. row ***************************
      dictionary_name: ./test/t1-main
   internal_file_name: ./_test_t1_main_de_2_1d.tokudb
         table_schema: test
           table_name: t1
table_dictionary_name: main
*************************** 3. row ***************************
      dictionary_name: ./test/t1-status
   internal_file_name: ./_test_t1_status_de_1_1d.tokudb
         table_schema: test
           table_name: t1
table_dictionary_name: status
3 rows in set (0.00 sec)

$ ls -1 data/*.tokudb
data/_test_t1_key_kab_de_3_1d.tokudb
data/_test_t1_main_de_2_1d.tokudb
data/_test_t1_status_de_1_1d.tokudb

We see the data files for our table t1 as the three files named _test_t1_*.

Rename t1 to all_the_kings_horses, show the file map again, and another listing of the data directory:

mysql> RENAME TABLE t1 TO all_the_kings_horses;
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT * FROM INFORMATION_SCHEMA.TOKUDB_FILE_MAP;
*************************** 1. row ***************************
      dictionary_name: ./test/all_the_kings_horses-key-kab
   internal_file_name: ./_test_t1_key_kab_de_3_1d.tokudb
         table_schema: test
           table_name: all_the_kings_horses
table_dictionary_name: key-kab
*************************** 2. row ***************************
      dictionary_name: ./test/all_the_kings_horses-main
   internal_file_name: ./_test_t1_main_de_2_1d.tokudb
         table_schema: test
           table_name: all_the_kings_horses
table_dictionary_name: main
*************************** 3. row ***************************
      dictionary_name: ./test/all_the_kings_horses-status
   internal_file_name: ./_test_t1_status_de_1_1d.tokudb
         table_schema: test
           table_name: all_the_kings_horses
table_dictionary_name: status
3 rows in set (0.00 sec)

$ ls -1 data/*.tokudb
data/_test_t1_key_kab_de_3_1d.tokudb
data/_test_t1_main_de_2_1d.tokudb
data/_test_t1_status_de_1_1d.tokudb

The file names remained the same as the original table, but the file map has changed to reflect the new table/dictionary names.

Let us inject a little confusion by adding another index to the table:

mysql> alter table all_the_kings_horses add index kac(a,c);
Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> SELECT * FROM INFORMATION_SCHEMA.TOKUDB_FILE_MAP;
*************************** 1. row ***************************
      dictionary_name: ./test/all_the_kings_horses-key-kab
   internal_file_name: ./_test_t1_key_kab_de_3_1d.tokudb
         table_schema: test
           table_name: all_the_kings_horses
table_dictionary_name: key-kab
*************************** 2. row ***************************
      dictionary_name: ./test/all_the_kings_horses-key-kac
   internal_file_name: ./_test_all_the_kings_horses_key_kac_e3_3_1d_B_0.tokudb
         table_schema: test
           table_name: all_the_kings_horses
table_dictionary_name: key-kac
*************************** 3. row ***************************
      dictionary_name: ./test/all_the_kings_horses-main
   internal_file_name: ./_test_t1_main_de_2_1d.tokudb
         table_schema: test
           table_name: all_the_kings_horses
table_dictionary_name: main
*************************** 4. row ***************************
      dictionary_name: ./test/all_the_kings_horses-status
   internal_file_name: ./_test_t1_status_de_1_1d.tokudb
         table_schema: test
           table_name: all_the_kings_horses
table_dictionary_name: status
4 rows in set (0.00 sec)

$ ls -1 data/*.tokudb
data/_test_all_the_kings_horses_key_kac_e3_3_1d_B_0.tokudb
data/_test_t1_key_kab_de_3_1d.tokudb
data/_test_t1_main_de_2_1d.tokudb
data/_test_t1_status_de_1_1d.tokudb

The file for the new index kac was created with the current table name, not the original.

Now we move on to the new behavior. First make sure that tokudb_dir_per_db=true, then rename the table again, show the file map, and do another directory listing:

mysql> SET GLOBAL tokudb_dir_per_db=true;
Query OK, 0 rows affected (0.00 sec)
mysql> RENAME TABLE all_the_kings_horses TO all_the_kings_men;
Query OK, 0 rows affected (0.02 sec)
mysql> SELECT * FROM INFORMATION_SCHEMA.TOKUDB_FILE_MAP;
*************************** 1. row ***************************
      dictionary_name: ./test/all_the_kings_men-key-kab
   internal_file_name: ./test/all_the_kings_men_key_kab_ea_2_1d.tokudb
         table_schema: test
           table_name: all_the_kings_men
table_dictionary_name: key-kab
*************************** 2. row ***************************
      dictionary_name: ./test/all_the_kings_men-key-kac
   internal_file_name: ./test/all_the_kings_men_key_kac_ea_3_1d.tokudb
         table_schema: test
           table_name: all_the_kings_men
table_dictionary_name: key-kac
*************************** 3. row ***************************
      dictionary_name: ./test/all_the_kings_men-main
   internal_file_name: ./test/all_the_kings_men_main_ea_4_1d.tokudb
         table_schema: test
           table_name: all_the_kings_men
table_dictionary_name: main
*************************** 4. row ***************************
      dictionary_name: ./test/all_the_kings_men-status
   internal_file_name: ./test/all_the_kings_men_status_ea_5_1d.tokudb
         table_schema: test
           table_name: all_the_kings_men
table_dictionary_name: status
4 rows in set (0.00 sec)

$ ls -1 data/test/*.tokudb
data/test/all_the_kings_men_key_kab_ea_2_1d.tokudb
data/test/all_the_kings_men_key_kac_ea_3_1d.tokudb
data/test/all_the_kings_men_main_ea_4_1d.tokudb
data/test/all_the_kings_men_status_ea_5_1d.tokudb

The database files have now been renamed to properly match the name of the database, table and keys, and they have been moved into the data/test directory.

Now let us watch all that action with an alternate tokudb_data_dir. Rather than showing how to move files around as mentioned in the previous blog posts, we will just reset our TokuDB installation and start the server with a different tokudb_data_dir that is a sibling to the server datadir, called tokudb_data.

mysql> SET GLOBAL tokudb_dir_per_db=true;
Query OK, 0 rows affected (0.00 sec)
mysql> DROP DATABASE IF EXISTS test; CREATE DATABASE test; USE test;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Database changed
mysql> CREATE TABLE t1(a INT PRIMARY KEY, b INT, c VARCHAR(200), KEY kab(a, b)) ENGINE=TOKUDB;
Query OK, 0 rows affected (0.15 sec)
mysql> SELECT * FROM INFORMATION_SCHEMA.TOKUDB_FILE_MAP;
*************************** 1. row ***************************
      dictionary_name: ./test/t1-key-kab
   internal_file_name: /ssd/toku/DB-295/percona-server-install-5.7/tokudb_data/test/t1_key_kab_d9_3_1d.tokudb
         table_schema: test
           table_name: t1
table_dictionary_name: key-kab
*************************** 2. row ***************************
      dictionary_name: ./test/t1-main
   internal_file_name: /ssd/toku/DB-295/percona-server-install-5.7/tokudb_data/test/t1_main_d9_2_1d.tokudb
         table_schema: test
           table_name: t1
table_dictionary_name: main
*************************** 3. row ***************************
      dictionary_name: ./test/t1-status
   internal_file_name: /ssd/toku/DB-295/percona-server-install-5.7/tokudb_data/test/t1_status_d9_1_1d.tokudb
         table_schema: test
           table_name: t1
table_dictionary_name: status
3 rows in set (0.00 sec)

$ ls -1 tokudb_data/
test
__tokudb_lock_dont_delete_me_data
__tokudb_lock_dont_delete_me_temp
$ ls -1 tokudb_data/test/
t1_key_kab_d9_3_1d.tokudb
t1_main_d9_2_1d.tokudb
t1_status_d9_1_1d.tokudb

This shows TokuDB now putting everything in the directory we specified in our tokudb_data_dir, and following the tokudb_dir_per_db paradigm by creating the directory called test before creating the table.

What happens when we drop that database?

mysql> drop database test;
Query OK, 1 row affected (0.02 sec)

$ ls -1 tokudb_data/
test
__tokudb_lock_dont_delete_me_data
__tokudb_lock_dont_delete_me_temp
$ ls -1 tokudb_data/test/

All of the table's files have been removed, but as mentioned above, the test directory still exists and needs to be removed manually.


Thanks for reading! We hope that you have found this blog series useful and look forward to hearing your experience with this new feature.

In the future we will investigate implementing the CREATE TABLE … DATA|INDEX DIRECTORY=… feature for TokuDB, which builds on top of this work.

We need to give a shout out to Vladislav Lesin, who took the lead on this issue and fought several battles in his attempt to ensure that this is fully crash safe and recoverable.

by George O. Lorch III at October 28, 2016 08:18 PM

MariaDB Foundation

MariaDB 10.0.28 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.0.28. This is a Stable (GA) release. See the release notes and changelog for details. Download MariaDB 10.0.28 Release Notes Changelog What is MariaDB 10.0? MariaDB APT and YUM Repository Configuration Generator Thanks, and enjoy MariaDB!

The post MariaDB 10.0.28 now available appeared first on MariaDB.org.

by Daniel Bartholomew at October 28, 2016 04:19 PM

Jean-Jerome Schmidt

Planets9s - The complete MySQL Query Tuning Trilogy, scaling & sharding MongoDB and more!

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Watch Part 3 of the MySQL Query Tuning Trilogy: working with optimizer and SQL tuning

This week we completed our popular webinar trilogy on MySQL Query Tuning and the three parts are now available for you to watch online. Part 3 this Tuesday focussed on working with the optimizer and SQL tuning. In this session, Krzysztof Książek, Senior Support Engineer at Severalnines, discussed how execution plans are calculated. He also took a closer look at InnoDB statistics, how to hint the optimizer and finally, how to optimize SQL. Watch this last session or indeed all three parts by following the link below.

Watch replays

Sign up for our new webinar on scaling & sharding MongoDB

Join us for our third ‘How to become a MongoDB DBA’ webinar on Tuesday, November 15th, during which we will uncover the secrets and caveats of MongoDB scaling and sharding. Learn with this webinar how to plan your scaling strategy up front and how to prevent ending up with unusable secondary nodes and shards. We’ll also show you how to leverage ClusterControl’s MongoDB scaling and shard management capabilities.

Sign up for the webinar

ClusterControl Developer Studio: Custom database alerts by combining metrics

Following our introduction blogs to the ClusterControl Developer Studio and the ClusterControl Domain Specific Language, we now look at our MongoDB replication window advisor. It was recently added to the Advisors Github repository. Our advisor will not only check on the length of the replication window, but also calculate the lag of its secondaries and warn us if the node would be in any risk of danger. All advisors are open source on Github, so anyone can contribute back to the community!

Read the blog

Schema changes in Galera cluster for MySQL and MariaDB - how to avoid RSU locks

This blog discusses the Rolling Schema Upgrade as the only feasible method to execute schema changes where pt-online-schema-change failed or is not feasible to use. We check how this behaves in real life, in two scenarios. First, we have a single connection to the Galera cluster. We don’t scale out reads, we just use Galera as a way to improve availability of our application. We will simulate it by running a sysbench workload on one of the Galera cluster nodes. We are also going to execute RSU on this node. Check out the blog for the full discussion.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at October 28, 2016 11:39 AM

MariaDB AB

Getting to Know MariaDB ColumnStore


With the recent announcement of MariaDB ColumnStore, we get many questions on the architecture and functionality of MariaDB ColumnStore. This blog post describes the architecture of MariaDB ColumnStore.

MariaDB ColumnStore is a GPLv2 open source columnar database built on MariaDB Server. It is a fork and evolution of the former InfiniDB product. It can be deployed in the cloud (optimized for Amazon Web Services) or on a local cluster of Linux servers using either local or networked storage.

MariaDB ColumnStore – a massively parallel, distributed database

MariaDB ColumnStore consists of two main component service classes:

  • User Modules: providing the MariaDB SQL engine front end and query orchestration.
  • Performance Modules: providing distributed query processing.

By utilizing the MariaDB server as the front end, all server capabilities of MariaDB can also be leveraged, including secure connections, the audit plugin, and other storage engines. For the latter, ColumnStore supports cross engine joins, which allow querying a ColumnStore table against, say, an InnoDB table.
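For instance (a sketch with hypothetical schema and table names; cross engine support may also need to be enabled in the ColumnStore configuration), a ColumnStore fact table can be joined directly against an InnoDB lookup table:

# Cross engine join (sketch): ColumnStore fact table joined to an InnoDB dimension table
mysql -e "
  SELECT c.customer_name, SUM(o.amount) AS total
  FROM analytics.orders_cs AS o   /* ColumnStore table (hypothetical) */
  JOIN oltp.customers      AS c   /* InnoDB table (hypothetical) */
    ON c.customer_id = o.customer_id
  GROUP BY c.customer_name;"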

The MariaDB server processes incoming connection requests and queries for each user connection, as shown in the diagram below. Once a SQL query is received by the User Module, it processes that SQL query and distributes query operations across the Performance Modules. The Performance Modules execute the query operations in a distributed manner, read/write MariaDB ColumnStore columnar data files, and return intermediate query operation results to the User Modules. Any operation that cannot be distributed is performed at the User Module level before returning results through the MariaDB server process back to the client.

[Diagram: Blog - Getting to Know ColumnStore.png]

Both User Modules and Performance Modules are horizontally scalable. Scaling out Performance Modules gives the greatest improvement in individual query performance. Scaling out User Modules provides high availability and increased query concurrency. Both User Modules and Performance Modules are multi-threaded, further increasing performance at the per-node level.

MariaDB ColumnStore – A Columnar Database

As a columnar database, MariaDB ColumnStore stores table data in columns rather than rows. This allows the query optimizer to read only the columns necessary to fulfil a given query and its result set. Once a particular column value has been identified, the corresponding row values can easily be determined through a logical offset into the other column files. Partitioning data by columns is also called Vertical Partitioning. Horizontal Partitioning of the data is achieved by distributing it across Performance Modules. Further data elimination within a Performance Module is achieved by maintaining range metadata in a distributed Extent Map, allowing particular column extent files to be skipped when the requested values fall outside their range. Storing data as columns also makes it much easier to add and remove columns over time, even online.

It is important to recognize that both the Vertical and Horizontal Partitioning are automatically provided and managed by MariaDB ColumnStore. Very little configuration and maintenance is required to keep the system performing well. As a result, indexes do not need to be defined or maintained either.
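A minimal sketch of that point, using hypothetical names (exact DDL support may vary between ColumnStore versions): the table is declared with no index or partition clauses at all, and a range query is served through column and extent elimination instead of B-tree lookups.

CREATE TABLE events (
  event_time DATETIME,
  user_id    INT,
  event_type VARCHAR(32)
) ENGINE=ColumnStore;   -- no KEY, INDEX or PARTITION BY clauses required

-- Only the event_time and user_id column files are read; extents whose
-- Extent Map range for event_time lies outside March 2016 are skipped.
SELECT COUNT(*)
FROM events
WHERE event_time >= '2016-03-01'
  AND event_time <  '2016-04-01'
  AND user_id = 42;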

MariaDB ColumnStore – The Big Data Platform

If your analytical query workload touches up to a hundred thousand rows and your table size remains under a million rows, an OLTP engine such as InnoDB or MyISAM will handle it with reasonable performance. Beyond that, performance is much harder to tune for and maintain. MariaDB ColumnStore is designed for those larger workloads.

It is suitable for reporting or analysis of millions to billions of rows from data sets containing millions to trillions of rows. As the data size grows, MariaDB ColumnStore allows you to add more PM nodes to scale your performance linearly.

My next blog post will continue this topic and go deeper into more specifics of how MariaDB ColumnStore is able to handle such big data workloads. You can write to me at dthompson@mariadb.com or follow me @davidwbt for more insights into MariaDB ColumnStore.



by david_thompson_g at October 28, 2016 03:51 AM

Invitation to Join MariaDB ColumnStore 1.0.4 Beta!


Today, MariaDB ColumnStore, MariaDB's distributed columnar storage engine for analytics workload, is reaching its next major milestone – the availability of MariaDB ColumnStore 1.0.4 Beta software release.

MariaDB ColumnStore 1.0.4 is built by porting InfiniDB 4.6.2 (which is based on MySQL 5.1) onto MariaDB 10.1.18. With this release, the implementation now passes the InfiniDB compatibility regression tests and performs on par with, and in some cases better than, InfiniDB, which is a huge achievement considering the complexity and changes involved in moving from MySQL 5.1 to MariaDB 10.1.

The release notes for MariaDB ColumnStore 1.0.4, including the list of bugs fixed, can be found here. Binaries for ColumnStore 1.0.4 Beta are available for download here.

For general ColumnStore questions, please visit our Knowledgebase here. Please report any issues in JIRA so that we may address them.



by nishantvyas at October 28, 2016 03:49 AM

October 27, 2016

Jean-Jerome Schmidt

Webinar: Become a MongoDB DBA - Scaling and Sharding

Join us for our third ‘How to become a MongoDB DBA’ webinar on Tuesday, November 15th! In this webinar we will uncover the secrets and caveats of MongoDB scaling and sharding.

Become a MongoDB DBA - Scaling and Sharding

MongoDB offers read and write scaling out of the box: adding secondary nodes will increase your potential read capacity, while adding shards will increase your potential write capacity. However, adding a new shard doesn’t necessarily mean it will be used. Choosing the wrong shard key may also cause uneven data distribution.

There is more to scaling than simply adding nodes and shards. Factors to take into account include indexing, shard re-balancing, replication lag, capacity planning and consistency.

In this webinar you will learn how to plan your scaling strategy up front and how to avoid ending up with unusable secondary nodes and shards. Finally, we’ll show you how to leverage ClusterControl’s MongoDB scaling capabilities and have ClusterControl manage your shards.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, November 15th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, November 15th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

Agenda

  • What are the differences in read and write scaling with MongoDB
  • Read scaling considerations with MongoDB
  • MongoDB read preference explained
  • How sharding works in MongoDB
  • Adding new shards and balancing data
  • How to scale and shard MongoDB using ClusterControl
  • Live Demo

Speaker

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic database expert with over 16 years’ experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad view of the whole database environment: from MySQL to MongoDB, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, MongoDB Open House, FOSDEM) and related meetups.

We look forward to “seeing” you there!

This session is based upon the experience we have using MongoDB and implementing it for our database infrastructure management solution, ClusterControl. For more details, read through our ‘Become a MongoDB DBA’ blog series.

by Severalnines at October 27, 2016 08:48 PM

Completing the MySQL Query Tuning Trilogy: working with optimizer & SQL tuning

Thanks to everyone who participated in this week’s webinar on working with optimizer and SQL tuning. In this session, Krzysztof Książek, Senior Support Engineer at Severalnines, discussed how execution plans are calculated. He also took a closer look at InnoDB statistics, how to hint the optimizer and finally, how to optimize SQL.
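For readers new to these topics, here is a minimal sketch of what "hinting the optimizer" can look like in practice (table, column and index names are purely illustrative and not taken from the webinar):

ANALYZE TABLE orders;   -- refresh the InnoDB statistics used by the optimizer

-- Suggest an index to the optimizer for a specific query:
SELECT * FROM orders USE INDEX (idx_created)
WHERE created_at >= '2016-10-01';

-- Force the join order to match the order the tables are written in:
SELECT STRAIGHT_JOIN o.order_id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;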

Watch the replay

The complete MySQL Query Tuning Trilogy is available to watch online, so if you missed the first two parts, you can now catch up with them on demand.

MySQL Query Tuning Trilogy

An in-depth look into the ins and outs of optimising MySQL queries

When done right, tuning MySQL queries and indexes can significantly increase the performance of your application as well as decrease response times. This is why we’ve covered this complex topic over the course of three webinars of 60 minutes each.

Part 1: Query tuning process and tools

Part 2: Indexing and EXPLAIN - deep dive

Part 3: Working with optimizer & SQL tuning

Trilogy speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience in managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. He’s the main author of the Severalnines blog and webinar series: Become a MySQL DBA.

Happy MySQL query tuning!

by Severalnines at October 27, 2016 08:40 PM

Peter Zaitsev

Thoughts on MySQL 8.0 Invisible Indexes

MySQL 8.0 has a new feature called “invisible indexes,” which allow you to quickly enable/disable indexes from being used by the MySQL Optimizer.

I wanted to share some of my first experiences and thoughts about this new feature.

Why is it good for us?

There are a couple of use cases. One of them is if you want to drop an index, but want to know the effect beforehand. You can make it invisible to the optimizer. It is a quick metadata change to make an index invisible. Once you are sure there is no performance degradation, you can then drop the index.

The main point is that the invisible index is unavailable for use by the optimizer, but it is still present and kept up-to-date by write operations. The optimizer won’t use it, even if we try to “FORCE INDEX”. I think we should be able to force it somehow, though. There might be scenarios where this could be useful:

  • We can create a new invisible index, but if we want to test it we have to make it visible. That means all the queries are going to be able to use it, which could have an immediate effect on the application. I don’t think this is the best approach if we just want to test it. You should always test on staging, but not everybody has the same data size or real life data on their staging servers. Forcing invisible indexes could be useful.
  • You have many indexes, but are not sure which one is not in use. You can change one index to invisible to see if there is any performance degradation. If yes, you can change it back immediately.
  • You could have a special case where only one query can use that index. In that case, an invisible index could be a great solution.

How can you create invisible indexes?

There are two options. We can create a table with an invisible index like this:

CREATE TABLE `t1` (
`i` int(11) DEFAULT NULL,
`j` int(11) DEFAULT NULL,
`k` int(11) DEFAULT NULL,
KEY `i_idx` (`i`),
KEY `idx_1` (`i`,`j`,`k`) INVISIBLE
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Or we can use alter table and change the index to be invisible:

ALTER TABLE t1 ALTER INDEX idx_1 INVISIBLE;
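The same statement with VISIBLE switches the index back on for the optimizer, and index visibility can be inspected from the data dictionary. A minimal sketch, assuming the t1/idx_1 names from above and that the IS_VISIBLE column is available in MySQL 8.0's INFORMATION_SCHEMA.STATISTICS:

ALTER TABLE t1 ALTER INDEX idx_1 VISIBLE;

-- List indexes in the current schema that are currently invisible:
SELECT DISTINCT TABLE_NAME, INDEX_NAME, IS_VISIBLE
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND IS_VISIBLE = 'NO';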

Using invisible indexes

If we want to remove an index now, we can change it to invisible. But what about queries that use “FORCE/USE INDEX”? Are they going to throw an error? If you force an index that does not exist, you get an error. You don’t get an error with invisible indexes. The optimizer doesn’t use it, but it knows that it exists.

mysql> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`i` int(11) DEFAULT NULL,
`j` int(11) DEFAULT NULL,
`k` int(11) DEFAULT NULL,
KEY `i_idx` (`i`),
KEY `idx_1` (`i`,`j`,`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql> explain select * from t1 force index(idx_1) where i=1 and j=4;
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | idx_1         | idx_1 | 10      | const,const |    2 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> alter table t1 alter index idx_1 invisible;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select * from t1 force index(idx_1) where i=1 and j=4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   16 |     6.25 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
mysql> explain select * from t1 where i=1 and j=4;
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | i_idx         | i_idx | 5       | const |    2 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

As you can see, if we use “FORCE INDEX” with an invisible index MySQL performs a full table scan because (from mysql.com):

The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.

MySQL won’t throw any errors because the index exists, but it is not visible. Even if there is another usable index, it is going to perform a full table scan. On a large table, that could cause serious performance issues. Even though MySQL doesn’t throw any errors during query execution, it should at least log a warning in the error log.

Conclusion

I think invisible indexes are a great new feature that could be useful for many customers. We should be able to use an invisible index if necessary, and be able to log queries that are trying to use invisible indexes.

You can read more about invisible indexes in this blog post, and in the MySQL Documentation.

by Tibor Korocz at October 27, 2016 06:20 PM

MariaDB Foundation

2016 MariaDB Developers Meetup Presentations

I’ve collected slides and videos from several of the presentations given at the MariaDB Developers Meetup in Amsterdam, 6-8 October 2016. This meetup was kindly hosted by Booking.com. The presentations are listed here in roughly the order they were given. If I have both the slides and video for a given talk I link to […]

The post 2016 MariaDB Developers Meetup Presentations appeared first on MariaDB.org.

by Daniel Bartholomew at October 27, 2016 03:51 PM

MariaDB AB

Does InnoDB page size matter?


From MariaDB 10.1 there is a feature where the InnoDB page size can be configured to be larger than the default 16K for normal, uncompressed tables. However, there have been few performance results showing whether the page size really affects transaction performance or response time. In this blog, we study the effects of page size on three different storage devices using the same benchmark(s). These devices are:

  • Traditional hard disk
  • SSD (three Intel X25-E Extreme SSDSA2SH032 G1GN 2.5-inch 32GB SATA II SLC Internal Solid State Drives as RAID-0)
  • FusionIO NVM device (ioMemory SX300-1600 with VSL driver 4.2.1 build 1137 and NVMFS 1.1.1)

Results from different devices should not be compared to each other, as there are other variables like device bandwidth and different file systems. Instead, we will look at the page size effect on each device separately.
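One practical note before the numbers: innodb_page_size is read-only at runtime and applies to the whole data directory, so each page size tested below requires an instance initialized with that setting (for example innodb_page_size=64K in my.cnf before the datadir is created). A quick way to verify what a running instance uses:

-- innodb_page_size cannot be changed on an existing data directory:
SELECT @@innodb_page_size;
SHOW GLOBAL VARIABLES LIKE 'innodb_page_size';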

I will use sysbench v0.5 as the benchmark with the following parameters varying the num-threads from 8 to 512:

./sysbench --test=tests/db/oltp.lua --mysql-table-engine=innodb --oltp-test-mode=complex --oltp-read-only=off --oltp-table-size=10000000 --max-requests=1000000000 --num-threads= --max-time=10800 --mysql-socket=/mnt/dfs/db/mysql.sock --mysql-user=root run 


Hard Disk

On a traditional hard disk, the InnoDB page size configuration has a relatively small effect on overall performance with the sysbench benchmark, as seen in Figure 1.

Figure 1. Hard Disk performance with different page sizes and number of threads.
 

Similarly, there is no significant difference in average response time with different page size settings (Figure 2).

Figure 2. Hard Disk average response time with different page sizes and number of threads.

SSD

Looking at SSD in Figure 3, there seems to be a small decrease in performance if a 64K page size is used, although this decrease is not very significant. There are no differences when a 32K page size is used.

Figure 3. SSD performance with different page sizes and number of threads.

A similar difference is seen in the average response times (Figure 4). Using a page size of 32K makes no noticeable difference compared to the default 16K, while the 64K setting has slightly longer response times, though the difference is not significant.
 

Figure 4. SSD average response time with different page sizes and number of threads.

NVM

On the non-volatile memory (NVM) device, page size seems to have a very small effect (Figure 5). Similarly to the SSD, a 64K page size decreases performance, but the difference is smaller than on the SSD.

Figure 5. NVM performance with different page sizes and number of threads.

Similarly, the 64K setup shows slightly increased average response times compared to the other configurations, but the difference is not significant (Figure 6).

Figure 6. NVM average response time with different page sizes and number of threads.

Summary

The InnoDB page size setting has no significant performance effect in this benchmark. On both the SSD and NVM devices there is a small decrease in performance and an increase in average response time when the 64K setting is used. However, this benchmark does not prove that there would be no significant effect on different workloads. So if an application has a need for bigger page sizes, the page size setting should first and foremost be benchmarked on test systems against the application workload. These results only show that with a sysbench-style OLTP workload, there is no significant benefit or disadvantage to using different page sizes in InnoDB.



by janlindstrom at October 27, 2016 03:44 AM

October 26, 2016

Oli Sennhauser

Multi-Instance set-up with MySQL Enterprise Server 5.7 on RHEL 7 with SystemD

In our current project the customer wants to install and run multiple MySQL Enterprise Server 5.7 Instances on the same machine (yes, I know about virtualization (we run on kvm), containers, Docker, etc.). He wants to use Red Hat Enterprise Linux (RHEL) 7 which brings the additional challenge of SystemD. So mysqld_multi is NOT an option any more.

We studied the MySQL documentation about the topic: Configuring Multiple MySQL Instances Using systemd. But to be honest: It was not really clear to me how to do the job...

So we started to work out our own cook-book which I want to share here.

The requirements are as follows:

  • Only ONE version of MySQL Enterprise Server binaries at a time is available. If you want to have more complicated set-ups (multi version) consider our MyEnv.
  • Because Segregation of Duties is an issue for this customer from the financial industry, we are not allowed to use the operating system root user or to have sudo privileges.
  • We have to work with the operating system user mysql as a non-privileged user.

Preparation work for the operating system administrator

This is the only work which has to be done under a privileged account (root):

shell> sudo yum install libaio
shell> sudo groupadd mysql
shell> sudo useradd -r -g mysql -s /bin/bash mysql
shell> sudo cp mysqld@.service /etc/systemd/system/

Installation of MySQL Enterprise Server binaries as non privileged user

To perform this task we need the generic MySQL Binary Tar Balls which you can get from the Oracle Software Delivery Cloud:

shell> mkdir /home/mysql/product
shell> cd /home/mysql/product
shell> tar xf /download/mysql-<version>.tar.gz
shell> ln -s mysql-<version> mysql-5.7.x
shell> ln -s mysql-5.7.x mysql
shell> echo 'export PATH=$PATH:/home/mysql/product/mysql/bin' >> ~/.bashrc
shell> . ~/.bashrc

Creating, Starting and Stopping several MySQL Enterprise Server Instances

shell> export INSTANCE_NAME=TMYSQL01   # and TMYSQL02 and TMYSQL03
shell> mkdir -p /mysql/${INSTANCE_NAME}/etc /mysql/${INSTANCE_NAME}/log /mysql/${INSTANCE_NAME}/data /mysql/${INSTANCE_NAME}/binlog
shell> cat > /mysql/${INSTANCE_NAME}/etc/my.cnf << _EOF
#
# /mysql/${INSTANCE_NAME}/etc/my.cnf
#
[mysqld]
datadir   = /mysql/${INSTANCE_NAME}/data
pid_file  = /var/run/mysqld/mysqld_${INSTANCE_NAME}.pid
log_error = /mysql/${INSTANCE_NAME}/log/error_${INSTANCE_NAME}.log
port      = 3306   # and 3307 and 3308
socket    = /var/run/mysqld/mysqld_${INSTANCE_NAME}.sock
_EOF
shell> cd /home/mysql/product/mysql
shell> bin/mysqld --defaults-file=/mysql/${INSTANCE_NAME}/etc/my.cnf --initialize --user=mysql --basedir=/home/mysql/product/mysql
shell> bin/mysqld --defaults-file=/mysql/${INSTANCE_NAME}/etc/my.cnf --daemonize >/dev/null 2>&1 &
shell> mysqladmin --user=root --socket=/var/run/mysqld/mysqld_${INSTANCE_NAME}.sock --password shutdown

So far so good. We can do everything with the database without root privileges. One thing is missing: The MySQL Database Instances should be started automatically at system reboot. For this we need a SystemD unit file:

#
# /etc/systemd/system/mysqld@.service
#

[Unit]

Description=Multi-Instance MySQL Enterprise Server
After=network.target syslog.target


[Install]

WantedBy=multi-user.target
 

[Service]

User=mysql
Group=mysql
Type=forking
PIDFile=/var/run/mysqld/mysqld_%i.pid
TimeoutStartSec=3
TimeoutStopSec=3
# true is needed for the ExecStartPre
PermissionsStartOnly=true
ExecStartPre=/bin/mkdir -p /var/run/mysqld
ExecStartPre=/bin/chown mysql: /var/run/mysqld
ExecStart=/home/mysql/product/mysql/bin/mysqld --defaults-file=/mysql/%i/etc/my.cnf --daemonize
LimitNOFILE=8192
Restart=on-failure
RestartPreventExitStatus=1
PrivateTmp=false

This file must be copied as root to:

shell> cp mysqld@.service /etc/systemd/system/

Now you can check if SystemD behaves correctly as follows:

shell> sudo systemctl daemon-reload
shell> sudo systemctl enable mysqld@TMYSQL01   # also TMYSQL02 and TMYSQL03
shell> sudo systemctl start mysqld@TMYSQL01
shell> sudo systemctl status 'mysqld@TMYSQL*'
shell> sudo systemctl start mysqld@TMYSQL01

How to go even further

If you need a more convenient or a more flexible solution you can go with our MySQL Enterprise Environment MyEnv.

by Shinguz at October 26, 2016 08:15 PM

Peter Zaitsev

Percona Poll: What Database Technologies Are You Using?

Take Percona’s poll on what database technologies you use in your environment.

Different databases get designed for different scenarios. Using one database technology for every situation doesn’t make sense, and can lead to non-optimal solutions for common issues. Big data and IoT applications, high availability, secure backups, security, cloud vs. on-premises deployment: each has a set of requirements that might need a special technology. Relational, document-based, key-value, graph, column family – there are many options for many problems. More and more, database environments combine more than one solution to address the various needs of an enterprise or application (known as polyglot persistence).

Please take a few seconds and answer the following poll on database technologies. Which are you using? Help the community learn what technologies critical database environments employ. Please select from one to six technologies as they apply to your environment.

If you’re using other solutions or have specific issues, feel free to comment below. We’ll post a follow-up blog with the results!

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

 

by Dave Avery at October 26, 2016 07:53 PM