Planet MariaDB

September 23, 2016

Peter Zaitsev

MariaDB Sessions at Percona Live Europe 2016

Percona Live Europe

Percona Live EuropeIf you’re going to Percona Live Europe Amsterdam 2016, and are specifically interested in MariaDB-related technologies, there are many talks for you to attend.

On Tutorial Monday, you will want to attend my tutorial The Complete MariaDB Server Tutorial. In those three hours, you’ll learn about all the differences that MariaDB Server has to offer in comparison to MySQL. Where there is feature parity, I’ll point out some syntax differences. This is a semi-hands-on tutorial as the feature scope would make fully hands-on impossible!

On Tuesday, my session picks include:
  • Common Table Expressions in MariaDB Server 10.2. You’ll get to learn from the very people implementing it, Sergei Petrunia (rockstar optimizer developer, and MyRocks hacker), and Galina Shalygina (a Google Summer of Code 2016 student who also helped develop it).
  • Percona XtraDB 5.7: Key Performance Algorithms. It is safe to presume that MariaDB Server 10.2 needs to include Percona XtraDB 5.7. Percona XtraDB has been the default storage engine for MariaDB Server since forever. In fact, MDEV-6113 suggests this will be available in the yet unreleased MariaDB Server 10.2.2. Learn why Percona XtraDB matters from Laurynas Biveinis, the Team Lead for Percona Server.
  • MySQL/MariaDB Parallel Replication: inventory, use-cases and limitations. Jean-François Gagné is the expert on parallel replication, not only from a usage and benchmarking perspective, but also from a feature request standpoint. I highly recommend attending this talk if you care about parallel replication. It will be packed!
On Wednesday, my session picks include:

So those are my Percona Live Europe picks that focus on the MariaDB ecosystem. Notably absent though is more MariaDB MaxScale content from the authors of the application. And what about MariaDB ColumnStore?

There is of course, still time to register. Use the FeaturedTalk code to get a discount.

by Colin Charles at September 23, 2016 05:06 PM

Identifying and Solving Database Performance Issues with PMM: Webinar Q&A

Solving Database Performance Issues

Solving Database Performance IssuesIn this blog, I will provide answers to the Q & A for the Identifying and Solving Database Performance Issues with PMM webinar.

First, I want to thank everybody for attending the September 15 webinar. The recording and slides for the webinar are available here. Thanks for so many good questions. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: PMM has some restrictions working with metrics from RDS instances (AWS)? Aurora for example?
Query analytics can be done only using performance_schema as a query source for RDS/Aurora. No slow log is possible at this moment, as it’s not in the file but the mysql.slow_log table. As for metrics, only MySQL-related ones can be collected, thus only MySQL graphs are available. No system metrics. However, we can look at what we can fetch from CloudWatch to close this gap.

Q: How many ports are needed for each MySQL client? This is related to how many ports need to be open on the firewall.
One metric service per instance requires one port (e.g., mysql:metrics, linux:metrics, mongodb:metrics). The MySQL query service (mysql:queries) does not require a port to open, as the agent connects to the server — unlike the server to client connection in the case of metric services. Usually, the diagram looks like this.

Q: Is it possible to add a customized item for additional monitoring on the client OS?
It is possible to enable additional options on the linux:metrics service (pmm-linux-metrics-42000, node_exporter process). However, it requires “hacking” the service file.

Q: Does PMM have any alerting feature built-in? Or, is there any way to feed alerts to other monitoring framework such as Nagios?
Currently, it doesn’t. But we have plans to add alerting functionality.

Q: Can pmm-client be delivered as an RPM instead of a tar ball?
It is delivered as packages, see the instructions. We recommend using system packages to simplify the upgrade process.

Q: You said it does SSL and Auth. I have 1.0.4 installed, but I do not know a way to do SSL or Auth with the install. My solution is to run it behind an nginx proxy.
Yes, 1.0.4 supports server SSL and HTTP basic authentication. The corresponding documentation is published now.

Q: If you change the Grafana username and password will it be persistent across reboots and upgrades of PMM?
Currently no, but we are working to make it possible very soon.

Q: In Percona cloud tools – we can sort the queries by the sum of the number of queries, or the max time taken by the queries. Is there a way to do that in PMM?
Unfortunately, there is no such possibility, but I have just filled an internal issue so it’s not forgotten.

Q: Can you show us how the explain query works in PMM?
Query Analytics web app calls the API, which asks the corresponding agent connected to the server from the client side to run EXPLAIN on MySQL in real time.

Q: Does PMM track deadlocks?
We can capture metric such as

mysql_global_status_innodb_deadlocks
 but we do not currently graph it. Thanks for pointing out, we will consider adding it.

Q: Is it possible to set up alarm thresholds and notification?
Q: Is there any plan to add sending alerts by e-mail or SMS? This would be an excellent feature.
Currently, there is no alerts functionality. But we have plans for that.

Q: If we do not have a service manager installed like systemv or systemd, upstart , I am using Gentoo Linux on my database server , it has no service manager 🙁
Unfortunately, we didn’t test PMM client on Gentoo Linux. PMM is an open source project, so any person is welcome to implement support for Gentoo Linux and propose the patch. 🙂

Q: What kind of resources are needed by the “server” if we have a moderately active “client” it will be monitoring? (db with millions of rows, thousands of queries per minute)
Regarding metrics, it should not be a problem. If there are thousands of schemas or tables, then per table stats are disabled automatically (10000+ tables). You can also  disable them explicitly. In terms of query analytics, it depends on the volume of slow queries (

long_query_time settings
 ) and how many of them are safe to write and later process. I would say MySQL options should be tuned to help in monitoring if possible (as long as they don’t impact performance).

Q: What is the ETA for the Amazon EC2 image for the PMM server?
We have no ETA, but we acknowledge that not everyone can afford using Docker. We try to do our best to have other shipping methods available.

Q: Does the client have to be installed on the mySQL or Mongo target?
No. However, for best results we recommend running the PMM client on the same host that is being monitored. You can setup remote monitoring when it is not possible.

Q: What are the pros and cons of running the PMM client on the same system as the DB?
Running PMM client locally, you can monitor system metrics (linux:metrics) and MySQL queries from the slow log, which is not possible remotely due to not having access to the filesystem. When the client runs remotely, say MySQL metric service connects to the db host to get metrics, then the server scrapes from the client. So you may get some network latency issues as the metric has to make two hops. Running locally, the metric exporter may pull metrics using MySQL socket.

Q: And if you have any plans for the PMM work in MySQL cluster, not in Galera, but in the same MySQL Cluster.
There is a limited support now, you can monitor MySQL Cluster SQL nodes. However, we currently do not do anything with data nodes or management nodes.

Q: I’m aware that Percona focuses on MySQL and MongoDB, but do you have any plans for supporting plugins capable of monitoring other databases beyond ones supported now? Is this feasible now in the current state of PMM?
There are plans to monitor backups, to support HA tools (e.g., ProxySQL), alerting functionality, etc. We will also think about plugins in some future version, so it is easier for the community to expand PMM for a complete monitoring of their database infrastructure.

Q: Can the query monitor display query execution plan?
Yes, usually you can run EXPLAIN on the query. However, sometimes it requires typing the database name. We plan to improve this behavior.

Q: For metrics monitor, what is the main difference between PMM and Observium?
I don’t know much about Observium, but it looks to me it’s mainly a network device monitoring tool. PMM is focused on database performance.

Q: With the Query Analytics tool, is there any way to capture the notes about an individual query, along with a query review status, such as whether it’s “marked as reviewed”, or “needs attention”?
Unfortunately no, but it would be a great feature and it is on our roadmap.

Q: If possible to monitor the replication MySQL (Slave status ..)?
Yes, there is MySQL Replication dashboard.

Q: Is this possible to store metrics for a long time period to a custom time series database such us openTSBD?
We acknowledge that long-term storage of metrics is very important. However, we can’t offer a complete solution at this moment. Something is in progress. It is possible to point Prometheus to remote storage such as OpenTSBD, but we can’t guarantee how reliable it would be. The main problem is you will need to downsample data and create another dashboard to query from OpenTSBD (which is not flexible).

Q: can PMM have any impact on MySQL server performance?
Q: What is the additional overhead to the “clients” with PMM?
We try to make PMM as safe as possible. In particular,

pmm-admin
 has options to disable the most critical things that might cause performance issues. For example, per table stats when there are thousands of tables, processlist when there is an abnormal count of process states, etc.

Q: Are there graphs for TokuDB storage engine as well?
Yes, there is a TokuDB Metrics dashboard.

Q: Is it recommended not to run PMM server on Production MySQL server, but instead run it on a separate server?
We do not recommend running PMM server on the database server, or on the production one. It should be run on a separate machine, acting a monitor server. It does not require MySQL.

Q: Can you install PMM Server w/out docker?
Currently, no. But we have plans to distribute PMM server as VM, Amazon EC2 image, etc.

Q: Is there a way to integrate with a pre-existing Grafana server?
Yes, you can add Prometheus data source pointing to “http://server/prometheus/: in your Grafana configuration, but you will have to import the dashboards separately.

Q: The graphs plotted using perfromance_schema tables, you have chosen not to display all the events from performance_schema.file_summary_by_event_name. Like there are only Read, Write and Misc events plotted, but Wait is never plotted. So, is there a criterion on which you decide which values are important and which are not for plotting. Also, what is the significance of each of them?
Looks like the amount of events plotted would depend on the performance schema instrumentation settings.

Q: I am trying to figure out how big of a container/VM/machine I need to create for the PMM server and/or client…
Q: What do you recommend to consider as the databases being monitored scales up? (i.e., going from 10 instances monitored to 500 instances)
Q: What are the hardware recommendations to monitor 1000 DB server?
Depending on the PMM settings and MySQL workload, the load on the PMM server might vary a lot. Fast CPUs, plenty of memory and SSD storage are what provide the best performance. If you have a single PMM instance overloaded, you can use multiple PMM instances (i.e., one in each data center). We will be working on a solution for very large scales in the future versions.

I hope all the questions were answered. Thanks for attending my webinar!

If you have other things to ask, do not hesitate to ask on our forum.

by Roman Vynar at September 23, 2016 02:16 PM

September 22, 2016

Peter Zaitsev

Percona Live Europe featured talk with Anthony Yeh — Launching Vitess: How to run YouTube’s MySQL sharding engine

Percona Live Europe featured talk

Percona Live Europe featured talkWelcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Anthony Yeh, Software Engineer, Google. His talk will be on Launching Vitess: How to run YouTube’s MySQL sharding engine. Vitess is YouTube’s solution for scaling MySQL horizontally through sharding, built as a general-purpose, open-source project. Now that Vitess 2.0 has reached general availability, they’re moving beyond “getting started” guides and working with users to develop and document best practices for launching Vitess in their own production environments.

I had a chance to speak with Anthony and learn a bit more about Vitess:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Anthony: Before joining YouTube as a software engineer, I worked on photonic integrated circuits as a graduate student researcher at U.C. Berkeley. So I guess you could say I took a rather circuitous path to the database field. My co-presenter Dan and I have that in common. If you see him at the conference, I recommend asking him about his story.

I don’t actually think of myself as being in database development though; that’s probably more Sugu‘s area. I treat Vitess as just another distributed system, and my job is to make it more automated, more reliable, and easier to administer. My favorite part of this job is when open-source contributors send us new features and plug-ins, and all I have to do is review them. Keep those pull requests coming!

Percona: Your talk is going to be on “Launching Vitess: How to run YouTube’s MySQL sharding engine.” How has Vitess moved from a YouTube fix to a viable enterprise data solution?

Anthony: I joined Vitess a little over two years ago, right when they decided to expand the team’s focus to include external usability as a key goal. The idea was to transform Vitess from a piece of YouTube infrastructure that happens to be open-source, into an open-source solution that YouTube happens to use.

At first, the biggest challenge was getting people to tell us what they needed to make Vitess work well in their environments. Attending Percona Live is a great way to keep a pulse on how the industry uses MySQL, and talk with exactly the people who can give us that feedback. Progress really picked up early this year when companies like Flipkart and Pixel Federation started not only trying out Vitess on their systems, but contributing back features, plug-ins, and connectors.

My half of the talk will summarize all the things we’ve learned from these early adopters about migrating to Vitess and running it in various environments. We also convinced one of our Site Reliability Engineers to give the second half of the talk, to share firsthand what it’s like to run Vitess in production.

Percona: What new features and fixes can people look forward to in the latest release?

Anthony: The biggest new feature in Vitess 2.0 is something that was codenamed “V3” (sorry about the naming confusion). In a nutshell, this completes the transition of all sharding logic from the app into Vitess: at first you had to give us a shard name, then you just had to tell us the sharding key value. Now you just send a regular query and we do the rest.

To make this possible, Vitess has to parse and analyze the query, for which it then builds a distributed execution plan. For queries served by a single shard, the plan collapses to a simple routing decision without extra processing. But for things like cross-shard joins, Vitess will generate new queries and combine results from multiple shards for you, in much the same way your app would otherwise do it.

Percona: Why is sharding beneficial to databases? Are there pros and cons to sharding?

Anthony: The main pro for sharding is horizontal scalability, the holy grail of distributed databases. It offers the promise of a magical knob that you simply turn up when you need more capacity. The biggest cons have usually been that it’s a lot of work to make your app handle sharding, and it multiplies the operational overhead as you add more and more database servers.

The goal of Vitess is to create a generalized solution to these problems, so we can all stop building one-off sharding layers within our apps, and replace a sea of management scripts with a holistic, self-healing distributed database.

Percona: Vitess is billed as being for web applications based in cloud and dedicated hardware infrastructures. Was it designed specifically for one or the other, and does it work better for certain environments?

Anthony: Vitess started out on dedicated YouTube hardware and later moved into Borg, which is Google’s internal precursor to Kubernetes. So we know from experience that it works in both types of environments. But like any distributed system, there are lots of benefits to running Vitess under some kind of cluster orchestration system. We provide sample configs to get you started on Kubernetes, but we would love to also have examples for other orchestration platforms like Mesos, Swarm, or Nomad, and we’d welcome contributions in this area.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Anthony: I hope to meet people who have ideas about how to make Vitess better, and I look forward to learning more about how others are solving similar problems.

You can read more about Anthony and Vitess on the Vitess blog.

Want to find out more about Anthony, Vitess, YouTube and and sharding? Register for Percona Live Europe 2016, and come see his talk Launching Vitess: How to run YouTube’s MySQL sharding engine.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

by Dave Avery at September 22, 2016 08:59 PM

Percona XtraDB Cluster 5.5.41-25.11.1 is now available

Percona XtraDB Cluster Reference Architecture

Percona XtraDB Cluster 5.5Percona announces the new release of Percona XtraDB Cluster 5.5.41-25.11.1 (rev. 855) on September 22, 2016. Binaries are available from the downloads area or our software repositories.

Bugs Fixed:
  • Due to security reasons ld_preload libraries can now only be loaded from the system directories (/usr/lib64, /usr/lib) and the MySQL installation base directory. This fix also addresses issue with where limiting didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Hrvoje Matijakovic at September 22, 2016 05:56 PM

Sixth Annual Percona Live Open Source Database Conference 2017 Call for Speakers Now Open

Percona LiveThe Call for Speakers for Percona Live Open Source Database Conference 2017 is open and accepting proposals through Oct. 31, 2016.

The Percona Live Open Source Database Conference 2017 is the premier event for the diverse and active open source community, as well as businesses that develop and use open source software. Topics for the event will focus on three key areas – MySQL, MongoDB and Open Source Databases – and the conference sessions will feature a range of in-depth discussions and hands-on tutorials.

The 2017 conference will feature four formal tracks – Developer, Operations, Business/Case Studies, and Wildcard – that will explore a variety of new and trending topics, including big data, IoT, analytics, security, scalability and performance, architecture and design, operations and management and development. Speaker proposals are welcome on these topics as well as on a variety of related technologies, including MySQL, MongoDB, Amazon Web Services (AWS), OpenStack, Redis, Docker and many more. The conference will also feature sponsored talks.

Percona Live Open Source Database Conference 2017 will take place April 24-27, 2017 at The Hyatt Regency Santa Clara and Santa Clara Convention Center. Sponsorship opportunities are still available, and Super Saver Registration Discounts can be purchased through Nov. 13, 2016 at 11:30 p.m. PST.

Click here to see all the submission criteria, and to submit your talk.

Sponsorships

Sponsorship opportunities for Percona Live Open Source Database Conference 2017 are available and offer the opportunity to interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event.

Planning to Attend?

Super Saver Registration Discounts for Percona Live Open Source Database Conference 2017 are available through Nov. 13, 2016 at 11:30 p.m. PST.

Visit the Percona Live Open Source Database Conference 2017 website for more information about the conference. Interested community members can also register to receive email updates about Percona Live Open Source Database Conference 2017.

by Kortney Runyan at September 22, 2016 04:44 PM

Jean-Jerome Schmidt

Planets9s - Download our new ‘Database Sharding with MySQL Fabric’ whitepaper

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Download our new whitepaper: Database Sharding with MySQL Fabric

Database systems with large data sets or high throughput applications can challenge the capacity of a single database server, and sharding is a way to address that. Spreading your database across multiple servers sounds good, but how does this work in practice?

In this whitepaper, we will have a close look at MySQL Fabric. You will learn the basics, and also learn how to migrate to a sharded environment.

Download the whitepaper

Sign up for our 9 DevOps Tips for going in production with Galera Cluster for MySQL / MariaDB webinar

Operations is not so much about specific technologies, but about the techniques and tools you use to deploy and manage them. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider – preferably before going live. In this webinar, we’ll guide you through 9 key devops tips to consider before taking Galera Cluster for MySQL / MariaDB into production.

Sign up for the webinar

Load balanced MySQL Galera setup - Manual Deployment vs ClusterControl

Deploying a MySQL Galera Cluster with redundant load balancing can be time consuming. This blog looks at how much time it would take to do it manually, using the popular “Google university” to search for how-to’s and blogs that provide deployment steps. Or using our agentless management and automation console, ClusterControl, which supports MySQL (Oracle and Percona server), MariaDB, MongoDB (MongoDB inc. and Percona), and PostgreSQL.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at September 22, 2016 01:41 PM

September 21, 2016

MariaDB Foundation

MariaDB Galera Cluster 5.5.52 and Connector/ODBC 2.0.12 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Galera Cluster 5.5.52 and MariaDB Connector/ODBC 2.0.12. Both are Stable (GA) releases. See the release notes and changelog for details on theses releases. IMPORTANT: There was a security fix included in the 5.5.51 release of MariaDB and MariaDB Galera Cluster. If you are […]

The post MariaDB Galera Cluster 5.5.52 and Connector/ODBC 2.0.12 now available appeared first on MariaDB.org.

by Daniel Bartholomew at September 21, 2016 07:04 PM

Peter Zaitsev

Percona XtraDB Cluster 5.6.30-25.16.3 is now available

Percona XtraDB Cluster Reference Architecture

Percona XtraDB Cluster 5.6Percona  announces the new release of Percona XtraDB Cluster 5.6 on September 21, 2016. Binaries are available from the downloads area or our software repositories.

Percona XtraDB Cluster 5.6.30-25.16.3 is now the current release, based on the following:

  • Percona Server 5.6.30-76.3
  • Galera Replication library 3.16
  • Codership wsrep API version 25
Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Hrvoje Matijakovic at September 21, 2016 06:22 PM

Percona Server 5.7.14-8 is now available

percona server 5.6.30-76.3

percona server 5.7.14-8Percona announces the GA release of Percona Server 5.7.14-8 on September 21, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.7.14, including all the bug fixes in it, Percona Server 5.7.14-8 is the current GA release in the Percona Server 5.7 series. Percona’s provides completely open-source and free software. Find release details in the 5.7.14-8 milestone at Launchpad.

Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

The release notes for Percona Server 5.7.14-8 are available in the online documentation. Please report any bugs on the launchpad bug tracker .

by Hrvoje Matijakovic at September 21, 2016 06:11 PM

Percona Server 5.6.32-78.1 is now available

percona server 5.6.30-76.3

percona server 5.6.32-78.1Percona announces the release of Percona Server 5.6.32-78.1 on September 21st, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.6.32, including all the bug fixes in it, Percona Server 5.6.32-78.1 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.32-78.1 milestone on Launchpad.

Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Release notes for Percona Server 5.6.32-78.1 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at September 21, 2016 06:04 PM

Percona Server 5.5.51-38.2 is now available

percona server 5.6.30-76.3

percona server 5.5.51-38.2Percona announces the release of Percona Server 5.5.51-38.2 on September 21, 2016. Based on MySQL 5.5.51, including all the bug fixes in it, Percona Server 5.5.51-38.2 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. You can find release details of the release in the 5.5.51-38.2 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Find the release notes for Percona Server 5.5.51-38.2 in our online documentation. Report bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at September 21, 2016 05:58 PM

Jean-Jerome Schmidt

Sign up for our new webinar: 9 DevOps Tips for Going in Production with Galera Cluster for MySQL / MariaDB

Galera Cluster for MySQL / MariaDB is easy to deploy, but how does it behave under real workload, scale, and during long term operation? Proof of concepts and lab tests usually work great for Galera, until it’s time to go into production. Throw in a live migration from an existing database setup and devops life just got a bit more interesting...

If this scenario sounds familiar, then this webinar is for you!

Operations is not so much about specific technologies, but about the techniques and tools you use to deploy and manage them. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider – preferably before going live.

In this webinar, we’d like to guide you through 9 key tips to consider before taking Galera Cluster for MySQL / MariaDB into production.

Date & Time

Europe/MEA/APAC

Tuesday, October 11th at 09:00 BST (UK) / 10:00 CEST (Germany, France, Sweden)
Register Now

North America/LatAm

Tuesday, October 11th at 9:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

Agenda

  • 101 Sanity Check
  • Operating System
  • Backup Strategies
  • Replication & Sync
  • Query Performance
  • Schema Changes
  • Security / Encryption
  • Reporting
  • Managing from disaster

Speaker

Johan Andersson, CTO, Severalnines

Johan's technical background and interest are in high performance computing as demonstrated by the work he did on main-memory clustered databases at Ericsson as well as his research on parallel Java Virtual Machines at Trinity College Dublin in Ireland. Prior to co-founding Severalnines, Johan was Principal Consultant and lead of the MySQL Clustering & High Availability consulting group at MySQL / Sun Microsystems / Oracle, where he designed and implemented large-scale MySQL systems for key customers. Johan is a regular speaker at MySQL User Conferences as well as other high profile community gatherings with popular talks and tutorials around architecting and tuning MySQL Clusters.

Join us for this live webinar, where we’ll be discussing and demonstrating how to best proceed when planning to go into production with Galera Cluster.

We look forward to “seeing” you there and to insightful discussions!

If you have any questions or would like a personalised live demo, please do contact us.

by Severalnines at September 21, 2016 04:07 PM

Peter Zaitsev

Regular Expressions Tutorial

regular expressions

regular expressionsThis blog post highlights a video on how to use regular expressions.

It’s been a while since I did the MySQL QA and Bash Training Series. The 13 episodes were quite enjoyable to make, and a lot of people watched the video’s and provided great feedback.

In today’s new video, I’d like to briefly go over regular expressions. The session will cover the basics of regular expressions, and then some. I’ll follow up later with a more advanced regex session too.

Regular expressions are very versatile, and once you know how to use them – especially as a script developer or software coder – you will return to them again and again. Enjoy!

Presented by Roel Van de Paar. Full-screen viewing @ 720p resolution recommended

 

by Roel Van de Paar at September 21, 2016 01:48 PM

Webinar Thursday September 22 – Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster

e-commerce disaster

e-commerce disasterJoin Percona’s Sr. Technical Operations Architect, Tim Vaillancourt on Thursday, September 22, at 10 am PDT (UTC-7) for the webinar Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster. This webinar will provide some best practices to ensure the performance of your system under high-traffic conditions.

Can your retail site handle the traffic deluge on the busiest shopping day of the year?

Black Friday and Cyber Monday is mere months away. Major retailers have already begun stress-testing their e-commerce sites to make sure they can handle the load. Failure to accommodate the onslaught of post-Thanksgiving shoppers might result in both embarrassing headlines and millions of dollars in lost revenue. Our advice to retailers: September stress tests are essential to a glitch-free Black Friday.

This webinar will cover:

  • Tips to avoid bottlenecks in data-driven apps
  • Techniques to allow an app to grow and shrink for large events/launches
  • Solutions to alleviate load on an app’s database
  • Developing and testing scalable apps
  • Deployment strategies to avoid downtime
  • Creating lighter, faster user facing requests

For more ideas on how to optimize your E-commerce database, read Tim’s blog post here.

Please register here.

register-now

E-Commerce DisasterTimothy Vaillancourt, Senior Technical Operations Architect

Tim joined Percona in 2016 as Sr. Technical Operations Architect for MongoDB with a goal to make the operations of MongoDB as smooth as possible. With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming, combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Before Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS. Before moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

by Dave Avery at September 21, 2016 01:11 PM

September 20, 2016

Peter Zaitsev

MongoDB point-in-time backups made easy

MongoDB point-in-time backups

MongoDB point-in-time backupsIn this blog post we’ll look at MongoDB point-in-time backups, and work with them.

Mongodump is the base logical backup tool included with MongoDB. It takes a full BSON copy of database/collections, and optionally includes a log of changes during the backup used to make it consistent to a point in time. Mongorestore is the tool used to restore logical backups created by Mongodump. I’ll use these tools in the steps in this article to restore backed-up data. This article assumes a mongodump-based backup that was taken consistently with oplog changes (by using the command flag “–oplog”), and the backup is being restored to a MongoDB instance.

In this example, a mongodump backup is gathered and restored for the base collection data, and separately the oplogs/changes necessary to restore the data to a particular point-in-time are collected and applied to this data.

Note: Percona developed a backup tool named mongodb_consistent_backup, which is a wrapper for ‘mongodump’ with added cluster-wide backup consistency. The backups created by mongodb_consistent_backup (in Dump/Mongodump mode) can be restored using the same steps as a regular “mongodump” backup.

Stages

Stage 1: Get a Mongodump Backup

Mongodump Command Flags
–host/–port (and –user/–password)

Required, even if you’re using the default host/port (localhost:27017).  If authorization is enabled, add –user/–password flags also.

–oplog

Required for any replset member! Causes “mongodump” to capture the oplog change log during the backup for consistent to one point in time.

–gzip

Optional. For mongodump >= 3.2, enables inline compression on the backup files.

Steps
  1. Get a mongodump backup via (pick one):
    • Running “mongodump” with the correct flags/options to take a backup (w/oplog) of the data:
      $ mongodump --host localhost --port 27017 --oplog --gzip
      2016-08-15T12:32:28.930+0200    writing wikipedia.pages to
      2016-08-15T12:32:31.932+0200    [#########...............]  wikipedia.pages  674/1700   (39.6%)
      2016-08-15T12:32:34.931+0200    [####################....]  wikipedia.pages  1436/1700  (84.5%)
      2016-08-15T12:32:37.509+0200    [########################]  wikipedia.pages  2119/1700  (124.6%)
      2016-08-15T12:32:37.510+0200    done dumping wikipedia.pages (2119 documents)
      2016-08-15T12:32:37.521+0200    writing captured oplog to
      2016-08-15T12:32:37.931+0200    [##......................]  .oplog  44/492   (8.9%)
      2016-08-15T12:32:39.648+0200    [########################]  .oplog  504/492  (102.4%)
      2016-08-15T12:32:39.648+0200    dumped 504 oplog entries
    • Use the latest daily automatic backup, if it exists.

Stage 2: Restore the Backup Data

Steps
  1. Locate the shard PRIMARY member.
  2. Triple check you’re restoring the right backup to the right shard/host!
  3. Restore a mongodump-based backup to the PRIMARY node using the steps in this article: Restore a Mongodump Backup.
  4. Check for errors.
  5. Check that all SECONDARY members are in sync with the PRIMARY.

Stage 3: Get Oplogs for Point-In-Time-Recovery

In this stage, we will gather the changes needed to roll the data forward from the time of backup to the time/oplog-position to which we would like to restore.

In this example below, let’s pretend someone accidentally deleted an entire collection at oplog timestamp: “Timestamp(1470923942, 3)” and we want to fix it. If we decrement the Timestamp increment (2nd number) of “Timestamp(1470923942, 3)” we will have the last change before the accidental command, which in this case is: “Timestamp(1470923942, 2)“. Using the timestamp, we can capture and replay the oplogs from when the backup occurred to just before the issue/error.

A start and end timestamp are required to get the oplog data. In all cases, this will need to be gathered manually, case-by-case.

Helper Script
#!/bin/bash
#
# This tool will dump out a BSON file of MongoDB oplog changes based on a range of Timestamp() objects.
# The captured oplog changes can be applied to a host using 'mongorestore --oplogReplay --dir /path/to/dump'.
set -e
TS_START=$1
TS_END=$2
MONGODUMP_EXTRA=$3
function usage_exit() {
  echo "Usage $0: [Start-BSON-Timestamp] [End-BSON-Timestamp] [Extra-Mongodump-Flags (in quotes for multiple)]"
  exit 1
}
function check_bson_timestamp() {
  local TS=$1
  echo "$TS" | grep -qP "^Timestamp(d+,sd+)$"
  if [ $? -gt 0 ]; then
    echo "ERROR: Both timestamp fields must be in BSON Timestamp format, eg: 'Timestamp(########, #)'!"
    usage_exit
  fi
}
if [ -z "$TS_START" ] || [ -z "$TS_END" ]; then
  usage_exit
else
  check_bson_timestamp "$TS_START"
  check_bson_timestamp "$TS_END"
fi
MONGODUMP_QUERY='{ "ts" : { "$gte" : '$TS_START' }, "ts" : { "$lte" : '$TS_END' } }'
MONGODUMP_FLAGS='--db=local --collection=oplog.rs'
[ ! -z "$MONGODUMP_EXTRA" ] && MONGODUMP_FLAGS="$MONGODUMP_FLAGS $MONGODUMP_EXTRA"
if [ -d dump ]; then
  echo "'dump' subdirectory already exists! Exiting!"
  exit 1
fi
echo "# Dumping oplogs from '$TS_START' to '$TS_END'..."
mkdir dump
mongodump $MONGODUMP_FLAGS --query "$MONGODUMP_QUERY" --out - >dump/oplog.bson
if [ -f dump/oplog.bson ]; then
  echo "# Done!"
else
  echo "ERROR: Cannot find oplog.bson file! Exiting!"
  exit 1
fi

 

Script Usage:
$ ./dump_oplog_range.sh
Usage ./dump_oplog_range.sh: [Start-BSON-Timestamp] [End-BSON-Timestamp] [Extra-Mongodump-Flags (in quotes for multiple)]

 

Steps
  1. Find the PRIMARY member that contains the oplogs needed for the PITR restore.
  2. Determine the “end” Timestamp() needed to restore to. This oplog time should be before the problem occurred.
  3. Determine the “start” Timestamp() from right before the backup was taken.
    1. This timestamp doesn’t need to be exact, so something like a Timestamp() object equal-to “a few min before the backup started” is fine, but the more accurate you are, the fewer changes you’ll need to re-apply (which saves on restore time).
  4. Use the MongoToolsAndSnippets script: “get_oplog_range.sh (above in “Helper Script”) to dump the oplog time-ranges you need to restore to your chosen point-in-time. In this example I am gathering the oplog between two point-in-times (also passing in –username/–password flags in quotes the 3rd parameter):
    1. The starting timestamp: the BSON timestamp from before the mongodump backup in “Stage 2: Restore Collection Data” was taken, in this example. “Timestamp(1470923918, 0)” is a time a few seconds before my mongodump was taken (does not need to be exact).
    2. The end timestamp: the end BSON Timestamp to restore to, in this example. “Timestamp(1470923942, 2)” is the last oplog-change BEFORE the problem occurred.

    Example:

    $ wget -q https://raw.githubusercontent.com/percona/MongoToolsAndSnippets/master/rdba/dump_oplog_range.sh
    $ bash ./dump_oplog_range.sh 'Timestamp(1470923918, 0)' 'Timestamp(1470923942, 2)' '--username=secret --password=secret --host=mongo01.example.com --port=27024'
    # Dumping oplogs from 'Timestamp(1470923918, 0)' to 'Timestamp(1470923942, 2)'...
    2016-08-12T13:11:17.676+0200    writing local.oplog.rs to stdout
    2016-08-12T13:11:18.120+0200    dumped 22 documents
    # Done!

    Note: all additional mongodump flags (optional 3rd field) must be in quotes!

  5. Double check it worked by looking for the ‘oplog.bson‘ file and checking that the file has some data in it (168mb in the below example):

    $ ls -alh dump/oplog.bson
    -rw-rw-r--. 1 tim tim 168M Aug 12 13:11 dump/oplog.bson

     

Stage 4: Apply Oplogs for Point in Time Recovery (PITR)

In this stage, we apply the time-range-based oplogs gathered in Stage 3 to the restored data set to bring it from the time of the backup to a particular point in time before a problem occurred.

Mongorestore Command Flags
–host/–port (and –user/–password)

Required, even if you’re using the default host/port (localhost:27017).  If authorization is enabled, add –user/–password flags also.

–oplogReplay

Required. This is needed to replay the oplogs in this step.

–dir

Required. The path to the mongodump data.

Steps
  1. Copy the “dump” directory containing only the “oplog.bson”. file (captured in Stage 3) to the host that needs the oplog changes applied (the restore host).
  2. Run “mongorestore” on the “dump” directory to replay the oplogs into the instance. Make sure the “dump” dir contains only “oplog.bson”!
    $ mongorestore --host localhost --port 27017 --oplogReplay --dir ./dump
    2016-08-12T13:12:28.105+0200    building a list of dbs and collections to restore from dump dir
    2016-08-12T13:12:28.106+0200    replaying oplog
    2016-08-12T13:12:31.109+0200    oplog   80.0 MB
    2016-08-12T13:12:34.109+0200    oplog   143.8 MB
    2016-08-12T13:12:35.501+0200    oplog   167.8 MB
    2016-08-12T13:12:35.501+0200    done
  3. Validate the data was restored with the customer or using any means possible (examples: .count() queries, some random .find() queries, etc.).

by David Murphy at September 20, 2016 11:03 PM

Percona Live Europe featured talk with Marc Berhault — Inside CockroachDB’s Survivability Model

percona live europe featured talk

percona live europe featured talkWelcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Marc Berhault, Engineer at Cockroach Labs.His talk will be on Inside CockroachDB’s Survivability Model. This talk takes a deep dive into CockroachDB, a database whose “survive and thrive” model aims to bring the best aspects of Google’s next generation database, Spanner, to the rest of the world via open source.

I had a chance to speak with Marc and learn a bit more about these questions:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Marc: I started out as a Site Reliability Engineer managing Google’s storage infrastructure (GFS). Back in those days, keeping a cluster up and running mostly meant worrying about the masters.

I then switched to a developer role on Google’s next-generation storage system, which replaced the single write master with sharded metadata handlers. This increased the reliability of the entire system considerably, allowing for machine and network failures. SRE concerns gradually shifted away from machine reliability towards more interesting problems, such as multi-tenancy issues (quotas, provisioning, isolation) and larger scale failures.

After leaving Google, I found myself back in a world where one had to worry about a single machine all over again – at least when running your own infrastructure. I kept hearing the same story: a midsize company starts to grow out of its single-machine database and starts trimming the edges. This means moving tables to other hosts, shrinking schemas, etc., in order to avoid the dreaded “great sharding of the monolithic table,” often accompanied by its friends: cross-shard coordination layer and production complexity.

This was when I joined Cockroach Labs, a newly created startup with the goal of bringing a large-scale, transactional, strongly consistent database to the world at large. After contributing to various aspects of the projects, I switched my focus to production: adding monitoring, working on deployment, and of course rolling out our test clusters.

Percona: Your talk is called “Inside CockroachDB’s Survivability Model.” Define “survivability model”, and why it is important to database environments.

Marc: The survivability model in CockroachDB is centered around data redundancy. By default, all data is replicated three times (this is configurable) and is only considered written if a quorum exists. When a new node holding one of the copies of the data becomes unavailable, a node is picked and given a snapshot of the data.

This redundancy model has been widely used in distributed systems, but rarely with strongly consistent databases. CockroachDB’s approach provides strong consistency as well as transactions across the distributed data. We see this as a critical component of modern databases: allowing scalability while guaranteeing consistency.

Percona: What are the workloads and database environments that are best suited for a CockroachDB deployment? Do you see an expansion of the solution to encompass other scenarios?

Marc: CockroachDB is a beta product and is still in development. We expect to be out of beta by the end of 2016. Ideal workloads are those requiring strong consistency – those applications that manage critical data. However, strong consistency comes at a cost, usually directly proportional to latency between nodes and replication factor. This means that a widely distributed CockroachDB cluster (e.g., across multiple regions) will incur high write latencies, making it unsuitable for high-throughput operations, at least in the near term.

Percona: What is changing in the way businesses use databases that keeps you awake at night? How do you think CockroachDB is addressing those concerns?

Marc: In recent years, more and more businesses have been reaching the limits of what their single-machine databases can handle. This has forced many to implement their own transactional layers on top of disjoint databases, at the cost of longer development time and correctness.

CockroachDB attempts to find a solution to this problem by allowing a strongly consistent, transactional database to scale arbitrarily.

Percona: What are looking forward to the most at Percona Live Europe this year?

Marc: This will be my first time at a Percona Live conference, so I’m looking forward to hearing from other developers and learning what challenges other architects and DBAs are facing in their own work.

You can read more about Marc’s thoughts on CockroachDB at their blog.

Want to find out more about Marc, CoachroachDB and survivability? Register for Percona Live Europe 2016, and come see his talk Inside CockroachDB’s Survivability Model.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

by Dave Avery at September 20, 2016 04:41 PM

Jean-Jerome Schmidt

Database Sharding - How does it work?

Database systems with large data sets or high throughput applications can challenge the capacity of a single database server. High query rates can exhaust CPU capacity, I/O resources, RAM or even network bandwidth.

Horizontal scaling is often the only way to scale out your infrastructure. You can upgrade to more powerful hardware, but there is a limit on how much load a single host can handle. You may be able to purchase the most expensive and the fastest CPU or storage on the market, but it still may not be enough to handle your workload. The only feasible way to scale beyond the constraints of a single host is to utilize multiple hosts working together as a part of a cluster or connected using replication.

Horizontal scaling has its limits too, though. When it comes to scaling reads, it is very efficient - just add a node and you can utilize additional processing power. With writes, things are completely different. Consider a MySQL replication setup. Historically, MySQL replication used a single thread to process writes - in a multi-user, highly concurrent environment, this was a serious limitation. This has changed recently. In MySQL 5.6, multiple schemas could be replicated in parallel. In MySQL 5.7, after addition of a ‘logical clock’ scheduler, it became possible for a single-schema workload to benefit from the parallelization of multi-threaded replication. Galera Cluster for MySQL also allows for multi-threaded replication by utilizing multiple workers to apply writesets. Still, even with those enhancements, you can get just some incremental improvement in the write throughput - it is not the solution to the problem.

One solution would be to split our data across multiple servers using some kind of a pattern and, in that way, to split writes across multiple MySQL hosts. This is sharding.

The idea is really simple - if my database server cannot handle the amount of writes, let’s split the data somehow and store one part, generating part of the write traffic, on one database host and the other part on another host. In that way, each host will have to handle half of the writes which should be well within their hardware limits. We can further split the data and distribute it on more servers if our write workload grows.

The actual implementation is more complex as there are numerous issues you need to solve before you can implement sharding. The first, very important question that you need to answer is - how are you going to split your data?

Functional sharding

Let’s imagine your application is built out of multiple modules, or microservices if we want to be fashionable. Assume it’s a large online store with a backend of several warehouses. Such site may contain a module to handle warehouse logistics - check the availability of an item, track shipment from a warehouse to a customer. Another module may be an online store - a website with a presentation of available goods. Yet another module would be a transaction module - collect and store credit cards, handle transaction processing and so on. Maybe the online store has a large, buzzing forum where customers share opinions on goods, discuss support issues etc. You may start your voyage in the world of shards by using a separate database per module. This will allow you to gain some breathing space and plan for next steps. On the other hand, the next step may not be necessary at all if each shard can comfortably handle its workload. Of course, there are downsides of such setup - you cannot easily query data across modules (shards) - you have to execute separate queries to separate databases and then combine together resultsets.

Expression-based sharding

Another method of splitting the data across shards would be to use some kind of expression or function/algorithm to help us decide where the data should be located. Let’s imagine you have a database with one large table that is commonly accessed and written to. For example, assume a social media site and our largest table contains data about users and their activities. This table uses some kind of id column as primary key - we need to split it somehow and one of the ways would be to apply an expression to the ID value. A very popular choice is to use a modulo function - if we want to generate 128 shards, we can just apply expression of ‘id % 128’ and this would calculate the shard number where a given row should be stored. Another method include making use of a date range, e.g., all user activity in year 2015 is stored in one database, activity in year 2016 is stored in a separate database). Yet another one would be to distribute data based on a list of attributes, e.g., all users from a specific country end up in the same shard. 

Metadata-based sharding

As we discussed above, both functional sharding and expression-based sharding have limitations when it comes to scaling out in terms of number of shards. There’s still one more method which gives you more flexibility in managing shards - a metadata-based sharding. The idea is very simple - instead of using some kind of hard-coded algorithm, let’s just write down where a given row is located: row of id=1 - shard 1, row with id=2 - shard 5. Finally, let’s build a database to keep this metadata.

This approach has a huge benefit - you can store any row in any shard. You can also easily add new shards to the mix - just set them up and start to store data on them. You can also easily migrate data between shards - nothing stops you from copying data between shards and then making an adjustment in the metadata. In reality it’s more complex than it sounds as you have to make sure you move all the data so some kind of data locking is required. For example, to copy data between shards, you’d have to do an initial copy of the data across shards, lock access to the part of the data which is migrated, make a final sync and, finally, change an entry in the metadata database and unlock the data.

This is it for now. If you would like to learn more about sharding, you may want to check out this ebook:

Database Sharding with MySQL Fabric

Why do we shard? How does sharding work? What are the different ways I can shard my database? This whitepaper goes through some of the theory behind sharding. It also discusses three different tools which are designed to help users shard their MySQL databases. And last but not least, it shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Download Here

by Severalnines at September 20, 2016 09:33 AM

September 19, 2016

Peter Zaitsev

Open Source Databases at Percona Live Europe, Amsterdam

Percona Live Europe

Percona Live EuropeIn this blog post, I’ll review some of the open source database technologies discussions at Percona Live Europe.

I’ve already written about the exciting PostgreSQL and MongoDB content at Percona Live Europe in Amsterdam, and now I’m going to highlight some of our open source database content.  

In the last five years, the open source database community has been flourishing. There has been an explosion of creativity and innovation. The community has created many niche (and not so niche) solutions for various problems.

As a software engineer or architect, the number of available database options might excite you. You might also be intimidated about how to make the right technology choice. At Percona Live Europe, we have introductory talks for the relevant technologies that we find particularly interesting. These talks will help expand your knowledge about the available solutions and lessen intimidation at the same time.

I’m looking forward to the exciting technologies and talks that we’ll cover this year, such as:

For talks and tutorials on specific uses cases, check out the following sessions:

  • RocksDB is a very cool write optimized (LSM) storage engine, one of the few that has been in more than one database. In addition to the RocksDB-based systems inside Facebook, it can be used with MongoDB as MongoRocks and MySQL as MyRocks. It is also used inside next-generation database systems such as CockroachDB and TiDB. We have a lot of talks about RocksDB and related integrations, ranging from a MyRocks Tutorial by Yoshinori Matsunobu, to talk about MongoRocks by Igor Canadi, and a performance-focused talk by Mark Callaghan.
  • Elastic is the leading technology for open source full-text search implementations (hence previous name ElasticSearch) — but it is much more than that. ElasticSearch, Kibana, Logstash and Beats allow you to get data from a variety of data searches and analyze and visualize it. Philip Krenn will talk about full-text search in general in his Full-Text Search Explained talk, as well as talk in more details about ElasticSearch in ElasticSearch for SQL Users.
  • I am sure you’ve heard about Redis, the Swiss army knife of different data structures and operations. Redis covers many typical data tasks, from caching to maintaining counters and queues. Justin Starry will talk about Redis at Scale in Powering Million of live streams at Periscope, and Itamar Haber will talk about Extending Redis with Modules to make Redis an even more powerful data store.
  • Apache Spark is another technology you’ve surely heard about. Apache Spark adoption has skyrocketed in recent years due to its high-performance in-memory data analyses, replacing or supplementing Hadoop installations. We will hear about Badoo’s experience processing 11 billion events a day with Spark with Alexander Krasheninnikov, and also learn how to use Spark with MongoDB, MySQL and Redis with Tim Vaillancourt.
  • Apache Cassandra is a database focused on high availability and high performance, even when replicating among several data centers. When you think “eventual consistency,” perhaps Cassandra is the first technology that comes to mind. Cassandra allows you to do some impressive things, and Duy Hai Doan will show us some of them in his talk 7 things in Cassandra that you cannot find in RDBMS.
  • ClickHouse is a new guy on the block, but I’m very excited about this distributed column store system for high-performance analytics. Built by the Yandex team to power real-time analytics on the scale of trillions of database records, ClickHouse went open source earlier this year. Victor Tarnavsky will share more details in his talk.
  • Apache Ignite is another new but very exciting technology. Described as in-memory data fabric, it can be used for a variety of applications to supplement or replace relational databases — ranging from advanced data caching strategies to parallel in-memory processing of large quantities of data. Christos Erotocritou will talk about some of these use cases in his talk Turbocharge Your SQL Queries In-Memory with Apache Ignite.
  • RethinkDB is an interesting OpenSource NoSQL database built from the ground up for scalable real-time applications. The end-to-end real-time data streaming feature is really cool, and allows you build interactive real-time applications much easier. Ilya Verbitskiy will talk about RethinkDB in his Agile web-development with RethinkDB talk.
  • CockroachDB is a distributed database focused on survivability and high performance (borrowing some ideas from Google’s innovative Spanner database). Marc Berhault will talk database rocket science in his Inside CockroachDB’s Survivability Model.
  • TiDB is another open source NewSQL database, inspired by Google Spanner and F1. It can use a variety of storage engines for data store, and it supports MySQL wire protocol to ease application migration. Max Liu explains How TiDB was built in his talk.
  • ToroDB is a very interesting piece of technology. It is protocol-compatible with MongoDB, but stores data through a relational format in PostgreSQL. This can offer substantial space reduction and performance improvements for some workloads. Álvaro Hernández from 8Kdata will discuss this technology in his ToroDB: All your MongoDB data are belong to SQL talk.

As you can see we cover a wealth of exciting open source database technologies at Percona Live Europe. Do not miss a chance to expand your database horizons and learn about new developments in the industry. There is still time to register! Use the code PZBlog for a €30 discount off your registration price!

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

by Peter Zaitsev at September 19, 2016 10:18 PM

Percona Server for MongoDB 3.2.9-2.1 is now available

Percona_ServerfMDBLogoVert

Percona Server for MongoDBPercona announces the release of Percona Server for MongoDB 3.2.9-2.1 on September 19, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.2.9-2.1 is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.2 protocol and drivers. It extends MongoDB with MongoRocks, Percona Memory Engine, and PerconaFT storage engine, as well as enterprise-grade features like external authentication and audit logging at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.

Note:

We deprecated the PerconaFT storage engine. It will not be available in future releases.


This release is based on MongoDB 3.2.9. There are no additional improvements or new features on top of those upstream fixes.

The release notes are available in the official documentation.

 

by Alexey Zhebel at September 19, 2016 04:17 PM

Jean-Jerome Schmidt

Load balanced MySQL Galera setup - Manual Deployment vs ClusterControl

If you have deployed databases with high availability before, you will know that a deployment does not always go your way, even though you’ve done it a zillion times. You could spend a full day setting everything up and may still end up with a non-functioning cluster. It is not uncommon to start over, as it’s really hard to figure out what went wrong.

So, deploying a MySQL Galera Cluster with redundant load balancing takes a bit of time. This blog looks at how long time it would take to do it manually vs using ClusterControl to perform the task. For those who have not used it before, ClusterControl is an agentless management and automation software for databases. It supports MySQL (Oracle and Percona server), MariaDB, MongoDB (MongoDB inc. and Percona), and PostgreSQL.

For manual deployment, we’ll be using the popular “Google university” to search for how-to’s and blogs that provide deployment steps.

Database Deployment

Deployment of a database consists of several parts. These include getting the hardware ready, software installation, configuration tweaking and a bit of tuning and testing. Now, let’s assume the hardware is ready, the OS is installed and it is up to you to do the rest. We are going to deploy a three-node Galera cluster as shown in the following diagram:

Manual

Googling on “install mysql galera cluster” led us to this page. By following the steps explained plus some additional dependencies, the following is what we should run on every DB node:

$ semanage permissive -a mysqld_t
$ systemctl stop firewalld
$ systemctl disable firewalld
$ vim /etc/yum.repos.d/galera.repo # setting up Galera repository
$ yum install http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm
$ yum install mysql-wsrep-5.6 galera3 percona-xtrabackup
$ vim /etc/my.cnf # setting up wsrep_* variables
$ systemctl start mysql --wsrep-new-cluster # ‘systemctl start mysql’ on the remaining nodes
$ mysql_secure_installation

The above commands took around 18 minutes to finish on each DB node. Total deployment time was 54 minutes.

ClusterControl

Using ClusterControl, here are the steps we took to first install ClusterControl (5 minutes):

$ wget http://severalnines.com/downloads/cmon/install-cc
$ chmod 755 install-cc
$ ./install-cc

Login to the ClusterControl UI and create the default admin user.

Setup passwordless SSH to all DB nodes on ClusterControl node (1 minute):

$ ssh-keygen -t rsa
$ ssh-copy-id 10.0.0.217
$ ssh-copy-id 10.0.0.218
$ ssh-copy-id 10.0.0.219

In the ClusterControl UI, go to Create Database Cluster -> MySQL Galera and enter the following details (4 minutes):

Click Deploy and wait until the deployment finishes. You can monitor the deployment progress under ClusterControl -> Settings -> Cluster Jobs and once deployed, you will notice it took around 15 minutes:

To sum it up, the total deployment time including installing ClusterControl is 15 + 4 + 1 + 5 = 25 minutes.

Following table summarizes the above deployment actions:

Area Manual ClusterControl
Total steps 8 steps x 3 servers + 1 = 25 8
Duration 18 x 3 = 54 minutes 25 minutes

To summarize, we needed less steps and less time with ClusterControl to achieve the same result. 3 node is sort of a minimum cluster size, and the difference would get bigger with clusters with more nodes.

Load Balancer and Virtual IP Deployment

Now that we have our Galera cluster running, the next thing is to add a load balancer in front. This provides one single endpoint to the cluster, thus reducing the complexity for applications to connect to a multi-node system. Applications would not need to have knowledge of the topology and any changes caused by failures or admin maintenance would be masked. For fault tolerance, we would need at least 2 load balancers with a virtual IP address.

By adding a load balancer tier, our architecture will look something like this:

Manual Deployment

Googling on “install haproxy virtual ip galera cluster” led us to this page. We followed the steps:

On each HAproxy node (2 times):

$ yum install epel-release
$ yum install haproxy keepalived
$ systemctl enable haproxy
$ systemctl enable keepalived
$ vi /etc/haproxy/haproxy.cfg # configure haproxy
$ systemctl start haproxy
$ vi /etc/keepalived/keepalived.conf # configure keepalived
$ systemctl start keepalived

On each DB node (3 times):

$ wget https://raw.githubusercontent.com/olafz/percona-clustercheck/master/clustercheck
$ chmod +x clustercheck
$ mv clustercheck /usr/bin/
$ vi /etc/xinetd.d/mysqlchk # configure mysql check user
$ vi /etc/services # setup xinetd port
$ systemctl start xinetd
$ mysql -uroot -p
mysql> GRANT PROCESS ON *.* TO 'clustercheckuser'@'localhost' IDENTIFIED BY 'clustercheckpassword!'

The total deployment time for this was around 42 minutes.

ClusterControl

For the ClusterControl host, here are the steps taken (1 minute) :

$ ssh-copy-id 10.0.0.229
$ ssh-copy-id 10.0.0.230

Then, go to ClusterControl -> select the database cluster -> Add Load Balancer and enter the IP address of the HAproxy hosts, one at a time:

Once both HAProxy are deployed, we can add Keepalived to provide a floating IP address and perform failover:

Go to ClusterControl -> select the database cluster -> Logs -> Jobs. The total deployment took about 5 minutes, as shown in the screenshot below:

Thus, total deployment for load balancers plus virtual IP address and redundancy is 1 + 5 = 6 minutes.

Following table summarized the above deployment actions:

Area Manual ClusterControl
Total steps (8 x 2 haproxy nodes) + (8 x 3 DB nodes) = 40 6
Duration 42 minutes 6 minutes

ClusterControl also manages and monitors the load balancers:

Adding a Read Replica

Our setup is now looking pretty decent, and the next step is to add a read replica to Galera. What is a read replica, and why do we need it? A read replica is an asynchronous slave, replicating from one of the Galera nodes using standard MySQL replication. There are a few good reasons to have this. Long-running reporting/OLAP type queries on a Galera node might slow down an entire cluster, if the reporting load is so intensive that the node has to spend considerable effort coping with it. So reporting queries can be sent to a standalone server, effectively isolating Galera from the reporting load. An asynchronous slave can also serve as a remote live backup of our cluster in a DR site, especially if the link is not good enough to stretch one cluster across 2 sites.

Our architecture is now looking like this:

Manual Deployment

Googling on “mysql galera with slave” brought us to this page. We followed the steps:

On master node:

$ vim /etc/my.cnf # setting up binary log and gtid
$ systemctl restart mysql
$ mysqldump --single-transaction --skip-add-locks --triggers --routines --events > dump.sql
$ mysql -uroot -p
mysql> GRANT REPLICATION SLAVE ON .. ;

On slave node (we used Percona Server):

$ yum install http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm
$ yum install Percona-Server-server-56
$ vim /etc/my.cnf # setting up server id, gtid and stuff
$ systemctl start mysql
$ mysql_secure_installation
$ scp root@master:~/dump.sql /root
$ mysql -uroot -p < /root/dump.sql
$ mysql -uroot -p
mysql> CHANGE MASTER ... MASTER_AUTO_POSITION=1;
mysql> START SLAVE;

The total time spent for this manual deployment was around 40 minutes (with 1GB database in size).

ClusterControl

With ClusterControl, here is what we should do. Firstly, configure passwordless SSH to the target slave (0.5 minute):

$ ssh-copy-id 10.0.0.231 # setup passwordless ssh

Then, on one of the MySQL Galera nodes, we have to enable binary logging to become a master (2 minutes):

Click Proceed to start enabling binary log for this node. Once completed, we can add the replication slave by going to ClusterControl -> choose the Galera cluster -> Add Replication Slave and specify as per below (6 minutes including streaming 1GB of database to slave):

Click “Add node” and you are set. Total deployment time for adding a read replica complete with data is 6 + 2 + 0.5 = 8.5 minutes.

Following table summarized the above deployment actions:

Area Manual ClusterControl
Total steps 15 3
Duration 40 minutes 8.5 minutes

We can see that ClusterControl automates a number of time consuming tasks, including slave installation, backup streaming and slaving from master. Note that ClusterControl will also handle things like master failover so that replication does not break if the galera master fails..

Conclusion

A good deployment is important, as it is the foundation of an upcoming database workload. Speed matters too, especially in agile environments where a team frequently deploys entire systems and tears them down after a short time. You’re welcome to try ClusterControl to automate your database deployments, it comes with a free 30-day trial of the full enterprise features. Once the trial ends, it will default to the community edition (free forever).

by Severalnines at September 19, 2016 09:12 AM

September 16, 2016

Peter Zaitsev

How X Plugin Works Under the Hood

X Plugin

X PluginIn this blog post, we’ll look at what MySQL does under the hood to transform NoSQL requests to SQL (and then store them in InnoDB transactional engine) when using the X Plugin.

X Plugin allows MySQL to function as a document store. We don’t need to define any schema or use SQL language while still being a fully ACID database. Sounds like magic – but we know the only thing that magic does is make planes fly! 🙂

Alexander already wrote a blog post exploring how the X Plugin works, with some examples. In this post, I am going to show some more query examples and how they are transformed.

I have enabled the slow query log to see what it is actually being executed when I run NoSQL queries.

Creating our first collection

We start the MySQL shell and create our first collection:

$ mysqlsh -u root --py
Creating an X Session to root@localhost:33060
No default schema selected.
[...]
Currently in Python mode. Use sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("people")

What is a collection in SQL terms? A table. Let’s check what MySQL does by reading the slow query log:

CREATE TABLE `people` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

As we correctly guessed, it creates a table with two columns. One is called “doc” and it stores a JSON document. A second column named “_id” and is created as a virtual column from data extracted from that JSON document. _id is used as a primary key, and if we don’t specify a value, MySQL will choose a random UUID every time we write a document.

So, the basics are clear.

  • It stores everything inside a JSON column.
  • Indexes are created on virtual columns that are generated by extracting data from that JSON. Every time we add a new index, a virtual column will be generated. That means that under the hood, an alter table will run adding the column and the corresponding index.

Let’s run a getCollections that would be similar to “SHOW TABLES” in the SQL world:

mysql-py> db.getCollections()
[
]

This is what MySQL actually runs:

SELECT C.table_name AS name, IF(ANY_VALUE(T.table_type)='VIEW', 'VIEW', IF(COUNT(*) = COUNT(CASE WHEN (column_name = 'doc' AND data_type = 'json') THEN 1 ELSE NULL END) + COUNT(CASE WHEN (column_name = '_id' AND generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))') THEN 1 ELSE NULL END) + COUNT(CASE WHEN (column_name != '_id' AND generation_expression RLIKE '^(json_unquote[[.(.]])?json_extract[[.(.]]`doc`,''[[.$.]]([[...]][^[:space:][...]]+)+''[[.).]]{1,2}$') THEN 1 ELSE NULL END), 'COLLECTION', 'TABLE')) AS type FROM information_schema.columns AS C LEFT JOIN information_schema.tables AS T USING (table_name)WHERE C.table_schema = 'test' GROUP BY C.table_name ORDER BY C.table_name;

This time, the query is a bit more complex. It runs a query on information_schema.tables joining it, with information_schema.columns searching for tables that have “doc” and “_id” columns.

Inserting and reading documents

I am going to start adding data to our collection. Let’s add our first document:

mysql-py> db.people.add(
      ...  {
      ...     "Name": "Miguel Angel",
      ...     "Country": "Spain",
      ...     "Age": 33
      ...   }
      ... )

In the background, MySQL inserts a JSON object and auto-assign a primary key value.

INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Age',33,'Country','Spain','Name','Miguel Angel','_id','a45c69cd2074e611f11f62bf9ac407d7'));

Ok, this is supposed to be schemaless. So let’s add someone else using different fields:

mysql-py> db.people.add(
      ...  {
      ...     "Name": "Thrall",
      ...     "Race": "Orc",
      ...     "Faction": "Horde"
      ...   }
      ... )

Same as before, MySQL just writes another JSON object (with different fields):

INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Faction','Horde','Name','Thrall','Race','Orc','_id','7092776c2174e611f11f62bf9ac407d7'));

Now we are going to read the data we have just inserted. First, we are going to find all documents stored in the collection:

mysql-py> db.people.find()

MySQL translates to a simple:

SELECT doc FROM `test`.`people`;

And this is how filters are transformed:

mysql-py> db.people.find("Name = 'Thrall'")

It uses a SELECT with the WHERE clause on data extracted from the JSON object.

SELECT doc FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');

Updating documents

Thrall decided that he doesn’t want to belong to the Horde anymore. He wants to join the Alliance. We need to update the document:

mysql-py> db.people.modify("Name = 'Thrall'").set("Faction", "Alliance")

MySQL runs an UPDATE, again using a WHERE clause on the data extracted from the JSON. Then, it updates the “Faction”:

UPDATE `test`.`people` SET doc=JSON_SET(doc,'$.Faction','Alliance') WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');

Now I want to remove my own document:

mysql-py> db.people.remove("Name = 'Miguel Angel'");

As you can already imagine, it runs a DELETE, searching for my name on the data extracted from the JSON object:

DELETE FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Miguel Angel');

Summary

The magic that makes our MySQL work like a document-store NoSQL database is:

  • Create a simple InnoDB table with a JSON column.
  • Auto-generate the primary key with UUID values and represent it as a virtual column.
  • All searches are done by extracting data JSON_EXTRACT, and passing that info to the WHERE clause.

I would define the solution as something really clever, simple and clean. Congrats to Oracle! 🙂

by Miguel Angel Nieto at September 16, 2016 07:59 PM

Consul, ProxySQL and MySQL HA

ProxySQL

When it comes to “decision time” about which type of MySQL HA (high-availability) solution to implement, and how to architect the solution, many questions come to mind. The most important questions are:

  • “What are the best tools to provide HA and Load Balancing?”
  • “Should I be deploying this proxy tool on my application servers or on a standalone server?”.

Ultimately, the best tool really depends on the needs of your application and your environment. You might already be using specific tools such as Consul or MHA, or you might be looking to implement tools that provide richer features. The dilemma of deploying a proxy instance per application host versus a standalone proxy instance is usually a trade-off between “a less effective load balancing algorithm” or “a single point of failure.” Neither are desirable, but there are ways to implement a solution that balances all aspects.

In this article, we’ll go through a solution that is suitable for an application that has not been coded to split reads and writes over separate MySQL instances. An application like this would rely on a proxy or 3rd party tool to split reads/writes, and preferably a solution that has high-availability at the proxy layer. The solution described here is comprised of ProxySQLConsul and Master High Availability (MHA). Within this article, we’ll focus on the configuration required for ProxySQL and Consul since there are many articles that cover MHA configuration (such as Miguel’s recent MHA Quick Start Guide blog post).

When deploying Consul in production, a minimum of 3x instances are recommended – in this example, the Consul agents run on the Application Server (appserver) as well as on the two “ProxySQL servers” mysql1 and mysql2 (which act as the HA proxy pair). This is not a hard requirement, and these instances can easily run on another host or docker container. MySQL is deployed locally on mysql1 and mysql2, however this could just as well be 1..n separate standalone DB server instances:

Consul ProxySQL

So let’s move on to the actual configuration of this HA solution, starting with Consul.

Installation of Consul:

Firstly, we’ll need to install the required packages, download the Consul archive and perform the initial configuration. We’ll need to perform the same installation on each of the nodes (i.e., appserver, mysql1 and mysql2).

### Install pre-requisite packages:
sudo yum -y install wget unzip bind-utils dnsmasq
### Install Consul:
sudo useradd consul
sudo mkdir -p /opt/consul /etc/consul.d
sudo touch /var/log/consul.log /etc/consul.d/proxysql.json
cd /opt/consul
sudo wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
sudo unzip consul_0.6.4_linux_amd64.zip
sudo ln -s /opt/consul/consul /usr/bin/consul
sudo chown consul:consul -R /etc/consul* /opt/consul* /var/log/consul.log

Configuration of Consul on Application Server (used as ‘bootstrap’ node):

Now, that we’re done with the installation on each of the hosts, let’s continue with the configuration. In this example we’ll bootstrap the Consul cluster using “appserver”:

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent1",
  "server": true,
  "ui": true,
  "bootstrap": true,
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.119"  ## Add server IP here
}
######
$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
   "id": "proxy1",
   "name": "proxysql",
   "address": "192.168.1.120",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
     "interval": "3s"}
   },
  {
   "id": "proxy2",
   "name": "proxysql",
   "address": "192.168.1.121",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root --password=123",
     "interval": "3s"}
   }
 ]
}
######
### Start Consul agent
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'
### Setup DNSMASQ (as root)
echo "server=/consul/127.0.0.1#8600" > /etc/dnsmasq.d/10-consul
service dnsmasq restart
### Remember to add the localhost as a DNS server (this step can vary
### depending on how your DNS servers are managed... here I'm just
### adding the following line to resolve.conf:
sudo vi /etc/resolve.conf
#... snippet ...#
nameserver 127.0.0.1
#... snippet ...#
### Restart dnsmasq
sudo service dnsmasq restart

The service should now be started, and you can verify this in the logs in “/var/log/consul.log”.

Configuration of Consul on Proxy Servers:

The next item is to configure each of the proxy Consul agents. Note that the “agent name” and the “IP address” need to be updated for each host (values for both must be unique):

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent2",  ### Agent node name must be unique
  "server": true,
  "ui": true,
  "bootstrap": false,   ### Disable bootstrap on joiner nodes
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.xxx",  ### Set to local instance IP
  "dns_config": {
    "only_passing": true
  }
}
######
$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
   "id": "proxy1",
   "name": "proxysql",
   "address": "192.168.1.120",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
     "interval": "3s"}
   },
  {
   "id": "proxy2",
   "name": "proxysql",
   "address": "192.168.1.121",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root password=123",
     "interval": "3s"}
   }
 ]
}
######
### Start Consul agent:
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'
### Join Consul cluster specifying 1st node IP e.g.
$ consul join 192.168.1.119
### Verify logs and look out for the following messages:
$ cat /var/log/consul.log
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'agent2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 192.168.1.120 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas:
==> Log data will now stream in as it occurs:
# ... snippet ...
    2016/09/05 19:48:04 [INFO] agent: Synced service 'consul'
    2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql1'
    2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql2'
# ... snippet ...

At this point, we have Consul installed, configured and running on each of our hosts appserver (mysql1 and mysql2). Now it’s time to install and configure ProxySQL on mysql1 and mysql2.

Installation & Configuration of ProxySQL:

The same procedure should be run on both mysql1 and mysql2 hosts:

### Install ProxySQL packages and initialise ProxySQL DB
sudo yum -y install https://github.com/sysown/proxysql/releases/download/v1.2.2/proxysql-1.2.2-1-centos7.x86_64.rpm
sudo service proxysql initial
sudo service proxysql stop
### Edit the ProxySQL configuration file to update username / password
vi /etc/proxysql.cnf
###
admin_variables=
{
    admin_credentials="admin:admin"
    mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
}
###
### Start ProxySQL
sudo service proxysql start
### Connect to ProxySQL and configure
mysql -P6032 -h127.0.0.1 -uadmin -padmin
### First we create a replication hostgroup:
mysql> INSERT INTO mysql_replication_hostgroups VALUES (10,11,'Standard Replication Groups');
### Add both nodes to the hostgroup 11 (ProxySQL will automatically put the writer node in hostgroup 10)
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.120',11,3306,1000);
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.121',11,3306,1000);
### Save server configuration
mysql> LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
### Add query rules for RW split
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .* FOR UPDATE', 10, NULL, 1);
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .*', 11, NULL, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
### Finally configure ProxySQL user and save configuration
mysql> INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('root','123',1,10,'test');
mysql> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
mysql> EXIT;

MySQL Configuration:

We also need to perform one configuration step on the MySQL servers in order to create a user for ProxySQL to monitor the instances:

### ProxySQL's monitor user on the master MySQL server (default username and password is monitor/monitor)
mysql -h192.168.1.120 -P3306 -uroot -p123 -e"GRANT USAGE ON *.* TO monitor@'%' IDENTIFIED BY 'monitor';"

We can view the configuration of the monitor user on the ProxySQL host by checking the global variables on the admin interface:

mysql> SHOW VARIABLES LIKE 'mysql-monitor%';
+----------------------------------------+---------+
| Variable_name                          | Value   |
+----------------------------------------+---------+
| mysql-monitor_enabled                  | true    |
| mysql-monitor_connect_timeout          | 200     |
| mysql-monitor_ping_max_failures        | 3       |
| mysql-monitor_ping_timeout             | 100     |
| mysql-monitor_replication_lag_interval | 10000   |
| mysql-monitor_replication_lag_timeout  | 1000    |
| mysql-monitor_username                 | monitor |
| mysql-monitor_password                 | monitor |
| mysql-monitor_query_interval           | 60000   |
| mysql-monitor_query_timeout            | 100     |
| mysql-monitor_slave_lag_when_null      | 60      |
| mysql-monitor_writer_is_also_reader    | true    |
| mysql-monitor_history                  | 600000  |
| mysql-monitor_connect_interval         | 60000   |
| mysql-monitor_ping_interval            | 10000   |
| mysql-monitor_read_only_interval       | 1500    |
| mysql-monitor_read_only_timeout        | 500     |
+----------------------------------------+---------+

Testing Consul:

Now that Consul and ProxySQL are configured we can do some tests from the “appserver”. First, we’ll verify that the hosts we’ve added are both reporting [OK] on our DNS requests:

$ dig @127.0.0.1 -p 53 proxysql.service.consul
; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @127.0.0.1 -p 53 proxysql.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9975
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;proxysql.service.consul.	IN	A
;; ANSWER SECTION:
proxysql.service.consul. 0	IN	A	192.168.1.121
proxysql.service.consul. 0	IN	A	192.168.1.120
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Sep 05 19:32:12 UTC 2016
;; MSG SIZE  rcvd: 158

As you can see from the output above, DNS is reporting both 192.168.120 and 192.168.1.121 as available for the ProxySQL service. As soon as the ProxySQL check fails, the nodes will no longer report in the output above.

We can also view the status of our cluster and agents through the Consul Web GUI which runs on port 8500 of all the Consul servers in this configuration (e.g. http://192.168.1.120:8500/):

Consul GUI

Testing ProxySQL:

So now that we have this configured we can also do some basic tests to see that ProxySQL is load balancing our connections:

[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e"select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql1.localdomain |
+--------------------+
[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e"select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql2.localdomain |
+--------------------+

Perfect! We’re ready to use the hostname “proxysql.service.consul” to connect to our MySQL instances using a round-robin load balancing and HA proxy solution. If one of the two ProxySQL instances fails, we’ll continue communicating with the database through the other. Of course, this configuration is not limited to just two hosts, so feel free to add as many as you need. Be aware that in this example the two hosts’ replication hierarchy is managed by MHA in order to allow for master/slave promotion. By performing an automatic or manual failover using MHA, ProxySQL automatically detects the change in replication topology and redirect writes to the newly promoted master instance.

To make this configuration more durable, it is encouraged to create a more intelligent Consul check – i.e., a check that checks more than just the availability of the MySQL service (an example would be to select some data from a table). It is also recommended to fine tune the interval of the check to suit the requirements of your application.

by Nik Vyzas at September 16, 2016 03:20 PM

September 15, 2016

Peter Zaitsev

ProxySQL and Percona XtraDB Cluster (Galera) Integration

ProxySQL and Percona XtraDB ClusterIn this post, we’ll discuss how an integrated ProxySQL and Percona XtraDB Cluster (Galera) helps manage node states and failovers.

ProxySQL is designed to not perform any specialized operation in relation to the servers with which it communicates. Instead, it uses an event scheduler to extend functionalities and cover any special needs.

Given that specialized products like Percona XtraDB Cluster are not managed by ProxySQL, they require the design and implementation of good/efficient extensions.

In this article, I will illustrate how Percona XtraDB Cluster/Galera can be integrated with ProxySQL to get the best from both.

Brief digression

Before discussing their integration, we need to review a couple of very important concepts in ProxySQL. ProxySQL has a very important logical component: Hostgroup(s) (HG).

A hostgroup is a relation of:

+-----------+       +------------------------+
|Host group +------>|Server (1:N)            |
+-----------+       +------------------------+

In ProxySQL, QueryRules (QR) can be directly mapped to an HG. Using QRs, you can define a specific user to ONLY go to that HG. For instance, you may want to have user app1_user go only on servers A-B-C. Simply set a QR that says app1_user has the destination hostgroup 5, where HG 5 has the servers A-B-C:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',5,3306,10);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) values('app1_user',5,1);

Easy isn’t it?

Another important concept in ProxySQL also related to HG is ReplicationHostgroup(s) (RHG). This is a special HG that ProxySQL uses to automatically manage the nodes that are connected by replication and configured in Write/Read and Read_only mode.

What does this mean? Let’s say you have four nodes A-B-C-D, connected by standard asynchronous replication. A is the master and B-C-D are the slaves. What you want is to have your application pointing writes to server A, and reads to B-C (keeping D as a backup slave). Also, you don’t want to have any reads go to B-C if the replication delay is more than two seconds.

RHG, in conjunction with HG, ProxySQL can manage all this for you. Simply instruct the proxy to:

  1. Use RHG
  2. Define the value of the maximum latency

Using the example above:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.5',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.6',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.7',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.8',10,3306,10,2);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) values('app1_user',5,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) values('app1_user',6,1);
INSERT INTO mysql_replication_hostgroups VALUES (5,6);

From now on ProxySQL will split the R/W using the RHG and the nodes defined in HG 5.
The flexibility introduced by using HGs is obviously not limited to what I mention here. It will play a good part in the integration of Percona XtraDB Cluster and ProxySQL, as I illustrate below.

Percona XtraDB Cluster/Galera Integration

In an XtraDB cluster, a node has many different states and conditions that affect if and how your application operates on the node.

The most common one is when a node become a DONOR. If you’ve ever installed Percona XtraDB Cluster (or any Galera implementation), you’ve faced the situation when a node become a DONOR it changes state to DESYNC. If the node is under a heavy load, the DONOR process might affect the node itself.

But that is just one of the possible node states:

  • A node can be JOINED but not synced
  • It can have wsrep_rejectqueries, wsrep_donorrejectqueries, wsrep_ready (off)
  • It can be in a different segment
  • The number of nodes per segment is relevant.

To show what can be done and how, we will use the following setup:

  • Five nodes
  • Two segments
  • Applications requiring R/W split

And two options:

  • Single writer node
  • Multiple writers node

We’ll analyze how the proxy behaves under the use of a script run by the ProxySQL scheduler.

The use of a script is necessary for ProxySQL to respond correctly to Percona XtraDB Cluster state modifications. ProxySQL comes with two scripts for Galera, both of them are too basic and don’t consider a lot of relevant conditions. I’ve written a more complete script: https://github.com/Tusamarco/proxy_sql_tools galera_check.pl

This script is a prototype and requires QA and debugging, but is still more powerful than the default ones.

The script is designed to manage X number of nodes that belong to a given HG. The script works by HG, and as such it will perform isolated actions/checks by the HG. It is not possible to have more than one check running on the same HG. The check will create a lock file {proxysql_galera_check_${hg}.pid} that will be used to prevent duplicates.

galera_check
 will connect to the ProxySQL node and retrieve all the information regarding the nodes/proxysql configuration. It will then check in parallel each node and will retrieve the status and configuration.
galera_check
 analyzes and manages the following node states:

  • read_only
  • wsrep_status
  • wsrep_rejectqueries
  • wsrep_donorrejectqueries
  • wsrep_connected
  • wsrep_desinccount
  • wsrep_ready
  • wsrep_provider
  • wsrep_segment
  • Number of nodes in by segment
  • Retry loop

As mentioned, the number of nodes inside a segment is relevant. If a node is the only one in a segment, the check behaves accordingly. For example, if a node is the only one in the MAIN segment, it will not put the node in OFFLINE_SOFT when the node becomes a donor, to prevent the cluster from becoming unavailable for applications.

The script allows you to declare a segment as MAIN — quite useful when managing production and DR sites, as the script manages the segment acting as main in a more conservative way. The check can be configured to perform retries after a given interval, where the interval is the time define in the ProxySQL scheduler. As such, if the check is set to have two retries for UP and three for DOWN, it will loop that number before doing anything.

Percona XtraDB Cluster/Galera performs some actions under the hood, some of them not totally correct. This feature is useful in some uncommon circumstances, where Galera behaves weirdly. For example, whenever a node is set to READ_ONLY=1, Galera desyncs and resyncs the node. A check that doesn’t take this into account sets the node to OFFLINE and back for no reason.

Another important differentiation for this check is that it use special HGs for maintenance, all in the range of 9000. So if a node belongs to HG 10, and the check needs to put it in maintenance mode, the node will be moved to HG 9010. Once all is normal again, the node will be put back on its original HG.

This check does NOT modify any node states. This means it will NOT modify any variables or settings in the original node. It will ONLY change node states in ProxySQL.

Multi-writer mode

The recommended way to use Galera is in multi-writer mode. You can then play with the weight to have a node act as MAIN node and prevent/reduce certification failures and Brutal force Abort from Percona XtraDB Cluster. Use this configuration:

Delete from mysql_replication_hostgroups where writer_hostgroup=500 ;
delete from mysql_servers where hostgroup_id in (500,501);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',500,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',501,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',500,3306,1000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',500,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',501,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',501,3306,1);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL TO DISK;

In this test, we will NOT use Replication HostGroup. We will do that later when testing a single writer. For now, we’ll focus on multi-writer.

Segment 1 covers HG 500 and 501, while segment two only covers 501. Weight for the servers in HG 500 is progressive from 1 to 1 billion, in order to reduce the possible random writes on the non-main node.

As such nodes:

  • HG 500S1 192.168.1.5 – 1.000.000.000
    • S1 192.168.1.6 – 1.000.000
    • S1 192.168.1.7 – 100
    • S2 192.168.1.8 – 1
    • S2 192.168.1.9 – 1
  • HG 501S1 192.168.1.5 – 100
    • S1 192.168.1.6 – 1000000000
    • S1 192.168.1.7 – 1000000000
    • S2 192.168.1.8 – 1
    • S2 192.168.1.9 – 1

The following command shows what ProxySQL is doing:

watch -n 1 'mysql -h 127.0.0.1 -P 3310 -uadmin -padmin -t -e "select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ;" -e " select hostgroup_id,hostname,status,weight,comment from mysql_servers where hostgroup_id in (500,501,9500,9501)  order by hostgroup_id,hostname ;"'

Download the check from GitHub (https://github.com/Tusamarco/proxy_sql_tools) and activate it in ProxySQL. Be sure to set the parameters that match your installation:

delete from scheduler where id=10;
INSERT  INTO scheduler (id,active,interval_ms,filename,arg1) values (10,0,2000,"/var/lib/proxysql/galera_check.pl","-u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0  --log=/var/lib/proxysql/galeraLog");
LOAD SCHEDULER TO RUNTIME;SAVE SCHEDULER TO DISK;

If you want to activate it:

update scheduler set active=1 where id=10;
LOAD SCHEDULER TO RUNTIME;

The following is the kind of scenario we have:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.9 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 413        |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 420        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 227        |
| 500       | 192.168.1.6 | 3306     | ONLINE | 0        | 10       | 10     | 0       | 12654    | 1016975         | 0               | 230        |
| 500       | 192.168.1.5 | 3306     | ONLINE | 0        | 9        | 29     | 0       | 107358   | 8629123         | 0               | 206        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 4        | 6      | 0       | 12602425 | 613371057       | 34467286486     | 413        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 6        | 7      | 0       | 12582617 | 612422028       | 34409606321     | 420        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 0        | 6        | 6      | 0       | 18580675 | 905464967       | 50824195445     | 227        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 0        | 6        | 14     | 0       | 18571127 | 905075154       | 50814832276     | 230        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 10     | 0       | 169570   | 8255821         | 462706881       | 206        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

To generate a load, use the following commands (or whatever you like, but use a different one for read-only and reads/writes):

Write
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=50000 --max-requests=0 --max-time=9000 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all run
Read only
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=50000 --max-requests=0 --max-time=9000 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run

The most common thing that could happen to a cluster node is to become a donor. This is a planned activity for Percona XtraDB Cluster and is suppose to be managed in a less harmful way.

We’re going to simulate crashing a node and forcing it to elect our main node as DONOR (the one with the highest WEIGHT).

To do so, we need to have the parameter

wsrep_sst_donor
 set.

show global variables like 'wsrep_sst_donor';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| wsrep_sst_donor | node1 | <---
+-----------------+-------+

Activate the check if not already done:

update scheduler set active=1 where id=10;

And now run traffic. Check load:

select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ;
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 30     | 0       | 112662   | 9055479         | 0               | 120        | <--- our Donor
| 500       | 192.168.1.6 | 3306     | ONLINE | 0        | 10       | 10     | 0       | 12654    | 1016975         | 0               | 111        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 115        |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 316        |
| 500       | 192.168.1.9 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 329        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 10     | 0       | 257271   | 12533763        | 714473854       | 120        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 0        | 10       | 18     | 0       | 18881582 | 920200116       | 51688974309     | 111        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 3        | 6        | 9      | 0       | 18927077 | 922317772       | 51794504662     | 115        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 1        | 8      | 0       | 12595556 | 613054573       | 34447564440     | 316        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 1        | 3        | 6      | 0       | 12634435 | 614936148       | 34560620180     | 329        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Now on one of the nodes:

  1. Kill mysql
  2. Remove the content of the data directory
  3. Restart the node

The node will go in SST and our

galera_check
 script will manage it:

+--------------+-------------+--------------+------------+--------------------------------------------------+
| hostgroup_id | hostname    | status       | weight     | comment                                          |
+--------------+-------------+--------------+------------+--------------------------------------------------+
| 500          | 192.168.1.5 | OFFLINE_SOFT | 1000000000 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <---- the donor
| 500          | 192.168.1.6 | ONLINE       | 1000000    |                                                  |
| 500          | 192.168.1.7 | ONLINE       | 100        |                                                  |
| 500          | 192.168.1.8 | ONLINE       | 1          |                                                  |
| 500          | 192.168.1.9 | ONLINE       | 1          |                                                  |
| 501          | 192.168.1.5 | OFFLINE_SOFT | 100        | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 501          | 192.168.1.6 | ONLINE       | 1000000000 |                                                  |
| 501          | 192.168.1.7 | ONLINE       | 1000000000 |                                                  |
| 501          | 192.168.1.8 | ONLINE       | 1          |                                                  |
| 501          | 192.168.1.9 | ONLINE       | 1          |                                                  |
+--------------+-------------+--------------+------------+--------------------------------------------------+

We can also check the

galera_check
 log and see what happened:

2016/09/02 16:13:27.298:[WARN] Move node:192.168.1.5;3306;500;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:13:27.303:[WARN] Move node:192.168.1.5;3306;501;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

The node will remain in OFFLINE_SOFT while the other node (192.168.1.6 with the 2nd WEIGHT) serves the writes, until the node is in DONOR state.

All as expected, the node was set in OFFLINE_SOFT state, which mean the existing connections finished, while the node was not accepting any NEW connections.

As soon the node stops sending data to the Joiner, it was moved back and traffic restarted:

2016/09/02 16:14:58.239:[WARN] Move node:192.168.1.5;3306;500;1000 SQL: UPDATE mysql_servers SET status='ONLINE' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:14:58.243:[WARN] Move node:192.168.1.5;3306;501;1000 SQL: UPDATE mysql_servers SET status='ONLINE' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 6        | 1        | 37     | 0       | 153882   | 12368557        | 0               | 72         | <---
| 500       | 192.168.1.6 | 3306     | ONLINE | 1        | 9        | 10     | 0       | 16008    | 1286492         | 0               | 42         |
| 500       | 192.168.1.7 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 1398     | 112371          | 0               | 96         |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 24545  | 791     | 24545    | 122725          | 0               | 359        |
| 500       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 15108    | 1214366         | 0               | 271        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 11     | 0       | 2626808  | 128001112       | 7561278884      | 72         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 5        | 7        | 20     | 0       | 28629516 | 1394974468      | 79289633420     | 42         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 2        | 8        | 10     | 0       | 29585925 | 1441400648      | 81976494740     | 96         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 16779  | 954     | 12672983 | 616826002       | 34622768228     | 359        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 4        | 6      | 0       | 13567512 | 660472589       | 37267991677     | 271        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

This was easy, and more or less managed by the standard script. But what would happen if my donor was set to DO NOT serve query when in the DONOR state?

Wait, what?? Yes, Percona XtraDB Cluster (and Galera in general) can be set to refuse any query when the node goes in DONOR state. If not managed this can cause issues as the node will simply reject queries (but ProxySQL sees the node as alive).

Let me show you:

show global variables like 'wsrep_sst_donor_rejects_queries';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| wsrep_sst_donor_rejects_queries | ON    |
+---------------------------------+-------+

For the moment, let’s deactivate the check. Then, do the same stop and delete of the data dir, then restart the node. SST takes place.

Sysbench will report:

ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3
ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3

But ProxySQL?

+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE  | 0        | 0        | 101    | 0       | 186331   | 14972717        | 0               | 118        | <-- no writes in wither HG
| 500       | 192.168.1.6 | 3306     | ONLINE  | 0        | 9        | 10     | 0       | 20514    | 1648665         | 0               | 171        |  |
| 500       | 192.168.1.7 | 3306     | ONLINE  | 0        | 1        | 3      | 0       | 5881     | 472629          | 0               | 134        |  |
| 500       | 192.168.1.8 | 3306     | ONLINE  | 0        | 0        | 205451 | 1264    | 205451   | 1027255         | 0               | 341        |  |
| 500       | 192.168.1.9 | 3306     | ONLINE  | 0        | 1        | 2      | 0       | 15642    | 1257277         | 0               | 459        |  -
| 501       | 192.168.1.5 | 3306     | ONLINE  | 1        | 0        | 13949  | 0       | 4903347  | 238627310       | 14089708430     | 118        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 2        | 10       | 20     | 0       | 37012174 | 1803380964      | 103269634626    | 171        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 2        | 11       | 13     | 0       | 38782923 | 1889507208      | 108288676435    | 134        |
| 501       | 192.168.1.8 | 3306     | SHUNNED | 0        | 0        | 208452 | 1506    | 12864656 | 626156995       | 34622768228     | 341        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 3        | 6      | 0       | 14451462 | 703534884       | 39837663734     | 459        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
mysql> select * from mysql_server_connect_log where hostname in ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9')  order by time_start_us desc limit 10;
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error                                                                                          |
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| 192.168.1.9 | 3306 | 1472827444621954 | 1359                    | NULL                                                                                                   |
| 192.168.1.8 | 3306 | 1472827444618883 | 0                       | Can't connect to MySQL server on '192.168.1.8' (107)                                                   |
| 192.168.1.7 | 3306 | 1472827444615819 | 433                     | NULL                                                                                                   |
| 192.168.1.6 | 3306 | 1472827444612722 | 538                     | NULL                                                                                                   |
| 192.168.1.5 | 3306 | 1472827444606560 | 473                     | NULL                                                                                                   | <-- donor is seen as up
| 192.168.1.9 | 3306 | 1472827384621463 | 1286                    | NULL                                                                                                   |
| 192.168.1.8 | 3306 | 1472827384618442 | 0                       | Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 107 |
| 192.168.1.7 | 3306 | 1472827384615317 | 419                     | NULL                                                                                                   |
| 192.168.1.6 | 3306 | 1472827384612241 | 415                     | NULL                                                                                                   |
| 192.168.1.5 | 3306 | 1472827384606117 | 454                     | NULL                                                                                                   | <-- donor is seen as up
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
select * from mysql_server_ping_log where hostname in ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9')  order by time_start_us desc limit 10;
+-------------+------+------------------+----------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error                                           |
+-------------+------+------------------+----------------------+------------------------------------------------------+
| 192.168.1.9 | 3306 | 1472827475062217 | 311                  | NULL                                                 |
| 192.168.1.8 | 3306 | 1472827475060617 | 0                    | Can't connect to MySQL server on '192.168.1.8' (107) |
| 192.168.1.7 | 3306 | 1472827475059073 | 108                  | NULL                                                 |
| 192.168.1.6 | 3306 | 1472827475057281 | 102                  | NULL                                                 |
| 192.168.1.5 | 3306 | 1472827475054188 | 74                   | NULL                                                 | <-- donor is seen as up
| 192.168.1.9 | 3306 | 1472827445061877 | 491                  | NULL                                                 |
| 192.168.1.8 | 3306 | 1472827445060254 | 0                    | Can't connect to MySQL server on '192.168.1.8' (107) |
| 192.168.1.7 | 3306 | 1472827445058688 | 53                   | NULL                                                 |
| 192.168.1.6 | 3306 | 1472827445057124 | 131                  | NULL                                                 |
| 192.168.1.5 | 3306 | 1472827445054015 | 98                   | NULL                                                 | <-- donor is seen as up
+-------------+------+------------------+----------------------+------------------------------------------------------+

As you can see, all seems OK. Let’s turn on

galera_check
 and see what happens when we run some read and write loads.

And now let me do the stop-delete-restart-SST process again:

kill -9 <mysqld_safe_pid> <mysqld_pid>; rm -fr data/*;rm -fr logs/*;sleep 2;./start

As soon as the node goes down, ProxySQL shuns the node.

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE  | 7        | 3        | 34     | 0       | 21570   | 1733833         | 0               | 146        |
| 500       | 192.168.1.6 | 3306     | ONLINE  | 1        | 8        | 12     | 0       | 9294    | 747063          | 0               | 129        |
| 500       | 192.168.1.7 | 3306     | ONLINE  | 1        | 0        | 4      | 0       | 3396    | 272950          | 0               | 89         |
| 500       | 192.168.1.8 | 3306     | SHUNNED | 0        | 0        | 1      | 6       | 12      | 966             | 0               | 326        | <-- crashed
| 500       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 2      | 0       | 246     | 19767           | 0               | 286        |
| 501       | 192.168.1.5 | 3306     | ONLINE  | 0        | 1        | 2      | 0       | 772203  | 37617973        | 2315131214      | 146        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 9        | 3        | 12     | 0       | 3439458 | 167514166       | 10138636314     | 129        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 12       | 13     | 0       | 3183822 | 155064971       | 9394612877      | 89         |
| 501       | 192.168.1.8 | 3306     | SHUNNED | 0        | 0        | 1      | 6       | 11429   | 560352          | 35350726        | 326        | <-- crashed
| 501       | 192.168.1.9 | 3306     | ONLINE  | 0        | 1        | 1      | 0       | 312253  | 15227786        | 941110520       | 286        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Immediately after, 

galera_check
 identifies the node is requesting the SST, and that the DONOR is our writer (given it is NOT the only writer in the HG, and it has the variable
wsrep_sst_donor_rejects_queries
 active), it cannot be set to OFFLINE_SOFT. We do not want ProxySQL to consider it OFFLINE_HARD (because it is not).

As such, the script moves it to a special HG:

2016/09/04 16:11:22.091:[WARN] Move node:192.168.1.5;3306;500;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9500 WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:11:22.097:[WARN] Move node:192.168.1.5;3306;501;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9501 WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+
| hostgroup_id | hostname    | port | status | weight     | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                          |
+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+
| 500          | 192.168.1.6 | 3306 | ONLINE | 1000000    | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 500          | 192.168.1.7 | 3306 | ONLINE | 100        | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 500          | 192.168.1.8 | 3306 | ONLINE | 1          | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 500          | 192.168.1.9 | 3306 | ONLINE | 1          | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 501          | 192.168.1.6 | 3306 | ONLINE | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 501          | 192.168.1.7 | 3306 | ONLINE | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 501          | 192.168.1.9 | 3306 | ONLINE | 1          | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 9500         | 192.168.1.5 | 3306 | ONLINE | 1000000000 | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG
| 9501         | 192.168.1.5 | 3306 | ONLINE | 100        | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG
+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+

The Donor continues to serve the Joiner, but applications won’t see it.

What applications see is also very important. Applications doing WRITEs will see:

[ 10s] threads: 10, tps: 9.50, reads: 94.50, writes: 42.00, response time: 1175.77ms (95%), errors: 0.00, reconnects: 0.00
...
[ 40s] threads: 10, tps: 2.80, reads: 26.10, writes: 11.60, response time: 3491.45ms (95%), errors: 0.00, reconnects: 0.10
[ 50s] threads: 10, tps: 4.80, reads: 50.40, writes: 22.40, response time: 10062.13ms (95%), errors: 0.80, reconnects: 351.60 <--- Main writer moved to another HG
[ 60s] threads: 10, tps: 5.90, reads: 53.10, writes: 23.60, response time: 2869.82ms (95%), errors: 0.00, reconnects: 0.00
...

When one node shifts to another, the applications will have to manage the RE-TRY, but it will only be a short time and will cause limited impact on the production flow.

Application readers see no errors:

[ 10s] threads: 10, tps: 0.00, reads: 13007.31, writes: 0.00, response time: 9.13ms (95%), errors: 0.00, reconnects: 0.00
[ 50s] threads: 10, tps: 0.00, reads: 9613.70, writes: 0.00, response time: 10.66ms (95%), errors: 0.00, reconnects: 0.20 <-- just a glitch in reconnect
[ 60s] threads: 10, tps: 0.00, reads: 10807.90, writes: 0.00, response time: 11.07ms (95%), errors: 0.00, reconnects: 0.20
[ 70s] threads: 10, tps: 0.00, reads: 9082.61, writes: 0.00, response time: 23.62ms (95%), errors: 0.00, reconnects: 0.00
...
[ 390s] threads: 10, tps: 0.00, reads: 13050.80, writes: 0.00, response time: 8.97ms (95%), errors: 0.00, reconnects: 0.00

When the Donor ends providing SST, it comes back and the script manages it. Then 

galera_check
 puts it in the right HG:

2016/09/04 16:12:34.266:[WARN] Move node:192.168.1.5;3306;9500;1010 SQL: UPDATE mysql_servers SET hostgroup_id=500 WHERE hostgroup_id=9500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:12:34.270:[WARN] Move node:192.168.1.5;3306;9501;1010 SQL: UPDATE mysql_servers SET hostgroup_id=501 WHERE hostgroup_id=9501 AND hostname='192.168.1.5' AND port='3306'

The crashed node is restarted by the SST process, and the node goes up. But if the level of load in the cluster is mid/high, it will remain in the JOINED state for sometime, becoming visible by the ProxySQL again. ProxySQL will not, however, correctly recognize the state.

2016-09-04 16:17:15 21035 [Note] WSREP: 3.2 (node4): State transfer from 1.1 (node1) complete.
2016-09-04 16:17:15 21035 [Note] WSREP: Shifting JOINER -> JOINED (TO: 254515)

To avoid this issue, the script will move it to a special HG, allowing it to recovery without interfering with a real load.

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 6        | 2        | 15     | 0       | 3000    | 241060          | 0               | 141        |
| 500       | 192.168.1.6 | 3306     | ONLINE | 1        | 9        | 13     | 0       | 13128   | 1055268         | 0               | 84         |
| 500       | 192.168.1.7 | 3306     | ONLINE | 1        | 0        | 4      | 0       | 3756    | 301874          | 0               | 106        |
| 500       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 2      | 0       | 4080    | 327872          | 0               | 278        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 2      | 0       | 256753  | 12508935        | 772048259       | 141        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 8        | 12     | 0       | 5116844 | 249191524       | 15100617833     | 84         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 2        | 11       | 13     | 0       | 4739756 | 230863200       | 13997231724     | 106        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 496524  | 24214563        | 1496482104      | 278        |
| 9500      | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 331        |<-- Joined not Sync
| 9501      | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 331        |<-- Joined not Sync
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Once the node fully recovers,

galera_check
 puts it back in the original HG, ready serve requests:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 15     | 0       | 3444    | 276758          | 0               | 130        |
| 500       | 192.168.1.6 | 3306     | ONLINE | 0        | 9        | 13     | 0       | 13200   | 1061056         | 0               | 158        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 0        | 0        | 4      | 0       | 3828    | 307662          | 0               | 139        |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |<-- up again
| 500       | 192.168.1.9 | 3306     | ONLINE | 0        | 0        | 2      | 0       | 4086    | 328355          | 0               | 336        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 2      | 0       | 286349  | 13951366        | 861638962       | 130        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 0        | 12       | 12     | 0       | 5239212 | 255148806       | 15460951262     | 158        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 0        | 13       | 13     | 0       | 4849970 | 236234446       | 14323937975     | 139        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |<-- up again
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 507910  | 24768898        | 1530841172      | 336        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

A summary of the logical steps is:

+---------+
                |  Crash  |
                +----+----+
                     |
                     v
            +--------+-------+
            |  ProxySQL      |
            |  shun crashed  |
            |      node      |
            +--------+-------+
                     |
                     |
                     v
   +-----------------+-----------------+
   |  Donor has one of the following?  |
   |  wsrep_sst_dono _rejects_queries  |
   |  OR                               |
   |  wsrep_reject_queries             |
   +-----------------------------------+
      |No                            |Yes
      v                              v
+-----+----------+       +-----------+----+
| Galera_check   |       | Galera_check   |
| put the donor  |       | put the donor  |
| in OFFLINE_SOFT|       | in special HG  |
+---+------------+       +-----------+----+
    |                                |
    |                                |
    v                                v
+---+--------------------------------+-----+
|            Donor SST ends                |
+---+---------------+----------------+-----+
    |               |                |
    |               |                |
+---+------------+  |    +-----------+----+
| Galera_check   |  |    | Galera_check   |
| put the donor  |  |    | put the donor  |
| ONLINE         |  |    | in Original HG |
+----------------+  |    +----------------+
                    |
                    |
+------------------------------------------+
|           crashed SST ends               |
+-------------------+----------------------+
                    |
                    |
       +------------+-------------+
       |  Crashed node back but   +<------------+
       |  Not Sync?               |             |
       +--------------------------+             |
          |No                   |Yes            |
          |                     |               |
          |                     |               |
+---------+------+       +------+---------+     |
| Galera_check   |       | Galera_check   |     |
| put the node   |       | put the node   +-----+
| back orig. HG  |       | Special HG     |
+--------+-------+       +----------------+
         |
         |
         |
         |      +---------+
         +------>   END   |
                +---------+

As mentioned,

galera_check
 can manage several node states.

Another case is when we can’t have the node accept ANY queries. We might need that for several reasons, including preparing the node for maintenance (or whatever).

In Percona XtraDB Cluster (and other Galera implementations) we can set the value of

wsrep_reject_queries
 to:

  • NONE
  • ALL
  • ALL_KILL

Let see how it works. Run some load, then on the main writer node (192.168.1.5):

set global wsrep_reject_queries=ALL;

This blocks any new query execution until the run is complete. Do a simple select on the node:

(root@localhost:pm) [test]>select * from tbtest1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

ProxySQL won’t see these conditions:

+-------------+------+------------------+----------------------+------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error |
+-------------+------+------------------+----------------------+------------+
| 192.168.1.5 | 3306 | 1473005467628001 | 35                   | NULL       | <--- ping ok
| 192.168.1.5 | 3306 | 1473005437628014 | 154                  | NULL       |
+-------------+------+------------------+----------------------+------------+
+-------------+------+------------------+-------------------------+---------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error |
+-------------+------+------------------+-------------------------+---------------+
| 192.168.1.5 | 3306 | 1473005467369575 | 246                     | NULL          | <--- connect ok
| 192.168.1.5 | 3306 | 1473005407369441 | 353                     | NULL          |
+-------------+------+------------------+-------------------------+---------------+

The script

galera_check
 will instead manage it:

+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | OFFLINE_SOFT | 0        | 0        | 8343   | 0       | 10821   | 240870          | 0               | 93         | <--- galera check put it OFFLINE
| 500       | 192.168.1.6 | 3306     | ONLINE       | 10       | 0        | 15     | 0       | 48012   | 3859402         | 0               | 38         | <--- writer
| 500       | 192.168.1.7 | 3306     | ONLINE       | 0        | 1        | 6      | 0       | 14712   | 1182364         | 0               | 54         |
| 500       | 192.168.1.8 | 3306     | ONLINE       | 0        | 1        | 2      | 0       | 1092    | 87758           | 0               | 602        |
| 500       | 192.168.1.9 | 3306     | ONLINE       | 0        | 1        | 4      | 0       | 5352    | 430152          | 0               | 238        |
| 501       | 192.168.1.5 | 3306     | OFFLINE_SOFT | 0        | 0        | 1410   | 0       | 197909  | 9638665         | 597013919       | 93         |
| 501       | 192.168.1.6 | 3306     | ONLINE       | 2        | 10       | 12     | 0       | 7822682 | 380980455       | 23208091727     | 38         |
| 501       | 192.168.1.7 | 3306     | ONLINE       | 0        | 13       | 13     | 0       | 7267507 | 353962618       | 21577881545     | 54         |
| 501       | 192.168.1.8 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 241641  | 11779770        | 738145270       | 602        |
| 501       | 192.168.1.9 | 3306     | ONLINE       | 1        | 0        | 1      | 0       | 756415  | 36880233        | 2290165636      | 238        |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

In this case, the script will put the node in OFFLINE_SOFT, given the

set global wsrep_reject_queries=ALL
 means do not accept NEW and complete the existing as OFFLINE_SOFT.

The script also manages the case of

set global wsrep_reject_queries=ALL_KILL;
. From the ProxySQL point of view, these do not exist either:

+-------------+------+------------------+----------------------+------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error |
+-------------+------+------------------+----------------------+------------+
| 192.168.1.5 | 3306 | 1473005827629069 | 59                   | NULL       |<--- ping ok
| 192.168.1.5 | 3306 | 1473005797628988 | 57                   | NULL       |
+-------------+------+------------------+----------------------+------------+
+-------------+------+------------------+-------------------------+---------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error |
+-------------+------+------------------+-------------------------+---------------+
| 192.168.1.5 | 3306 | 1473005827370084 | 370                     | NULL          | <--- connect ok
| 192.168.1.5 | 3306 | 1473005767369915 | 243                     | NULL          |
+-------------+------+------------------+-------------------------+---------------+
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 9500      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |<--- galera check put it in special HG
| 9501      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

The difference here is that the script moves the node to the special HG to isolate it, instead leaving it in the original HG.

The integration between ProxySQL and Percona XtraDB Custer (Galera) works perfectly for multi-writer if you have a script like

galera_check
 that correctly manages the different Percona XtraDB Custer/Galera states.

ProxySQL and PXC using Replication HostGroup

Sometimes we might need to have 100% of the write going to only one node at a time. As explained above, ProxySQL uses weight to redirect a % of the load to a specific node.

In most cases, it will be enough to set the weight in the main writer to a very high value (like 10 billion) and one thousand on the next node to almost achieve a single writer.

But this is not 100% effective, it still allows ProxySQL to send a query once every X times to the other node(s). To keep it consistent with the ProxySQL logic, the solution is to use replication Hostgroups.

Replication HGs are special HGs that the proxy sees as connected for R/W operations. ProxySQL analyzes the value of the READ_ONLY variables and assigns to the READ_ONLY HG the nodes that have it enabled.

The node having READ_ONLY=0 resides in both HGs. As such the first thing we need to modify is to tell ProxySQL that HG 500 and 501 are replication HGs.

INSERT INTO mysql_replication_hostgroups VALUES (500,501,'');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
select * from mysql_replication_hostgroups ;
+------------------+------------------+---------+
| writer_hostgroup | reader_hostgroup | comment |
+------------------+------------------+---------+
| 500              | 501              |         |
+------------------+------------------+---------+

Now whenever I set the value of READ_ONLY on a node, ProxySQL will move the node accordingly. Let see how. Current:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 6        | 1        | 7      | 0       | 16386    | 1317177         | 0               | 97         |
| 500       | 192.168.1.6 | 3306     | ONLINE | 1        | 9        | 15     | 0       | 73764    | 5929366         | 0               | 181        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 1        | 0        | 6      | 0       | 18012    | 1447598         | 0               | 64         |
| 500       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 2      | 0       | 1440     | 115728          | 0               | 341        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1210029  | 58927817        | 3706882671      | 97         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 1        | 11       | 12     | 0       | 16390790 | 798382865       | 49037691590     | 181        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 1        | 12       | 13     | 0       | 15357779 | 748038558       | 45950863867     | 64         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1247662  | 60752227        | 3808131279      | 341        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1766309  | 86046839        | 5374169120      | 422        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Set global READ_ONLY=1 on the following nodes: 192.168.1.6/7/8/9.

After:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 25980    | 2088346         | 0               | 93         |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1787979  | 87010074        | 5473781192      | 93         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 8        | 12     | 0       | 18815907 | 916547402       | 56379724890     | 79         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 1        | 12       | 13     | 0       | 17580636 | 856336023       | 52670114510     | 131        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 15324    | 746109          | 46760779        | 822        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 16210    | 789999          | 49940867        | 679        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

IF in this scenario a reader node crashes, the application will not suffer at all given the redundancy.

But if the writer is going to crash, THEN the issue exists because there will be NO node available to manage the failover. The solution is to either do the node election manually or to have the script elect the node with the lowest read weight in the same segment as the new writer.

Below is what happens when a node crashes (bird-eye view):

+---------+
                         |  Crash  |
                         +----+----+
                              |
                              v
                     +--------+-------+
                     |  ProxySQL      |
                     |  shun crashed  |
                     |      node      |
                     +--------+-------+
                              |
                              |
                              v
            +-----------------+-----------------+
+----------->   HostGroup has another active    |
|           |   Node in HG writer?              |
|           +--+--------------+---------------+-+
|              |              |               |
|              |              |               |
|              |No            |               |Yes
|              |              |               |
|        +-----v----------+   |   +-----------v----+
|        |ProxySQL will   |   |   |ProxySQL will   |
|        |stop serving    |   |   |redirect load   >--------+
|        |writes          |   |   |there           |        |
|        +----------------+   |   +----------------+        |
|                             |                             |
|                             v                             |
|                     +-------+--------+                    |
|                     |ProxySQL checks |                    |
|                     |READ_ONLY on    |                    |
|                     |Reader HG       |                    |
|                     |                |                    |
|                     +-------+--------+                    |
|                             |                             |
|                             v                             |
|                     +-------+--------+                    |
|                     |Any Node with   |                    |
|                     |READ_ONLY = 0 ? |                    |
|                     +----------------+                    |
|                      |No            |Yes                  |
|                      |              |                     |
|           +----------v------+    +--v--------------+      |
|           |ProxySQL will    |    |ProxySQL will    |      |
|           |continue to      |    |Move node to     |      |
+<---------<+do not serve     |    |Writer HG        |      |
|           |Writes           |    |                 |      |
|           +-----------------+    +--------v--------+      |
|                                           |               |
+-------------------------------------------+               |
                         +---------+                        |
                         |   END   <------------------------+
                         +---------+

The script should act immediately after the ProxySQL SHUNNED the node step, just replacing the READ_ONLY=1 with READ_ONLY=0 on the reader node with the lowest READ WEIGHT.

ProxySQL will do the rest, copying the node into the WRITER HG, keeping low weight, such that WHEN/IF the original node will comeback the new node will not compete for traffic.

Since it included that special function in the check, the feature allows automatic fail-over. This experimental feature is active only if explicitly set in the parameter that the scheduler passes to the script. To activate it add

--active_failover
 in the scheduler. My recommendation is to have two entries in the scheduler and activate the one with
--active_failover
 for testing, and remember to deactivate the other one.

Let see the manual procedure first:

The process is:

1 Generate some load
2 Kill the writer node
3 Manually elect a reader as writer
4 Recover crashed node

Current load:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 30324   | 2437438         | 0               | 153        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1519612 | 74006447        | 4734427711      | 153        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 8        | 12     | 0       | 7730857 | 376505014       | 24119645457     | 156        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 2        | 10       | 12     | 0       | 7038332 | 342888697       | 21985442619     | 178        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 612523  | 29835858        | 1903693835      | 337        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 611021  | 29769497        | 1903180139      | 366        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Kill the main node 192.168.1.5:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 11      | 1565987 | 76267703        | 4879938857      | 119        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 8023216 | 390742215       | 25033271548     | 112        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 7306838 | 355968373       | 22827016386     | 135        |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 638326  | 31096065        | 1984732176      | 410        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 636857  | 31025014        | 1982213114      | 328        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
+-------------+------+------------------+----------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error                                           |
+-------------+------+------------------+----------------------+------------------------------------------------------+
| 192.168.1.5 | 3306 | 1473070640798571 | 0                    | Can't connect to MySQL server on '192.168.1.5' (107) |
| 192.168.1.5 | 3306 | 1473070610798464 | 0                    | Can't connect to MySQL server on '192.168.1.5' (107) |
+-------------+------+------------------+----------------------+------------------------------------------------------+
+-------------+------+------------------+-------------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error                                        |
+-------------+------+------------------+-------------------------+------------------------------------------------------+
| 192.168.1.5 | 3306 | 1473070640779903 | 0                       | Can't connect to MySQL server on '192.168.1.5' (107) |
| 192.168.1.5 | 3306 | 1473070580779977 | 0                       | Can't connect to MySQL server on '192.168.1.5' (107) |
+-------------+------+------------------+-------------------------+------------------------------------------------------+

When the node is killed ProxySQL, shun it and report issues with the checks (connect and ping). During this time frame the application will experience issues if it is not designed to manage the retry and eventually a queue, and it will crash.

Sysbench reports the errors:

Writes

[  10s] threads: 10, tps: 6.70, reads: 68.50, writes: 30.00, response time: 1950.53ms (95%), errors: 0.00, reconnects:  0.00
...
[1090s] threads: 10, tps: 4.10, reads: 36.90, writes: 16.40, response time: 2226.45ms (95%), errors: 0.00, reconnects:  1.00  <-+ killing the node
[1100s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1110s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1120s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1130s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |-- Gap waiting for a node to become
[1140s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |   READ_ONLY=0
[1150s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1160s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1170s] threads: 10, tps: 4.70, reads: 51.30, writes: 22.80, response time: 80430.18ms (95%), errors: 0.00, reconnects:  0.00 <-+
[1180s] threads: 10, tps: 8.90, reads: 80.10, writes: 35.60, response time: 2068.39ms (95%), errors: 0.00, reconnects:  0.00
...
 [1750s] threads: 10, tps: 5.50, reads: 49.80, writes: 22.80, response time: 2266.80ms (95%), errors: 0.00, reconnects:  0.00 -- No additional errors

I decided to promote node 192.168.1.6 given the weight for readers was equal and as such no difference in this setup.

(root@localhost:pm) [(none)]>set global read_only=0;
Query OK, 0 rows affected (0.00 sec)

Checking ProxySQL:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE  | 10       | 0        | 10     | 0       | 1848    | 148532          | 0               | 40         |
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 72      | 1565987 | 76267703        | 4879938857      | 38         |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 2        | 10       | 12     | 0       | 8843069 | 430654903       | 27597990684     | 40         |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 8048826 | 392101994       | 25145582384     | 83         |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 725820  | 35371512        | 2259974847      | 227        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 723582  | 35265066        | 2254824754      | 290        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

As the READ_ONLY value is modified, ProxySQL moves it to the writer HG, and writes can take place again. At this point in time production activities are recovered.

Reads had just a minor glitch:

Reads

[  10s] threads: 10, tps: 0.00, reads: 20192.15, writes: 0.00, response time: 6.96ms (95%), errors: 0.00, reconnects:  0.00
...
[ 410s] threads: 10, tps: 0.00, reads: 16489.03, writes: 0.00, response time: 9.41ms (95%), errors: 0.00, reconnects:  2.50
...
[ 710s] threads: 10, tps: 0.00, reads: 18789.40, writes: 0.00, response time: 6.61ms (95%), errors: 0.00, reconnects:  0.00

The glitch happened when node 192.168.1.6 was copied over to HG 500, but with no interruptions or errors. At this point let us put back the crashed node, which comes back elect Node2 (192.168.1.6) as Donor.

This was a Percona XtraDB Cluster/Galera choice, and we have to accept and manage it.

Note that the other basic scripts put the node in OFFLINE_SOFT, given the node will become a DONOR.
Galera_check will recognize that Node2 (192.168.1.6) is the only active node in the segment for that specific HG (writer), while is not the only present for the READER HG.

As such it will put the node in OFFLINE_SOFT only for the READER HG, trying to reduce the load on the node, but it will keep it active in the WRITER HG, to prevent service interruption.

Restart the node and ask for a donor:

2016-09-05 12:21:43 8007 [Note] WSREP: Flow-control interval: [67, 67]
2016-09-05 12:21:45 8007 [Note] WSREP: Member 1.1 (node1) requested state transfer from '*any*'. Selected 0.1 (node2)(SYNCED) as donor.
2016-09-05 12:21:46 8007 [Note] WSREP: (ef248c1f, 'tcp://192.168.1.8:4567') turning message relay requesting off
2016-09-05 12:21:52 8007 [Note] WSREP: New cluster view: global state: 234bb6ed-527d-11e6-9971-e794f632b140:324329, view# 7: Primary, number of nodes: 5, my index: 3, protocol version 3

galera_check
  sets OFFLINE_SOFT 192.168.1.6 only for the READER HG, and ProxySQL uses the others to serve reads.

+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE       | 10       | 0        | 10     | 0       | 7746     | 622557          | 0               | 86         |
| 501       | 192.168.1.5 | 3306     | ONLINE       | 0        | 0        | 1      | 147     | 1565987  | 76267703        | 4879938857      | 38         |
| 501       | 192.168.1.6 | 3306     | OFFLINE_SOFT | 0        | 0        | 12     | 0       | 9668944  | 470878452       | 30181474498     | 86         | <-- Node offline
| 501       | 192.168.1.7 | 3306     | ONLINE       | 9        | 3        | 12     | 0       | 10932794 | 532558667       | 34170366564     | 62         |
| 501       | 192.168.1.8 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 816599   | 39804966        | 2545765089      | 229        |
| 501       | 192.168.1.9 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 814893   | 39724481        | 2541760230      | 248        |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

When the SST donor task is over, 

galera_check
 moves the 192.168.1.6 back ONLINE as expected. But at the same time, it moves the recovering node to the special HG to avoid to have it included in any activity until ready.

2016-09-05 12:22:36 27352 [Note] WSREP: 1.1 (node1): State transfer from 0.1 (node2) complete.
2016-09-05 12:22:36 27352 [Note] WSREP: Shifting JOINER -> JOINED (TO: 325062)

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 1554     | 124909          | 0               | 35         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 2        | 8        | 22     | 0       | 10341612 | 503637989       | 32286072739     | 35         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 3        | 9        | 12     | 0       | 12058701 | 587388598       | 37696717375     | 13         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 890102   | 43389051        | 2776691164      | 355        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 887994   | 43296865        | 2772702537      | 250        |
| 9500      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 57         | <-- Special HG for recover
| 9501      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 57         | <-- Special HG for recover
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Once finally the node is in SYNC with the group, it is put back online in the READER HG and in the writer HG:

2016-09-05 12:22:36 27352 [Note] WSREP: 1.1 (node1): State transfer from 0.1 (node2) complete.
2016-09-05 12:22:36 27352 [Note] WSREP: Shifting JOINER -> JOINED (TO: 325062)

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 0          | <-- Back on line
| 500       | 192.168.1.6 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 402      | 32317           | 0               | 68         |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 6285     | 305823          | 19592814        | 312        | <-- Back on line
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 6        | 22     | 0       | 10818694 | 526870710       | 33779586475     | 68         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 0        | 12       | 12     | 0       | 12492316 | 608504039       | 39056093665     | 26         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 942023   | 45924082        | 2940228050      | 617        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 939975   | 45834039        | 2935816783      | 309        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
+--------------+-------------+------+--------+------------+
| hostgroup_id | hostname    | port | status | weight     |
+--------------+-------------+------+--------+------------+
| 500          | 192.168.1.5 | 3306 | ONLINE | 100        |
| 500          | 192.168.1.6 | 3306 | ONLINE | 1000000000 |
| 501          | 192.168.1.5 | 3306 | ONLINE | 100        |
| 501          | 192.168.1.6 | 3306 | ONLINE | 1000000000 |
| 501          | 192.168.1.7 | 3306 | ONLINE | 1000000000 |
| 501          | 192.168.1.8 | 3306 | ONLINE | 1          |
| 501          | 192.168.1.9 | 3306 | ONLINE | 1          |
+--------------+-------------+------+--------+------------+

But given it is coming back with its READER WEIGHT, it will NOT compete with the previously elected WRITER.

The recovered node will stay on “hold” waiting for a DBA to act and eventually put it back, or be set as READ_ONLY and as such be fully removed from the WRITER HG.

Let see the automatic procedure now:

For the moment, we just stick to the MANUAL failover process. The process is:

  1. Generate some load
  2. Kill the writer node
  3. Script will do auto-failover
  4. Recover crashed node

Check our scheduler config:

+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+
| id | active | interval_ms | filename | arg1 | arg2 | arg3 | arg4 | arg5 | comment |
+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+
| 10 | 1 | 2000 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --active_failover --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | | <--- Active
| 20 | 0 | 1500 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | |
+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+

The active one is the one with auto-failover. Start load and check current load:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 952     | 76461           | 0               | 0          |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 53137   | 2587784         | 165811100       | 167        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 5        | 5        | 11     | 0       | 283496  | 13815077        | 891230826       | 109        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 3        | 7        | 10     | 0       | 503516  | 24519457        | 1576198138      | 151        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 21952   | 1068972         | 68554796        | 300        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 21314   | 1038593         | 67043935        | 289        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Kill the main node 192.168.1.5:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE  | 10       | 0        | 10     | 0       | 60      | 4826            | 0               | 0          |
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 11      | 177099  | 8626778         | 552221651       | 30         |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 3        | 7        | 11     | 0       | 956724  | 46601110        | 3002941482      | 49         |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 2        | 8        | 10     | 0       | 1115685 | 54342756        | 3497575125      | 42         |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 0        | 1        | 1      | 0       | 76289   | 3721419         | 240157393       | 308        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 75803   | 3686067         | 236382784       | 231        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

When the node is killed the node is SHUNNED, but this time the script already set the new node 192.168.1.6 to ONLINE. See script log:

2016/09/08 14:04:02.494:[INFO] END EXECUTION Total Time:102.347850799561
2016/09/08 14:04:04.478:[INFO] This Node Try to become a WRITER set READ_ONLY to 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] This Node NOW HAS READ_ONLY = 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] END EXECUTION Total Time:71.8140602111816

More importantly, let’s look at the application experience:

Writes

[  10s] threads: 10, tps: 9.40, reads: 93.60, writes: 41.60, response time: 1317.41ms (95%), errors: 0.00, reconnects:  0.00
[  20s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1350.96ms (95%), errors: 0.00, reconnects:  0.00
[  30s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1317.81ms (95%), errors: 0.00, reconnects:  0.00
[  40s] threads: 10, tps: 7.80, reads: 70.20, writes: 31.20, response time: 1407.51ms (95%), errors: 0.00, reconnects:  0.00
[  50s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 2259.35ms (95%), errors: 0.00, reconnects:  0.00
[  60s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 3275.78ms (95%), errors: 0.00, reconnects:  0.00
[  70s] threads: 10, tps: 5.70, reads: 60.30, writes: 26.80, response time: 1492.56ms (95%), errors: 0.00, reconnects:  1.00 <-- just a reconnect experience
[  80s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 7959.74ms (95%), errors: 0.00, reconnects:  0.00
[  90s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 2109.03ms (95%), errors: 0.00, reconnects:  0.00
[ 100s] threads: 10, tps: 6.40, reads: 57.60, writes: 25.60, response time: 1883.96ms (95%), errors: 0.00, reconnects:  0.00
[ 110s] threads: 10, tps: 5.60, reads: 50.40, writes: 22.40, response time: 2167.27ms (95%), errors: 0.00, reconnects:  0.00

With no errors and no huge delay, our application (managing to reconnect) had only a glitch, and had to reconnect.

Read had no errors or reconnects.

The connection errors were managed by ProxySQL, and given it found five in one second it SHUNNED the node. The

galera_script
 was able to promote a reader, and given it is a failover, no delay with retry loop. The whole thing was done in such brief time that application barely saw it.

Obviously, an application with thousands of connections/sec will experience larger impact, but the time-window will be very narrow. Once the failed node is ready to come back, either we choose to start it with READ_ONLY=1, and it will come back as the reader.
Or we will keep it as it is and it will come back as the writer.

No matter what, the script manages the case as it had done in the previous (manual) exercise.

Conclusions

ProxySQL and galera_check, when working together, are quite efficient in managing the cluster and its different scenarios. When using the single-writer mode, solving the manual part of the failover dramatically improves the efficiency in production state recovery performance — going from few minutes to seconds or less.

The multi-writer mode remains the preferred and most recommended way to use ProxySQL/Percona XtraDB Cluster given it performs failover without the need of additional scripts or extensions. It’s also the preferred method if a script is required to manage the integration with ProxySQL.

In both cases, the use of a script can identify the multiple states of Percona XtraDB Cluster and the mutable node scenario. It is a crucial part of the implementation, without which ProxySQL might not behave correctly.

by Marco Tusa at September 15, 2016 10:37 PM

Percona XtraDB Cluster 5.6.30-25.16.2 is now available (CVE-2016-6662 fix)

Percona XtraDB Cluster Reference Architecture

Percona XtraDB Cluster 5.6

Percona  announces the new release of Percona XtraDB Cluster 5.6 on September 15, 2016. Binaries are available from the downloads area or our software repositories.

Percona XtraDB Cluster 5.6.30-25.16.2 is now the current release, based on the following:

  • Percona Server 5.6.30-76.3
  • Galera Replication library 3.16
  • Codership wsrep API version 25

This release provides a fix for CVE-2016-6662. More information about this security issue can be found here.

Bug Fixed:

  • Due to security reasons ld_preload libraries can now only be loaded from the system directories (/usr/lib64, /usr/lib) and the MySQL installation base directory.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Hrvoje Matijakovic at September 15, 2016 01:53 PM

Jean-Jerome Schmidt

Planets9s - New ClusterControl pricing plans for managing MySQL, MongoDB & PostgreSQL

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

New ClusterControl pricing plans for managing MySQL, MongoDB & PostgreSQL

Whether you’re looking to manage standalone instances, need high availability or have 24/7 SLA requirements for your databases, we’ve got you covered. ClusterControl now comes with three enhanced subscription options for you to chose from: Standalone, Advanced and Enterprise. This is in addition to its Community Edition that you can use at no charge.

View the new pricing plans

Join us for our MySQL Query Tuning Part 2: Indexing and EXPLAIN webinar

Why is a given query slow, what does the execution plan look like, how will JOINs be processed, is the query using the correct indexes, or is it creating a temporary table?

You are welcome to sign up for our next webinar on September 27, where we’ll look at the EXPLAIN command and see how it can help us answer these questions. EXPLAIN is one of the most powerful tools at your disposal for understanding and troubleshooting troublesome database queries. We will also look into how to use database indexes to speed up queries.

Sign up for the webinar

Sign up for our #ClusterControl CrowdChat

If you haven’t checked this out yet, do take a look at our online community to interact with experts on how to best deploy and manage your databases. CrowdChat is a community platform that works across Facebook, Twitter, and LinkedIn to allow users to discuss a topic using a specific #hashtag. This crowdchat focuses on the hashtag #ClusterControl. So if you’re a DBA, architect, CTO, or a database novice, register to join and become part of the conversation!

Join the CrowdChat

Become a MongoDB DBA: Recovering your Data

A well-designed backup and restore strategy maximizes data availability and minimizes data loss, while considering the requirements of your business. How do you best restore a MongoDB backup? What are the considerations when restoring a replicaSet as opposed to a single node? This blog gives you an overview on how to restore your data for recovery purposes, as well as when seeding a new node in a replicaSet.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at September 15, 2016 01:49 PM

Daniël van Eeden

About Oracle MySQL and CVE-2016-6662

The issue

On 12 September 2016 (three days ago) a MySQL security vulnerability was announced. The CVE id is CVE-2016-6662.

There are 3 claims:
  1. By setting malloc-lib in the configuration file access to an OS root shell can be gained.
  2. By using the general log a configuration file can be written in any place which is writable for the OS mysql user.
  3. By using SELECT...INTO DUMPFILE... it is possible to elevate privileges from a database user with the FILE privilege to any database account including root.

How it is supposed to be used

  1. Find an SQL Injection in a website or otherwise gain access to a MySQL account.
  2. Now create a trigger file (requires FILE privilege)
  3. Now in the trigger or otherwise use SET GLOBAL general_log_file etc to create a my.cnf in the datadir with the correct privileges. Directly using SELECT...INTO DUMPFILE...won't work as that would result in the wrong permissions, which would cause mysqld/mysqld_safe to ignore that file.
  4. Now wait someone/something to restart MySQL (upgrade, daily cold backup, etc) and a shell will be available on a port number and IP address chosen by the attacker.

How it is fixed

The document claims "Official patches for the vulnerability are not available at this time for Oracle MySQL server. ", but that isn't true.

From the 5.7.15 release notes:
  • mysqld_safe attempted to read my.cnf in the data directory, although that is no longer a standard option file location. (Bug #24482156)
  • For mysqld_safe, the argument to --malloc-lib now must be one of the directories /usr/lib, /usr/lib64, /usr/lib/i386-linux-gnu, or /usr/lib/x86_64-linux-gnu. In addition, the --mysqld and --mysqld-version options can be used only on the command line and not in an option file. (Bug #24464380)
  • It was possible to write log files ending with .ini or .cnf that later could be parsed as option files. The general query log and slow query log can no longer be written to a file ending with .ini or .cnf. (Bug #24388753)
The last two items are also in the 5.6.33 release notes.

So 2 out of the 3 vulnerabilities are patched. So the obvious advice is to upgrade.

Further steps to take

But there are more things you can do to further secure your setup.

Check if your my.cnf file(s) are writable for the mysql user

/etc/my.cnf, /etc/mysql/my.cnf /etc/mysql/my.cnf.d/* should NOT be writable for the mysql user. Make sure these are owned by root and mode 644.

Put an empty my.cnf in your datadir and make sure it has the above mentioned privileges. The vulnerability document also mentions a .my.cnf in the datadir, so also make that an empty file.

Review accounts with the FILE privilege:

Run this query and drop accounts or revoke the file privilege from them if they don't really need it.
SELECT GRANTEE FROM INFORMATION_SCHEMA.USER_PRIVILEGES WHERE PRIVILEGE_TYPE='FILE';

Isolate services

Don't run all services on one machine. Isolate services from each other. So put the webserver and database server on separate (virtual) machines or containers.

Use a firewall. If the database suddenly starts to listen on a weird port the attacker should not be able to connect to it. This can be a host based firewall like iptables and a network device. Yes an IDS might be able to detect the network shell, but running an IDS/IPS needs serious amount of time and doesn't give any guarantees.

Prepare for the next vulnerability

This is not only for MySQL, but also for other parts of your stack (OS, webserver, etc).

Make sure the configuration is secured properly for each service. A helpful resource here are the benchmark documents from the Center for Internet Security.

by Daniël van Eeden (noreply@blogger.com) at September 15, 2016 07:04 AM

September 14, 2016

Peter Zaitsev

MySQL Default Configuration Changes between 5.6 and 5.7

MySQL Default Configuration Changes

MySQL Default Configuration ChangesIn this blog post, we’ll discuss the MySQL default configuration changes between 5.6 and 5.7.

MySQL 5.7 has added a variety of new features that might excite you. However, there are also changes in the current variables that you might have overlooked. MySQL 5.7 updated nearly 40 of the defaults from 5.6. Some of the changes could severely impact your server performance, while others might go unnoticed. I’m going to go over each of the changes and what they mean.

The change that can have the largest impact on your server is likely

sync_binlog
. My colleague, Roel Van de Paar, wrote about this impact in depth in another blog post, so I won’t go in much detail.
Sync_binlog
 controls how MySQL flushes the binlog to disk. The new value of 1 forces MySQL to write every transaction to disk prior to committing. Previously, MySQL did not force flushing the binlog, and trusted the OS to decide when to flush the binlog.

(https://www.percona.com/blog/2016/06/03/binary-logs-make-mysql-5-7-slower-than-5-6/)

Variables 5.6.29 5.7.11
sync_binlog 0 1

 

The performance schema variables stand out as unusual, as many have a default of -1. MySQL uses this notation to call out variables that are automatically adjusted. The only performance schema variable change that doesn’t adjust itself is  

performance_schema_max_file_classes
. This is the number of file instruments used for the performance schema. It’s unlikely you will ever need to alter it.

Variables 5.6.29 5.7.11
performance_schema_accounts_size 100 -1
performance_schema_hosts_size 100 -1
performance_schema_max_cond_instances 3504 -1
performance_schema_max_file_classes 50 80
performance_schema_max_file_instances 7693 -1
performance_schema_max_mutex_instances 15906 -1
performance_schema_max_rwlock_instances 9102 -1
performance_schema_max_socket_instances 322 -1
performance_schema_max_statement_classes 168 -1
performance_schema_max_table_handles 4000 -1
performance_schema_max_table_instances 12500 -1
performance_schema_max_thread_instances 402 -1
performance_schema_setup_actors_size 100 -1
performance_schema_setup_objects_size 100 -1
performance_schema_users_size 100 -1

 

The

optimizer_switch
, and
sql_mode
 variables have a variety of options that can each be enabled and cause a slightly different action to occur. MySQL 5.7 enables both variables for flags, increasing their sensitivity and security. These additions make the optimizer more efficient in determining how to correctly interpret your queries.

Three flags have been added to the

optimzer_switch
, all of which existed in MySQL 5.6 and were set as the default in MySQL 5.7 (with the intent to increase the optimizer’s efficiency):
duplicateweedout=on
,
condition_fanout_filter=on
, and
derived_merge=on
.
duplicateweedout
 is part of the optimizer’s semi join materialization strategy.
condition_fanout_filter
 controls the use of condition filtering, and
derived_merge controls
 the merging of derived tables, and views into the outer query block.

https://dev.mysql.com/doc/refman/5.7/en/switchable-optimizations.html

http://www.chriscalender.com/tag/condition_fanout_filter/

The additions to SQL mode do not affect performance directly, however they will improve the way you write queries (which can increase performance). Some notable changes include requiring all fields in a select … group by statement must either be aggregated using a function like SUM, or be in the group by clause. MySQL will not assume they should be grouped, and will raise an error if a field is missing.

Strict_trans_tables
 causes a different effect depending on if it used with a transactional table.

Statements are rolled back on transaction tables if there is an invalid or missing value in a data change statement. For tables that do not use a transactional engine, MySQL’s behavior depends on the row in which the invalid data occurs. If it is the first row, then the behavior matches that of a transactional engine. If not, then the invalid value is converted to the closest valid value, or the default value for the columns. A warning is generated, but the data is still inserted.

Variables 5.6.29 5.7.11
optimizer_switch index_merge=on
index_merge_union=on
index_merge_sort_union=on
index_merge_intersection=on
engine_condition_pushdown=on
index_condition_pushdown=on
mrr=on,mrr_cost_based=on
block_nested_loop=on
batched_key_access=off
materialization=on, semijoin=on
loosescan=on, firstmatch=on
subquery_materialization_cost_based=on
use_index_extensions=on
index_merge=on
index_merge_union=on
index_merge_sort_union=on
index_merge_intersection=on
engine_condition_pushdown=on
index_condition_pushdown=on
mrr=on
mrr_cost_based=on
block_nested_loop=on
batched_key_access=off
materialization=on
semijoin=on
loosescan=on
firstmatch=on
duplicateweedout=on
subquery_materialization_cost_based=on
use_index_extensions=on
condition_fanout_filter=on
derived_merge=on
sql_mode NO_ENGINE_SUBSTITUTION ONLY_FULL_GROUP_BY
STRICT_TRANS_TABLES
NO_ZERO_IN_DATE
NO_ZERO_DATE
ERROR_FOR_DIVISION_BY_ZERO
NO_AUTO_CREATE_USER
NO_ENGINE_SUBSTITUTION

 

There have been a couple of variable changes surrounding the binlog. MySQL 5.7 updated the

binlog_error_action
 so that if there is an error while writing to the binlog, the server aborts. These kind of incidents are rare, but cause a big impact to your application and replication when they occurs, as the server will not perform any further transactions until corrected.

The binlog default format was changed to ROW, instead of the previously used statement format. Statement writes less data to the logs. However there are many statements that cannot be replicated correctly, including “update … order by rand()”. These non-deterministic statements could result in different resultsets on the master and slave. The change to Row format writes more data to  the binlog, but the information is more accurate and ensures correct replication.

MySQL has begun to focus on replication using GTID’s instead of the traditional binlog position. When MySQL is started, or restarted, it must generate a list of the previously used GTIDs. If

binlog_gtid_simple_recovery
 is OFF, or FALSE, then the server starts with the newest binlog and iterates backwards through the binlog files searching for a
previous_gtids_log_event
. With it set to ON, or TRUE, then the server only reviews the newest and oldest binlog files and computes the used gtids.
Binlog_gtid_simple_recovery
  makes it much faster to identify the binlogs, especially if there are a large number of binary logs without GTID events. However, in specific cases it could cause
gtid_executed
 and
gtid_purged
 to be populated incorrectly. This should only happen when the newest binarly log was generated by MySQL5.7.5 or older, or if a SET GTID_PURGED statement was run on MySQL earlier than version 5.7.7.

Another replication-based variable updated in 5.7 is 

slave_net_timeout
. It is lowered to only 60 seconds. Previously the replication thread would not consider it’s connection to the master broken until the problem existed for at least an hour. This change informs you much sooner if there is a connectivity problem, and ensures replication does not fall behind significantly before informing you of an issue.

Variables 5.6.29 5.7.11
binlog_error_action IGNORE_ERROR ABORT_SERVER
binlog_format STATEMENT ROW
binlog_gtid_simple_recovery OFF ON
slave_net_timeout 3600 60

 

InnoDB buffer pool changes impact how long starting and stopping the server takes.

innodb_buffer_pool_dump_at_shutdown
 and
innodb_buffer_pool_load_at_startup
 are used together to prevent you from having to “warm up” the server. As the names suggest, this causes a buffer pool dump at shutdown and load at startup. Even though you might have a buffer pool of 100’s of gigabytes, you will not need to reserve the same amount of space on disk, as the data written is much smaller. The only things written to disk for this is the information necessary to locate the actual data, the tablespace and page IDs.

Variables 5.6.29 5.7.11
innodb_buffer_pool_dump_at_shutdown OFF ON
innodb_buffer_pool_load_at_startup OFF ON

 

MySQL now made some of the options implemented in InnoDB during 5.6 and earlier into its defaults. InnoDB’s checksum algorithm was updated from innodb to crc32, allowing you to benefit from the hardware acceleration recent Intel CPU’s have available.

The Barracuda file format has been available since 5.5, but had many improvements in 5.6. It is now the default in 5.7. The Barracuda format allows you to use the compressed and dynamic row formats. My colleague Alexey has written about the utilization of the compressed format and the results he saw when optimizing a server: https://www.percona.com/blog/2008/04/23/real-life-use-case-for-barracuda-innodb-file-format/

The

innodb_large_prefix
 defaults to “on”, and when combined with the Barracuda file format allows for creating larger index key prefixes, up to 3072 bytes. This allows larger text fields to benefit from an index. If this is set to “off”, or the row format is not either dynamic or compressed, any index prefix larger than 767 bytes gets silently be truncated. MySQL has introduced larger InnoDB page sizes (32k and 64k) in 5.7.6.

MySQL 5.7 increased the

innodb_log_buffer_size
 value as well. InnoDB uses the log buffer to log transactions prior to writing them to disk in the binary log. The increased size allows the log to flush to the disk less often, reducing IO, and allows larger transactions to fit in the log without having to write to disk before committing.

MySQL 5.7 moved InnoDB’s purge operations to a background thread in order to reduce the thread contention in MySQL 5.5.The latest version increases the default to four purge threads, but can be changed to have anywhere from 1 to 32 threads.

MySQL 5.7 now enables 

innodb_strict_mode
 by default, turning some of the warnings into errors. Syntax errors in create table, alter table, create index, and optimize table statements generate errors and force the user to correct them prior to running. It also enables a record size check, ensuring that insert or update statements will not fail due to the record being too large for the selected page size.

Variables 5.6.29 5.7.11
innodb_checksum_algorithm innodb crc32
innodb_file_format Antelope Barracuda
innodb_file_format_max Antelope Barracuda
innodb_large_prefix OFF ON
innodb_log_buffer_size 8388608 16777216
innodb_purge_threads 1 4
innodb_strict_mode OFF ON

 

MySQL has increased the number of times the optimizer dives into the index when evaluating equality ranges. If the optimizer needs to dive into the index more than the

eq_range_index_dive_limit
 , defaulted to 200 in MySQL 5.7, then it uses the existing index statistics. You can adjust this limit from 0, eliminating index dives, to 4294967295. This can have a significant impact to query performance since the table statistics are based on the cardinality of a random sample. This can cause the optimizer to estimate a much larger set of rows to review than it would with the index dives, changing the method the optimizer chooses to execute the query.

MySQL 5.7 deprecated

log_warnings
. The new preference is 
utilize log_error_verbosity
. By default this is set to 3, and logs errors, warnings, and notes to the error log. You can alter this to 1 (log errors only) or 2 (log errors and warnings). When consulting the error log, verbosity is often a good thing. However this increases the IO and disk space needed for the error log.

Variables 5.6.29 5.7.11
eq_range_index_dive_limit 10 200
log_warnings 1 2

 

There are many changes to the defaults in 5.7. But many of these options have existed for a long time and should be familiar to users. Many people used these variables, and they are the best method to push MySQL forward. Remember, however, you can still edit these variables, and configure them to ensure that your server works it’s best for your data.

by bradley mickel at September 14, 2016 10:26 PM

pmp-check-pt-table-checksum Percona Monitoring Plugin

pmp-check-pt-table-checksum

pmp-check-pt-table-checksumRecently, I worked on a customer case where the customer needed to monitor the checksum via Nagios monitoring. The pmp-check-pt-table-checksum plugin from Percona Monitoring Plugins for MySQL achieves this goal. I thought it was worth a blogpost.

pmp-check-pt-table-checksum
 alerts you when the pt-table-checksum tool from Percona Toolkit finds data drifts on a replication slave.
pmp-checksum-pt-table-checksum
 monitors data differences on the slave from the checksum table as per information in the last checksum performed by the
pt-table-checksum
 tool. By default, the plugin queries the percona.checksum table to fetch information about data discrepancies. You can override this behavior with the “-T” option. You can check the
pmp-check-pt-table-checksum
 documentation for details.

Let’s demonstrate checksum monitoring via Nagios. My setup contains a master with two slave(s) connected, as follows:

  • Host 10.0.3.131 is master.
  • Host 10.0.3.83 is slave1
  • Host 10.0.3.36 is slave2

I intentionally generated more data on the master so

pt-table-checksum
 can catch the differences on the slave(s). Here’s what it looks like:

mysql-master> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+------+
10 rows in set (0.00 sec)
mysql-slave1> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)
mysql-slave2> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)

As you can see, slave1 and slave2 are different from the master: the master has ten rows while the slave(s) have five rows each (table t1).

Then, I executed

pt-table-checksum
 from the master to check for data discrepancies:

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
 TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-25T04:57:10 0 1 10 1 0 0.018 test.t1
[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
Differences on slave1
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1
Differences on slave2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1

pt-table-checksum
 correctly identifies the differences for the test.t1 table on slave1 and slave2. Now, you can use the 
pmp-check-pt-table-checksum
  Percona checksum monitoring plugin. Let’s try to run it locally (via CLI) from the Nagios host.

[root@nagios]# pmp-check-pt-table-checksum -H slave1 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1
[root@nagios]# pmp-check-pt-table-checksum -H slave2 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1]

NOTE: The

checksum_user
 database user needs SELECT privileges on both the checksum table (Percona.checksums) and the slave(s) in order for SQL to alert for checksum differences on slave(s).

On the Nagios monitoring server, you need to add the 

pmp-check-pt-table-checksum
 command to the commands.cfg file:

define command{
command_name            pmp-check-pt-table-checksum
command_line            $USER1$/pmp-check-pt-table-checksum -H $HOSTADDRESS$ -c $ARG1$
        }

NOTE: I used “-c” option for

pmp-check-pt-table-checksum
, which raises a critical error instead of a warning.

And, on the existing hosts.cfg file (i.e., slave1.cfg and slave2.cfg), you need to add a monitoring command accordingly as below:

define service{
        use                             generic-service
        host_name                       slave1
        service_description             Checksum Status
        check_command                   pmp-check-pt-table-checksum!1
}

In this command “1” is an argument to command “-c $ARG1$” so

pmp-check-pt-table-checksum
will raise a critical error when one or more chunks on the slave(s) are different from the master.

Last but not least, restart the Nagios daemon on the monitoring host to make the change.

Below is how it looks like on the Nagios monitoring on the web:

pmp-check-pt-table-checksum
pmp-check-pt-table-checksum

I also think the “INTERVAL” option is useful:

-i INTERVAL     Interval over which to ensure pt-table-checksum was run, in days; default - not to check.

It makes sure that chunks are recent on the checksum table. Used the other way around, it checks on how old your chunks are. This option ensures the checksum cron executes at a defined number of days. Let’s say you have

pt-table-checksum
 cron running once per week. In that case, setting INTERVAL 14 or 21 alerts you if chunks are older then defined number of days (i.e., the INTERVAL number).

Conclusion:

Percona Monitoring plugins for MySQL are very useful and easy to embed in your centralize monitoring dashboard. You can schedule

pt-table-checksum
 via a cron job, and get reports regarding master/slave(s) data drifts (if any) from one global dashboard on the monitoring host. There are various plugins available from Percona, e.g. processlist plugin, replication delay plugin, etc. Along with that, Percona offers Cacti and Zabbix templates to graph various MySQL activities.

by Muhammad Irfan at September 14, 2016 08:52 PM

Daniël van Eeden

Visualizing the MySQL Bug Tide

On the MySQL Bugs website there are some tide stats available. These show rate of bug creation.

I've put them in a graph:
I made these with this IPython Notebook. There are more detailed graphs per version in the notebook.

Update: The version in the notebook now uses the same range for the Y axis and has a marker for the GA dates of each release.

by Daniël van Eeden (noreply@blogger.com) at September 14, 2016 07:08 PM

Jean-Jerome Schmidt

New ClusterControl subscriptions for managing MySQL, MongoDB and PostgreSQL

We’ve got your databases covered: check out our new pricing plans for ClusterControl, the single console to deploy, monitor and manage your entire database infrastructure.

Whether you’re looking to manage standalone instances, need high availability or have 24/7 SLA requirements for your databases, ClusterControl now comes with three enhanced options for you to chose from in addition to its Community Edition.

Standalone

Do you have standalone database servers to manage? Then this is the best plan for you. From real-time monitoring and performance advisors, to analyzing historical query data and making sure all your servers are backed up, ClusterControl Standalone has you covered.

Advanced

As our company name indicates, we’re all about achieving high availability. With ClusterControl Advanced, you can take the guesswork out of managing your high availability database setups - automate failover and recovery of your databases, add load balancers with read-write splits, add nodes or read replicas - all with a couple of clicks.

Enterprise

If you’re looking for all of the above in a 24/7 secure service environment, then look no further. From high-spec operational reports to role-based access control and SSL encryption, this is our most advanced plan aimed at mission-critical environments.

Here is a summary view of the new subscriptions:

Full features table & pricing plans Contact us

Note that ClusterControl can be downloaded for free and that each download includes an initial 30 day trial of ClusterControl Enterprise, so that you can test the full features set of our product. It then becomes ClusterControl Community, should you decide not to purchase a plan. With ClusterControl Community, you can deploy and monitor MySQL, MongoDB and PostgreSQL.

Happy Clustering!

by Severalnines at September 14, 2016 03:42 PM

Peter Zaitsev

Webinar Thursday Sept. 15: Identifying and Solving Database Performance Issues with PMM

PMM

PMMPlease join Roman Vynar, Lead Platform Engineer on Thursday, September 15, 2016 at 10 am PDT (UTC-7) for a webinar on Identifying and Solving Database Performance Issues with PMM.

Database performance is the key to high-performance applications. Gaining visibility into the database is the key to improving database performance. Percona’s Monitoring and Management (PMM) provides the insight you need into your database environment.

In this webinar, we will demonstrate how using PMM for query analytics, in combination with database and host performance metrics, can more efficiently drive tuning, issue management and application development. Using PMM can result in faster resolution times, more focused development and a more efficient IT team.

Register for the webinar here.

register-now

PMMRoman Vynar, Lead Platform Engineer
Roman is a Lead Platform Engineer at Percona. He joined the company to establish and develop the Remote DBA service from scratch. Over time, the growing service successfully expanded to Managed Services. Roman develops the monitoring tools, automated scripts, backup solution, notification and incident tracking web system and currently leading Percona Monitoring and Management project.

by Dave Avery at September 14, 2016 01:51 PM

Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database

E-Commerce Database

E-Commerce DatabaseThis blog post discusses how you can protect your e-commerce database from a high traffic disaster.

Databases power today’s e-commerce. Whether it’s listing items on your site, contacting your distributor for inventory, tracking shipments, payments, or customer data, your database must be up, running, tuned and available for your business to be successful.

There is no time that this is more important than high-volume traffic days. There are specific events that occur throughout the year (such as Black Friday, Cyber Monday, or Singles Day) that you know are going to put extra strain on your database environment. But these are the specific times that your database can’t go down – these are the days that can make or break your year!

So what can you do to guarantee that your database environment is up to the challenge of handling high traffic events? Are there ways of preparing for this type of traffic?

Yes, there are! In this blog post, we’ll look at some of the factors that can help prepare your database environment to handle large amounts of traffic.

Synchronous versus Asynchronous Applications

Before moving to strategies, we need to discuss the difference between synchronous and asynchronous applications.

In most web-based applications, user input starts a number of requests for resources. Once the server answers the requests, no communication stops until the next input. This type of communication between a client and server is called synchronous communication.

Restricted application updates limit synchronous communication. Even synchronous applications designed to automatically refresh application server information at regular intervals have consistent periods of delay between data refreshes. While usually such delays aren’t an issue, some applications (for example, stock-trading applications) rely on continuously updated information to provide their users optimum functionality and usability.

Web 2.0-based applications address this issue by using asynchronous communication. Asynchronous applications deliver continuously updated data to users. Asynchronous applications separate client requests from application updates, so multiple asynchronous communications between the client and server can occur simultaneously or in parallel.

The strategy you use to scale the two types of applications to meet growing user and traffic demands will differ.

Scaling a Synchronous/Latency-sensitive Application

When it comes to synchronous applications, you really have only one option for scaling performance: sharding. With sharding, the tables are divided and distributed across multiple servers, which reduces the total number of rows in each table. This consequently reduces index size, and generally improves search performance.

A shard can also be located on its own hardware, with different shards added to different machines. This database distribution over a large multiple of machines spreads the load out, also improving performance. Sharding allows you to scale read and write performance when latency is important.

Generally speaking, it is better to avoid synchronous applications when possible – they limit your scalability options.

Scaling an Asynchronous Application

When it comes to scaling asynchronous applications, we have many more options than with synchronous applications. You should try and use asynchronous applications whenever possible:

  • Secondary/Slave hosts. Replication can be used to add more hardware for read traffic. Replication usually employs a master/slave relationship between a designated “original” server and copies of the server. The master logs and then distributes the updates to the slaves. This setup allows you to distribute the read load across more than one machine.
  • Caching. Database caching (tables, data, and models – caching summaries of data) improves scalability by distributing the query workload from expensive (overhead-wise) backend processes to multiple cheaper ones. It allows more flexibility for data processing: for example premium user data can be cached, while regular user data isn’t.

    Caching also improves data availability by providing applications that don’t depend on backend services continued service. It also allows for improved data access speeds by localizing the data and avoiding roundtrip queries. There are some specific caching strategies you can use:

    • Pre-Emptive Caching. Ordinarily, an object gets cached the first time it is requested (or if cached data isn’t timely enough). Preemptive caching instead generates cached versions before an application requests them. Typically this is done by a cron process.
    • Hit/Miss Caching. A cache hit occurs when an application or software requests data. First, the central processing unit (CPU) looks for the data in its closest memory location, which is usually the primary cache. If the requested data is found in the cache, it is considered a cache hit. Cache miss occurs within cache memory access modes and methods. For each new request, the processor searched the primary cache to find that data. If the data is not found, it is considered a cache miss. A cache hit serves data more quickly, as the data can be retrieved by reading the cache memory. The cache hit also can be in disk caches where the requested data is stored and accessed by the first query. A cache miss slows down the overall process because after a cache miss, the central processing unit (CPU) will look for a higher level cache, such as random access memory (RAM) for that data. Further, a new entry is created and copied into cache before it can be accessed by the processor.
    • Client-side Caching. Client-side caching allows server data to be copied and cached on the client computer. Client side caching reduces load times by several factors
  • Queuing Updates. Queues are used to order queries (and other database functions) in a timely fashion. There are queues for asynchronously sending notifications like email and SMS in most websites. E-commerce sites have queues for storing, processing and dispatching orders. How your database handles queues can affect your performance:
    • Batching. Batch processing can be used for efficient bulk database updates and automated transaction processing, as opposed to interactive online transaction processing (OLTP) applications.
    • Fan-Out Updates. Fan-out duplicates data in the database. When data is duplicated it eliminates slow joins and increases read performance.

Efficient Usage of Data at Scale

As you scale up in terms of database workload, you need to be able to avoid bad queries or patterns from your applications.

  • Moving expensive queries out of the user request path. Even if your database server uses powerful hardware, its performance can be negatively affected by a handful of expensive queries. Even a single bad query can cause serious performance issues for your database. Make sure to use monitoring tools to track down the queries that are taking up the most resources.
  • Using caching to offload database traffic. Cache data away from the database using something like memcached. This is usually done at the application layer, and is highly effective.
  • Counters and In-Memory Stores. Use memory counters to monitor performance hits: pages/sec, faults/sec, available bytes, total server, target server memory, etc. Percona’s new in-memory storage engine for MongoDB also can help.
  • Connection Pooling. A connection pool made up of cached database connections, remembered so that the connections can be reused for future requests to the database. Connection pools can improve the performance of executing commands on a database.

Scaling Out (Horizontal) Tricks

Scaling horizontally means adding more nodes to a system, such as adding a new server to a database environment to a distributed software application. For example, scaling out from one Web server to three.

  • Pre-Sharding Data for Flexibility. Pre-sharding the database across the server instances allows you to have the entire environment resources available at the start of the event, rather than having to rebalance during peak event traffic.
  • Using “Kill Switches” to Control Traffic. The idea of a kill switch is a single point where you can stop the flow of data to a particular node. Strategically set up kill switches allow you to stop a destructive workload that begins to impact the entire environment.
  • Limiting Graph Structures. By limiting the size or complexity of graph structures in the database, you will simplify data lookups and data size.

Scaling with Hardware (Vertical Scaling)

Another option to handle the increased traffic load is adding more hardware to your environment: more servers, more CPUs, more memory, etc. This, of course, can be expensive. One option here is to pre-configure your testing environment to become part of the production environment if necessary. Another is to pre-configure more Database-as-a-Service (DaaS) instances for the event (if you are a using cloud-based services).

Whichever method, be sure you verify and test your extra servers and environment before your drop-dead date.

Testing Performance and Capacity

As always, in any situation where your environment is going to be stressed beyond usual limits, testing under real-world conditions is a key factor. This includes not only testing for raw traffic levels, but also the actual workloads that your database will experience, with the same volume and variety of requests.

Knowing Your Application and Questions to Ask at Development Time

Finally, it’s important that you understand what applications will be used and querying the database. This sort of common sense idea is often overlooked, especially when teams (such as the development team and the database/operations team) get siloed and don’t communicate.

Get to know who is developing the applications that are using the database, and how they are doing it. As an example, a while back I had the opportunity to speak with a team of developers, mostly to just understand what they were doing. In the process of whiteboarding the app with them, we discovered a simple query issue that – now that we were aware of it – took little effort to fix. These sorts of interactions, early in the process, can save a great deal of headache down the line.

Conclusion

There are many strategies that can help you prepare for high traffic events that will impact your database. I’ve covered a few here briefly. For an even more thorough look at e-commerce database strategies, attend my webinar “Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster” on Thursday, September 22, 2016 10:00 am Pacific Time.

Register here.

by Tim Vaillancourt at September 14, 2016 12:06 PM

MariaDB AB

Is Your MariaDB Version Affected by the Remote Root Code Execution Vulnerability CVE-2016-6662?

Rasmus Johansson

Over the last few days, there has been a lot of questions and discussion around a vulnerability referred to as MySQL Remote Root Code Execution / Privilege Escalation 0day with CVE code CVE-2016-6662. It’s a serious vulnerability and we encourage every MariaDB Server, MariaDB Enterprise and MariaDB Enterprise Cluster user to read the below update on the vulnerability and how it affects MariaDB products.

The vulnerability can be exploited by both local and remote users. Both an authenticated connection to or SQL injection in an affected version of MariaDB Server can be used to exploit the vulnerability. If successful, a library file could be loaded and executed with root privileges.

The corresponding bug about the vulnerability can be seen in MariaDB’s project tracking with bug number MDEV-10465, which was opened on July 31, 2016.

MariaDB Enterprise and Enterprise Cluster
The following versions of MariaDB Enterprise and Enterprise Cluster include the fix for the vulnerability:

  • 5.5.51 or later versions
  • 10.0.27 or later versions
  • 10.1.17 or later versions

MariaDB Server
All stable MariaDB versions (5.5, 10.0, 10.1) were fixed in August 2016 in the following versions:

  • 5.5.51, released on August 10, 2016
  • 10.0.27, released on August 25, 2016
  • 10.1.17, released on August 30, 2016

If you’re on any of the above versions (or later), rest assured, you’re protected against this vulnerability. If you happen to be testing an alpha version of MariaDB 10.2, please be aware that the fix will be available in version 10.2.2, which is expected to be released soon.

More details on the vulnerability

The vulnerability makes use of the mysqld_safe startup script.

However, if the database user being used has neither SUPER nor FILE privilege or if the user has FILE but –secure-file-priv is set to isolate the location of import and export operations, then the vulnerability is NOT exploitable. It is always a recommended configuration to not grant SUPER privileges and to avoid granting FILE privileges without using –secure-file-priv.

Users that have installed MariaDB Server 10.1.8 or later from RPM or DEB packages are NOT affected by the vulnerability. This is due to the fact that in version 10.1.8, MariaDB started using systemd instead of init to manage the MariaDB service. In this case the mysqld_safe startup script isn’t used.

For the complete report of the vulnerability, please refer to the advisory by Dawid Golunski (legalhackers.com) who discovered the vulnerability.

About the Author

Rasmus Johansson's picture

Rasmus has worked with MariaDB since 2010 and was appointed VP Engineering in 2013. As such, he takes overall responsibility for the architecting and development of MariaDB Server, MariaDB Galera Cluster and MariaDB Enterprise.

by Rasmus Johansson at September 14, 2016 12:00 AM

September 13, 2016

Peter Zaitsev

MySQL CDC, Streaming Binary Logs and Asynchronous Triggers

MySQL CDC

MySQL CDCIn this post, we’ll look at MySQL CDC, streaming binary logs and asynchronous triggers.

What is Change Data Capture and why do we need it?

Change Data Capture (CDC) tracks data changes (usually close to realtime). In MySQL, the easiest and probably most efficient way to track data changes is to use binary logs. However, other approaches exist. For example:

  • General log or Audit Log Plugin (which logs all queries, not just the changes)
  • MySQL triggers (not recommended, as it can slow down the application — more below)

One of the first implementations of CDC for MySQL was the FlexCDC project by Justin Swanhart. Nowadays, there are a lot of CDC implementations (see mysql-cdc-projects wiki for a long list).

CDC can be implemented for various tasks such as auditing, copying data to another system or processing (and reacting to) events. In this blog post, I will demonstrate how to use a CDC approach to stream MySQL binary logs, process events and save it (stream to) another MySQL instance (or MongoDB). In addition, I will show how to implement asynchronous triggers by streaming binary logs.

Streaming binary logs 

You can read binary logs using the mysqlbinlog utility, by adding “-vvv” (verbose option). mysqlbinlog can also show human readable version for the ROW based replication. For example:

# mysqlbinlog -vvv /var/lib/mysql/master.000001
BINLOG '
JxiqVxMBAAAALAAAAI7LegAAAHQAAAAAAAEABHRlc3QAAWEAAQMAAUTAFAY=
JxiqVx4BAAAAKAAAALbLegAAAHQAAAAAAAEAAgAB//5kAAAAedRLHg==
'/*!*/;
### INSERT INTO `test`.`a`
### SET
###   @1=100 /* INT meta=0 nullable=1 is_null=0 */
# at 8047542
#160809 17:51:35 server id 1  end_log_pos 8047573 CRC32 0x56b36ca5      Xid = 24453
COMMIT/*!*/;

Starting with MySQL 5.6, mysqlbinlog can also read the binary log events from a remote master (“fake” replication slave).

Reading binary logs is a great basis for CDC. However, there are still some challenges:

  1. ROW-based replication is probably the easiest way to get the RAW changes, otherwise we will have to parse SQL. At the same time, ROW-based replication binary logs don’t contain the table metadata, i.e. it does not record the field names, only field number (as in the example above “@1” is the first field in table “a”).
  2. We will need to somehow record and store the binary log positions so that the tool can be restarted at any time and proceed from the last position (like a MySQL replication slave).

Maxwell’s daemon (Maxwell = Mysql + Kafka), an application recently released by Zendesk, reads MySQL binlogs and writes row updates as JSON (it can write to Kafka, which is its primary goal, but can also write to stdout and can be extended for other purposes). Maxwell stores the metadata about MySQL tables and binary log events (and other metadata) inside MySQL, so it solves the potential issues from the above list.

Here is a quick demo of Maxwell:

Session 1 (Insert into MySQL):

mysql> insert into a (i) values (151);
Query OK, 1 row affected (0.00 sec)
mysql> update a set i = 300 limit 5;
Query OK, 5 rows affected (0.01 sec)
Rows matched: 5  Changed: 5  Warnings: 0

Session 2 (starting Maxwell):

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout
16:00:15,303 INFO  Maxwell - Maxwell is booting (StdoutProducer), starting at BinlogPosition[master.000001:15494460]
16:00:15,327 INFO  TransportImpl - connecting to host: 127.0.0.1, port: 3306
16:00:15,350 INFO  TransportImpl - connected to host: 127.0.0.1, port: 3306, context: AbstractTransport.Context[threadId=9,...
16:00:15,350 INFO  AuthenticatorImpl - start to login, user: maxwell, host: 127.0.0.1, port: 3306
16:00:15,354 INFO  AuthenticatorImpl - login successfully, user: maxwell, detail: OKPacket[packetMarker=0,affectedRows=0,insertId=0,serverStatus=2,warningCount=0,message=<null>]
16:00:15,533 INFO  MysqlSavedSchema - Restoring schema id 1 (last modified at BinlogPosition[master.000001:3921])
{"database":"test","table":"a","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"i":151}}
{"database":"test","table":"a","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"i":151}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"commit":true,"data":{"i":300},"old":{"i":150}}

As we can see in this example, Maxwell get the events from MySQL replication stream and outputs it into stdout (if we change the producer, it can save it to Apache Kafka).

Saving binlog events to MySQL document store or MongoDB

If we want to save the events to some other place we can use MongoDB or MySQL JSON fields and document store (as Maxwell will provide use with JSON documents). For a simple proof of concept, I’ve created nodeJS scripts to implement a CDC “pipleline”:

var mysqlx = require('mysqlx');
var mySession =
mysqlx.getSession({
    host: '10.0.0.2',
    port: 33060,
    dbUser: 'root',
    dbPassword: 'xxx'
});
process.on('SIGINT', function() {
    console.log("Caught interrupt signal. Exiting...");
    process.exit()
});
process.stdin.setEncoding('utf8');
process.stdin.on('readable', () => {
  var chunk = process.stdin.read();
  if(chunk != null) {
    process.stdout.write(`data: ${chunk}`);
    mySession.then(session => {
                    session.getSchema("mysqlcdc").getCollection("mysqlcdc")
                    .add(  JSON.parse(chunk)  ) .execute(function (row) {
                            // can log something here
                    }).catch(err => {
                            console.log(err);
                    })
                    .then( function (notices) {
                            console.log("Wrote to MySQL: " + JSON.stringify(notices))
                    });
    }).catch(function (err) {
                  console.log(err);
                  process.exit();
    });
  }
});
process.stdin.on('end', () => {
  process.stdout.write('end');
  process.stdin.resume();
});

And to run it we can use the pipeline:

./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR  | node ./maxwell_to_mysql.js

The same approach can be used to save the CDC events to MongoDB with mongoimport:

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR |mongoimport -d mysqlcdc -c mysqlcdc --host localhost:27017

Reacting to binary log events: asynchronous triggers

In the above example, we only recorded the binary log events. Now we can add “reactions”.

One of the practical applications is re-implementing MySQL triggers to something more performant. MySQL triggers are executed for each row, and are synchronous (the query will not return until the trigger event finishes). This was known to cause poor performance, and can significantly slow down bulk operations (i.e., “load data infile” or “insert into … values (…), (…)”). With triggers, MySQL will have to process the “bulk” operations row by row, killing the performance. In addition, when using statement-based replication, triggers on the slave can slow down the replication thread (it is much less relevant nowadays with ROW-based replication and potentially multithreaded slaves).

With the ability to read binary logs from MySQL (using Maxwell), we can process the events and re-implement triggers — now in asynchronous mode — without delaying MySQL operations. As Maxwell gives us a JSON document with the “new” and “old” values (with the default option binlog_row_image=FULL, MySQL records the previous values for updates and deletes) we can use it to create triggers.

Not all triggers can be easily re-implemented based on the binary logs. However, in my experience most of the triggers in MySQL are used for:

  • auditing (if you deleted a row, what was the previous value and/or who did and when)
  • enriching the existing table (i.e., update the field in the same table)

Here is a quick algorithm for how to re-implement the triggers with Maxwell:

  • Find the trigger table and trigger event text (SQL)
  • Create an app or a script to parse JSON for the trigger table
  • Create a new version of the SQL changing the NEW.<field> to “data.field” (from JSON) and OLD.<field> to “old.field” (from JSON)

For example, if I want to audit all deletes in the “transactions” table, I can do it with Maxwell and a simple Python script (do not use this in production, it is a very basic sample):

import json,sys
line = sys.stdin.readline()
while line:
    print line,
    obj=json.loads(line);
    if obj["type"] == "delete":
        print "INSERT INTO transactions_delete_log VALUES ('" + str(obj["data"]) + "', Now() )"
    line = sys.stdin.readline()

MySQL:

mysql> delete from transactions where user_id = 2;
Query OK, 1 row affected (0.00 sec)

Maxwell pipeline:

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR  | python trigger.py
{"database":"test","table":"transactions","type":"delete","ts":1472942384,"xid":214395,"commit":true,"data":{"id":2,"user_id":2,"value":2,"last_updated":"2016-09-03 22:39:31"}}
INSERT INTO transactions_delete_log VALUES ('{u'last_updated': u'2016-09-03 22:39:31', u'user_id': 2, u'id': 2, u'value': 2}', Now() )

Maxwell limitations

Maxwell was designed for MySQL 5.6 with ROW-based replication. Although it can work with MySQL 5.7, it does not support new MySQL 5.7 data types (i.e., JSON fields). Maxwell does not support GTID, and can’t failover based on GTID (it can parse events with GTID thou).

Conclusion

Streaming MySQL binary logs (for example with Maxwell application) can help to implement CDC for auditing and other purposes, and also implement asynchronous triggers (removing the MySQL level triggers can increase MySQL performance).

by Alexander Rubin at September 13, 2016 10:21 PM

ProxySQL and MHA Integration

MHA

MHAThis blog post discusses ProxySQL and MHA integration, and how they work together.

MHA (Master High Availability Manager and tools for MySQL) is almost fully integrated with the ProxySQL process. This means you can count on the MHA standard feature to manage failover, and ProxySQL to manage the traffic and shift from one server to another.

This is one of the main differences between MHA and VIP, and MHA and ProxySQL: with MHA/ProxySQL, there is no need to move IPs or re-define DNS.

The following is an example of an MHA configuration file for use with ProxySQL:

server default]
    user=mha
    password=mha
    ssh_user=root
    repl_password=replica
    manager_log=/tmp/mha.log
    manager_workdir=/tmp
    remote_workdir=/tmp
    master_binlog_dir=/opt/mysql_instances/mha1/logs
    client_bindir=/opt/mysql_templates/mysql-57/bin
    client_libdir=/opt/mysql_templates/mysql-57/lib
    master_ip_failover_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_failover
    master_ip_online_change_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_online_change
    log_level=debug
    [server1]
    hostname=mha1r
    ip=192.168.1.104
    candidate_master=1
    [server2]
    hostname=mha2r
    ip=192.168.1.107
    candidate_master=1
    [server3]
    hostname=mha3r
    ip=192.168.1.111
    candidate_master=1
    [server4]
    hostname=mha4r
    ip=192.168.1.109
    no_master=1

NOTE: Be sure to comment out the “FIX ME ” lines in the sample/scripts.

After that, just install MHA as you normally would.

In ProxySQL, be sure to have all MHA users and the servers set.

When using ProxySQL with standard replication, it’s important to set additional privileges for the ProxySQL monitor user. It must also have “Replication Client” set or it will fail to check the SLAVE LAG. The servers MUST have a defined value for the attribute

max_replication_lag
, or the check will be ignored.

As a reminder:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',600,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.111',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.109',601,3306,1000,10);
INSERT INTO mysql_replication_hostgroups VALUES (600,601);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
insert into mysql_query_rules (username,destination_hostgroup,active) values('mha_W',600,1);
insert into mysql_query_rules (username,destination_hostgroup,active) values('mha_R',601,1);
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('mha_RW',600,1,3,'^SELECT.*FOR UPDATE');
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('mha_RW',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_W','test',1,600,'test_mha');
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_R','test',1,601,'test_mha');
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_RW','test',1,600,'test_mha');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK

OK, now that all is ready,  let’s rock’n’roll!

Controlled fail-over

First of all, the masterha_manager should not be running or you will get an error.

Now let’s start some traffic:

Write
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all  run
Read only
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10  --mysql-ignore-errors=all run

Let it run for a bit, then check:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.104 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 551256  | 44307633        | 0               | 285        | <--- current Master
| 601       | 192.168.1.111 | 3306     | ONLINE | 5        | 3        | 11     | 0       | 1053685 | 52798199        | 4245883580      | 1133       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 3        | 5        | 10     | 0       | 1006880 | 50473746        | 4052079567      | 369        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1040524 | 52102581        | 4178965796      | 604        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 7        | 1        | 16     | 0       | 987548  | 49458526        | 3954722258      | 285        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Now perform the failover. To do this, instruct MHA to do a switch, and to set the OLD master as a new slave:

masterha_master_switch --master_state=alive --conf=/etc/mha.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=0

Check what happened:

[ 160s] threads: 10, tps: 354.50, reads: 3191.10, writes: 1418.50, response time: 48.96ms (95%), errors: 0.00, reconnects:  0.00
[ 170s] threads: 10, tps: 322.50, reads: 2901.98, writes: 1289.89, response time: 55.45ms (95%), errors: 0.00, reconnects:  0.00
[ 180s] threads: 10, tps: 304.60, reads: 2743.12, writes: 1219.91, response time: 58.09ms (95%), errors: 0.10, reconnects:  0.00 <--- moment of the switch
[ 190s] threads: 10, tps: 330.40, reads: 2973.40, writes: 1321.00, response time: 50.52ms (95%), errors: 0.00, reconnects:  0.00
[ 200s] threads: 10, tps: 304.20, reads: 2745.60, writes: 1217.60, response time: 58.40ms (95%), errors: 0.00, reconnects:  1.00
[ 210s] threads: 10, tps: 353.80, reads: 3183.80, writes: 1414.40, response time: 48.15ms (95%), errors: 0.00, reconnects:  0.00

Check ProxySQL:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.107 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 123457  | 9922280         | 0               | 658        | <--- new master
| 601       | 192.168.1.111 | 3306     | ONLINE | 2        | 6        | 14     | 0       | 1848302 | 91513537        | 7590137770      | 1044       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 5        | 3        | 12     | 0       | 1688789 | 83717258        | 6927354689      | 220        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1834415 | 90789405        | 7524861792      | 658        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 6        | 2        | 24     | 0       | 1667252 | 82509124        | 6789724589      | 265        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

In this case, the servers weren’t behind the master and switch happened quite fast.

We can see that the WRITE operations that normally are an issue, given the need to move around a VIP or change name resolution, had a limited hiccup.

Read operations were not affected, at all. Nice, eh?

Do you know how long it takes to do a switch under these conditions? real 0m2.710s yes 2.7 seconds.

This is more evidence that, most of the time, an MHA-based switch is caused by the need to redirect traffic from A to B using the network.

Crash fail-over

What happened if instead of an easy switch, we have to cover a real failover?

First of all, let’s start masterha_manager:

nohup masterha_manager --conf=/etc/mha.cnf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /tmp/mha.log 2>&1

Then let’s start a load again. Finally, go to the MySQL node that uses master xxx.xxx.xxx.107

ps aux|grep mysql
mysql    18755  0.0  0.0 113248  1608 pts/0    S    Aug28   0:00 /bin/sh /opt/mysql_templates/mysql-57/bin/mysqld_safe --defaults-file=/opt/mysql_instances/mha1/my.cnf
mysql    21975  3.2 30.4 4398248 941748 pts/0  Sl   Aug28  93:21 /opt/mysql_templates/mysql-57/bin/mysqld --defaults-file=/opt/mysql_instances/mha1/my.cnf --basedir=/opt/mysql_templates/mysql-57/ --datadir=/opt/mysql_instances/mha1/data --plugin-dir=/opt/mysql_templates/mysql-57//lib/plugin --log-error=/opt/mysql_instances/mha1/mysql-3306.err --open-files-limit=65536 --pid-file=/opt/mysql_instances/mha1/mysql.pid --socket=/opt/mysql_instances/mha1/mysql.sock --port=3306
And kill the MySQL process.
kill -9 21975 18755

As before, check what happened on the application side:

[  80s] threads: 4, tps: 213.20, reads: 1919.10, writes: 853.20, response time: 28.74ms (95%), errors: 0.00, reconnects:  0.00
[  90s] threads: 4, tps: 211.30, reads: 1901.80, writes: 844.70, response time: 28.63ms (95%), errors: 0.00, reconnects:  0.00
[ 100s] threads: 4, tps: 211.90, reads: 1906.40, writes: 847.90, response time: 28.60ms (95%), errors: 0.00, reconnects:  0.00
[ 110s] threads: 4, tps: 211.10, reads: 1903.10, writes: 845.30, response time: 29.27ms (95%), errors: 0.30, reconnects:  0.00 <-- issue starts
[ 120s] threads: 4, tps: 198.30, reads: 1785.10, writes: 792.40, response time: 28.43ms (95%), errors: 0.00, reconnects:  0.00
[ 130s] threads: 4, tps: 0.00, reads: 0.60, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.40         <-- total stop in write
[ 140s] threads: 4, tps: 173.80, reads: 1567.80, writes: 696.30, response time: 34.89ms (95%), errors: 0.40, reconnects:  0.00 <-- writes restart
[ 150s] threads: 4, tps: 195.20, reads: 1755.10, writes: 780.50, response time: 33.98ms (95%), errors: 0.00, reconnects:  0.00
[ 160s] threads: 4, tps: 196.90, reads: 1771.30, writes: 786.80, response time: 33.49ms (95%), errors: 0.00, reconnects:  0.00
[ 170s] threads: 4, tps: 193.70, reads: 1745.40, writes: 775.40, response time: 34.39ms (95%), errors: 0.00, reconnects:  0.00
[ 180s] threads: 4, tps: 191.60, reads: 1723.70, writes: 766.20, response time: 35.82ms (95%), errors: 0.00, reconnects:  0.00

So it takes ~10 seconds to perform failover.

To understand better, let see what happened in MHA-land:

Tue Aug 30 09:33:33 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 30 09:33:33 2016 - [info] Reading application default configuration from /etc/mha.cnf..
... Read conf and start
Tue Aug 30 09:33:47 2016 - [debug] Trying to get advisory lock..
Tue Aug 30 09:33:47 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
... Wait for errors
Tue Aug 30 09:34:47 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) <--- Error time
Tue Aug 30 09:34:56 2016 - [warning] Connection failed 4 time(s)..                                     <--- Finally MHA decide to do something
Tue Aug 30 09:34:56 2016 - [warning] Master is not reachable from health checker!
Tue Aug 30 09:34:56 2016 - [warning] Master mha2r(192.168.1.107:3306) is not reachable!
Tue Aug 30 09:34:56 2016 - [warning] SSH is reachable.
Tue Aug 30 09:34:58 2016 - [info] Master failover to mha1r(192.168.1.104:3306) completed successfully. <--- end of the failover

MHA sees the server failing at xx:47, but because of the retry and checks validation, it actually fully acknowledges the downtime at xx:56 (~8 seconds after).

To perform the whole failover, it only takes ~2 seconds (again). Because no movable IPs or DNSs were involved, the operations were fast. This is true when the servers have the binary-log there, but it’s a different story if MHA also has to manage and push data from the binarylog to MySQL.

As you can see, ProxySQL can also help reduce the timing for this scenario, totally skipping the network-related operations. These operations are the ones causing the most trouble in these cases.

by Marco Tusa at September 13, 2016 06:03 PM

MariaDB Foundation

MariaDB 5.5.52 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 5.5.52. See the release notes and changelog for details on this release. IMPORTANT: There was a security fix included in the 5.5.51 release of MariaDB. If you are running MariaDB 5.5.50 or lower, please upgrade to at least MariaDB 5.5.51 right away. See […]

The post MariaDB 5.5.52 now available appeared first on MariaDB.org.

by Daniel Bartholomew at September 13, 2016 04:11 PM

Jean-Jerome Schmidt

Sign up for Part 2 of the MySQL Query Tuning Webinar Trilogy: Indexing & EXPLAIN

When it comes to the query tuning, EXPLAIN is one the most important tools in the DBA’s arsenal. Why is a given query slow, what does the execution plan look like, how will JOINs be processed, is the query using the correct indexes, or is it creating a temporary table?

You can now sign up for the webinar, which takes place at the end of this month on September 27th. We’ll look at the EXPLAIN command and see how it can help us answer these questions.

We will also look into how to use database indexes to speed up queries. More specifically, we’ll cover the different index types such as B-Tree, Fulltext and Hash, deepdive into B-Tree indexes, and discuss the indexes for MyISAM vs. InnoDB tables as well as some gotchas.

MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive

September 27th

Sign up now

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

And if you’d like to be a step ahead, you can also already sign up for the third and last part of this trilogy: MySQL Query Tuning: Working with optimizer and SQL tuning on October 25th.

We look forward to seeing you there!

by Severalnines at September 13, 2016 01:59 PM

MariaDB Foundation

MariaDB Server versions and the Remote Root Code Execution Vulnerability CVE-2016-6662

During the recent days there has been quite a lot of questions and discussion around a vulnerability referred to as MySQL Remote Root Code Execution / Privilege Escalation 0day with CVE code CVE-2016-6662. It’s a serious vulnerability and we encourage every MariaDB Server user to read the below update on the vulnerability from a MariaDB […]

The post MariaDB Server versions and the Remote Root Code Execution Vulnerability CVE-2016-6662 appeared first on MariaDB.org.

by rasmus at September 13, 2016 12:38 PM

Peter Zaitsev

Percona Monitoring and Management (PMM) is now available

Percona Monitoring and Management

Percona Monitoring and ManagementPercona announces the availability of Percona Monitoring and Management (PMM), an open source software database monitoring and management tool. Completely open source and free to download and use, Percona Monitoring and Management provides point-in-time visibility and historical trending of database performance that enables DBAs and application developers to optimize the performance of their MySQL and MongoDB databases.

Percona Monitoring and Management combines several best-of-breed tools, including Grafana, Prometheus, and Consul, in a single, easy-to-manage virtual appliance, along with Percona-developed query analytics, administration, API, agent and exporter components. Percona Monitoring and Management monitors and provides actionable performance data for MySQL variants, including Oracle MySQL Community Edition, Oracle MySQL Enterprise Edition, Percona Server for MySQL, and MariaDB, as well as MongoDB variants, including MongoDB Community Edition, and Percona Server for MongoDB.

PMM is an on-premises solution that keeps all of your performance and query data inside the confines of your environment, with no requirement for any data to cross the Internet.

Percona Monitoring and Management Highlights:

  • Provides query and metric information that enables administrators to optimize database performance
  • Displays current queries and highlights potential query issues to enable faster issue resolution
  • Maps queries against metrics to help make informed decisions about crucial database resources: platform needs, system growth, team focus and the most important database activities.

PMM provides database maintenance teams with better visibility into database and query activity, in order to implement actionable strategies and issue resolution more quickly. More information allows you to concentrate efforts on the areas that yield the highest value.

Like prior versions, PMM is distributed through Docker Hub and is free to download. Full instructions for downloading and installing the server and client are available in the documentation.

A PMM demonstration is available at pmmdemo.percona.com. We have also implemented forums for PMM discussions.

There will also be a webinar with Percona’s Roman Vynar, Lead Platform Engineer on Thursday, September 15, 2016 at 10:00am PDT (UTC-7) about “Identifying and Solving Database Performance Issues with PMM.” Register here for the webinar to learn more about PMM.  Can’t attend the webinar we got you covered! Register anyways and we’ll send you the recording and slides even if you can’t attend the webinar.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

As always, thanks for your continued support of Percona!

by Bob Davis at September 13, 2016 10:00 AM

September 12, 2016

Peter Zaitsev

Is Your Database Affected by CVE-2016-6662?

CVE-2016-6662

CVE-2016-6662In this blog post, I will discuss the CVE-2016-6662 vulnerability, how to tell if it affects you, and how to prevent the vulnerability from affecting you if you have an older version of MySQL.

I’ll also list which MySQL versions include the vulnerability fixes.

As we announced in a previous post, there are certain scenarios in Percona Server (and MySQL) that can allow a remote root code execution (CVE-2016-6662).

Vulnerability approach

The website legalhackers.com contains the full, current explanation of the CVE-2016-6662 vulnerability.

To summarize, the methods used to gain

root
 privileges require multiple conditions:

  1. A remote (or even local) MySQL user that has
    FILE
     permissions (or
    SUPER
    , which encompasses all of them).
  2. Improper OS files/directories permissions around MySQL configuration files that allow the MySQL system user access to modify or create new configuration files.
  3. Several techniques alter the MySQL configuration to include loading a malicious shared library.
    The techniques currently described require
    FILE
     or
    SUPER
     privileges, but also include the currently undisclosed CVE-2016-6663 (which demonstrates how to alter the configuration without
    FILE
     privileges).
  4. Have that malicious shared library loaded when MySQL restarts, which includes the code that allows privilege escalation.

Fixed versions

MySQL fixes

MySQL seems to have already released versions that include the security fixes.

This is coming from the release notes in MySQL 5.6.33:

  • For mysqld_safe, the argument to --malloc-lib now must be one of the directories /usr/lib/usr/lib64/usr/lib/i386-linux-gnu, or /usr/lib/x86_64-linux-gnu. In addition, the --mysqld and --mysqld-version options can be used only on the command line and not in an option file. (Bug #24464380)
  • It was possible to write log files ending with .ini or .cnf that later could be parsed as option files. The general query log and slow query log can no longer be written to a file ending with .ini or .cnf. (Bug #24388753)
  • Privilege escalation was possible by exploiting the way REPAIR TABLE used temporary files. (Bug #24388746)

You aren’t affected if you use version 5.5.52, 5.6.33 or 5.7.15.

Release notes: 5.5.525.6.335.7.15

Percona Server

The way Percona increased security was by limiting which libraries are allowed to be loaded with 

LD_PRELOAD
 (including
--malloc-lib
), and limiting them to
/usr/lib/
/usr/lib64
 and the MySQL installation base directory.

This means only locations that are accessible by

root
 users can load shared libraries.

The following Percona Server versions have this fix:

We are working on releasing new Percona XtraDB Cluster versions as well.

Future Percona Server releases will include all fixes from MySQL.

MariaDB

MariaDB has fixed the issue in 5.5.5110.1.17 and 10.0.27

I have an older MySQL Version, what to do now?

It is possible to change the database configuration so that it isn’t affected anymore (without changing your MySQL versions and restarting your database). There are several options, each of them focusing on one of the conditions required for the vulnerability to work.

Patch
mysqld_safe
 Manually

Just before publishing this, a blogpost came out with another alternative on how to patch your server: https://www.psce.com/blog/2016/09/12/how-to-quickly-patch-mysql-server-against-cve-2016-6662/.

Database user permissions

One way to avoid the vulnerability is making sure no remote user has 

SUPER
 or
FILE
 privileges.

However, CVE-2016-6663 mentions there is a way to do this without any

FILE
 privileges (likely related to the
REPAIR TABLE
 issue mentioned in MySQL release notes).

Configuration files permissions

The vulnerability needs to be able to write to some MySQL configuration files. Prevent that and you are secure.

Make sure you configure permissions for various config files as follows:

  • MySQL reads configuration files from different paths, including from your
    datadir
    • Create an (empty)
      my.cnf
        and
      .my.cnf
       in the
      datadir
       (usually 
      /var/lib/mysql
      ) and make
      root
       the owner/group with
      0644
       permissions.
    • Other Locations to look into:
      /etc/my.cnf /etc/mysql/my.cnf /usr/etc/my.cnf ~/.my.cnf
        (
      mysqld --help --verbose
       shows you where
      mysqld
       will look)
  • This also includes 
    !includedir
     paths defined in your current configurations — make sure they are not writeable by the
    mysql
     user as well
  • No config files should be writeable by the
    mysql
     user (change ownership and permissions)

by Kenny Gryp at September 12, 2016 10:55 PM

Percona Live Europe featured talk with Ronald Bradford — Securing your MySQL/MariaDB data

Percona Live Europe featured talk

Percona Live Europe featured talkWelcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Ronald Bradford, Founder & CEO of EffectiveMySQL. His talk will be on Securing your MySQL/MariaDB data. This talk will answer questions like:

  • How do you apply the appropriate filesystem permissions?
  • How do you use TLS/SSL for connections, and are they good for replication?
  • Encrypting all your data at rest
  • How to monitor your database with the audit plugin

. . . and more. I had a chance to speak with Ronald and learn a bit more about database security:

PerconaGive me a brief history of yourself: how you got into database development, where you work, what you love about it?

Ronald: My first introduction to relational theory and databases was with the writings of C.J. Date and Michael Stonebraker while using the Ingres RDBMS in 1988. For 28 years, my industry experience in the database field has covered a number of relational and non-relational products, including MySQL – which I started using at my first startup in 1999. For the last 17 years, I have enjoyed contributing to the MySQL ecosystem in many ways. I’ve consulted with hundreds of organizations, both small and large, that rely on MySQL to deliver strategic value to their business customers. I have given over 130 presentations in the past ten years across six continents and published a number of books and blog articles from my experiences with MySQL and open source. I am also the organizer of the MySQL Meetup group in New York City.

My goals have always been to help educate the current generation of software engineers to appreciate, use and maximize the right product for the job. I always hope that MySQL is the right solution, but recommend other options when it is not.

I am presently looking for my next opportunity to help organizations develop a strategic and robust data infrastructure that ensures business continuity for growing needs – ensuring a reliable and consistent user experience.

Percona: Your talk is called “Securing your MySQL/MariaDB data.” Why is securing your database important, and what are the real-world ramifications for a database security breach?

Ronald: We secure the belongings in our home, we secure the passengers in our car, we secure the possessions we carry on us. Data is a valuable asset for many organizations, and for some it is the only asset of value for continued operation. Should such important business information not have the same value as people or possessions?

Within any industry, you want to be the disruptor and not the disrupted. The press coverage on any alleged or actual data breach generally leads to a loss of customer confidence. This in turn can directly affect your present and future business viability – enabling competitors to take advantage of the situation. Data security should be as important as data recovery and system performance. Today we hear about data breaches on a weekly basis – everything from government departments to large retail stores. We often do not hear of the data breaches that can occur with smaller organizations, who also have your information: your local medical provider, or a school or university that holds your personal information.

A data breach can be much more impactful than data loss. It can be harder to detect and assess the long-term impact of a security breach because there might be unauthorized access over a longer time period. Often there are insufficient audit trails and logs to validate the impact of any security breach. Inadequate access controls can also lead to unauthorized data access both internally and externally. Many organizations fail to manage risk by not providing a “least privileges required approach” for any access to valuable data by applications or staff.

Any recent real-world example highlights the potential of insufficient data security, and therefore the increased risk of your personal information being used illegally. What is your level of confidence about security when you register with a new service and then you receive an email with your login and password in clear text? If your password is not secure, your personal data is also not secure and now it’s almost impossible for your address, phone number and other information to be permanently removed from this insecure site.

Percona: Are there significant differences between security for on-premise and cloud-based databases? What are they?

Ronald: There should be no differences in protecting your data within MySQL regardless of where this is stored.  When using a cloud-based database there is the additional need to have a shared responsibility with your cloud provider ensuring their IaaS and provided services have adequate trust and verification. For example, you need to ensure that provisioned disk and memory is adequately zeroed after use, and also ensure that adequate separation exists between hosts and clients on dedicated equipment in a virtualized environment. While many providers state these security and compliance processes, there have been instances where data has not been adequately protected.

Just as you may trust an internal department with additional security in the physical and electronic access to the systems that hold your data, you should “trust but verify” your cloud provider’s capacity to protect your data and that these providers continue to assess risk regularly and respond appropriately.

Percona: What is changing in database security that keeps you awake at night? What things does the market need to address immediately?

Ronald: A discussion with a CTO recently indicated he was worried about how their infrastructure would support high availability: what is the impact of any outage, and how does the organization know if he is prepared enough? Many companies, regardless of their size, are not prepared for either a lack of availability or a security breach.

The recent Delta is an example of an availability outage that cost the company many millions of dollars. Data security should be considered with the exact same concern, however it is often the poor cousin to availability. Disaster recovery is a commonly used term for addressing the potential loss of access to data, but there is not a well-known term or common processes for addressing data protection.

You monitor the performance of your system for increased load and for slow queries. When did you last monitor the volume of access to secure data to look for unexpected patterns or anomalies? A data breach can be a single SQL statement that is not an expected application traffic pattern. How can you protect your data in this situation? We ask developers to write unit tests to improve code coverage. Does your organization ask developers to write tests to perform SQL injection, or write SQL statements that should not be acceptable to manipulate data and are therefore correctly identified, alerted and actioned? Many organizations run load and volume testing regularly, but few organizations run security drills as regularly.

As organizations continue to address the growing data needs in the digital age, ongoing education and awareness are very important. There is often very little information in the MySQL ecosystem about validating data security, determining what is applicable security monitoring, and what is the validation and verification of authorized and unauthorized data access. What also needs to be addressed is the use (and abuse) of available security in current and prior MySQL versions. The key advancements in MySQL 5.6 and MySQL 5.7, combined with a lack of a migration path for organizations, is a sign that ongoing security improvements are not considered as important as other features.

Percona: What are looking forward to the most at Percona Live Europe this year?

Ronald: Percona Live Europe is a chance for all attendees, including myself, to see, hear and share in the wide industry use of MySQL today (and the possibilities tomorrow).

With eight sessions per time slot, I often wish for the ability to be in multiple places at  once! Of particular interest to myself are new features that drive innovation of the product, such as MySQL group replication.

I am also following efforts related to deploying your application stack in containers using Docker. Solving the state and persistence needs of a database is very different to providing application micro-services. I hope to get a better appreciation for finding a balance between the use of containers, VMs and dedicated hardware in a MySQL stack that promotes accelerated development, performance, business continuity and security.

You can read more about Ronald and his thoughts on database security at ronaldbradford.com.

Want to find out more about Ronald, MySQL/MariaDB and security? Register for Percona Live Europe 2016, and come see his talk Securing your MySQL/MariaDB data.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

by Dave Avery at September 12, 2016 03:59 PM

Percona Server Critical Update CVE-2016-6662

CVE-2016-6662

CVE-2016-6662This blog is an announcement for a Percona Server update with regards to CVE-2016-6662.

We have added a fix for CVE-2016-6662 in the following releases:

From seclist.org:

An independent research has revealed multiple severe MySQL vulnerabilities. This advisory focuses on a critical vulnerability with a CVEID of CVE-2016-6662. The vulnerability affects MySQL servers in all version branches (5.7, 5.6, and 5.5) including the latest versions, and could be exploited by both local and remote attackers.

Both the authenticated access to MySQL database (via network connection or web interfaces such as phpMyAdmin) and SQL Injection could be used as exploitation vectors. Successful exploitation could allow attackers to execute arbitrary code with root privileges which would then allow them to fully compromise the server on which an affected version of MySQL is running.

This is a CRITICAL update, and the fix mitigates the potential for remote root code execution.

We encourage our users to update to the latest version of their particular fork as soon as possible, ensuring they have appropriate change management procedures in place beforehand so they can test the update before placing it into production.

Percona would like to thank Dawid Golunski of http://legalhackers.com/ for disclosing this vulnerability in the MySQL software, and working with us to resolve this problem.

by David Busby at September 12, 2016 02:06 PM

MariaDB AB

Creating a MariaDB MaxScale Router Module

Anders Karlsson

I wanted to do some tests with MariaDB MaxScale and realized that the two existing routers (beyond the binlog router that is, which is a bit special) didn't do what I wanted them to do. What I was looking for was a simple round-robin feature and neither the readconnroute nor readwritesplit could be configured to do this. They are just too smart for my simple experiment.

Why would you want a round-robin router? Well, one use case is when you are INSERTing a lot of data and you just want to persist it. You don't have the use case where you have to SELECT data from all servers, but in the case you need it, you just select from all servers until you find what you need. Let's think about log data that you don't care much about but that you for some reason need to retain, maybe for corporate policy reasons or legal reasons. Using round-robin could, in theory, give you better performance, but that would require something way smarter than what I am proposing here. Rather, you get INSERT availability, i.e. you will always have some server to insert into and secondly, you get INSERT sharding, which is basic but useful, you only store so much data on each server.

So, let's get to work. To begin with you need the MaxScale source tree, place yourself in some directory where you want this and do this:

$ git clone https://github.com/mariadb-corporation/MaxScale.git

Now you should have a directory called MaxScale, so pop in there, create a build directory and then run cmake to configure MaxScale itself: 

$ cd MaxScale
$ mkdir build
$ cd build
$ cmake ..
$ make

These are the quick instructions and you will probably find that you lack some dependencies. The full instructions for how to do this is available as part of the sample code as presented later in this document and that is available from Sourceforge. Browse to https://sourceforge.net/projects/mxsroundrobin/files and then click on roundrobin 1.0 where you will find a pdf with detailed instructions. Also there is a tgz there will all the sourcecode presented late in this blog.

So, now we have something to work with and the plan is to introduce a new router module in this tree. To begin with pop over to where the routers module code is and create a directory for our code there:

$ cd ../server/modules/routing
$ mkdir roundrobin
$ cd roundrobin

Before we can start building some code, let's look at the basics of what kind of code gets into a module.

A plugin is a shared object that is loaded by MaxScale core when it starts. Early on when MaxScale starts it reads the configuration file, /etc/maxscale.cnf by default, and in there each service defines a router. Note that several services can use the same router so the code we write later has to take this into account. Look at this extract of a service section for example:

[Read-Write Service]
type=service
router=readwritesplit

The router here tells MaxScale to look for a readwritesplit module, or in technical terms, it will load the shared library: libreadwritesplit.so. After loading this library successfully, MaxScale has to figure out a few things about this module, like its name and version and above all, the entry points for the functions that MaxScale will call when processing a connection. In addition, we need to define a few structs that are passed around these different calls to give the different router functions some context. Let's start with a header file rooundrobin.h in the roundrobin directory:

#ifndef ROUNDROBIN_H
#define ROUNDROBIN_H
#include <server.h>

typedef struct tagROUNDROBININST *PROUNDROBININST;

typedef struct tagROUNDROBIN_CLIENT_SES {
  SPINLOCK lock;
  bool bClosed;
  SERVER **pBackends;
  SESSION *pSession;
  DCB **pdcbClients;
  unsigned int nBackends;
  unsigned int nCurrBackend;
  PROUNDROBININST pRouter;
  struct tagROUNDROBIN_CLIENT_SES *pNext;
} ROUNDROBIN_CLIENT_SES, *PROUNDROBIN_CLIENT_SES;

typedef struct tagROUNDROBININST {
  SERVICE *pService;
  PROUNDROBIN_CLIENT_SES pConnections;
  SPINLOCK lock;
  SERVER **pBackends;
  unsigned int nBackends;
  struct tagROUNDROBININST *pNext;
} ROUNDROBININST;
#endif

As you can see, the main thing here is that I define and typedef two structs. As I said, I have mostly been looking at other existing routers and grabbed the stuff in there, so I can't explain all aspects of these structs, but let's look at a few members:

These structs are in a linked list and the pNext member is a pointer to the next element in this list.

The lock members is a reference to a spinlock associated with the struct.

The pBackends member is a pointer to an array of pointers to the database SERVERS that this service is attached to.

The pbcdClients member is an array of pointers to DCDs. A DCB is the Descriptor Control Block which is a generic descriptor of a connection inside MaxScale, be it a server or a client. In this case this is the DCBs to the SERVERs in pBackends.

The nBackends is the number of elements in the pBackends and pdcbClientsarrays.

The pRouter member is a pointer to the ROUNDROBININST for the connection.

That is the most of that, the next step now is to start with the more exiting stuff of the actual code that make up this module. The main source file we work with here is roundrobin.c and we need a few basics in this. Let's have a look the beginning of roundrobin.c:

#include <my_config.h>
#include <router.h>
#include <query_classifier.h>
#include <mysql_client_server_protocol.h=>
#include "roundrobin.h"

/* Macros. */
#define ROUNDROBIN_VERSION "1.0.0"

/* Globals. */
MODULE_INFO info = {
  MODULE_API_ROUTER,
  MODULE_GA,
  ROUTER_VERSION,
  "A simple roundrobin router"
};
static PROUNDROBININST pInstances;

/* Function prototypes for API. */
static ROUTER *CreateInstance(SERVICE *service, char **options);
static void *CreateSession(ROUTER *pInstance, SESSION *session);
static void CloseSession(ROUTER *pInstance, void *session);
static void FreeSession(ROUTER *pInstance, void *session);
static int RouteQuery(ROUTER *pInstance, void *session, GWBUF *queue);
static void Diagnostic(ROUTER *pInstance, DCB *dcb);
static void ClientReply(ROUTER *pInstance, void *router_session,
  GWBUF *queue, DCB *backend_dcb);
static void HandleError(ROUTER *pInstance, void *router_session,
  GWBUF *errmsgbuf, DCB *backend_dcb, error_action_t action,
  bool *succp);
static int GetCapabilities();

static ROUTER_OBJECT RoundRobinRouter = {
  CreateInstance,
  CreateSession,
  CloseSession,
  FreeSession,
  RouteQuery,
  Diagnostic,
  ClientReply,
  HandleError,
  GetCapabilities
};

Let's now look at what is going on here. To begin with, I include a few necessary files, including roundrobin.h that we created above and then a macro is defined. Then the MODULE_INFO struct follows. The information in this is used by MaxScale to get information on the router, but if you leave this out, currently MaxScale will start anyway. The command shows modules in maxadmin will return the information in this struct for the module.

Then follows a number of function prototypes, and these are needed here before the ROUTER_OBJECT struct, and this is the key to the router as it provides the entry points for MariaDB itself. Again, I will not specify exactly what all of these do, I have mostly just grabbed code from other routers.

Following this, we need some basic functions that all routers implement, to initialize the module, get the version and a function to return the ROUTER OBJECT defined above:

/*
 * Function: ModuleInit()
 * Initialize the Round Robin router module.
 */
void ModuleInit()
   {
   MXS_NOTICE("Initialise roundrobin router module version " ROUNDROBIN_VERSION ".");
   pInstances = NULL;
   } /* End of ModuleInit(). */


/*
 * Function: version()
 * Get the version of the roundrobin router
 */
char *version()
   {
   return ROUNDROBIN_VERSION;
   } /* End if version(). */


/*
 * Function: GetModuleObject()
 * Get the object that describes this module.
 */
ROUTER_OBJECT *GetModuleObject()
   {
   return &RoundRobinRouter;
   } /* End of GetModuleObject(). */

With that we have completed the housekeeping code and are ready to look at the functions that implement the actual functionality. We'll look at CreateInstance first which, as the name implies, creates an snstance of RoundRobin. Note that within a running MaxScale, there might well be more than one instance, one for each RoundRobin service.

/*
 * Function: CreateInstance()
 * Create an instance of RoundRobing router.
 */
ROUTER *CreateInstance(SERVICE *pService, char **pOpts)
   {
   PROUNDROBININST pRet;
   PROUNDROBININST pTmp;
   SERVER_REF *pSvcRef;
   unsigned int i;

   MXS_NOTICE("Creating roundrobin router instance.");
/* Allocate the RoundRobin instance struct. */
   if((pRet = malloc(sizeof(ROUNDROBININST))) == NULL)
      return NULL;
   pRet->pService = pService;
   pRet->pConnections = NULL;
   pRet->pNext = NULL;
   pRet->nBackends = 0;

/* Count the number of backend servers we manage. */
   for(pSvcRef = pService->dbref; pSvcRef != NULL; pSvcRef = pSvcRef->next)
      pRet->nBackends++;

/* Allocate space for the backend servers and add to the instance struct. */
   if((pRet->pBackends = calloc(pRet->nBackends, sizeof(SERVER *))) == NULL)
      {
      free(pRet);
      return NULL;
      }

   spinlock_init(&pRet->lock);

/* Set up list of servers. */
   for(i = 0, pSvcRef = pService->dbref; pSvcRef != NULL; i++, pSvcRef = pSvcRef->next)
      pRet->pBackends[i] = pSvcRef->server;

/* Set up instance in list. */
   if(pInstances == NULL)
      pInstances = pRet;
   else
      {
      for(pTmp = pInstances; pTmp->pNext != NULL; pTmp = pTmp->pNext)
         ;
      pTmp->pNext = pRet;
      }

   MXS_NOTICE("Created roundrobin router instance.");
   return (ROUTER *) pRet;
   } /* End of CreateInstance(). */

Again, nothing really exiting is happening, I create a struct that defines the instance, initialize it and add it to the linked list of instances that I maintain. Also I get references to the backend servers that this instance uses and set up the array for it and I also initialize the spinlock. With that, we are done. Then there is the issue of creating a session and this function gets called when a client connects to MaxScale through the port that is linked to RoundRobin.

/*
 * Function: CreateSession()
 * Create a session in the RoundRobin router.
 */
void *CreateSession(ROUTER *pInstance, SESSION *session)
   {
   PROUNDROBIN_CLIENT_SES pRet;
   PROUNDROBIN_CLIENT_SES pTmp;
   PROUNDROBININST pRoundRobinInst = (PROUNDROBININST) pInstance;
   unsigned int i;

/* Allocating session struct. */
   if((pRet = malloc(sizeof(ROUNDROBIN_CLIENT_SES))) == NULL)
      return NULL;
   spinlock_init(&pRet->lock);
   pRet->pNext = NULL;
   pRet->nCurrBackend = 0;
   pRet->pSession = session;
   pRet->pRouter = pRoundRobinInst;
   pRet->nBackends = pRoundRobinInst->nBackends;

/* Allocating backends and DCBs. */
   if((pRet->pBackends = calloc(pRet->nBackends, sizeof(SERVER *))) == NULL)
      {
      free(pRet);
      return NULL;
      }
   if((pRet->pdcbClients = calloc(pRet->nBackends, sizeof(DCB *))) == NULL)
      {
      free(pRet->pBackends);
      free(pRet);
      return NULL;
      }

/* Set servers and DCBs. */
   for(i = 0; i < pRet->nBackends; i++)
      {
      pRet->pBackends[i] = pRoundRobinInst->pBackends[i];
      pRet->pdcbClients[i] = NULL;
      }

/* Place connecting last in list of connections in instance. */
   spinlock_acquire(&pRoundRobinInst->lock);
   if(pRoundRobinInst->pConnections == NULL)
      pRoundRobinInst->pConnections = pRet;
   else
      {
      for(pTmp = pRoundRobinInst->pConnections; pTmp->pNext != NULL; pTmp = pTmp->pNext)
         ;
      pTmp->pNext = pRet;
      }
   spinlock_release(&pRoundRobinInst->lock);

   return (void *) pRet;
   } /* End of CreateSession(). */

This is also pretty basic stuff, the server pointers are copied from the instance (do I need to do this you ask? Answer is, I don't know but I do know that what I do here works). I also clear the DCB pointers, these are created on an as-needed base later in the code.

Following this are a couple of basic housekeeping functions that I am not showing here. Actually, I'm just going to show one more function, which is RouteQuery. This is, as the name implies, the function that gets called to do what we are actually writing this code for - routing queries. Before I show that code, I have to explain that this is very simplistic code. To being with, it doesn't implement "session commands", these are commands that really should be run on all backends, like setting the current database, handling transactions and such things. As I said, I do not implement this and this is one of the major shortcomings on this code that makes it much less generally applicable. But it still has use cases. Secondly, I have tried to make sure that the code works, more than optimizing it to death, so maybe I grab the spinlock too often and maybe I am too picky with allocating/deallocating the DCBs, I let others answer that.

The role of the function at hand is to handle an incoming query and pass it along to one of the servers defined for the service in question. In the general case, the most complicated part of this is selection of which server to route the query to and handling of session commands. I have simplified this by only having a very simple routing algorithm where I store the index of the last used backed for a connection in the nCurrBackend member, and for each query this is incremented until nBackends is reached where it is reset to 0. And for the complexity of session commands, I just don't implement them.

So, lets have a look at what the RouteQuery function looks like:

/*
 * Function: RouteQuery()
 * Route a query in the RoundRobin router.
 */
int RouteQuery(ROUTER *instance, void *session, GWBUF *queue)
   {
   PROUNDROBIN_CLIENT_SES pSession = (PROUNDROBIN_CLIENT_SES) session;
   DCB *pDcb;
   int nRet;
   unsigned int nBackend;

   MXS_NOTICE("Enter RoundRobin RouteQuery.");
   queue = gwbuf_make_contiguous(queue);

   spinlock_acquire(&pSession->lock);
/* Check for the next running backend. Set non-running backend DCBs to NULL. */
   for(nBackend = pSession->nCurrBackend; nBackend < pSession->nBackends; nBackend++)
      {
/* If this server is up, then exit this loop now. */
      if(!SERVER_IS_DOWN(pSession->pBackends[nBackend]))
         break;

/* If the server is down and the DCB is non-null, then free the DCB and NULL it now. */
      if(pSession->pdcbClients[nBackend] != NULL)
         {
         dcb_close(pSession->pdcbClients[nBackend]);
         pSession->pdcbClients[nBackend] = NULL;
         }
      }
/* If I couldn't find a backend after the current, then look through the ones before. */
   if(nBackend >= pSession->nBackends)
      {
      for(nBackend = 0; nBackend <= pSession->nCurrBackend; nBackend++)
         {
         if(!SERVER_IS_DOWN(pSession->pBackends[nBackend]))
            break;
         if(pSession->pdcbClients[nBackend] != NULL)
            {
            dcb_close(pSession->pdcbClients[nBackend]);
            pSession->pdcbClients[nBackend] = NULL;
            }
         }

/* Check that I really found a suitable backend. */
      if(nBackend > pSession->nCurrBackend)
         {
         spinlock_release(&pSession->lock);
         MXS_NOTICE("No suitable RoundRobin running server found in RouteQuery.");
         return 0;
         }
      }

   pDcb = pSession->pdcbClients[nBackend];
/* If backend DCB wasn't set, then do that now. */
   if(pDcb == NULL)
      pDcb = pSession->pdcbClients[nBackend] = dcb_connect(pSession->pBackends[nBackend],
        pSession->pSession,
        pSession->pBackends[nBackend]->protocol);
   spinlock_release(&pSession->lock);

/* Route the query. */
   nRet = pDcb->func.write(pDcb, queue);

/* Move to next dcb. */
   pSession->nCurrBackend = nBackend;
   if(++pSession->nCurrBackend >= pSession->nBackends)
      pSession->nCurrBackend = 0;

   MXS_NOTICE("Exit RoundRobin RouteQuery.");
   return 1;
   } /* End of RouteQuery(). */

So, what is going on here? First I check for a backend, first the ones starting with the current one (which is badly named, this is actually the one after the current) and then until I find a server that is running. If I find a non-Running server I skip that one, after having closed the associated DCB. If I can't find a server after the current one, I start again from the first, processing servers in the same way.

Following this I should have a server, then I check if the DCB is open, and if not I open it now. After that I do the actual routing of the query, move not the next backend and then return. Simple as that. As I have stated, this is a very simple router, but it does work, within the given limitations, and it should be good enough as a crude example.

Before I can test my code, I have to set it up for inclusion in the build process and do a few other mundane tasks, but that is all documented in the pdf that comes with the code, download the package from Sourceforge.

by Anders Karlsson at September 12, 2016 09:07 AM

Colin Charles

Speaking in September 2016

A few events, but mostly circling around London:

  • Open collaboration – an O’Reilly Online Conference, at 10am PT, Tuesday September 13 2016 – I’m going to be giving a new talk titled Forking Successfully. I’ve seen how the platform works, and I’m looking forward to trying this method out (its like a webminar but not quite!)
  • September MySQL London Meetup – I’m going to focus on MySQL, a branch, Percona Server and the fork MariaDB Server. This will be interesting because one of the reasons you don’t see a huge Emacs/XEmacs push after about 20 years? Feature parity. And the work that’s going into MySQL 8.0 is mighty interesting.
  • Operability.io should be a fun event, as the speakers were hand-picked and the content is heavily curated. I look forward to my first visit there.

by Colin Charles at September 12, 2016 03:44 AM

September 10, 2016

Valeriy Kravchuk

Fun with Bugs #45 - On Some Bugs Fixed in MySQL 5.7.15

Oracle released MySQL 5.7.15 recently, earlier than expected. The reason for this "unexpected" release is not clear to me, but it could happen because of a couple of security related internal bug reports that got fixed:

  • "It was possible to write log files ending with .ini or .cnf that later could be parsed as option files. The general query log and slow query log can no longer be written to a file ending with .ini or .cnf. (Bug #24388753)
  • Privilege escalation was possible by exploiting the way REPAIR TABLE used temporary files. (Bug #24388746)"
Let me concentrate on the most important fixes to bugs and problems reported by Community users. First of all, in MySQL 5.7.15 one can just turn off InnoDB deadlock detection using the new  innodb_deadlock_detect dynamic server variable. Domas had explained the positive effect of this more than 6 years ago in his post. Some improvements to the way deadlock detection worked in MySQL happened in frames of fix for the Bug #49047 long time ago, but this time Oracle just implemented a way to disable check and rely on InnoDB lock wait timeout instead.

Other InnoDB-related fixes to problems reported in public bugs database include:
  • Bug #82073 - "Crash with InnoDB Encryption, 5.7.13, FusionIO & innodb_flush_method=O_DIRECT". It was reported by my colleague from MariaDB, Chris Calender, and verified by other my colleague from MariaDB, Jan Lindström. Probably Bugs Verification Team in Oracle just had no access to proper hardware to verify this.
  • Bug #79378 - "buf_block_align() makes incorrect assumptions about chunk size". This bug was reported by Alexey Kopytov, who had provided a patch.
There were several fixes to replication-related bugs:
  • Bug #81675 - "mysqlbinlog does not free the existing connection before opening new remote one". It was reported by Laurynas Biveinis from Percona, who had also provided a patch, and verified by Umesh.
  • Bug #80881 - "MTR: binlog test suite failed to cleanup (contribution)". This fix to the binlog test suit was contributed by Daniel Black and verified by Umesh.
  • Bug #79867 - "unnecessary using temporary for update". This bug was reported by Zhang Yingqiangwho had also contributed a patch (that was not used after all, according to the comment from Oracle developer). It was verified by Umesh.
 Some more bugs from other categories were also fixed:
  • Bug #82125 - "@@basedir sysvar value not normalized if set through the command line/INI file". It was reported by Georgi Kodinov from Oracle. It's funny that there is a typo in the release notes when this fix is described (pay attention to slashes):
    "If the basedir system variable was set at server startup from the command line or option file, the value was not normalized (on Windows, / was not replaced with /)"
  • Bug #82097 is private. I can not say anything about it in addition to this:
    "kevent statement timer subsystem deinitialization was revised to avoid a mysqld hang during shutdown on OS X 10.12."
    I can repeat, though, my usual statement that in most cases making bugs private is a wrong thing to do. I feel myself personally insulted every time when I see that fixed bug report remains private.
  • Bug #81666 - "The MYSQL_SERVER define not defined du to spelling error in plugin.cmake". It was reported by Magnus Blåudd who had provided a patch also.
  • Bug #81587 - "Combining ALTER operations triggers table rebuild". This bug was reported by Daniël van Eeden and verified by Umesh.
  • Bug #68972 - "Can't find temporary table". This bug (that could happen in a stored procedure or when prepared statements are used) was reported by Cyril Scetbon and verified by Miguel Solorzano.
  • Bug #82019 - "Is client library supposed to retry EINTR indefinitely or not". It was reported by Laurynas Biveinis from Percona, who had also contributed patches later. This bug was verified formally by Sinisa Milivojevic.
To summarize, you should consider upgrade to MySQL 5.7.15 for sure if you use FusionIO or want to be able to disable InnoDB deadlock detection entirely, or if you consider security-related fixes in this release really important (I don't). Otherwise just check other fixes that could impact you positively, or just wait for 5.7.16...

by Valeriy Kravchuk (noreply@blogger.com) at September 10, 2016 05:39 PM

September 09, 2016

Peter Zaitsev

Don’t Spin Your Data, Use SSDs!

ssds

ssdsThis blog post discussed the advantages of SSDs over HDDs for database environments.

For years now, I’ve been telling audiences for my MySQL Performance talk the following: if you are running an I/O-intensive database on spinning disks you’re doing it wrong. But there are still a surprising number of laggards who aren’t embracing SSD storage (whether it’s for cost or reliability reasons).

Let’s look at cost first. As I write this now (September 2016), high-performance server-grade spinning hard drives run for about $240 for 600GB (or $0.40 per GB).  Of course, you can get an 8TB archive drive at about same price (about $0.03 per GB), but it isn’t likely you’d use something like that for your operational database. At the same time, you can get a Samsung 850 EVO drive for approximately $300 (or $0.30 per GB), which is cheaper than the server-grade spinning drive!  

While it’s not the best drive money can buy, it is certainly an order of magnitude faster than any spinning disk drive!

(I’m focusing on the cost per GB rather than the cost of the number of IOPS per drive as SSDs have overtaken HDDs years ago when it comes to IOPS/$.)

If we take a look at Amazon EBS pricing, we will find that Amazon has moved to SSD volumes by default as “General Purpose” storage (gp2). Prices for this volume type run about 2x higher per GB than high-performance HDD-based volumes (st1) and provisioned IOPs volumes. The best volumes for databases will likely run you 4x higher than HDD.

This appears to be a significant cost difference, but keep in mind you can get much more IOPS at much better latency from these volumes. They also handle IO spikes better, which is very important for real workloads.

Whether we’re looking at a cloud or private environment, it is wrong just to look at the cost of the storage alone – you must look at the whole server cost. When using an SSD, you might not need to buy a RAID card with battery-backed-up (BBU) cache, as many SSDs have similar functions built in.

(For some entry-level SSDs, there might be an advantage to purchasing a RAID with BBU, but it doesn’t affect performance nearly as much as for HDDs. This works out well, however, as entry level SSDs aren’t going to cost that much to begin with and won’t make this setup particularly costly, relative to a higher-end SSD.)  

Some vendors can charge insane prices for SSDs, but this is where you should negotiate and your alternative vendor choice powers.

Some folks are concerned they can’t get as much storage per server with SSDs because they are smaller. This was the case a few years back, but not any more. You can find a 2TB 2.5” SSD drive easily, which is larger than the available 2.5” spinning drives. You can go as high as 13TB in the 2.5” form factor

There is a bit of challenge if you’re looking at the NVMe (PCI-E) cards, as you typically can’t have as many of those per server as you could using spinning disks, but the situation is changing here as well with the 6.4TB SX300 from Sandisk/FusionIO or the PM1725 from Samsung. Directly attached storage provides extremely high performance and 10TB-class sizes.  

To get multiple storage units together, you can use hardware RAID, software RAID, LVM striping or some file systems (such as ZFS) can take care of it for you.    

Where do we stand with SSD reliability? In my experience, modern SSDs (even inexpensive ones) are pretty reliable, particularly for online data storage. The shelf life of unpowered SSDs is likely to be less than HDDs, but we do not really keep servers off for long periods of time when running database workloads. Most SSDs also do something like RAID internally (it’s called RAIN) in addition to error correction codes that protect your data from a full single flash chip.

In truth, focusing on storage-level redundancy is overrated for databases. We want to protect most critical applications from complete database server failure, which means using some form of replication, storing several copies of data. In this case, you don’t need bulletproof storage on a single server – just a replication setup where you won’t lose the data and any server loss is easy to handle. For MySQL, solutions like Percona XtraDB Cluster come handy. You can use external tools such as Orchestrator or MHA to make MySQL replication work.  

When it comes to comparing SSD vs. HDD performance, whatever you do with SSDs they will likely still perform better than HDDs. Your RAID5 and RAID6 arrays made from SSDs will beat your RAID10 and RAID0 made from HDDs (unless your RAID card is doing something nasty).

Another concern with SSD reliability is write endurance. SSDs indeed have a specified amount of writes they can handle (after which they are likely to fail). If you’re thinking about replacing HDDs with SSDs, examine how long SSDs would endure under a comparable write load.  

If we’re looking at a high HDD write workload, a single device is likely to handle 200 write IOPS of 16KB (when running InnoDB). Let’s double that. That comes to 6.4MB/sec, which gives us  527GB/day (doing this 24/7). Even with the inexpensive Samsung 850 Pro we get 300TB of official write endurance – enough for 1.5 years. And in reality, drives tend to last well beyond their official specs.    

If you don’t like living on the edge, more expensive server-grade storage options have much better endurance. For example, 6.4TB SX300 offers almost 100x more endurance at 22 Petabytes written.

In my experience, people often overestimate how many writes their application performs on a sustained basis. The best approach is to do the math, but also monitor the drive status with a SMART utility or vendor tool. The tools can alert you in advance when drive wears out.

Whatever your workload is, you will likely find an SSD solution that offers you enough endurance while significantly exceeding the performance of an HDD-based solution.

Finally, there is a third and very important component of SSD reliability for operational database workloads: not losing your data during a power failure. Many “consumer-grade” SSDs come with drive write cache enabled by default, but without proper power loss protection. This means you can lose some writes during a power failure, causing data loss or database corruption.

Disabling write cache is one option, though it can severely reduce write performance and does not guarantee data won’t be lost. Using enterprise-grade SSDs from a reputable vendor is another option, and testing SSDs yourself might be a good idea if you’re on a budget.  

Conclusion

When it comes to operational databases, whether your workload is on-premises or in the cloud,  Don’t spin your data – use SSD. There are choices and options for almost any budget and every workload.

by Peter Zaitsev at September 09, 2016 08:49 PM

Basic Housekeeping for MySQL Indexes

MySQL Indexes

MySQL IndexesIn this blog post, we’ll look at some of the basic housekeeping steps for MySQL indexes.

We all know that indexes can be the difference between a high-performance database and a bad/slow/painful query ride. It’s a critical part that needs deserves some housekeeping once in a while. So, what should you check? In no particular order, here are some things to look at:

1. Unused indexes

With sys schema, is pretty easy to find unused indexes: use the schema_unused_indexes view.

mysql> select * from sys.schema_unused_indexes;
+---------------+-----------------+-------------+
| object_schema | object_name     | index_name  |
+---------------+-----------------+-------------+
| world         | City            | CountryCode |
| world         | CountryLanguage | CountryCode |
+---------------+-----------------+-------------+
2 rows in set (0.01 sec)

This view is based on the performance_schema.table_io_waits_summary_by_index_usage table, which will require enabling the Performance Schema, the events_waits_current consumer and the wait/io/table/sql/handler instrument. PRIMARY (key) indexes are ignored.

If you don’t have them enabled, just execute these queries:

update performance_schema.setup_consumers set enabled = 'yes' where name = 'events_waits_current';
update performance_schema.setup_instruments set enabled = 'yes' where name = 'wait/io/table/sql/handler';

Quoting the documentation:

“To trust whether the data from this view is representative of your workload, you should ensure that the server has been up for a representative amount of time before using it.”

And by representative amount, I mean representative: 

  • Do you have a weekly job? Wait at least one week
  • Do you have monthly reports? Wait at least one month
  • Don’t rush!

Once you’ve found unused indexes, remove them.

2. Duplicated indexes

You have two options here:

  • pt-duplicate-key-checker
  • the schema_redundant_indexes view from sys_schema

The pt-duplicate-key-checker is part of Percona Toolkit. The basic usage is pretty straightforward:

[root@e51d333b1fbe mysql-sys]# pt-duplicate-key-checker
# ########################################################################
# world.CountryLanguage
# ########################################################################
# CountryCode is a left-prefix of PRIMARY
# Key definitions:
#   KEY `CountryCode` (`CountryCode`),
#   PRIMARY KEY (`CountryCode`,`Language`),
# Column types:
#      	  `countrycode` char(3) not null default ''
#      	  `language` char(30) not null default ''
# To remove this duplicate index, execute:
ALTER TABLE `world`.`CountryLanguage` DROP INDEX `CountryCode`;
# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes   2952
# Total Duplicate Indexes  1
# Total Indexes            37

Now, the schema_redundant_indexes view is also easy to use once you have sys schema installed. The difference is that it is based on the information_schema.statistics table:

mysql> select * from schema_redundant_indexesG
*************************** 1. row ***************************
              table_schema: world
                table_name: CountryLanguage
      redundant_index_name: CountryCode
   redundant_index_columns: CountryCode
redundant_index_non_unique: 1
       dominant_index_name: PRIMARY
    dominant_index_columns: CountryCode,Language
 dominant_index_non_unique: 0
            subpart_exists: 0
            sql_drop_index: ALTER TABLE `world`.`CountryLanguage` DROP INDEX `CountryCode`
1 row in set (0.00 sec)

Again, once you find the redundant index, remove it.

3. Potentially missing indexes

The statements summary tables from the performance schema have several interesting fields. For our case, two of them are pretty important: NO_INDEX_USED (means that the statement performed a table scan without using an index) and NO_GOOD_INDEX_USED (“1” if the server found no good index to use for the statement, “0” otherwise).

Sys schema has one view that is based on the performance_schema.events_statements_summary_by_digest table, and is useful for this purpose: statements_with_full_table_scans, which lists all normalized statements that have done a table scan.

For example:

mysql> select * from world.CountryLanguage where isOfficial = 'F';
55a208785be7a5beca68b147c58fe634  -
746 rows in set (0.00 sec)
mysql> select * from statements_with_full_table_scansG
*************************** 1. row ***************************
                   query: SELECT * FROM `world` . `Count ... guage` WHERE `isOfficial` = ?
                      db: world
              exec_count: 1
           total_latency: 739.87 us
     no_index_used_count: 1
no_good_index_used_count: 0
       no_index_used_pct: 100
               rows_sent: 746
           rows_examined: 984
           rows_sent_avg: 746
       rows_examined_avg: 984
              first_seen: 2016-09-05 19:51:31
               last_seen: 2016-09-05 19:51:31
                  digest: aa637cf0867616c591251fac39e23261
1 row in set (0.01 sec)

The above query doesn’t use an index because there was no good index to use, and thus was reported. See the explain output:

mysql> explain select * from world.CountryLanguage where isOfficial = 'F'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 984
        Extra: Using where

Note that the “query” field reports the query digest (more like a fingerprint) instead of the actual query.

In this case, the CountryLanguage table is missing an index over the “isOfficial” field. It is your job to decide whether it is worth it to add the index or not.

4. Multiple column indexes order

It was explained before that Multiple Column index beats Index Merge in all cases when such index can be used, even when sometimes you might have to use index hints to make it work.

But when using them, don’t forget that the order matters. MySQL will only use a multi-column index if at least one value is specified for the first column in the index.

For example, consider this table:

mysql> show create table CountryLanguageG
*************************** 1. row ***************************
       Table: CountryLanguage
Create Table: CREATE TABLE `CountryLanguage` (
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `Language` char(30) NOT NULL DEFAULT '',
  `IsOfficial` enum('T','F') NOT NULL DEFAULT 'F',
  `Percentage` float(4,1) NOT NULL DEFAULT '0.0',
  PRIMARY KEY (`CountryCode`,`Language`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `countryLanguage_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

A query against the field “Language” won’t use an index:

mysql> explain select * from CountryLanguage where Language = 'English'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 984
        Extra: Using where

Simply because it is not the leftmost prefix for the Primary Key. If we add the “CountryCode” field, now the index will be used:

mysql> explain select * from CountryLanguage where Language = 'English' and CountryCode = 'CAN'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: const
possible_keys: PRIMARY,CountryCode
          key: PRIMARY
      key_len: 33
          ref: const,const
         rows: 1
        Extra: NULL

Now, you’ll have to also consider the selectivity of the fields involved. Which is the preferred order?

In this case, the “Language” field has a higher selectivity than “CountryCode”:

mysql> select count(distinct CountryCode)/count(*), count(distinct Language)/count(*) from CountryLanguage;
+--------------------------------------+-----------------------------------+
| count(distinct CountryCode)/count(*) | count(distinct Language)/count(*) |
+--------------------------------------+-----------------------------------+
|                               0.2368 |                            0.4644 |
+--------------------------------------+-----------------------------------+

So in this case, if we create a multi-column index, the preferred order will be (Language, CountryCode).

Placing the most selective columns first is a good idea when there is no sorting or grouping to consider, and thus the purpose of the index is only to optimize where lookups. You might need to choose the column order, so that it’s as selective as possible for the queries that you’ll run most.

Now, is this good enough? Not really. What about special cases where the table doesn’t have an even distribution? When a single value is present way more times than all the others? In that case, no index will be good enough. Be careful not to assume that average-case performance is representative of special-case performance. Special cases can wreck performance for the whole application.

In conclusion, we depend heavily on proper indexes. Give them some love and care once in a while, and the database will be very grateful.

All the examples were done with the following MySQL and Sys Schema version:

mysql> select * from sys.version;
+-------------+-----------------+
| sys_version | mysql_version   |
+-------------+-----------------+
| 1.5.1       | 5.6.31-77.0-log |
+-------------+-----------------+

by Daniel Guzmán Burgos at September 09, 2016 05:44 PM

September 08, 2016

Peter Zaitsev

MySQL Replication Troubleshooting: Q & A

MySQL Replication Troubleshooting

MySQL Replication TroubleshootingIn this blog, I will provide answers to the Q & A for the MySQL Replication Troubleshooting webinar.

First, I want to thank everybody for attending the August 25 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: Hi Sveta. One question: how is it possible to get N previous events using the SHOW BINLOG EVENTS command? For example, the position is 999 and I want to analyze the previous five events. Is it possible?

A: Not, there is no such option. You cannot get the previous five events using

SHOW BINLOG EVENTS
. However, you can use
mysqlbinlog
 with the option
--stop-position
 and tail its output.

Q: We are having issues with inconsistencies over time. We also have a lot of “waiting for table lock” statuses during high volume usage. Would changing these tables to InnoDB help the replicated database remain consistent?

A: Do you use MyISAM? Switching to InnoDB might help, but it depends on what types of queries you use. For example, if you often use the 

LOCK TABLE
  command, that will cause a 
"waiting for table lock"
  error for InnoDB too. Regarding data consistency between the master and slave, you need to use row-based replication.

Q: For semi-sync replication, what’s the master’s behavior when the master never received ACK from any of the slaves?

A: It will timeout after

rpl_semi_sync_master_timeout
  milliseconds, and then switch to asynchronous replication.

Q: We’re using MySQL on r3.4xlarge EC2 instances (16 CPU). We use RBR. innodb_read_io_threads and innodb_write_io_threads =4. We often experience lags. Would increasing these to eight offer better IO for slaves? What other parameters could boost slave IO?

A: Yes, an increased number of IO threads would most likely improve performance. Other parameters that could help are similar to the ones discussed in “InnoDB Troubleshooting” and “Introduction to Troubleshooting Performance: What Affects Query Execution?” webinars. You need to pay attention to InnoDB options that affect IO (

innodb_thread_concurrency, innodb_flush_method, innodb_flush_log_at_trx_commit, innodb_flush_log_at_timeout
 ) and general IO options, such as
sync_binlog
 .

Q: How many masters can I have working together?

A: What do you mean by “how many masters can [you] have working together”? Do you mean circular replication or a multi-master setup? In any case, the only limitation is hardware. For a multi-master setup you should ensure that the slave has enough resources to process all requests. For circular replication, ensure that each of the masters in the chain can handle the increasing number of writes as they replicate down the chain, and do not lead to permanently increasing slave lags.

Q: What’s the best way to handle auto_increment?

A: Follow the advice in the user manual: set

auto_increment_offset
  to a unique value on each of servers,
auto_increment_increment
  to the number of servers and never update auto-incremented columns manually.

Q: I configured multi threads replication. Sometimes the replication lag keeps increasing while the slave was doing “invalidating query cache entries(table)”.  How should I do to fine tune it?

A: The status

"invalidating query cache entries(table)"
 means that the query cache is invalidating entries, and has been changed by a command currently being executed by the slave SQL thread. To avoid this issue, you need to keep the query cache small (not larger than 512 MB) and de-fragment it from time to time using the FLUSH QUERY CACHE command.

Q: Sometimes when IO is slow and during lag we see info: Reading event from the relay log “Waiting for master to send event” — How do we troubleshoot to get more details.

A: The

"Waiting for master to send event"
 state shows that the slave IO thread sent a request for a new event, and is waiting for the event from the master. If you believe it hasn’t received the event in a timely fashion, check the error log files on both the master and slave for connection errors. If there is no error message, or if the message doesn’t provide enough information to solve the issue, use the network troubleshooting methods discussed in the “Troubleshooting hardware resource usage” webinar.

Save

by Sveta Smirnova at September 08, 2016 06:53 PM

Percona is Hiring: Director of Platform Engineering

percona is hiring

percona is hiringPercona is hiring a Director of Platform Engineering. Find out more!

At Percona, we recognize you need much more than just a database server to successfully run a database-powered infrastructure. You also need strong tools that deploy, manage and monitor the software. Percona’s Platform Engineering group is responsible just for that. They build next-generation open source solutions for the deployment, monitoring and management of open source databases.

This  team is currently responsible for products such as Percona Toolkit , Percona Monitoring Plugins and Percona Monitoring and Management.  

Percona builds products that advance state-of-the-art open source software. Our products help our customers monitor and manage their databases. They help our services team serve customers faster, better and more effectively.

The leader of the Platform Engineering group needs a strong vision, as well as an understanding of market trends, best practices for automation, monitoring and management – in the cloud and on premises. This person must have some past technical operations background and experience building and leading engineering teams that have efficiently delivered high-quality software. The ideal candidate will also understand the nature of open source software development and experience working with distributed teams.

This position is for “player coach” – you will get your hands dirty writing code, performing quality assurance, making great documentation and assisting customers with troubleshooting.

We not looking for extensive experience with a particular programming language, but qualified candidates should be adept at learning new programming languages. Currently, our teams use a combination of Perl, Python, Go and Javascript.

The Director of Platform Engineering reports to Vadim Tkachenko, CTO and VP of Engineering. They will also work closely with myself, other senior managers and experts at Percona.

Interested? Please apply here on Percona’s website.

by Peter Zaitsev at September 08, 2016 06:20 PM

Jean-Jerome Schmidt

Planets9s - Try the new ClusterControl 1.3.2 with its new deployment wizard

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Try the new ClusterControl 1.3.2 with its new deployment wizard

This week we’re delighted to announce the release of ClusterControl 1.3.2, which includes the following features: a new alarm viewer and new deployment wizard for MySQL, MongoDB & PostgreSQL making it ever easier to deploy your favourite open source databases; it also includes deployment of MongoDB sharded clusters as well as MongoDB advisors. If you haven’t tried it out yet, now is the time to download this latest release and provide us with your feedback.

Download the new ClusterControl

New partnership with WooServers helps start-ups challenge Google, Amazon and Microsoft

In addition to announcing the new ClusterControl 1.3.2 this week, we’ve also officially entered into a new partnership with WooServers to bring ClusterControl to web hosting. WooServers is a web hosting platform, used by 5,500 businesses, such as WhiteSharkMedia and SwiftServe to host their websites and applications. With ClusterControl, WooServers makes available a managed service that includes comprehensive infrastructure automation and management of MySQL-based database clusters. The service is available on WooServers data centers, as well as on Amazon Web Services and Microsoft Azure.

Find out more

Sign up for Part 2 of our MySQL Query Tuning Trilogy: Indexing and EXPLAIN

You can now sign up for Part 2 of our webinar trilogy on MySQL Query Tuning. In this follow up webinar to the one on process and tools, we’ll cover topics such as SQL tuning, indexing, the optimizer and how to leverage EXPLAIN to gain insight into execution plans. More specifically, we’ll look at how B-Tree indexes are built, indexes MyISAM vs. InnoDB, different index types such as B-Tree, Fulltext and Hash, indexing gotchas and an EXPLAIN walkthrough of a query execution plan.

Sign up today

How to set up read-write split in Galera Cluster using ProxySQL

ProxySQL is an SQL-aware load balancer for MySQL and MariaDB. A scheduler was recently added, making it possible to execute external scripts from within ProxySQL. In this new blog post, we’ll show you how to take advantage of this new feature to perform read-write splits on your Galera Cluster.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at September 08, 2016 02:23 PM

Colin Charles

Speaking at Percona Live Europe Amsterdam

I’m happy to speak at Percona Live Europe Amsterdam 2016 again this year (just look at the awesome schedule). On my agenda:

I’m also signed up for the Community Dinner @ Booking.com, and I reckon you should as well – only 35 spots remain!

Go ahead and register now. You should be able to search Twitter or the Percona blog for discount codes :-)

by Colin Charles at September 08, 2016 11:20 AM

September 07, 2016

Peter Zaitsev

Percona Live Europe featured talk with Igor Canadi — Everything you wanted to know about MongoRocks

percona live europe featured talk

Percona Live Europe featured talkWelcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Igor Canadi, Software Engineer at Facebook, Inc. His talk will be on Everything you wanted to know about MongoRocks. MongoRocks is MongoDB with RocksDB storage engine. It was developed by Facebook, where it’s used to power mobile backend as a service provider Parse.

I had a chance to speak with Igor and learn a bit more about these questions:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it?

Igor: After I finished my undergrad at the University of Zagreb in Croatia, I joined University of Wisconsin-Madison’s Masters program. Even though UW-M is famous for its work on databases, during my two years there I worked in a different area. However, as I joined Facebook after school, I heard of a cool new project called RocksDB. Everything about building a new storage engine sounded exciting to me, although I had zero idea how thrilling the ride will actually be. The best part was working with and getting to know amazing people from Facebook, Parse, MongoDB, Percona, and many other companies that are using or experimenting with RocksDB.

Percona: Your talk is called “Everything you wanted to know about MongoRocks.” Briefly, what is MongoRocks and why did it get developed?

Igor: Back in 2014 MongoDB announced that they are building a pluggable storage engine API, which would enable MongoDB users to seamlessly choose a storage engine that works best for their workload. Their first prototype was actually using RocksDB as a storage engine, which was very exciting for us. However, they bought WiredTiger soon after, another great storage engine, and decided to abandon MongoDB+RocksDB project. At the same time, Parse was running into scaling challenges with their MongoDB deployment. We decided to help out and take over the development of MongoRocks. We started rolling it out at Parse in March of 2015 already and completed the rollout in October. Running MongoRocks instead of MongoDB with the MMap storage engine resulted in much greater efficiency and lower latencies in some scenarios. Some of the experiences are captured in Parse’s blog posts: http://blog.parse.com/announcements/mongodb-rocksdb-parse/ and http://blog.parse.com/learn/engineering/mongodb-rocksdb-writing-so-fast-it-makes-your-head-spin/

Percona: What are the workloads and database environments that are best suited for a MongoRocks deployment? Do you see and expansion of the solution to encompass other scenarios?

Igor: Generally speaking, MongoRocks should compress really well. Over the years of using LSM engines, we learned that its compression rates are hard to beat. The difference can sometimes be substantial. For example, many benchmarks of MyRocks, which is a MySQL with RocksDB storage engines, have shown that compressed InnoDB uses two times as much space as compressed RocksDB. With better compression, more of your data fits in memory, which could also improve read latencies and lower the stress on storage media. However, this is a tricky question to answer generally. It really depends on the metrics you care about. One great thing about Mongo and different storage engines is that the replication format is the same across all of them, so it’s simple to try it out and see how it performs under your workload. You can just add an additional node in your replica set that’s using RocksDB and monitor the metric you care about on that node.

Percona: What are the unique database requirements at Facebook that keep you awake at night? What would you most like to see feature-wise in MongoDB in the near future (or any database technology)?

Igor: One of the most exciting database projects that we’re working on at Facebook is MyRocks, which I mentioned previously. Currently, we use MySQL with InnoDB to store our Facebook graph and we are experimenting with replacing that with MyRocks. The main motivation behind the project is 2x better compression rates, but we also see better performance in some areas. If you’re attending Percona Live Europe I encourage you to attend either Mark Callaghan’s talk on MyRocks, or Yoshinori’s 3-hour tutorial to learn more.

Percona: What are looking forward to the most at Percona Live Europe this year?

Igor: The best part of attending conferences is the people. I am looking forward to seeing old friends and meeting new ones. If you like to talk storage engines, hit me up!

You can read more about Igor’s thoughts on MongoRocks at his twitter feed.

Want to find out more about Igor, Facebook and MongoRocks? Register for Percona Live Europe 2016, and come see his talk Everything you wanted to know about MongoRocks.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

by Dave Avery at September 07, 2016 05:47 PM

Get MySQL Passwords in Plain Text from .mylogin.cnf

MySQL Passwords

MySQL PasswordsThis post will tell you how to get MySQL passwords in plain text using the .mylogin.cnf file.

Since MySQL 5.6.6, it became possible to store MySQL credentials in an encrypted login path file named .mylogin.cnf, using the mysql_config_editor tool. This is better than in plain text anyway.

What if I need to read this password in plain text?

Perhaps because I didn’t save it? It might be that I don’t need it for long (as I can reset it), but it’s important that I get it. 😎

Unfortunately (or intentionally),

mysql_config_editor
 doesn’t allow it.

[root@db01 ~]# cat /root/.mylogin.cnf
????uUd????ٞN??3k??ǘ);??Ѻ0
                         ?'?(??W.???Xܽ<'?C???ha?$
??
r(?q`?+[root@db01 ~]#
[root@db01 ~]#
[root@db01 ~]# mysql_config_editor print --all
[client]
user = root
password = *****
[root@db01 ~]#

I wrote this blog post because I just faced this issue. I needed to get the password out of there. Surprisingly, it is simpler than I thought. While looking for an answer I found that some people created scripts to decrypt it (as it uses the AES-128 ECB algorithm), sometimes getting the MySQL code source or simply using a scripting language.

However, it turns to be very simple.

mysql_config_editor
 does not provide this option, but
my_print_defaults
 does!

[root@db01 ~]# my_print_defaults -s client
--user=root
--password=Xb_aOh1-R33_,_wPql68
[root@db01 ~]#

my_print_defaults
 is a standard tool in the MySQL server package. Keep your passwords safe!

by Roman Vynar at September 07, 2016 04:25 PM

Jean-Jerome Schmidt

Press Release: Severalnines and WooServers bring ClusterControl to web hosting

New partnership helps start-ups challenge Google, Amazon and Microsoft

Stockholm, Sweden and anywhere else in the world - 07 September 2016 - Severalnines, the provider of database automation and management software for open source databases, today announced its latest partnership with WooServers. WooServers is a web hosting platform, used by 5,500 businesses, such as WhiteSharkMedia and SwiftServe to host their websites and applications.

WooServers will make available, in this partnership with Severalnines, a managed service that includes comprehensive infrastructure automation and management of MySQL-based database clusters. The service is available on WooServers data centers, as well as on Amazon Web Services and Microsoft Azure. The main target group is online businesses who rely on solid IT infrastructure, but wish to avoid the operational heavy lifting of managing high availability databases across data centers, or even across different cloud providers.

The Severalnines and WooServers’ partnership began in August 2015. WooServers felt there was a gap in the market for providing managed database services across different cloud providers. A large portion of their own clients were using more than one cloud provider to take advantage of features offered by one provider over another. However, without a distributed database infrastructure, it is hard to be cloud agnostic. Ultimately, where the data is determines where the service or application resides. And the larger the data, the harder it is to move.

Businesses can now quickly be up and running with a distributed database across the cloud services of their choice, knowing the service is managed by hosting and database experts. It is powered by MySQL clusters, with ClusterControl to automate deployment, and operational tasks such as configuration, topology changes, patching, backups, and failure recovery.

Aleksey Krasnov, co-founder of WooServers, said, “Cloud providers are trying hard to differentiate from each other, and a hyper competitive market is good news for clients. But there is very little in terms of cross-cloud compatibility, especially for database services. Being tied to a particular service may not be optimal in the long run, because if the costs escalate and you want to move to a more cost-effective vendor, you’re in for a big surprise.

We provide choice and flexibility to our clients, and our service is cloud agnostic. Since we’re using ClusterControl, it means you could even bring the data in-house and run on your own infrastructure.“

Vinay Joosery, Severalnines CEO, said, “Operational management of databases is not something you’d want your developers to spend too much time on, which makes a managed service a very logical choice. However, yielding too much control over your data to any single cloud vendor is scary. WooServers provides a service that allows businesses to benefit from cloud economics, while keeping control over their data.”

One customer who is relying on this partnership is digital advertising services provider Plexiads. After its launch in 2015, investing in IT development was a priority for Plexiads, to ensure they can compete with Google Adsense by delivering the highest quality service to its customers all over the world, especially in Africa and the Middle East. A highly available database that is resistant to outages, is vital to keep customers’ advertising running and capitalise on every revenue opportunity.

Abdelkhalek Baou, Co-CEO and Founder of Plexiads, said: “Our products challenge web giants such as Google Adsense and Adwords, and it’s important to us to keep customers satisfied in a market which is worth $60 billion. Data is a critical part of our business, and we were looking for a partner to help us build and manage a state of the art data tier for our applications. We have been very happy with WooServers and Severalnines, and feel confident we will be able to expand the infrastructure as the business grows.”

About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skills levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. The company has enabled over 8,000 deployments to date via its popular online database configurator. Currently counting BT, Orange, Cisco, CNRS, Technicolour, AVG, Ping Identity and Paytrail as customers. Severalnines is a private company headquartered in Stockholm, Sweden with offices in Singapore and Tokyo, Japan. To see who is using Severalnines today visit, http://www.severalnines.com/company.

About WooServers

Since 2009 WooServers offers web hosting, virtual and dedicated servers with full management for the price of unmanaged solutions of its competitors. Serving more than 5500 clients WooServers has always strived to take over all system administration hassle from the client and provide hands-on support, unmatched in the industry.

During the past 3 years WooServers made a considerable expansion into the SMB client segment in order to utilize its infrastructure and DB management experience and offer custom solutions to clients with more complicated requirements. As a result, a number of new products were introduced, such Microsoft Azure Management and High-Availability Database Clusters in partnership with Severalnines. 

by Severalnines at September 07, 2016 02:35 PM

September 06, 2016

Peter Zaitsev

MyRocks Docker images

MyRocks Docker images

homepage-docker-logoIn this post, I’ll point you to MyRocks Docker images with binaries, allowing you to install and play with the software.

During the @Scale conference, Facebook announced that MyRocks is mature enough that it has been installed on 5% of Facebook’s MySQL slaves. This has saved 50% of the space on these slaves, which allows them to decrease the number of servers by half. Check out the announcement here:  https://code.facebook.com/posts/190251048047090/myrocks-a-space-and-write-optimized-mysql-database/

Those are pretty impressive numbers, so I decided to take a serious look at MyRocks. The biggest showstopper is usually binary availability, since Facebook only provides the source code: https://github.com/facebook/mysql-5.6.

You can get the image from https://hub.docker.com/r/perconalab/myrocks/.

To start MyRocks:

docker run -d --name myr -P  perconalab/myrocks

To access it, use a regular MySQL client:

mysql -h127.0.0.1

From there you should see RocksDB installed:

show engines;
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine | Support | Comment | Transactions | XA | Savepoints |
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| ROCKSDB | DEFAULT | RocksDB storage engine | YES | YES | YES |

I hope it makes easier to start experimenting with MyRocks!

by Vadim Tkachenko at September 06, 2016 08:28 PM

MongoDB at Percona Live Europe

MongoDB at Percona Live Europe

MongoDB at Percona Live EuropeThis year, you will find a great deal about MongoDB at Percona Live Europe.

As we continue to work on growing the independent MongoDB ecosystem, this year’s Percona Live Europe in Amsterdam includes many talks about MongoDB. If your company uses MongoDB technologies, is focused exclusively on developing with MongoDB or MongoDB operations, or is just evaluating MongoDB, attending Percona Live Europe will prove a valuable experience.  

As always with Percona Live conferences, the focus is squarely on the technical content — not sales pitches. We encourage our speakers to tell the truth: the good, the bad and the ugly. There is never a “silver bullet” when it comes to technology — only tradeoffs between different solution options.

As someone who has worked in database operations for more than 15 years, I recognize and respect the value of “negative information.” I like knowing what does not work, what you should not do and where trouble lies. Negative information often proves more valuable than knowing how great the features of a specific technology work — especially since the product’s marketing team tends to highlight those very well (and they seldom require independent coverage).

For MongoDB at this year’s Percona Live Europe:
  • We have talks about MongoRocks, a RocksDB powered storage engine for MongoDB — the one you absolutely need to know about if you’re looking to run the most efficient MongoDB deployment at scale!  
  • We will cover MongoDB Backups best practices, as well as several talks about MongoDB monitoring and management  (1, 2, 3) — all of them with MongoDB Community Edition and Percona Server for MongoDB (so they don’t require a MongoDB Enterprise subscription).

There will also be a number of talks about how MongoDB interfaces with other technologies. We show how ToroDB can use the MongoDB protocol while storing data in a relational database (and why that might be a good idea), we contrast and compare MySQL and MongoDB Geospatial features, and examine MongoDB from MySQL DBA point of view.

We also how to use Apache Spark to unify data from MongoDB, MySQL, and Redis, and what are generally the best practices for choosing databases for different application needs.

Finally, if you’re just starting with MongoDB and would like a jump start before attending more detailed MongoDB talks, we’ve got a full day MongoDB 101 tutorial for you.

Join us for the full conference, or register for just one day if that is all your schedule allows. But come to Percona Live Europe in Amsterdam on October 3-5 to get the best and latest MongoDB information.

by Peter Zaitsev at September 06, 2016 03:28 PM

Jean-Jerome Schmidt

ClusterControl 1.3.2 Released with Key MongoDB Features - Sharded Deployments, Cluster-Consistent Backups, Advisors and more

The Severalnines team is pleased to announce the release of ClusterControl 1.3.2.

This release contains new features, such as deploying MongoDB sharded clusters and scheduling cluster-consistent backups, MongoDB Advisors, a new alarm viewer and new deployment wizard for MySQL, MongoDB & PostgreSQL, along with performance improvements and bug fixes.

Highlights

  • For MongoDB
    • Deploy or add existing MongoDB sharded clusters
    • Support for Percona consistent MongoDB backup
    • Manage MongoDB configurations
    • MongoDB Advisors
    • New overview page for sharded clusters and performance graphs
  • For MySQL, MongoDB & PostgreSQL
    • New Alarm Viewer
    • New Deployment Wizard

For more details and resources:

Deploy or add existing MongoDB sharded clusters

Not only can users now deploy MongoDB sharded clusters, but adding your existing sharded cluster to ClusterControl is as easy as adding a replica set: all shard routers in the cluster need to be specified with its credentials, and ClusterControl will automatically discover all shards and replica sets in the cluster. Supports Percona MongoDB and MongoDB Inc v3.2.

Manage MongoDB configurations

MongoDB configuration management includes functionality such as: change the configuration, import configurations for all nodes and define/alter templates. Users can immediately change the whole configuration file and write this configuration back to the database node. With this latest release of ClusterControl, users can manage MongoDB configurations even more intuitively than before.

New overview page for sharded clusters and performance graphs

The MongoDB stats and performance overview can be found under the Performance tab of your ClusterControl instance. Mongo Stats is an overview of the output of mongostat and the Performance overview gives a good graphical overview of the MongoDB opcounters. It now includes a per replicaSet/Config Server/router view for sharded clusters, alongside performance graphs.

Support for Cluster Consistent MongoDB backup

ClusterControl now supports Percona’s Consistent MongoDB backup, if installed on the ClusterControl controller node. Percona’s Consistent MongoDB backup is able to create a consistent cluster backup across many separate shards. It auto-discovers healthy members for backup by considering replication lag, replication 'priority' and by preferring 'hidden' members.

New Alarm Viewer

The enhanced viewer gives the user a better overview of events, especially if you have multiple clusters. Alarms and jobs for all clusters are now consolidated in a single view with a timeline of each event/alarm. Click on each event name to view related logs to the event/alarm and take the relevant actions required to avoid potential issues.

New Deployment Wizard

It is now possible to create entire master-slave setups in one go via our new deployment wizard. In previous versions, one had to first create a master, and afterwards, add slaves to it. You can now do it all in one go. The wizard supports MySQL Replication, MySQL Galera, MySQL/NDB, MongoDB ReplicaSet, MongoDB Shards and PostgreSQL.

There is a bunch of other improvements that we have not mentioned here. You can find all details in the ChangeLog.

We encourage you to test this latest release and provide us with your feedback. If you’d like a demo, feel free to request one.

With over 8,000 users to date, ClusterControl is the leading, platform independent automation and management solution for MySQL, MariaDB, Percona, MongoDB and PostgreSQL.

Thank you for your ongoing support, and happy clustering!

For additional tips & tricks, follow our blog: http://www.severalnines.com/blog/.

by Severalnines at September 06, 2016 12:27 PM

Shlomi Noach

gh-ost 1.0.17: Hooks, Sub-second lag control, Amazon RDS and more

gh-ost version 1.0.17 is now released, with various additions and fixes. Here are some notes of interest:

Hooks

gh-ost now supports hooks. These are your own executables that gh-ost will invoke at particular points of interest (validation pass, about to cut-over, success, failure, status, etc.)

gh-ost will set various environment variables for your executables to pick up, passing along such information as migrated/ghost table name, elapsed time, processed rows, migrated host etc.

Sub-second lag control

At GitHub we're very strict about replication lag. We keep it well under 1 second at most times. gh-ost can now identify sub-second lag on replicas (well, you need to supply with the right query). Our current production migrations are set by default with --max-lag-millis=500 or less, and our most intensive migrations keep replication lag well below 1sec or even below 500ms

No SUPER

The SUPER privilege is required to set global binlog_format='ROW' and for STOP SLAVE; START SLAVE;

If you know your replica has RBR, you can pass --assume-rbr and skips those steps.

RDS

Hooks + No Super = RDS, as seems to be the case. For --test-on-replica you will need to supply your own gh-ost-on-stop-replication hook, to stop your RDS replica at cut-over phase. See this tracking issue

master-master

While active-active are still not supported, you now have greater control over master-master topologies by being able to explicitly pick your master (as gh-ost arbitrarily picks one of the co-masters). Do so by passing --assume-master-host. See cheatsheet.

tungsten replicator

Similarly, gh-ost cannot crawl your tungsten topology, and you are able to specify --tungsten --assume-master-host=the.master.com. See cheatsheet.

Concurrent-rowcount

--exact-rowcount is awesomeness, keeping quite accurate estimate of progress. With --concurrent-rowcount we begin migration with a rough estimate, and execute select count(*) from your_table in parallel, updating our estimate later on throughout the migration

Stricter, safer

gh-ost works in STRICT_ALL_TABLES mode, meaning it would fail rather than set the wrong value to a column.

In addition to unit-testing and production continuous test, a set of local tests is growing, hopefully to run as CI tests later on.

Fixed problems

Fixed time_zone related bug, high unsigned values bug; added strict check for triggers, relaxed config file parsing, and more. Thank you to community contributors for PRs, from ipv6 to typos!

Known issues

Issues coming and going at all times -- thank you for reporting Issues!

We have a confirmed bug with non-UTF charsets at this time. Some other minor issues and feature requests are open -- we'll take them as we go along.

Feedback requests

We are not testing gh-ost on RDS ourselves. We appreciate community feedback on this tracking issue.

We are not testing gh-ost on Galera/XtraDB cluster ourselves. We appreciate community feedback on this tracking issue.

We value submitted Issues and questions.

Speaking

We will be presenting gh-ost in the next month:

Hope to see you there, and thank you again to all contributors!

by shlomi at September 06, 2016 09:44 AM

September 05, 2016

Jean-Jerome Schmidt

How to set up read-write split in Galera Cluster using ProxySQL

Edited on Sep 12, 2016 to correct the description of how ProxySQL handles session variables. Many thanks to Francisco Miguel for pointing this out.


ProxySQL is becoming more and more popular as SQL-aware load balancer for MySQL and MariaDB. In previous blog posts, we covered installation of ProxySQL and its configuration in a MySQL replication environment. We’ve covered how to set up ProxySQL to perform failovers executed from ClusterControl. At that time, Galera support in ProxySQL was a bit limited - you could configure Galera Cluster and split traffic across all nodes but there was no easy way to implement read-write split of your traffic. The only way to do that was to create a daemon which would monitor Galera state and update weights of backend servers defined in ProxySQL - a much more complex task than to write a small bash script.

In one of the recent ProxySQL releases, a very important feature was added - a scheduler, which allows to execute external scripts from within ProxySQL even as often as every millisecond (well, as long as your script can execute within this time frame). This feature creates an opportunity to extend ProxySQL and implement setups which were not possible to build easily in the past due to too low granularity of the cron schedule. In this blog post, we will show you how to take advantage of this new feature and create a Galera Cluster with read-write split performed by ProxySQL.

First, we need to install and start ProxySQL:

[root@ip-172-30-4-215 ~]# wget https://github.com/sysown/proxysql/releases/download/v1.2.1/proxysql-1.2.1-1-centos7.x86_64.rpm

[root@ip-172-30-4-215 ~]# rpm -i proxysql-1.2.1-1-centos7.x86_64.rpm
[root@ip-172-30-4-215 ~]# service proxysql start
Starting ProxySQL: DONE!

Next, we need to download a script which we will use to monitor Galera status. Currently it has to be downloaded separately but in the next release of ProxySQL it should be included in the rpm. The script needs to be located in /var/lib/proxysql.

[root@ip-172-30-4-215 ~]# wget https://raw.githubusercontent.com/sysown/proxysql/master/tools/proxysql_galera_checker.sh

[root@ip-172-30-4-215 ~]# mv proxysql_galera_checker.sh /var/lib/proxysql/
[root@ip-172-30-4-215 ~]# chmod u+x /var/lib/proxysql/proxysql_galera_checker.sh

If you are not familiar with this script, you can check what arguments it accepts by running:

[root@ip-172-30-4-215 ~]# /var/lib/proxysql/proxysql_galera_checker.sh
Usage: /var/lib/proxysql/proxysql_galera_checker.sh <hostgroup_id write> [hostgroup_id read] [number writers] [writers are readers 0|1} [log_file]

As we can see, we need to pass couple of arguments - hostgroups for writers and readers, number of writers which should be active at the same time. We also need to pass information if writers can be used as readers and, finally, path to a log file.

Next, we need to connect to ProxySQL’s admin interface. For that you need to know credentials - you can find them in a configuration file, typically located in /etc/proxysql.cnf:

admin_variables=
{
        admin_credentials="admin:admin"
        mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
#       refresh_interval=2000
#       debug=true
}

Knowing the credentials and interfaces on which ProxySQL listens, we can connect to the admin interface and begin configuration.

[root@ip-172-30-4-215 ~]# mysql -P6032 -uadmin -padmin -h 127.0.0.1

First, we need to fill mysql_servers table with information about our Galera nodes. We will add them twice, to two different hostgroups. One hostgroup (with hostgroup_id of 0) will handle writes while the second hostgroup (with hostgroup_id of 1) will handle reads.

MySQL [(none)]> INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (0, '172.30.4.238', 3306), (0, '172.30.4.184', 3306), (0, '172.30.4.67', 3306);
Query OK, 3 rows affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (1, '172.30.4.238', 3306), (1, '172.30.4.184', 3306), (1, '172.30.4.67', 3306);
Query OK, 3 rows affected (0.00 sec)

Next, we need to add information about users which will be used by the application. We used a plain text password here but ProxySQL accepts also hashed passwords in MySQL format.

MySQL [(none)]> INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('sbtest', 'sbtest', 1, 0);
Query OK, 1 row affected (0.00 sec)

What’s important to keep in mind is the default_hostgroup setting - we set it to ‘0’ which means that, unless one of query rules say different, all queries will be sent to the hostgroup 0 - our writers.

At this point we need to define query rules which will handle read/write split. First, we want to match all SELECT queries:

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
Query OK, 1 row affected (0.00 sec)

It is important to make sure you get the regex correctly. It is also crucial to note that we set ‘apply’ column to ‘0’. This means that our rule won’t be the final one - a query, even if it matches the regex, will be tested against next rule in the chain. You can see why we’ve done that when you look at our second rule:

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
Query OK, 1 row affected (0.00 sec)

We are looking for SELECT … FOR UPDATE queries, that’s why we couldn’t just finish checking our SELECT queries on the first rule. SELECT … FOR UPDATE should be routed to our write hostgroup, where UPDATE will happen.

Those settings will work fine if autocommit is enabled and no explicit transactions are used. If your application uses transactions, one of the methods to make them work safely in ProxySQL is to use the following set of queries:

SET autocommit=0;
BEGIN;
...

The transaction is created and it will stick to the host where it was opened. You also need to have a query rule for BEGIN, which would route it to the hostgroup for writers - in our case we leverage the fact that, by default, all queries executed as ‘sbtest’ user are routed to writers’ hostgroup (‘0’) so there’s no need to add anything.

Second method would be to enable persistent transactions for our user (transaction_persistent column in mysql_users table should be set to ‘1’).

ProxySQL’s handling of other SET statements and user-defined variables is another thing we’d like to discuss a bit here. ProxySQL works on two levels of routing. First - query rules. You need to make sure all your queries are routed accordingly to your needs. Then, connection mutiplexing - even when routed to the same host, every query you issue may in fact use a different connection to the backend. This makes things hard for session variables. Luckily, ProxySQL treats all queries containing ‘@’ character in a special way - once it detects it, it disables connection multiplexing for the duration of that session - thanks to that, we don’t have to be worried that the next query won’t know a thing about our session variable.

The only thing we need to make sure of is that we end up in the correct hostgroup before disabling connection multiplexing. To cover all cases, the ideal hostgroup in our setup would be the one with writers. This would require slight change in the way we set our query rules (you may require to run ‘DELETE FROM mysql_query_rules’ if you already added the query rules we mentioned earlier).

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '.*@.*', 0, 1);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
Query OK, 1 row affected (0.00 sec)

Those two cases could become a problem in our setup but as long as you are not affected by them (or if you used the proposed workarounds), we can proceed further with configuration. We still need to setup our script to be executed from ProxySQL:

MySQL [(none)]> INSERT INTO scheduler (id, active, interval_ms, filename, arg1, arg2, arg3, arg4, arg5) VALUES (1, 1, 1000, '/var/lib/proxysql/proxysql_galera_checker.sh', 0, 1, 1, 1, '/var/lib/proxysql/proxysql_galera_checker.log');
Query OK, 1 row affected (0.01 sec)

Additionally, because of the way how Galera handles dropped nodes, we want to increase the number of attempts that ProxySQL will make before it decides a host cannot be reached.

MySQL [(none)]> SET mysql-query_retries_on_failure=10;
Query OK, 1 row affected (0.00 sec)

Finally, we need to apply all changes we made to the runtime configuration and save them to disk.

MySQL [(none)]> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK; LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK; LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK; LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK; LOAD MYSQL VARIABLES TO RUNTIME; SAVE MYSQL VARIABLES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 64 rows affected (0.05 sec)

Ok, let’s see how things work together. First, verify that our script works by looking at /var/lib/proxysql/proxysql_galera_checker.log:

Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.184:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.238:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Changing server 0:172.30.4.238:3306 to status ONLINE
Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.67:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Changing server 0:172.30.4.67:3306 to status ONLINE
Fri Sep  2 21:43:15 UTC 2016 Check server 1:172.30.4.184:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Check server 1:172.30.4.238:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:16 UTC 2016 Check server 1:172.30.4.67:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:16 UTC 2016 Number of writers online: 3 : hostgroup: 0
Fri Sep  2 21:43:16 UTC 2016 Number of writers reached, disabling extra write server 0:172.30.4.238:3306 to status OFFLINE_SOFT
Fri Sep  2 21:43:16 UTC 2016 Number of writers reached, disabling extra write server 0:172.30.4.67:3306 to status OFFLINE_SOFT
Fri Sep  2 21:43:16 UTC 2016 Enabling config

Looks ok. Next we can check mysql_servers table:

MySQL [(none)]> select hostgroup_id, hostname, status from mysql_servers;
+--------------+--------------+--------------+
| hostgroup_id | hostname     | status       |
+--------------+--------------+--------------+
| 0            | 172.30.4.238 | OFFLINE_SOFT |
| 0            | 172.30.4.184 | ONLINE       |
| 0            | 172.30.4.67  | OFFLINE_SOFT |
| 1            | 172.30.4.238 | ONLINE       |
| 1            | 172.30.4.184 | ONLINE       |
| 1            | 172.30.4.67  | ONLINE       |
+--------------+--------------+--------------+
6 rows in set (0.00 sec)

Again, everything looks as expected - one host is taking writes (172.30.4.184), all three are handling reads. Let’s start sysbench to generate some traffic and then we can check how ProxySQL will handle failure of the writer host.

[root@ip-172-30-4-215 ~]# while true ; do sysbench --test=/root/sysbench/sysbench/tests/db/oltp.lua --num-threads=6 --max-requests=0 --max-time=0 --mysql-host=172.30.4.215 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=6033 --oltp-tables-count=32 --report-interval=1 --oltp-skip-trx=on --oltp-read-only=off --oltp-table-size=100000  run ;done

We are going to simulate a crash by killing the mysqld process on host 172.30.4.184. This is what you’ll see on the application side:

[  45s] threads: 6, tps: 0.00, reads: 4891.00, writes: 1398.00, response time: 23.67ms (95%), errors: 0.00, reconnects:  0.00
[  46s] threads: 6, tps: 0.00, reads: 4973.00, writes: 1425.00, response time: 25.39ms (95%), errors: 0.00, reconnects:  0.00
[  47s] threads: 6, tps: 0.00, reads: 5057.99, writes: 1439.00, response time: 22.23ms (95%), errors: 0.00, reconnects:  0.00
[  48s] threads: 6, tps: 0.00, reads: 2743.96, writes: 774.99, response time: 23.26ms (95%), errors: 0.00, reconnects:  0.00
[  49s] threads: 6, tps: 0.00, reads: 0.00, writes: 1.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  50s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  51s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  52s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  53s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  54s] threads: 6, tps: 0.00, reads: 1235.02, writes: 354.01, response time: 6134.76ms (95%), errors: 0.00, reconnects:  0.00
[  55s] threads: 6, tps: 0.00, reads: 5067.98, writes: 1459.00, response time: 24.95ms (95%), errors: 0.00, reconnects:  0.00
[  56s] threads: 6, tps: 0.00, reads: 5131.00, writes: 1458.00, response time: 22.07ms (95%), errors: 0.00, reconnects:  0.00
[  57s] threads: 6, tps: 0.00, reads: 4936.02, writes: 1414.00, response time: 22.37ms (95%), errors: 0.00, reconnects:  0.00
[  58s] threads: 6, tps: 0.00, reads: 4929.99, writes: 1404.00, response time: 24.79ms (95%), errors: 0.00, reconnects:  0.00

There’s a ~5 seconds break but otherwise, no error was reported. Of course, your mileage may vary - all depends on Galera settings and your application. Such feat might not be possible if you use transactions in your application.

To summarize, we showed you how to configure read-write split in Galera Cluster using ProxySQL. There are a couple of limitations due to the way the proxy works, but as long as none of them are a blocker, you can use it and benefit from other ProxySQL features like caching or query rewriting. Please also keep in mind that the script we used for setting up read-write split is just an example which comes from ProxySQL. If you’d like it to cover more complex cases, you can easily write one tailored to your needs.

by Severalnines at September 05, 2016 02:02 PM

MariaDB AB

Real-time Data Streaming to Kafka with MaxScale CDC

Markus Mäkelä

The new Change Data Capture (CDC) protocol modules in MaxScale 2.0.0 can be used to convert binlog events into easy to stream data. These streams can be guided to other systems for further processing and in-depth analysis.

In this article, we set up a simple Kafka broker on CentOS 7 and publish changes in the database as JSON with the help of the new CDC protocol in MaxScale.

The tools we'll be using require Python 3, the pip Python package manager and the kafka-python package. You can install them with the following commands on CentOS 7.

sudo yum install epel-release
sudo yum install python34
curl https://bootstrap.pypa.io/get-pip.py|sudo python3.4
sudo pip3 install kafka-python

Note: CentOS 7 Python 3 package is broken and requires manual installation of the pip3 packager (seen in step three).

The environment consists of one MariaDB 10.0 master, MaxScale and one Kafka broker inside a docker container. We'll be running all the components (mysqld, maxscale, docker) on the same machine to make the setup and testing easier.

Configuring the Database

We start off by configuring the master server with row based replication by adding the following lines to its configuration file.

log-bin=binlog
binlog_format=row
binlog_row_image=full

With row based replication, all replication events contain the modified data. This information can be used to reconstruct each change in data as it happens. The MaxScale CDC protocol router, avrorouter, reads this information from the binary logs and converts it into the easy-to-process and compact Avro format.

Configuring MaxScale

The next step is to configure MaxScale. We add two services to the configuration, one for reading the binary logs from the master and another for converting them into JSON and streaming them to the clients.

The first service, named Binlog Service, registers as a slave to the MariaDB master and starts to read replication events. These events are stored to a local cache where the second service, named CDC Service, converts them into Avro format files (we'll configure that later).

Here's the configuration entry for the Binlog Service.

[Binlog Service]
type=service
router=binlogrouter
router_options=server-id=4000,binlogdir=/var/lib/maxscale,mariadb10-compatibility=1
user=maxuser
passwd=maxpwd

[Binlog Listener]
type=listener
service=Binlog Service
protocol=MySQLClient
port=3306

The maxuser:maxpwd credentials are used to connect to the master server and query for details about database users. Read the MaxScale Tutorial on how to create them.

The router_options parameter is the main way the binlogrouter module is configured. The server-id option is the server ID given to the master, binlogdir is the directory where the binlog files are stored and mariadb10-compatibility enables MariaDB 10.0 support in the binlogrouter. For more details on the binlogrouter options, read the Binlogrouter Documentation.

We also configured a MySQL protocol listener so that we can start the replication with the mysql cli.

After configuring the Binlog Service, we'll set up the CDC router and listeners.

[CDC Service]
type=service
router=avrorouter
source=Binlog Service
router_options=filestem=binlog
user=maxuser
passwd=maxpwd

[CDC Listener]
type=listener
service=CDC Service
protocol=CDC
port=4001

The configuration for the CDC Service is very simple. The only parameter we need to configure is source, which tells us which service we are going to use as the source for binary logs, and the router_option filestem, which tells the prefix of the binary log files. The CDC Service will read the Binlog Service configuration and gather all the required information from there and start the conversion process. For more details on how to fine-tune the avrorouter for optimal conversion speed, read the Avrorouter Documentation.

Setting up Kafka in Docker

After we have MaxScale configured, we'll start the Kafka broker inside Docker. We'll use the spotify/kafka image to set up a quick single node Kafka broker on the same machine MaxScale is running on.

sudo docker run -d --name kafka -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=192.168.0.100 --env ADVERTISED_PORT=9092 spotify/kafka

This command will start the Kafka broker inside a Docker container. The container will contain command line utilities that we can use to read the queues messages from the broker to confirm our setup is working.

Next we'll heave to create a topic where we can publish messages. We'll use the packaged console utilities in the docker container to create it and we'll call it CDC_DataStream.

sudo docker exec -ti kafka /opt/kafka_2.11-0.8.2.1/bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --topic CDC_DataStream --replication-factor 1 --partitions 1

We are using the docker container version of Kafka to simplify the setup and make this easy to reproduce. Read the Kafka Quickstart guide on information how to set up your own Kafka cluster and for more details on the tools used inside the container.

Starting Up MaxScale

The final step is to start the replication in MaxScale and stream events into the Kafka broker using the cdc and cdc_kafka_producer tools included in the MaxScale installation.

After starting MaxScale we connect to the Binlog Service on port 3306 and start replication.

CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3306, MASTER_USER='maxuser', MASTER_PASSWORD='maxpwd', MASTER_LOG_POS=4, MASTER_LOG_FILE='binlog.000001';
START SLAVE;

The CDC service allows us to query it for changes in a specific table. For the purpose of this article, we've created an extremely simple test table using the following statement and populated it with some data.

CREATE TABLE test.t1 (id INT);
INSERT INTO test.t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);

After we have created and populated the table on the master, the table and the changes in it will be replicated to MaxScale. MaxScale will cache the binlogs locally and convert them into the Avro format for quick streaming. After a short while, MaxScale should've read and converted enough data so that we can start querying it using the cdc tool.

Next, we'll query the CDC Service for change records on the test.t1 table. Since we didn't configure any extra users, we'll use the service user we configured for the service.

cdc -u maxuser -pmaxpwd -h 127.0.0.1 -P 4001 test.t1

We'll get a continuous stream of JSON objects printed into the standard output which we can utilize as the source for out Kafka streamer, the cdc_kafka_producer utility. All we have to do is to pipe the output of the cdc program into the cdc_kafka_producer to push it to the broker.

cdc -u maxuser -pmaxpwd -h 127.0.0.1 -P 4001 test.t1 | cdc_kafka_producer --kafka-broker=127.0.0.1:9092 --kafka-topic=CDC_DataStream

We send the changes to the broker listening on the port 9092 on the local host and publish them on the CDC_DataStream topic we created earlier. When we start the console consumer in another terminal, we'll see the events arriving as they are published on the broker.

[vagrant@maxscale ~]$ sudo docker exec -ti kafka /opt/kafka_2.11-0.8.2.1/bin/kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic CDC_DataStream
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "fields": [{"type": "int", "name": "domain"}, {"type": "int", "name": "server_id"}, {"type": "int", "name": "sequence"}, {"type": "int", "name": "event_number"}, {"type": "int", "name": "timestamp"}, {"type": {"symbols": ["insert", "update_before", "update_after", "delete"], "type": "enum", "name": "EVENT_TYPES"}, "name": "event_type"}, {"type": "int", "name": "id"}], "name": "ChangeRecord"}
{"domain": 0, "event_number": 1, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 1}
{"domain": 0, "event_number": 2, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 2}
{"domain": 0, "event_number": 3, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 3}
{"domain": 0, "event_number": 4, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 4}
{"domain": 0, "event_number": 5, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 5}
{"domain": 0, "event_number": 6, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 6}
{"domain": 0, "event_number": 7, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 7}
{"domain": 0, "event_number": 8, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 8}
{"domain": 0, "event_number": 9, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 9}

The first JSON object is the schema of the table and the objects following it are the actual changes. If we insert, delete or modify data on the database, we'll see the changes in JSON only seconds after they happen.

And that's it, we've successfully created a real-time stream representation of the database changes with the help of MaxScale CDC protocol and Kafka.

About the Author

Markus Mäkelä's picture

Markus Mäkelä is a Software Engineer working on MariaDB MaxScale. He graduated from Metropolia University of Applied Sciences in Helsinki, Finland.

by Markus Mäkelä at September 05, 2016 12:00 AM

September 02, 2016

Peter Zaitsev

InnoDB Troubleshooting: Q & A

InnoDB Troubleshooting

InnoDB TroubleshootingIn this blog, I will provide answers to the Q & A for the InnoDB Troubleshooting webinar.

First, I want to thank everybody for attending the August 11 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: What’s a good speed for buffer pool speed/size for maximum query performance?

A: I am sorry, I don’t quite understand the question. InnoDB buffer pool is an in-memory buffer. In an ideal case, your whole active dataset (rows that are accessed by application regularly) should be in the buffer pool. There is a good blog post by Peter Zaitsev describing how to find the best size for the buffer pool.

Q: Any maximum range for these InnoDB options?

A: I am again sorry, I only see questions after the webinar and don’t know which slide you were on when you asked about options. But generally speaking, the maximum ranges should be limited by hardware: the size of InnoDB buffer pool limited by the amount of physical memory you have, the size of

innodb_io_capacity
  limited by the number of IOPS which your disk can handle, and the number of concurrent threads limited by the number of CPU cores.

Q: On a AWS r3.4xlarge, 16 CPU, 119GB of RAM, EBS volumes, what innodb_thread_concurrency, innodb_read_io_threads, innodb_write_io_threads would you recommend? and innodb_read_io_capacity?

A:

innodb_thread_concurrency = 16, innodb_read_io_threads = 8, innodb_write_io_threads = 8, innodb_io_capacity
 — but it depends on the speed of your disks. As far as I know, AWS offers disks with different speeds. You should consult IOPS about what your disks can handle when setting
innodb_io_capacity
, and “Max IOPS” when setting 
innodb_io_capacity_max
.

Q: About InnoDB structures and parallelism: Are there InnoDB settings that can prevent or reduce latching (causes semaphore locks and shutdown after 600s) that occur trying to add an index object to memory but only DML queries on the primary key are running?

A: Unfortunately, semaphore locks for the

CREATE INDEX
 command are not avoidable. You only can affect other factors that speed up index creation. For example, how fast you write records to the disk or how many concurrent queries you run. Kill queries that are waiting for a lock too long. There is an old feature request asking to handle long semaphore waits gracefully. Consider clicking “Affects Me” button to bring it to the developers’ attention.

Q: How can we check these threads?

A: I assume you are asking about InnoDB threads? You can find information about running threads in

SHOW ENGINE INNODB STATUS
 :

--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 1; buffer pool: 0
529 OS file reads, 252 OS file writes, 251 OS fsyncs
0.74 reads/s, 16384 avg bytes/read, 7.97 writes/s, 7.94 fsyncs/s

And in the Performance Schema

THREADS
 table:

mysql> select thread_id, name, type from performance_schema.threads where name like '%innodb%';
+-----------+----------------------------------------+------------+
| thread_id | name                                   | type       |
+-----------+----------------------------------------+------------+
|         2 | thread/innodb/io_handler_thread        | BACKGROUND |
|         3 | thread/innodb/io_handler_thread        | BACKGROUND |
|         4 | thread/innodb/io_handler_thread        | BACKGROUND |
|         5 | thread/innodb/io_handler_thread        | BACKGROUND |
|         6 | thread/innodb/io_handler_thread        | BACKGROUND |
|         7 | thread/innodb/io_handler_thread        | BACKGROUND |
|         8 | thread/innodb/io_handler_thread        | BACKGROUND |
|         9 | thread/innodb/io_handler_thread        | BACKGROUND |
|        10 | thread/innodb/io_handler_thread        | BACKGROUND |
|        11 | thread/innodb/io_handler_thread        | BACKGROUND |
|        13 | thread/innodb/srv_lock_timeout_thread  | BACKGROUND |
|        14 | thread/innodb/srv_monitor_thread       | BACKGROUND |
|        15 | thread/innodb/srv_error_monitor_thread | BACKGROUND |
|        16 | thread/innodb/srv_master_thread        | BACKGROUND |
|        17 | thread/innodb/srv_purge_thread         | BACKGROUND |
|        18 | thread/innodb/page_cleaner_thread      | BACKGROUND |
|        19 | thread/innodb/lru_manager_thread       | BACKGROUND |
+-----------+----------------------------------------+------------+
17 rows in set (0.00 sec)

Q: Give brief on InnoDB thread is not same as connection thread.

A: You create a MySQL connection thread each time the client connects to the server. Generally, the lifetime of this thread is the same as the connection (I won’t discuss the thread cache and thread pool plugin here to avoid unnecessary complexity). This way, if you have 100 connections you have 100 connection threads. But not all of these threads do something. Some are actively querying MySQL, but others are sleeping. You can find the number of threads actively doing something if you examine the status variable

Threads_running
. InnoDB doesn’t create as many threads as connections to perform its job effectively. It creates fewer threads (ideally, it is same as the number of CPU cores). So, for example just 16 InnoDB threads can handle100 and more connection threads effectively.

Q: How can we delete bulk data in Percona XtraDB Cluster?  without affecting production? nearly 6 million records worth 40 GB size table

A: You can use the utility pt-archiver. It deletes rows in chunks. While your database will still have to handle all these writes, the option

--max-flow-ctl
  pauses a purge job if the cluster spent too much time pausing for flow control.

Q: Why do we sometimes get “–tc-heuristic-recover” message in error logs? Especially when we recover after a crash? What does this indicate? And should we commit or rollback?

A: This means you used two transactional engines that support XA in the same transaction, and mysqld crashed in the middle of the transaction. Now mysqld cannot determine which strategy to use when recovering transactions: either

COMMIT
 or
ROLLBACK
. Strangely, this option is documented as “not used”. It certainly is, however. Test case for bug #70860 proves it. I reported a documentation bug #82780.

Q: Which parameter controls the InnoDB thread count?

A: The main parameter is

innodb_thread_concurrency
. For fine tuning, use 
innodb_read_io_threads, innodb_write_io_threads, innodb_purge_threads, innodb_page_cleaners
. Q:

Q: At what frequency will the InnoDB status be dumped in a file by using innodb-status-file?

A: Approximately every 15 seconds, but it can vary slightly depending on the server load.

Q: I faced an issue that once disk got detached from running server due to some issue on AWS ec2. MySQL went to default mode. After MySQL stopped and started, we observed slave skipped some around 15 mins data. We got it by foreign key relationship issue. Can you please explain why it was skipped data in slave?

A: Amazon Aurora supports two kinds of replication: physical as implemented by Amazon (this is the default for replicas in the same region), and the regular asynchronous replication for cross-region replication. If you use the former, I cannot help you because this is a closed-source Amazon feature. You need to report a bug to Amazon. If you used the latter, this looks buggy too. According to my experience, it should not happen. With regular replication you need to check which transactions were applied (best if you use GTIDs, or at least the 

log-slave-updates
 option) and which were not. If you find a gap, report a bug at bugs.mysql.com.

Q: Can you explain more about adaptive hash index?

A: InnoDB stores its indexes on disks as a B-Tree. While B-Tree indexes are effective in general, some queries can take advantage of using much simpler hash indexes. While your server is in use, InnoDB analyzes the queries it is currently processing and builds an in-memory hash index inside the buffer pool (using the prefix of the B-Tree key). While adaptive hash index generally works well, “with some workloads, the speedup from hash index lookups greatly outweighs the extra work to monitor index lookups and maintain the hash index structure” Another issue with adaptive hash index is that until version 5.7.8, it was protected by a single latch — which could be a contention point under heavy workloads. Since 5.7.8, adaptive hash index can be partitioned. The number of parts is controlled by option 

innodb_adaptive_hash_index_parts
.

Save

by Sveta Smirnova at September 02, 2016 09:12 PM

MHA Quick Start Guide

MHA

high availabilityMHA (Master High Availability Manager and tools for MySQL) is one of the most important pieces of our managed services. When properly set up, it can check replication health, move writer and reader virtual IPs, perform failovers, and have its output constantly monitored by Nagios. Is it easy to deploy and follows the KISS (Keep It Simple, Stupid) philosophy that I love so much.

This blog post is a quick start guide to try it out and play with it in your own testing environment. I assume that you already know how to install software, deal with SSH keys and setup replication in MySQL. The post just covers MHA configuration.

Testing environment

Taken from /etc/hosts

192.168.1.116	mysql-server1
192.168.1.117   mysql-server2
192.168.1.118   mysql-server3
192.168.1.119   mha-manager

mysql-server1: Our master MySQL server with 5.6
mysql-server2: Slave server
mysql-server3: Slave server
mha-manager: The server monitors the replication and from where we manage MHA. The installation is also required to meet some Perl dependencies.

We just introduced some new concepts, the MHA Node and MHA Manager:

MHA Node

It is installed and runs on each MySQL server. This is the piece of software that it is invoked by the manager every time we want to do something, like for example a failover or a check.

MHA Manager

As explained before, this is our operations center. The manager monitors the services, replication, and includes several administrative command lines.

Pre-requisites

  • Replication must already be running. MHA manages replication and monitors it, but it is not a tool to deploy it. So MySQL and replication need to be running already.
  • All hosts should be able to connect to each other using public SSH keys.
  • All nodes need to be able to connect to each other’s MySQL servers.
  • All nodes should have the same replication user and password.
  • In the case of multi-master setups, only one writable node is allowed. All others need to be configured with read_only.
  • MySQL version has to be 5.0 or later.
  • Candidates for master failover should have binary log enabled. The replication user must exist there too.
  • Binary log filtering variables should be the same on all servers (replicate-wild%, binlog-do-db…).
  • Disable automatic relay-log purge and do it regularly from a cron task. You can use an MHA-included script called “purge_relay_logs”.

While that is a large list of requisites, I think that they are pretty standard and logical.

MHA installation

As explained before, the MHA Node needs to be installed on all the nodes. You can download it from this Google Drive link.

This post shows you how to install it using the source code, but there are RPM packages available. Deb too, but only for older versions. Use the installation method you prefer. This is how to compile it:

tar -xzf mha4mysql-node-0.57.tar.gz
perl Makefile.PL
make
make install

The commands included in the node package are save_binary_logs, filter_mysqlbinlog, purge_relay_logs, apply_diff_relay_logs. Mostly tools that the manager needs to call in order to perform a failover, while trying to minimize or avoid any data loss.

On the manager server, you need to install MHA Node plus MHA Manager. This is due to MHA Manager dependance on a Perl library that comes with MHA Node. The installation process is just the same.

Configuration

We only need one configuration file on the Manager node. The example below is a good starting point:

# cat /etc/app1.cnf
[server default]
# mysql user and password
user=root
password=supersecure
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
[server1]
hostname=mysql-server1
candidate_master=1
[server2]
hostname=mysql-server2
candidate_master=1
[server3]
hostname=mysql-server3
no_master=1

So pretty straightforward. It specifies that there are three servers, two that can be master and one that can’t be promoted to master.

Let’s check if we meet some of the pre-requisites. We are going to test if replication is working, can be monitored, and also if SSH connectivity works.

# masterha_check_ssh --conf=/etc/app1.cnf
[...]
[info] All SSH connection tests passed successfully.

It works. Now let’s check MySQL:

# masterha_check_repl --conf=/etc/app1.cnf
[...]
MySQL Replication Health is OK.

Start the manager and operations

Everything is setup, we meet the pre-requisites. We can start our manager:

# masterha_manager --remove_dead_master_conf --conf=/etc/app1.cnf
[...]
[info] Starting ping health check on mysql-server1(192.168.1.116:3306)..
[info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

The manager found our master and it is now actively monitoring it using a SELECT command. –remove_dead_master_conf tells the manager that if the master goes down, it must edit the config file and remove the master’s configuration from it after a successful failover. This avoids the “there is a dead slave” error when you restart the manager. All servers listed in the conf should be part of the replication and in good health, or the manager will refuse to work.

Automatic and manual failover

Good, everything is running as expected. What happens if the MySQL master dies!?!

[...]
[warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
[info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/log/mysql, up to mysql-bin.000002
[info] HealthCheck: SSH to mha-server1 is reachable.
[...]

First, it tries to connect by SSH to read the binary log and save it. MHA can apply the missing binary log events to the remaining slaves so they are up to date with all the before-failover info. Nice!

Theses different phases follow:

* Phase 1: Configuration Check Phase..
* Phase 2: Dead Master Shutdown Phase..
* Phase 3: Master Recovery Phase..
* Phase 3.1: Getting Latest Slaves Phase..
* Phase 3.2: Saving Dead Master's Binlog Phase..
* Phase 3.3: Determining New Master Phase..
[info] Finding the latest slave that has all relay logs for recovering other slaves..
[info] All slaves received relay logs to the same position. No need to resync each other.
[info] Starting master failover..
[info]
From:
mysql-server1(192.168.1.116:3306) (current master)
 +--mysql-server2(192.168.1.117:3306)
 +--mysql-server3(192.168.1.118:3306)
To:
mysql-server2(192.168.1.117:3306) (new master)
 +--mysql-server3(192.168.1.118:3306)
* Phase 3.3: New Master Diff Log Generation Phase..
* Phase 3.4: Master Log Apply Phase..
* Phase 4: Slaves Recovery Phase..
* Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
* Phase 4.2: Starting Parallel Slave Log Apply Phase..
* Phase 5: New master cleanup phase..

The phases are pretty self-explanatory. MHA tries to get all the data possible from the master’s binary log and slave’s relay log (the one that is more advanced) to avoid losing any data or promote a slave that it was far behind the master. So it tries to promote a slave with the most current data as possible. We see that server2 has been promoted to master, because in our configuration we specified that server3 shouldn’t be promoted.

After the failover, the manager service stops itself. If we check the config file, the failed server is not there anymore. Now the recovery is up to you. You need to get the old master back in the replication chain, then add it again to the config file and start the manager.

It is also possible to perform a manual failover (if, for example, you need to do some maintenance on the master server). To do that you need to:

  • Stop masterha_manager.
  • Run masterha_master_switch –master_state=alive –conf=/etc/app1.cnf. The line says that you want to switch the master, but the actual master is still alive, so no need to mark it as dead or remove it from the conf file.

And that’s it. Here is part of the output. It shows the tool making the decision on the new topology and asking the user for confirmation:

[info]
From:
mysql-server1(192.168.1.116:3306) (current master)
 +--mysql-server2(192.168.1.117:3306)
 +--mysql-server3(192.168.1.118:3306)
To:
mysql-server2(192.168.1.117:3306) (new master)
 +--mysql-server3(192.168.1.118:3306)
Starting master switch from mha-server1(192.168.1.116:3306) to mha-server2(192.168.1.117:3306)? (yes/NO): yes
[...]
[info] Switching master to mha-server2(192.168.1.117:3306) completed successfully.

You can also employ some extra parameters that are really useful in some cases:

–orig_master_is_new_slave: if you want to make the old master a slave of the new one.

–running_updates_limit: if the current master executes write queries that take more than this parameter’s setting, or if any of the MySQL slaves behind master take more than this parameter, the master switch aborts. By default, it’s 1 (1 second). All these checks are for safety reasons.

–interactive=0: if you want to skip all the confirmation requests and questions masterha_master_switch could ask.

Check this link in case you use GTID and want to avoid problems with errant transactions during the failover:

https://www.percona.com/blog/2015/12/02/gtid-failover-with-mysqlslavetrx-fix-errant-transactions/

Custom scripts

Since this is a quick guide to start playing around with MHA, I won’t cover advanced topics in detail. But I will mention a few:

    • Custom scripts. MHA can move IPs around, shutdown a server and send you a report in case something happens. It needs a custom script, however. MHA comes with some example scripts, but you would need to write one that fits your environment.The directives are master_ip_failover_script, shutdown_script, report_script. With them configured, MHA will send you an email or a message to your mobile device in the case of a failover, shutdown the server and move IPs between servers. Pretty nice!

Hope you found this quickstart guide useful for your own tests. Remember, one of the most important things: don’t overdo automation!  😉 These tools are good for checking health and performing the first initial failover. But you must still investigate what happened, why, fix it and work to avoid it from happening again. In high availability (HA) environments, automate everything and cause it to stop being HA.

Have fun!

by Miguel Angel Nieto at September 02, 2016 07:22 PM

MariaDB AB

How to Stream Change Data through MariaDB MaxScale using CDC API

Massimiliano Pinto

In the previous two blog posts, I introduced Data Streaming with MariaDB MaxScale and demonstrated how to configure MariaDB Master and MariaDB MaxScale for Data Streaming. In this post, I will review how to use the Change Data Capture (CDC) protocol to request streaming data from MariaDB MaxScale.


CDC Protocol

CDC is a new protocol introduced in MariaDB MaxScale 2.0 that allows clients to authenticate and register for CDC events. The new protocol is to be used in conjunction with AVRO router which currently converts binlog events into AVRO records. The CDC protocol is used by clients to request and receive change data events through MariaDB MaxScale either in AVRO or JSON format.

Protocol Phases

There are three phases of the protocol.

1. Connection and Authentication

  • Client connects to MariaDB MaxScale CDC protocol listener.

  • The authentication starts when the client sends the hexadecimal representation of the username concatenated with a colon (:) and the SHA1 of the password.

  • Example: For the user foobar with a password of foopasswd, client should send the following hexadecimal string

foobar:SHA1(foopasswd) ->  666f6f6261723a3137336363643535253331

Then MaxScale returns OK on success and ERR on failure.

2. Registration

  • Client sends UUID and specifies the output format (AVRO or JSON) for data retrieval.

  • Example: The following message from the client registers the client to receive AVRO change data events

REGISTER UUID=11ec2300-2e23-11e6-8308-0002a5d5c51b, TYPE=AVRO

Then MaxScale returns OK on success and ERR on failure.

3. Change Data Capture Commands

  • Request Change Data Events

REQUEST-DATA DATABASE.TABLE[.VERSION] [GTID]

This command fetches data from a specified table in a database and returns the output in the requested format (AVRO or JSON). Data records are sent to clients and if new AVRO versions are found (e.g. mydb.mytable.0000002.avro) the new schema and data will be sent as well.

The data will be streamed until the client closes the connection.

Clients should continue reading from network in order to automatically gets new events.

  • Examples:

REQUEST-DATA db1.table1
REQUEST-DATA dbi1.table1.000003
REQUEST-DATA db2.table4 0-11-345

Example output in JSON, note the AVRO schema and database records (“id” is the only column in table1) at the end:

{"fields": [{"type": "int", "name": "domain"}, {"type": "int", "name": "server_id"}, {"type": "int", "name": "sequence"}, {"type": "int", "name": "event_number"}, {"type": "int", "name": "timestamp"}, {"type": {"type": "enum", "symbols": ["insert", "update_before", "update_after", "delete"], "name": "EVENT_TYPES"}, "name": "event_type"}, {"type": "int", "name": "id"}], "type": "record", "namespace": "MariaDB MaxScaleChangeDataSchema.avro", "name": "ChangeRecord"}                    
{"event_number": 1, "event_type": "insert", "domain": 0, "sequence": 225, "server_id": 1, "timestamp": 1463037556, "id": 1111}
{"event_number": 4, "event_type": "insert", "domain": 0, "sequence": 225, "server_id": 1, "timestamp": 1463037563,  "id": 4444}

Note: When a table has been altered a new schema is generated by the AVRO converter and a new AVRO file comes with that schema.

  • Request Change Data Event Statistics

QUERY-LAST-TRANSACTION

Returns JSON only with last GTID, timestamp and affected tables.

Example output:

{"GTID": "0-1-178", "events": 2, "timestamp": 1462290084, "tables": ["db1.tb1", “db2.tb2”]}

Last GTID could then be later used in a REQUEST-DATA query.

QUERY-TRANSACTION 0-14-1245

Returns JSON events from the specified GTID, the commit timestamp and affected tables:

  • Example output:
{"GTID": "0-14-1245", "events": 23, "timestamp": 1462291124, "tables": ["db1.tb3"]}

 

How to Quickly Connect to the MaxScale CDC Protocol

The avrorouter comes with an example client program, cdc, written in Python 3. This client can connect to MaxScale configured with the CDC protocol and the avrorouter.

Before using this client, you will need to install the Python 3 interpreter and add users to the service with thecdc_users script. For more details about the user creation, please refer to the CDC Protocol and CDC Users documentation.

The output of #cdc --help provides a full list of supported options and a short usage description of the client program.

-bash-4.1$ python3 cdc --help
usage: cdc [--help] [-h HOST] [-P PORT] [-u USER] [-p PASSWORD]
           [-f {AVRO,JSON}] [-t READ_TIMEOUT]
           FILE [GTID]

CDC Binary consumer

positional arguments:
  FILE                  Requested table name in the following format:
                        DATABASE.TABLE[.VERSION]
  GTID                  Requested GTID position

optional arguments:
  --help                show this help message and exit
  -h HOST, --host HOST  Network address where the connection is made
  -P PORT, --port PORT  Port where the connection is made
  -u USER, --user USER  Username used when connecting
  -p PASSWORD, --password PASSWORD
                        Password used when connecting
  -f {AVRO,JSON}, --format {AVRO,JSON}
                        Data transmission format
  -t READ_TIMEOUT, --timeout READ_TIMEOUT
                        Read timeout

Here is an example of how to query data in JSON using the cdc Python script. It queries the table test.mytable for all change records.

# cdc --user=myuser --password=mypasswd --host=127.0.0.1 --port=4001 test.mytable

The AVRO binary output will look like this:

Objavro.codenullavro.schema?{"type":"record","name":"ChangeRecord","namespace":"MariaDB MaxScaleChangeDataSchema.avro","fields":[{"name":"domain","type":{"type":"int"}},{"name":"server_id","type":{"type":"int"}},{"name":"sequence","type":{"type":"int"}},{"name":"event_number","type":{"type":"int"}},{"name":"timestamp","type":{"type":"int"}},{"name":"event_type","type":{"type":"enum","name":"EVENT_TYPES","symbols":["insert","update_before","update_after","delete"]}},{"name":"id","type":{"type":"int"}}]}??7)??.?v?r???????
??7)??.?v?r???????
???7)??.?v?r???????
???7)??.?v?r????●?
??䗏?

Note: Each client could request data from one AVRO file (it contains one schema only).

In order to get all the events in a transaction that affects N tables, N connections (one per each table) are needed.

The application stays in the read loop and it will gets additional events as long as they come in. All clients are notified with new events.

Assume a transaction affects two tables, tbl1 and tbl2 in db1:

BEGIN;
INSERT into db1.tbl1 (id) VALUES (111);
​INSERT into db1.tbl2 (id) VALUES (222);
COMMIT;


The user should start two clients:

# cdc --user=myuser --password=mypasswd --host=127.0.0.1 --port=4001 db1.tbl1
# cdc --user=myuser --password=mypasswd --host=127.0.0.1 --port=4001 db1.tbl2

 

Example Client (python code):

#!/usr/bin/env python3

import time
import json
import re
import sys
import socket
import hashlib
import argparse
import subprocess
import selectors
import binascii
import os

# Read data as JSON
def read_json():
    decoder = json.JSONDecoder()
    rbuf = bytes()
    ep = selectors.EpollSelector()
    ep.register(sock, selectors.EVENT_READ)

    while True:
        pollrc = ep.select(timeout=int(opts.read_timeout) if int(opts.read_timeout) > 0 else None)
        try:
            buf = sock.recv(4096, socket.MSG_DONTWAIT)
            rbuf += buf
            while True:
                rbuf = rbuf.lstrip()
                data = decoder.raw_decode(rbuf.decode('ascii'))
                rbuf = rbuf[data[1]:]
                print(json.dumps(data[0]))
        except ValueError as err:
            sys.stdout.flush()
            pass
        except Exception:
            break

# Read data as Avro
def read_avro():
    ep = selectors.EpollSelector()
    ep.register(sock, selectors.EVENT_READ)

    while True:
        pollrc = ep.select(timeout=int(opts.read_timeout) if int(opts.read_timeout) > 0 else None)
        try:
            buf = sock.recv(4096, socket.MSG_DONTWAIT)
            os.write(sys.stdout.fileno(), buf)
            sys.stdout.flush()
        except Exception:
            break

parser = argparse.ArgumentParser(description = "CDC Binary consumer", conflict_handler="resolve")
parser.add_argument("-h", "--host", dest="host", help="Network address where the connection is made", default="localhost")
parser.add_argument("-P", "--port", dest="port", help="Port where the connection is made", default="4001")
parser.add_argument("-u", "--user", dest="user", help="Username used when connecting", default="")
parser.add_argument("-p", "--password", dest="password", help="Password used when connecting", default="")
parser.add_argument("-f", "--format", dest="format", help="Data transmission format", default="JSON", choices=["AVRO", "JSON"])
parser.add_argument("-t", "--timeout", dest="read_timeout", help="Read timeout", default=0)
parser.add_argument("FILE", help="Requested table name in the following format: DATABASE.TABLE[.VERSION]")
parser.add_argument("GTID", help="Requested GTID position", default=None, nargs='?')

opts = parser.parse_args(sys.argv[1:])

sock = socket.create_connection([opts.host, opts.port])

# Authentication
auth_string = binascii.b2a_hex((opts.user + ":").encode())
auth_string += bytes(hashlib.sha1(opts.password.encode("utf_8")).hexdigest().encode())
sock.send(auth_string)

# Discard the response
response = str(sock.recv(1024)).encode('utf_8')

# Register as a client as request Avro format data
sock.send(bytes(("REGISTER UUID=XXX-YYY_YYY, TYPE=" + opts.format).encode()))

# Discard the response again
response = str(sock.recv(1024)).encode('utf_8')

# Request a data stream
sock.send(bytes(("REQUEST-DATA " + opts.FILE + (" " + opts.GTID if opts.GTID else "")).encode()))

if opts.format == "JSON":
    read_json()
elif opts.format == "AVRO":
    read_avro()

Additional example applications can be found here:

https://github.com/mariadb-corporation/MaxScale/tree/2.0/server/modules/routing/avro

Up next in this blog series, Markus Makela will show you how to use the CDC API in a Kafka Producer.

Relevant Links

About the Author

Massimiliano Pinto's picture

Massimiliano is a Senior Software Solutions Engineer working mainly on MaxScale. Massimiliano has worked for almost 15 years in Web Companies playing the roles of Technical Leader and Software Engineer. Prior to joining MariaDB he worked at Banzai Group and Matrix S.p.A, big players in the Italy Web Industry. He is still a guy who likes too much the terminal window on his Mac. Apache modules and PHP extensions skills are included as well.

by Massimiliano Pinto at September 02, 2016 03:12 PM

September 01, 2016

Peter Zaitsev

Percona Live Europe featured talk with Manyi Lu — MySQL 8.0: what’s new in Optimizer

Percona Live Europe featured talk

percona live europe featured talkWelcome to a new Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Manyi Lu, Director Software Development at Oracle. Her talk will be on MySQL 8.0: what’s new in Optimizer. There are substantial improvements in the optimizer in MySQL 5.7 and MySQL 8.0. Most noticeably, users can now combine relational data with NoSQL using the new JSON features. I had a chance to speak with Manyi and learn a bit more about the MySQL 8.0:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Manyi: Oh, my interest in database development goes way back to university almost twenty years ago. After graduation, I joined local startup Clustra and worked on the development of a highly available distributed database system for the telecom sector. Since then, I have worked on various aspects of the database, kernel, and replication. Lately I am heading the MySQL optimizer and GIS team.

What I love most about my work are the talented and dedicated people I am surrounded by, both within the team and in the MySQL community. We are passionate about building a database used by millions.

Percona: Your talk is called “MySQL 8.0: what’s new in Optimizer.” So, obvious question, what is new in the MySQL 8.0 Optimizer?

Manyi: There are a number of interesting features in 8.0. CTE or Common Table Expression has been one of the most demanded SQL features. MySQL 8.0 will support both the WITH and WITH RECURSIVE clausesA recursive CTE is quite useful for reproducing reports based on hierarchical data. For DBAs, Invisible Index should make life easier. They can mark an index invisible to the optimizer, check the performance and then decide to either drop it or keep it. On the performance side, we have improved the performance of table scans, range scans and similar queries by batching up records read from the storage engine into the server. We have significant work happening in the cost model area. In order to produce more optimal query plans, we have started the work on adding support for histograms, and for taking into account whether data already is in memory or needs to be read from disk.

Besides the optimizer, my team is also putting a major effort into utf8mb4 support. We have added a large set of utf8mb4 collations based on the latest Unicode standard. These collations have better support for emojis and languages. Utf8 is the dominating character encoding for the web, and this move will make the life easier for the vast majority of MySQL users. We also plan to add support for accent and case sensitive collations.

Please keep in mind that 8.0.0 is the first milestone release. There are quite a few features in the pipeline down the road.

Percona: How are some of the bridges between relational and NoSQL environments (like JSON support) of benefit to database deployments?

Manyi: The JSON support that we introduced in 5.7 has been immensely popular because it solves some very basic day-to-day problems. Relational database forces you to have a fixed schema, and the JSON datatype gives you the flexibility to store data without a schema. In the past, people stored relational data in MySQL and had to install yet another datastore to handle unstructured or semi-structured data that are schema-less in nature. With JSON support, you can store both relational and non-relational data in the same database, which makes database deployment much simpler. And not only that, but you can also perform queries across the boundaries of relational and non-relational data.

Clients that communicate with a MySQL Server using the newly introduced X Protocol can use the X DevAPI to develop applications. Developers do not even need to understand SQL if they do not want to. There are a number of connectors that support the X protocol, so you can use X DevApi in your preferred programming language. We have made MySQL more appealing to a larger range of developers.

Percona: What is the latest on the refactoring of the MySQL Optimizer and Parser?

Manyi: The codebase of optimizer and parser used to be quite messy. The parsing, optimizing and execution stages were intermingled, and the code was hard to maintain. We have had a long-running effort to clean up the codebase. In 5.7, the optimization stage was separated from the execution stage. In 8.0, the focus is refactoring the prepare stage and complete parser rewrite.

We have already seen the benefits of the refactoring work. Development time on new features has been reduced. CTE is a good example. Without refactoring done previously, it would have taken much longer to implement CTE. With a cleaner codebase, we also managed to reduce the bug count, which means more development resources can be allocated to new features instead of maintenance.

Percona: Where do you see MySQL heading in order to deal with some of the database trends that keep you awake at night?

Manyi: One industry trend is cloud computing and Database as a Service becoming viable options to in-house databases. In particular, it speeds up technology deployments and reduces initial investments for smaller organizations. MySQL, being the most popular open source database, fits well into the cloud data management trend.

What we can do is make MySQL even better in the cloud setting. E.g., better support for horizontal scaling, fail-over, sharding, cross-shard queries and the like.

Percona: What are looking forward to the most at Percona Live Europe this year?

Manyi: I like to speak and get feedback from MySQL users. Their input has a big impact on our roadmap. I also look forward to learning more about innovations by web-scale players like Facebook, Alibaba and others. I always feel more energized after talking to people who are passionate about MySQL and databases in general.

You can learn more about Manyi and her thoughts on MySQL 8.0 here: http://mysqlserverteam.com/

Want to find out more about Manyi, MySQL and Oracle? Register for Percona Live Europe 2016, and see her talk MySQL 8.0: what’s new in Optimizer.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

by Dave Avery at September 01, 2016 09:09 PM

MariaDB Foundation

MariaDB Connector/J 1.5.2 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Connector/J 1.5.2 Stable (GA). This is the first stable release in the MariaDB Connector/J 1.5 series and includes several new features including a native SSPI Windows implementation, support for TLSv1.1 and TLSv1.2, Aurora host auto-discovery, “LOAD DATA INFILE” interceptors, performance improvements, bug fixes, […]

The post MariaDB Connector/J 1.5.2 now available appeared first on MariaDB.org.

by Daniel Bartholomew at September 01, 2016 02:37 PM

Jean-Jerome Schmidt

Planets9s - Intro to Docker Swarm Mode, MySQL Query Tuning replay and more ...

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

MySQL Query Tuning: Process & Tools - watch the replay

You can now watch the replay of this week’s webinar on MySQL Query Tuning: Process & Tools. Many thanks to everyone who participated. We looked at the query tuning process for MySQL from building, collecting, analysing, through to tuning and testing. And we discussed some of the main tools involved in this process, such as tcpdump and Pt-query-digest.

Watch the replay

MySQL on Docker: Introduction to Docker Swarm Mode and Multi-Host Networking

While we recently looked into Docker’s single-host networking for MySQL containers, we are now discussing the basics of multi-host networking and Docker swarm mode, a built-in orchestration tool to manage containers across multiple hosts. It is recommended to use an orchestration tool to get more manageability and scalability on top of your Docker engine cluster, and this blog post provides insight and pointers as to how to proceed with that.

Read the blog

Severalnines talks & tutorials at Percona Live Amsterdam

If you’re in Europe early October, then you may want to drop in to the Percona Live conference in Amsterdam, which this year covers a wide range of specialist topics for MySQL, MongoDB & PostgreSQL users. We’ll be there as well with a number of talks and tutorials, as well as a presence in the exhibition hall and we’d love to meet you there. We’ll be talking about how to become a MySQL and MongoDB DBA, load balancers and more. There’s still time to sign up!

Find out more

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at September 01, 2016 02:03 PM

MySQL Query Tuning: Process & Tools - View the Replay & Slides

Many thanks to everyone who participated in this week’s first part of our webinar trilogy on MySQL Query Tuning. This time we looked at the process and tools to consider when tuning MySQL queries. It was an interactive session with interesting feedback and questions, some of which we’d like to share in this blog.

The majority of this week’s participants currently use MySQL 5.6 or 5.7 (54%), followed by Percona Server 5.6 or 5.7. It was nice to see that many of our participants are on the most recent versions of their databases of choice. As we discussed query tuning, we also found that 100% of participants use slow log to collect their slow queries as well as other tools, such as tcpdump or performance schema. And everyone seems to run full query reviews, whether it’d be monthly, quarterly or yearly …

You can view the detail of the polls we ran during the webinar below in this blog or by watching the replay.

Watch the webinar replay or view the slides

When done right, Tuning MySQL queries and indexes can increase the performance of your application and decrease response times. We’re covering this complex topic over the course of three webinars of 60 minutes each, of which this was part 1; feel free to also register for parts 2 & 3 here.

Agenda Part 1

  • MySQL Query Tuning Trilogy: Process and tools
    • Query tuning process
    • Build
    • Collect
    • Analyse
    • Tune
    • Test
  • Tools
    • tcpdump
    • Pt-query-digest

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience in managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. He’s the main author of the Severalnines blog and webinar series: Become a MySQL DBA.

Webinar Poll Results

by Severalnines at September 01, 2016 01:12 PM

August 30, 2016

Peter Zaitsev

Webinar Thursday, September 1 – MongoDB Security: A Practical Approach

Percona MySQL and MongoDB Webinars

MongoDB SecurityPlease join David Murphy as he presents a webinar Thursday, September 1 at 10 am PDT (UTC-7) on MongoDB Security: A Practical Approach. (Date changed*)

Register
This webinar will discuss the many features and options available in the MongoDB community to help secure your database environment. First, we will cover how these features work and how to fill in known gaps. Next, we will look at the enterprise-type features shared by Percona Server for MongoDB and MongoDB Enterprise. Finally, we will examine some disk and network designs that can ensure your security out of the gate – you don’t want to read about how your MongoDB database leaked hundreds of gigs of data on someone’s security blog!

We will cover the following topics:

  • Using SSL for all things
  • Limiting network access
  • Custom roles
  • The missing true SUPER user
  • Wildcarding databases and collections
  • Adding specific actions to a normal role
  • LDAP
  • Auditing
  • Disk encryption
  • Network designs

Register for the webinar here.

MongoDB SecurityDavid Murphy, MongoDB Practice Manager
David joined Percona in October 2015 as Practice Manager for MongoDB. Prior to that, David was on the ObjectRocket by Rackspace team as the Lead DBA. With the growth involved with any recently acquired startup, David’s role covered a wide range from evangelism, research, run book development, knowledgebase design, consulting, technical account management, mentoring and much more. Prior to the world of MongoDB had been a MySQL and NoSQL architect at Electronic Arts working with some of the largest titles in the world like FIFA, SimCity, and Battle Field providing tuning, design, and technology choice responsibilities. David maintains an active interest in database speaking and exploring new technologies.

by Dave Avery at August 30, 2016 09:23 PM

MySQL Sharding with ProxySQL

MySQL Sharding with ProxySQL

MySQL Sharding with ProxySQLThis article demonstrates how MySQL sharding with ProxySQL works.

Recently a colleague of mine asked me to provide a simple example on how ProxySQL performs sharding.

In response, I’m writing this short tutorial in the hope it will illustrate ProxySQL’s sharding functionalities, and help people out there better understand how to use it.

ProxySQL is a very powerful platform that allows us to manipulate and manage our connections and queries in a simple but effective way. This article shows you how.

Before starting let’s clarify some basic concepts.

  • ProxySQL organizes its internal set of servers in Host Groups (HG), and each HG can be associated with users and Query Rules (QR)
  • Each QR can be final (apply = 1) or = let ProxySQL continue to parse other QRs
  • A QR can be a rewrite action, be a simple match, have a specific target HG, or be generic
  • QRs are defined using regex

You can see QRs as a sequence of filters and transformations that you can arrange as you like.

These simple basic rules give us enormous flexibility. They allow us to create very simple actions like a simple query re-write, or very complex chains that could see dozens of QR concatenated. Documentation can be found here.

The information related to HGs or QRs is easily accessible using the ProxySQL administrator interface, in the tables mysql_servers, mysql_query_rules and stats.stats_mysql_query_rules. The last one allows us to evaluate if and how the rule(s) is used.

With regards to sharding, what can ProxySQL do to help us achieve what we need (in a relatively easy way)? Some people/companies include sharding logic in the application, use multiple connections to reach the different targets, or have some logic to split the load across several schemas/tables. ProxySQL allows us to simplify the way connectivity and query distribution is supposed to work reading data in the query or accepting HINTS.

No matter what the requirements, the sharding exercise can be summarized in a few different categories.

  • By splitting the data inside the same container (like having a shard by State where each State is a schema)
  • By physical data location (this can have multiple MySQL servers in the same room, as well as having them geographically distributed)
  • A combination of the two, where I do split by State using a dedicated server, and again split by schema/table by whatever (say by gender)

In the following examples, I show how to use ProxySQL to cover the three different scenarios defined above (and a bit more).

The example below will report text from the Admin ProxySQL interface and the MySQL console. I will mark each one as follows:

  • Mc for MySQL console
  • Pa for ProxySQL Admin

Please note that the MySQL console MUST use the -c flag to pass the comments in the query. This is because the default behavior in the MySQL console is to remove the comments.

I am going to illustrate procedures that you can replicate on your laptop, and when possible I will mention a real implementation. This because I want you to directly test the ProxySQL functionalities.

For the example described below I have a PrxySQL v1.2.2 that is going to become the master in few days. You can download it from:

git clone https://github.com/sysown/proxysql.git
git checkout v1.2.2

Then to compile:

cd <path to proxy source code>
make
make install

If you need full instructions on how to install and configure ProxySQL, read here and here.

Finally, you need to have the WORLD test DB loaded. WORLD test DB can be found here.

Shard inside the same MySQL Server using three different schemas split by continent

Obviously, you can have any number of shards and relative schemas. What is relevant here is demonstrating how traffic gets redirected to different targets (schemas), maintaining the same structure (tables), by discriminating the target based on some relevant information in the data or pass by the application.

OK, let us roll the ball.

[Mc]
+---------------+-------------+
| Continent     | count(Code) |
+---------------+-------------+
| Asia          |          51 | <--
| Europe        |          46 | <--
| North America |          37 |
| Africa        |          58 | <--
| Oceania       |          28 |
| Antarctica    |           5 |
| South America |          14 |
+---------------+-------------+

For this exercise, I will use three hosts in replica.

To summarize, I will need:

  • Three hosts: 192.168.1.[5-6-7]
  • Three schemas: Continent X + world schema
  • One user : user_shardRW
  • Three hostgroups: 10, 20, 30 (for future use)

First, we will create the schemas Asia, Africa, Europe:

[Mc]
Create schema [Asia|Europe|North_America|Africa];
create table Asia.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='Asia' ;
create table Europe.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='Europe' ;
create table Africa.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='Africa' ;
create table North_America.City as select a.* from  world.City a join Country on a.CountryCode = Country.code where Continent='North America' ;
create table Asia.Country as select * from  world.Country where Continent='Asia' ;
create table Europe.Country as select * from  world.Country where Continent='Europe' ;
create table Africa.Country as select * from  world.Country  where Continent='Africa' ;
create table North_America.Country as select * from  world.Country where Continent='North America' ;

Now, create the user

grant all on *.* to user_shardRW@'%' identified by 'test';

Now let us start to configure the ProxySQL:

[Pa]
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('user_shardRW','test',1,10,'test_shard1');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',10,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',20,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',30,3306,100);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

With this we have defined the user, the servers and the host groups.

Let us start to define the logic with the query rules:

[Pa]
delete from mysql_query_rules where rule_id > 30;
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (31,1,'user_shardRW',"^SELECT\s*(.*)\s*from\s*world.(\S*)\s(.*).*Continent='(\S*)'\s*(\s*.*)$","SELECT \1 from \4.\2 WHERE 1=1 \5",1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

I am now going to query the master (or a single node), but I am expecting ProxySQL to redirect the query to the right shard, catching the value of the continent:

[Mc]
 SELECT name,population from world.City  WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;
+------+------------+
| name | population |
+------+------------+
| Roma |    2643581 |
+------+------------+

You can say: “Hey! You are querying the schema World, of course you get back the correct data.”

This is not what really happened. ProxySQL did not query the schema World, but the schema Europe.

Let’s look at the details:

[Pa]
select * from stats_mysql_query_digest;
Original    :SELECT name,population from world.City  WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;
Transformed :SELECT name,population from Europe.City WHERE ?=? and CountryCode=? order by population desc limit ?

Let me explain what happened.

Rule 31 in ProxySQL will take all the FIELDS we will pass in the query. It will catch the CONTINENT in the WHERE clause, it will take any condition after WHERE and it will reorganize the queries all using the RegEx.

Does this work for any table in the sharded schemas? Of course it does.

A query like:

SELECT name,population from world.Country WHERE Continent='Asia' ;

Will be transformed into:
SELECT name,population from Asia.Country WHERE ?=?

[Mc]
+----------------------+------------+
| name                 | population |
+----------------------+------------+
| Afghanistan          |   22720000 |
| United Arab Emirates |    2441000 |
| Armenia              |    3520000 |
<snip ...>
| Vietnam              |   79832000 |
| Yemen                |   18112000 |
+----------------------+------------+

Another possible approach is to instruct ProxySQL to shard is to pass a hint inside a comment. Let see how.

First let me disable the rule I just inserted. This is not really needed, but we’ll do it so you can see how. 🙂

[Pa]
mysql> update mysql_query_rules set active=0 where rule_id=31;
Query OK, 1 row affected (0.00 sec)
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Done.

Now what I want is for *ANY* query that contains the comment /* continent=X */ to go to the continent X schema, same server.

To do so, I instruct ProxySQL to replace any reference to the world schema inside the query I am going to submit.

[Pa]
delete from mysql_query_rules where rule_id in (31,33,34,35,36);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (31,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Asia\s*\*.*",null,0,23,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (32,1,'user_shardRW','world.','Asia.',0,23,23);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (33,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Europe\s*\*.*",null,0,25,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (34,1,'user_shardRW','world.','Europe.',0,25,25);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (35,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Africa\s*\*.*",null,0,24,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (36,1,'user_shardRW','world.','Africa.',0,24,24);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

How does this work?

I have defined two concatenated rules. The first captures the incoming query containing the desired value (like continent = Asia). If the match is there, ProxySQL exits that action, but while doing so it will read the Apply field. If Apply is 0, it will read the FlagOUT value. At this point it will go to the first rule (in sequence) that has the value of FlagIN equal to the FlagOUT.

The second rule gets the request and will replace the value of world with the one I have defined. In short, it replaces whatever is in the match_pattern with the value that is in the replace_pattern.

Now ProxySQL implements the Re2 Google library for RegEx. Re2 is very fast but has some limitations, like it does NOT support (at the time of the writing) the flag option g. In other words, if I have a select with many tables, and as such several “world”, Re2 will replace ONLY the first instance.

As such, a query like:

Select /* continent=Europe */ * from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

Will be transformed into:

Select /* continent=Europe */ * from Europe.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

And fail.

The other day, Rene and I were discussing how to solve this given the lack of implementation in Re2. Finally, we opted for recursive actions.

What does this mean? It means that ProxySQL from v1.2.2 now has a new functionality that allows recursive calls to a Query Rule. The maximum number of iterations that ProxySQL can run is managed by the option (global variable) mysql-query_processor_iterations. Mysql-query_processor_iterations define how many operations a query process can execute as whole (from start to end).

This new implementation allows us to reference a Query Rule to itself to be executed multiple times.

If you go back you will notice that QR 34 has FlagIN and FlagOUT pointing to the same value of 25 and Apply =0. This brings ProxySQL to recursively call rule 34 until it changes ALL the values of the word world.

The result is the following:

[Mc]
Select /* continent=Europe */ Code, City.Name, City.population  from world.Country join world.City on world.City.CountryCode=world.Country.Code where City.population > 10000 group by Name order by City.Population desc limit 5;
+------+---------------+------------+
| Code | Name          | population |
+------+---------------+------------+
| RUS  | Moscow        |    8389200 |
| GBR  | London        |    7285000 |
| RUS  | St Petersburg |    4694000 |
| DEU  | Berlin        |    3386667 |
| ESP  | Madrid        |    2879052 |
+------+---------------+------------+

You can see ProxySQL internal information using the following queries:

[Pa]
 select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| 1      | 1    | 33      | NULL                | \S*\s*\/\*\s*continent=.*Europe\s*\*.* | NULL            | NULL      | 0     | 0      | 25      |
| 1      | 4    | 34      | NULL                | world.                                 | Europe.         | NULL      | 0     | 25     | 25      |
| 1      | 0    | 35      | NULL                | \S*\s*\/\*\s*continent=.*Africa\s*\*.* | NULL            | NULL      | 0     | 0      | 24      |
| 1      | 0    | 36      | NULL                | world.                                 | Africa.         | NULL      | 0     | 24     | 24      |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

And:

[Pa]
select * from stats_mysql_query_digest;
<snip and taking only digest_text>
Select Code, City.Name, City.population from Europe.Country join Europe.City on Europe.City.CountryCode=Europe.Country.Code where City.population > ? group by Name order by City.Population desc limit ?

As you can see ProxySQL has nicely replaced the word world in the query to Europe, and it ran Query Rule 34 four times (hits).

This is obviously working for Insert/Update/Delete as well.

Queries like:

insert into  /* continent=Europe */  world.City values(999999,'AAAAAAA','ITA','ROMA',0) ;

Will be transformed into:

[Pa]
select digest_text from stats_mysql_query_digest;
+-------------------------------------------+
| digest_text                               |
+-------------------------------------------+
| insert into Europe.City values(?,?,?,?,?) |
+-------------------------------------------+

And executed only on the desired schema.

Sharding by host

Using hint

How can I shard and redirect the queries to a host (instead of a schema)? This is even easier! 🙂

The main point is that whatever matches the rule should go to a defined HG. No rewrite imply, which means less work.

So how this is done? As before, I have three NODES: 192.168.1.[5-6-7]. For this example, I will use world DB (no continent schema), distributed in each node, and I will retrieve the node bound to the IP to be sure I am going to the right place.

I instruct ProxySQL to send my query by using a HINT to a specific host. I choose the hint “shard_host_HG”, and I am going to inject it in the query as a comment.

As such the Query Rules will be:

[Pa]
delete from mysql_query_rules where rule_id in (40,41,42, 10,11,12);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (10,1,'user_shardRW',"\/\*\s*shard_host_HG=.*Europe\s*\*.",10,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (11,1,'user_shardRW',"\/\*\s*shard_host_HG=.*Asia\s*\*.",20,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (12,1,'user_shardRW',"\/\*\s*shard_host_HG=.*Africa\s*\*.",30,0);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

While the queries I am going to test are:

[Mc]
Select /* shard_host_HG=Europe */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' limit 5; SELECT * /* shard_host_HG=Europe */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
Select /* shard_host_HG=Asia */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='IND' limit 5; SELECT * /* shard_host_HG=Asia */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
Select /* shard_host_HG=Africa */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ETH' limit 5; SELECT * /* shard_host_HG=Africa */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';

Running the query for Africa, I will get:

[Mc]
Select /* shard_host_HG=Africa */ City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ETH' limit 5; SELECT * /* shard_host_HG=Africa */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
+-------------+------------+
| Name        | Population |
+-------------+------------+
| Addis Abeba |    2495000 |
| Dire Dawa   |     164851 |
| Nazret      |     127842 |
| Gonder      |     112249 |
| Dese        |      97314 |
+-------------+------------+
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| BIND_ADDRESS  | 192.168.1.7    |
+---------------+----------------+

That will give me:

[Pa]
select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| 1      | 0    | 40      | NULL                | \/\*\s*shard_host_HG=.*Europe\s*\*.    | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 0    | 41      | NULL                | \/\*\s*shard_host_HG=.*Asia\s*\*.      | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 2    | 42      | NULL                | \/\*\s*shard_host_HG=.*Africa\s*\*.    | NULL            | NULL      | 0     | 0      | 0       | <-- Note the HITS (2 as the run queries)
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

In this example, we have NO replace_patter. This is only a matching and redirecting Rule, where the destination HG is defined in the value of the destination_hostgroup attribute while inserting. In the case for Africa, it is HG 30.

The server in HG 30 is:

[Pa]
select hostgroup_id,hostname,port,status from mysql_servers ;
+--------------+-------------+------+--------+
| hostgroup_id | hostname    | port | status |
+--------------+-------------+------+--------+
| 10           | 192.168.1.5 | 3306 | ONLINE |
| 20           | 192.168.1.6 | 3306 | ONLINE |
| 30           | 192.168.1.7 | 3306 | ONLINE | <---
+--------------+-------------+------+--------+

This perfectly matches our returned value.

You can try by your own the other two continents.

Using destination_hostgroup

Another way to assign a query’s final host is to use the the destination_hostgroup, set the Schema_name attribute and use the use schema syntax in the query.

For example:

[Pa]
INSERT INTO mysql_query_rules (active,schemaname,destination_hostgroup,apply) VALUES
(1, 'shard00', 1, 1), (1, 'shard01', 1, 1), (1, 'shard03', 1, 1),
(1, 'shard04', 2, 1), (1, 'shard06', 2, 1), (1, 'shard06', 2, 1),
(1, 'shard07', 3, 1), (1, 'shard08', 3, 1), (1, 'shard09', 3, 1);

And then in the query do something like :

use shard02; Select * from tablex;

I mention this method because it is currently one of the most common in large companies using SHARDING.

But it is not safe, because it relays on the fact the query will be executed in the desired HG. The risk of error is high.

Just think if a query doing join against a specified SHARD:

use shard01; Select * from tablex join shard03 on tablex.id = shard03.tabley.id;

This will probably generate an error because shard03 is probably NOT present on the host containing shard01.

As such this approach can be used ONLY when you are 100% sure of what you are doing, and when you are sure NO query will have an explicit schema declaration.

Shard by host and by schema

Finally, is obviously possible to combine the two approaches and shard by host with only a subset of schemas.

To do so, let’s use all three nodes and have the schema distribution as follow:

  • Europe on Server 192.168.1.5 -> HG 10
  • Asia on Server 192.168.1.6 -> HG 20
  • Africa on Server 192.168.1.7 -> HG 30

I have already set the query rules using HINT, so what I have to do is to use them BOTH to combine the operations:

[Mc]
Select /* shard_host_HG=Asia */ /* continent=Asia */  City.Name, City.Population from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='IND' limit 5; SELECT * /* shard_host_HG=Asia */ from information_schema.GLOBAL_VARIABLES where variable_name like 'bind%';
+--------------------+------------+
| Name               | Population |
+--------------------+------------+
| Mumbai (Bombay)    |   10500000 |
| Delhi              |    7206704 |
| Calcutta [Kolkata] |    4399819 |
| Chennai (Madras)   |    3841396 |
| Hyderabad          |    2964638 |
+--------------------+------------+
5 rows in set (0.00 sec)
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| BIND_ADDRESS  | 192.168.1.6    |
+---------------+----------------+
1 row in set (0.01 sec)

[Pa]
mysql> select digest_text from stats_mysql_query_digest;
+--------------------------------------------------------------------------------------------------------------------------------------------+
| digest_text                                                                                                                                |
+--------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT * from information_schema.GLOBAL_VARIABLES where variable_name like ?                                                               |
| Select City.Name, City.Population from Asia.Country join Asia.City on Asia.City.CountryCode=Asia.Country.Code where Country.code=? limit ? |
+--------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| 1      | 0    | 10      | NULL                | \/\*\s*shard_host_HG=.*Europe\s*\*.    | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 2    | 11      | NULL                | \/\*\s*shard_host_HG=.*Asia\s*\*.      | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 0    | 12      | NULL                | \/\*\s*shard_host_HG=.*Africa\s*\*.    | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 0    | 13      | NULL                | NULL                                   | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 1    | 31      | NULL                | \S*\s*\/\*\s*continent=.*Asia\s*\*.*   | NULL            | NULL      | 0     | 0      | 23      |
| 1      | 4    | 32      | NULL                | world.                                 | Asia.           | NULL      | 0     | 23     | 23      |
| 1      | 0    | 33      | NULL                | \S*\s*\/\*\s*continent=.*Europe\s*\*.* | NULL            | NULL      | 0     | 0      | 25      |
| 1      | 0    | 34      | NULL                | world.                                 | Europe.         | NULL      | 0     | 25     | 25      |
| 1      | 0    | 35      | NULL                | \S*\s*\/\*\s*continent=.*Africa\s*\*.* | NULL            | NULL      | 0     | 0      | 24      |
| 1      | 0    | 36      | NULL                | world.                                 | Africa.         | NULL      | 0     | 24     | 24      |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

As you can see rule 11 has two HITS, which means my queries will go to the associated HG. But given that the Apply for rule 11 is =0, ProxySQL will first continue to process the Query Rules.

As such it will also transform the queries for rules 31 and 32, each one having the expected number of hits (one the first and the second four because of the loop).

That was all we had to do to perform a perfect two layer sharding in ProxySQL.

Conclusion

ProxySQL allows the user to access data distributed by shards in a very simple way. The query rules that follow the consolidate RegEx pattern, in conjunction with the possibility to concatenate rules and the Host Group approach definition give us huge flexibility with relative simplicity.

References:

https://github.com/sysown/proxysql/tree/v1.2.2/doc
https://github.com/google/re2/wiki/Syntax
http://www.proxysql.com/2015/09/proxysql-tutorial-setup-in-mysql.html
https://github.com/sysown/proxysql/blob/v1.2.2/doc/configuration_howto.md
https://github.com/sysown/proxysql/blob/v1.2.2/INSTALL.md
https://dev.mysql.com/doc/index-other.html

Credits

It is obvious that I need to acknowledge and kudos the work Rene’ Cannao is doing to make ProxySQL a solid, fast and flexible product. I also should mention that I work with him very often (more often than he likes), asking him for fixes and discussing optimization strategies. Requests that he satisfies with surprising speed and efficiency.

by Marco Tusa at August 30, 2016 07:25 PM

Percona Live Europe Discounted Pricing and Community Dinner!

percona live europe

Get your Percona Live Europe discounted tickets now, and sign up for the community dinner.

percona live europeThe countdown is on for the annual Percona Live Europe Open Source Database Conference! This year the conference will be taking place in the great city of Amsterdam October 3-5. This three-day conference will focus on the latest trends, news and best practices in the MySQL, MongoDB, PostgreSQL and other open source databases, while tackling subjects such as analytics, architecture and design, security, operations, scalability and performance. Percona Live provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs.

With breakout sessions, tutorial sessions and keynote speakers, there will certainly be no lack of content.
Advanced Rate Registration ENDS September 5, so make sure to register now to secure the best price possible.

As it is a Percona Live Europe conference, there will certainly be no lack of FUN either!!!!

percona live europeAs tradition holds, there will be a Community Dinner. Tuesday night, October 4, Percona Live Diamond Sponsor Booking.com hosts the Community Dinner at their very own headquarters located in historic Rembrandt Square in the heart of the city. After breakout sessions conclude, attendees are picked up right outside of the venue and taken to booking.com’s headquarters by canal boats! This gives all attendees the opportunity to play “tourist” while viewing the beauty of Amsterdam from the water. Attendees are dropped off right next to Booking.com’s office (return trip isn’t included)! The Lightning Talks for this year’s conference will be featured at the dinner.

Come and show your support for the community while enjoying dinner and drinks! The first 50 people registered for the dinner get in the doors for €10 (after that the price goes to €15 euro). Space is limited so make sure to sign up ASAP!

So don’t forget, register for the conference and sign up for the community dinner before space is gone! See you in Amsterdam!

by Kortney Runyan at August 30, 2016 05:47 PM

MariaDB Foundation

MariaDB 10.1.17 and MariaDB Galera Cluster 10.0.27 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.17 and MariaDB Galera Cluster 10.0.27. See the release notes and changelogs for details on these releases. Download MariaDB 10.1.17 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Galera Cluster 10.0.27 Release Notes Changelog What […]

The post MariaDB 10.1.17 and MariaDB Galera Cluster 10.0.27 now available appeared first on MariaDB.org.

by Daniel Bartholomew at August 30, 2016 03:19 PM

Jean-Jerome Schmidt

From Stockholm to Paris via Amsterdam: Polyglot Persistence Meetups across Europe

Thanks to everyone who joined us last week in Stockholm at Spotify HQ: it was great to meet fellow Polyglots and discuss the talks by Radovan Zvoncek, Spotify: Storage at Spotify and Jim Dowling, Scientist & Lecturer: Polyglot metadata for Hadoop. We’ll be sure to schedule the next meetup soon!

Tomorrow, August 31st, we’ll be back in Amsterdam, hosted again by Booking.com in their HQ. Thank you for having us again.

The agenda includes talks by Art van Scheppingen, Severalnines, on 9 DevOps Tips for going in production with your open source databases as well as by fellow Polyglot member Yegor Andreenko, who will give us an introduction to ClickHouse - an open-source column-oriented database management system that allows users to generate analytical data reports in real time.

You can still join tomorrow of course if you’re in Amsterdam by signing up on the meetup.com page.

And finally for now, we’ll also be in Paris on September 13th for a joint meetup with Percona and the MySQL User Group there.

The agenda here includes talks by Giuseppe Maxia, VMware, on MySQL Operations in Docker, Sveta Smirnova, Percona, on introducing new SQL syntax and improving performance with preparse Query Rewrite Plugins and Vinay Joosery, Severalnines, on Polyglot Persistence for MongoDB, PostgreSQL and MySQL DBAs.

We’ll be meeting at Logmatic.io in 75015 Paris for an evening of interesting talks, discussions and drinks/nibbles. Simply follow this meetup link to sign up.

We are working on further dates as well as we continue this meetup series, so look out for future updates and locations.

See you there!

by Severalnines at August 30, 2016 03:18 PM

August 29, 2016

Peter Zaitsev

Percona Live Europe featured talk with Alexander Krasheninnikov — Processing 11 billion events a day with Spark in Badoo

Percona Live Europe Featured Talk

Percona Live Europe Featured TalkWelcome to a new Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Alexander Krasheninnikov, Head of Data Team at Badoo. His talk will be on Processing 11 billions events a day with Spark in Badoo. Badoo is one of the world’s largest and fastest growing social networks for meeting new people. I had a chance to speak with Alexander and learn a bit more about the database environment at Badoo:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it?

Alexander: Currently, I work at Badoo as Head of Data Team. Our team is responsible for providing internal API’s for statistics data collecting and processing.

I started as a developer at Badoo, but the project I am going to cover in my talk lead to creating a separate department.

Percona: Your talk is called “Processing 11 billion events a day with Spark in Badoo.” What were the issues with your environment that led you to Spark? How did Spark solve these needs?

Alexander: When we designed the Unified Data Stream system in Badoo, we’ve extracted several requirements: scalability, fault tolerance and reliability. Altogether, these requirements moved us towards using Hadoop as deep data storage and data processing framework. Our initial implementation was built on top of Scribe + WebHDFS + Hive. But we’ve realized that processing speed and any lag of data delivery is unacceptable (we need near-realtime data processing). One of our BI team mentioned Spark as being significantly faster than Hive in some cases, (especially ones similar to ours). When investigated Spark’s API, we found the Streaming submodule — ideal for our needs. Additionally, this framework allowed us to use some third-party libraries, and write code. We’ve actually created an aggregation framework that follows “divide and conquer” principle. Without Spark, we definitely went way re-inventing lot of things from it.

Percona: Why is tracking the event stream important for your business model? How are you using the data Spark is providing you to reach business goals?

Alexander: The event stream always represents some important business/technical metrics — votes, messages, likes and so on. All this, brought together, forms the “health” of our product. The primary goal of our Spark-based system is to process a heterogeneous event stream one way, and draw charts automatically. We acheived this goal, and now we have hundreds of charts and dozens of developers/analysts/product team members using them. The system also evolved, and now we perform automatic anomaly detection over the event stream. We report strange data behavior to all the interested people.

Percona: What is changing in data use in your businesses model that keeps you awake at night? What tools or features are you looking for to address these issues?

Alexander: As I’ve mentioned before, we have an anomaly detection process for our metrics. If some of our metrics are out of expected bounds, it is treated as being an anomaly, and notification are sent. Also, we have a self-monitoring functionality for the whole system — a small event rate of heartbeats is generated, and processed with two different systems. If those show a significant difference — that defintely keeps me awake at night! 🙂

Percona: What are looking forward to the most at Percona Live Europe this year?

Alexander: My main interest is distributed open source databases. At Percona Live Europe, I expect to gain a lot of new information from the appropriate conference sections. Particularly, I want to get some knowledge about Yandex ClickHouse, as it looks very promising.

You can read more about how Alexander and Badoo use Spark here: techblog.badoo.com.

Want to find out more about Alexander, Spark and Badoo? Register for Percona Live Europe 2016, and come see his talk Processing 11 billions events a day with Spark in Badoo.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

by Dave Avery at August 29, 2016 06:51 PM

Jean-Jerome Schmidt

MySQL on Docker: Introduction to Docker Swarm Mode and Multi-Host Networking

In the previous blog post, we looked into Docker’s single-host networking for MySQL containers. This time, we are going to look into the basics of multi-host networking and Docker swarm mode, a built-in orchestration tool to manage containers across multiple hosts.

Docker Engine - Swarm Mode

Running MySQL containers on multiple hosts can get a bit more complex depending on the clustering technology you choose.

Before we try to run MySQL on containers + multi-host networking, we have to understand how the image works, how much resources to allocate (disk,memory,CPU), networking (the overlay network drivers - default, flannel, weave, etc) and fault tolerance (how is the container relocated, failed over and load balanced). Because all these will impact the overall operations, uptime and performance of the database. It is recommended to use an orchestration tool to get more manageability and scalability on top of your Docker engine cluster. The latest Docker Engine (version 1.12, released on July 14th, 2016) includes swarm mode for natively managing a cluster of Docker Engines called a Swarm. Take note that Docker Engine Swarm mode and Docker Swarm are two different projects, with different installation steps despite they both work in a similar way.

Some of the noteworthy parts that you should know before entering the swarm world:

  • The following ports must be opened:
    • 2377 (TCP) - Cluster management
    • 7946 (TCP and UDP) - Nodes communication
    • 4789 (TCP and UDP) - Overlay network traffic
  • There are 2 types of nodes:
    • Manager - Manager nodes perform the orchestration and cluster management functions required to maintain the desired state of the swarm. Manager nodes elect a single leader to conduct orchestration tasks.
    • Worker - Worker nodes receive and execute tasks dispatched from manager nodes. By default, manager nodes are also worker nodes, but you can configure managers to be manager-only nodes.

More details in the Docker Engine Swarm documentation.

In this blog, we are going to deploy application containers on top of a load-balanced Galera Cluster on 3 Docker hosts (docker1, docker2 and docker3), connected through an overlay network. We will use Docker Engine Swarm mode as the orchestration tool.

“Swarming” Up

Let’s cluster our Docker nodes into a Swarm. Swarm mode requires an odd number of managers (obviously more than one) to maintain quorum for fault tolerance. So, we are going to use all the physical hosts as manager nodes. Note that by default, manager nodes are also worker nodes.

  1. Firstly, initialize Swarm mode on docker1. This will make the node as manager and leader:

    [root@docker1]$ docker swarm init --advertise-addr 192.168.55.111
    Swarm initialized: current node (6r22rd71wi59ejaeh7gmq3rge) is now a manager.
    
    To add a worker to this swarm, run the following command:
    
        docker swarm join \
        --token SWMTKN-1-16kit6dksvrqilgptjg5pvu0tvo5qfs8uczjq458lf9mul41hc-dzvgu0h3qngfgihz4fv0855bo \
        192.168.55.111:2377
    
    To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
  2. We are going to add two more nodes as manager. Generate the join command for other nodes to register as manager:

    [docker1]$ docker swarm join-token manager
    To add a manager to this swarm, run the following command:
    
        docker swarm join \
        --token SWMTKN-1-16kit6dksvrqilgptjg5pvu0tvo5qfs8uczjq458lf9mul41hc-7fd1an5iucy4poa4g1bnav0pt \
        192.168.55.111:2377
  3. On docker2 and docker3, run the following command to register the node:

    $ docker swarm join \
        --token SWMTKN-1-16kit6dksvrqilgptjg5pvu0tvo5qfs8uczjq458lf9mul41hc-7fd1an5iucy4poa4g1bnav0pt \
        192.168.55.111:2377
  4. Verify if all nodes are added correctly:

    [docker1]$ docker node ls
    ID                           HOSTNAME       STATUS  AVAILABILITY  MANAGER STATUS
    5w9kycb046p9aj6yk8l365esh    docker3.local  Ready   Active        Reachable
    6r22rd71wi59ejaeh7gmq3rge *  docker1.local  Ready   Active        Leader
    awlh9cduvbdo58znra7uyuq1n    docker2.local  Ready   Active        Reachable

    At the moment, we have docker1.local as the leader. 

Overlay Network

The only way to let containers running on different hosts connect to each other is by using an overlay network. It can be thought of as a container network that is built on top of another network (in this case, the physical hosts network). Docker Swarm mode comes with a default overlay network which implements a VxLAN-based solution with the help of libnetwork and libkv. You can however choose another overlay network driver like Flannel, Calico or Weave, where extra installation steps are necessary. We are going to cover more on that later in an upcoming blog post.

In Docker Engine Swarm mode, you can create an overlay network only from a manager node and it doesn’t need an external key-value store like etcd, consul or Zookeeper.

The swarm makes the overlay network available only to nodes in the swarm that require it for a service. When you create a service that uses an overlay network, the manager node automatically extends the overlay network to nodes that run service tasks.

Let’s create an overlay network for our containers. We are going to deploy Percona XtraDB Cluster and application containers on separate Docker hosts to achieve fault tolerance. These containers must be running on the same overlay network so they can communicate with each other.

We are going to name our network “mynet”. You can only create this on the manager node:

[docker1]$ docker network create --driver overlay mynet

Let’s see what networks we have now:

[docker1]$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
213ec94de6c9        bridge              bridge              local
bac2a639e835        docker_gwbridge     bridge              local
5b3ba00f72c7        host                host                local
03wvlqw41e9g        ingress             overlay             swarm
9iy6k0gqs35b        mynet               overlay             swarm
12835e9e75b9        none                null                local

There are now 2 overlay networks with a Swarm scope. The “mynet” network is what we are going to use today when deploying our containers. The ingress overlay network comes by default. The swarm manager uses ingress load balancing to expose the services you want externally to the swarm.

Deployment using Services and Tasks

We are going to deploy the Galera Cluster containers through services and tasks. When you create a service, you specify which container image to use and which commands to execute inside running containers. There are two type of services:

  • Replicated services - Distributes a specific number of replica tasks among the nodes based upon the scale you set in the desired state, for examples “--replicas 3”.
  • Global services - One task for the service on every available node in the cluster, for example “--mode global”. If you have 7 Docker nodes in the Swarm, there will be one container on each of them.

Docker Swarm mode has a limitation in managing persistent data storage. When a node fails, the manager will get rid of the containers and create new containers in place of the old ones to meet the desired replica state. Since a container is discarded when it goes down, we would lose the corresponding data volume as well. Fortunately for Galera Cluster, the MySQL container can be automatically provisioned with state/data when joining.

Deploying Key-Value Store

The docker image that we are going to use is from Percona-Lab. This image requires the MySQL containers to access a key-value store (supports etcd only) for IP address discovery during cluster initialization and bootstrap. The containers will look for other IP addresses in etcd, if there are any, start the MySQL with a proper wsrep_cluster_address. Otherwise, the first container will start with the bootstrap address, gcomm://.

  1. Let’s deploy our etcd service. We will use etcd image available here. It requires us to have a discovery URL on the number of etcd node that we are going to deploy. In this case, we are going to setup a standalone etcd container, so the command is:

    [docker1]$ curl -w "\n" 'https://discovery.etcd.io/new?size=1'
    https://discovery.etcd.io/a293d6cc552a66e68f4b5e52ef163d68
  2. Then, use the generated URL as “-discovery” value when creating the service for etcd:

    [docker1]$ docker service create \
    --name etcd \
    --replicas 1 \
    --network mynet \
    -p 2379:2379 \
    -p 2380:2380 \
    -p 4001:4001 \
    -p 7001:7001 \
    elcolio/etcd:latest \
    -name etcd \
    -discovery=https://discovery.etcd.io/a293d6cc552a66e68f4b5e52ef163d68

    At this point, Docker swarm mode will orchestrate the deployment of the container on one of the Docker hosts.

  3. Retrieve the etcd service virtual IP address. We are going to use that in the next step when deploying the cluster:

    [docker1]$ docker service inspect etcd -f "{{ .Endpoint.VirtualIPs }}"
    [{03wvlqw41e9go8li34z2u1t4p 10.255.0.5/16} {9iy6k0gqs35bn541pr31mly59 10.0.0.2/24}]

    At this point, our architecture looks like this:

Deploying Database Cluster

  1. Specify the virtual IP address for etcd in the following command to deploy Galera (Percona XtraDB Cluster) containers:

    [docker1]$ docker service create \
    --name mysql-galera \
    --replicas 3 \
    -p 3306:3306 \
    --network mynet \
    --env MYSQL_ROOT_PASSWORD=mypassword \
    --env DISCOVERY_SERVICE=10.0.0.2:2379 \
    --env XTRABACKUP_PASSWORD=mypassword \
    --env CLUSTER_NAME=galera \
    perconalab/percona-xtradb-cluster:5.6
  2. It takes some time for the deployment where the image will be downloaded on the assigned worker/manager node. You can verify the status with the following command:

    [docker1]$ docker service ps mysql-galera
    ID                         NAME                IMAGE                                  NODE           DESIRED STATE  CURRENT STATE            ERROR
    8wbyzwr2x5buxrhslvrlp2uy7  mysql-galera.1      perconalab/percona-xtradb-cluster:5.6  docker1.local  Running        Running 3 minutes ago
    0xhddwx5jzgw8fxrpj2lhcqeq  mysql-galera.2      perconalab/percona-xtradb-cluster:5.6  docker3.local  Running        Running 2 minutes ago
    f2ma6enkb8xi26f9mo06oj2fh  mysql-galera.3      perconalab/percona-xtradb-cluster:5.6  docker2.local  Running        Running 2 minutes ago
  3. We can see that the mysql-galera service is now running. Let’s list out all services we have now:

    [docker1]$ docker service ls
    ID            NAME          REPLICAS  IMAGE                                  COMMAND
    1m9ygovv9zui  mysql-galera  3/3       perconalab/percona-xtradb-cluster:5.6
    au1w5qkez9d4  etcd          1/1       elcolio/etcd:latest                    -name etcd -discovery=https://discovery.etcd.io/a293d6cc552a66e68f4b5e52ef163d68
  4. Swarm mode has an internal DNS component that automatically assigns each service in the swarm a DNS entry. So you use the service name to resolve to the virtual IP address:

    [docker2]$ docker exec -it $(docker ps | grep etcd | awk {'print $1'}) ping mysql-galera
    PING mysql-galera (10.0.0.4): 56 data bytes
    64 bytes from 10.0.0.4: seq=0 ttl=64 time=0.078 ms
    64 bytes from 10.0.0.4: seq=1 ttl=64 time=0.179 ms

    Or, retrieve the virtual IP address through the “docker service inspect” command:

    [docker1]# docker service inspect mysql-galera -f "{{ .Endpoint.VirtualIPs }}"
    [{03wvlqw41e9go8li34z2u1t4p 10.255.0.7/16} {9iy6k0gqs35bn541pr31mly59 10.0.0.4/24}]

    Our architecture now can be illustrated as below:

Deploying Applications

Finally, you can create the application service and pass the MySQL service name (mysql-galera) as the database host value:

[docker1]$ docker service create \
--name wordpress \
--replicas 2 \
-p 80:80 \
--network mynet \
--env WORDPRESS_DB_HOST=mysql-galera \
--env WORDPRESS_DB_USER=root \
--env WORDPRESS_DB_PASSWORD=mypassword \
wordpress

Once deployed, we can then retrieve the virtual IP address for wordpress service through the “docker service inspect” command:

[docker1]# docker service inspect wordpress -f "{{ .Endpoint.VirtualIPs }}"
[{p3wvtyw12e9ro8jz34t9u1t4w 10.255.0.11/16} {kpv8e0fqs95by541pr31jly48 10.0.0.8/24}]

At this point, this is what we have:

Our distributed application and database setup is now deployed by Docker containers.

Connecting to the Services and Load Balancing

At this point, the following ports are published (based on the -p flag on each “docker service create” command) on all Docker nodes in the cluster, whether or not the node is currently running the task for the service:

  • etcd - 2380, 2379, 7001, 4001
  • MySQL - 3306
  • HTTP - 80

If we connect directly to the PublishedPort, with a simple loop, we can see that the MySQL service is load balanced among containers:

[docker1]$ while true; do mysql -uroot -pmypassword -h127.0.0.1 -P3306 -NBe 'select @@wsrep_node_address'; sleep 1; done
10.255.0.10
10.255.0.8
10.255.0.9
10.255.0.10
10.255.0.8
10.255.0.9
10.255.0.10
10.255.0.8
10.255.0.9
10.255.0.10
^C

At the moment, Swarm manager manages the load balancing internally and there is no way to configure the load balancing algorithm. We can then use external load balancers to route outside traffic to these Docker nodes. In case of any of the Docker nodes goes down, the service will be relocated to the other available nodes.

That’s all for now. In the next blog post, we’ll take a deeper look at Docker overlay network drivers for MySQL containers.

by Severalnines at August 29, 2016 06:51 PM

MariaDB AB

Installing MariaDB MaxScale the Hard Way

Anders Karlsson

If you are like me (let's for everyones sake hope you are not, though), you like to do things the hard way, in particular when it comes to testing things. For example, when installing things on your Linux box, just to try them out, you might not want to do a yum install, an rpm -ivh or an apt-get to have some files spread all over your system. Instead you want to tar xvf some tarball and possibly, if you are in a good mood or you want to be nice so you get some gifts for the holidays or maybe because it is just that day, you unpack that tarball in /usr/local instead of in /home/bofh/junk. And this will usually get you in some trouble, but as we have already determined that we are truly bad (Maybe we should get a tattoo or two also, or is the right to death-metal antics reserved for IT security personel only? Sure seems so.) we can ignore that and get to work.

Here, I will show you how to install MariaDB MaxScale from a tarball and get it running, without touching any system directories or anything if you want to test it or if you, even in production, want to install it in some non-standard location (like /usr/local. I actually like to have stuff there, I don't know what's so wrong with that. I'm a rebel, I know...).

To begin with, let's download MariaDB MaxScale tarball (rpm's are for wussies), for example from mariadb.com where you should register and then go to "my portal"->;"Downloads"->;"MariaDB MaxScale" and download an appropriate .tar.gz for your operating system of choice. In my case, I download it for CentOS 6 / RHEL 6 and since the current MariaDB MaxScale version is 2.0.0, I issue the command in my home directory (/home2/anders):

$ wget https://downloads.mariadb.com/enterprise/99sd-ha47/mariadb-maxscale/2.0.0/tgz/maxscale-beta-2.0.0-1.centos.6.x86_64.tar.gz

With <my tag> replaced by a generated tag on mariadb.com. Following this we are stuck with a tarball named maxscale-2.0.0-1.rhel.6.x86_64.tar.gz and we unpack that as usual and then create a more readable link to the created directory:

$ tar xvfz maxscale-beta-2.0.0-1.centos.6.x86_64.tar.gz
$ ln -s maxscale-beta-2.0.0-1.centos.6.x86_64 maxscale200

So far nothing magic has happened. The next step is to create a few directories in our new maxscale200 directory where MariaDB MaxScale will keep temporary, stuff, logs, etc.:

$ cd maxscale200
​$ mkdir cache data log

The next step, is to create a MariaDB MaxScale configuration file. There is a template for this in the etc subdirectory so we just have to copy that:

$ cp etc/maxscale.cnf.template etc/maxscale.cnf

The supplied config file will start MariaDB MaxScale with just 1 server defined, and unless you have this server running on a non-standard port or on another machine other than the one where MariaDB MaxScale itself is running, you can leave this configuration file alone, and, if not, you have to edit the [server1] section appropriately.

Another thing to look for is iptables / firewalld settings, but this you already know about I guess. You might want to turn them off (which is not recommended at all) or configure it appropriately. As per the default configuration with MariaDB MaxScale 1.4.3, ports 4006, 4008 and 6603 will be listened to, so you configure iptables / firewalld appropriately. And don't turn them off, do this the right way for once. I turned iptables off by the way, just to annoy you.

Now, MariaDB MaxScale will connect to the server we defined in the configuration file above, and we need to allow it to connect and execute a few commands. There are two users that MariaDB MaxScale can use, one to connect and get authentication data, like usernames and passwords, and another separate one to monitor the state of the server. In the supplied configuration template these two users use the same account, namely myuser using mypwd as the password, and this is what I use in the following where are set up the appropriate user and grant in the MariaDB server I am connecting to. Also note that I am assuming that MariaDB MaxScale and the MariaDB server in question run on the same node. So connect to MariaDB and issue the following commands:

MariaDB> CREATE USER 'myuser'@'localhost'  IDENTIFIED BY 'mypwd';
MariaDB> GRANT SELECT ON mysql.user TO 'myuser'@'localhost';
MariaDB> GRANT SELECT ON mysql.db TO 'myuser'@'localhost';
MariaDB> GRANT SELECT ON mysql.tables_priv TO 'myuser'@'localhost';
MariaDB> GRANT SHOW DATABASES ON *.*TO 'myuser'@'localhost';
MariaDB> GRANT REPLICATION SLAVE ON *.* TO 'myuser'@'localhost';
MariaDB> GRANT REPLICATION CLIENT ON *.* TO 'myuser'@'localhost';

With this in place we are ready to start MariaDB MaxScale, but this is an itsy bitsy more complex than you think. The issue is that the default locations for a lot of stuff that MariaDB MaxScale wants to use is somewhere in the global file system, and they are also not relative to some basedir as is conveniently the case with MariaDB server itself. To support this, instead of putting all this in the global section in the MariaDB MaxScale config file, I'll instead put any necessary arguments to get MaxScale going on the command line, and for that I have created three scripts: one to set up the environment, one to start MariaDB MaxScale and one to stop it. Let's start with the environment one first. This is places in the MariaDB MaxScale home directory (maxscale200) called maxenv.sh and has the following contents:

#!/bin/bash
#
MAXSCALE_HOME=$(cd $(dirname $BASH_SOURCE) ; pwd)
PATH=$PATH:$MAXSCALE_HOME/usr/bin
export LD_LIBRARY_PATH=$MAXSCALE_HOME/usr/lib64/maxscale

The next file to create is the script to start MariaDB MaxScale. This is called startmax.sh, is again placed in the MariaDB MaxScale root directory and has this content:

#!/bin/bash
#
. `dirname $0`/maxenv.sh

$MAXSCALE_HOME/usr/bin/maxscale \
  --config=$MAXSCALE_HOME/etc/maxscale.cnf \
  --logdir=$MAXSCALE_HOME/log \
  --datadir=$MAXSCALE_HOME/data \
  --libdir=$MAXSCALE_HOME/usr/lib64/maxscale \
  --piddir=$MAXSCALE_HOME --syslog=no \
  --cachedir=$MAXSCALE_HOME/cache

As you can see this invokes maxenv.sh before going on to start MariaDB MaxScale. The only parameter that I really don't have to set here, but which I set anyway, again just to be annoying to the world in general, is --syslog=no as we are only testing things here and logging to syslog is then not really appropriate (but it is the default).

All we need now is the script to stop MariaDB MaxScale, and for this create a file called stopmax.sh in the MariaDB MaxScale home directory with this content:

#!/bin/bash
#
. `dirname $0`/maxenv.sh

# Check that we have a pid file.
if [ ! -e "$MAXSCALE_HOME/maxscale.pid" ]; then
   exit 0
fi
MAXPID=`cat $MAXSCALE_HOME/maxscale.pid`

# If we have a pid file but no process, then delete the pid file and exit.
if [ "`ps -p $MAXPID | wc -l`" -lt 2 ]; then
   rm $MAXSCALE_HOME/maxscale.pid
   exit 0
fi

kill -term $MAXPID

# Wait for max 20 seconds until the service is gone.
numwaits=0
while [ $numwaits -le 20 -a -e "$MAXSCALE_HOME/maxscale.pid" ]; do
   sleep 1
   numwaits=$(($numwaits + 1))
done

Following this, the one thing that remains to be done is to make the scripts we just created executable:

$ chmod +x maxenv.sh startmax.sh stopmax.sh

Now we are ready to try things, let's start MariaDB MaxScale first:

$ ./startmax.sh

And then let's see if we can connect to the MariaDB server through MariaDB MaxScale:

$ mysql -h 127.0.0.1 -P 4006 -u myuser -pmypwd

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 6950
Server version: 10.0.0 1.4.3-maxscale MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB>

As you can see, I am not connecting as root here, as this is not allowed by MariaDB MaxScale by default. Also, I am not connecting to localhost as that assumes I am connecting using a socket, which is not what we want to do here.

This is all for today, now I'll need to start my Harley-Davidson and head downtown to hang with the other tough guys (OK, I'm really taking my beaten-up Ford and picking up the kids from kindergarten, I admit it).

Keep on SQL'ing

/Karlsson

Tags: 

by Anders Karlsson at August 29, 2016 02:10 PM

August 26, 2016

MariaDB AB

Configuring MariaDB Master and MariaDB MaxScale for Data Streaming

Massimiliano Pinto

In the previous blog, I introduced Data Streaming with MariaDB MaxScale. In this post, I will show you how to configure a MariaDB Master server and MariaDB MaxScale to stream binary log events from the Master server to MaxScale, and convert them to AVRO records.

First of all, some checks and eventually some modifications will be needed in the MariaDB 10 database.

Configuring the Master Database

MariaDB MaxScale requires that the binary log events provided to each row be modified rather than the operation on the row, whether it was inserted, updated or deleted. Additionally, in order to stream the entire content of the row being modified by a binary log event, replication on the Master database must be configured with ‘row’ format. Simply edit my.cnf, add the two required options and restart mysqld process.

[mysqld]
....
binlog_format=row
binlog_row_image=full

This way, in row-based replication, each row change event contains two images, a “before” image whose columns are matched against when searching for the row to be updated, and an “after” image containing the changes.

You can find out more about replication formats from the MariaDB Knowledge Base.

Configuring the MaxScale Server

MariaDB MaxScale can be configured to run on the same server as MariaDB 10 Master or separately on a dedicated server. In this blog we will show you configurations when MariaDB MaxScale is running on its own dedicated server.

If MariaDB MaxScale is installed on a dedicated server, it needs to register with the Master database as a Slave. To do so, the binlog server should be configured to MariaDB MaxScale in order to receive binlog events from the MariaDB 10 Master database.

AVRO Service Setup

We start by adding two new services into the configuration file. The first service (replication service) uses the binlogrouter plugin which reads the binary logs from the Master server.

The binlog router supports both old style binlog events without GTID and MariaDB 10 binlog events with GTID. However for the purpose of data streaming, MariaDB 10 GTID is required, so we need to specifically set the router_options mariadb10-compatibility to 1.

# The Replication Proxy service
[replication-service]
type=service
router=binlogrouter
router_options=server-id=4000,
               master-id=3000,
               binlogdir=/var/lib/maxscale/binlog/,
               mariadb10-compatibility=1
user=maxuser
passwd=maxpwd

The second service (avro-service) reads the binary logs as they are streamed from the Master through the binlog router and converts them into AVRO format files. Please note that the source of the binary log events is set to “replication-service” - this is what enables AVRO-service to continuously consume the binary log events received by the replication and converts them to AVRO records. The router option avrodir is the directory where the converted AVRO records are stored. The router option filestem is the prefix of the source binlog files.

# The Avro conversion service
[avro-service]
type=service
router=avrorouter
source=replication-service
router_options=avrodir=/var/lib/maxscale/avro/,
               filestem=binlog

Next, we need to setup the listener on MaxScale, so that MaxScale can be administratively configured to register with the Master server as a Slave.

# The listener for the replication-service
[replication-listener]
type=listener
service=replication-service
protocol=MySQLClient
port=4000

Now, we need to set up the listener on MaxScale where the CDC protocol clients can request AVRO change data records.

# The client listener for the avro-service
[avro-listener]
type=listener
service=avro-service
protocol=CDC
port=4001

AVRO Schema Setup

Before starting the conversion process, AVRO schema files for all the tables that need to be replicated need to be on the MaxScale server.

Every converted AVRO record corresponds to a single table record and contains the schema of the table including schema id that refers to the id of an AVRO schema file. The schema files are in JSON format per Avro specification and stored in $avrodir.  Before the conversion process starts, so that MaxScale can generate these schema file, either:

  • the binary log events needs to include CREATE TABLE and any ALTER TABLE events, OR;

  • The schema files needs to be created on MaxScale manually using the cdc_schema Go utility.

All AVRO file schemas follow the same general idea. They are in JSON and follow the following format:

{
    "Namespace": "MaxScaleChangeDataSchema.avro",
    "Type": "record",
    "Name": "ChangeRecord",
    "Fields":
    [
        {
            "Name": "name",
            "Type": "string"
        },
        {
            "Name":"address",
            "Type":"string"
        },
        {
            "Name":"age",
            "Type":"int"
        }
    ]
}

The AVRO converter uses the schema file to identify the columns, their names and what type they are. The Name field contains the name of the column and the Type contains the AVRO type. Read the AVRO specification for details on the layout of the schema files.

Starting MariaDB MaxScale

The next step is to start MariaDB MaxScale and set up the binlog server. We do that by connecting to the listener of the replication_router service and executing a Slave command to set the Master server, then start the Slave on MaxScale.

An example of CHANGE MASTER command required to configure the binlog server:

# mysql -h maxscale-host -P 4000
CHANGE MASTER TO MASTER_HOST='172.18.0.1',
       MASTER_PORT=3000,
       MASTER_LOG_FILE='binlog.000001',
       MASTER_LOG_POS=4,
       MASTER_USER='maxuser',
       MASTER_PASSWORD='maxpwd';
START SLAVE;

That command will start the replication of binary logs from the Master server at 172.18.0.1:3000. After the binary log streaming has started, the AVRO router will automatically start converting the binlogs into AVRO files. Please note that the first binlog file to be converted is the one with the highest sequence number in the $binlogdir with the $filestem prefix.

Now, let us create an example table and insert some data into it to see if our setup works.

First, create a simple test table using the following statement and populate it.

CREATE TABLE test.t1 (id INT); 
INSERT INTO test.t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);

Table creation and insert data events will be converted into an AVRO file, which can be inspected by using the maxavrocheck utility program.

[maxscale@localhost]$ maxavrocheck /var/lib/maxscale/avro/test.t1.000001.avro
File sync marker: caaed7778bbe58e701eec1f96d7719a
/var/lib/maxscale/avro/test.t1.000001.avro: 1 blocks, 1 records and 12 bytes

Now we can see the generated AVRO record. In the next blog post, we will explore how these AVRO records can be requested in real time with the MariaDB MaxScale CDC API.

Related Blog Posts

Data Streaming with MariaDB MaxScale

MaxScale as a Replication Proxy, aka the Binlog Server

About the Author

Massimiliano Pinto's picture

Massimiliano is a Senior Software Solutions Engineer working mainly on MaxScale. Massimiliano has worked for almost 15 years in Web Companies playing the roles of Technical Leader and Software Engineer. Prior to joining MariaDB he worked at Banzai Group and Matrix S.p.A, big players in the Italy Web Industry. He is still a guy who likes too much the terminal window on his Mac. Apache modules and PHP extensions skills are included as well.

by Massimiliano Pinto at August 26, 2016 09:06 AM

Open Query

On Open Source and Business Choices

Open Source is a whole-of-process approach to development that can produce high-quality products better tailored to users’ real world needs.  A key reason for this is the early feedback cycle built into that complete process.

Simply publishing something under an Open Source license (while not applying Open Source development processes) does not yield the same quality and other benefits.  So, not all Open Source is the same.

Publishing source of a product “later” (for instance when the monetary benefit has diminished for the company) is meaningless.  In this scenario, there is no “Open Source benefit” to users whatsoever, it’s simply a proprietary product. There is no opportunity for the client to make custom modifications or improvements, or ask a third party to work on such matters – neither is there any third party opportunity to verify and validate either code quality or security.

Open Source is not a marketing gimmick.  Labels such as “Open Source”, or “Enterprise”, on their own, do not have any more positive outcome than a greasy hamburger labeled with “healthy”.  If a company “believes” in Open Source software, they’ll use the open source development model for their software development.

And now we see things like this: Uproar: MariaDB Corp. veers away from open source (by Simon Phipps, InfoWorld, August 2016)

So what does it mean when a company publishes some of their software under an open source license, and does some related products under a proprietary license?  To me, it’s generally a strong indication that the company either doesn’t believe in that model, or doesn’t understand it.  And we’ve seen it before.

It also reminds me of an interaction I had many years ago.  A Marketing VP asked me “How can we leverage our [Open Source] community?”  I answered the only possible way: “One does not ‘leverage’ the community, that’s not how it works.”  Of course that wasn’t the answer the VP wanted to hear, but that doesn’t make it less true.  They saw the community as an asset to use, rather than work with.  People don’t like getting used, and in the Open Source space that’s even more true.

Companies that have turned their back on their earlier Open Source work and who have devised some other model to (arguably) make more money, have all discovered that this fundamentally changes their market.  They’ll lose some of their users, customers and supporters, and gain some new different clients.  It’s a different market.  Whether and how that pans out in terms of commercial success is never certain.  Given that we know that the Open Source development process yields benefits in terms of quality and features users want, we can say that non-OSS products lack (some of) those benefits, so to put it bluntly, it’ll be a different product of possibly less quality and the feature set is likely to differ as well.

Naturally we cannot ascertain code quality directly as we can’t review closed code directly, bug systems of proprietary software tends to be closed, changelogs are condensed for marketing purposes, but as far back as a decade and a half there have been independent studies that worked out “lines of code per software flaw” and it came out significantly in favour of Open Source software, having proportionally much fewer bugs.  Bugs also tend to get fixed quicker in Open Source software.  None of this is new(s). see for instance Open-source vs. proprietary software bugs: Which get squashed fastest? (CNET, 2007)

For complete products (libraries are a slightly different beast) with a relatively large market scope, source code being available does not in any way diminish a company’s ability to make money.  Having the core developers, tech writers and support people gives them a significant edge in the open market, and that’s a business asset you can leverage.  You do that by focusing on those aspects in your communications – that’s basic marketing, you draw attention to the positive aspects that make your company/product stand out from the rest.  Clearly, this objective cannot not achieved by force, as you don’t make a (potential) client like or trust you by denying them choice or transparency.

There is one other known option aside from not believing or not understanding, and that’s fear. But fear is an awkward business driver, it makes for very bad decisions.

MariaDB Corp in part uses the Open Source development model, in part they’re an Open Source publisher (in-house work that’s only made available at a later stage in the development process), and now some proprietary product has been added to the mix (actually new versions of an existing product).  Looking at this I am rather unclear about what they believe in.  Of course companies can make business choices as they see fit – but they never operate in a vacuum.  In the end it doesn’t matter much what I believe personally, the market will do what it will – historically, it responds in the various ways as described above.  We’ll see how it pans out.

Open Query does not recommend (or re-sell at all) proprietary tools, as it just doesn’t make sense for us or our clients.  We often do bugfixes and improvements which we contribute upstream – for proprietary tools we can’t do that and thus it becomes a hindrance for us and our clients.  On the specific practical level, we’ve actually never used MaxScale (the product that MariaDB Corp will now sell under different conditions for future versions), and this stems from our experience with its effective predecessor MySQL Proxy.  Having a complex set of scripted logic in a proxy slows down applications and introduces a rather large extra (single) point of potential failure in to infrastructure.   So, while Simon refers to MaxScale as an essential tool for scale-able environments, we know from experience that there are other ways of achieving that desired objective, and without the downsides.

Rather than promoting a single tool for many wildly different jobs, we utilise a few different tools depending on the needs of particular client infrastructure.  We still have a couple of (now legacy) MySQL-MMM deployments, but also quite a few Galera clusters, and other setups as suit our clients’ needs.  Key is to not only make the infrastructure convenient to use for applications, but also to not introduce any more single points of failure.  We build resilience into the client’s server infrastructure, without adding significant overhead in either performance or maintenance requirements.

We believe that that’s what clients want, and since potential clients come to us asking exactly for that (and note our approach with relief) we think that we’re doing the right thing by our clients.  We’ve used this approach for over 9 years, and we’ll just keep on doing that – our basic approach doesn’t change even when our tools do.  If you’d like to talk with us about helping you with your infra, using our approach and way of working, contact us today!

by Arjen Lentz at August 26, 2016 04:24 AM

August 25, 2016

MariaDB Foundation

MariaDB 10.0.27 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.0.27. See the release notes and changelog for details on this release. Download MariaDB 10.0.27 Release Notes Changelog What is MariaDB 10.0? MariaDB APT and YUM Repository Configuration Generator Thanks, and enjoy MariaDB!

The post MariaDB 10.0.27 now available appeared first on MariaDB.org.

by Daniel Bartholomew at August 25, 2016 04:39 PM

Peter Zaitsev

Percona Live Europe featured talk with Krzysztof Książek — MySQL Load Balancers – MaxScale, ProxySQL, HAProxy, MySQL Router & nginx

Percona Live Europe featured talkWelcome to the first Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Krzysztof Książek, Senior Support Engineer at Severalnines AB. His talk will be on MySQL Load Balancers – MaxScale, ProxySQL, HAProxy, MySQL Router & nginx: a close up look. Load balancing MySQL connections and queries using HAProxy has been popular in the past years. However, the recent arrival of MaxScale, MySQL Router, ProxySQL and now also Nginx as a reverse proxy have changed the game. Which use cases are best for which solution, and how well do they integrate into your environment?

I had a chance to speak with Krzysztof and learn a bit more about these questions:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it?

Krzysztof: I was working as a system administrator in a hosting company in Poland. They had a need for a dedicated MySQL DBA. So I volunteered for the job. Later, I decided it was time to move on and joined Laine Campbell’s PalominoDB. I had a great time there, working with large MySQL deployments. At the beginning of 2015, I joined Severalnines as Senior Support Engineer. It was a no-brainer for me as I was always interested in building and managing scalable clusters based on MySQL — this is exactly what Severalnines helps its customers with.

Percona: Your talk is called “MySQL Load Balancers: MaxScale, ProxySQL, HAProxy, MySQL Router & nginx – a close up look.” Why are more load balancing solutions becoming available? What problems does load balancing solve for database environments?

Krzysztof:Load balancers are a must in highly scalable environments that are usually distributed across multiple servers or data centers. Large MySQL setups can quickly become very complex — many clusters, each containing numerous nodes and using different and interconnected technologies: MySQL replication, Galera Cluster. Load balancers not only help to maintain availability of the database tier by routing traffic to available nodes, but they also hide the complexity of the database tier from the application.

Percona: You call out three general groups of load balancers: application connectors, TCP reverse proxies, and SQL-aware load balancers. What workloads do these three groups generally address best?

Krzysztof: I wouldn’t say “workloads” — I’d say more like “use cases.” Each of those groups will handle all types of workloads but they do it differently. TCP reverse proxies like HAProxy or nginx will just route packets: fast and robust. They won’t understand the state of MySQL backends, though. For that you need to use external scripts like Percona’s clustercheck or Severalnines’ clustercheck-iptables.

On the other hand, should you want to build your application to be more database-aware, you can use mysqlnd and manage complex HA topologies from your application. Finally, SQL-aware load balancers like ProxySQL or MaxScale can be used to move complexity away from the application and, for example, perform read-write split in the proxy layer. They detect the MySQL state and can make necessary changes in routing — such as moving writes to a newly promoted master. They can also empower the DBA by allowing him to (for example) rewrite queries as they pass the proxy.

Percona: Where do you see load balancing technologies heading in order to deal with some of the database trends that keep you awake at night?

Krzysztof: Personally, I love to see the “empowerment” of DBA’s. For example, ProxySQL not only routes packets and helps to maintain high availability (although this is still the main role of a proxy), it is also a flexible tool that can help a DBA tackle many day-to-day problems. An offending query? You can cache it in the proxy or you can rewrite it on the fly. Do you need to test your system before an upgrade, using real-world queries? You can configure ProxySQL to mirror the production traffic on a test system. You can use it to build a sharded environment. These things, in the past, typically weren’t possible for a DBA to do — the application had to be modified and new code had to be deployed. Activities like those take time, time that is very precious when the ops staff is dealing with databases on fire from a high load. Now I can do all that just through reconfiguring a proxy. Isn’t it great?

Percona: What are looking forward to the most at Percona Live Europe this year?

Krzysztof: The Percona Live Europe agenda looks great and, as always, it’s a hard choice to decide which talks to attend. I’d love to learn more about the upcoming MySQL 8.0: there are quite a few talks covering both performance improvements and different features of 8.0. There’s also a new Galera version in the works with great features like non-blocking DDL’s, so it would be great to see what’s happening there. We’re also excited to run the “Become a MySQL DBA” tutorial again (our blog series on the same topic has been very popular).

Additionally, I’ve been working within the MySQL community for a while and I have many friends who, unfortunately, I don’t see very often. Percona Live Europe is an event that brings us together and where we can catch up. I’m definitely looking forward to this.

You can read more about Krzysztof thoughts on load balancers at Severalnines blog.

Want to find out more about Krzysztof, load balancers and Severalnines? Register for Percona Live Europe 2016, and come see his talk MySQL Load Balancers – MaxScale, ProxySQL, HAProxy, MySQL Router & nginx: a close up look.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

by Dave Avery at August 25, 2016 04:36 PM

PostgreSQL Day at Percona Live Amsterdam 2016

PostgreSQL Day

PostgreSQL DayIntroducing PostgreSQL Day at Percona Live Europe, Amsterdam 2016.

As modern open source database deployments change, often including more than just a single open source database, Percona Live has also changed. We changed our model from being a purely MySQL-focused conference (with variants) to include a significant amount of MongoDB content. We’ve also expanded our overview of the open source database landscape and included introductory talks on many other technologies. These included practices we commonly see used in the world, and new up and coming solutions we think show promise.

In getting Percona Live Europe 2016 ready, something unexpected happened: we noticed the PostgreSQL community come together and submit many interesting talks about this great open source database technology. This effort on their part pushed to go further than we initially planned this year, and we’ve put together a full day of PostgreSQL talks. At Percona Live Europe this year, we will be running our first ever PostgreSQL Day on October 4th!

Some folks have been questioning this decision: do we really need so much PostgreSQL content? Isn’t there some tension between the MySQL and PostgreSQL communities? (Here is a link to a very recent example.)  

While it might be true (and I think it is) that some contention exists between these groups, I don’t think isolation and indifference are the answers to improving cooperation. They certainly aren’t the best plan for the open source database community at large, because there is too much we can learn from each other — especially when it comes to promoting open source databases as a real alternative to commercial ones.

Percona Live Europe featured talkEvery open source community has its own set of “zealots” (or maybe just “strict adherents”). But our dedication to one particular technology shouldn’t blind us to the value of others. The MySQL and PostgreSQL communities have both successfully obtained support through substantial large scale deployments. There are more and more engineers joining those communities, looking to find better solutions for the problems they face and learn from others’ technologies.  

Through the years I have held very productive discussions with people like Josh Berkus, Bruce Momjian, Oleg Bartunov,  Ilya Kosmodemiansky and Robert Treat (to name just a few) about how things are done in MySQL versus PostgreSQL — and what could be done better in both.

At PGDay this year, I was glad to see Alexey Kopytov speaking about what MySQL does better; it got some very constructive conversations going. I was also pleased that my keynote on Migration to the Open Source Databases at the same conference was well attended and also sparked some good conversations.

I want this trend to continue to grow. This is why I think running a PostgreSQL Day as part of Percona Live Europe, Amsterdam is an excellent development. It provides an outstanding opportunity for people interested in PostgreSQL to further their knowledge through exposure to  MySQL, MongoDB and other open source technologies. This holds true for folks attending the conference mainly as MySQL and MongoDB users: they get exposed to the state of PostgreSQL in 2016.

Even more, I hope that this new track will spark productive conversations in the hallways, at lunches and other events between the speakers themselves. It’s really the best way to see what we can learn from each other. In the end, it benefits all technologies.

I believe the whole conference is worth attending, but for people who only wish to attend our new  PostgreSQL Day on October 4th, you can register for a single day conference pass using the PostgreSQLRocks discount code (€200, plus VAT).  

I’m looking forward to meeting and speaking with members of the PostgreSQL community at Percona Live!

by Peter Zaitsev at August 25, 2016 01:21 PM