Planet MariaDB

September 30, 2016

MariaDB Foundation

MariaDB 10.1.18 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.18. This is a Stable (GA) release. See the release notes and changelog for details.

  • Download MariaDB 10.1.18
  • Release Notes
  • Changelog
  • What is MariaDB 10.1?
  • MariaDB APT and YUM Repository Configuration Generator

Thanks, and enjoy MariaDB!

The post MariaDB 10.1.18 now available appeared first on MariaDB.org.

by Daniel Bartholomew at September 30, 2016 05:54 PM

Peter Zaitsev

Percona Software News and Roadmap Update with CEO Peter Zaitsev

This blog post is a summary of the webinar Percona Software News and Roadmap Update – Q3 2016, given by Peter Zaitsev on September 28th, 2016.

Over the last few months, I’ve had the opportunity to meet and talk with many of Percona’s customers. I love these meetings, and I always get a bunch of questions about what we’re doing, what our plans are and what releases are coming.

I’m pleased to say there is a great deal going on at Percona, and I thought giving a quick talk about our current software and services, along with our future plans, would provide a simple reference for many of these questions.

A full recording of this webinar, along with the presentation slide deck, can be found here.

Percona Solutions and Services

Let me start by laying out Percona’s company purpose:

To champion unbiased open source database solutions.

What does this mean? It means that we write software to offer you better solutions, and we use the best of what software and technology exists in the open source community.

Percona stands by a set of principles that we feel define us as a company, and are a promise to our customers:

  • 100% free and open source software
  • Focused on finding solutions to maximize your success
  • Open source database strategy consulting and implementation
  • No vendor lock-in required

We offer trusted and unbiased expert solutions, support and resources across a broad software ecosystem, including:

  • MySQL
  • Percona Server
  • MariaDB
  • Percona XtraDB Cluster
  • Galera Cluster for MySQL
  • MariaDB Galera Cluster
  • MongoDB
  • Percona Server for MongoDB
  • Amazon RDS for MySQL/MariaDB/Aurora
  • Google CloudSQL

We also have specialization options for PaaS, IaaS, and SaaS solutions like Amazon Web Services, OpenStack, Google Cloud Platform, OpenShift, Ceph, Docker and Kubernetes.

Percona’s immediate business focus includes building long-term partnership relationships through support and managed services.

The next few sections detail our current service offerings, with some outlook on our future plans.

Consulting and Training. Our consulting and training services are available to assist you with whatever project or staffing needs you have.

  • Onsite and remote
  • 4 hours to full time (weeks or months)
  • Project and staff augmentation

Moved from Eventum to Zendesk. To serve and address our customers’ needs better and more simply, we recently moved from Eventum to Zendesk for reporting issues and requesting services. You can reach our support site here. So far it seems to be working well, and is providing:

  • A better experience for customers
  • Better responsiveness from our staff
  • Better measurements of customer experience

Advanced HA Included with Enterprise and Premier Support. Starting this past spring, we included advanced high availability (HA) support as part of our Enterprise and Premier support tiers. This advanced support includes coverage for:

  • Percona XtraDB Cluster
  • MariaDB Galera Cluster
  • Galera Cluster for MySQL
  • Upcoming MySQL Group Replication
  • Upcoming MySQL InnoDB Cluster

Enterprise Wide Support Agreements. Our new Enterprise Wide Support option allows you to buy per-environment support coverage that covers all of the servers in your environment, rather than on a per-server basis. This method of support can save you money because it:

  • Covers both MySQL and MongoDB
  • Means you don’t have to count servers
  • Provides highly customized coverage

Simplified Support Pricing. Get easy-to-understand support pricing quickly.

To discuss how Percona Support can help your business, please call us at +1-888-316-9775 (USA),
+44 203 608 6727 (Europe), or have us contact you.

Percona Software

Below are the latest and upcoming features in Percona’s software. All of Percona’s software adheres to the following principles:

  • 100% free and open source
  • No restricted “Enterprise” version
  • No “open core”
  • No BS-licensing (BSL)

Percona Server for MySQL 5.7

Overview

  • 100% Compatible with MySQL 5.7 Community Edition
  • 100% Free and Open Source
  • Includes Alternatives to Many MySQL Enterprise Features
  • Includes TokuDB Storage Engine
  • Focus on Performance and Operational Visibility

Latest Improvements

Features about to be released:

  • Column-level compression (thanks, Pinterest!)
  • Database-based file layout for TokuDB
  • Integration of TokuDB and Performance Schema

Percona XtraBackup 2.4

Overview

  • #1 open source binary hot backup solution for MySQL
  • Alternative to MySQL Enterprise backup
  • Parallel backups, incremental backups, streaming, encryption
  • Supports MySQL, MariaDB, Percona Server

New Features

  • MySQL 5.7 and Percona Server 5.7 Support
  • Support for InnoDB Tablespace Encryption

Percona Toolkit

Overview

  • “Swiss Army Knife” of tools
  • Helps DBAs be more efficient
  • Helps DBAs make fewer mistakes
  • Supports MySQL, MariaDB, Percona Server, Amazon RDS MySQL

New Features

  • Support for New MySQL, Percona Server and MariaDB versions
  • Working on making tools even safer to use

Percona Server for MongoDB 3.2

Overview

  • 100% compatible with MongoDB 3.2 Community Edition
  • 100% open source
  • Alternatives for many MongoDB Enterprise features
  • MongoRocks (RocksDB) storage engine
  • Percona Memory Engine

New

  • Percona Server for MongoDB 3.2 – GA
  • Support for MongoRocks storage engine
  • PerconaFT storage engine deprecated
  • Implemented Percona Memory Engine

Percona Memory Engine for MongoDB

Benchmarks

Percona Memory Engine for MongoDB® is a 100 percent open source in-memory storage engine for Percona Server for MongoDB.

Based on WiredTiger, the in-memory storage engine used in MongoDB Enterprise Edition, Percona Memory Engine for MongoDB delivers extremely high performance and reduced costs for a variety of use cases, including application cache, sophisticated data manipulation, session management and more.

Below are some benchmarks that we ran to demonstrate Percona Memory Engine’s performance.

WiredTiger vs MongoRocks – write intensive

Percona XtraDB Cluster 5.7

Overview

  • Based on Percona Server 5.7
  • Easiest way to bring HA to your MySQL environment
  • Designed to work well in the cloud
  • Multi-master replication with no conflicts
  • Automatic node provisioning for auto scaling and self-healing
  • Available now

Goals

  • Brought PXC development in-house to serve our customers better
  • Provide a complete clustering solution, not a set of LEGO pieces
  • Improve usability and ease of use
  • Focus on quality

Highlights

  • Integrated cluster-aware load balancer with ProxySQL
  • Instrumentation with Performance Schema
  • Support for data at rest encryption (InnoDB tablespace encryption)
  • Your data is safe by default with “strict mode”, which prevents using features that do not work correctly
  • Integration with Percona Monitoring and Management (PMM)
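
The “strict mode” highlighted above is exposed as the pxc_strict_mode server variable in Percona XtraDB Cluster 5.7. A minimal my.cnf sketch (the value shown is the documented default for new clusters):

```
[mysqld]
# Reject statements that use features known not to replicate correctly
# in a Galera-based cluster (e.g. writes to MyISAM tables).
pxc_strict_mode = ENFORCING
```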

Percona Monitoring and Management

Overview

  • Comprehensive database-focused monitoring
  • 100% open source, roll-your-own solution
  • Easy to install and use
  • Supports MySQL and MongoDB
  • Version 1.0 focuses on trending and query analyses
  • Management features to come

Check out the Demo

Percona Live Europe is right around the corner!

The Percona Live Open Source Database Conference is the premier event for the diverse and active open source database community, as well as businesses that develop and use open source database software. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, PostgreSQL and other open source databases. Tackling subjects such as analytics, architecture and design, security, operations, scalability and performance, Percona Live provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience – all to help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

by Peter Zaitsev at September 30, 2016 05:44 PM

MariaDB Foundation

MariaDB Server is a true open source project

The mission of the MariaDB Foundation is to ensure continuity and open collaboration in the MariaDB ecosystem. We facilitate the development of the MariaDB Server and the related connectors as listed on our GitHub account. Core to us is to enable and foster collaboration so that contributing is meaningful and produces results for everybody. Here are […]

The post MariaDB Server is a true open source project appeared first on MariaDB.org.

by Otto Kekäläinen at September 30, 2016 02:47 PM

September 29, 2016

Jean-Jerome Schmidt

Planets9s - 9 DevOps Tips for MySQL / MariaDB Galera Cluster, MySQL Query Tuning Part 2 and more!

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

New webinar: 9 DevOps Tips for going in production with MySQL / MariaDB Galera Cluster

In this new webinar on October 11th, Johan Andersson, CTO at Severalnines, will guide you through 9 key DevOps tips to consider before taking Galera Cluster for MySQL / MariaDB into production. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider before going live with Galera Cluster and Johan will share his 9 DevOps tips with you for a successful production environment.

Sign up for the webinar

Watch the replay: MySQL Query Tuning Part 2 - Indexing and EXPLAIN

You can now watch the replay of Part 2 of our webinar trilogy on MySQL Query Tuning, which covers Indexing as well as EXPLAIN, one of the most important tools in the DBA’s arsenal. Our colleague Krzysztof Książek, Senior Support Engineer at Severalnines, presents this webinar trilogy and this week he looked into answering questions such as why a given query might be slow, what the execution plan might look like, how JOINs might be processed, whether a given query is using the correct indexes, or whether it’s creating a temporary table. Find out more by watching the replay of this webinar.

Watch the replay

Download our whitepaper on Database Sharding with MySQL Fabric

This new whitepaper provides a close look at database sharding with MySQL Fabric. You will learn the basics of it, and also learn how to migrate to a sharded environment. It further discusses three different tools which are designed to help users shard their MySQL databases. And last but not least, it shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Download the whitepaper

Critical zero-day vulnerabilities exposed in MySQL

Database security notice: you can easily upgrade your MySQL and MariaDB servers with ClusterControl, and this new blog post shows you how. You must have heard about CVE-2016-6662, the recent zero-day exploit exposed in most of MySQL and its variants. The flaw can be exploited by a remote attacker to inject malicious settings into your my.cnf. We advise you to upgrade as soon as possible, if you haven’t done so yet, with these easy-to-follow instructions for ClusterControl users.
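
Alongside upgrading, a widely recommended mitigation at the time was to make sure no MySQL configuration file is writable by the mysql user, since the exploit relies on mysqld writing settings into a config file it later reads. A minimal shell sketch of that permissions step (the path here is a stand-in for demonstration; apply the idea to your real /etc/my.cnf and friends, and upgrade anyway — hardening is not a fix):

```shell
# Ensure the config file is read-only for everyone but its (root) owner.
CNF=./demo-my.cnf        # stand-in path for /etc/my.cnf in this sketch
touch "$CNF"
chmod 0644 "$CNF"        # owner read/write, group/other read-only
stat -c '%a' "$CNF"      # prints: 644
```

In production you would also chown the file to root:root so the mysql user cannot replace it.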

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at September 29, 2016 02:09 PM

Peter Zaitsev

Percona XtraDB Cluster 5.7.14-26.17 GA is now available

Percona XtraDB Cluster Reference Architecture

This Percona XtraDB Cluster 5.7.14-26.17 GA release is dedicated to the memory of Federico Goncalvez, our colleague on Percona’s Uruguayan team until his tragic death on September 6, 2016.

Fede, we miss you.

Percona announces the first general availability (GA) release in the Percona XtraDB Cluster 5.7 series on September 29, 2016. Binaries are available from the downloads area or our software repositories.

The full release notes are available here.

Percona XtraDB Cluster 5.7.14-26.17 GA is based on Percona Server 5.7.14 and the Galera replication library 3.17.

For information about the changes and new features introduced in Percona Server 5.7, see Changed in Percona Server 5.7.

Percona XtraDB Cluster 5.7.14-26.17 GA New Features

This is a list of the most important features introduced in Percona XtraDB Cluster 5.7 compared to version 5.6:

  • PXC Strict Mode saves your workload from experimental and unsupported features.
  • Support for monitoring Galera Library instruments and other wsrep instruments as part of Performance Schema.
  • Support for encrypted tablespaces in multi-master topology, which enables Percona XtraDB Cluster to transfer encrypted tablespaces to newly joining nodes.
  • Compatibility with ProxySQL, including a quick configuration script.
  • Support for monitoring Percona XtraDB Cluster nodes using Percona Monitoring and Management
  • More stable and robust operation with MySQL and Percona Server version 5.7.14, as well as Galera 3.17 compatibility. Includes all upstream bug fixes, improved logging and more.
  • Simplified packaging for Percona XtraDB Cluster to a single package that installs everything it needs, including the Galera library.
  • Support for latest Percona XtraBackup with enhanced security checks.
Bug Fixes
  • Fixed crash when a local transaction (such as EXPLAIN or SHOW) is interrupted by a replicated transaction with higher priority (like ALTER that changes table structure and can thus affect the result of the local transaction).
  • Fixed DONOR node getting stuck in Joined state after successful SST.
  • Fixed error message when altering non-existent table with pxc-strict-mode enabled.
  • Fixed path to the directory in percona-xtradb-cluster-shared.conf.
  • Fixed setting of seqno in grastate.dat to -1 on clean shutdown.
  • Fixed failure of asynchronous TOI actions (like DROP) for non-primary nodes.
  • Fixed replacing of my.cnf during upgrade from 5.6 to 5.7.
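
For context on the grastate.dat fix above: grastate.dat is a small text file in the node’s data directory that records its Galera state. After a clean shutdown it should contain the last committed sequence number, not -1 (a seqno of -1 means the position is unknown and typically forces a full state transfer on restart). A typical healthy file looks roughly like this (uuid and seqno values are illustrative):

```
# GALERA saved state
version: 2.1
uuid:    <cluster-state-uuid>
seqno:   1234
```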
Security Fixes
  • CVE-2016-6662
  • CVE-2016-6663
  • CVE-2016-6664

For more information, see this blog post.

Other Improvements
  • Added support for defaults-group-suffix in SST scripts.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Alexey Zhebel at September 29, 2016 02:05 PM

Jean-Jerome Schmidt

Watch the replay: MySQL Query Tuning Part 2 - Indexing and EXPLAIN

We’ve just completed Part 2 of our webinar trilogy on MySQL Query Tuning this week and if you didn’t get the chance to join us for the live sessions (or would like to watch them again), the recording is now available to view online as a replay.

As announced in our communications, these webinars are almost 1.5 hours long, as the content is quite dense (and we got some great questions from participants as well), so be sure to make sufficient time available to watch the replay ;-)

Watch the replay

About 40% of our audience indicated that they currently either use MySQL / Percona Server 5.5 (or earlier), MySQL / Percona Server 5.6 or MariaDB 10.1, so it was an interesting mix. We also found that the majority of participants don’t have a process to check for unused indexes, which was a good opportunity to explain why this might be a good thing to implement. And finally, 68% of the audience confirmed that they use EXPLAIN Analyzer to analyse query execution plans. You can view the detail of the polls further below.

Our colleague Krzysztof Książek, Senior Support Engineer at Severalnines, presents this webinar trilogy and this week he looked into answering questions such as why a given query might be slow, what the execution plan might look like, how JOINs might be processed, whether a given query is using the correct indexes, or whether it’s creating a temporary table.

In order to answer these questions, Krzysztof discussed how to use database indexes to speed up queries. More specifically, he covered the different index types such as B-Tree, Fulltext and Hash, took a deep dive into B-Tree indexes, and discussed indexes for MyISAM vs. InnoDB tables, as well as some gotchas.

Watch the replay

And if you’re ready for Part 3 of the MySQL Query Tuning Trilogy, please do sign up for the webinar on October 25th, which will cover ‘working with optimizer and SQL tuning’.

Feedback detail of this week’s webinar:

by Severalnines at September 29, 2016 11:47 AM

September 28, 2016

Peter Zaitsev

7 Fresh Bugs in MySQL 8.0

This blog post will look at seven bugs in MySQL 8.0.

Friday afternoon is always ideal for a quick look at the early quality of MySQL 8.0! Last Friday, I did just that.

If you haven’t heard the news yet, MySQL 8.0 DMR is available for download on mysql.com!

Tools at the ready: pquery2, updated 8.0-compatible scripts in Percona-qa, and some advanced regex to wade through the many cores generated by the test run. For those of you who know and use pquery-run.sh, this should mean a lot!

[09:41:50] [21492] ====== TRIAL #39308 ======

In other words, almost 40K trials and 21.5K core dumps (or other significant issues) detected! This run had been churning away on a server for a number of days. On to the bug logging fun!

After reducing test cases, and filtering duplicates, we have the following seven bugs logged in upstream;

  • Bug #83120 virtual void Field::store_timestamp(const timeval*): Assertion ‘false’ failed.
  • Bug #83118 handle_fatal_signal (sig=11) in replace_user_table
  • Bug #83117 Assertion MDL_checker::is_read_locked(m_thd, *object)’ failed.
  • Bug #83115 Assertion ‘maybe_null’ failed. handle_fatal_signal in Item_func_concat::val_str
  • Bug #83114 Assertion `strlen(db_name) <= (64*3) && strlen(table_name) <= (64*3)’ failed.
  • Bug #83113 SIGKILL myself on DROP TABLE
  • Bug #83112 handle_fatal_signal (sig=11) in sp_get_flags_for_command

My first impressions?

MySQL 8.0 DMR is a reasonably solid release for a first iteration.

It seems our friends at upstream are on an excellent path to making MySQL 8.0 another rock-solid release. Chapeau!

by Roel Van de Paar at September 28, 2016 06:19 PM

Percona Live Europe featured talk with John De Goes — MongoDB Analytics & Dashboards

Percona Live Europe Featured Talk

Welcome to another featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet John De Goes, CTO at SlamData Inc. His talk will be on MongoDB Analytics & Dashboards. SlamData is an open source analytics and reporting solution designed specifically for MongoDB. SlamData lets anyone build live analytics, charts, and dashboards on any type of data inside MongoDB. Because SlamData runs all queries inside the database, reports are always up to date and can scale to the largest MongoDB deployments.

I had a chance to speak with John and learn a bit more about analytics for businesses:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

John: I work as the CTO at SlamData, where we build open source software that helps companies understand complex data stored in modern databases.

I love my job. We’re solving a huge pain point for businesses, and we get to do that through open source. The technical challenges we’ve had to overcome (and have yet to overcome!) can be really difficult, but that’s what makes it so fun.

Percona: Your talk is called “MongoDB Analytics & Dashboards.” What are some of the more important data sets in MongoDB for people to monitor, and how does SlamData work to provide easy analysis of those data?

John: We use MongoDB to build data hubs, which pull together lots of different types of data into a single database. SlamData’s really useful for that case because you can explore many complex kinds of data, no matter how nested or irregular, then refine the data and build beautiful interactive reports and dashboards that can be used by non-technical people.

Developers also use MongoDB to build web and mobile applications, which tend to collect a lot of data about the product and the users. Normal analytics tools don’t work with MongoDB, because the data model is too complex. But because we built SlamData to handle complex data, we make it very easy to do product and user analytics on top of MongoDB databases.

Percona: Why are data dashboards important for businesses? How can businesses use the data to improve processes or realize revenues?

John: Analytics serve one of two major roles in today’s businesses. Firstly, tech companies use analytics to build product features. For example, if you built a marketing application for businesses, then your customers probably want their own analytics dashboard that’s embedded into the product. If you built your application on a modern database like MongoDB, then SlamData makes it very easy for developers to add these beautiful, interactive and secure dashboards into their applications.

Secondly, all kinds of companies store user profile, user event and product data inside databases. Businesses can use this information to better understand their customer makeup, what users are doing, how they are using internal or external applications, and how one data set relates to another. These insights help businesses improve business processes, such as allocating marketing spend or directing product development resources. This is the classic use case for Business Intelligence (BI) and reporting software, and SlamData really shines here, because you can just get to work on any kind of data, no matter how much or little structure it might have.

Percona: What is changing in how businesses use data that keeps you awake at night? What tools or features might you be thinking about to address these issues?

John: Businesses are moving away from traditional data warehousing. They don’t want to spend millions of dollars a year licensing a big clunky piece of technology that never has all the data, and which is always out of date. They want agile solutions that they can point to any source of data, no matter where it is, and no matter what structure it has, and begin answering the questions they have.

This is the single biggest shift in how companies want to consume analytics. The other two important ones are the democratization of analytics, which is a fancy way of saying business users want to do more and bottleneck less on IT, and the increasing use of web and mobile analytics solutions.

SlamData is built web- and mobile-first, and the core technology relies a lot on pushing computation down into data sources to minimize the need to involve IT. But nonetheless, it’s a new paradigm, and building this sort of technology that addresses the need for analytics agility is not easy. It’s taken us two and a half years to get this far, and we’re still barely halfway in.

Percona: What are you looking forward to the most at Percona Live Europe this year?

John: I’m from the USA, so I’m interested in seeing what tools and processes professionals in Europe are using to manage data requirements. I’ll be comparing notes. It’s going to be a blast!

You can read more about John and his thoughts on MongoDB analytics via his Twitter handle.

Want to find out more about John, SlamData, and analytics? Register for Percona Live Europe 2016, and come see his talk MongoDB Analytics & Dashboards.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

by Dave Avery at September 28, 2016 03:46 PM

Shlomi Noach

Three wishes for a new year

(Almost) another new year by the Jewish calendar. What do I wish for in the coming year?

  1. World peace
  2. Good health to all
  3. Relaxed GTID constraints

I'm still not using GTID, and still see operational issues with working with GTID. As a latest example, our new schema migration solution, gh-ost, allows us to test migrations in production, on replicas. The GTID catch? gh-ost has to write something to the binary log. Thus, it "corrupts" the replica with a bogus GTID entry that will never be met in another server, thus making said replica unsafe to promote. We can work around this, but...

I understand the idea and need for the Executed GTID Set. It will certainly come in handy with multi-writer InnoDB Cluster. However for most use cases GTID poses a burden. The reason is that our topologies are imperfect, and we as humans are imperfect, and operations are most certainly imperfect. We may wish to operate on a replica: test something, by intention or mistake. We may wish to use a subchain as the seed for a new cluster split. We may wish to be able to write to downstream replicas. We may use a 3rd party tool that issues a flush tables with read lock without disabling sql_log_bin. Things just happen.
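
For reference, the behaviour discussed above is governed by a handful of server options in Oracle MySQL. A hedged my.cnf sketch of a GTID-enabled replication server (log file name illustrative):

```
[mysqld]
# Oracle MySQL GTID switches. Once gtid_mode=ON, every transaction is
# assigned a GTID, and the executed/purged GTID sets described above
# come into play whenever a replica is repointed.
gtid_mode                = ON
enforce_gtid_consistency = ON
log_bin                  = mysql-bin
log_slave_updates        = ON
```

Note that sql_log_bin is a session variable, which is why a third-party tool that forgets to disable it can leave stray GTID entries behind.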

For that, I would like to suggest GTID control levels, such as:

  1. Strict: same as Oracle's existing implementation. Executed sets, purged sets, whatnot.
  2. Last executed: a mode where the only thing that counts is the last executed GTID value. If I repoint replica, all it needs to check is "hey this is my last executed GTID entry, give me the coordinates of yours. And, no, I don't care about comparing executed and purged sets, I will trust you and keep running from that point on"
  3. Declarative: GTIDs are generated, are visible in each and every binary log entry, but are completely ignored.

I realize Oracle MySQL GTID has been out for over three years now, but I'm sorry - I still have reservations and see use cases where I fear it will not serve me right.

How about my previous years wishes? World peace and good health never came through, however:

  • My 2015 wish for "decent, operations friendly built in online table refactoring" was unmet; however, gh-ost is a thing now and exceeds my expectations. No, really. Please come see Tom & myself present gh-ost and how it changed our migration paradigm.
  • My 2012 wish for "decent, long waited for, implementation of Window Functions (aka Analytic Functions) for MySQL" was met by MariaDB's window functions.
    Not strictly Window Functions, but Oracle MySQL 8.0 will support CTEs (hierarchical/recursive), worth a mention.

See you in Amsterdam!

by shlomi at September 28, 2016 02:20 PM

MariaDB AB

Webinar - Better, Faster DevOps? Learn how with MaxScale 2.0

MariaDB Team

Please join Roger Bodamer, MariaDB’s Chief Product Officer, on Thursday, October 6 to learn about the exciting new features in MariaDB MaxScale 2.0 that enable rapid innovation for web applications without impacting the architecture that simultaneously supports your legacy applications. In this webinar, we will cover:

  • Data streaming. Stream transactional data in real time from MariaDB to big data stores, like Hadoop, through messaging systems, like Kafka, for real-time analytics and machine learning applications.
  • Better security. Prevent attacks such as SQL injection and DDoS.
  • High availability. New automatic failover and asynchronous replication to ensure uptime and minimize downtime.
  • Scalability. Query routing improves read and write scalability.

MaxScale makes it easy to manage the scalability and availability of your database cluster, as well as secure it and minimize maintenance downtime.

MaxScale is a next-generation database proxy that goes well beyond routing, with advanced filtering, enhanced security and authentication. It is a multi-threaded, event-driven engine that has its main functionality provided by plugins loaded at runtime. With MaxScale’s innovative architecture you can update the data layer on scale-out architectures without impacting application performance.
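
As a concrete sketch of what a basic MaxScale deployment looks like, here is a minimal read/write-split configuration fragment for maxscale.cnf (section names, host address and credentials are illustrative, not taken from this post):

```
# One backend server, a read/write-split service in front of it,
# and a listener clients connect to on port 4006.
[server1]
type=server
address=192.168.0.10
port=3306
protocol=MySQLBackend

[Splitter-Service]
type=service
router=readwritesplit
servers=server1
user=maxscale_user
password=change_me

[Splitter-Listener]
type=listener
service=Splitter-Service
protocol=MySQLClient
port=4006
```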

Register to join the EMEA session:
Date: Thursday, October 6
Time: 10-11am CET

Or join our North America session:
Date: Thursday, October 6
Time: 10-11am PST

Roger Bodamer
Chief Product Officer

Roger has more than 20 years of experience building and delivering innovative products to market, as well as deep expertise and knowledge of database architectures. Roger holds several patents for database and middleware technology. His experience leading product development and engineering teams includes 12 years with Oracle’s Database and Application Server development organization, where he pioneered products that delivered heterogeneous interoperability, as well as several years as SVP of product operations and engineering at Apple’s PowerSchool division. Roger served as GM at 10gen (now MongoDB Inc.). He was a founder and CEO at software company UpThere, Inc. Roger also held leadership positions at OuterBay and Efficient Frontier. He earned a bachelor’s degree in computer science from Saxion Hogescholen in the Netherlands.

About the Author

We are the team behind MariaDB, the fastest growing Open Source database.

by MariaDB Team at September 28, 2016 07:00 AM

September 27, 2016

Jean-Jerome Schmidt

Two Database Sharding Tools for MySQL

In a previous blog post we discussed several approaches to sharding. The most flexible one, sharding using metadata, is also the most complex one to implement. You need to design the meta-database, and build high availability not only for your application data but also for the metadata. On top of that, you need to design your application so it is aware of the complex database infrastructure beneath it - it has to query the metadata first, and then be directed to the correct shard to read or write data. You will also have to build tools to manage and maintain the metadata. Migrating data between shards has to be done carefully. You also have to make sure that any operations on the production databases are mirrored in the metadata. For instance, have you taken a slave out of rotation? This should be reflected in the metadata. Have you added a new slave to a shard? You have to modify the metadata and add that host. As you can imagine, a lot of time and effort has to be put into developing and maintaining scripts and tools to manage such a setup. This begs the question: is there an external solution to design, deploy and manage a sharded environment? In this post, we will cover a couple of solutions which are available and which may help you to build a scalable, sharded infrastructure.

Vitess

Vitess is a tool built to help manage sharded environments. It was developed to help scale out databases at YouTube. In short, it is a solution based on metadata: by default it uses range sharding, but it is also possible to implement a custom sharding scheme. Topology data is stored and maintained in a service like ZooKeeper or etcd. Applications access data using a lightweight proxy, named ‘vtgate’ in Vitess’ nomenclature. Vtgate connects to the metadata store and checks the data distribution, which allows it to route queries to the correct shards (‘tablets’).

Vitess supports range sharding: the keyspace is divided into two or more partitions, each partition covering a range of data. To compute the ranges, Vitess has to use a column of some kind - the currently supported data types are BIGINT UNSIGNED and VARBINARY. This works very well with IDs, which typically use an unsigned integer format.

MySQL Fabric

In 2014, Oracle introduced a new set of tools for MySQL, called “MySQL Fabric”. Historically, there was no official tool that would allow users to build highly available topologies, including sharded setups. The idea behind Fabric is to provide official tooling for building such setups. It provides a framework and tools to manage groups of highly available MySQL instances, and supports the implementation of HA setups and scaling through sharding.

MySQL Fabric uses a concept of high-availability groups - a group contains two or more MySQL servers connected using replication (actually, you can have just a single host in a group but, obviously, it won’t be highly-available).

MySQL Fabric not only gives you the ability to maintain availability of your data - it also supports scaling out through sharding. The basic idea is that if we can configure a few servers into a single high-availability group, we can then scale by adding more groups. We then need to implement some kind of shard mapping - we need to decide which column to use for sharding and which tables should be sharded.

If you are interested in sharding, check out our ebook on sharding.

Database Sharding with MySQL Fabric

Why do we shard? How does sharding work? What are the different ways I can shard my database? This whitepaper goes through some of the theory behind sharding. It also discusses three different tools which are designed to help users shard their MySQL databases. And last but not least, it shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Download Here

by Severalnines at September 27, 2016 06:26 PM

Peter Zaitsev

Using the super_read_only system variable

This blog post will discuss how to use the MySQL super_read_only system variable.

It is well known that replica servers in a master/slave configuration should not receive write queries, to avoid breaking replication due to duplicate keys, missing rows or other similar issues. It’s a good practice to set read_only=1 on slave servers to prevent any (accidental) writes. Note that servers acting as replicas are NOT put in read-only mode automatically.

Sadly, read_only has a historical issue: users with the SUPER privilege can override the setting and still run DML queries. Since Percona Server 5.6.21 and MySQL 5.7.8, however, you can use the super_read_only feature to extend the read_only option and apply it to users with SUPER privileges.

Both super_read_only and read_only are disabled by default, and enabling super_read_only implies that read_only is automatically ON as well. We’ll demonstrate how read_only and super_read_only work:

mysql> SET GLOBAL read_only = 1;
Query OK, 0 rows affected (0.00 sec)

As expected, with the read_only variable enabled, users without the SUPER privilege won’t be able to INSERT values, and instead they will get an ERROR 1290 message:

mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        0 |
+--------------------+--------------------------+
1 row in set (0.01 sec)
mysql> SHOW GRANTS\G
*************************** 1. row ***************************
Grants for nosuper@localhost: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE ON *.* TO 'nosuper'@'localhost' IDENTIFIED BY PASSWORD <secret>
1 row in set (0.00 sec)
mysql> INSERT INTO test.example VALUES (1);
ERROR 1290 (HY000): The MySQL server is running with the --read-only option so it cannot execute this statement

However, users with SUPER privileges can INSERT values on the table:

mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        0 |
+--------------------+--------------------------+
1 row in set (0.01 sec)
mysql> SHOW GRANTS\G
*************************** 1. row ***************************
Grants for super@localhost: GRANT ALL PRIVILEGES ON *.* TO 'super'@'localhost' IDENTIFIED BY PASSWORD '*3E26301B12AE2B8906D9F09785359751700930E8'
1 row in set (0.00 sec)
mysql> INSERT INTO test.example VALUES (1);
Query OK, 1 row affected (0.01 sec)

Now we will enable super_read_only and try to INSERT data again with both users:

mysql> SET GLOBAL super_read_only = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        1 |
+--------------------+--------------------------+
1 row in set (0.00 sec)
mysql> SHOW GRANTS\G
*************************** 1. row ***************************
Grants for super@localhost: GRANT ALL PRIVILEGES ON *.* TO 'super'@'localhost' IDENTIFIED BY PASSWORD '*3E26301B12AE2B8906D9F09785359751700930E8'
1 row in set (0.00 sec)
mysql> INSERT INTO test.example VALUES (1);
ERROR 1290 (HY000): The MySQL server is running with the --read-only (super) option so it cannot execute this statement

The same now applies to the user without SUPER privileges:

mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        1 |
+--------------------+--------------------------+
1 row in set (0.00 sec)
mysql> SHOW GRANTS\G
*************************** 1. row ***************************
Grants for nosuper@localhost: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE ON *.* TO 'nosuper'@'localhost' IDENTIFIED BY PASSWORD <secret>
1 row in set (0.00 sec)
mysql> INSERT INTO test.example VALUES (1);
ERROR 1290 (HY000): The MySQL server is running with the --read-only (super) option so it cannot execute this statement

As you can see above, now even users with SUPER privileges can’t make updates or modify data. This is useful in replication to ensure that no updates are accepted from the clients, and are only accepted by the master.

When enabling the super_read_only system variable, please keep in mind the following implications:

  • Setting super_read_only ON implicitly forces read_only ON
  • Setting read_only OFF implicitly forces super_read_only OFF

There are some other implications of read_only that apply to super_read_only as well:

  • Operations on temporary tables are allowed no matter how these variables are set.
  • Updates performed by slave threads are permitted if the server is a replication slave. In replication setups, it can be useful to enable super_read_only on slave servers to ensure that slaves accept updates only from the master server and not from clients.
  • OPTIMIZE TABLE and ANALYZE TABLE operations are allowed as well, since the purpose of the read-only mode is to prevent changes to table structure or contents, but not to table metadata like index stats.
  • You will need to manually disable it when you promote a replica server to the role of master.
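As a sketch of that promotion step: since setting read_only OFF implicitly forces super_read_only OFF, a single SET statement on the promoted replica is enough (the STOP SLAVE step assumes classic replication):

```sql
-- On the replica being promoted to master:
STOP SLAVE;
SET GLOBAL read_only = 0;  -- implicitly sets super_read_only = 0 as well
SELECT @@global.read_only, @@global.super_read_only;  -- both now report 0
```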

There are a few bugs related to this variable that might be useful to take into consideration if you’re running Percona Server 5.6:

For more information, please refer to the following documentation links:

by Pablo Padua at September 27, 2016 06:06 PM

TokuDB and PerconaFT database file management part 1 of 2

In this blog post, we’ll look at TokuDB and PerconaFT database file management.

The TokuDB/PerconaFT file set consists of many different files that all serve various purposes. This series of blog posts lists the different types of TokuDB and PerconaFT files, explains their purpose, shows their location, and describes how to move them around.

Peter Zaitsev blogged on the same topic a few years ago. By the time you read back through Peter’s post and reach the end of this series, you should have some ideas to help you to manage your data set more efficiently.

TokuDB and PerconaFT files and file types:

  • tokudb.environment
    • This file is the root of the PerconaFT file set and contains various bits of metadata about the system, such as creation times, current file format versions, etc.
    • PerconaFT will create/expect this file in the directory specified by the MySQL datadir.
  • tokudb.rollback
    • Every transaction within PerconaFT maintains its own transaction rollback log. These logs are stored together within a single PerconaFT dictionary file and take up space within the PerconaFT cachetable (just like any other PerconaFT dictionary).
    • The transaction rollback logs will “undo” any changes made by a transaction if the transaction is explicitly rolled back, or rolled back via recovery as a result of an uncommitted transaction when a crash occurs.
    • PerconaFT will create/expect this file in the directory specified by the MySQL datadir.
  • tokudb.directory
    • PerconaFT maintains a mapping of a dictionary name (example: sbtest.sbtest1.main) to an internal file name (example: _sbtest_sbtest1_main_xx_x_xx.tokudb). This mapping is stored within this single PerconaFT dictionary file and takes up space within the PerconaFT cachetable just like any other PerconaFT dictionary.
    • PerconaFT will create/expect this file in the directory specified by the MySQL datadir.
  • Dictionary files
    • TokuDB dictionary (data) files store actual user data. For each MySQL table there will be:
      • One “status” dictionary that contains metadata about the table.
      • One “main” dictionary that stores the full primary key (an imaginary key is used if one was not explicitly specified) and full row data.
      • One “key” dictionary for each additional key/index on the table.
    • These are typically named: _<database>_<table>_<key>_<internal_txn_id>.tokudb
      PerconaFT creates/expects these files in the directory specified by tokudb_data_dir if set, otherwise the MySQL datadir is used.
  • Recovery log files
    • The PerconaFT recovery log records every operation that modifies a PerconaFT dictionary. Periodically, the system will take a snapshot of the system called a checkpoint. This checkpoint ensures that the modifications recorded within the PerconaFT recovery logs have been applied to the appropriate dictionary files up to a known point in time and synced to disk.
    • These files have a rolling naming convention, but use: log<log_file_number>.tokulog<log_file_format_version>
    • PerconaFT creates/expects these files in the directory specified by tokudb_log_dir if set, otherwise the MySQL datadir is used.
    • PerconaFT does not track what log files should or shouldn’t be present. Upon startup, it discovers the logs in the log dir and replays them in order. If the wrong logs are present, the recovery aborts and possibly damages the dictionaries.
  • Temporary files
    • PerconaFT might need to create some temporary files in order to perform some operations. When the bulk loader is active, these temporary files might grow to be quite large.
    • As different operations start and finish, the files will come and go.
    • There are no temporary files left behind upon a clean shutdown.
    • PerconaFT creates/expects these files in the directory specified by tokudb_tmp_dir if set. If not, the tokudb_data_dir is used if set, otherwise the MySQL datadir is used.
  • Lock files
    • PerconaFT uses lock files to prevent multiple processes from accessing/writing to the files in the assorted PerconaFT functionality areas. Each lock file will be in the same directory as the file(s) that it is protecting. These empty files are only used as semaphores across processes. They are safe to delete/ignore as long as no server instances are currently running and using the data set.
    • __tokudb_lock_dont_delete_me_environment
    • __tokudb_lock_dont_delete_me_recovery
    • __tokudb_lock_dont_delete_me_logs
    • __tokudb_lock_dont_delete_me_data
    • __tokudb_lock_dont_delete_me_temp
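To inspect the dictionary-name-to-file-name mapping (maintained in tokudb.directory) on a running server, TokuDB exposes it through the INFORMATION_SCHEMA. A sketch, assuming the TokuDB_file_map table shipped with Percona Server’s TokuDB build; column names may differ slightly between versions:

```sql
-- Maps logical dictionary names (e.g. sbtest.sbtest1.main)
-- to the internal *.tokudb file names on disk:
SELECT dictionary_name, internal_file_name
FROM information_schema.TokuDB_file_map;
```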

PerconaFT is extremely pedantic about validating its data set. If a file goes missing or can’t be found, or seems to contain nonsensical data, it will assert, abort or fail to start. It does this not to annoy you, but to protect you from doing any further damage to your data.

Look out for part 2 of this series for information on how to move your log, dictionary, and temp files around correctly.

by George O. Lorch III at September 27, 2016 02:53 PM

MariaDB Foundation

MariaDB 10.2.2 Beta now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.2.2 Beta. This is the first beta release in the MariaDB 10.2 series. See the release notes and changelog for details. Download MariaDB 10.2.2 Beta Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator Thanks, and enjoy MariaDB!

The post MariaDB 10.2.2 Beta now available appeared first on MariaDB.org.

by Daniel Bartholomew at September 27, 2016 02:51 PM

September 26, 2016

Peter Zaitsev

High Availability at Percona Live Europe 2016

This blog will review some of the high availability topics featured at this year’s Percona Live Europe, Amsterdam conference.

The topic of MySQL high availability is always hot because, beyond just being available, you also want efficient database manageability. I’m sure you’ve all seen the video by Frederic Descamps talking about the launch of MySQL InnoDB Cluster (built with group replication, and managed with the new MySQL Shell). MySQL 8.0 going GA will make for an exciting time in the MySQL world (though all you early adopters should start trying it now, or right after Percona Live Europe Amsterdam!).

With that, I think a must-attend tutorial is MySQL Group Replication in a nutshell: hands-on tutorial by Frederic Descamps and Kenny Gryp. It competes, however, with MySQL High Availability with Percona XtraDB Cluster 5.7 by Alok Pathak, Peter Zaitsev and Krunal Bauskar (Percona XtraDB Cluster Team Lead), which I think will also be an interesting session (and you’re going to learn about Percona XtraDB Cluster 5.7 there).

The quality of the sessions this year is extremely high, making tutorial day hard to split up - another reason to bring a colleague to the conference to get the best spread! Remember, if you bring three or more in a group, you qualify for the discounted group rate at registration. Back to tutorial day: in the morning you have the choice of seeing me give Best Practices for MySQL High Availability, or checking out the ProxySQL Tutorial by David Turner (Uber), Derek Downey (Pythian), and the author himself, René Cannaò (Dropbox/ProxySQL). Frankly, I think ProxySQL is the new hotness, so you definitely want to be there in the morning (and it makes a good lead-in to the Percona XtraDB Cluster tutorial in the afternoon if you’re looking for a “track”).

On Day 1, there are plenty of talks, but my picks focused around high availability would be:

On Day 2, I’d probably check out the following:

So here’s another “track” like post about what’s coming in Amsterdam in about 2 weeks. There is of course, still time to register. Use the FeaturedTalk code to get a discount.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek, which provides a platform for national and international companies that focus on online marketing, media and technology, and for the business managers and entrepreneurs who use them, whether in retail, healthcare, finance, the games industry or media. Check it out!

by Colin Charles at September 26, 2016 07:55 PM

Webinar Wednesday, September 28: Percona Software News and Roadmap Update – Q3 2016

Please join Percona founder and CEO Peter Zaitsev for a webinar on Wednesday, September 28 at 11 am PDT (UTC-7), where he’ll discuss the Percona Software News and Roadmap Update – Q3 2016.

Come and listen to Percona CEO Peter Zaitsev discuss what’s new in Percona open source software, including Percona Server for MySQL and MongoDB, Percona XtraBackup, Percona Toolkit, Percona XtraDB Cluster and Percona Monitoring and Management.

During this webinar Peter will talk about newly released features in Percona software, show a few quick demos and share with you highlights from the Percona open source software roadmap.

Peter will also talk about new developments in Percona commercial services and finish with a Q&A.

Register for the Percona Software News and Roadmap Update – Q3 2016 webinar here.


Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 150 professionals in 20 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. The Inc. 5000 added Percona to its list in 2013, 2014 and 2015. Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group.

A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Data Performance Blog. He was also tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

by Dave Avery at September 26, 2016 07:27 PM

Jean-Jerome Schmidt

Database Security - MySQL Upgrade Instructions for Zero-day Exploit

You must have heard about CVE-2016-6662, the recent zero-day exploit affecting most versions of MySQL and its variants. The vulnerability can be exploited by a remote attacker to inject malicious settings into your my.cnf. You can read about the details here.

At the moment, all MySQL vendors supported by ClusterControl (Oracle, Codership, Percona, MariaDB) have released a bug fix in their respective package repositories:

Vendor     Software                                   Patched Releases
Oracle     MySQL Server                               5.5.52, 5.6.33, 5.7.15
Percona    Percona Server, Percona XtraDB Cluster     5.5.51-38.1, 5.6.32-78.0, 5.7.14-7
MariaDB    MariaDB Server, MariaDB Galera Cluster     10.1.17, 10.0.27, 5.5.51
Codership  MySQL Galera Cluster                       5.5.52, 5.6.33

If you are using ClusterControl to manage your MySQL/MariaDB databases, we advise you to do the following as soon as possible:

  1. Upgrade MySQL or MariaDB server for ClusterControl.
  2. Upgrade ClusterControl to the latest version (recommended for #3).
  3. Upgrade your MySQL servers manually or using ClusterControl.

Upgrade MySQL/MariaDB Server on ClusterControl server

ClusterControl stores monitoring data in a MySQL/MariaDB server. The ClusterControl installer script (install-cc) relies on the respective OS’s repository to install MySQL server.

On Debian 8 and Ubuntu 14.04, the latest version of mysql-server package is patched:

ubuntu@ubuntu-trusty:~$ sudo apt list mysql-server
mysql-server/trusty-updates,trusty-security 5.5.52-0ubuntu0.14.04.1 all

To upgrade, simply:

$ sudo apt-get update
$ sudo apt-get install mysql-server
$ sudo service mysql restart    # or: sudo systemctl restart mysql

For RHEL/CentOS 6, according to the Red Hat Customer Portal, the MySQL 5.1 packages in Red Hat Enterprise Linux 6 do not implement support for library preloading, which prevents the remote attack vector used by the published exploit.

At the time of writing, there is no patched MariaDB release available from the RHEL/CentOS 7 repositories:

[root@centos7 ]$ yum list | grep -i mariadb-server
mariadb-server.x86_64                   1:5.5.44-2.el7.centos          @base
mariadb-server.x86_64                   1:5.5.50-1.el7_2               updates

For the above reason, on RHEL/CentOS we can apply the patches manually. The recommendations below are taken from the Percona blog:

  1. Patch the mysqld_safe and related files

    Compare and patch mysqld_safe, /etc/init.d/mysql and related files according to the diffs shown in the respective vendors’ GitHub repositories:

    MySQL: https://github.com/mysql/mysql-server/commit/f75735e36e6569b9dae3b0605b1d5915a519260e#diff-144aa2f11374843c969d96b7b84247eaR348

    Percona: https://github.com/percona/percona-server/commit/c14be53e029442f576cced1fb8ff96b58e89f2e0#diff-144aa2f11374843c969d96b7b84247eaR261

    MariaDB: https://github.com/MariaDB/server/commit/684a165f28b3718160a3e4c5ebd18a465d85e97c

  2. Database user permissions

    One way to avoid the vulnerability is making sure no remote user has the SUPER and FILE privileges. Verify whether any user is unnecessarily holding these two privileges. You can get a list of remote users that hold both privileges with the following query:

    mysql> SELECT user, host FROM mysql.user WHERE Super_priv='Y' AND File_priv='Y' AND host NOT IN ('localhost','127.0.0.1', '::1');
  3. Configuration files permissions

    The vulnerability needs to be able to write to certain MySQL configuration files. Prevent that, and you are secure. Make sure the permissions on the various config files are configured as follows:

    Create an (empty) my.cnf and .my.cnf in the datadir (usually /var/lib/mysql) and make root the owner/group with 0644 permissions:

    $ touch /var/lib/mysql/my.cnf /var/lib/mysql/.my.cnf
    $ chmod 644 /var/lib/mysql/my.cnf /var/lib/mysql/.my.cnf
    $ chown root:root /var/lib/mysql/my.cnf /var/lib/mysql/.my.cnf

    Verify other MySQL configuration files as well in other locations:

    $ for i in /etc/my.cnf /etc/mysql/my.cnf /usr/etc/my.cnf ~/.my.cnf; do [ -f "$i" ] && chown root:root "$i" && chmod 644 "$i"; done

    This also includes any “!includedir” paths defined in your current configuration - make sure they are not writeable by the mysql user either. For example, if “!includedir /etc/my.cnf.d” is defined in my.cnf:

    $ chmod 644 /etc/my.cnf.d/*.cnf
    $ chown root:root /etc/my.cnf.d/*.cnf

    Once RHEL/CentOS releases the patched mysql packages in their respective repositories, you can perform the package upgrade using yum.
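If the privilege check in step 2 turns up remote accounts that do not actually need those rights, the privileges can be revoked. A sketch; the user and host below are hypothetical, so substitute the values the query returned:

```sql
-- Remove the privileges the exploit relies on from a remote account:
REVOKE SUPER, FILE ON *.* FROM 'app_user'@'10.0.0.%';
```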

Upgrade ClusterControl to the latest version

ClusterControl should be updated to the latest version to ensure the upgrade steps are updated and relevant to the latest release.

The latest stable ClusterControl release is currently 1.3.2 (build 1467). Note that if you are upgrading from 1.2.12 or earlier, you should perform some extra steps to re-configure /etc/cmon.cnf to use minimal configuration options. This is explained further in the ClusterControl upgrade instructions available on our documentation page.

Upgrade the monitored DB server

For Galera Cluster, ClusterControl supports rolling patch upgrades (between minor versions, e.g., 5.6.12 to 5.6.30) directly from the UI. To do this, go to ClusterControl -> Manage -> Upgrades -> Upgrade, and it will start the rolling upgrade, one node at a time.

You are able to monitor the job progress under Logs -> Jobs:

Alternatively, you can perform the upgrade manually by following the upgrade instructions from the respective database vendor. A minor upgrade does not require you to uninstall the existing packages, so it should be pretty straightforward. For example, if you are using Percona XtraDB Cluster 5.6 on CentOS 7, you can simply perform the following on one DB node at a time:

$ yum clean all
$ yum install Percona-XtraDB-Cluster-56
$ systemctl restart mysql

Ensure the node re-joins the cluster and reaches the Primary state (monitor the wsrep_cluster_status and wsrep_cluster_size status variables) before proceeding to the next node.
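A quick way to verify this on the upgraded node before moving on, using the wsrep status variables mentioned above:

```sql
-- Run on the freshly upgraded node:
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';  -- should report 'Primary'
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';    -- should match the expected node count
```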

That’s it. For Severalnines subscription customers, you are welcome to contact us via our support portal if you need further assistance on applying the patches.

by Severalnines at September 26, 2016 01:19 PM

September 25, 2016

Daniël van Eeden

Common Table Expressions in MySQL

In a recent labs release, Oracle introduced a new feature, or actually two closely related new features. The first is Common Table Expressions (CTEs), also known as WITH. The second is recursive CTEs, also known as WITH RECURSIVE.

An example of WITH:

WITH non_root_users AS (SELECT User, Host FROM mysql.user WHERE User<>'root')
SELECT Host FROM non_root_users WHERE User = ?

The non-CTE equivalent is this:

SELECT Host FROM 
(SELECT User, Host FROM mysql.user WHERE User<>'root') non_root_users
WHERE User = ?

This makes it easier to understand the query, especially if there are many subqueries.

Besides using regular subqueries or CTEs, you could also put the subquery in a view, but this requires more privileges. It is also difficult to change views later on, as other queries might have started to use them.

But views are still very useful. You can make it easier for others to query data or you can use views to restrict access to certain rows.

So CTEs are basically views that are bound to a single query. This makes it possible to write complex queries in a way that is easy to understand. So don't expect CTEs to replace views.

In the PostgreSQL world, CTEs have existed since version 8.4 (2009) and are used a lot.

There are some cool things PostgreSQL allows you to do with CTEs and MySQL doesn't:

test=# create table t1 (id serial, name varchar(100));
CREATE TABLE
test=# insert into t1(name) values ('foo'),('bar');
INSERT 0 2
test=# with deleted_names as (delete from t1 where id = 2 returning name)
test-# select name from deleted_names;
name
------
bar
(1 row)

The blog post has more details and examples about recursive CTEs, the second new feature.

One of the examples is generating a range of numbers.

If you're familiar with PostgreSQL, that will remind you of the generate_series function, which can be used to generate a series of integers or timestamps. So I tried to write a stored function which, together with the recursive CTE support, would emulate generate_series in MySQL - but no such luck, as you can't return a table from a stored function yet.
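That said, the labs release can already generate a number range with a recursive CTE alone; a sketch emulating generate_series(1, 10):

```sql
-- Each iteration adds one row until the WHERE condition stops the recursion,
-- yielding the integers 1 through 10:
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 10
)
SELECT n FROM seq;
```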

In the PostgreSQL world, CTEs are also used to trick the optimizer, but note that this depends on the specific CTE implementation, so don't assume this trick will work in MySQL.

MariaDB has some support for the RETURNING keyword, and in MariaDB 10.2 (not yet released) there is CTE support. Support for recursive CTEs is not yet present; see MDEV-9864 for the progress.

If you want to see the progress of MySQL and MariaDB on other modern SQL features check out this page.

by Daniël van Eeden (noreply@blogger.com) at September 25, 2016 11:23 AM

September 23, 2016

Peter Zaitsev

MariaDB Sessions at Percona Live Europe 2016

If you’re going to Percona Live Europe Amsterdam 2016, and are specifically interested in MariaDB-related technologies, there are many talks for you to attend.

On Tutorial Monday, you will want to attend my tutorial The Complete MariaDB Server Tutorial. In those three hours, you’ll learn about all the differences that MariaDB Server has to offer in comparison to MySQL. Where there is feature parity, I’ll point out some syntax differences. This is a semi-hands-on tutorial as the feature scope would make fully hands-on impossible!

On Tuesday, my session picks include:
  • Common Table Expressions in MariaDB Server 10.2. You’ll get to learn from the very people implementing it, Sergei Petrunia (rockstar optimizer developer, and MyRocks hacker), and Galina Shalygina (a Google Summer of Code 2016 student who also helped develop it).
  • Percona XtraDB 5.7: Key Performance Algorithms. It is safe to presume that MariaDB Server 10.2 needs to include Percona XtraDB 5.7. Percona XtraDB has been the default storage engine for MariaDB Server since forever. In fact, MDEV-6113 suggests this will be available in the yet unreleased MariaDB Server 10.2.2. Learn why Percona XtraDB matters from Laurynas Biveinis, the Team Lead for Percona Server.
  • MySQL/MariaDB Parallel Replication: inventory, use-cases and limitations. Jean-François Gagné is the expert on parallel replication, not only from a usage and benchmarking perspective, but also from a feature request standpoint. I highly recommend attending this talk if you care about parallel replication. It will be packed!
On Wednesday, my session picks include:

So those are my Percona Live Europe picks that focus on the MariaDB ecosystem. Notably absent though is more MariaDB MaxScale content from the authors of the application. And what about MariaDB ColumnStore?

There is of course, still time to register. Use the FeaturedTalk code to get a discount.

by Colin Charles at September 23, 2016 05:06 PM

Identifying and Solving Database Performance Issues with PMM: Webinar Q&A

Solving Database Performance Issues

In this blog, I will provide answers to the Q & A for the Identifying and Solving Database Performance Issues with PMM webinar.

First, I want to thank everybody for attending the September 15 webinar. The recording and slides for the webinar are available here. Thanks for so many good questions. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: PMM has some restrictions working with metrics from RDS instances (AWS)? Aurora for example?
Query analytics for RDS/Aurora can only use performance_schema as the query source. Using the slow log is not possible at the moment, because RDS writes it to the mysql.slow_log table rather than to a file. As for metrics, only MySQL-related ones can be collected, thus only MySQL graphs are available. No system metrics. However, we can look at what we can fetch from CloudWatch to close this gap.

Q: How many ports are needed for each MySQL client? This is related to how many ports need to be open on the firewall.
One metric service per instance requires one port (e.g., mysql:metrics, linux:metrics, mongodb:metrics). The MySQL query service (mysql:queries) does not require a port to open, as the agent connects to the server — unlike the server to client connection in the case of metric services. Usually, the diagram looks like this.
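As a sketch of what that means for a firewall, the client needs its metric exporter ports reachable from the PMM server. Port 42000 for linux:metrics comes from pmm-admin's defaults mentioned below; any other port number here is an assumption, and the server IP is a placeholder — check `pmm-admin list` on the client for the actual assignments:

```shell
# Allow the PMM server (placeholder IP 192.0.2.10) to scrape the metric
# exporters running on this client. 42000 is the linux:metrics port from
# pmm-admin's defaults; 42002 is an assumed additional metric service port.
for port in 42000 42002; do
    iptables -A INPUT -p tcp --dport "$port" -s 192.0.2.10 -j ACCEPT
done
```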

Q: Is it possible to add a customized item for additional monitoring on the client OS?
It is possible to enable additional options on the linux:metrics service (pmm-linux-metrics-42000, node_exporter process). However, it requires “hacking” the service file.

Q: Does PMM have any alerting feature built-in? Or, is there any way to feed alerts to other monitoring framework such as Nagios?
Currently, it doesn’t. But we have plans to add alerting functionality.

Q: Can pmm-client be delivered as an RPM instead of a tar ball?
It is delivered as packages, see the instructions. We recommend using system packages to simplify the upgrade process.
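On a yum-based system, the package route looks roughly like the following. The percona-release RPM location is illustrative, not exact — follow the repository setup steps in the Percona documentation linked above:

```shell
# Sketch: enable Percona's package repository and install pmm-client as a
# package rather than a tarball. The percona-release RPM URL is illustrative;
# see the Percona documentation for the current location.
yum install -y https://www.percona.com/downloads/percona-release/redhat/latest/percona-release.noarch.rpm
yum install -y pmm-client
pmm-admin --version   # verify the installed client
```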

Q: You said it does SSL and Auth. I have 1.0.4 installed, but I do not know a way to do SSL or Auth with the install. My solution is to run it behind an nginx proxy.
Yes, 1.0.4 supports server SSL and HTTP basic authentication. The corresponding documentation is published now.

Q: If you change the Grafana username and password will it be persistent across reboots and upgrades of PMM?
Currently no, but we are working to make it possible very soon.

Q: In Percona cloud tools – we can sort the queries by the sum of the number of queries, or the max time taken by the queries. Is there a way to do that in PMM?
Unfortunately, there is no such possibility, but I have just filed an internal issue so it’s not forgotten.

Q: Can you show us how the explain query works in PMM?
Query Analytics web app calls the API, which asks the corresponding agent connected to the server from the client side to run EXPLAIN on MySQL in real time.

Q: Does PMM track deadlocks?
We can capture metrics such as mysql_global_status_innodb_deadlocks, but we do not currently graph it. Thanks for pointing this out; we will consider adding it.
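Even without a graph, a captured metric can be pulled straight from Prometheus’ HTTP query API. In the sketch below, the /prometheus/ path prefix reflects how the PMM server proxies Prometheus, and the hostname is a placeholder — adjust both for your installation:

```shell
# Query the deadlock counter's recent rate directly from Prometheus.
# pmm.example.com is a placeholder for your PMM server address.
curl -s 'http://pmm.example.com/prometheus/api/v1/query' \
     --data-urlencode 'query=rate(mysql_global_status_innodb_deadlocks[5m])'
```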

Q: Is it possible to set up alarm thresholds and notification?
Q: Is there any plan to add sending alerts by e-mail or SMS? This would be an excellent feature.
Currently, there is no alerts functionality. But we have plans for that.

Q: What if we do not have a service manager installed, like SysV init, systemd, or upstart? I am using Gentoo Linux on my database server, and it has no service manager 🙁
Unfortunately, we didn’t test PMM client on Gentoo Linux. PMM is an open source project, so any person is welcome to implement support for Gentoo Linux and propose the patch. 🙂

Q: What kind of resources are needed by the “server” if we have a moderately active “client” it will be monitoring? (db with millions of rows, thousands of queries per minute)
Regarding metrics, it should not be a problem. If there are thousands of schemas or tables, then per-table stats are disabled automatically (10,000+ tables). You can also disable them explicitly. In terms of query analytics, it depends on the volume of slow queries (the long_query_time setting) and how many of them are safe to write and later process. I would say MySQL options should be tuned to help in monitoring if possible (as long as they don’t impact performance).
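A minimal sketch of that tuning, assuming a local MySQL socket and sufficient privileges; the 100ms threshold is illustrative, so pick a value that fits your workload:

```shell
# Turn on the slow log and set the threshold that controls how many
# queries query analytics will see. 0.1s is an illustrative value.
mysql -e "SET GLOBAL slow_query_log = ON;"
mysql -e "SET GLOBAL long_query_time = 0.1;"   # log queries slower than 100ms
mysql -e "SHOW GLOBAL VARIABLES LIKE 'long_query_time';"
```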

Q: What is the ETA for the Amazon EC2 image for the PMM server?
We have no ETA, but we acknowledge that not everyone is able to use Docker. We try to do our best to have other shipping methods available.

Q: Does the client have to be installed on the mySQL or Mongo target?
No. However, for best results we recommend running the PMM client on the same host that is being monitored. You can set up remote monitoring when that is not possible.

Q: What are the pros and cons of running the PMM client on the same system as the DB?
Running the PMM client locally, you can monitor system metrics (linux:metrics) and MySQL queries from the slow log, which is not possible remotely because the client has no access to the filesystem. When the client runs remotely, the MySQL metric service connects to the database host to get metrics, and the server then scrapes them from the client, so you may see some network latency as each metric has to make two hops. Running locally, the metric exporter can pull metrics over the MySQL socket.

Q: And if you have any plans for the PMM work in MySQL cluster, not in Galera, but in the same MySQL Cluster.
There is limited support now: you can monitor MySQL Cluster SQL nodes. However, we currently do not do anything with the data nodes or management nodes.

Q: I’m aware that Percona focuses on MySQL and MongoDB, but do you have any plans for supporting plugins capable of monitoring other databases beyond ones supported now? Is this feasible now in the current state of PMM?
There are plans to monitor backups, to support HA tools (e.g., ProxySQL), alerting functionality, etc. We will also think about plugins in some future version, so it is easier for the community to expand PMM for a complete monitoring of their database infrastructure.

Q: Can the query monitor display query execution plan?
Yes, usually you can run EXPLAIN on the query. However, sometimes it requires typing the database name. We plan to improve this behavior.

Q: For metrics monitor, what is the main difference between PMM and Observium?
I don’t know much about Observium, but it looks to me it’s mainly a network device monitoring tool. PMM is focused on database performance.

Q: With the Query Analytics tool, is there any way to capture the notes about an individual query, along with a query review status, such as whether it’s “marked as reviewed”, or “needs attention”?
Unfortunately no, but it would be a great feature and it is on our roadmap.

Q: Is it possible to monitor MySQL replication (slave status, etc.)?
Yes, there is MySQL Replication dashboard.

Q: Is it possible to store metrics for a long time period in a custom time series database such as OpenTSDB?
We acknowledge that long-term storage of metrics is very important. However, we can’t offer a complete solution at this moment. Something is in progress. It is possible to point Prometheus to remote storage such as OpenTSDB, but we can’t guarantee how reliable it would be. The main problem is you will need to downsample data and create another dashboard to query from OpenTSDB (which is not flexible).

Q: can PMM have any impact on MySQL server performance?
Q: What is the additional overhead to the “clients” with PMM?
We try to make PMM as safe as possible. In particular, pmm-admin has options to disable the most critical things that might cause performance issues: for example, per-table stats when there are thousands of tables, or processlist collection when there is an abnormal count of process states.
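As a sketch of such an option, PMM 1.x's pmm-admin accepted a flag to skip per-table statistics when adding the MySQL metrics service; the credentials below are placeholders, and you should verify the exact flags for your client version with `pmm-admin add mysql --help`:

```shell
# Add the MySQL metrics service with the heavy per-table stats collector
# turned off. User and password are placeholders for your monitoring account.
pmm-admin add mysql --user pmm --password secret --disable-tablestats
```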

Q: Are there graphs for TokuDB storage engine as well?
Yes, there is a TokuDB Metrics dashboard.

Q: Is it recommended not to run PMM server on Production MySQL server, but instead run it on a separate server?
We do not recommend running the PMM server on the database server, or on any production one. It should run on a separate machine, acting as a monitoring server. It does not require MySQL.

Q: Can you install PMM Server w/out docker?
Currently, no. But we have plans to distribute PMM server as VM, Amazon EC2 image, etc.

Q: Is there a way to integrate with a pre-existing Grafana server?
Yes, you can add a Prometheus data source pointing to http://&lt;server address&gt;/prometheus/ in your Grafana configuration, but you will have to import the dashboards separately.
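For a scripted setup, the data source can also be registered through Grafana's HTTP API. In this sketch, the credentials, hostnames, and the /prometheus/ path are all placeholders for your own installation:

```shell
# Register PMM's Prometheus as a data source in an existing Grafana
# instance via its HTTP API. admin:admin and the hostnames are placeholders.
curl -s -X POST 'http://admin:admin@grafana.example.com:3000/api/datasources' \
     -H 'Content-Type: application/json' \
     -d '{"name":"PMM Prometheus","type":"prometheus","url":"http://pmm.example.com/prometheus/","access":"proxy"}'
```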

Q: In the graphs plotted using performance_schema tables, you have chosen not to display all the events from performance_schema.file_summary_by_event_name. Only Read, Write and Misc events are plotted, but Wait is never plotted. Is there a criterion by which you decide which values are important enough to plot? Also, what is the significance of each of them?
The number of events plotted depends on the performance schema instrumentation settings.

Q: I am trying to figure out how big of a container/VM/machine I need to create for the PMM server and/or client…
Q: What do you recommend to consider as the databases being monitored scales up? (i.e., going from 10 instances monitored to 500 instances)
Q: What are the hardware recommendations to monitor 1000 DB server?
Depending on the PMM settings and MySQL workload, the load on the PMM server might vary a lot. Fast CPUs, plenty of memory and SSD storage are what provide the best performance. If you have a single PMM instance overloaded, you can use multiple PMM instances (i.e., one in each data center). We will be working on a solution for very large scales in the future versions.

I hope all the questions were answered. Thanks for attending my webinar!

If you have other things to ask, do not hesitate to ask on our forum.

by Roman Vynar at September 23, 2016 02:16 PM

September 22, 2016

Peter Zaitsev

Percona Live Europe featured talk with Anthony Yeh — Launching Vitess: How to run YouTube’s MySQL sharding engine

Percona Live Europe featured talk

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Anthony Yeh, Software Engineer, Google. His talk will be on Launching Vitess: How to run YouTube’s MySQL sharding engine. Vitess is YouTube’s solution for scaling MySQL horizontally through sharding, built as a general-purpose, open-source project. Now that Vitess 2.0 has reached general availability, they’re moving beyond “getting started” guides and working with users to develop and document best practices for launching Vitess in their own production environments.

I had a chance to speak with Anthony and learn a bit more about Vitess:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Anthony: Before joining YouTube as a software engineer, I worked on photonic integrated circuits as a graduate student researcher at U.C. Berkeley. So I guess you could say I took a rather circuitous path to the database field. My co-presenter Dan and I have that in common. If you see him at the conference, I recommend asking him about his story.

I don’t actually think of myself as being in database development though; that’s probably more Sugu‘s area. I treat Vitess as just another distributed system, and my job is to make it more automated, more reliable, and easier to administer. My favorite part of this job is when open-source contributors send us new features and plug-ins, and all I have to do is review them. Keep those pull requests coming!

Percona: Your talk is going to be on “Launching Vitess: How to run YouTube’s MySQL sharding engine.” How has Vitess moved from a YouTube fix to a viable enterprise data solution?

Anthony: I joined Vitess a little over two years ago, right when they decided to expand the team’s focus to include external usability as a key goal. The idea was to transform Vitess from a piece of YouTube infrastructure that happens to be open-source, into an open-source solution that YouTube happens to use.

At first, the biggest challenge was getting people to tell us what they needed to make Vitess work well in their environments. Attending Percona Live is a great way to keep a pulse on how the industry uses MySQL, and talk with exactly the people who can give us that feedback. Progress really picked up early this year when companies like Flipkart and Pixel Federation started not only trying out Vitess on their systems, but contributing back features, plug-ins, and connectors.

My half of the talk will summarize all the things we’ve learned from these early adopters about migrating to Vitess and running it in various environments. We also convinced one of our Site Reliability Engineers to give the second half of the talk, to share firsthand what it’s like to run Vitess in production.

Percona: What new features and fixes can people look forward to in the latest release?

Anthony: The biggest new feature in Vitess 2.0 is something that was codenamed “V3” (sorry about the naming confusion). In a nutshell, this completes the transition of all sharding logic from the app into Vitess: at first you had to give us a shard name, then you just had to tell us the sharding key value. Now you just send a regular query and we do the rest.

To make this possible, Vitess has to parse and analyze the query, for which it then builds a distributed execution plan. For queries served by a single shard, the plan collapses to a simple routing decision without extra processing. But for things like cross-shard joins, Vitess will generate new queries and combine results from multiple shards for you, in much the same way your app would otherwise do it.

Percona: Why is sharding beneficial to databases? Are there pros and cons to sharding?

Anthony: The main pro for sharding is horizontal scalability, the holy grail of distributed databases. It offers the promise of a magical knob that you simply turn up when you need more capacity. The biggest cons have usually been that it’s a lot of work to make your app handle sharding, and it multiplies the operational overhead as you add more and more database servers.

The goal of Vitess is to create a generalized solution to these problems, so we can all stop building one-off sharding layers within our apps, and replace a sea of management scripts with a holistic, self-healing distributed database.

Percona: Vitess is billed as being for web applications based in cloud and dedicated hardware infrastructures. Was it designed specifically for one or the other, and does it work better for certain environments?

Anthony: Vitess started out on dedicated YouTube hardware and later moved into Borg, which is Google’s internal precursor to Kubernetes. So we know from experience that it works in both types of environments. But like any distributed system, there are lots of benefits to running Vitess under some kind of cluster orchestration system. We provide sample configs to get you started on Kubernetes, but we would love to also have examples for other orchestration platforms like Mesos, Swarm, or Nomad, and we’d welcome contributions in this area.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Anthony: I hope to meet people who have ideas about how to make Vitess better, and I look forward to learning more about how others are solving similar problems.

You can read more about Anthony and Vitess on the Vitess blog.

Want to find out more about Anthony, Vitess, YouTube and sharding? Register for Percona Live Europe 2016, and come see his talk Launching Vitess: How to run YouTube’s MySQL sharding engine.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

by Dave Avery at September 22, 2016 08:59 PM

Percona XtraDB Cluster 5.5.41-25.11.1 is now available

Percona XtraDB Cluster Reference Architecture

Percona announces the new release of Percona XtraDB Cluster 5.5.41-25.11.1 (rev. 855) on September 22, 2016. Binaries are available from the downloads area or our software repositories.

Bugs Fixed:
  • Due to security reasons, ld_preload libraries can now only be loaded from the system directories (/usr/lib64, /usr/lib) and the MySQL installation base directory. This fix also addresses an issue where limiting didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Hrvoje Matijakovic at September 22, 2016 05:56 PM

Sixth Annual Percona Live Open Source Database Conference 2017 Call for Speakers Now Open

Percona LiveThe Call for Speakers for Percona Live Open Source Database Conference 2017 is open and accepting proposals through Oct. 31, 2016.

The Percona Live Open Source Database Conference 2017 is the premier event for the diverse and active open source community, as well as businesses that develop and use open source software. Topics for the event will focus on three key areas – MySQL, MongoDB and Open Source Databases – and the conference sessions will feature a range of in-depth discussions and hands-on tutorials.

The 2017 conference will feature four formal tracks – Developer, Operations, Business/Case Studies, and Wildcard – that will explore a variety of new and trending topics, including big data, IoT, analytics, security, scalability and performance, architecture and design, operations and management and development. Speaker proposals are welcome on these topics as well as on a variety of related technologies, including MySQL, MongoDB, Amazon Web Services (AWS), OpenStack, Redis, Docker and many more. The conference will also feature sponsored talks.

Percona Live Open Source Database Conference 2017 will take place April 24-27, 2017 at The Hyatt Regency Santa Clara and Santa Clara Convention Center. Sponsorship opportunities are still available, and Super Saver Registration Discounts can be purchased through Nov. 13, 2016 at 11:30 p.m. PST.

Click here to see all the submission criteria, and to submit your talk.

Sponsorships

Sponsorship opportunities for Percona Live Open Source Database Conference 2017 are available and offer the opportunity to interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event.

Planning to Attend?

Super Saver Registration Discounts for Percona Live Open Source Database Conference 2017 are available through Nov. 13, 2016 at 11:30 p.m. PST.

Visit the Percona Live Open Source Database Conference 2017 website for more information about the conference. Interested community members can also register to receive email updates about Percona Live Open Source Database Conference 2017.

by Kortney Runyan at September 22, 2016 04:44 PM

Jean-Jerome Schmidt

Planets9s - Download our new ‘Database Sharding with MySQL Fabric’ whitepaper

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Download our new whitepaper: Database Sharding with MySQL Fabric

Database systems with large data sets or high throughput applications can challenge the capacity of a single database server, and sharding is a way to address that. Spreading your database across multiple servers sounds good, but how does this work in practice?

In this whitepaper, we will have a close look at MySQL Fabric. You will learn the basics, and also learn how to migrate to a sharded environment.

Download the whitepaper

Sign up for our 9 DevOps Tips for going in production with Galera Cluster for MySQL / MariaDB webinar

Operations is not so much about specific technologies, but about the techniques and tools you use to deploy and manage them. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider – preferably before going live. In this webinar, we’ll guide you through 9 key devops tips to consider before taking Galera Cluster for MySQL / MariaDB into production.

Sign up for the webinar

Load balanced MySQL Galera setup - Manual Deployment vs ClusterControl

Deploying a MySQL Galera Cluster with redundant load balancing can be time consuming. This blog looks at how much time it would take to do it manually, using the popular “Google university” to search for how-to’s and blogs that provide deployment steps. Or using our agentless management and automation console, ClusterControl, which supports MySQL (Oracle and Percona server), MariaDB, MongoDB (MongoDB inc. and Percona), and PostgreSQL.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at September 22, 2016 01:41 PM

September 21, 2016

MariaDB Foundation

MariaDB Galera Cluster 5.5.52 and Connector/ODBC 2.0.12 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Galera Cluster 5.5.52 and MariaDB Connector/ODBC 2.0.12. Both are Stable (GA) releases. See the release notes and changelog for details on these releases. IMPORTANT: There was a security fix included in the 5.5.51 release of MariaDB and MariaDB Galera Cluster. If you are […]

The post MariaDB Galera Cluster 5.5.52 and Connector/ODBC 2.0.12 now available appeared first on MariaDB.org.

by Daniel Bartholomew at September 21, 2016 07:04 PM

Peter Zaitsev

Percona XtraDB Cluster 5.6.30-25.16.3 is now available

Percona XtraDB Cluster Reference Architecture

Percona announces the new release of Percona XtraDB Cluster 5.6 on September 21, 2016. Binaries are available from the downloads area or our software repositories.

Percona XtraDB Cluster 5.6.30-25.16.3 is now the current release, based on the following:

  • Percona Server 5.6.30-76.3
  • Galera Replication library 3.16
  • Codership wsrep API version 25
Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Hrvoje Matijakovic at September 21, 2016 06:22 PM

Percona Server 5.7.14-8 is now available

Percona announces the GA release of Percona Server 5.7.14-8 on September 21, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.7.14, including all the bug fixes in it, Percona Server 5.7.14-8 is the current GA release in the Percona Server 5.7 series. Percona provides completely open-source and free software. Find release details in the 5.7.14-8 milestone at Launchpad.

Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

The release notes for Percona Server 5.7.14-8 are available in the online documentation. Please report any bugs on the launchpad bug tracker .

by Hrvoje Matijakovic at September 21, 2016 06:11 PM

Percona Server 5.6.32-78.1 is now available

Percona announces the release of Percona Server 5.6.32-78.1 on September 21st, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.6.32, including all the bug fixes in it, Percona Server 5.6.32-78.1 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.32-78.1 milestone on Launchpad.

Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Release notes for Percona Server 5.6.32-78.1 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at September 21, 2016 06:04 PM

Percona Server 5.5.51-38.2 is now available

Percona announces the release of Percona Server 5.5.51-38.2 on September 21, 2016. Based on MySQL 5.5.51, including all the bug fixes in it, Percona Server 5.5.51-38.2 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. You can find release details of the release in the 5.5.51-38.2 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Find the release notes for Percona Server 5.5.51-38.2 in our online documentation. Report bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at September 21, 2016 05:58 PM

Jean-Jerome Schmidt

Sign up for our new webinar: 9 DevOps Tips for Going in Production with Galera Cluster for MySQL / MariaDB

Galera Cluster for MySQL / MariaDB is easy to deploy, but how does it behave under real workload, scale, and during long term operation? Proof of concepts and lab tests usually work great for Galera, until it’s time to go into production. Throw in a live migration from an existing database setup and devops life just got a bit more interesting...

If this scenario sounds familiar, then this webinar is for you!

Operations is not so much about specific technologies, but about the techniques and tools you use to deploy and manage them. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider – preferably before going live.

In this webinar, we’d like to guide you through 9 key tips to consider before taking Galera Cluster for MySQL / MariaDB into production.

Date & Time

Europe/MEA/APAC

Tuesday, October 11th at 09:00 BST (UK) / 10:00 CEST (Germany, France, Sweden)
Register Now

North America/LatAm

Tuesday, October 11th at 9:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

Agenda

  • 101 Sanity Check
  • Operating System
  • Backup Strategies
  • Replication & Sync
  • Query Performance
  • Schema Changes
  • Security / Encryption
  • Reporting
  • Recovering from disaster

Speaker

Johan Andersson, CTO, Severalnines

Johan's technical background and interest are in high performance computing as demonstrated by the work he did on main-memory clustered databases at Ericsson as well as his research on parallel Java Virtual Machines at Trinity College Dublin in Ireland. Prior to co-founding Severalnines, Johan was Principal Consultant and lead of the MySQL Clustering & High Availability consulting group at MySQL / Sun Microsystems / Oracle, where he designed and implemented large-scale MySQL systems for key customers. Johan is a regular speaker at MySQL User Conferences as well as other high profile community gatherings with popular talks and tutorials around architecting and tuning MySQL Clusters.

Join us for this live webinar, where we’ll be discussing and demonstrating how to best proceed when planning to go into production with Galera Cluster.

We look forward to “seeing” you there and to insightful discussions!

If you have any questions or would like a personalised live demo, please do contact us.

by Severalnines at September 21, 2016 04:07 PM

Peter Zaitsev

Regular Expressions Tutorial

regular expressions

This blog post highlights a video on how to use regular expressions.

It’s been a while since I did the MySQL QA and Bash Training Series. The 13 episodes were quite enjoyable to make, and a lot of people watched the videos and provided great feedback.

In today’s new video, I’d like to briefly go over regular expressions. The session will cover the basics of regular expressions, and then some. I’ll follow up later with a more advanced regex session too.

Regular expressions are very versatile, and once you know how to use them – especially as a script developer or software coder – you will return to them again and again. Enjoy!

Presented by Roel Van de Paar. Full-screen viewing @ 720p resolution recommended

 

by Roel Van de Paar at September 21, 2016 01:48 PM

Webinar Thursday September 22 – Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster

e-commerce disaster

Join Percona’s Sr. Technical Operations Architect, Tim Vaillancourt, on Thursday, September 22, at 10 am PDT (UTC-7) for the webinar Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster. This webinar will provide some best practices to ensure the performance of your system under high-traffic conditions.

Can your retail site handle the traffic deluge on the busiest shopping day of the year?

Black Friday and Cyber Monday are mere months away. Major retailers have already begun stress-testing their e-commerce sites to make sure they can handle the load. Failure to accommodate the onslaught of post-Thanksgiving shoppers might result in both embarrassing headlines and millions of dollars in lost revenue. Our advice to retailers: September stress tests are essential to a glitch-free Black Friday.

This webinar will cover:

  • Tips to avoid bottlenecks in data-driven apps
  • Techniques to allow an app to grow and shrink for large events/launches
  • Solutions to alleviate load on an app’s database
  • Developing and testing scalable apps
  • Deployment strategies to avoid downtime
  • Creating lighter, faster user facing requests

For more ideas on how to optimize your E-commerce database, read Tim’s blog post here.

Please register here.

register-now

Timothy Vaillancourt, Senior Technical Operations Architect

Tim joined Percona in 2016 as Sr. Technical Operations Architect for MongoDB with a goal to make the operations of MongoDB as smooth as possible. With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming, combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Before Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS. Before moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

by Dave Avery at September 21, 2016 01:11 PM

September 20, 2016

Peter Zaitsev

MongoDB point-in-time backups made easy

MongoDB point-in-time backups

In this blog post we’ll look at MongoDB point-in-time backups, and work with them.

Mongodump is the base logical backup tool included with MongoDB. It takes a full BSON copy of databases/collections, and optionally includes a log of changes made during the backup, used to make it consistent to a point in time. Mongorestore is the tool used to restore logical backups created by Mongodump. I’ll use these tools in the steps in this article to restore backed-up data. This article assumes a mongodump-based backup that was taken consistently with oplog changes (by using the command flag “--oplog”), and that the backup is being restored to a MongoDB instance.

In this example, a mongodump backup is gathered and restored for the base collection data, and separately the oplogs/changes necessary to restore the data to a particular point-in-time are collected and applied to this data.

Note: Percona developed a backup tool named mongodb_consistent_backup, which is a wrapper for ‘mongodump’ with added cluster-wide backup consistency. The backups created by mongodb_consistent_backup (in Dump/Mongodump mode) can be restored using the same steps as a regular “mongodump” backup.

Stages

Stage 1: Get a Mongodump Backup

Mongodump Command Flags
--host/--port (and --user/--password)

Required, even if you’re using the default host/port (localhost:27017). If authorization is enabled, add --user/--password flags also.

--oplog

Required for any replica set member! Causes “mongodump” to capture the oplog change log during the backup, so that the backup is consistent to a single point in time.

--gzip

Optional. For mongodump >= 3.2, enables inline compression on the backup files.

Steps
  1. Get a mongodump backup via (pick one):
    • Running “mongodump” with the correct flags/options to take a backup (w/oplog) of the data:
      $ mongodump --host localhost --port 27017 --oplog --gzip
      2016-08-15T12:32:28.930+0200    writing wikipedia.pages to
      2016-08-15T12:32:31.932+0200    [#########...............]  wikipedia.pages  674/1700   (39.6%)
      2016-08-15T12:32:34.931+0200    [####################....]  wikipedia.pages  1436/1700  (84.5%)
      2016-08-15T12:32:37.509+0200    [########################]  wikipedia.pages  2119/1700  (124.6%)
      2016-08-15T12:32:37.510+0200    done dumping wikipedia.pages (2119 documents)
      2016-08-15T12:32:37.521+0200    writing captured oplog to
      2016-08-15T12:32:37.931+0200    [##......................]  .oplog  44/492   (8.9%)
      2016-08-15T12:32:39.648+0200    [########################]  .oplog  504/492  (102.4%)
      2016-08-15T12:32:39.648+0200    dumped 504 oplog entries
    • Use the latest daily automatic backup, if it exists.

Stage 2: Restore the Backup Data

Steps
  1. Locate the shard PRIMARY member.
  2. Triple check you’re restoring the right backup to the right shard/host!
  3. Restore a mongodump-based backup to the PRIMARY node using the steps in this article: Restore a Mongodump Backup.
  4. Check for errors.
  5. Check that all SECONDARY members are in sync with the PRIMARY.

Stage 3: Get Oplogs for Point-In-Time-Recovery

In this stage, we will gather the changes needed to roll the data forward from the time of backup to the time/oplog-position to which we would like to restore.

In this example below, let’s pretend someone accidentally deleted an entire collection at oplog timestamp: “Timestamp(1470923942, 3)” and we want to fix it. If we decrement the Timestamp increment (2nd number) of “Timestamp(1470923942, 3)” we will have the last change before the accidental command, which in this case is: “Timestamp(1470923942, 2)“. Using the timestamp, we can capture and replay the oplogs from when the backup occurred to just before the issue/error.

A start and end timestamp are required to get the oplog data. In all cases, this will need to be gathered manually, case-by-case.
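The Timestamp decrement described above can be sketched in Python (an illustrative helper, not part of the original tooling; a BSON Timestamp is an ordered pair of a seconds value and an increment ordinal within that second):

```python
def previous_oplog_ts(seconds, increment):
    """Return the BSON Timestamp immediately before (seconds, increment).

    BSON Timestamps order first by the seconds field, then by the
    increment (ordinal) within that second.
    """
    if increment > 1:
        return (seconds, increment - 1)
    # First op in this second: step back one second. The last increment of
    # the previous second isn't known without reading the oplog, so callers
    # would look up the last oplog entry at or before (seconds - 1).
    return (seconds - 1, None)

# The accidental drop happened at Timestamp(1470923942, 3); the last safe
# change to restore to is Timestamp(1470923942, 2):
print(previous_oplog_ts(1470923942, 3))  # (1470923942, 2)
```

In practice you would read the oplog entries around the problem (e.g., via a query on `local.oplog.rs`) rather than compute the boundary blindly, but the ordering logic is the same.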

Helper Script
#!/bin/bash
#
# This tool will dump out a BSON file of MongoDB oplog changes based on a range of Timestamp() objects.
# The captured oplog changes can be applied to a host using 'mongorestore --oplogReplay --dir /path/to/dump'.
set -e
TS_START=$1
TS_END=$2
MONGODUMP_EXTRA=$3
function usage_exit() {
  echo "Usage $0: [Start-BSON-Timestamp] [End-BSON-Timestamp] [Extra-Mongodump-Flags (in quotes for multiple)]"
  exit 1
}
function check_bson_timestamp() {
  local TS=$1
  if ! echo "$TS" | grep -qP '^Timestamp\(\d+,\s?\d+\)$'; then
    echo "ERROR: Both timestamp fields must be in BSON Timestamp format, eg: 'Timestamp(########, #)'!"
    usage_exit
  fi
}
if [ -z "$TS_START" ] || [ -z "$TS_END" ]; then
  usage_exit
else
  check_bson_timestamp "$TS_START"
  check_bson_timestamp "$TS_END"
fi
MONGODUMP_QUERY='{ "ts" : { "$gte" : '$TS_START', "$lte" : '$TS_END' } }'
MONGODUMP_FLAGS='--db=local --collection=oplog.rs'
[ ! -z "$MONGODUMP_EXTRA" ] && MONGODUMP_FLAGS="$MONGODUMP_FLAGS $MONGODUMP_EXTRA"
if [ -d dump ]; then
  echo "'dump' subdirectory already exists! Exiting!"
  exit 1
fi
echo "# Dumping oplogs from '$TS_START' to '$TS_END'..."
mkdir dump
mongodump $MONGODUMP_FLAGS --query "$MONGODUMP_QUERY" --out - >dump/oplog.bson
if [ -f dump/oplog.bson ]; then
  echo "# Done!"
else
  echo "ERROR: Cannot find oplog.bson file! Exiting!"
  exit 1
fi

 

Script Usage:
$ ./dump_oplog_range.sh
Usage ./dump_oplog_range.sh: [Start-BSON-Timestamp] [End-BSON-Timestamp] [Extra-Mongodump-Flags (in quotes for multiple)]

 

Steps
  1. Find the PRIMARY member that contains the oplogs needed for the PITR restore.
  2. Determine the “end” Timestamp() needed to restore to. This oplog time should be before the problem occurred.
  3. Determine the “start” Timestamp() from right before the backup was taken.
    1. This timestamp doesn’t need to be exact, so something like a Timestamp() object equal-to “a few min before the backup started” is fine, but the more accurate you are, the fewer changes you’ll need to re-apply (which saves on restore time).
  4. Use the MongoToolsAndSnippets script “dump_oplog_range.sh” (above in “Helper Script”) to dump the oplog time-ranges you need to restore to your chosen point-in-time. In this example I am gathering the oplog between two points in time (also passing in --username/--password flags in quotes as the 3rd parameter):
    1. The starting timestamp: the BSON timestamp from just before the mongodump backup (the one restored in “Stage 2: Restore the Backup Data”) was taken. In this example, “Timestamp(1470923918, 0)” is a time a few seconds before my mongodump was taken (it does not need to be exact).
    2. The end timestamp: the end BSON Timestamp to restore to, in this example. “Timestamp(1470923942, 2)” is the last oplog-change BEFORE the problem occurred.

    Example:

    $ wget -q https://raw.githubusercontent.com/percona/MongoToolsAndSnippets/master/rdba/dump_oplog_range.sh
    $ bash ./dump_oplog_range.sh 'Timestamp(1470923918, 0)' 'Timestamp(1470923942, 2)' '--username=secret --password=secret --host=mongo01.example.com --port=27024'
    # Dumping oplogs from 'Timestamp(1470923918, 0)' to 'Timestamp(1470923942, 2)'...
    2016-08-12T13:11:17.676+0200    writing local.oplog.rs to stdout
    2016-08-12T13:11:18.120+0200    dumped 22 documents
    # Done!

    Note: all additional mongodump flags (optional 3rd field) must be in quotes!

  5. Double check it worked by looking for the ‘oplog.bson‘ file and checking that the file has some data in it (168MB in the example below):

    $ ls -alh dump/oplog.bson
    -rw-rw-r--. 1 tim tim 168M Aug 12 13:11 dump/oplog.bson

     

Stage 4: Apply Oplogs for Point in Time Recovery (PITR)

In this stage, we apply the time-range-based oplogs gathered in Stage 3 to the restored data set to bring it from the time of the backup to a particular point in time before a problem occurred.

Mongorestore Command Flags
--host/--port (and --user/--password)

Required, even if you’re using the default host/port (localhost:27017). If authorization is enabled, add --user/--password flags also.

--oplogReplay

Required. This is needed to replay the oplogs in this step.

--dir

Required. The path to the mongodump data.

Steps
  1. Copy the “dump” directory containing only the “oplog.bson” file (captured in Stage 3) to the host that needs the oplog changes applied (the restore host).
  2. Run “mongorestore” on the “dump” directory to replay the oplogs into the instance. Make sure the “dump” dir contains only “oplog.bson”!
    $ mongorestore --host localhost --port 27017 --oplogReplay --dir ./dump
    2016-08-12T13:12:28.105+0200    building a list of dbs and collections to restore from dump dir
    2016-08-12T13:12:28.106+0200    replaying oplog
    2016-08-12T13:12:31.109+0200    oplog   80.0 MB
    2016-08-12T13:12:34.109+0200    oplog   143.8 MB
    2016-08-12T13:12:35.501+0200    oplog   167.8 MB
    2016-08-12T13:12:35.501+0200    done
  3. Validate the data was restored with the customer or using any means possible (examples: .count() queries, some random .find() queries, etc.).

by David Murphy at September 20, 2016 11:03 PM

Percona Live Europe featured talk with Marc Berhault — Inside CockroachDB’s Survivability Model

percona live europe featured talk

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Marc Berhault, Engineer at Cockroach Labs. His talk will be on Inside CockroachDB’s Survivability Model. This talk takes a deep dive into CockroachDB, a database whose “survive and thrive” model aims to bring the best aspects of Google’s next generation database, Spanner, to the rest of the world via open source.

I had a chance to speak with Marc and learn a bit more about these questions:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Marc: I started out as a Site Reliability Engineer managing Google’s storage infrastructure (GFS). Back in those days, keeping a cluster up and running mostly meant worrying about the masters.

I then switched to a developer role on Google’s next-generation storage system, which replaced the single write master with sharded metadata handlers. This increased the reliability of the entire system considerably, allowing for machine and network failures. SRE concerns gradually shifted away from machine reliability towards more interesting problems, such as multi-tenancy issues (quotas, provisioning, isolation) and larger scale failures.

After leaving Google, I found myself back in a world where one had to worry about a single machine all over again – at least when running your own infrastructure. I kept hearing the same story: a midsize company starts to grow out of its single-machine database and starts trimming the edges. This means moving tables to other hosts, shrinking schemas, etc., in order to avoid the dreaded “great sharding of the monolithic table,” often accompanied by its friends: cross-shard coordination layer and production complexity.

This was when I joined Cockroach Labs, a newly created startup with the goal of bringing a large-scale, transactional, strongly consistent database to the world at large. After contributing to various aspects of the projects, I switched my focus to production: adding monitoring, working on deployment, and of course rolling out our test clusters.

Percona: Your talk is called “Inside CockroachDB’s Survivability Model.” Define “survivability model”, and why it is important to database environments.

Marc: The survivability model in CockroachDB is centered around data redundancy. By default, all data is replicated three times (this is configurable) and a write is only considered successful if a quorum of replicas acknowledges it. When a node holding one of the copies of the data becomes unavailable, a new node is picked and given a snapshot of the data.
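The quorum rule behind this redundancy model can be sketched in a few lines of Python (an illustrative sketch of the general majority-quorum idea, not CockroachDB's actual implementation):

```python
def quorum(replication_factor):
    """Majority of replicas needed for a write to be considered committed."""
    return replication_factor // 2 + 1

def write_committed(acks, replication_factor=3):
    """A write counts only once a quorum of replicas has acknowledged it."""
    return acks >= quorum(replication_factor)

# With the default replication factor of 3, one node can fail and writes
# still commit (2 of 3 acks), but losing two nodes blocks new writes:
print(write_committed(2))  # True
print(write_committed(1))  # False
```

This is why replication factors are usually odd: raising the factor from 3 to 4 increases the quorum to 3 without tolerating any additional failures.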

This redundancy model has been widely used in distributed systems, but rarely with strongly consistent databases. CockroachDB’s approach provides strong consistency as well as transactions across the distributed data. We see this as a critical component of modern databases: allowing scalability while guaranteeing consistency.

Percona: What are the workloads and database environments that are best suited for a CockroachDB deployment? Do you see an expansion of the solution to encompass other scenarios?

Marc: CockroachDB is a beta product and is still in development. We expect to be out of beta by the end of 2016. Ideal workloads are those requiring strong consistency – those applications that manage critical data. However, strong consistency comes at a cost, usually directly proportional to latency between nodes and replication factor. This means that a widely distributed CockroachDB cluster (e.g., across multiple regions) will incur high write latencies, making it unsuitable for high-throughput operations, at least in the near term.

Percona: What is changing in the way businesses use databases that keeps you awake at night? How do you think CockroachDB is addressing those concerns?

Marc: In recent years, more and more businesses have been reaching the limits of what their single-machine databases can handle. This has forced many to implement their own transactional layers on top of disjoint databases, at the cost of longer development time and correctness.

CockroachDB attempts to find a solution to this problem by allowing a strongly consistent, transactional database to scale arbitrarily.

Percona: What are you looking forward to the most at Percona Live Europe this year?

Marc: This will be my first time at a Percona Live conference, so I’m looking forward to hearing from other developers and learning what challenges other architects and DBAs are facing in their own work.

You can read more about Marc’s thoughts on CockroachDB at their blog.

Want to find out more about Marc, CockroachDB and survivability? Register for Percona Live Europe 2016, and come see his talk Inside CockroachDB’s Survivability Model.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

by Dave Avery at September 20, 2016 04:41 PM

Jean-Jerome Schmidt

Database Sharding - How does it work?

Database systems with large data sets or high throughput applications can challenge the capacity of a single database server. High query rates can exhaust CPU capacity, I/O resources, RAM or even network bandwidth.

Horizontal scaling is often the only way to scale out your infrastructure. You can upgrade to more powerful hardware (vertical scaling), but there is a limit on how much load a single host can handle. You may be able to purchase the most expensive and fastest CPU or storage on the market, and it still may not be enough for your workload. The only feasible way to scale beyond the constraints of a single host is to utilize multiple hosts working together as part of a cluster, or connected using replication.

Horizontal scaling has its limits too, though. When it comes to scaling reads, it is very efficient - just add a node and you can utilize additional processing power. With writes, things are completely different. Consider a MySQL replication setup. Historically, MySQL replication used a single thread to process writes - in a multi-user, highly concurrent environment, this was a serious limitation. This has changed recently. In MySQL 5.6, multiple schemas could be replicated in parallel. In MySQL 5.7, after addition of a ‘logical clock’ scheduler, it became possible for a single-schema workload to benefit from the parallelization of multi-threaded replication. Galera Cluster for MySQL also allows for multi-threaded replication by utilizing multiple workers to apply writesets. Still, even with those enhancements, you can get just some incremental improvement in the write throughput - it is not the solution to the problem.

One solution would be to split our data across multiple servers using some kind of a pattern and, in that way, to split writes across multiple MySQL hosts. This is sharding.

The idea is really simple - if my database server cannot handle the amount of writes, let’s split the data somehow and store one part, generating part of the write traffic, on one database host and the other part on another host. In that way, each host will have to handle half of the writes which should be well within their hardware limits. We can further split the data and distribute it on more servers if our write workload grows.

The actual implementation is more complex as there are numerous issues you need to solve before you can implement sharding. The first, very important question that you need to answer is - how are you going to split your data?

Functional sharding

Let’s imagine your application is built out of multiple modules, or microservices if we want to be fashionable. Assume it’s a large online store with a backend of several warehouses. Such a site may contain a module to handle warehouse logistics - check the availability of an item, track shipment from a warehouse to a customer. Another module may be an online store - a website with a presentation of available goods. Yet another module would be a transaction module - collect and store credit cards, handle transaction processing and so on. Maybe the online store has a large, buzzing forum where customers share opinions on goods, discuss support issues etc. You may start your voyage in the world of shards by using a separate database per module. This will allow you to gain some breathing space and plan for next steps. On the other hand, the next step may not be necessary at all if each shard can comfortably handle its workload. Of course, there are downsides to such a setup - you cannot easily query data across modules (shards) - you have to execute separate queries against separate databases and then combine the result sets.

Expression-based sharding

Another method of splitting the data across shards is to use some kind of expression or function/algorithm to decide where the data should be located. Let’s imagine you have a database with one large table that is commonly accessed and written to. For example, assume a social media site where our largest table contains data about users and their activities. This table uses some kind of id column as the primary key - we need to split it somehow, and one of the ways would be to apply an expression to the ID value. A very popular choice is to use a modulo function - if we want to generate 128 shards, we can just apply the expression ‘id % 128’ to calculate the shard number where a given row should be stored. Another method is to use a date range (e.g., all user activity in 2015 is stored in one database, activity in 2016 in a separate database). Yet another would be to distribute data based on a list of attributes, e.g., all users from a specific country end up in the same shard.
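The expression-based approaches above can be sketched in a few lines of Python (illustrative only; the shard count, shard names and country list are made-up examples):

```python
import datetime

def shard_for_id(row_id, num_shards=128):
    """Expression-based sharding: modulo on a numeric primary key."""
    return row_id % num_shards

def shard_for_year(activity_date):
    """Range-based sharding: one database per year of user activity."""
    return "activity_%d" % activity_date.year

EU_COUNTRIES = {"de", "fr", "nl", "se"}  # hypothetical attribute list

def shard_for_country(country_code):
    """List-based sharding: users from listed countries share a shard."""
    return "shard_eu" if country_code in EU_COUNTRIES else "shard_row"

print(shard_for_id(1000))                         # 104
print(shard_for_year(datetime.date(2015, 6, 1)))  # activity_2015
print(shard_for_country("nl"))                    # shard_eu
```

The trade-off with all of these is that the mapping is hard-coded: changing the shard count or the ranges means recomputing where existing rows belong.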

Metadata-based sharding

As we discussed above, both functional sharding and expression-based sharding have limitations when it comes to scaling out the number of shards. There’s still one more method which gives you more flexibility in managing shards - metadata-based sharding. The idea is very simple - instead of using some kind of hard-coded algorithm, let’s just write down where a given row is located: the row with id=1 - shard 1, the row with id=2 - shard 5. Finally, let’s build a database to keep this metadata.

This approach has a huge benefit - you can store any row in any shard. You can also easily add new shards to the mix - just set them up and start to store data on them. You can also easily migrate data between shards - nothing stops you from copying data between shards and then making an adjustment in the metadata. In reality it’s more complex than it sounds as you have to make sure you move all the data so some kind of data locking is required. For example, to copy data between shards, you’d have to do an initial copy of the data across shards, lock access to the part of the data which is migrated, make a final sync and, finally, change an entry in the metadata database and unlock the data.
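A minimal sketch of such a lookup directory, using a plain Python dict in place of the dedicated metadata database (the class and shard names are hypothetical):

```python
class ShardDirectory:
    """Metadata-based sharding: an explicit row-id -> shard lookup table.

    In production this mapping lives in its own (highly available)
    metadata database; a dict stands in for it here.
    """

    def __init__(self):
        self._location = {}

    def assign(self, row_id, shard):
        self._location[row_id] = shard

    def locate(self, row_id):
        return self._location[row_id]

    def migrate(self, row_id, new_shard):
        # A real migration copies the data, locks writes to the migrating
        # rows, does a final sync, and only then flips this pointer.
        self._location[row_id] = new_shard

directory = ShardDirectory()
directory.assign(1, "shard-1")
directory.assign(2, "shard-5")
directory.migrate(1, "shard-9")   # move row 1 to a newly added shard
print(directory.locate(1))        # shard-9
```

Note that every read now costs an extra lookup, which is why the metadata store itself must be fast and redundant - it becomes a single point of failure for the whole sharded setup.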

This is it for now. If you would like to learn more about sharding, you may want to check out this ebook:

Database Sharding with MySQL Fabric

Why do we shard? How does sharding work? What are the different ways I can shard my database? This whitepaper goes through some of the theory behind sharding. It also discusses three different tools which are designed to help users shard their MySQL databases. And last but not least, it shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Download Here

by Severalnines at September 20, 2016 09:33 AM

September 19, 2016

Peter Zaitsev

Open Source Databases at Percona Live Europe, Amsterdam

Percona Live Europe

In this blog post, I’ll review some of the open source database technology discussions at Percona Live Europe.

I’ve already written about the exciting PostgreSQL and MongoDB content at Percona Live Europe in Amsterdam, and now I’m going to highlight some of our open source database content.  

In the last five years, the open source database community has been flourishing. There has been an explosion of creativity and innovation. The community has created many niche (and not so niche) solutions for various problems.

As a software engineer or architect, the number of available database options might excite you. You might also be intimidated about how to make the right technology choice. At Percona Live Europe, we have introductory talks for the relevant technologies that we find particularly interesting. These talks will help expand your knowledge about the available solutions and lessen intimidation at the same time.

I’m looking forward to the exciting technologies and talks that we’ll cover this year, such as:

For talks and tutorials on specific uses cases, check out the following sessions:

  • RocksDB is a very cool write-optimized (LSM) storage engine, one of the few that has been used in more than one database. In addition to the RocksDB-based systems inside Facebook, it can be used with MongoDB as MongoRocks and MySQL as MyRocks. It is also used inside next-generation database systems such as CockroachDB and TiDB. We have a lot of talks about RocksDB and related integrations, ranging from a MyRocks Tutorial by Yoshinori Matsunobu, to a talk about MongoRocks by Igor Canadi, and a performance-focused talk by Mark Callaghan.
  • Elastic is the leading technology for open source full-text search implementations (hence its previous name, ElasticSearch) — but it is much more than that. ElasticSearch, Kibana, Logstash and Beats allow you to collect data from a variety of sources, then analyze and visualize it. Philip Krenn will talk about full-text search in general in his Full-Text Search Explained talk, as well as talk in more detail about ElasticSearch in ElasticSearch for SQL Users.
  • I am sure you’ve heard about Redis, the Swiss army knife of different data structures and operations. Redis covers many typical data tasks, from caching to maintaining counters and queues. Justin Starry will talk about Redis at scale in Powering Millions of live streams at Periscope, and Itamar Haber will talk about Extending Redis with Modules to make Redis an even more powerful data store.
  • Apache Spark is another technology you’ve surely heard about. Apache Spark adoption has skyrocketed in recent years due to its high-performance in-memory data analyses, replacing or supplementing Hadoop installations. We will hear about Badoo’s experience processing 11 billion events a day with Spark with Alexander Krasheninnikov, and also learn how to use Spark with MongoDB, MySQL and Redis with Tim Vaillancourt.
  • Apache Cassandra is a database focused on high availability and high performance, even when replicating among several data centers. When you think “eventual consistency,” perhaps Cassandra is the first technology that comes to mind. Cassandra allows you to do some impressive things, and Duy Hai Doan will show us some of them in his talk 7 things in Cassandra that you cannot find in RDBMS.
  • ClickHouse is a new guy on the block, but I’m very excited about this distributed column store system for high-performance analytics. Built by the Yandex team to power real-time analytics on the scale of trillions of database records, ClickHouse went open source earlier this year. Victor Tarnavsky will share more details in his talk.
  • Apache Ignite is another new but very exciting technology. Described as in-memory data fabric, it can be used for a variety of applications to supplement or replace relational databases — ranging from advanced data caching strategies to parallel in-memory processing of large quantities of data. Christos Erotocritou will talk about some of these use cases in his talk Turbocharge Your SQL Queries In-Memory with Apache Ignite.
  • RethinkDB is an interesting open source NoSQL database built from the ground up for scalable real-time applications. The end-to-end real-time data streaming feature is really cool, and allows you to build interactive real-time applications much more easily. Ilya Verbitskiy will talk about RethinkDB in his Agile web-development with RethinkDB talk.
  • CockroachDB is a distributed database focused on survivability and high performance (borrowing some ideas from Google’s innovative Spanner database). Marc Berhault will talk database rocket science in his Inside CockroachDB’s Survivability Model.
  • TiDB is another open source NewSQL database, inspired by Google Spanner and F1. It can use a variety of storage engines for data store, and it supports MySQL wire protocol to ease application migration. Max Liu explains How TiDB was built in his talk.
  • ToroDB is a very interesting piece of technology. It is protocol-compatible with MongoDB, but stores data through a relational format in PostgreSQL. This can offer substantial space reduction and performance improvements for some workloads. Álvaro Hernández from 8Kdata will discuss this technology in his ToroDB: All your MongoDB data are belong to SQL talk.

As you can see we cover a wealth of exciting open source database technologies at Percona Live Europe. Do not miss a chance to expand your database horizons and learn about new developments in the industry. There is still time to register! Use the code PZBlog for a €30 discount off your registration price!

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

by Peter Zaitsev at September 19, 2016 10:18 PM

Percona Server for MongoDB 3.2.9-2.1 is now available


Percona announces the release of Percona Server for MongoDB 3.2.9-2.1 on September 19, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.2.9-2.1 is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.2 protocol and drivers. It extends MongoDB with MongoRocks, Percona Memory Engine, and PerconaFT storage engine, as well as enterprise-grade features like external authentication and audit logging at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.

Note:

We deprecated the PerconaFT storage engine. It will not be available in future releases.


This release is based on MongoDB 3.2.9. There are no additional improvements or new features on top of those upstream fixes.

The release notes are available in the official documentation.

 

by Alexey Zhebel at September 19, 2016 04:17 PM

Jean-Jerome Schmidt

Load balanced MySQL Galera setup - Manual Deployment vs ClusterControl

If you have deployed databases with high availability before, you will know that a deployment does not always go your way, even though you’ve done it a zillion times. You could spend a full day setting everything up and may still end up with a non-functioning cluster. It is not uncommon to start over, as it’s really hard to figure out what went wrong.

So, deploying a MySQL Galera Cluster with redundant load balancing takes a bit of time. This blog looks at how long it would take to do it manually versus using ClusterControl to perform the task. For those who have not used it before, ClusterControl is agentless management and automation software for databases. It supports MySQL (Oracle and Percona Server), MariaDB, MongoDB (MongoDB Inc. and Percona), and PostgreSQL.

For manual deployment, we’ll be using the popular “Google university” to search for how-to’s and blogs that provide deployment steps.

Database Deployment

Deployment of a database consists of several parts. These include getting the hardware ready, software installation, configuration tweaking and a bit of tuning and testing. Now, let’s assume the hardware is ready, the OS is installed and it is up to you to do the rest. We are going to deploy a three-node Galera cluster as shown in the following diagram:

Manual

Googling on “install mysql galera cluster” led us to this page. By following the steps explained plus some additional dependencies, the following is what we should run on every DB node:

$ semanage permissive -a mysqld_t
$ systemctl stop firewalld
$ systemctl disable firewalld
$ vim /etc/yum.repos.d/galera.repo # setting up Galera repository
$ yum install http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm
$ yum install mysql-wsrep-5.6 galera3 percona-xtrabackup
$ vim /etc/my.cnf # setting up wsrep_* variables
$ systemctl start mysql --wsrep-new-cluster # ‘systemctl start mysql’ on the remaining nodes
$ mysql_secure_installation

The above commands took around 18 minutes to finish on each DB node. Total deployment time was 54 minutes.

ClusterControl

Using ClusterControl, here are the steps we took to first install ClusterControl (5 minutes):

$ wget http://severalnines.com/downloads/cmon/install-cc
$ chmod 755 install-cc
$ ./install-cc

Log in to the ClusterControl UI and create the default admin user.

Set up passwordless SSH to all DB nodes on the ClusterControl node (1 minute):

$ ssh-keygen -t rsa
$ ssh-copy-id 10.0.0.217
$ ssh-copy-id 10.0.0.218
$ ssh-copy-id 10.0.0.219

In the ClusterControl UI, go to Create Database Cluster -> MySQL Galera and enter the following details (4 minutes):

Click Deploy and wait until the deployment finishes. You can monitor the deployment progress under ClusterControl -> Settings -> Cluster Jobs and once deployed, you will notice it took around 15 minutes:

To sum it up, the total deployment time including installing ClusterControl is 15 + 4 + 1 + 5 = 25 minutes.

The following table summarizes the above deployment actions:

Area        | Manual                       | ClusterControl
Total steps | 8 steps x 3 servers + 1 = 25 | 8
Duration    | 18 x 3 = 54 minutes          | 25 minutes

To summarize, we needed fewer steps and less time with ClusterControl to achieve the same result. Three nodes is more or less the minimum cluster size, and the difference would get bigger with clusters of more nodes.

Load Balancer and Virtual IP Deployment

Now that we have our Galera cluster running, the next thing is to add a load balancer in front. This provides one single endpoint to the cluster, thus reducing the complexity for applications to connect to a multi-node system. Applications would not need to have knowledge of the topology and any changes caused by failures or admin maintenance would be masked. For fault tolerance, we would need at least 2 load balancers with a virtual IP address.

By adding a load balancer tier, our architecture will look something like this:

Manual Deployment

Googling on “install haproxy virtual ip galera cluster” led us to this page. We followed the steps:

On each HAproxy node (2 times):

$ yum install epel-release
$ yum install haproxy keepalived
$ systemctl enable haproxy
$ systemctl enable keepalived
$ vi /etc/haproxy/haproxy.cfg # configure haproxy
$ systemctl start haproxy
$ vi /etc/keepalived/keepalived.conf # configure keepalived
$ systemctl start keepalived

On each DB node (3 times):

$ wget https://raw.githubusercontent.com/olafz/percona-clustercheck/master/clustercheck
$ chmod +x clustercheck
$ mv clustercheck /usr/bin/
$ vi /etc/xinetd.d/mysqlchk # configure mysql check user
$ vi /etc/services # setup xinetd port
$ systemctl start xinetd
$ mysql -uroot -p
mysql> GRANT PROCESS ON *.* TO 'clustercheckuser'@'localhost' IDENTIFIED BY 'clustercheckpassword!';

The total deployment time for this was around 42 minutes.
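The clustercheck script installed on each DB node is what xinetd serves to HAProxy: it answers an HTTP status that reflects the node's wsrep state, so HAProxy only routes traffic to usable nodes. Its core decision can be sketched in a few lines of Python (a simplified model; the real script supports further options, such as allowing read-only nodes):

```python
def clustercheck_response(wsrep_local_state, available_when_donor=False):
    # wsrep_local_state 4 = Synced, 2 = Donor/Desynced. A usable node
    # answers 200; anything else answers 503 so HAProxy drops it from
    # the rotation.
    if wsrep_local_state == 4:
        return "HTTP/1.1 200 OK"
    if wsrep_local_state == 2 and available_when_donor:
        return "HTTP/1.1 200 OK"
    return "HTTP/1.1 503 Service Unavailable"
```

A synced node passes the check, while a donor only passes if the check was configured to allow donors.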

ClusterControl

For the ClusterControl host, here are the steps taken (1 minute):

$ ssh-copy-id 10.0.0.229
$ ssh-copy-id 10.0.0.230

Then, go to ClusterControl -> select the database cluster -> Add Load Balancer and enter the IP address of the HAproxy hosts, one at a time:

Once both HAProxy are deployed, we can add Keepalived to provide a floating IP address and perform failover:

Go to ClusterControl -> select the database cluster -> Logs -> Jobs. The total deployment took about 5 minutes, as shown in the screenshot below:

Thus, total deployment for load balancers plus virtual IP address and redundancy is 1 + 5 = 6 minutes.

The following table summarizes the above deployment actions:

Area        | Manual                                        | ClusterControl
Total steps | (8 x 2 haproxy nodes) + (8 x 3 DB nodes) = 40 | 6
Duration    | 42 minutes                                    | 6 minutes

ClusterControl also manages and monitors the load balancers:

Adding a Read Replica

Our setup is now looking pretty decent, and the next step is to add a read replica to Galera. What is a read replica, and why do we need it? A read replica is an asynchronous slave, replicating from one of the Galera nodes using standard MySQL replication. There are a few good reasons to have this. Long-running reporting/OLAP type queries on a Galera node might slow down an entire cluster, if the reporting load is so intensive that the node has to spend considerable effort coping with it. So reporting queries can be sent to a standalone server, effectively isolating Galera from the reporting load. An asynchronous slave can also serve as a remote live backup of our cluster in a DR site, especially if the link is not good enough to stretch one cluster across 2 sites.

Our architecture is now looking like this:

Manual Deployment

Googling on “mysql galera with slave” brought us to this page. We followed the steps:

On master node:

$ vim /etc/my.cnf # setting up binary log and gtid
$ systemctl restart mysql
$ mysqldump --single-transaction --skip-add-locks --triggers --routines --events > dump.sql
$ mysql -uroot -p
mysql> GRANT REPLICATION SLAVE ON .. ;

On slave node (we used Percona Server):

$ yum install http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm
$ yum install Percona-Server-server-56
$ vim /etc/my.cnf # setting up server id, gtid and stuff
$ systemctl start mysql
$ mysql_secure_installation
$ scp root@master:~/dump.sql /root
$ mysql -uroot -p < /root/dump.sql
$ mysql -uroot -p
mysql> CHANGE MASTER ... MASTER_AUTO_POSITION=1;
mysql> START SLAVE;

The total time spent for this manual deployment was around 40 minutes (with a 1GB database).

ClusterControl

With ClusterControl, here is what we should do. Firstly, configure passwordless SSH to the target slave (0.5 minute):

$ ssh-copy-id 10.0.0.231 # setup passwordless ssh

Then, on one of the MySQL Galera nodes, we have to enable binary logging to become a master (2 minutes):

Click Proceed to start enabling binary log for this node. Once completed, we can add the replication slave by going to ClusterControl -> choose the Galera cluster -> Add Replication Slave and specify as per below (6 minutes including streaming 1GB of database to slave):

Click “Add node” and you are set. Total deployment time for adding a read replica complete with data is 6 + 2 + 0.5 = 8.5 minutes.

The following table summarizes the above deployment actions:

Area        | Manual     | ClusterControl
Total steps | 15         | 3
Duration    | 40 minutes | 8.5 minutes

We can see that ClusterControl automates a number of time-consuming tasks, including slave installation, backup streaming and setting up replication from the master. Note that ClusterControl will also handle things like master failover so that replication does not break if the Galera master fails.

Conclusion

A good deployment is important, as it is the foundation of an upcoming database workload. Speed matters too, especially in agile environments where a team frequently deploys entire systems and tears them down after a short time. You’re welcome to try ClusterControl to automate your database deployments; it comes with a free 30-day trial of the full enterprise features. Once the trial ends, it will default to the community edition (free forever).

by Severalnines at September 19, 2016 09:12 AM

September 16, 2016

Peter Zaitsev

How X Plugin Works Under the Hood

X Plugin

X PluginIn this blog post, we’ll look at what MySQL does under the hood to transform NoSQL requests to SQL (and then store them in InnoDB transactional engine) when using the X Plugin.

X Plugin allows MySQL to function as a document store. We don’t need to define any schema or use SQL language while still being a fully ACID database. Sounds like magic – but we know the only thing that magic does is make planes fly! 🙂

Alexander already wrote a blog post exploring how the X Plugin works, with some examples. In this post, I am going to show some more query examples and how they are transformed.

I have enabled the slow query log to see what is actually being executed when I run NoSQL queries.

Creating our first collection

We start the MySQL shell and create our first collection:

$ mysqlsh -u root --py
Creating an X Session to root@localhost:33060
No default schema selected.
[...]
Currently in Python mode. Use sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("people")

What is a collection in SQL terms? A table. Let’s check what MySQL does by reading the slow query log:

CREATE TABLE `people` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

As we correctly guessed, it creates a table with two columns. One is called “doc” and it stores a JSON document. A second column, named “_id”, is created as a virtual column from data extracted from that JSON document. “_id” is used as the primary key, and if we don’t specify a value, MySQL chooses a random UUID every time we write a document.
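The derivation of the _id value can be modeled in a few lines of Python (an illustration of the behavior, not MySQL’s actual implementation):

```python
import json
import uuid

def doc_primary_key(doc_json):
    # Mimic the generated `_id` column: take $._id from the document
    # if present, otherwise assign a random 32-character identifier
    # (as the server does on insert).
    doc = json.loads(doc_json)
    if "_id" not in doc:
        doc["_id"] = uuid.uuid4().hex
    return doc["_id"], json.dumps(doc)
```

The 32-character hex value mirrors the varchar(32) definition of the _id column above.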

So, the basics are clear.

  • It stores everything inside a JSON column.
  • Indexes are created on virtual columns that are generated by extracting data from that JSON. Every time we add a new index, a virtual column will be generated. That means that under the hood, an alter table will run adding the column and the corresponding index.

Let’s run a getCollections that would be similar to “SHOW TABLES” in the SQL world:

mysql-py> db.getCollections()
[
]

This is what MySQL actually runs:

SELECT C.table_name AS name, IF(ANY_VALUE(T.table_type)='VIEW', 'VIEW', IF(COUNT(*) = COUNT(CASE WHEN (column_name = 'doc' AND data_type = 'json') THEN 1 ELSE NULL END) + COUNT(CASE WHEN (column_name = '_id' AND generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))') THEN 1 ELSE NULL END) + COUNT(CASE WHEN (column_name != '_id' AND generation_expression RLIKE '^(json_unquote[[.(.]])?json_extract[[.(.]]`doc`,''[[.$.]]([[...]][^[:space:][...]]+)+''[[.).]]{1,2}$') THEN 1 ELSE NULL END), 'COLLECTION', 'TABLE')) AS type FROM information_schema.columns AS C LEFT JOIN information_schema.tables AS T USING (table_name)WHERE C.table_schema = 'test' GROUP BY C.table_name ORDER BY C.table_name;

This time, the query is a bit more complex. It runs a query on information_schema.tables, joining it with information_schema.columns, searching for tables that have “doc” and “_id” columns.

Inserting and reading documents

I am going to start adding data to our collection. Let’s add our first document:

mysql-py> db.people.add(
      ...  {
      ...     "Name": "Miguel Angel",
      ...     "Country": "Spain",
      ...     "Age": 33
      ...   }
      ... )

In the background, MySQL inserts a JSON object and auto-assigns a primary key value.

INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Age',33,'Country','Spain','Name','Miguel Angel','_id','a45c69cd2074e611f11f62bf9ac407d7'));

Ok, this is supposed to be schemaless. So let’s add someone else using different fields:

mysql-py> db.people.add(
      ...  {
      ...     "Name": "Thrall",
      ...     "Race": "Orc",
      ...     "Faction": "Horde"
      ...   }
      ... )

Same as before, MySQL just writes another JSON object (with different fields):

INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Faction','Horde','Name','Thrall','Race','Orc','_id','7092776c2174e611f11f62bf9ac407d7'));

Now we are going to read the data we have just inserted. First, we are going to find all documents stored in the collection:

mysql-py> db.people.find()

MySQL translates to a simple:

SELECT doc FROM `test`.`people`;

And this is how filters are transformed:

mysql-py> db.people.find("Name = 'Thrall'")

It uses a SELECT with the WHERE clause on data extracted from the JSON object.

SELECT doc FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
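The filter-to-SQL translation can be sketched as a small string transformation (a deliberately simplified, hypothetical model; the real X Plugin parses a full expression grammar):

```python
import re

def find_to_sql(schema, collection, filter_expr=None):
    # Simplified model of the translation: only a single
    # `field = 'value'` equality is handled here.
    sql = "SELECT doc FROM `%s`.`%s`" % (schema, collection)
    if filter_expr:
        m = re.match(r"^(\w+)\s*=\s*('(?:[^']*)')$", filter_expr)
        if not m:
            raise ValueError("only `field = 'value'` filters are modeled")
        field, value = m.groups()
        sql += " WHERE (JSON_EXTRACT(doc,'$.%s') = %s)" % (field, value)
    return sql + ";"
```

With no filter the function produces the plain SELECT from the previous example; with "Name = 'Thrall'" it produces the WHERE clause shown above.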

Updating documents

Thrall decided that he doesn’t want to belong to the Horde anymore. He wants to join the Alliance. We need to update the document:

mysql-py> db.people.modify("Name = 'Thrall'").set("Faction", "Alliance")

MySQL runs an UPDATE, again using a WHERE clause on the data extracted from the JSON. Then, it updates the “Faction”:

UPDATE `test`.`people` SET doc=JSON_SET(doc,'$.Faction','Alliance') WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
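The JSON_SET call above has a straightforward Python analogue for top-level fields (illustration only; MySQL’s JSON_SET handles arbitrary paths):

```python
import json

def json_set(doc_json, path, value):
    # Minimal model of JSON_SET(doc, '$.Field', value) for single-level
    # paths, as used by the UPDATE above.
    if not path.startswith("$."):
        raise ValueError("only top-level $.field paths are modeled")
    doc = json.loads(doc_json)
    doc[path[2:]] = value
    return json.dumps(doc)
```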

Now I want to remove my own document:

mysql-py> db.people.remove("Name = 'Miguel Angel'");

As you can already imagine, it runs a DELETE, searching for my name on the data extracted from the JSON object:

DELETE FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Miguel Angel');

Summary

The magic that makes our MySQL work like a document-store NoSQL database is:

  • Create a simple InnoDB table with a JSON column.
  • Auto-generate the primary key with UUID values and represent it as a virtual column.
  • All searches are done by extracting data with JSON_EXTRACT and passing the result to the WHERE clause.

I would define the solution as something really clever, simple and clean. Congrats to Oracle! 🙂

by Miguel Angel Nieto at September 16, 2016 07:59 PM

Consul, ProxySQL and MySQL HA

ProxySQL

When it comes to “decision time” about which type of MySQL HA (high-availability) solution to implement, and how to architect the solution, many questions come to mind. The most important questions are:

  • “What are the best tools to provide HA and Load Balancing?”
  • “Should I be deploying this proxy tool on my application servers or on a standalone server?”.

Ultimately, the best tool really depends on the needs of your application and your environment. You might already be using specific tools such as Consul or MHA, or you might be looking to implement tools that provide richer features. The dilemma of deploying a proxy instance per application host versus a standalone proxy instance is usually a trade-off between a less effective load balancing algorithm and a single point of failure. Neither is desirable, but there are ways to implement a solution that balances both concerns.

In this article, we’ll go through a solution that is suitable for an application that has not been coded to split reads and writes over separate MySQL instances. An application like this would rely on a proxy or 3rd party tool to split reads/writes, and preferably a solution that has high availability at the proxy layer. The solution described here is composed of ProxySQL, Consul and Master High Availability (MHA). Within this article, we’ll focus on the configuration required for ProxySQL and Consul since there are many articles that cover MHA configuration (such as Miguel’s recent MHA Quick Start Guide blog post).

When deploying Consul in production, a minimum of three instances is recommended – in this example, the Consul agents run on the Application Server (appserver) as well as on the two “ProxySQL servers” mysql1 and mysql2 (which act as the HA proxy pair). This is not a hard requirement, and these instances can easily run on another host or docker container. MySQL is deployed locally on mysql1 and mysql2, however this could just as well be 1..n separate standalone DB server instances:

Consul ProxySQL

So let’s move on to the actual configuration of this HA solution, starting with Consul.

Installation of Consul:

Firstly, we’ll need to install the required packages, download the Consul archive and perform the initial configuration. We’ll need to perform the same installation on each of the nodes (i.e., appserver, mysql1 and mysql2).

### Install pre-requisite packages:
sudo yum -y install wget unzip bind-utils dnsmasq
### Install Consul:
sudo useradd consul
sudo mkdir -p /opt/consul /etc/consul.d
sudo touch /var/log/consul.log /etc/consul.d/proxysql.json
cd /opt/consul
sudo wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
sudo unzip consul_0.6.4_linux_amd64.zip
sudo ln -s /opt/consul/consul /usr/bin/consul
sudo chown consul:consul -R /etc/consul* /opt/consul* /var/log/consul.log

Configuration of Consul on Application Server (used as ‘bootstrap’ node):

Now that we’re done with the installation on each of the hosts, let’s continue with the configuration. In this example we’ll bootstrap the Consul cluster using “appserver”:

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent1",
  "server": true,
  "ui": true,
  "bootstrap": true,
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.119"  ## Add server IP here
}
######
$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
   "id": "proxy1",
   "name": "proxysql",
   "address": "192.168.1.120",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
     "interval": "3s"}
   },
  {
   "id": "proxy2",
   "name": "proxysql",
   "address": "192.168.1.121",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root --password=123",
     "interval": "3s"}
   }
 ]
}
######
### Start Consul agent
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'
### Setup DNSMASQ (as root)
echo "server=/consul/127.0.0.1#8600" > /etc/dnsmasq.d/10-consul
service dnsmasq restart
### Remember to add the localhost as a DNS server (this step can vary
### depending on how your DNS servers are managed... here I'm just
### adding the following line to resolv.conf:
sudo vi /etc/resolv.conf
#... snippet ...#
nameserver 127.0.0.1
#... snippet ...#
### Restart dnsmasq
sudo service dnsmasq restart

The service should now be started, and you can verify this in the logs in “/var/log/consul.log”.
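The proxysql.json service definitions above rely on Consul’s script checks: the agent runs each script at the configured interval and maps its exit status to a health state, which in turn decides whether the instance keeps appearing in DNS answers. The convention can be sketched as:

```python
import subprocess

def run_script_check(command):
    # Consul script-check convention: exit 0 = passing, exit 1 = warning,
    # any other exit status = critical. Instances whose check fails drop
    # out of DNS responses for the service.
    rc = subprocess.run(command, shell=True).returncode
    return {0: "passing", 1: "warning"}.get(rc, "critical")
```

In our setup the script is `mysqladmin ping` against the local ProxySQL listener, so a dead proxy goes critical within a few seconds.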

Configuration of Consul on Proxy Servers:

The next item is to configure each of the proxy Consul agents. Note that the “agent name” and the “IP address” need to be updated for each host (values for both must be unique):

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent2",  ### Agent node name must be unique
  "server": true,
  "ui": true,
  "bootstrap": false,   ### Disable bootstrap on joiner nodes
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.xxx",  ### Set to local instance IP
  "dns_config": {
    "only_passing": true
  }
}
######
$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
   "id": "proxy1",
   "name": "proxysql",
   "address": "192.168.1.120",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
     "interval": "3s"}
   },
  {
   "id": "proxy2",
   "name": "proxysql",
   "address": "192.168.1.121",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root password=123",
     "interval": "3s"}
   }
 ]
}
######
### Start Consul agent:
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'
### Join Consul cluster specifying 1st node IP e.g.
$ consul join 192.168.1.119
### Verify logs and look out for the following messages:
$ cat /var/log/consul.log
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'agent2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 192.168.1.120 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas:
==> Log data will now stream in as it occurs:
# ... snippet ...
    2016/09/05 19:48:04 [INFO] agent: Synced service 'consul'
    2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql1'
    2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql2'
# ... snippet ...

At this point, we have Consul installed, configured and running on each of our hosts: appserver, mysql1 and mysql2. Now it’s time to install and configure ProxySQL on mysql1 and mysql2.

Installation & Configuration of ProxySQL:

The same procedure should be run on both mysql1 and mysql2 hosts:

### Install ProxySQL packages and initialise ProxySQL DB
sudo yum -y install https://github.com/sysown/proxysql/releases/download/v1.2.2/proxysql-1.2.2-1-centos7.x86_64.rpm
sudo service proxysql initial
sudo service proxysql stop
### Edit the ProxySQL configuration file to update username / password
vi /etc/proxysql.cnf
###
admin_variables=
{
    admin_credentials="admin:admin"
    mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
}
###
### Start ProxySQL
sudo service proxysql start
### Connect to ProxySQL and configure
mysql -P6032 -h127.0.0.1 -uadmin -padmin
### First we create a replication hostgroup:
mysql> INSERT INTO mysql_replication_hostgroups VALUES (10,11,'Standard Replication Groups');
### Add both nodes to the hostgroup 11 (ProxySQL will automatically put the writer node in hostgroup 10)
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.120',11,3306,1000);
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.121',11,3306,1000);
### Save server configuration
mysql> LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
### Add query rules for RW split
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .* FOR UPDATE', 10, NULL, 1);
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .*', 11, NULL, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
### Finally configure ProxySQL user and save configuration
mysql> INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('root','123',1,10,'test');
mysql> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
mysql> EXIT;
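The two query rules configured above implement the read/write split. Their evaluation order matters: rules are tried top-down and the first active match wins (apply=1 stops evaluation), while unmatched queries fall back to the user’s default_hostgroup. A rough Python model (a hypothetical simplification of ProxySQL’s matching engine):

```python
import re

# The two rules from the configuration above, in evaluation order.
RULES = [
    (r"^SELECT .* FOR UPDATE", 10),  # locking reads go with the writes
    (r"^SELECT .*", 11),             # plain reads go to the reader pool
]

def route(query, default_hostgroup=10):
    # First matching rule decides the destination hostgroup; anything
    # else (writes, DDL, ...) goes to the default hostgroup.
    for pattern, hostgroup in RULES:
        if re.match(pattern, query):
            return hostgroup
    return default_hostgroup
```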

MySQL Configuration:

We also need to perform one configuration step on the MySQL servers in order to create a user for ProxySQL to monitor the instances:

### ProxySQL's monitor user on the master MySQL server (default username and password is monitor/monitor)
mysql -h192.168.1.120 -P3306 -uroot -p123 -e"GRANT USAGE ON *.* TO monitor@'%' IDENTIFIED BY 'monitor';"

We can view the configuration of the monitor user on the ProxySQL host by checking the global variables on the admin interface:

mysql> SHOW VARIABLES LIKE 'mysql-monitor%';
+----------------------------------------+---------+
| Variable_name                          | Value   |
+----------------------------------------+---------+
| mysql-monitor_enabled                  | true    |
| mysql-monitor_connect_timeout          | 200     |
| mysql-monitor_ping_max_failures        | 3       |
| mysql-monitor_ping_timeout             | 100     |
| mysql-monitor_replication_lag_interval | 10000   |
| mysql-monitor_replication_lag_timeout  | 1000    |
| mysql-monitor_username                 | monitor |
| mysql-monitor_password                 | monitor |
| mysql-monitor_query_interval           | 60000   |
| mysql-monitor_query_timeout            | 100     |
| mysql-monitor_slave_lag_when_null      | 60      |
| mysql-monitor_writer_is_also_reader    | true    |
| mysql-monitor_history                  | 600000  |
| mysql-monitor_connect_interval         | 60000   |
| mysql-monitor_ping_interval            | 10000   |
| mysql-monitor_read_only_interval       | 1500    |
| mysql-monitor_read_only_timeout        | 500     |
+----------------------------------------+---------+

Testing Consul:

Now that Consul and ProxySQL are configured we can do some tests from the “appserver”. First, we’ll verify that the hosts we’ve added are both reporting [OK] on our DNS requests:

$ dig @127.0.0.1 -p 53 proxysql.service.consul
; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @127.0.0.1 -p 53 proxysql.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9975
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;proxysql.service.consul.	IN	A
;; ANSWER SECTION:
proxysql.service.consul. 0	IN	A	192.168.1.121
proxysql.service.consul. 0	IN	A	192.168.1.120
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Sep 05 19:32:12 UTC 2016
;; MSG SIZE  rcvd: 158

As you can see from the output above, DNS is reporting both 192.168.1.120 and 192.168.1.121 as available for the ProxySQL service. As soon as a ProxySQL check fails, that node will no longer be reported in the output above.
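From the application’s point of view this behaves like a self-cleaning connection pool: every lookup of proxysql.service.consul returns only healthy proxies, and the client simply rotates over whatever comes back. A sketch of that behavior (a hypothetical model, assuming the “only_passing” setting shown earlier):

```python
import itertools

def dns_answers(instances):
    # With "only_passing": true, Consul's DNS interface returns A records
    # only for instances whose health check currently passes.
    return [ip for ip, status in instances if status == "passing"]

# The application then rotates over whatever addresses come back:
pool = itertools.cycle(dns_answers([("192.168.1.120", "passing"),
                                    ("192.168.1.121", "passing")]))
```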

We can also view the status of our cluster and agents through the Consul Web GUI which runs on port 8500 of all the Consul servers in this configuration (e.g. http://192.168.1.120:8500/):

Consul GUI

Testing ProxySQL:

So now that we have this configured we can also do some basic tests to see that ProxySQL is load balancing our connections:

[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e"select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql1.localdomain |
+--------------------+
[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e"select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql2.localdomain |
+--------------------+

Perfect! We’re ready to use the hostname “proxysql.service.consul” to connect to our MySQL instances using a round-robin load balancing and HA proxy solution. If one of the two ProxySQL instances fails, we’ll continue communicating with the database through the other. Of course, this configuration is not limited to just two hosts, so feel free to add as many as you need. Be aware that in this example the two hosts’ replication hierarchy is managed by MHA in order to allow for master/slave promotion. By performing an automatic or manual failover using MHA, ProxySQL automatically detects the change in replication topology and redirects writes to the newly promoted master instance.

To make this configuration more durable, it is encouraged to create a more intelligent Consul check – i.e., a check that verifies more than just the availability of the MySQL service (an example would be to select some data from a table). It is also recommended to tune the interval of the check to suit the requirements of your application.

by Nik Vyzas at September 16, 2016 03:20 PM

September 15, 2016

Peter Zaitsev

ProxySQL and Percona XtraDB Cluster (Galera) Integration

ProxySQL and Percona XtraDB ClusterIn this post, we’ll discuss how an integrated ProxySQL and Percona XtraDB Cluster (Galera) helps manage node states and failovers.

ProxySQL is designed to not perform any specialized operation in relation to the servers with which it communicates. Instead, it uses an event scheduler to extend functionalities and cover any special needs.

Given that specialized products like Percona XtraDB Cluster are not managed by ProxySQL, they require the design and implementation of good/efficient extensions.

In this article, I will illustrate how Percona XtraDB Cluster/Galera can be integrated with ProxySQL to get the best from both.

Brief digression

Before discussing their integration, we need to review a couple of very important concepts in ProxySQL. ProxySQL has a very important logical component: Hostgroup(s) (HG).

A hostgroup is a relation of:

+-----------+       +------------------------+
|Host group +------>|Server (1:N)            |
+-----------+       +------------------------+

In ProxySQL, QueryRules (QR) can be directly mapped to an HG. Using QRs, you can define a specific user to ONLY go to that HG. For instance, you may want to have user app1_user go only on servers A-B-C. Simply set a QR that says app1_user has the destination hostgroup 5, where HG 5 has the servers A-B-C:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',5,3306,10);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) values('app1_user',5,1);

Easy, isn’t it?
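Under the hood, routing a user’s queries to hostgroup 5 still means choosing one of the servers A-B-C, and the weight column biases that choice. A simplified, hypothetical sketch of weight-proportional selection:

```python
import random

def pick_server(servers):
    # Weight-proportional choice among the servers of one hostgroup,
    # in the spirit of ProxySQL's weight column (simplified sketch).
    total = sum(weight for _, weight in servers)
    point = random.uniform(0, total)
    upto = 0.0
    for host, weight in servers:
        upto += weight
        if point <= upto:
            return host
    return servers[-1][0]

# The three equally weighted servers from hostgroup 5 above:
hostgroup5 = [("192.168.1.5", 10), ("192.168.1.6", 10), ("192.168.1.7", 10)]
```

Raising one server’s weight makes it proportionally more likely to receive the connection; a weight of zero effectively takes it out of rotation.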

Another important concept in ProxySQL also related to HG is ReplicationHostgroup(s) (RHG). This is a special HG that ProxySQL uses to automatically manage the nodes that are connected by replication and configured in Write/Read and Read_only mode.

What does this mean? Let’s say you have four nodes A-B-C-D, connected by standard asynchronous replication. A is the master and B-C-D are the slaves. What you want is to have your application pointing writes to server A, and reads to B-C (keeping D as a backup slave). Also, you don’t want to have any reads go to B-C if the replication delay is more than two seconds.

Using RHGs in conjunction with HGs, ProxySQL can manage all of this for you. Simply instruct the proxy to:

  1. Use RHG
  2. Define the value of the maximum latency

Using the example above:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.5',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.6',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.7',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.8',10,3306,10,2);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) values('app1_user',5,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) values('app1_user',6,1);
INSERT INTO mysql_replication_hostgroups VALUES (5,6);

From now on, ProxySQL will split reads and writes using the RHG and the nodes defined in HG 5.
The flexibility introduced by HGs is obviously not limited to what I mention here. It plays a significant part in the integration of Percona XtraDB Cluster and ProxySQL, as I illustrate below.

Percona XtraDB Cluster/Galera Integration

In an XtraDB Cluster, a node has many different states and conditions that affect if, and how, your application operates on it.

The most common one is when a node becomes a DONOR. If you’ve ever installed Percona XtraDB Cluster (or any Galera implementation), you’ve faced this situation: when a node becomes a DONOR, it changes state to DESYNC, and if the node is under heavy load, the DONOR process might affect the node itself.

But that is just one of the possible node states:

  • A node can be JOINED but not synced
  • It can have wsrep_reject_queries or wsrep_sst_donor_rejects_queries active, or wsrep_ready set to OFF
  • It can be in a different segment
  • The number of nodes per segment is relevant

To show what can be done and how, we will use the following setup:

  • Five nodes
  • Two segments
  • Applications requiring R/W split

And two options:

  • Single writer node
  • Multiple writer nodes

We’ll analyze how the proxy behaves when a check script is run by the ProxySQL scheduler.

The use of a script is necessary for ProxySQL to respond correctly to Percona XtraDB Cluster state changes. ProxySQL ships with two scripts for Galera, but both are quite basic and don’t take many relevant conditions into account. I’ve written a more complete one, galera_check.pl, available at https://github.com/Tusamarco/proxy_sql_tools

This script is a prototype and requires QA and debugging, but is still more powerful than the default ones.

The script is designed to manage an arbitrary number of nodes belonging to a given HG. It works per HG, performing isolated actions/checks for each one, and it is not possible to have more than one check running on the same HG. Each check creates a lock file {proxysql_galera_check_${hg}.pid} to prevent duplicates.

galera_check connects to the ProxySQL node and retrieves all the information regarding the nodes/ProxySQL configuration. It then checks each node in parallel, retrieving its status and configuration.

galera_check analyzes and manages the following node states:

  • read_only
  • wsrep_status
  • wsrep_reject_queries
  • wsrep_sst_donor_rejects_queries
  • wsrep_connected
  • wsrep_desync
  • wsrep_ready
  • wsrep_provider
  • wsrep_segment
  • Number of nodes per segment
  • Retry loop

As mentioned, the number of nodes inside a segment is relevant. If a node is the only one in its segment, the check behaves accordingly. For example, if a node is the only one in the MAIN segment, the check will not put it in OFFLINE_SOFT when it becomes a donor, to prevent the cluster from becoming unavailable to applications.
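
As a quick manual cross-check (this is not the script itself; the exact list it reads is in its source), most of the state above can be pulled on each node with standard SHOW statements. Note that the segment is configured through gmcast.segment inside wsrep_provider_options:

```sql
-- Run on a Percona XtraDB Cluster node: a look at the state
-- information this kind of check evaluates.
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_local_state','wsrep_local_state_comment',
   'wsrep_connected','wsrep_ready','wsrep_cluster_status');
SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('read_only','wsrep_reject_queries','wsrep_sst_donor_rejects_queries');
```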

The script also allows you to declare a segment as MAIN, which is quite useful when managing production and DR sites, since the script manages the MAIN segment more conservatively. The check can be configured to perform retries at a given interval, where the interval is the time defined in the ProxySQL scheduler. If the check is set to two retries for UP and three for DOWN, it will loop that many times before taking any action.

Percona XtraDB Cluster/Galera performs some actions under the hood, not all of them entirely correct. The retry loop is useful in the uncommon circumstances where Galera behaves unexpectedly. For example, whenever a node is set to READ_ONLY=1, Galera desyncs and resyncs it. A check that doesn’t take this into account would set the node to OFFLINE and back for no reason.

Another important trait of this check is that it uses special maintenance HGs, all in the 9000 range. So if a node belongs to HG 10 and the check needs to put it in maintenance mode, the node is moved to HG 9010. Once all is back to normal, the node is moved back to its original HG.
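
As a sketch of what the script does for you (hostnames here are illustrative; the HG ids follow the 9000-offset convention just described), the move to and from a maintenance HG is just an UPDATE on mysql_servers in the ProxySQL admin interface:

```sql
-- Move a node from HG 10 into its maintenance HG 9010...
UPDATE mysql_servers SET hostgroup_id=9010
 WHERE hostgroup_id=10 AND hostname='192.168.1.5' AND port=3306;
-- ...and back once the node is healthy again.
UPDATE mysql_servers SET hostgroup_id=10
 WHERE hostgroup_id=9010 AND hostname='192.168.1.5' AND port=3306;
LOAD MYSQL SERVERS TO RUNTIME;
```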

This check does NOT modify any node state. That means it will NOT change any variable or setting on the node itself; it ONLY changes node states in ProxySQL.

Multi-writer mode

The recommended way to use Galera is in multi-writer mode. You can then play with the weights to have one node act as the MAIN writer and prevent/reduce certification failures and brute force aborts from Percona XtraDB Cluster. Use this configuration:

Delete from mysql_replication_hostgroups where writer_hostgroup=500 ;
delete from mysql_servers where hostgroup_id in (500,501);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',500,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',501,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',500,3306,1000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',500,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',501,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',501,3306,1);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL TO DISK;

In this test, we will NOT use Replication HostGroups. We will do that later, when testing a single writer. For now, we’ll focus on multi-writer.

Segment 1 holds the main candidates for both HG 500 and 501, while segment 2 mainly serves HG 501. Weights for the servers in HG 500 progress from 1 to 1 billion, to reduce the chance of random writes landing on a non-main node.

As such, the nodes are:

  • HG 500
    • S1 192.168.1.5 – 1,000,000,000
    • S1 192.168.1.6 – 1,000,000
    • S1 192.168.1.7 – 100
    • S2 192.168.1.8 – 1
    • S2 192.168.1.9 – 1
  • HG 501
    • S1 192.168.1.5 – 100
    • S1 192.168.1.6 – 1,000,000,000
    • S1 192.168.1.7 – 1,000,000,000
    • S2 192.168.1.8 – 1
    • S2 192.168.1.9 – 1

The following command shows what ProxySQL is doing:

watch -n 1 'mysql -h 127.0.0.1 -P 3310 -uadmin -padmin -t -e "select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ;" -e " select hostgroup_id,hostname,status,weight,comment from mysql_servers where hostgroup_id in (500,501,9500,9501)  order by hostgroup_id,hostname ;"'

Download the check from GitHub (https://github.com/Tusamarco/proxy_sql_tools) and activate it in ProxySQL. Be sure to set the parameters that match your installation:

delete from scheduler where id=10;
INSERT  INTO scheduler (id,active,interval_ms,filename,arg1) values (10,0,2000,"/var/lib/proxysql/galera_check.pl","-u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0  --log=/var/lib/proxysql/galeraLog");
LOAD SCHEDULER TO RUNTIME;SAVE SCHEDULER TO DISK;

If you want to activate it:

update scheduler set active=1 where id=10;
LOAD SCHEDULER TO RUNTIME;
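
To confirm the entry is live, ProxySQL mirrors the running configuration in the runtime_ tables:

```sql
-- Run against the ProxySQL admin interface.
SELECT id, active, interval_ms, filename, arg1
  FROM runtime_scheduler WHERE id=10;
```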

The following is the kind of scenario we have:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.9 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 413        |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 420        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 227        |
| 500       | 192.168.1.6 | 3306     | ONLINE | 0        | 10       | 10     | 0       | 12654    | 1016975         | 0               | 230        |
| 500       | 192.168.1.5 | 3306     | ONLINE | 0        | 9        | 29     | 0       | 107358   | 8629123         | 0               | 206        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 4        | 6      | 0       | 12602425 | 613371057       | 34467286486     | 413        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 6        | 7      | 0       | 12582617 | 612422028       | 34409606321     | 420        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 0        | 6        | 6      | 0       | 18580675 | 905464967       | 50824195445     | 227        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 0        | 6        | 14     | 0       | 18571127 | 905075154       | 50814832276     | 230        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 10     | 0       | 169570   | 8255821         | 462706881       | 206        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

To generate load, use the following commands (or whatever tool you prefer, but use separate runs for read-only and read/write):

Write
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=50000 --max-requests=0 --max-time=9000 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all run
Read only
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=50000 --max-requests=0 --max-time=9000 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run

The most common thing that can happen to a cluster node is to become a donor. This is a planned activity for Percona XtraDB Cluster, and it is supposed to be managed in the least harmful way possible.

We’re going to simulate crashing a node and forcing it to elect our main node as DONOR (the one with the highest WEIGHT).

To do so, we need to have the wsrep_sst_donor parameter set:

show global variables like 'wsrep_sst_donor';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| wsrep_sst_donor | node1 | <---
+-----------------+-------+
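
If it is not already set in your environment, wsrep_sst_donor is dynamic, so you can assign it at runtime (here node1 is assumed to be the wsrep_node_name of our main writer; adjust it to your naming):

```sql
-- Prefer node1 as the SST donor; run on the node that will request SST.
SET GLOBAL wsrep_sst_donor='node1';
```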

Activate the check if not already done:

update scheduler set active=1 where id=10;

And now run traffic and check the load:

select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ;
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 30     | 0       | 112662   | 9055479         | 0               | 120        | <--- our Donor
| 500       | 192.168.1.6 | 3306     | ONLINE | 0        | 10       | 10     | 0       | 12654    | 1016975         | 0               | 111        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 115        |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 316        |
| 500       | 192.168.1.9 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 329        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 10     | 0       | 257271   | 12533763        | 714473854       | 120        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 0        | 10       | 18     | 0       | 18881582 | 920200116       | 51688974309     | 111        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 3        | 6        | 9      | 0       | 18927077 | 922317772       | 51794504662     | 115        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 1        | 8      | 0       | 12595556 | 613054573       | 34447564440     | 316        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 1        | 3        | 6      | 0       | 12634435 | 614936148       | 34560620180     | 329        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Now on one of the nodes:

  1. Kill mysql
  2. Remove the content of the data directory
  3. Restart the node

The node will go into SST, and our galera_check script will manage it:

+--------------+-------------+--------------+------------+--------------------------------------------------+
| hostgroup_id | hostname    | status       | weight     | comment                                          |
+--------------+-------------+--------------+------------+--------------------------------------------------+
| 500          | 192.168.1.5 | OFFLINE_SOFT | 1000000000 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <---- the donor
| 500          | 192.168.1.6 | ONLINE       | 1000000    |                                                  |
| 500          | 192.168.1.7 | ONLINE       | 100        |                                                  |
| 500          | 192.168.1.8 | ONLINE       | 1          |                                                  |
| 500          | 192.168.1.9 | ONLINE       | 1          |                                                  |
| 501          | 192.168.1.5 | OFFLINE_SOFT | 100        | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 501          | 192.168.1.6 | ONLINE       | 1000000000 |                                                  |
| 501          | 192.168.1.7 | ONLINE       | 1000000000 |                                                  |
| 501          | 192.168.1.8 | ONLINE       | 1          |                                                  |
| 501          | 192.168.1.9 | ONLINE       | 1          |                                                  |
+--------------+-------------+--------------+------------+--------------------------------------------------+

We can also check the galera_check log and see what happened:

2016/09/02 16:13:27.298:[WARN] Move node:192.168.1.5;3306;500;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:13:27.303:[WARN] Move node:192.168.1.5;3306;501;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

The node remained in OFFLINE_SOFT for as long as it was in the DONOR state, while the node with the second-highest weight (192.168.1.6) served the writes.

All as expected: the node was set to the OFFLINE_SOFT state, which means existing connections were allowed to finish while no NEW connections were accepted.

As soon as the node stopped sending data to the joiner, it was moved back and traffic resumed:

2016/09/02 16:14:58.239:[WARN] Move node:192.168.1.5;3306;500;1000 SQL: UPDATE mysql_servers SET status='ONLINE' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:14:58.243:[WARN] Move node:192.168.1.5;3306;501;1000 SQL: UPDATE mysql_servers SET status='ONLINE' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 6        | 1        | 37     | 0       | 153882   | 12368557        | 0               | 72         | <---
| 500       | 192.168.1.6 | 3306     | ONLINE | 1        | 9        | 10     | 0       | 16008    | 1286492         | 0               | 42         |
| 500       | 192.168.1.7 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 1398     | 112371          | 0               | 96         |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 24545  | 791     | 24545    | 122725          | 0               | 359        |
| 500       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 15108    | 1214366         | 0               | 271        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 11     | 0       | 2626808  | 128001112       | 7561278884      | 72         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 5        | 7        | 20     | 0       | 28629516 | 1394974468      | 79289633420     | 42         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 2        | 8        | 10     | 0       | 29585925 | 1441400648      | 81976494740     | 96         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 16779  | 954     | 12672983 | 616826002       | 34622768228     | 359        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 4        | 6      | 0       | 13567512 | 660472589       | 37267991677     | 271        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

This was easy, and more or less managed by the standard scripts as well. But what would happen if my donor were set to NOT serve queries while in the DONOR state?

Wait, what? Yes, Percona XtraDB Cluster (and Galera in general) can be set to refuse any query when the node goes into the DONOR state. If not managed, this can cause issues, because the node simply rejects queries while ProxySQL still sees it as alive.

Let me show you:

show global variables like 'wsrep_sst_donor_rejects_queries';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| wsrep_sst_donor_rejects_queries | ON    |
+---------------------------------+-------+
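
This variable is dynamic as well, so you can toggle it at runtime to reproduce the scenario (a sketch; set it on the prospective donor):

```sql
-- Make this node refuse client queries while acting as an SST donor.
SET GLOBAL wsrep_sst_donor_rejects_queries=ON;
```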

For the moment, let’s deactivate the check. Then repeat the same procedure: stop the node, delete the data dir, and restart it. SST takes place.

Sysbench will report:

ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3
ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3

But what about ProxySQL?

+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE  | 0        | 0        | 101    | 0       | 186331   | 14972717        | 0               | 118        | <-- no writes in either HG
| 500       | 192.168.1.6 | 3306     | ONLINE  | 0        | 9        | 10     | 0       | 20514    | 1648665         | 0               | 171        |  |
| 500       | 192.168.1.7 | 3306     | ONLINE  | 0        | 1        | 3      | 0       | 5881     | 472629          | 0               | 134        |  |
| 500       | 192.168.1.8 | 3306     | ONLINE  | 0        | 0        | 205451 | 1264    | 205451   | 1027255         | 0               | 341        |  |
| 500       | 192.168.1.9 | 3306     | ONLINE  | 0        | 1        | 2      | 0       | 15642    | 1257277         | 0               | 459        |  -
| 501       | 192.168.1.5 | 3306     | ONLINE  | 1        | 0        | 13949  | 0       | 4903347  | 238627310       | 14089708430     | 118        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 2        | 10       | 20     | 0       | 37012174 | 1803380964      | 103269634626    | 171        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 2        | 11       | 13     | 0       | 38782923 | 1889507208      | 108288676435    | 134        |
| 501       | 192.168.1.8 | 3306     | SHUNNED | 0        | 0        | 208452 | 1506    | 12864656 | 626156995       | 34622768228     | 341        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 3        | 6      | 0       | 14451462 | 703534884       | 39837663734     | 459        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
mysql> select * from mysql_server_connect_log where hostname in ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9')  order by time_start_us desc limit 10;
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error                                                                                          |
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| 192.168.1.9 | 3306 | 1472827444621954 | 1359                    | NULL                                                                                                   |
| 192.168.1.8 | 3306 | 1472827444618883 | 0                       | Can't connect to MySQL server on '192.168.1.8' (107)                                                   |
| 192.168.1.7 | 3306 | 1472827444615819 | 433                     | NULL                                                                                                   |
| 192.168.1.6 | 3306 | 1472827444612722 | 538                     | NULL                                                                                                   |
| 192.168.1.5 | 3306 | 1472827444606560 | 473                     | NULL                                                                                                   | <-- donor is seen as up
| 192.168.1.9 | 3306 | 1472827384621463 | 1286                    | NULL                                                                                                   |
| 192.168.1.8 | 3306 | 1472827384618442 | 0                       | Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 107 |
| 192.168.1.7 | 3306 | 1472827384615317 | 419                     | NULL                                                                                                   |
| 192.168.1.6 | 3306 | 1472827384612241 | 415                     | NULL                                                                                                   |
| 192.168.1.5 | 3306 | 1472827384606117 | 454                     | NULL                                                                                                   | <-- donor is seen as up
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
select * from mysql_server_ping_log where hostname in ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9')  order by time_start_us desc limit 10;
+-------------+------+------------------+----------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error                                           |
+-------------+------+------------------+----------------------+------------------------------------------------------+
| 192.168.1.9 | 3306 | 1472827475062217 | 311                  | NULL                                                 |
| 192.168.1.8 | 3306 | 1472827475060617 | 0                    | Can't connect to MySQL server on '192.168.1.8' (107) |
| 192.168.1.7 | 3306 | 1472827475059073 | 108                  | NULL                                                 |
| 192.168.1.6 | 3306 | 1472827475057281 | 102                  | NULL                                                 |
| 192.168.1.5 | 3306 | 1472827475054188 | 74                   | NULL                                                 | <-- donor is seen as up
| 192.168.1.9 | 3306 | 1472827445061877 | 491                  | NULL                                                 |
| 192.168.1.8 | 3306 | 1472827445060254 | 0                    | Can't connect to MySQL server on '192.168.1.8' (107) |
| 192.168.1.7 | 3306 | 1472827445058688 | 53                   | NULL                                                 |
| 192.168.1.6 | 3306 | 1472827445057124 | 131                  | NULL                                                 |
| 192.168.1.5 | 3306 | 1472827445054015 | 98                   | NULL                                                 | <-- donor is seen as up
+-------------+------+------------------+----------------------+------------------------------------------------------+

As you can see, all seems OK. Let’s turn galera_check back on and see what happens when we run some read and write load.

And now let me do the stop-delete-restart-SST process again:

kill -9 <mysqld_safe_pid> <mysqld_pid>; rm -fr data/*;rm -fr logs/*;sleep 2;./start

As soon as the node goes down, ProxySQL shuns it:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE  | 7        | 3        | 34     | 0       | 21570   | 1733833         | 0               | 146        |
| 500       | 192.168.1.6 | 3306     | ONLINE  | 1        | 8        | 12     | 0       | 9294    | 747063          | 0               | 129        |
| 500       | 192.168.1.7 | 3306     | ONLINE  | 1        | 0        | 4      | 0       | 3396    | 272950          | 0               | 89         |
| 500       | 192.168.1.8 | 3306     | SHUNNED | 0        | 0        | 1      | 6       | 12      | 966             | 0               | 326        | <-- crashed
| 500       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 2      | 0       | 246     | 19767           | 0               | 286        |
| 501       | 192.168.1.5 | 3306     | ONLINE  | 0        | 1        | 2      | 0       | 772203  | 37617973        | 2315131214      | 146        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 9        | 3        | 12     | 0       | 3439458 | 167514166       | 10138636314     | 129        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 12       | 13     | 0       | 3183822 | 155064971       | 9394612877      | 89         |
| 501       | 192.168.1.8 | 3306     | SHUNNED | 0        | 0        | 1      | 6       | 11429   | 560352          | 35350726        | 326        | <-- crashed
| 501       | 192.168.1.9 | 3306     | ONLINE  | 0        | 1        | 1      | 0       | 312253  | 15227786        | 941110520       | 286        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Immediately after, galera_check identifies that the node is requesting an SST and that the DONOR is our writer. Given that the donor is NOT the only writer in the HG, and that it has wsrep_sst_donor_rejects_queries active, it cannot simply be set to OFFLINE_SOFT; at the same time, we do not want ProxySQL to consider it OFFLINE_HARD (because it is not).

As such, the script moves it to a special HG:

2016/09/04 16:11:22.091:[WARN] Move node:192.168.1.5;3306;500;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9500 WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:11:22.097:[WARN] Move node:192.168.1.5;3306;501;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9501 WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+
| hostgroup_id | hostname    | port | status | weight     | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                          |
+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+
| 500          | 192.168.1.6 | 3306 | ONLINE | 1000000    | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 500          | 192.168.1.7 | 3306 | ONLINE | 100        | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 500          | 192.168.1.8 | 3306 | ONLINE | 1          | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 500          | 192.168.1.9 | 3306 | ONLINE | 1          | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 501          | 192.168.1.6 | 3306 | ONLINE | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 501          | 192.168.1.7 | 3306 | ONLINE | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 501          | 192.168.1.9 | 3306 | ONLINE | 1          | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 9500         | 192.168.1.5 | 3306 | ONLINE | 1000000000 | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG
| 9501         | 192.168.1.5 | 3306 | ONLINE | 100        | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG
+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+

The Donor continues to serve the Joiner, but applications won’t see it.

What applications see is also very important. Applications doing WRITEs will see:

[ 10s] threads: 10, tps: 9.50, reads: 94.50, writes: 42.00, response time: 1175.77ms (95%), errors: 0.00, reconnects: 0.00
...
[ 40s] threads: 10, tps: 2.80, reads: 26.10, writes: 11.60, response time: 3491.45ms (95%), errors: 0.00, reconnects: 0.10
[ 50s] threads: 10, tps: 4.80, reads: 50.40, writes: 22.40, response time: 10062.13ms (95%), errors: 0.80, reconnects: 351.60 <--- Main writer moved to another HG
[ 60s] threads: 10, tps: 5.90, reads: 53.10, writes: 23.60, response time: 2869.82ms (95%), errors: 0.00, reconnects: 0.00
...

When the writer shifts from one node to another, the application has to manage the retry, but this lasts only a short time and has a limited impact on the production flow.

Application readers see no errors:

[ 10s] threads: 10, tps: 0.00, reads: 13007.31, writes: 0.00, response time: 9.13ms (95%), errors: 0.00, reconnects: 0.00
[ 50s] threads: 10, tps: 0.00, reads: 9613.70, writes: 0.00, response time: 10.66ms (95%), errors: 0.00, reconnects: 0.20 <-- just a glitch in reconnect
[ 60s] threads: 10, tps: 0.00, reads: 10807.90, writes: 0.00, response time: 11.07ms (95%), errors: 0.00, reconnects: 0.20
[ 70s] threads: 10, tps: 0.00, reads: 9082.61, writes: 0.00, response time: 23.62ms (95%), errors: 0.00, reconnects: 0.00
...
[ 390s] threads: 10, tps: 0.00, reads: 13050.80, writes: 0.00, response time: 8.97ms (95%), errors: 0.00, reconnects: 0.00

When the Donor finishes providing SST, it comes back and the galera_check script puts it back in the right HG:

2016/09/04 16:12:34.266:[WARN] Move node:192.168.1.5;3306;9500;1010 SQL: UPDATE mysql_servers SET hostgroup_id=500 WHERE hostgroup_id=9500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:12:34.270:[WARN] Move node:192.168.1.5;3306;9501;1010 SQL: UPDATE mysql_servers SET hostgroup_id=501 WHERE hostgroup_id=9501 AND hostname='192.168.1.5' AND port='3306'

The crashed node is restarted by the SST process and comes back up. But if the level of load in the cluster is mid/high, it will remain in the JOINED state for some time, becoming visible to ProxySQL again. ProxySQL will not, however, correctly recognize this state.

2016-09-04 16:17:15 21035 [Note] WSREP: 3.2 (node4): State transfer from 1.1 (node1) complete.
2016-09-04 16:17:15 21035 [Note] WSREP: Shifting JOINER -> JOINED (TO: 254515)

To avoid this issue, the script moves the node to a special HG, allowing it to recover without interfering with the real load.
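The JOINED vs. SYNCED distinction is visible from the node itself through the standard Galera status variables. As an illustration (not part of the script's output), you can check:

```sql
-- On the recovering PXC node: wsrep_local_state is 3 (Joined) while the
-- node is still catching up, and 4 (Synced) once it is fully recovered.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state';
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
```

This is the kind of information a check script can rely on to decide whether the node belongs in the special HG or in its original one.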

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 6        | 2        | 15     | 0       | 3000    | 241060          | 0               | 141        |
| 500       | 192.168.1.6 | 3306     | ONLINE | 1        | 9        | 13     | 0       | 13128   | 1055268         | 0               | 84         |
| 500       | 192.168.1.7 | 3306     | ONLINE | 1        | 0        | 4      | 0       | 3756    | 301874          | 0               | 106        |
| 500       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 2      | 0       | 4080    | 327872          | 0               | 278        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 2      | 0       | 256753  | 12508935        | 772048259       | 141        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 8        | 12     | 0       | 5116844 | 249191524       | 15100617833     | 84         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 2        | 11       | 13     | 0       | 4739756 | 230863200       | 13997231724     | 106        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 496524  | 24214563        | 1496482104      | 278        |
| 9500      | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 331        |<-- Joined not Sync
| 9501      | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 331        |<-- Joined not Sync
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Once the node fully recovers, galera_check puts it back in the original HG, ready to serve requests:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 15     | 0       | 3444    | 276758          | 0               | 130        |
| 500       | 192.168.1.6 | 3306     | ONLINE | 0        | 9        | 13     | 0       | 13200   | 1061056         | 0               | 158        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 0        | 0        | 4      | 0       | 3828    | 307662          | 0               | 139        |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |<-- up again
| 500       | 192.168.1.9 | 3306     | ONLINE | 0        | 0        | 2      | 0       | 4086    | 328355          | 0               | 336        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 2      | 0       | 286349  | 13951366        | 861638962       | 130        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 0        | 12       | 12     | 0       | 5239212 | 255148806       | 15460951262     | 158        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 0        | 13       | 13     | 0       | 4849970 | 236234446       | 14323937975     | 139        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |<-- up again
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 507910  | 24768898        | 1530841172      | 336        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

A summary of the logical steps is:

+---------+
                |  Crash  |
                +----+----+
                     |
                     v
            +--------+-------+
            |  ProxySQL      |
            |  shun crashed  |
            |      node      |
            +--------+-------+
                     |
                     |
                     v
   +-----------------+-----------------+
   |  Donor has one of the following?  |
   |  wsrep_sst_donor_rejects_queries  |
   |  OR                               |
   |  wsrep_reject_queries             |
   +-----------------------------------+
      |No                            |Yes
      v                              v
+-----+----------+       +-----------+----+
| Galera_check   |       | Galera_check   |
| put the donor  |       | put the donor  |
| in OFFLINE_SOFT|       | in special HG  |
+---+------------+       +-----------+----+
    |                                |
    |                                |
    v                                v
+---+--------------------------------+-----+
|            Donor SST ends                |
+---+---------------+----------------+-----+
    |               |                |
    |               |                |
+---+------------+  |    +-----------+----+
| Galera_check   |  |    | Galera_check   |
| put the donor  |  |    | put the donor  |
| ONLINE         |  |    | in Original HG |
+----------------+  |    +----------------+
                    |
                    |
+------------------------------------------+
|           crashed SST ends               |
+-------------------+----------------------+
                    |
                    |
       +------------+-------------+
       |  Crashed node back but   +<------------+
       |  Not Sync?               |             |
       +--------------------------+             |
          |No                   |Yes            |
          |                     |               |
          |                     |               |
+---------+------+       +------+---------+     |
| Galera_check   |       | Galera_check   |     |
| put the node   |       | put the node   +-----+
| back orig. HG  |       | Special HG     |
+--------+-------+       +----------------+
         |
         |
         |
         |      +---------+
         +------>   END   |
                +---------+

As mentioned, galera_check can manage several node states.

Another case is when we need the node to stop accepting ANY queries. We might need that for several reasons, such as preparing the node for maintenance.

In Percona XtraDB Cluster (and other Galera implementations) we can set the value of wsrep_reject_queries to:

  • NONE
  • ALL
  • ALL_KILL

Let's see how it works. Run some load, then on the main writer node (192.168.1.5) execute:

set global wsrep_reject_queries=ALL;

This blocks any new query from executing, while letting the running ones complete. Do a simple select on the node:

(root@localhost:pm) [test]>select * from tbtest1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

ProxySQL won’t see these conditions:

+-------------+------+------------------+----------------------+------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error |
+-------------+------+------------------+----------------------+------------+
| 192.168.1.5 | 3306 | 1473005467628001 | 35                   | NULL       | <--- ping ok
| 192.168.1.5 | 3306 | 1473005437628014 | 154                  | NULL       |
+-------------+------+------------------+----------------------+------------+
+-------------+------+------------------+-------------------------+---------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error |
+-------------+------+------------------+-------------------------+---------------+
| 192.168.1.5 | 3306 | 1473005467369575 | 246                     | NULL          | <--- connect ok
| 192.168.1.5 | 3306 | 1473005407369441 | 353                     | NULL          |
+-------------+------+------------------+-------------------------+---------------+

The galera_check script will instead manage it:

+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | OFFLINE_SOFT | 0        | 0        | 8343   | 0       | 10821   | 240870          | 0               | 93         | <--- galera check put it OFFLINE
| 500       | 192.168.1.6 | 3306     | ONLINE       | 10       | 0        | 15     | 0       | 48012   | 3859402         | 0               | 38         | <--- writer
| 500       | 192.168.1.7 | 3306     | ONLINE       | 0        | 1        | 6      | 0       | 14712   | 1182364         | 0               | 54         |
| 500       | 192.168.1.8 | 3306     | ONLINE       | 0        | 1        | 2      | 0       | 1092    | 87758           | 0               | 602        |
| 500       | 192.168.1.9 | 3306     | ONLINE       | 0        | 1        | 4      | 0       | 5352    | 430152          | 0               | 238        |
| 501       | 192.168.1.5 | 3306     | OFFLINE_SOFT | 0        | 0        | 1410   | 0       | 197909  | 9638665         | 597013919       | 93         |
| 501       | 192.168.1.6 | 3306     | ONLINE       | 2        | 10       | 12     | 0       | 7822682 | 380980455       | 23208091727     | 38         |
| 501       | 192.168.1.7 | 3306     | ONLINE       | 0        | 13       | 13     | 0       | 7267507 | 353962618       | 21577881545     | 54         |
| 501       | 192.168.1.8 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 241641  | 11779770        | 738145270       | 602        |
| 501       | 192.168.1.9 | 3306     | ONLINE       | 1        | 0        | 1      | 0       | 756415  | 36880233        | 2290165636      | 238        |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

In this case, the script puts the node in OFFLINE_SOFT, since set global wsrep_reject_queries=ALL means "do not accept new queries, but let the existing ones complete", which matches the OFFLINE_SOFT semantics.
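For reference, this is roughly the state change the script automates; it can also be applied manually from the ProxySQL admin interface (hostname and port values taken from the examples above):

```sql
-- OFFLINE_SOFT: no new connections are sent to the node,
-- but the existing ones are allowed to complete.
UPDATE mysql_servers SET status='OFFLINE_SOFT'
 WHERE hostname='192.168.1.5' AND port=3306;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```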

The script also manages the case of set global wsrep_reject_queries=ALL_KILL; (which additionally kills the existing connections). From the ProxySQL point of view, these conditions do not exist either:

+-------------+------+------------------+----------------------+------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error |
+-------------+------+------------------+----------------------+------------+
| 192.168.1.5 | 3306 | 1473005827629069 | 59                   | NULL       |<--- ping ok
| 192.168.1.5 | 3306 | 1473005797628988 | 57                   | NULL       |
+-------------+------+------------------+----------------------+------------+
+-------------+------+------------------+-------------------------+---------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error |
+-------------+------+------------------+-------------------------+---------------+
| 192.168.1.5 | 3306 | 1473005827370084 | 370                     | NULL          | <--- connect ok
| 192.168.1.5 | 3306 | 1473005767369915 | 243                     | NULL          |
+-------------+------+------------------+-------------------------+---------------+
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 9500      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |<--- galera check put it in special HG
| 9501      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

The difference here is that the script moves the node to the special HG to isolate it, instead of leaving it in the original HG.

The integration between ProxySQL and Percona XtraDB Cluster (Galera) works perfectly for multi-writer if you have a script like galera_check that correctly manages the different Percona XtraDB Cluster/Galera states.

ProxySQL and PXC using Replication HostGroup

Sometimes we might need to have 100% of the write going to only one node at a time. As explained above, ProxySQL uses weight to redirect a % of the load to a specific node.

In most cases, it is enough to set the weight of the main writer to a very high value (like one billion) and one thousand on the next node to almost achieve a single writer.

But this is not 100% effective: it still allows ProxySQL to send a query to the other node(s) once every X requests. To keep it consistent with the ProxySQL logic, the solution is to use replication hostgroups.

Replication HGs are special HGs that the proxy treats as a linked pair for R/W operations. ProxySQL analyzes the value of the READ_ONLY variable and assigns the nodes that have it enabled to the reader HG.

The node with READ_ONLY=0 resides in both HGs. As such, the first thing we need to do is tell ProxySQL that HGs 500 and 501 are replication HGs:

INSERT INTO mysql_replication_hostgroups VALUES (500,501,'');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
select * from mysql_replication_hostgroups ;
+------------------+------------------+---------+
| writer_hostgroup | reader_hostgroup | comment |
+------------------+------------------+---------+
| 500              | 501              |         |
+------------------+------------------+---------+

Now whenever I set the value of READ_ONLY on a node, ProxySQL will move the node accordingly. Let's see how. Current state:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 6        | 1        | 7      | 0       | 16386    | 1317177         | 0               | 97         |
| 500       | 192.168.1.6 | 3306     | ONLINE | 1        | 9        | 15     | 0       | 73764    | 5929366         | 0               | 181        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 1        | 0        | 6      | 0       | 18012    | 1447598         | 0               | 64         |
| 500       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 2      | 0       | 1440     | 115728          | 0               | 341        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1210029  | 58927817        | 3706882671      | 97         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 1        | 11       | 12     | 0       | 16390790 | 798382865       | 49037691590     | 181        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 1        | 12       | 13     | 0       | 15357779 | 748038558       | 45950863867     | 64         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1247662  | 60752227        | 3808131279      | 341        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1766309  | 86046839        | 5374169120      | 422        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Set global READ_ONLY=1 on the following nodes: 192.168.1.6/7/8/9.
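That is, on each of those nodes (connecting to MySQL directly, not through ProxySQL):

```sql
-- Run on 192.168.1.6, 192.168.1.7, 192.168.1.8 and 192.168.1.9
SET GLOBAL read_only=1;
```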

After:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 25980    | 2088346         | 0               | 93         |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1787979  | 87010074        | 5473781192      | 93         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 8        | 12     | 0       | 18815907 | 916547402       | 56379724890     | 79         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 1        | 12       | 13     | 0       | 17580636 | 856336023       | 52670114510     | 131        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 15324    | 746109          | 46760779        | 822        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 16210    | 789999          | 49940867        | 679        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

If, in this scenario, a reader node crashes, the application will not suffer at all, given the redundancy.

But if the writer crashes, there is a real issue, because no node will be available to manage the failover. The solution is either to perform the node election manually, or to have the script elect the node with the lowest read weight in the same segment as the new writer.
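To pick the candidate programmatically, one can query the ProxySQL admin interface for the ONLINE reader with the lowest weight (a sketch using the hostgroups of this setup):

```sql
-- On the ProxySQL admin interface (port 6032 by default)
SELECT hostname, port, weight
  FROM runtime_mysql_servers
 WHERE hostgroup_id=501 AND status='ONLINE'
 ORDER BY weight ASC
 LIMIT 1;
```

Setting read_only=0 on the returned node then promotes it, as in the manual procedure described later.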

Below is what happens when a node crashes (bird's-eye view):

+---------+
                         |  Crash  |
                         +----+----+
                              |
                              v
                     +--------+-------+
                     |  ProxySQL      |
                     |  shun crashed  |
                     |      node      |
                     +--------+-------+
                              |
                              |
                              v
            +-----------------+-----------------+
+----------->   HostGroup has another active    |
|           |   Node in HG writer?              |
|           +--+--------------+---------------+-+
|              |              |               |
|              |              |               |
|              |No            |               |Yes
|              |              |               |
|        +-----v----------+   |   +-----------v----+
|        |ProxySQL will   |   |   |ProxySQL will   |
|        |stop serving    |   |   |redirect load   >--------+
|        |writes          |   |   |there           |        |
|        +----------------+   |   +----------------+        |
|                             |                             |
|                             v                             |
|                     +-------+--------+                    |
|                     |ProxySQL checks |                    |
|                     |READ_ONLY on    |                    |
|                     |Reader HG       |                    |
|                     |                |                    |
|                     +-------+--------+                    |
|                             |                             |
|                             v                             |
|                     +-------+--------+                    |
|                     |Any Node with   |                    |
|                     |READ_ONLY = 0 ? |                    |
|                     +----------------+                    |
|                      |No            |Yes                  |
|                      |              |                     |
|           +----------v------+    +--v--------------+      |
|           |ProxySQL will    |    |ProxySQL will    |      |
|           |continue to      |    |Move node to     |      |
+<---------<+do not serve     |    |Writer HG        |      |
|           |Writes           |    |                 |      |
|           +-----------------+    +--------v--------+      |
|                                           |               |
+-------------------------------------------+               |
                         +---------+                        |
                         |   END   <------------------------+
                         +---------+

The script should act immediately after ProxySQL has SHUNNED the node, simply replacing READ_ONLY=1 with READ_ONLY=0 on the reader node with the lowest read weight.

ProxySQL will do the rest, copying the node into the writer HG while keeping its low weight, so that if/when the original node comes back, the new node will not compete with it for traffic.

The script includes this as a special function enabling automatic failover. This experimental feature is active only if explicitly set in the parameters that the scheduler passes to the script. To activate it, add --active_failover to the scheduler entry. My recommendation is to have two entries in the scheduler, activate the one with --active_failover for testing, and remember to deactivate the other one.
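As an illustration, a scheduler entry enabling the experimental failover could look like the following; the id, interval and script path are hypothetical, and the script's other connection arguments are omitted here:

```sql
-- ProxySQL scheduler table: id, active, interval_ms, filename, arg1..arg5
INSERT INTO scheduler (id, active, interval_ms, filename, arg1)
VALUES (10, 1, 2000, '/var/lib/proxysql/galera_check.pl', '--active_failover');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
```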

Let's see the manual procedure first. The process is:

1. Generate some load
2. Kill the writer node
3. Manually elect a reader as the new writer
4. Recover the crashed node

Current load:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 30324   | 2437438         | 0               | 153        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1519612 | 74006447        | 4734427711      | 153        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 8        | 12     | 0       | 7730857 | 376505014       | 24119645457     | 156        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 2        | 10       | 12     | 0       | 7038332 | 342888697       | 21985442619     | 178        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 612523  | 29835858        | 1903693835      | 337        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 611021  | 29769497        | 1903180139      | 366        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Kill the main node 192.168.1.5:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 11      | 1565987 | 76267703        | 4879938857      | 119        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 8023216 | 390742215       | 25033271548     | 112        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 7306838 | 355968373       | 22827016386     | 135        |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 638326  | 31096065        | 1984732176      | 410        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 636857  | 31025014        | 1982213114      | 328        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
+-------------+------+------------------+----------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error                                           |
+-------------+------+------------------+----------------------+------------------------------------------------------+
| 192.168.1.5 | 3306 | 1473070640798571 | 0                    | Can't connect to MySQL server on '192.168.1.5' (107) |
| 192.168.1.5 | 3306 | 1473070610798464 | 0                    | Can't connect to MySQL server on '192.168.1.5' (107) |
+-------------+------+------------------+----------------------+------------------------------------------------------+
+-------------+------+------------------+-------------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error                                        |
+-------------+------+------------------+-------------------------+------------------------------------------------------+
| 192.168.1.5 | 3306 | 1473070640779903 | 0                       | Can't connect to MySQL server on '192.168.1.5' (107) |
| 192.168.1.5 | 3306 | 1473070580779977 | 0                       | Can't connect to MySQL server on '192.168.1.5' (107) |
+-------------+------+------------------+-------------------------+------------------------------------------------------+

When the node is killed, ProxySQL shuns it and the checks (connect and ping) report errors. During this time frame the application will experience issues, and if it is not designed to manage retries (and, eventually, a queue), it will crash.

Sysbench reports the errors:

Writes

[  10s] threads: 10, tps: 6.70, reads: 68.50, writes: 30.00, response time: 1950.53ms (95%), errors: 0.00, reconnects:  0.00
...
[1090s] threads: 10, tps: 4.10, reads: 36.90, writes: 16.40, response time: 2226.45ms (95%), errors: 0.00, reconnects:  1.00  <-+ killing the node
[1100s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1110s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1120s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1130s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |-- Gap waiting for a node to become
[1140s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |   READ_ONLY=0
[1150s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1160s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1170s] threads: 10, tps: 4.70, reads: 51.30, writes: 22.80, response time: 80430.18ms (95%), errors: 0.00, reconnects:  0.00 <-+
[1180s] threads: 10, tps: 8.90, reads: 80.10, writes: 35.60, response time: 2068.39ms (95%), errors: 0.00, reconnects:  0.00
...
 [1750s] threads: 10, tps: 5.50, reads: 49.80, writes: 22.80, response time: 2266.80ms (95%), errors: 0.00, reconnects:  0.00 -- No additional errors

I decided to promote node 192.168.1.6; given the reader weights were equal, the choice made no difference in this setup.

(root@localhost:pm) [(none)]>set global read_only=0;
Query OK, 0 rows affected (0.00 sec)

Checking ProxySQL:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE  | 10       | 0        | 10     | 0       | 1848    | 148532          | 0               | 40         |
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 72      | 1565987 | 76267703        | 4879938857      | 38         |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 2        | 10       | 12     | 0       | 8843069 | 430654903       | 27597990684     | 40         |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 8048826 | 392101994       | 25145582384     | 83         |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 725820  | 35371512        | 2259974847      | 227        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 723582  | 35265066        | 2254824754      | 290        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Once the READ_ONLY value is modified, ProxySQL moves the node to the writer HG and writes can take place again. At this point production activity is recovered.

Reads had just a minor glitch:

Reads

[  10s] threads: 10, tps: 0.00, reads: 20192.15, writes: 0.00, response time: 6.96ms (95%), errors: 0.00, reconnects:  0.00
...
[ 410s] threads: 10, tps: 0.00, reads: 16489.03, writes: 0.00, response time: 9.41ms (95%), errors: 0.00, reconnects:  2.50
...
[ 710s] threads: 10, tps: 0.00, reads: 18789.40, writes: 0.00, response time: 6.61ms (95%), errors: 0.00, reconnects:  0.00

The glitch happened when node 192.168.1.6 was copied over to HG 500, but with no interruptions or errors. At this point, let us bring back the crashed node, which on rejoining elects Node2 (192.168.1.6) as its donor.

This was a Percona XtraDB Cluster/Galera choice, and we have to accept and manage it.

Note that the other basic scripts put the node in OFFLINE_SOFT, given the node will become a DONOR.
galera_check will recognize that Node2 (192.168.1.6) is the only active node in the segment for that specific HG (writer), while it is not the only one present in the READER HG.

As such it will put the node in OFFLINE_SOFT only for the READER HG, trying to reduce the load on the node, but it will keep it active in the WRITER HG, to prevent service interruption.
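That per-hostgroup decision can be sketched as follows; the function and argument names are illustrative, not galera_check's real interface (the script itself is written in Perl):

```python
def donor_actions(donor, writer_hg, reader_hg):
    """Per-hostgroup status for a node that became an SST donor.

    Keep the donor ONLINE in the writer HG if it is the only active
    writer (to prevent service interruption); set it OFFLINE_SOFT in
    the reader HG when other readers can absorb the load.
    """
    other_writers = [n for n in writer_hg if n != donor]
    other_readers = [n for n in reader_hg if n != donor]
    return {
        "writer": "OFFLINE_SOFT" if other_writers else "ONLINE",
        "reader": "OFFLINE_SOFT" if other_readers else "ONLINE",
    }
```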

Restart the node and ask for a donor:

2016-09-05 12:21:43 8007 [Note] WSREP: Flow-control interval: [67, 67]
2016-09-05 12:21:45 8007 [Note] WSREP: Member 1.1 (node1) requested state transfer from '*any*'. Selected 0.1 (node2)(SYNCED) as donor.
2016-09-05 12:21:46 8007 [Note] WSREP: (ef248c1f, 'tcp://192.168.1.8:4567') turning message relay requesting off
2016-09-05 12:21:52 8007 [Note] WSREP: New cluster view: global state: 234bb6ed-527d-11e6-9971-e794f632b140:324329, view# 7: Primary, number of nodes: 5, my index: 3, protocol version 3

galera_check sets 192.168.1.6 OFFLINE_SOFT only for the READER HG, and ProxySQL uses the others to serve reads.

+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE       | 10       | 0        | 10     | 0       | 7746     | 622557          | 0               | 86         |
| 501       | 192.168.1.5 | 3306     | ONLINE       | 0        | 0        | 1      | 147     | 1565987  | 76267703        | 4879938857      | 38         |
| 501       | 192.168.1.6 | 3306     | OFFLINE_SOFT | 0        | 0        | 12     | 0       | 9668944  | 470878452       | 30181474498     | 86         | <-- Node offline
| 501       | 192.168.1.7 | 3306     | ONLINE       | 9        | 3        | 12     | 0       | 10932794 | 532558667       | 34170366564     | 62         |
| 501       | 192.168.1.8 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 816599   | 39804966        | 2545765089      | 229        |
| 501       | 192.168.1.9 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 814893   | 39724481        | 2541760230      | 248        |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

When the SST donor task is over, galera_check moves 192.168.1.6 back to ONLINE as expected. At the same time, it moves the recovering node to the special HG, to avoid having it included in any activity until it is ready.

2016-09-05 12:22:36 27352 [Note] WSREP: 1.1 (node1): State transfer from 0.1 (node2) complete.
2016-09-05 12:22:36 27352 [Note] WSREP: Shifting JOINER -> JOINED (TO: 325062)

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 1554     | 124909          | 0               | 35         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 2        | 8        | 22     | 0       | 10341612 | 503637989       | 32286072739     | 35         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 3        | 9        | 12     | 0       | 12058701 | 587388598       | 37696717375     | 13         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 890102   | 43389051        | 2776691164      | 355        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 887994   | 43296865        | 2772702537      | 250        |
| 9500      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 57         | <-- Special HG for recover
| 9501      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 57         | <-- Special HG for recover
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Once the node is finally in SYNC with the group, it is put back ONLINE in the READER HG and in the WRITER HG:


+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 0          | <-- Back on line
| 500       | 192.168.1.6 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 402      | 32317           | 0               | 68         |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 6285     | 305823          | 19592814        | 312        | <-- Back on line
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 6        | 22     | 0       | 10818694 | 526870710       | 33779586475     | 68         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 0        | 12       | 12     | 0       | 12492316 | 608504039       | 39056093665     | 26         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 942023   | 45924082        | 2940228050      | 617        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 939975   | 45834039        | 2935816783      | 309        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
+--------------+-------------+------+--------+------------+
| hostgroup_id | hostname    | port | status | weight     |
+--------------+-------------+------+--------+------------+
| 500          | 192.168.1.5 | 3306 | ONLINE | 100        |
| 500          | 192.168.1.6 | 3306 | ONLINE | 1000000000 |
| 501          | 192.168.1.5 | 3306 | ONLINE | 100        |
| 501          | 192.168.1.6 | 3306 | ONLINE | 1000000000 |
| 501          | 192.168.1.7 | 3306 | ONLINE | 1000000000 |
| 501          | 192.168.1.8 | 3306 | ONLINE | 1          |
| 501          | 192.168.1.9 | 3306 | ONLINE | 1          |
+--------------+-------------+------+--------+------------+

But given it is coming back with its READER weight, it will NOT compete with the previously elected WRITER.

The recovered node will stay on “hold”, waiting for a DBA to act and eventually put it back, or to set it READ_ONLY and as such fully remove it from the WRITER HG.
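The weight-based election behavior described here (highest-weight ONLINE node wins, so the recovered node's low reader weight keeps it out of contention) can be sketched like this; the tuple layout is an illustrative simplification of the mysql_servers table, not the script's actual code:

```python
def elect_writer(servers):
    """Pick the next writer from (host, status, weight) candidates.

    Returns the highest-weight ONLINE host, or None if none is usable.
    """
    online = [s for s in servers if s[1] == "ONLINE"]
    if not online:
        return None
    # max() keeps the first of equally weighted candidates.
    return max(online, key=lambda s: s[2])[0]
```

With the weights shown above, the recovered node at weight 100 never outranks the current writer at 1000000000.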

Now let us see the automatic failover procedure. The process is:

  1. Generate some load
  2. Kill the writer node
  3. Script will do auto-failover
  4. Recover crashed node

Check our scheduler config:

+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+
| id | active | interval_ms | filename | arg1 | arg2 | arg3 | arg4 | arg5 | comment |
+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+
| 10 | 1 | 2000 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --active_failover --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | | <--- Active
| 20 | 0 | 1500 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | |
+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+

The active one is the one with auto-failover enabled. Start the load and check the current state:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 952     | 76461           | 0               | 0          |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 53137   | 2587784         | 165811100       | 167        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 5        | 5        | 11     | 0       | 283496  | 13815077        | 891230826       | 109        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 3        | 7        | 10     | 0       | 503516  | 24519457        | 1576198138      | 151        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 21952   | 1068972         | 68554796        | 300        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 21314   | 1038593         | 67043935        | 289        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Kill the main node 192.168.1.5:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE  | 10       | 0        | 10     | 0       | 60      | 4826            | 0               | 0          |
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 11      | 177099  | 8626778         | 552221651       | 30         |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 3        | 7        | 11     | 0       | 956724  | 46601110        | 3002941482      | 49         |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 2        | 8        | 10     | 0       | 1115685 | 54342756        | 3497575125      | 42         |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 0        | 1        | 1      | 0       | 76289   | 3721419         | 240157393       | 308        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 75803   | 3686067         | 236382784       | 231        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

When the node is killed it is SHUNNED, but this time the script has already set the new node 192.168.1.6 to ONLINE. See the script log:

2016/09/08 14:04:02.494:[INFO] END EXECUTION Total Time:102.347850799561
2016/09/08 14:04:04.478:[INFO] This Node Try to become a WRITER set READ_ONLY to 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] This Node NOW HAS READ_ONLY = 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] END EXECUTION Total Time:71.8140602111816

More importantly, let’s look at the application experience:

Writes

[  10s] threads: 10, tps: 9.40, reads: 93.60, writes: 41.60, response time: 1317.41ms (95%), errors: 0.00, reconnects:  0.00
[  20s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1350.96ms (95%), errors: 0.00, reconnects:  0.00
[  30s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1317.81ms (95%), errors: 0.00, reconnects:  0.00
[  40s] threads: 10, tps: 7.80, reads: 70.20, writes: 31.20, response time: 1407.51ms (95%), errors: 0.00, reconnects:  0.00
[  50s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 2259.35ms (95%), errors: 0.00, reconnects:  0.00
[  60s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 3275.78ms (95%), errors: 0.00, reconnects:  0.00
[  70s] threads: 10, tps: 5.70, reads: 60.30, writes: 26.80, response time: 1492.56ms (95%), errors: 0.00, reconnects:  1.00 <-- just a reconnect experience
[  80s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 7959.74ms (95%), errors: 0.00, reconnects:  0.00
[  90s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 2109.03ms (95%), errors: 0.00, reconnects:  0.00
[ 100s] threads: 10, tps: 6.40, reads: 57.60, writes: 25.60, response time: 1883.96ms (95%), errors: 0.00, reconnects:  0.00
[ 110s] threads: 10, tps: 5.60, reads: 50.40, writes: 22.40, response time: 2167.27ms (95%), errors: 0.00, reconnects:  0.00

With no errors and no huge delay, our application saw only a glitch and had to reconnect once.

Reads had no errors or reconnects.

The connection errors were managed by ProxySQL, and given it found five in one second, it SHUNNED the node. The galera_check script was able to promote a reader, and because this is a failover, there was no delay from a retry loop. The whole thing was done in such a brief time that the application barely saw it.
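ProxySQL's shun decision referenced here amounts to a threshold of connection errors within a short window (the five-errors default matches its `mysql-shun_on_failures` variable). A sketch of that rule, not ProxySQL's actual implementation:

```python
def should_shun(error_times, threshold=5, window=1.0):
    """Return True when `threshold` errors landed within `window` seconds.

    `error_times` are timestamps (in seconds) of failed connection attempts.
    """
    if not error_times:
        return False
    newest = max(error_times)
    recent = [t for t in error_times if newest - t <= window]
    return len(recent) >= threshold
```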

Obviously, an application with thousands of connections per second will experience a larger impact, but the time window will be very narrow. Once the failed node is ready to come back, we can either start it with READ_ONLY=1, so that it comes back as a reader, or keep it as it is, so that it comes back as the writer.

No matter what, the script manages the case as it did in the previous (manual) exercise.

Conclusions

ProxySQL and galera_check, when working together, are quite efficient in managing the cluster and its different scenarios. When using single-writer mode, automating the manual part of the failover dramatically improves production recovery time, taking it from a few minutes to seconds or less.

The multi-writer mode remains the preferred and most recommended way to use ProxySQL/Percona XtraDB Cluster, given it performs failover without the need for additional scripts or extensions. It is also the preferred method when a script is used to manage the integration with ProxySQL.

In both cases, a script that can identify the multiple states of Percona XtraDB Cluster and the mutable node scenarios is a crucial part of the implementation, without which ProxySQL might not behave correctly.

by Marco Tusa at September 15, 2016 10:37 PM

Percona XtraDB Cluster 5.6.30-25.16.2 is now available (CVE-2016-6662 fix)

Percona XtraDB Cluster Reference Architecture

Percona XtraDB Cluster 5.6

Percona announces the new release of Percona XtraDB Cluster 5.6 on September 15, 2016. Binaries are available from the downloads area or our software repositories.

Percona XtraDB Cluster 5.6.30-25.16.2 is now the current release, based on the following:

  • Percona Server 5.6.30-76.3
  • Galera Replication library 3.16
  • Codership wsrep API version 25

This release provides a fix for CVE-2016-6662. More information about this security issue can be found here.

Bug Fixed:

  • Due to security reasons, ld_preload libraries can now only be loaded from the system directories (/usr/lib64, /usr/lib) and the MySQL installation base directory.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Hrvoje Matijakovic at September 15, 2016 01:53 PM

Jean-Jerome Schmidt

Planets9s - New ClusterControl pricing plans for managing MySQL, MongoDB & PostgreSQL

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

New ClusterControl pricing plans for managing MySQL, MongoDB & PostgreSQL

Whether you’re looking to manage standalone instances, need high availability or have 24/7 SLA requirements for your databases, we’ve got you covered. ClusterControl now comes with three enhanced subscription options for you to choose from: Standalone, Advanced and Enterprise. This is in addition to its Community Edition that you can use at no charge.

View the new pricing plans

Join us for our MySQL Query Tuning Part 2: Indexing and EXPLAIN webinar

Why is a given query slow, what does the execution plan look like, how will JOINs be processed, is the query using the correct indexes, or is it creating a temporary table?

You are welcome to sign up for our next webinar on September 27, where we’ll look at the EXPLAIN command and see how it can help us answer these questions. EXPLAIN is one of the most powerful tools at your disposal for understanding and troubleshooting troublesome database queries. We will also look into how to use database indexes to speed up queries.

Sign up for the webinar

Sign up for our #ClusterControl CrowdChat

If you haven’t checked this out yet, do take a look at our online community to interact with experts on how to best deploy and manage your databases. CrowdChat is a community platform that works across Facebook, Twitter, and LinkedIn to allow users to discuss a topic using a specific #hashtag. This crowdchat focuses on the hashtag #ClusterControl. So if you’re a DBA, architect, CTO, or a database novice, register to join and become part of the conversation!

Join the CrowdChat

Become a MongoDB DBA: Recovering your Data

A well-designed backup and restore strategy maximizes data availability and minimizes data loss, while considering the requirements of your business. How do you best restore a MongoDB backup? What are the considerations when restoring a replicaSet as opposed to a single node? This blog gives you an overview on how to restore your data for recovery purposes, as well as when seeding a new node in a replicaSet.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at September 15, 2016 01:49 PM

Daniël van Eeden

About Oracle MySQL and CVE-2016-6662

The issue

On 12 September 2016 (three days ago) a MySQL security vulnerability was announced. The CVE id is CVE-2016-6662.

There are 3 claims:
  1. By setting malloc-lib in the configuration file access to an OS root shell can be gained.
  2. By using the general log a configuration file can be written in any place which is writable for the OS mysql user.
  3. By using SELECT...INTO DUMPFILE... it is possible to elevate privileges from a database user with the FILE privilege to any database account including root.

How it is supposed to be used

  1. Find an SQL Injection in a website or otherwise gain access to a MySQL account.
  2. Now create a trigger file (requires FILE privilege)
  3. Now, in the trigger or otherwise, use SET GLOBAL general_log_file etc. to create a my.cnf in the datadir with the correct permissions. Directly using SELECT...INTO DUMPFILE... won't work, as that would result in the wrong permissions, which would cause mysqld/mysqld_safe to ignore the file.
  4. Now wait for someone/something to restart MySQL (an upgrade, a daily cold backup, etc.), and a shell will be available on a port number and IP address chosen by the attacker.

How it is fixed

The document claims "Official patches for the vulnerability are not available at this time for Oracle MySQL server," but that isn't true.

From the 5.7.15 release notes:
  • mysqld_safe attempted to read my.cnf in the data directory, although that is no longer a standard option file location. (Bug #24482156)
  • For mysqld_safe, the argument to --malloc-lib now must be one of the directories /usr/lib, /usr/lib64, /usr/lib/i386-linux-gnu, or /usr/lib/x86_64-linux-gnu. In addition, the --mysqld and --mysqld-version options can be used only on the command line and not in an option file. (Bug #24464380)
  • It was possible to write log files ending with .ini or .cnf that later could be parsed as option files. The general query log and slow query log can no longer be written to a file ending with .ini or .cnf. (Bug #24388753)
The last two items are also in the 5.6.33 release notes.

So two out of the three vulnerabilities are patched, and the obvious advice is to upgrade.

Further steps to take

But there are more things you can do to further secure your setup.

Check if your my.cnf file(s) are writable for the mysql user

/etc/my.cnf, /etc/mysql/my.cnf, and /etc/mysql/my.cnf.d/* should NOT be writable by the mysql user. Make sure these are owned by root with mode 644.

Put an empty my.cnf in your datadir and make sure it has the above-mentioned ownership and permissions. The vulnerability document also mentions a .my.cnf in the datadir, so make that an empty file as well.
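A small script can audit these permissions. The sketch below checks ownership and the group/world write bits; the path list and expected owner are assumptions to adapt to your system:

```python
import os
import stat

def cnf_is_safe(path, expected_uid=0):
    """True if the option file is owned by `expected_uid` and is not
    group- or world-writable (i.e. mode 644 or stricter)."""
    st = os.stat(path)
    writable_by_others = bool(st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
    return st.st_uid == expected_uid and not writable_by_others

# Candidate option files; extend with your datadir's my.cnf and .my.cnf.
CANDIDATES = ["/etc/my.cnf", "/etc/mysql/my.cnf"]

def audit(paths=CANDIDATES):
    """Map each existing option file to whether it passes the check."""
    return {p: cnf_is_safe(p) for p in paths if os.path.exists(p)}
```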

Review accounts with the FILE privilege:

Run this query, and drop the accounts or revoke the FILE privilege from them if they don't really need it.
SELECT GRANTEE FROM INFORMATION_SCHEMA.USER_PRIVILEGES WHERE PRIVILEGE_TYPE='FILE';

Isolate services

Don't run all services on one machine. Isolate services from each other. So put the webserver and database server on separate (virtual) machines or containers.

Use a firewall. If the database suddenly starts to listen on a weird port, the attacker should not be able to connect to it. This can be a host-based firewall like iptables, a network device, or both. Yes, an IDS might be able to detect the network shell, but running an IDS/IPS takes a serious amount of time and doesn't give any guarantees.

Prepare for the next vulnerability

This is not only for MySQL, but also for other parts of your stack (OS, webserver, etc).

Make sure the configuration is secured properly for each service. A helpful resource here are the benchmark documents from the Center for Internet Security.

by Daniël van Eeden (noreply@blogger.com) at September 15, 2016 07:04 AM

September 14, 2016

Peter Zaitsev

MySQL Default Configuration Changes between 5.6 and 5.7

In this blog post, we’ll discuss the MySQL default configuration changes between 5.6 and 5.7.

MySQL 5.7 has added a variety of new features that might excite you. However, there are also changes in the current variables that you might have overlooked. MySQL 5.7 updated nearly 40 of the defaults from 5.6. Some of the changes could severely impact your server performance, while others might go unnoticed. I’m going to go over each of the changes and what they mean.

The change that can have the largest impact on your server is likely sync_binlog. My colleague, Roel Van de Paar, wrote about this impact in depth in another blog post, so I won’t go into much detail. sync_binlog controls how MySQL flushes the binlog to disk. The new value of 1 forces MySQL to write every transaction to disk prior to committing. Previously, MySQL did not force flushing the binlog, and trusted the OS to decide when to flush it.

(https://www.percona.com/blog/2016/06/03/binary-logs-make-mysql-5-7-slower-than-5-6/)

Variables 5.6.29 5.7.11
sync_binlog 0 1

 

The performance schema variables stand out as unusual, as many have a default of -1. MySQL uses this notation to call out variables that are automatically adjusted. The only performance schema variable change that doesn’t adjust itself is performance_schema_max_file_classes. This is the number of file instruments used for the performance schema. It’s unlikely you will ever need to alter it.

Variables 5.6.29 5.7.11
performance_schema_accounts_size 100 -1
performance_schema_hosts_size 100 -1
performance_schema_max_cond_instances 3504 -1
performance_schema_max_file_classes 50 80
performance_schema_max_file_instances 7693 -1
performance_schema_max_mutex_instances 15906 -1
performance_schema_max_rwlock_instances 9102 -1
performance_schema_max_socket_instances 322 -1
performance_schema_max_statement_classes 168 -1
performance_schema_max_table_handles 4000 -1
performance_schema_max_table_instances 12500 -1
performance_schema_max_thread_instances 402 -1
performance_schema_setup_actors_size 100 -1
performance_schema_setup_objects_size 100 -1
performance_schema_users_size 100 -1

 

The optimizer_switch and sql_mode variables have a variety of options that can each be enabled to cause a slightly different action to occur. MySQL 5.7 enables additional flags in both variables, increasing their sensitivity and security. These additions make the optimizer more efficient in determining how to correctly interpret your queries.

Three flags have been added to the optimizer_switch, all of which existed in MySQL 5.6 and were enabled by default in MySQL 5.7 (with the intent to increase the optimizer’s efficiency): duplicateweedout=on, condition_fanout_filter=on, and derived_merge=on. duplicateweedout is part of the optimizer’s semi-join materialization strategy, condition_fanout_filter controls the use of condition filtering, and derived_merge controls the merging of derived tables and views into the outer query block.

https://dev.mysql.com/doc/refman/5.7/en/switchable-optimizations.html

http://www.chriscalender.com/tag/condition_fanout_filter/

The additions to SQL mode do not affect performance directly; however, they will improve the way you write queries (which can increase performance). One notable change requires that every field in a SELECT … GROUP BY statement either be aggregated using a function like SUM, or appear in the GROUP BY clause. MySQL will not assume they should be grouped, and will raise an error if a field is missing.
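For example, with ONLY_FULL_GROUP_BY enabled, a query like the following is rejected (table and column names here are illustrative):

```sql
-- Rejected: `city` is neither aggregated nor in the GROUP BY clause.
SELECT customer_id, city, SUM(amount)
FROM orders
GROUP BY customer_id;

-- Accepted: either add the column to the GROUP BY clause, or
-- aggregate it, e.g. with 5.7's ANY_VALUE().
SELECT customer_id, ANY_VALUE(city), SUM(amount)
FROM orders
GROUP BY customer_id;
```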

STRICT_TRANS_TABLES has a different effect depending on whether the table is transactional.

Statements are rolled back on transactional tables if there is an invalid or missing value in a data change statement. For tables that do not use a transactional engine, MySQL’s behavior depends on the row in which the invalid data occurs. If it is the first row, the behavior matches that of a transactional engine. If not, the invalid value is converted to the closest valid value, or the default value for the column. A warning is generated, but the data is still inserted.

optimizer_switch
  5.6.29: index_merge=on, index_merge_union=on, index_merge_sort_union=on,
          index_merge_intersection=on, engine_condition_pushdown=on,
          index_condition_pushdown=on, mrr=on, mrr_cost_based=on,
          block_nested_loop=on, batched_key_access=off, materialization=on,
          semijoin=on, loosescan=on, firstmatch=on,
          subquery_materialization_cost_based=on, use_index_extensions=on
  5.7.11: index_merge=on, index_merge_union=on, index_merge_sort_union=on,
          index_merge_intersection=on, engine_condition_pushdown=on,
          index_condition_pushdown=on, mrr=on, mrr_cost_based=on,
          block_nested_loop=on, batched_key_access=off, materialization=on,
          semijoin=on, loosescan=on, firstmatch=on, duplicateweedout=on,
          subquery_materialization_cost_based=on, use_index_extensions=on,
          condition_fanout_filter=on, derived_merge=on

sql_mode
  5.6.29: NO_ENGINE_SUBSTITUTION
  5.7.11: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE,
          NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER,
          NO_ENGINE_SUBSTITUTION

 

There have been a couple of variable changes surrounding the binlog. MySQL 5.7 updated binlog_error_action so that if there is an error while writing to the binlog, the server aborts. These kinds of incidents are rare, but have a big impact on your application and replication when they occur, as the server will not perform any further transactions until corrected.

The default binlog format was changed to ROW instead of the previously used STATEMENT format. STATEMENT format writes less data to the logs; however, there are many statements that cannot be replicated correctly with it, including "UPDATE … ORDER BY RAND()". These non-deterministic statements could produce different result sets on the master and slave. ROW format writes more data to the binlog, but the information is more accurate and ensures correct replication.

MySQL has begun to focus on replication using GTIDs instead of the traditional binlog position. When MySQL is started or restarted, it must generate a list of the previously used GTIDs. If binlog_gtid_simple_recovery is OFF (FALSE), the server starts with the newest binlog and iterates backwards through the binlog files searching for a previous_gtids_log_event. With it set to ON (TRUE), the server only reviews the newest and oldest binlog files and computes the used GTIDs. binlog_gtid_simple_recovery makes it much faster to identify the binlogs, especially if there are a large number of binary logs without GTID events. However, in specific cases it could cause gtid_executed and gtid_purged to be populated incorrectly. This should only happen when the newest binary log was generated by MySQL 5.7.5 or older, or if a SET GTID_PURGED statement was run on MySQL earlier than version 5.7.7.

Another replication-based variable updated in 5.7 is slave_net_timeout, lowered to 60 seconds. Previously the replication thread would not consider its connection to the master broken until the problem had existed for at least an hour. This change informs you much sooner if there is a connectivity problem, and ensures replication does not fall behind significantly before alerting you to an issue.

Variables 5.6.29 5.7.11
binlog_error_action IGNORE_ERROR ABORT_SERVER
binlog_format STATEMENT ROW
binlog_gtid_simple_recovery OFF ON
slave_net_timeout 3600 60
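These settings can also be stated explicitly in the configuration file, which makes the intended behavior obvious and insulates you from future default changes. A sketch that simply restates the 5.7.11 defaults from the table above:

```ini
# my.cnf -- illustrative: the 5.7.11 defaults, written out explicitly
[mysqld]
binlog_format               = ROW
binlog_error_action         = ABORT_SERVER
binlog_gtid_simple_recovery = ON
slave_net_timeout           = 60
```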

 

InnoDB buffer pool changes impact how long starting and stopping the server takes. innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup are used together to prevent you from having to "warm up" the server. As the names suggest, they cause a buffer pool dump at shutdown and a load at startup. Even though you might have a buffer pool of hundreds of gigabytes, you will not need to reserve the same amount of space on disk, as the data written is much smaller: only the information necessary to locate the actual data (the tablespace and page IDs) is written to disk.

Variables 5.6.29 5.7.11
innodb_buffer_pool_dump_at_shutdown OFF ON
innodb_buffer_pool_load_at_startup OFF ON

 

MySQL 5.7 makes some options that were introduced in InnoDB during 5.6 and earlier the defaults. InnoDB's checksum algorithm was updated from innodb to crc32, allowing you to benefit from the hardware acceleration available on recent Intel CPUs.

The Barracuda file format has been available since 5.5, but had many improvements in 5.6. It is now the default in 5.7. The Barracuda format allows you to use the compressed and dynamic row formats. My colleague Alexey has written about the utilization of the compressed format and the results he saw when optimizing a server: https://www.percona.com/blog/2008/04/23/real-life-use-case-for-barracuda-innodb-file-format/

innodb_large_prefix defaults to "on", and when combined with the Barracuda file format it allows index key prefixes of up to 3072 bytes, letting larger text fields benefit from an index. If it is set to "off", or if the row format is neither dynamic nor compressed, any index prefix larger than 767 bytes is silently truncated. MySQL also introduced larger InnoDB page sizes (32k and 64k) in 5.7.6.

MySQL 5.7 increased the innodb_log_buffer_size value as well. InnoDB uses the log buffer to hold transaction data before it is written to the redo log files on disk. The increased size allows the log to be flushed to disk less often, reducing IO, and allows larger transactions to fit in the buffer without having to write to disk before committing.

MySQL 5.7 moved InnoDB's purge operations to a background thread in order to reduce the thread contention seen in MySQL 5.5. The latest version increases the default to four purge threads, but this can be changed to anywhere from 1 to 32 threads.

MySQL 5.7 now enables innodb_strict_mode by default, turning some warnings into errors. Syntax errors in CREATE TABLE, ALTER TABLE, CREATE INDEX, and OPTIMIZE TABLE statements generate errors and force the user to correct them before running. It also enables a record size check, ensuring that INSERT or UPDATE statements will not fail because the record is too large for the selected page size.

Variables 5.6.29 5.7.11
innodb_checksum_algorithm innodb crc32
innodb_file_format Antelope Barracuda
innodb_file_format_max Antelope Barracuda
innodb_large_prefix OFF ON
innodb_log_buffer_size 8388608 16777216
innodb_purge_threads 1 4
innodb_strict_mode OFF ON

 

MySQL has increased the number of times the optimizer dives into the index when evaluating equality ranges. If the optimizer needs to dive into the index more than eq_range_index_dive_limit times (defaulted to 200 in MySQL 5.7), it falls back to the existing index statistics. You can adjust this limit from 0 (eliminating index dives) to 4294967295. This can have a significant impact on query performance, since table statistics are based on the cardinality of a random sample. Relying on them can cause the optimizer to estimate a much larger set of rows to review than it would with index dives, changing the method the optimizer chooses to execute the query.

MySQL 5.7 deprecated log_warnings. The new preference is to use log_error_verbosity. By default this is set to 3, which logs errors, warnings, and notes to the error log. You can lower it to 1 (log errors only) or 2 (log errors and warnings). When consulting the error log, verbosity is often a good thing; however, it increases the IO and disk space needed for the error log.

Variables 5.6.29 5.7.11
eq_range_index_dive_limit 10 200
log_warnings 1 2

 

There are many changes to the defaults in 5.7, but most of these options have existed for a long time and should be familiar to users. Many people were already running with these settings, and promoting them to defaults is the best way to push MySQL forward. Remember, however, that you can still edit these variables and configure them so that your server works its best for your data.

by bradley mickel at September 14, 2016 10:26 PM

pmp-check-pt-table-checksum Percona Monitoring Plugin

Recently, I worked on a customer case where the customer needed to monitor checksums via Nagios. The pmp-check-pt-table-checksum plugin from Percona Monitoring Plugins for MySQL achieves this goal, and I thought it was worth a blog post.

pmp-check-pt-table-checksum alerts you when the pt-table-checksum tool from Percona Toolkit finds data drifts on a replication slave. It reports data differences on the slave using the information recorded by the last pt-table-checksum run. By default, the plugin queries the percona.checksums table; you can override this with the "-T" option. Check the pmp-check-pt-table-checksum documentation for details.

Let’s demonstrate checksum monitoring via Nagios. My setup contains a master with two slaves connected, as follows:

  • Host 10.0.3.131 is the master.
  • Host 10.0.3.83 is slave1.
  • Host 10.0.3.36 is slave2.

I intentionally generated more data on the master so pt-table-checksum can catch the differences on the slaves. Here’s what it looks like:

mysql-master> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+------+
10 rows in set (0.00 sec)
mysql-slave1> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)
mysql-slave2> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)

As you can see, slave1 and slave2 differ from the master: the master has ten rows in table t1, while the slaves have five rows each.

Then, I executed pt-table-checksum from the master to check for data discrepancies:

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
 TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-25T04:57:10 0 1 10 1 0 0.018 test.t1
[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
Differences on slave1
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1
Differences on slave2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1

pt-table-checksum correctly identifies the differences for the test.t1 table on slave1 and slave2. Now you can use the pmp-check-pt-table-checksum Percona checksum monitoring plugin. Let’s try to run it locally (via the CLI) from the Nagios host:

[root@nagios]# pmp-check-pt-table-checksum -H slave1 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1
[root@nagios]# pmp-check-pt-table-checksum -H slave2 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1

NOTE: The checksum_user database user needs SELECT privileges on the checksum table (percona.checksums) on the slaves in order for the plugin to alert on checksum differences.
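A grant along these lines should be enough for the plugin. This is an illustrative sketch using the example user and the default percona.checksums table; run it on the master so it replicates to the slaves, and adjust the host pattern and password to your environment:

```sql
-- Illustrative only: adjust user, host and password as needed
GRANT SELECT ON percona.checksums
  TO 'checksum_user'@'%' IDENTIFIED BY 'checksum_password';
FLUSH PRIVILEGES;
```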

On the Nagios monitoring server, you need to add the pmp-check-pt-table-checksum command to the commands.cfg file:

define command{
command_name            pmp-check-pt-table-checksum
command_line            $USER1$/pmp-check-pt-table-checksum -H $HOSTADDRESS$ -c $ARG1$
        }

NOTE: I used the “-c” option for pmp-check-pt-table-checksum, which raises a critical error instead of a warning.

And in the existing host configuration files (i.e., slave1.cfg and slave2.cfg), you need to add a monitoring command accordingly, as below:

define service{
        use                             generic-service
        host_name                       slave1
        service_description             Checksum Status
        check_command                   pmp-check-pt-table-checksum!1
}

In this command, “1” is the argument to “-c $ARG1$”, so pmp-check-pt-table-checksum will raise a critical error when one or more chunks on the slave differ from the master.

Last but not least, restart the Nagios daemon on the monitoring host to apply the change.

Below is how it looks in the Nagios web interface:

pmp-check-pt-table-checksum

I also think the “INTERVAL” option is useful:

-i INTERVAL     Interval over which to ensure pt-table-checksum was run, in days; default - not to check.

It makes sure the chunks in the checksum table are recent; used the other way around, it tells you how old your chunks are. This option verifies that the checksum cron job runs within a defined number of days. Let’s say you have a pt-table-checksum cron job running once per week. In that case, setting INTERVAL to 14 or 21 alerts you if the chunks are older than that number of days.

Conclusion:

Percona Monitoring Plugins for MySQL are very useful and easy to embed in your centralized monitoring dashboard. You can schedule pt-table-checksum via a cron job, and get reports on master/slave data drifts (if any) from one global dashboard on the monitoring host. There are various other plugins available from Percona, e.g., the processlist plugin and the replication delay plugin. Along with these, Percona offers Cacti and Zabbix templates to graph various MySQL activities.

by Muhammad Irfan at September 14, 2016 08:52 PM

Daniël van Eeden

Visualizing the MySQL Bug Tide

On the MySQL Bugs website there are some tide stats available. These show the rate of bug creation.

I've put them in a graph:
I made these with this IPython Notebook. There are more detailed graphs per version in the notebook.

Update: The version in the notebook now uses the same range for the Y axis and has a marker for the GA dates of each release.

by Daniël van Eeden (noreply@blogger.com) at September 14, 2016 07:08 PM

Jean-Jerome Schmidt

New ClusterControl subscriptions for managing MySQL, MongoDB and PostgreSQL

We’ve got your databases covered: check out our new pricing plans for ClusterControl, the single console to deploy, monitor and manage your entire database infrastructure.

Whether you’re looking to manage standalone instances, need high availability or have 24/7 SLA requirements for your databases, ClusterControl now comes with three enhanced options for you to choose from, in addition to its Community Edition.

Standalone

Do you have standalone database servers to manage? Then this is the best plan for you. From real-time monitoring and performance advisors, to analyzing historical query data and making sure all your servers are backed up, ClusterControl Standalone has you covered.

Advanced

As our company name indicates, we’re all about achieving high availability. With ClusterControl Advanced, you can take the guesswork out of managing your high availability database setups - automate failover and recovery of your databases, add load balancers with read-write splits, add nodes or read replicas - all with a couple of clicks.

Enterprise

If you’re looking for all of the above in a 24/7 secure service environment, then look no further. From high-spec operational reports to role-based access control and SSL encryption, this is our most advanced plan aimed at mission-critical environments.

Here is a summary view of the new subscriptions:

Full features table & pricing plans Contact us

Note that ClusterControl can be downloaded for free and that each download includes an initial 30 day trial of ClusterControl Enterprise, so that you can test the full features set of our product. It then becomes ClusterControl Community, should you decide not to purchase a plan. With ClusterControl Community, you can deploy and monitor MySQL, MongoDB and PostgreSQL.

Happy Clustering!

by Severalnines at September 14, 2016 03:42 PM

Peter Zaitsev

Webinar Thursday Sept. 15: Identifying and Solving Database Performance Issues with PMM

Please join Roman Vynar, Lead Platform Engineer, on Thursday, September 15, 2016 at 10 a.m. PDT (UTC-7) for a webinar on Identifying and Solving Database Performance Issues with PMM.

Database performance is the key to high-performance applications. Gaining visibility into the database is the key to improving database performance. Percona’s Monitoring and Management (PMM) provides the insight you need into your database environment.

In this webinar, we will demonstrate how using PMM for query analytics, in combination with database and host performance metrics, can more efficiently drive tuning, issue management and application development. Using PMM can result in faster resolution times, more focused development and a more efficient IT team.

Register for the webinar here.

register-now

Roman Vynar, Lead Platform Engineer
Roman is a Lead Platform Engineer at Percona. He joined the company to establish and develop the Remote DBA service from scratch. Over time, the growing service successfully expanded to Managed Services. Roman develops monitoring tools, automated scripts, a backup solution, and a notification and incident tracking web system, and currently leads the Percona Monitoring and Management project.

by Dave Avery at September 14, 2016 01:51 PM

Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database

This blog post discusses how you can protect your e-commerce database from a high-traffic disaster.

Databases power today’s e-commerce. Whether it’s listing items on your site, contacting your distributor for inventory, tracking shipments, payments, or customer data, your database must be up, running, tuned and available for your business to be successful.

There is no time when this is more important than on high-volume traffic days. Specific events occur throughout the year (such as Black Friday, Cyber Monday, or Singles’ Day) that you know are going to put extra strain on your database environment. But these are precisely the times your database can’t go down – these are the days that can make or break your year!

So what can you do to guarantee that your database environment is up to the challenge of handling high traffic events? Are there ways of preparing for this type of traffic?

Yes, there are! In this blog post, we’ll look at some of the factors that can help prepare your database environment to handle large amounts of traffic.

Synchronous versus Asynchronous Applications

Before moving to strategies, we need to discuss the difference between synchronous and asynchronous applications.

In most web-based applications, user input triggers a number of requests for resources. Once the server answers those requests, communication stops until the next user input. This type of communication between a client and server is called synchronous communication.

Synchronous communication restricts how often an application can update its data. Even synchronous applications designed to automatically refresh application server information at regular intervals have consistent periods of delay between data refreshes. While such delays usually aren’t an issue, some applications (for example, stock-trading applications) rely on continuously updated information to provide their users optimum functionality and usability.

Web 2.0-based applications address this issue by using asynchronous communication. Asynchronous applications deliver continuously updated data to users. Asynchronous applications separate client requests from application updates, so multiple asynchronous communications between the client and server can occur simultaneously or in parallel.

The strategy you use to scale the two types of applications to meet growing user and traffic demands will differ.

Scaling a Synchronous/Latency-sensitive Application

When it comes to synchronous applications, you really have only one option for scaling performance: sharding. With sharding, the tables are divided and distributed across multiple servers, which reduces the total number of rows in each table. This consequently reduces index size, and generally improves search performance.

A shard can also be located on its own hardware, with different shards added to different machines. Distributing the database over a large number of machines spreads out the load, further improving performance. Sharding allows you to scale read and write performance even when latency is important.
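The row distribution at the heart of sharding can be sketched with a simple hash-based router. Everything here is hypothetical, for illustration only: the shard names and the hashing scheme are not from the article, and a real system would map shard names to connection parameters.

```python
# Hypothetical sketch: routing rows to shards by hashing the user ID.
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(user_id: int) -> str:
    # Hash rather than modulo on the raw ID so sequential IDs spread evenly.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(42))
```

The routing function must stay stable: the same key always maps to the same shard, otherwise lookups would miss rows written earlier.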

Generally speaking, it is better to avoid synchronous applications when possible – they limit your scalability options.

Scaling an Asynchronous Application

When it comes to scaling asynchronous applications, we have many more options than with synchronous applications. You should try and use asynchronous applications whenever possible:

  • Secondary/Slave hosts. Replication can be used to add more hardware for read traffic. Replication usually employs a master/slave relationship between a designated “original” server and copies of the server. The master logs and then distributes the updates to the slaves. This setup allows you to distribute the read load across more than one machine.
  • Caching. Database caching (tables, data, and models – caching summaries of data) improves scalability by distributing the query workload from expensive (overhead-wise) backend processes to multiple cheaper ones. It allows more flexibility for data processing: for example premium user data can be cached, while regular user data isn’t.

    Caching also improves data availability, since applications can continue serving data even when backend services are unavailable. It also improves data access speeds by localizing the data and avoiding round-trip queries. There are some specific caching strategies you can use:

    • Pre-Emptive Caching. Ordinarily, an object gets cached the first time it is requested (or if cached data isn’t timely enough). Preemptive caching instead generates cached versions before an application requests them. Typically this is done by a cron process.
    • Hit/Miss Caching. A cache hit occurs when requested data is found in the cache: the CPU first looks for the data in its closest memory location, usually the primary cache, and if the data is there, it is a cache hit. If the data is not found there, it is a cache miss. A cache hit serves data more quickly, as the data can be read straight from cache memory; the same applies to disk caches, where requested data is stored after the first query. A cache miss slows down the overall process, because the CPU must look in a higher-level store, such as main memory (RAM), and then copy the data into a new cache entry before the processor can access it.
    • Client-side Caching. Client-side caching allows server data to be copied and cached on the client computer, which reduces load times significantly.
  • Queuing Updates. Queues are used to order queries (and other database functions) in a timely fashion. There are queues for asynchronously sending notifications like email and SMS in most websites. E-commerce sites have queues for storing, processing and dispatching orders. How your database handles queues can affect your performance:
    • Batching. Batch processing can be used for efficient bulk database updates and automated transaction processing, as opposed to interactive online transaction processing (OLTP) applications.
    • Fan-Out Updates. Fan-out duplicates data in the database. When data is duplicated it eliminates slow joins and increases read performance.
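The hit/miss and client-side caching strategies above can be sketched as a tiny TTL cache. This is purely illustrative; a real deployment would use memcached or Redis rather than an in-process dictionary:

```python
# A minimal client-side cache sketch with hit/miss counting and TTL expiry.
import time

class TTLCache:
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}          # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            self.hits += 1       # cache hit: served from memory
            return entry[0]
        self.misses += 1         # cache miss: fall back to the backend
        value = loader(key)
        self.store[key] = (value, time.time() + self.ttl)
        return value

cache = TTLCache(ttl_seconds=5)
cache.get("user:1", lambda k: "expensive db row")   # miss, loads from backend
cache.get("user:1", lambda k: "expensive db row")   # hit, served from cache
print(cache.hits, cache.misses)  # -> 1 1
```

The hit/miss counters are the numbers you would watch in production: a low hit rate means the cache is not actually offloading the database.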

Efficient Usage of Data at Scale

As you scale up in terms of database workload, you need to be able to avoid bad queries or patterns from your applications.

  • Moving expensive queries out of the user request path. Even if your database server uses powerful hardware, its performance can be negatively affected by a handful of expensive queries. Even a single bad query can cause serious performance issues for your database. Make sure to use monitoring tools to track down the queries that are taking up the most resources.
  • Using caching to offload database traffic. Cache data away from the database using something like memcached. This is usually done at the application layer, and is highly effective.
  • Counters and In-Memory Stores. Use memory counters to monitor performance hits: pages/sec, faults/sec, available bytes, total server, target server memory, etc. Percona’s new in-memory storage engine for MongoDB also can help.
  • Connection Pooling. A connection pool is a set of cached database connections that are kept open so they can be reused for future requests to the database. Connection pools can improve the performance of executing commands on a database.
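The connection-pooling idea can be sketched with a thread-safe queue. Everything here is illustrative; DummyConnection is a hypothetical stand-in for a real driver connection, and production code would use the pool that ships with the database driver:

```python
# Sketch of a connection pool built on a thread-safe queue.
import queue

class ConnectionPool:
    def __init__(self, factory, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())  # pre-open all connections up front

    def acquire(self):
        return self._pool.get()        # blocks if every connection is in use

    def release(self, conn):
        self._pool.put(conn)           # return the connection for reuse

class DummyConnection:
    def execute(self, sql):
        return "ok"

pool = ConnectionPool(DummyConnection, size=2)
conn = pool.acquire()
print(conn.execute("SELECT 1"))  # -> ok
pool.release(conn)
```

Because acquire() blocks when the pool is exhausted, the pool also acts as a natural throttle on how many concurrent queries the application can send.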

Scaling Out (Horizontal) Tricks

Scaling horizontally means adding more nodes to a system, such as adding a new server to a distributed database environment or application. For example, scaling out from one web server to three.

  • Pre-Sharding Data for Flexibility. Pre-sharding the database across the server instances allows you to have the entire environment resources available at the start of the event, rather than having to rebalance during peak event traffic.
  • Using “Kill Switches” to Control Traffic. The idea of a kill switch is a single point where you can stop the flow of data to a particular node. Strategically set up kill switches allow you to stop a destructive workload that begins to impact the entire environment.
  • Limiting Graph Structures. By limiting the size or complexity of graph structures in the database, you will simplify data lookups and data size.

Scaling with Hardware (Vertical Scaling)

Another option to handle the increased traffic load is adding more hardware to your environment: more servers, more CPUs, more memory, etc. This, of course, can be expensive. One option here is to pre-configure your testing environment to become part of the production environment if necessary. Another is to pre-configure more Database-as-a-Service (DaaS) instances for the event (if you are a using cloud-based services).

Whichever method you choose, be sure to verify and test your extra servers and environment before your drop-dead date.

Testing Performance and Capacity

As always, in any situation where your environment is going to be stressed beyond usual limits, testing under real-world conditions is a key factor. This includes not only testing for raw traffic levels, but also the actual workloads that your database will experience, with the same volume and variety of requests.

Knowing Your Application and Questions to Ask at Development Time

Finally, it’s important that you understand what applications will be used and querying the database. This sort of common sense idea is often overlooked, especially when teams (such as the development team and the database/operations team) get siloed and don’t communicate.

Get to know who is developing the applications that are using the database, and how they are doing it. As an example, a while back I had the opportunity to speak with a team of developers, mostly to just understand what they were doing. In the process of whiteboarding the app with them, we discovered a simple query issue that – now that we were aware of it – took little effort to fix. These sorts of interactions, early in the process, can save a great deal of headache down the line.

Conclusion

There are many strategies that can help you prepare for high traffic events that will impact your database. I’ve covered a few here briefly. For an even more thorough look at e-commerce database strategies, attend my webinar “Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster” on Thursday, September 22, 2016 10:00 am Pacific Time.

Register here.

by Tim Vaillancourt at September 14, 2016 12:06 PM

MariaDB AB

Is Your MariaDB Version Affected by the Remote Root Code Execution Vulnerability CVE-2016-6662?

Rasmus Johansson

Over the last few days, there has been a lot of questions and discussion around a vulnerability referred to as MySQL Remote Root Code Execution / Privilege Escalation 0day with CVE code CVE-2016-6662. It’s a serious vulnerability and we encourage every MariaDB Server, MariaDB Enterprise and MariaDB Enterprise Cluster user to read the below update on the vulnerability and how it affects MariaDB products.

The vulnerability can be exploited by both local and remote users. Either an authenticated connection to, or a SQL injection in, an affected version of MariaDB Server can be used to exploit it. If successful, a library file could be loaded and executed with root privileges.

The corresponding bug about the vulnerability can be seen in MariaDB’s project tracking with bug number MDEV-10465, which was opened on July 31, 2016.

MariaDB Enterprise and Enterprise Cluster
The following versions of MariaDB Enterprise and Enterprise Cluster include the fix for the vulnerability:

  • 5.5.51 or later versions
  • 10.0.27 or later versions
  • 10.1.17 or later versions

MariaDB Server
All stable MariaDB versions (5.5, 10.0, 10.1) were fixed in August 2016 in the following versions:

  • 5.5.51, released on August 10, 2016
  • 10.0.27, released on August 25, 2016
  • 10.1.17, released on August 30, 2016

If you’re on any of the above versions (or later), rest assured, you’re protected against this vulnerability. If you happen to be testing an alpha version of MariaDB 10.2, please be aware that the fix will be available in version 10.2.2, which is expected to be released soon.

More details on the vulnerability

The vulnerability makes use of the mysqld_safe startup script.

However, if the database user being used has neither the SUPER nor the FILE privilege, or if the user has FILE but --secure-file-priv is set to isolate the location of import and export operations, then the vulnerability is NOT exploitable. It is always recommended not to grant SUPER privileges and to avoid granting FILE privileges without using --secure-file-priv.

Users that have installed MariaDB Server 10.1.8 or later from RPM or DEB packages are NOT affected by the vulnerability. This is due to the fact that in version 10.1.8, MariaDB started using systemd instead of init to manage the MariaDB service. In this case the mysqld_safe startup script isn’t used.

For the complete report of the vulnerability, please refer to the advisory by Dawid Golunski (legalhackers.com) who discovered the vulnerability.

About the Author

Rasmus Johansson's picture

Rasmus has worked with MariaDB since 2010 and was appointed VP Engineering in 2013. As such, he takes overall responsibility for the architecting and development of MariaDB Server, MariaDB Galera Cluster and MariaDB Enterprise.

by Rasmus Johansson at September 14, 2016 12:00 AM

September 13, 2016

Peter Zaitsev

MySQL CDC, Streaming Binary Logs and Asynchronous Triggers

In this post, we’ll look at MySQL CDC, streaming binary logs and asynchronous triggers.

What is Change Data Capture and why do we need it?

Change Data Capture (CDC) tracks data changes (usually in close to real time). In MySQL, the easiest and probably most efficient way to track data changes is to use binary logs. However, other approaches exist. For example:

  • General log or Audit Log Plugin (which logs all queries, not just the changes)
  • MySQL triggers (not recommended, as it can slow down the application — more below)

One of the first implementations of CDC for MySQL was the FlexCDC project by Justin Swanhart. Nowadays, there are a lot of CDC implementations (see mysql-cdc-projects wiki for a long list).

CDC can be implemented for various tasks, such as auditing, copying data to another system, or processing (and reacting to) events. In this blog post, I will demonstrate how to use a CDC approach to stream MySQL binary logs, process events and save them (stream them) to another MySQL instance (or MongoDB). In addition, I will show how to implement asynchronous triggers by streaming binary logs.

Streaming binary logs 

You can read binary logs using the mysqlbinlog utility by adding “-vvv” (verbose option). mysqlbinlog can also show a human-readable version of ROW-based replication events. For example:

# mysqlbinlog -vvv /var/lib/mysql/master.000001
BINLOG '
JxiqVxMBAAAALAAAAI7LegAAAHQAAAAAAAEABHRlc3QAAWEAAQMAAUTAFAY=
JxiqVx4BAAAAKAAAALbLegAAAHQAAAAAAAEAAgAB//5kAAAAedRLHg==
'/*!*/;
### INSERT INTO `test`.`a`
### SET
###   @1=100 /* INT meta=0 nullable=1 is_null=0 */
# at 8047542
#160809 17:51:35 server id 1  end_log_pos 8047573 CRC32 0x56b36ca5      Xid = 24453
COMMIT/*!*/;

Starting with MySQL 5.6, mysqlbinlog can also read the binary log events from a remote master (“fake” replication slave).

Reading binary logs is a great basis for CDC. However, there are still some challenges:

  1. ROW-based replication is probably the easiest way to get the RAW changes; otherwise we would have to parse SQL. At the same time, ROW-based binary logs don’t contain the table metadata, i.e., they do not record the field names, only the field number (in the example above, “@1” is the first field in table “a”).
  2. We will need to somehow record and store the binary log positions so that the tool can be restarted at any time and proceed from the last position (like a MySQL replication slave).

Maxwell’s daemon (Maxwell = Mysql + Kafka), an application recently released by Zendesk, reads MySQL binlogs and writes row updates as JSON (it can write to Kafka, which is its primary goal, but can also write to stdout and can be extended for other purposes). Maxwell stores the metadata about MySQL tables and binary log events (and other metadata) inside MySQL, so it solves the potential issues from the above list.

Here is a quick demo of Maxwell:

Session 1 (Insert into MySQL):

mysql> insert into a (i) values (151);
Query OK, 1 row affected (0.00 sec)
mysql> update a set i = 300 limit 5;
Query OK, 5 rows affected (0.01 sec)
Rows matched: 5  Changed: 5  Warnings: 0

Session 2 (starting Maxwell):

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout
16:00:15,303 INFO  Maxwell - Maxwell is booting (StdoutProducer), starting at BinlogPosition[master.000001:15494460]
16:00:15,327 INFO  TransportImpl - connecting to host: 127.0.0.1, port: 3306
16:00:15,350 INFO  TransportImpl - connected to host: 127.0.0.1, port: 3306, context: AbstractTransport.Context[threadId=9,...
16:00:15,350 INFO  AuthenticatorImpl - start to login, user: maxwell, host: 127.0.0.1, port: 3306
16:00:15,354 INFO  AuthenticatorImpl - login successfully, user: maxwell, detail: OKPacket[packetMarker=0,affectedRows=0,insertId=0,serverStatus=2,warningCount=0,message=<null>]
16:00:15,533 INFO  MysqlSavedSchema - Restoring schema id 1 (last modified at BinlogPosition[master.000001:3921])
{"database":"test","table":"a","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"i":151}}
{"database":"test","table":"a","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"i":151}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"commit":true,"data":{"i":300},"old":{"i":150}}

As we can see in this example, Maxwell gets the events from the MySQL replication stream and outputs them to stdout (if we change the producer, it can save them to Apache Kafka instead).

Saving binlog events to MySQL document store or MongoDB

If we want to save the events somewhere else, we can use MongoDB or MySQL JSON fields and the document store (as Maxwell provides us with JSON documents). For a simple proof of concept, I've created a Node.js script to implement a CDC "pipeline":

var mysqlx = require('mysqlx');

// Open an X Protocol session to the target MySQL instance
var mySession = mysqlx.getSession({
    host: '10.0.0.2',
    port: 33060,
    dbUser: 'root',
    dbPassword: 'xxx'
});

process.on('SIGINT', function() {
    console.log("Caught interrupt signal. Exiting...");
    process.exit();
});

process.stdin.setEncoding('utf8');

// Read Maxwell's JSON events from stdin and store each one
// as a document in the mysqlcdc.mysqlcdc collection
process.stdin.on('readable', () => {
    var chunk = process.stdin.read();
    if (chunk != null) {
        process.stdout.write(`data: ${chunk}`);
        mySession.then(session => {
            session.getSchema("mysqlcdc").getCollection("mysqlcdc")
                .add(JSON.parse(chunk)).execute(function (row) {
                    // can log something here
                }).catch(err => {
                    console.log(err);
                })
                .then(function (notices) {
                    console.log("Wrote to MySQL: " + JSON.stringify(notices));
                });
        }).catch(function (err) {
            console.log(err);
            process.exit();
        });
    }
});

process.stdin.on('end', () => {
    process.stdout.write('end');
    process.stdin.resume();
});

And to run it we can use the pipeline:

./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR  | node ./maxwell_to_mysql.js

The same approach can be used to save the CDC events to MongoDB with mongoimport:

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR |mongoimport -d mysqlcdc -c mysqlcdc --host localhost:27017

Reacting to binary log events: asynchronous triggers

In the above example, we only recorded the binary log events. Now we can add “reactions”.

One of the practical applications is re-implementing MySQL triggers to something more performant. MySQL triggers are executed for each row, and are synchronous (the query will not return until the trigger event finishes). This was known to cause poor performance, and can significantly slow down bulk operations (i.e., “load data infile” or “insert into … values (…), (…)”). With triggers, MySQL will have to process the “bulk” operations row by row, killing the performance. In addition, when using statement-based replication, triggers on the slave can slow down the replication thread (it is much less relevant nowadays with ROW-based replication and potentially multithreaded slaves).

With the ability to read binary logs from MySQL (using Maxwell), we can process the events and re-implement triggers — now in asynchronous mode — without delaying MySQL operations. As Maxwell gives us a JSON document with the “new” and “old” values (with the default option binlog_row_image=FULL, MySQL records the previous values for updates and deletes) we can use it to create triggers.

Not all triggers can be easily re-implemented based on the binary logs. However, in my experience most of the triggers in MySQL are used for:

  • auditing (if you deleted a row, what was the previous value and/or who did and when)
  • enriching the existing table (i.e., update the field in the same table)

Here is a quick algorithm for how to re-implement the triggers with Maxwell:

  • Find the trigger table and trigger event text (SQL)
  • Create an app or a script to parse JSON for the trigger table
  • Create a new version of the SQL changing the NEW.<field> to “data.field” (from JSON) and OLD.<field> to “old.field” (from JSON)
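
As a sketch of the last step, here is a (hypothetical) helper that substitutes NEW.&lt;field&gt; and OLD.&lt;field&gt; references in a trigger body with literal values taken from a Maxwell JSON event:

```python
import re


def rewrite_trigger_sql(trigger_sql, event):
    """Replace NEW.<field> / OLD.<field> references in a trigger body
    with literal values from a Maxwell JSON event: NEW.* maps to
    event["data"], OLD.* maps to event["old"] (present for updates and
    deletes when binlog_row_image=FULL)."""

    def quote(value):
        if value is None:
            return "NULL"
        if isinstance(value, (int, float)):
            return str(value)
        # naive quoting -- use a proper SQL driver/escaper in real code
        return "'" + str(value).replace("'", "''") + "'"

    def repl(match):
        source = event["data"] if match.group(1).upper() == "NEW" else event["old"]
        return quote(source[match.group(2)])

    return re.sub(r"\b(NEW|OLD)\.(\w+)", repl, trigger_sql, flags=re.IGNORECASE)
```

For the update events shown earlier, rewrite_trigger_sql("INSERT INTO audit VALUES (OLD.i, NEW.i)", {"data": {"i": 300}, "old": {"i": 150}}) produces INSERT INTO audit VALUES (150, 300).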

For example, if I want to audit all deletes in the “transactions” table, I can do it with Maxwell and a simple Python script (do not use this in production, it is a very basic sample):

import json, sys

# Read Maxwell's JSON events from stdin, echo each one, and for every
# "delete" event emit an INSERT into the audit table (Python 2 syntax)
line = sys.stdin.readline()
while line:
    print line,
    obj = json.loads(line)
    if obj["type"] == "delete":
        print "INSERT INTO transactions_delete_log VALUES ('" + str(obj["data"]) + "', Now() )"
    line = sys.stdin.readline()

MySQL:

mysql> delete from transactions where user_id = 2;
Query OK, 1 row affected (0.00 sec)

Maxwell pipeline:

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR  | python trigger.py
{"database":"test","table":"transactions","type":"delete","ts":1472942384,"xid":214395,"commit":true,"data":{"id":2,"user_id":2,"value":2,"last_updated":"2016-09-03 22:39:31"}}
INSERT INTO transactions_delete_log VALUES ('{u'last_updated': u'2016-09-03 22:39:31', u'user_id': 2, u'id': 2, u'value': 2}', Now() )

Maxwell limitations

Maxwell was designed for MySQL 5.6 with ROW-based replication. Although it can work with MySQL 5.7, it does not support the new MySQL 5.7 data types (i.e., JSON fields). Maxwell does not support GTID, and can't fail over based on GTID (it can parse events with GTIDs, though).

Conclusion

Streaming MySQL binary logs (for example, with the Maxwell application) can help implement CDC for auditing and other purposes, and can also be used to implement asynchronous triggers (removing MySQL-level triggers can increase MySQL performance).

by Alexander Rubin at September 13, 2016 10:21 PM

ProxySQL and MHA Integration

This blog post discusses ProxySQL and MHA integration, and how they work together.

MHA (Master High Availability Manager and tools for MySQL) is almost fully integrated with the ProxySQL process. This means you can count on the MHA standard feature to manage failover, and ProxySQL to manage the traffic and shift from one server to another.

This is one of the main differences between MHA and VIP, and MHA and ProxySQL: with MHA/ProxySQL, there is no need to move IPs or re-define DNS.

The following is an example of an MHA configuration file for use with ProxySQL:

[server default]
    user=mha
    password=mha
    ssh_user=root
    repl_password=replica
    manager_log=/tmp/mha.log
    manager_workdir=/tmp
    remote_workdir=/tmp
    master_binlog_dir=/opt/mysql_instances/mha1/logs
    client_bindir=/opt/mysql_templates/mysql-57/bin
    client_libdir=/opt/mysql_templates/mysql-57/lib
    master_ip_failover_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_failover
    master_ip_online_change_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_online_change
    log_level=debug
    [server1]
    hostname=mha1r
    ip=192.168.1.104
    candidate_master=1
    [server2]
    hostname=mha2r
    ip=192.168.1.107
    candidate_master=1
    [server3]
    hostname=mha3r
    ip=192.168.1.111
    candidate_master=1
    [server4]
    hostname=mha4r
    ip=192.168.1.109
    no_master=1

NOTE: Be sure to comment out the "FIX ME" lines in the sample scripts.

After that, just install MHA as you normally would.

In ProxySQL, be sure to have all MHA users and the servers set.

When using ProxySQL with standard replication, it's important to set additional privileges for the ProxySQL monitor user: it must also have "REPLICATION CLIENT", or it will fail to check the slave lag. The servers MUST have a defined value for the max_replication_lag attribute, or the check will be ignored.

As a reminder:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',600,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.111',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.109',601,3306,1000,10);
INSERT INTO mysql_replication_hostgroups VALUES (600,601);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('mha_W',600,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('mha_R',601,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES ('mha_RW',600,1,3,'^SELECT.*FOR UPDATE');
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES ('mha_RW',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_W','test',1,600,'test_mha');
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_R','test',1,601,'test_mha');
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_RW','test',1,600,'test_mha');
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;

OK, now that all is ready,  let’s rock’n’roll!

Controlled fail-over

First of all, the masterha_manager should not be running or you will get an error.

Now let’s start some traffic:

Write
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all  run
Read only
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10  --mysql-ignore-errors=all run

Let it run for a bit, then check:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.104 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 551256  | 44307633        | 0               | 285        | <--- current Master
| 601       | 192.168.1.111 | 3306     | ONLINE | 5        | 3        | 11     | 0       | 1053685 | 52798199        | 4245883580      | 1133       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 3        | 5        | 10     | 0       | 1006880 | 50473746        | 4052079567      | 369        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1040524 | 52102581        | 4178965796      | 604        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 7        | 1        | 16     | 0       | 987548  | 49458526        | 3954722258      | 285        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Now perform the failover. To do this, instruct MHA to do a switch, and to set the OLD master as a new slave:

masterha_master_switch --master_state=alive --conf=/etc/mha.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=0

Check what happened:

[ 160s] threads: 10, tps: 354.50, reads: 3191.10, writes: 1418.50, response time: 48.96ms (95%), errors: 0.00, reconnects:  0.00
[ 170s] threads: 10, tps: 322.50, reads: 2901.98, writes: 1289.89, response time: 55.45ms (95%), errors: 0.00, reconnects:  0.00
[ 180s] threads: 10, tps: 304.60, reads: 2743.12, writes: 1219.91, response time: 58.09ms (95%), errors: 0.10, reconnects:  0.00 <--- moment of the switch
[ 190s] threads: 10, tps: 330.40, reads: 2973.40, writes: 1321.00, response time: 50.52ms (95%), errors: 0.00, reconnects:  0.00
[ 200s] threads: 10, tps: 304.20, reads: 2745.60, writes: 1217.60, response time: 58.40ms (95%), errors: 0.00, reconnects:  1.00
[ 210s] threads: 10, tps: 353.80, reads: 3183.80, writes: 1414.40, response time: 48.15ms (95%), errors: 0.00, reconnects:  0.00

Check ProxySQL:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.107 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 123457  | 9922280         | 0               | 658        | <--- new master
| 601       | 192.168.1.111 | 3306     | ONLINE | 2        | 6        | 14     | 0       | 1848302 | 91513537        | 7590137770      | 1044       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 5        | 3        | 12     | 0       | 1688789 | 83717258        | 6927354689      | 220        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1834415 | 90789405        | 7524861792      | 658        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 6        | 2        | 24     | 0       | 1667252 | 82509124        | 6789724589      | 265        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

In this case, the servers weren’t behind the master and switch happened quite fast.

We can see that the WRITE operations that normally are an issue, given the need to move around a VIP or change name resolution, had a limited hiccup.

Read operations were not affected, at all. Nice, eh?

Do you know how long it takes to do a switch under these conditions? real 0m2.710s. Yes, just 2.7 seconds.

This is more evidence that, most of the time, the duration of an MHA-based switch is dominated by the need to redirect traffic from A to B over the network.

Crash fail-over

What happens if, instead of a clean switch, we have to cover a real failover?

First of all, let’s start masterha_manager:

nohup masterha_manager --conf=/etc/mha.cnf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /tmp/mha.log 2>&1

Then let's start a load again. Finally, go to the MySQL node that hosts the current master, xxx.xxx.xxx.107:

ps aux|grep mysql
mysql    18755  0.0  0.0 113248  1608 pts/0    S    Aug28   0:00 /bin/sh /opt/mysql_templates/mysql-57/bin/mysqld_safe --defaults-file=/opt/mysql_instances/mha1/my.cnf
mysql    21975  3.2 30.4 4398248 941748 pts/0  Sl   Aug28  93:21 /opt/mysql_templates/mysql-57/bin/mysqld --defaults-file=/opt/mysql_instances/mha1/my.cnf --basedir=/opt/mysql_templates/mysql-57/ --datadir=/opt/mysql_instances/mha1/data --plugin-dir=/opt/mysql_templates/mysql-57//lib/plugin --log-error=/opt/mysql_instances/mha1/mysql-3306.err --open-files-limit=65536 --pid-file=/opt/mysql_instances/mha1/mysql.pid --socket=/opt/mysql_instances/mha1/mysql.sock --port=3306
And kill the MySQL process:

kill -9 21975 18755

As before, check what happened on the application side:

[  80s] threads: 4, tps: 213.20, reads: 1919.10, writes: 853.20, response time: 28.74ms (95%), errors: 0.00, reconnects:  0.00
[  90s] threads: 4, tps: 211.30, reads: 1901.80, writes: 844.70, response time: 28.63ms (95%), errors: 0.00, reconnects:  0.00
[ 100s] threads: 4, tps: 211.90, reads: 1906.40, writes: 847.90, response time: 28.60ms (95%), errors: 0.00, reconnects:  0.00
[ 110s] threads: 4, tps: 211.10, reads: 1903.10, writes: 845.30, response time: 29.27ms (95%), errors: 0.30, reconnects:  0.00 <-- issue starts
[ 120s] threads: 4, tps: 198.30, reads: 1785.10, writes: 792.40, response time: 28.43ms (95%), errors: 0.00, reconnects:  0.00
[ 130s] threads: 4, tps: 0.00, reads: 0.60, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.40         <-- total stop in write
[ 140s] threads: 4, tps: 173.80, reads: 1567.80, writes: 696.30, response time: 34.89ms (95%), errors: 0.40, reconnects:  0.00 <-- writes restart
[ 150s] threads: 4, tps: 195.20, reads: 1755.10, writes: 780.50, response time: 33.98ms (95%), errors: 0.00, reconnects:  0.00
[ 160s] threads: 4, tps: 196.90, reads: 1771.30, writes: 786.80, response time: 33.49ms (95%), errors: 0.00, reconnects:  0.00
[ 170s] threads: 4, tps: 193.70, reads: 1745.40, writes: 775.40, response time: 34.39ms (95%), errors: 0.00, reconnects:  0.00
[ 180s] threads: 4, tps: 191.60, reads: 1723.70, writes: 766.20, response time: 35.82ms (95%), errors: 0.00, reconnects:  0.00

So it takes ~10 seconds to perform failover.

To understand this better, let's see what happened in MHA-land:

Tue Aug 30 09:33:33 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 30 09:33:33 2016 - [info] Reading application default configuration from /etc/mha.cnf..
... Read conf and start
Tue Aug 30 09:33:47 2016 - [debug] Trying to get advisory lock..
Tue Aug 30 09:33:47 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
... Wait for errors
Tue Aug 30 09:34:47 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) <--- Error time
Tue Aug 30 09:34:56 2016 - [warning] Connection failed 4 time(s)..                                     <--- Finally MHA decide to do something
Tue Aug 30 09:34:56 2016 - [warning] Master is not reachable from health checker!
Tue Aug 30 09:34:56 2016 - [warning] Master mha2r(192.168.1.107:3306) is not reachable!
Tue Aug 30 09:34:56 2016 - [warning] SSH is reachable.
Tue Aug 30 09:34:58 2016 - [info] Master failover to mha1r(192.168.1.104:3306) completed successfully. <--- end of the failover

MHA sees the server failing at xx:47, but because of the retry and checks validation, it actually fully acknowledges the downtime at xx:56 (~8 seconds after).

Performing the failover itself only takes ~2 seconds (again). Because no movable IPs or DNS changes were involved, the operations were fast. This is true when the servers already have the binary log events; it's a different story if MHA also has to manage and push data from the binary log to MySQL.

As you can see, ProxySQL can also help reduce the timing for this scenario, totally skipping the network-related operations. These operations are the ones causing the most trouble in these cases.

by Marco Tusa at September 13, 2016 06:03 PM

MariaDB Foundation

MariaDB 5.5.52 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 5.5.52. See the release notes and changelog for details on this release. IMPORTANT: There was a security fix included in the 5.5.51 release of MariaDB. If you are running MariaDB 5.5.50 or lower, please upgrade to at least MariaDB 5.5.51 right away. See […]

The post MariaDB 5.5.52 now available appeared first on MariaDB.org.

by Daniel Bartholomew at September 13, 2016 04:11 PM

Jean-Jerome Schmidt

Sign up for Part 2 of the MySQL Query Tuning Webinar Trilogy: Indexing & EXPLAIN

When it comes to query tuning, EXPLAIN is one of the most important tools in the DBA's arsenal. Why is a given query slow, what does the execution plan look like, how will JOINs be processed, is the query using the correct indexes, or is it creating a temporary table?

You can now sign up for the webinar, which takes place at the end of this month on September 27th. We’ll look at the EXPLAIN command and see how it can help us answer these questions.

We will also look into how to use database indexes to speed up queries. More specifically, we'll cover the different index types such as B-Tree, Fulltext and Hash, take a deep dive into B-Tree indexes, and discuss indexes for MyISAM vs. InnoDB tables as well as some gotchas.

MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive

September 27th

Sign up now

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

And if you’d like to be a step ahead, you can also already sign up for the third and last part of this trilogy: MySQL Query Tuning: Working with optimizer and SQL tuning on October 25th.

We look forward to seeing you there!

by Severalnines at September 13, 2016 01:59 PM

MariaDB Foundation

MariaDB Server versions and the Remote Root Code Execution Vulnerability CVE-2016-6662

During the recent days there has been quite a lot of questions and discussion around a vulnerability referred to as MySQL Remote Root Code Execution / Privilege Escalation 0day with CVE code CVE-2016-6662. It’s a serious vulnerability and we encourage every MariaDB Server user to read the below update on the vulnerability from a MariaDB […]

The post MariaDB Server versions and the Remote Root Code Execution Vulnerability CVE-2016-6662 appeared first on MariaDB.org.

by rasmus at September 13, 2016 12:38 PM

Peter Zaitsev

Percona Monitoring and Management (PMM) is now available

Percona announces the availability of Percona Monitoring and Management (PMM), an open source database monitoring and management tool. Completely open source and free to download and use, Percona Monitoring and Management provides point-in-time visibility and historical trending of database performance that enables DBAs and application developers to optimize the performance of their MySQL and MongoDB databases.

Percona Monitoring and Management combines several best-of-breed tools, including Grafana, Prometheus, and Consul, in a single, easy-to-manage virtual appliance, along with Percona-developed query analytics, administration, API, agent and exporter components. Percona Monitoring and Management monitors and provides actionable performance data for MySQL variants, including Oracle MySQL Community Edition, Oracle MySQL Enterprise Edition, Percona Server for MySQL, and MariaDB, as well as MongoDB variants, including MongoDB Community Edition, and Percona Server for MongoDB.

PMM is an on-premises solution that keeps all of your performance and query data inside the confines of your environment, with no requirement for any data to cross the Internet.

Percona Monitoring and Management Highlights:

  • Provides query and metric information that enables administrators to optimize database performance
  • Displays current queries and highlights potential query issues to enable faster issue resolution
  • Maps queries against metrics to help make informed decisions about crucial database resources: platform needs, system growth, team focus and the most important database activities.

PMM provides database maintenance teams with better visibility into database and query activity, in order to implement actionable strategies and issue resolution more quickly. More information allows you to concentrate efforts on the areas that yield the highest value.

Like prior versions, PMM is distributed through Docker Hub and is free to download. Full instructions for downloading and installing the server and client are available in the documentation.

A PMM demonstration is available at pmmdemo.percona.com. We have also implemented forums for PMM discussions.

There will also be a webinar with Percona's Roman Vynar, Lead Platform Engineer, on Thursday, September 15, 2016 at 10:00am PDT (UTC-7): "Identifying and Solving Database Performance Issues with PMM." Register here for the webinar to learn more about PMM. Can't attend the webinar? We've got you covered! Register anyway and we'll send you the recording and slides.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

As always, thanks for your continued support of Percona!

by Bob Davis at September 13, 2016 10:00 AM

September 12, 2016

Peter Zaitsev

Is Your Database Affected by CVE-2016-6662?

In this blog post, I will discuss the CVE-2016-6662 vulnerability, how to tell if it affects you, and how to prevent it from affecting you if you have an older version of MySQL.

I’ll also list which MySQL versions include the vulnerability fixes.

As we announced in a previous post, there are certain scenarios in Percona Server (and MySQL) that can allow a remote root code execution (CVE-2016-6662).

Vulnerability approach

The website legalhackers.com contains the full, current explanation of the CVE-2016-6662 vulnerability.

To summarize, the methods used to gain root privileges require multiple conditions:

  1. A remote (or even local) MySQL user that has FILE permissions (or SUPER, which encompasses all of them).
  2. Improper OS file/directory permissions around the MySQL configuration files that allow the MySQL system user to modify or create new configuration files.
  3. Several techniques that alter the MySQL configuration to load a malicious shared library. The techniques currently described require FILE or SUPER privileges, but the currently undisclosed CVE-2016-6663 demonstrates how to alter the configuration without FILE privileges.
  4. Having that malicious shared library loaded when MySQL restarts, which includes the code that allows privilege escalation.
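
Condition 1 is easy to audit from SHOW GRANTS output. A rough sketch (plain string matching, illustrative only; FILE and SUPER are global-only privileges, so checking *.* grants is sufficient):

```python
import re

# FILE and SUPER can only be granted globally, so *.* grants are the
# only place they can appear
RISKY = ("FILE", "SUPER", "ALL PRIVILEGES")


def risky_privileges(grant_statements):
    """Given SHOW GRANTS output lines, return the risky privileges
    (FILE / SUPER / ALL PRIVILEGES) granted at the global level."""
    found = set()
    for stmt in grant_statements:
        m = re.match(r"GRANT (.+?) ON \*\.\*", stmt, re.IGNORECASE)
        if not m:
            continue  # schema-level grant: cannot carry FILE or SUPER
        privs = {p.strip().upper() for p in m.group(1).split(",")}
        found.update(p for p in RISKY if p in privs)
    return sorted(found)
```

Run it against the SHOW GRANTS output of every remote user; an empty result for a user means that user cannot trigger the disclosed techniques.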

Fixed versions

MySQL fixes

MySQL seems to have already released versions that include the security fixes.

This is coming from the release notes in MySQL 5.6.33:

  • For mysqld_safe, the argument to --malloc-lib now must be one of the directories /usr/lib, /usr/lib64, /usr/lib/i386-linux-gnu, or /usr/lib/x86_64-linux-gnu. In addition, the --mysqld and --mysqld-version options can be used only on the command line and not in an option file. (Bug #24464380)
  • It was possible to write log files ending with .ini or .cnf that later could be parsed as option files. The general query log and slow query log can no longer be written to a file ending with .ini or .cnf. (Bug #24388753)
  • Privilege escalation was possible by exploiting the way REPAIR TABLE used temporary files. (Bug #24388746)

You aren’t affected if you use version 5.5.52, 5.6.33 or 5.7.15.

Release notes: 5.5.52, 5.6.33, 5.7.15

Percona Server

The way Percona increased security was by limiting which libraries are allowed to be loaded with LD_PRELOAD (including --malloc-lib), restricting them to /usr/lib, /usr/lib64 and the MySQL installation base directory.

This means only locations that are accessible by root users can be used to load shared libraries.

The following Percona Server versions have this fix:

We are working on releasing new Percona XtraDB Cluster versions as well.

Future Percona Server releases will include all fixes from MySQL.

MariaDB

MariaDB has fixed the issue in 5.5.51, 10.1.17 and 10.0.27.

I have an older MySQL version, what should I do now?

It is possible to change the database configuration so that it isn’t affected anymore (without changing your MySQL versions and restarting your database). There are several options, each of them focusing on one of the conditions required for the vulnerability to work.

Patch mysqld_safe Manually

Just before publishing this, a blog post came out with another alternative for patching your server: https://www.psce.com/blog/2016/09/12/how-to-quickly-patch-mysql-server-against-cve-2016-6662/.

Database user permissions

One way to avoid the vulnerability is making sure no remote user has SUPER or FILE privileges.

However, CVE-2016-6663 mentions there is a way to do this without any FILE privileges (likely related to the REPAIR TABLE issue mentioned in the MySQL release notes).

Configuration files permissions

The vulnerability needs to be able to write to some MySQL configuration files. Prevent that and you are secure.

Make sure you configure permissions for various config files as follows:

  • MySQL reads configuration files from different paths, including from your datadir
    • Create an (empty) my.cnf and .my.cnf in the datadir (usually /var/lib/mysql) and make root the owner/group with 0644 permissions.
    • Other locations to look into: /etc/my.cnf, /etc/mysql/my.cnf, /usr/etc/my.cnf, ~/.my.cnf (mysqld --help --verbose shows you where mysqld will look)
  • This also includes !includedir paths defined in your current configurations; make sure they are not writeable by the mysql user either
  • No config files should be writeable by the mysql user (change ownership and permissions)
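
The checks above can be scripted. A minimal sketch that walks the usual config paths and flags anything the mysql OS user could write to (the paths and user name are the common defaults, not from this post; adjust for your installation):

```python
import os
import pwd
import stat

# Common option-file locations; extend with your !includedir paths
CANDIDATE_PATHS = [
    "/etc/my.cnf", "/etc/mysql/my.cnf", "/usr/etc/my.cnf",
    "/var/lib/mysql/my.cnf", "/var/lib/mysql/.my.cnf",
]


def writable_by(st, uid, gids):
    """True if a file with stat result `st` is writable by a user
    with the given uid and set of group ids."""
    if st.st_uid == uid and st.st_mode & stat.S_IWUSR:
        return True
    if st.st_gid in gids and st.st_mode & stat.S_IWGRP:
        return True
    return bool(st.st_mode & stat.S_IWOTH)


def audit(paths=CANDIDATE_PATHS, user="mysql"):
    entry = pwd.getpwnam(user)
    uid, gids = entry.pw_uid, {entry.pw_gid}
    for path in paths:
        if os.path.exists(path) and writable_by(os.stat(path), uid, gids):
            print("WARNING: %s is writable by the %s user" % (path, user))
```

A root-owned file with 0644 permissions passes the check; anything owned by mysql, group-writable by mysql's group, or world-writable gets flagged.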

by Kenny Gryp at September 12, 2016 10:55 PM

Percona Live Europe featured talk with Ronald Bradford — Securing your MySQL/MariaDB data

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we'll highlight some of the speakers that will be at this year's conference. We'll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Ronald Bradford, Founder & CEO of EffectiveMySQL. His talk will be on Securing your MySQL/MariaDB data. This talk will answer questions like:

  • How do you apply the appropriate filesystem permissions?
  • How do you use TLS/SSL for connections, and are they good for replication?
  • How do you encrypt all your data at rest?
  • How do you monitor your database with the audit plugin?

. . . and more. I had a chance to speak with Ronald and learn a bit more about database security:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it?

Ronald: My first introduction to relational theory and databases was with the writings of C.J. Date and Michael Stonebraker while using the Ingres RDBMS in 1988. For 28 years, my industry experience in the database field has covered a number of relational and non-relational products, including MySQL – which I started using at my first startup in 1999. For the last 17 years, I have enjoyed contributing to the MySQL ecosystem in many ways. I’ve consulted with hundreds of organizations, both small and large, that rely on MySQL to deliver strategic value to their business customers. I have given over 130 presentations in the past ten years across six continents and published a number of books and blog articles from my experiences with MySQL and open source. I am also the organizer of the MySQL Meetup group in New York City.

My goals have always been to help educate the current generation of software engineers to appreciate, use and maximize the right product for the job. I always hope that MySQL is the right solution, but recommend other options when it is not.

I am presently looking for my next opportunity to help organizations develop a strategic and robust data infrastructure that ensures business continuity for growing needs – ensuring a reliable and consistent user experience.

Percona: Your talk is called “Securing your MySQL/MariaDB data.” Why is securing your database important, and what are the real-world ramifications for a database security breach?

Ronald: We secure the belongings in our home, we secure the passengers in our car, we secure the possessions we carry on us. Data is a valuable asset for many organizations, and for some it is the only asset of value for continued operation. Should such important business information not have the same value as people or possessions?

Within any industry, you want to be the disruptor and not the disrupted. The press coverage on any alleged or actual data breach generally leads to a loss of customer confidence. This in turn can directly affect your present and future business viability – enabling competitors to take advantage of the situation. Data security should be as important as data recovery and system performance. Today we hear about data breaches on a weekly basis – everything from government departments to large retail stores. We often do not hear of the data breaches that can occur with smaller organizations, who also have your information: your local medical provider, or a school or university that holds your personal information.

A data breach can be much more impactful than data loss. It can be harder to detect and assess the long-term impact of a security breach because there might be unauthorized access over a longer time period. Often there are insufficient audit trails and logs to validate the impact of any security breach. Inadequate access controls can also lead to unauthorized data access both internally and externally. Many organizations fail to manage risk by not providing a “least privileges required approach” for any access to valuable data by applications or staff.

Any recent real-world example highlights the potential of insufficient data security, and therefore the increased risk of your personal information being used illegally. What is your level of confidence about security when you register with a new service and then you receive an email with your login and password in clear text? If your password is not secure, your personal data is also not secure and now it’s almost impossible for your address, phone number and other information to be permanently removed from this insecure site.

Percona: Are there significant differences between security for on-premise and cloud-based databases? What are they?

Ronald: There should be no differences in protecting your data within MySQL regardless of where this is stored.  When using a cloud-based database there is the additional need to have a shared responsibility with your cloud provider ensuring their IaaS and provided services have adequate trust and verification. For example, you need to ensure that provisioned disk and memory is adequately zeroed after use, and also ensure that adequate separation exists between hosts and clients on dedicated equipment in a virtualized environment. While many providers state these security and compliance processes, there have been instances where data has not been adequately protected.

Just as you may trust an internal department with additional security in the physical and electronic access to the systems that hold your data, you should “trust but verify” your cloud provider’s capacity to protect your data and that these providers continue to assess risk regularly and respond appropriately.

Percona: What is changing in database security that keeps you awake at night? What things does the market need to address immediately?

Ronald: A discussion with a CTO recently indicated he was worried about how their infrastructure would support high availability: what is the impact of any outage, and how does the organization know if he is prepared enough? Many companies, regardless of their size, are not prepared for either a lack of availability or a security breach.

The recent Delta Air Lines outage is an example of an availability outage that cost the company many millions of dollars. Data security should be considered with the exact same concern; however, it is often the poor cousin to availability. Disaster recovery is a commonly used term for addressing the potential loss of access to data, but there is no well-known term or common process for addressing data protection.

You monitor the performance of your system for increased load and for slow queries. When did you last monitor the volume of access to secure data to look for unexpected patterns or anomalies? A data breach can be a single SQL statement that is not an expected application traffic pattern. How can you protect your data in this situation? We ask developers to write unit tests to improve code coverage. Does your organization ask developers to write tests to perform SQL injection, or write SQL statements that should not be acceptable to manipulate data and are therefore correctly identified, alerted and actioned? Many organizations run load and volume testing regularly, but few organizations run security drills as regularly.

As organizations continue to address the growing data needs in the digital age, ongoing education and awareness are very important. There is often very little information in the MySQL ecosystem about validating data security, determining what is applicable security monitoring, and what is the validation and verification of authorized and unauthorized data access. What also needs to be addressed is the use (and abuse) of available security in current and prior MySQL versions. The key advancements in MySQL 5.6 and MySQL 5.7, combined with a lack of a migration path for organizations, is a sign that ongoing security improvements are not considered as important as other features.

Percona: What are you looking forward to the most at Percona Live Europe this year?

Ronald: Percona Live Europe is a chance for all attendees, including myself, to see, hear and share in the wide industry use of MySQL today (and the possibilities tomorrow).

With eight sessions per time slot, I often wish for the ability to be in multiple places at once! Of particular interest to me are new features that drive innovation of the product, such as MySQL group replication.

I am also following efforts related to deploying your application stack in containers using Docker. Solving the state and persistence needs of a database is very different to providing application micro-services. I hope to get a better appreciation for finding a balance between the use of containers, VMs and dedicated hardware in a MySQL stack that promotes accelerated development, performance, business continuity and security.

You can read more about Ronald and his thoughts on database security at ronaldbradford.com.

Want to find out more about Ronald, MySQL/MariaDB and security? Register for Percona Live Europe 2016, and come see his talk Securing your MySQL/MariaDB data.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

by Dave Avery at September 12, 2016 03:59 PM

Percona Server Critical Update CVE-2016-6662

This blog is an announcement for a Percona Server update with regards to CVE-2016-6662.

We have added a fix for CVE-2016-6662 in the following releases:

From seclists.org:

An independent research has revealed multiple severe MySQL vulnerabilities. This advisory focuses on a critical vulnerability with a CVEID of CVE-2016-6662. The vulnerability affects MySQL servers in all version branches (5.7, 5.6, and 5.5) including the latest versions, and could be exploited by both local and remote attackers.

Both the authenticated access to MySQL database (via network connection or web interfaces such as phpMyAdmin) and SQL Injection could be used as exploitation vectors. Successful exploitation could allow attackers to execute arbitrary code with root privileges which would then allow them to fully compromise the server on which an affected version of MySQL is running.

This is a CRITICAL update, and the fix mitigates the potential for remote root code execution.

We encourage our users to update to the latest version of their particular fork as soon as possible, ensuring they have appropriate change management procedures in place beforehand so they can test the update before placing it into production.

Percona would like to thank Dawid Golunski of http://legalhackers.com/ for disclosing this vulnerability in the MySQL software, and working with us to resolve this problem.

by David Busby at September 12, 2016 02:06 PM

MariaDB AB

Creating a MariaDB MaxScale Router Module

Anders Karlsson

I wanted to do some tests with MariaDB MaxScale and realized that the two existing routers (beyond the binlog router that is, which is a bit special) didn't do what I wanted them to do. What I was looking for was a simple round-robin feature and neither the readconnroute nor readwritesplit could be configured to do this. They are just too smart for my simple experiment.

Why would you want a round-robin router? Well, one use case is when you are INSERTing a lot of data and you just want to persist it. You don't have the use case where you have to SELECT data from all servers, but should you need it, you just select from all servers until you find what you need. Think about log data that you don't care much about but that you for some reason need to retain, maybe for corporate policy or legal reasons. Using round-robin could, in theory, give you better performance, but that would require something way smarter than what I am proposing here. Rather, you get INSERT availability (you will always have some server to insert into) and, secondly, basic but useful INSERT sharding: you only store so much data on each server.

So, let's get to work. To begin with you need the MaxScale source tree, place yourself in some directory where you want this and do this:

$ git clone https://github.com/mariadb-corporation/MaxScale.git

Now you should have a directory called MaxScale, so pop in there, create a build directory and then run cmake and make to configure and build MaxScale itself:

$ cd MaxScale
$ mkdir build
$ cd build
$ cmake ..
$ make

These are the quick instructions, and you will probably find that you lack some dependencies. The full instructions are available as part of the sample code presented later in this document, which can be downloaded from Sourceforge. Browse to https://sourceforge.net/projects/mxsroundrobin/files and then click on roundrobin 1.0, where you will find a pdf with detailed instructions. There is also a tgz there with all the source code presented later in this blog.

So, now we have something to work with and the plan is to introduce a new router module in this tree. To begin with pop over to where the routers module code is and create a directory for our code there:

$ cd ../server/modules/routing
$ mkdir roundrobin
$ cd roundrobin

Before we can start building some code, let's look at the basics of what kind of code gets into a module.

A plugin is a shared object that is loaded by MaxScale core when it starts. Early on when MaxScale starts it reads the configuration file, /etc/maxscale.cnf by default, and in there each service defines a router. Note that several services can use the same router so the code we write later has to take this into account. Look at this extract of a service section for example:

[Read-Write Service]
type=service
router=readwritesplit

The router here tells MaxScale to look for a readwritesplit module, or in technical terms, it will load the shared library libreadwritesplit.so. After loading this library successfully, MaxScale has to figure out a few things about this module, like its name and version and, above all, the entry points for the functions that MaxScale will call when processing a connection. In addition, we need to define a few structs that are passed around these different calls to give the different router functions some context. Let's start with a header file, roundrobin.h, in the roundrobin directory:

#ifndef ROUNDROBIN_H
#define ROUNDROBIN_H
#include <server.h>

typedef struct tagROUNDROBININST *PROUNDROBININST;

typedef struct tagROUNDROBIN_CLIENT_SES {
  SPINLOCK lock;
  bool bClosed;
  SERVER **pBackends;
  SESSION *pSession;
  DCB **pdcbClients;
  unsigned int nBackends;
  unsigned int nCurrBackend;
  PROUNDROBININST pRouter;
  struct tagROUNDROBIN_CLIENT_SES *pNext;
} ROUNDROBIN_CLIENT_SES, *PROUNDROBIN_CLIENT_SES;

typedef struct tagROUNDROBININST {
  SERVICE *pService;
  PROUNDROBIN_CLIENT_SES pConnections;
  SPINLOCK lock;
  SERVER **pBackends;
  unsigned int nBackends;
  struct tagROUNDROBININST *pNext;
} ROUNDROBININST;
#endif

As you can see, the main thing here is that I define and typedef two structs. As I said, I have mostly been looking at other existing routers and grabbed the stuff in there, so I can't explain all aspects of these structs, but let's look at a few members:

These structs are in a linked list and the pNext member is a pointer to the next element in this list.

The lock member is a reference to a spinlock associated with the struct.

The pBackends member is a pointer to an array of pointers to the database SERVERS that this service is attached to.

The pdcbClients member is an array of pointers to DCBs. A DCB is a Descriptor Control Block, which is a generic descriptor of a connection inside MaxScale, be it to a server or a client. In this case these are the DCBs to the SERVERs in pBackends.

The nBackends member is the number of elements in the pBackends and pdcbClients arrays.

The pRouter member is a pointer to the ROUNDROBININST for the connection.

That is most of it. The next step is to move on to the more exciting stuff: the actual code that makes up this module. The main source file we work with here is roundrobin.c, and we need a few basics in it. Let's have a look at the beginning of roundrobin.c:

#include <my_config.h>
#include <router.h>
#include <query_classifier.h>
#include <mysql_client_server_protocol.h>
#include "roundrobin.h"

/* Macros. */
#define ROUNDROBIN_VERSION "1.0.0"

/* Globals. */
MODULE_INFO info = {
  MODULE_API_ROUTER,
  MODULE_GA,
  ROUTER_VERSION,
  "A simple roundrobin router"
};
static PROUNDROBININST pInstances;

/* Function prototypes for API. */
static ROUTER *CreateInstance(SERVICE *service, char **options);
static void *CreateSession(ROUTER *pInstance, SESSION *session);
static void CloseSession(ROUTER *pInstance, void *session);
static void FreeSession(ROUTER *pInstance, void *session);
static int RouteQuery(ROUTER *pInstance, void *session, GWBUF *queue);
static void Diagnostic(ROUTER *pInstance, DCB *dcb);
static void ClientReply(ROUTER *pInstance, void *router_session,
  GWBUF *queue, DCB *backend_dcb);
static void HandleError(ROUTER *pInstance, void *router_session,
  GWBUF *errmsgbuf, DCB *backend_dcb, error_action_t action,
  bool *succp);
static int GetCapabilities();

static ROUTER_OBJECT RoundRobinRouter = {
  CreateInstance,
  CreateSession,
  CloseSession,
  FreeSession,
  RouteQuery,
  Diagnostic,
  ClientReply,
  HandleError,
  GetCapabilities
};

Let's now look at what is going on here. To begin with, I include a few necessary files, including the roundrobin.h that we created above, and then a macro is defined. Then the MODULE_INFO struct follows. The information in it is used by MaxScale to get information on the router, but if you leave it out, currently MaxScale will start anyway. The command show modules in maxadmin will return the information in this struct for the module.

Then follows a number of function prototypes. These are needed here before the ROUTER_OBJECT struct, which is the key to the router as it provides the entry points that MaxScale itself calls. Again, I will not specify exactly what all of these do; I have mostly just grabbed code from other routers.

Following this, we need some basic functions that all routers implement, to initialize the module, get the version and a function to return the ROUTER OBJECT defined above:

/*
 * Function: ModuleInit()
 * Initialize the Round Robin router module.
 */
void ModuleInit()
   {
   MXS_NOTICE("Initialise roundrobin router module version " ROUNDROBIN_VERSION ".");
   pInstances = NULL;
   } /* End of ModuleInit(). */


/*
 * Function: version()
 * Get the version of the roundrobin router
 */
char *version()
   {
   return ROUNDROBIN_VERSION;
   } /* End of version(). */


/*
 * Function: GetModuleObject()
 * Get the object that describes this module.
 */
ROUTER_OBJECT *GetModuleObject()
   {
   return &RoundRobinRouter;
   } /* End of GetModuleObject(). */

With that we have completed the housekeeping code and are ready to look at the functions that implement the actual functionality. We'll look at CreateInstance first which, as the name implies, creates an instance of RoundRobin. Note that within a running MaxScale there might well be more than one instance, one for each RoundRobin service.

/*
 * Function: CreateInstance()
 * Create an instance of the RoundRobin router.
 */
ROUTER *CreateInstance(SERVICE *pService, char **pOpts)
   {
   PROUNDROBININST pRet;
   PROUNDROBININST pTmp;
   SERVER_REF *pSvcRef;
   unsigned int i;

   MXS_NOTICE("Creating roundrobin router instance.");
/* Allocate the RoundRobin instance struct. */
   if((pRet = malloc(sizeof(ROUNDROBININST))) == NULL)
      return NULL;
   pRet->pService = pService;
   pRet->pConnections = NULL;
   pRet->pNext = NULL;
   pRet->nBackends = 0;

/* Count the number of backend servers we manage. */
   for(pSvcRef = pService->dbref; pSvcRef != NULL; pSvcRef = pSvcRef->next)
      pRet->nBackends++;

/* Allocate space for the backend servers and add to the instance struct. */
   if((pRet->pBackends = calloc(pRet->nBackends, sizeof(SERVER *))) == NULL)
      {
      free(pRet);
      return NULL;
      }

   spinlock_init(&pRet->lock);

/* Set up list of servers. */
   for(i = 0, pSvcRef = pService->dbref; pSvcRef != NULL; i++, pSvcRef = pSvcRef->next)
      pRet->pBackends[i] = pSvcRef->server;

/* Set up instance in list. */
   if(pInstances == NULL)
      pInstances = pRet;
   else
      {
      for(pTmp = pInstances; pTmp->pNext != NULL; pTmp = pTmp->pNext)
         ;
      pTmp->pNext = pRet;
      }

   MXS_NOTICE("Created roundrobin router instance.");
   return (ROUTER *) pRet;
   } /* End of CreateInstance(). */

Again, nothing really exciting is happening: I create a struct that defines the instance, initialize it and add it to the linked list of instances that I maintain. I also get references to the backend servers that this instance uses, set up the array for them and initialize the spinlock. With that, we are done. Then there is the issue of creating a session; this function gets called when a client connects to MaxScale through the port that is linked to RoundRobin.

/*
 * Function: CreateSession()
 * Create a session in the RoundRobin router.
 */
void *CreateSession(ROUTER *pInstance, SESSION *session)
   {
   PROUNDROBIN_CLIENT_SES pRet;
   PROUNDROBIN_CLIENT_SES pTmp;
   PROUNDROBININST pRoundRobinInst = (PROUNDROBININST) pInstance;
   unsigned int i;

/* Allocating session struct. */
   if((pRet = malloc(sizeof(ROUNDROBIN_CLIENT_SES))) == NULL)
      return NULL;
   spinlock_init(&pRet->lock);
   pRet->pNext = NULL;
   pRet->nCurrBackend = 0;
   pRet->pSession = session;
   pRet->pRouter = pRoundRobinInst;
   pRet->nBackends = pRoundRobinInst->nBackends;

/* Allocating backends and DCBs. */
   if((pRet->pBackends = calloc(pRet->nBackends, sizeof(SERVER *))) == NULL)
      {
      free(pRet);
      return NULL;
      }
   if((pRet->pdcbClients = calloc(pRet->nBackends, sizeof(DCB *))) == NULL)
      {
      free(pRet->pBackends);
      free(pRet);
      return NULL;
      }

/* Set servers and DCBs. */
   for(i = 0; i < pRet->nBackends; i++)
      {
      pRet->pBackends[i] = pRoundRobinInst->pBackends[i];
      pRet->pdcbClients[i] = NULL;
      }

/* Place connecting last in list of connections in instance. */
   spinlock_acquire(&pRoundRobinInst->lock);
   if(pRoundRobinInst->pConnections == NULL)
      pRoundRobinInst->pConnections = pRet;
   else
      {
      for(pTmp = pRoundRobinInst->pConnections; pTmp->pNext != NULL; pTmp = pTmp->pNext)
         ;
      pTmp->pNext = pRet;
      }
   spinlock_release(&pRoundRobinInst->lock);

   return (void *) pRet;
   } /* End of CreateSession(). */

This is also pretty basic stuff: the server pointers are copied from the instance (do I need to do this, you ask? The answer is, I don't know, but I do know that what I do here works). I also clear the DCB pointers; these are created on an as-needed basis later in the code.

Following this are a couple of basic housekeeping functions that I am not showing here. Actually, I'm just going to show one more function, RouteQuery. This is, as the name implies, the function that gets called to do what we are actually writing this code for: routing queries. Before I show that code, I have to explain that this is very simplistic code. To begin with, it doesn't implement "session commands", that is, commands that really should be run on all backends, like setting the current database, handling transactions and such things. As I said, I do not implement this, and that is one of the major shortcomings of this code that makes it much less generally applicable. But it still has use cases. Secondly, I have tried to make sure that the code works rather than optimizing it to death, so maybe I grab the spinlock too often and maybe I am too picky with allocating/deallocating the DCBs; I let others answer that.

The role of the function at hand is to handle an incoming query and pass it along to one of the servers defined for the service in question. In the general case, the most complicated parts of this are the selection of which server to route the query to and the handling of session commands. I have simplified the former by using a very simple routing algorithm: I store the index of the last used backend for a connection in the nCurrBackend member, and for each query this is incremented until nBackends is reached, at which point it is reset to 0. As for the complexity of session commands, I just don't implement them.

So, let's have a look at what the RouteQuery function looks like:

/*
 * Function: RouteQuery()
 * Route a query in the RoundRobin router.
 */
int RouteQuery(ROUTER *instance, void *session, GWBUF *queue)
   {
   PROUNDROBIN_CLIENT_SES pSession = (PROUNDROBIN_CLIENT_SES) session;
   DCB *pDcb;
   int nRet;
   unsigned int nBackend;

   MXS_NOTICE("Enter RoundRobin RouteQuery.");
   queue = gwbuf_make_contiguous(queue);

   spinlock_acquire(&pSession->lock);
/* Check for the next running backend. Set non-running backend DCBs to NULL. */
   for(nBackend = pSession->nCurrBackend; nBackend < pSession->nBackends; nBackend++)
      {
/* If this server is up, then exit this loop now. */
      if(!SERVER_IS_DOWN(pSession->pBackends[nBackend]))
         break;

/* If the server is down and the DCB is non-null, then free the DCB and NULL it now. */
      if(pSession->pdcbClients[nBackend] != NULL)
         {
         dcb_close(pSession->pdcbClients[nBackend]);
         pSession->pdcbClients[nBackend] = NULL;
         }
      }
/* If I couldn't find a backend after the current, then look through the ones before. */
   if(nBackend >= pSession->nBackends)
      {
      for(nBackend = 0; nBackend <= pSession->nCurrBackend; nBackend++)
         {
         if(!SERVER_IS_DOWN(pSession->pBackends[nBackend]))
            break;
         if(pSession->pdcbClients[nBackend] != NULL)
            {
            dcb_close(pSession->pdcbClients[nBackend]);
            pSession->pdcbClients[nBackend] = NULL;
            }
         }

/* Check that I really found a suitable backend. */
      if(nBackend > pSession->nCurrBackend)
         {
         spinlock_release(&pSession->lock);
         MXS_NOTICE("No suitable RoundRobin running server found in RouteQuery.");
         return 0;
         }
      }

   pDcb = pSession->pdcbClients[nBackend];
/* If backend DCB wasn't set, then do that now. */
   if(pDcb == NULL)
      pDcb = pSession->pdcbClients[nBackend] = dcb_connect(pSession->pBackends[nBackend],
        pSession->pSession,
        pSession->pBackends[nBackend]->protocol);
   spinlock_release(&pSession->lock);

/* Route the query. */
   nRet = pDcb->func.write(pDcb, queue);

/* Move to next dcb. */
   pSession->nCurrBackend = nBackend;
   if(++pSession->nCurrBackend >= pSession->nBackends)
      pSession->nCurrBackend = 0;

   MXS_NOTICE("Exit RoundRobin RouteQuery.");
   return 1;
   } /* End of RouteQuery(). */

So, what is going on here? First I look for a backend, starting with the current one (which is badly named; it is actually the one after the current) and continuing until I find a server that is running. If I find a non-running server I skip it, after having closed the associated DCB. If I can't find a server after the current one, I start again from the first, processing servers in the same way.

Following this I should have a server. Then I check if the DCB is open, and if not I open it now. After that I do the actual routing of the query, move on to the next backend and return. Simple as that. As I have stated, this is a very simple router, but it does work, within the given limitations, and it should be good enough as a crude example.

Before I can test my code, I have to set it up for inclusion in the build process and do a few other mundane tasks, but that is all documented in the pdf that comes with the code; download the package from Sourceforge.

by Anders Karlsson at September 12, 2016 09:07 AM

Colin Charles

Speaking in September 2016

A few events, but mostly circling around London:

  • Open collaboration – an O’Reilly Online Conference, at 10am PT, Tuesday September 13 2016 – I’m going to be giving a new talk titled Forking Successfully. I’ve seen how the platform works, and I’m looking forward to trying this method out (it’s like a webinar, but not quite!)
  • September MySQL London Meetup – I’m going to focus on MySQL, a branch (Percona Server) and the fork MariaDB Server. This will be interesting, because one of the reasons you don’t see a huge Emacs/XEmacs push after about 20 years? Feature parity. And the work that’s going into MySQL 8.0 is mighty interesting.
  • Operability.io should be a fun event, as the speakers were hand-picked and the content is heavily curated. I look forward to my first visit there.

by Colin Charles at September 12, 2016 03:44 AM

September 10, 2016

Valeriy Kravchuk

Fun with Bugs #45 - On Some Bugs Fixed in MySQL 5.7.15

Oracle released MySQL 5.7.15 recently, earlier than expected. The reason for this "unexpected" release is not clear to me, but it could be because of a couple of security-related internal bug reports that got fixed:

  • "It was possible to write log files ending with .ini or .cnf that later could be parsed as option files. The general query log and slow query log can no longer be written to a file ending with .ini or .cnf. (Bug #24388753)
  • Privilege escalation was possible by exploiting the way REPAIR TABLE used temporary files. (Bug #24388746)"
Let me concentrate on the most important fixes to bugs and problems reported by Community users. First of all, in MySQL 5.7.15 one can simply turn off InnoDB deadlock detection using the new innodb_deadlock_detect dynamic server variable. Domas explained the positive effect of this more than 6 years ago in his post. Some improvements to the way deadlock detection works in MySQL happened as part of the fix for Bug #49047 a long time ago, but this time Oracle simply implemented a way to disable the check and rely on the InnoDB lock wait timeout instead.
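As a hedged illustration (the timeout value below is just an example, not a recommendation), disabling the check in an option file might look like this:

```ini
# my.cnf sketch for MySQL 5.7.15+: skip deadlock detection and rely on
# the lock wait timeout instead
[mysqld]
innodb_deadlock_detect = OFF
innodb_lock_wait_timeout = 50
```

Since the variable is dynamic, it can also be changed at runtime with SET GLOBAL.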

Other InnoDB-related fixes to problems reported in public bugs database include:
  • Bug #82073 - "Crash with InnoDB Encryption, 5.7.13, FusionIO & innodb_flush_method=O_DIRECT". It was reported by my colleague from MariaDB, Chris Calender, and verified by another colleague of mine from MariaDB, Jan Lindström. Probably the Bugs Verification Team in Oracle just had no access to proper hardware to verify this.
  • Bug #79378 - "buf_block_align() makes incorrect assumptions about chunk size". This bug was reported by Alexey Kopytov, who had provided a patch.
There were several fixes to replication-related bugs:
  • Bug #81675 - "mysqlbinlog does not free the existing connection before opening new remote one". It was reported by Laurynas Biveinis from Percona, who had also provided a patch, and verified by Umesh.
  • Bug #80881 - "MTR: binlog test suite failed to cleanup (contribution)". This fix to the binlog test suit was contributed by Daniel Black and verified by Umesh.
  • Bug #79867 - "unnecessary using temporary for update". This bug was reported by Zhang Yingqiang, who had also contributed a patch (that was not used after all, according to the comment from an Oracle developer). It was verified by Umesh.
 Some more bugs from other categories were also fixed:
  • Bug #82125 - "@@basedir sysvar value not normalized if set through the command line/INI file". It was reported by Georgi Kodinov from Oracle. It's funny that there is a typo in the release notes when this fix is described (pay attention to slashes):
    "If the basedir system variable was set at server startup from the command line or option file, the value was not normalized (on Windows, / was not replaced with /)"
  • Bug #82097 is private. I can not say anything about it in addition to this:
    "kevent statement timer subsystem deinitialization was revised to avoid a mysqld hang during shutdown on OS X 10.12."
    I can repeat, though, my usual statement that in most cases making bugs private is the wrong thing to do. I feel personally insulted every time I see that a fixed bug report remains private.
  • Bug #81666 - "The MYSQL_SERVER define not defined du to spelling error in plugin.cmake". It was reported by Magnus Blåudd, who also provided a patch.
  • Bug #81587 - "Combining ALTER operations triggers table rebuild". This bug was reported by Daniël van Eeden and verified by Umesh.
  • Bug #68972 - "Can't find temporary table". This bug (that could happen in a stored procedure or when prepared statements are used) was reported by Cyril Scetbon and verified by Miguel Solorzano.
  • Bug #82019 - "Is client library supposed to retry EINTR indefinitely or not". It was reported by Laurynas Biveinis from Percona, who had also contributed patches later. This bug was verified formally by Sinisa Milivojevic.
To summarize, you should definitely consider upgrading to MySQL 5.7.15 if you use FusionIO or want to be able to disable InnoDB deadlock detection entirely, or if you consider the security-related fixes in this release really important (I don't). Otherwise, just check the other fixes that could impact you positively, or just wait for 5.7.16...

by Valeriy Kravchuk (noreply@blogger.com) at September 10, 2016 05:39 PM

September 09, 2016

Peter Zaitsev

Don’t Spin Your Data, Use SSDs!

ssds

This blog post discusses the advantages of SSDs over HDDs for database environments.

For years now, I’ve been telling audiences for my MySQL Performance talk the following: if you are running an I/O-intensive database on spinning disks you’re doing it wrong. But there are still a surprising number of laggards who aren’t embracing SSD storage (whether it’s for cost or reliability reasons).

Let’s look at cost first. As I write this now (September 2016), high-performance server-grade spinning hard drives run about $240 for 600GB (or $0.40 per GB). Of course, you can get an 8TB archive drive at about the same price (about $0.03 per GB), but it isn’t likely you’d use something like that for your operational database. At the same time, you can get a Samsung 850 EVO drive for approximately $300 (or $0.30 per GB), which is cheaper than the server-grade spinning drive!

While it’s not the best drive money can buy, it is certainly an order of magnitude faster than any spinning disk drive!

(I’m focusing on the cost per GB rather than the cost of the number of IOPS per drive as SSDs have overtaken HDDs years ago when it comes to IOPS/$.)
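
The per-GB arithmetic above is easy to check. A quick sketch (the 8TB drive's price and the EVO's 1TB capacity are inferred from the figures quoted, not stated explicitly):

```python
# Cost per GB for the drives quoted above (September 2016 list prices).
drives = {
    "server-grade HDD, 600GB @ $240": 240 / 600,
    "archive HDD, 8TB @ ~$240": 240 / 8000,
    "Samsung 850 EVO, 1TB @ ~$300": 300 / 1000,
}
for name, dollars_per_gb in drives.items():
    print(f"{name}: ${dollars_per_gb:.2f}/GB")
```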

If we take a look at Amazon EBS pricing, we find that Amazon has moved to SSD volumes by default as “General Purpose” storage (gp2). Prices for this volume type run about 2x higher per GB than high-performance HDD-based volumes (st1). Provisioned IOPS volumes, likely the best choice for databases, will run you about 4x higher than HDD.

This appears to be a significant cost difference, but keep in mind you can get much more IOPS at much better latency from these volumes. They also handle IO spikes better, which is very important for real workloads.

Whether we’re looking at a cloud or private environment, it is wrong just to look at the cost of the storage alone – you must look at the whole server cost. When using an SSD, you might not need to buy a RAID card with battery-backed-up (BBU) cache, as many SSDs have similar functions built in.

(For some entry-level SSDs, there might be an advantage to purchasing a RAID with BBU, but it doesn’t affect performance nearly as much as for HDDs. This works out well, however, as entry level SSDs aren’t going to cost that much to begin with and won’t make this setup particularly costly, relative to a higher-end SSD.)  

Some vendors can charge insane prices for SSDs, but this is where you should negotiate, and where your power to choose an alternative vendor helps.

Some folks are concerned they can’t get as much storage per server with SSDs because they are smaller. This was the case a few years back, but not any more. You can easily find a 2TB 2.5” SSD drive, which is larger than the available 2.5” spinning drives. You can go as high as 13TB in the 2.5” form factor.

There is a bit of challenge if you’re looking at the NVMe (PCI-E) cards, as you typically can’t have as many of those per server as you could using spinning disks, but the situation is changing here as well with the 6.4TB SX300 from Sandisk/FusionIO or the PM1725 from Samsung. Directly attached storage provides extremely high performance and 10TB-class sizes.  

To get multiple storage units working together, you can use hardware RAID, software RAID or LVM striping, or let a file system such as ZFS take care of it for you.

Where do we stand with SSD reliability? In my experience, modern SSDs (even inexpensive ones) are pretty reliable, particularly for online data storage. The shelf life of unpowered SSDs is likely to be less than HDDs, but we do not really keep servers off for long periods of time when running database workloads. Most SSDs also do something like RAID internally (it’s called RAIN) in addition to error correction codes that protect your data from the failure of a single flash chip.

In truth, focusing on storage-level redundancy is overrated for databases. We want to protect most critical applications from complete database server failure, which means using some form of replication, storing several copies of data. In this case, you don’t need bulletproof storage on a single server – just a replication setup where you won’t lose the data and any server loss is easy to handle. For MySQL, solutions like Percona XtraDB Cluster come handy. You can use external tools such as Orchestrator or MHA to make MySQL replication work.  

When it comes to comparing SSD vs. HDD performance, whatever you do with SSDs they will likely still perform better than HDDs. Your RAID5 and RAID6 arrays made from SSDs will beat your RAID10 and RAID0 made from HDDs (unless your RAID card is doing something nasty).

Another concern with SSD reliability is write endurance. SSDs indeed have a specified amount of writes they can handle (after which they are likely to fail). If you’re thinking about replacing HDDs with SSDs, examine how long SSDs would endure under a comparable write load.  

If we’re looking at a high HDD write workload, a single device is likely to handle 200 write IOPS of 16KB (when running InnoDB). Let’s double that: 400 IOPS comes to 6.4MB/sec, which gives us about 553GB/day (writing 24/7). Even with the inexpensive Samsung 850 Pro, we get 300TB of official write endurance – enough for about 1.5 years. And in reality, drives tend to last well beyond their official specs.
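
Working the numbers (decimal units; the doubled load comes to roughly 553GB/day against a 300TB endurance spec):

```python
# Back-of-the-envelope write endurance: double a 200-IOPS HDD-class write
# load of 16KB writes, and see how long a 300TB-endurance SSD would last.
iops = 2 * 200                                    # doubled HDD-class write load
write_size_bytes = 16 * 1000                      # 16KB per write (decimal units)
mb_per_sec = iops * write_size_bytes / 1e6        # 6.4 MB/sec
gb_per_day = mb_per_sec * 86_400 / 1000           # ~553 GB/day, writing 24/7
endurance_gb = 300 * 1000                         # official 300TB endurance spec
lifetime_years = endurance_gb / gb_per_day / 365  # ~1.5 years
print(f"{gb_per_day:.0f} GB/day -> {lifetime_years:.1f} years to rated endurance")
```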

If you don’t like living on the edge, more expensive server-grade storage options have much better endurance. For example, 6.4TB SX300 offers almost 100x more endurance at 22 Petabytes written.

In my experience, people often overestimate how many writes their application performs on a sustained basis. The best approach is to do the math, but also monitor the drive status with a SMART utility or vendor tool. These tools can alert you in advance as the drive wears out.

Whatever your workload is, you will likely find an SSD solution that offers you enough endurance while significantly exceeding the performance of an HDD-based solution.

Finally, there is a third and very important component of SSD reliability for operational database workloads: not losing your data during a power failure. Many “consumer-grade” SSDs come with drive write cache enabled by default, but without proper power loss protection. This means you can lose some writes during a power failure, causing data loss or database corruption.

Disabling write cache is one option, though it can severely reduce write performance and does not guarantee data won’t be lost. Using enterprise-grade SSDs from a reputable vendor is another option, and testing SSDs yourself might be a good idea if you’re on a budget.  

Conclusion

When it comes to operational databases, whether your workload is on-premises or in the cloud, don’t spin your data – use SSDs. There are choices and options for almost any budget and every workload.

by Peter Zaitsev at September 09, 2016 08:49 PM

Basic Housekeeping for MySQL Indexes

MySQL Indexes

In this blog post, we’ll look at some of the basic housekeeping steps for MySQL indexes.

We all know that indexes can be the difference between a high-performance database and a bad/slow/painful query ride. They’re a critical part that deserves some housekeeping once in a while. So, what should you check? In no particular order, here are some things to look at:

1. Unused indexes

With the sys schema, it is pretty easy to find unused indexes: use the schema_unused_indexes view.

mysql> select * from sys.schema_unused_indexes;
+---------------+-----------------+-------------+
| object_schema | object_name     | index_name  |
+---------------+-----------------+-------------+
| world         | City            | CountryCode |
| world         | CountryLanguage | CountryCode |
+---------------+-----------------+-------------+
2 rows in set (0.01 sec)

This view is based on the performance_schema.table_io_waits_summary_by_index_usage table, which will require enabling the Performance Schema, the events_waits_current consumer and the wait/io/table/sql/handler instrument. PRIMARY (key) indexes are ignored.

If you don’t have them enabled, just execute these queries:

update performance_schema.setup_consumers set enabled = 'yes' where name = 'events_waits_current';
update performance_schema.setup_instruments set enabled = 'yes' where name = 'wait/io/table/sql/handler';

Quoting the documentation:

“To trust whether the data from this view is representative of your workload, you should ensure that the server has been up for a representative amount of time before using it.”

And by representative amount, I mean representative: 

  • Do you have a weekly job? Wait at least one week
  • Do you have monthly reports? Wait at least one month
  • Don’t rush!

Once you’ve found unused indexes, remove them.

2. Duplicated indexes

You have two options here:

  • pt-duplicate-key-checker
  • the schema_redundant_indexes view from sys_schema

The pt-duplicate-key-checker is part of Percona Toolkit. The basic usage is pretty straightforward:

[root@e51d333b1fbe mysql-sys]# pt-duplicate-key-checker
# ########################################################################
# world.CountryLanguage
# ########################################################################
# CountryCode is a left-prefix of PRIMARY
# Key definitions:
#   KEY `CountryCode` (`CountryCode`),
#   PRIMARY KEY (`CountryCode`,`Language`),
# Column types:
#      	  `countrycode` char(3) not null default ''
#      	  `language` char(30) not null default ''
# To remove this duplicate index, execute:
ALTER TABLE `world`.`CountryLanguage` DROP INDEX `CountryCode`;
# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes   2952
# Total Duplicate Indexes  1
# Total Indexes            37

Now, the schema_redundant_indexes view is also easy to use once you have sys schema installed. The difference is that it is based on the information_schema.statistics table:

mysql> select * from schema_redundant_indexesG
*************************** 1. row ***************************
              table_schema: world
                table_name: CountryLanguage
      redundant_index_name: CountryCode
   redundant_index_columns: CountryCode
redundant_index_non_unique: 1
       dominant_index_name: PRIMARY
    dominant_index_columns: CountryCode,Language
 dominant_index_non_unique: 0
            subpart_exists: 0
            sql_drop_index: ALTER TABLE `world`.`CountryLanguage` DROP INDEX `CountryCode`
1 row in set (0.00 sec)

Again, once you find the redundant index, remove it.

3. Potentially missing indexes

The statements summary tables from the performance schema have several interesting fields. For our case, two of them are pretty important: NO_INDEX_USED (means that the statement performed a table scan without using an index) and NO_GOOD_INDEX_USED (“1” if the server found no good index to use for the statement, “0” otherwise).

Sys schema has one view that is based on the performance_schema.events_statements_summary_by_digest table, and is useful for this purpose: statements_with_full_table_scans, which lists all normalized statements that have done a table scan.

For example:

mysql> select * from world.CountryLanguage where isOfficial = 'F';
55a208785be7a5beca68b147c58fe634  -
746 rows in set (0.00 sec)
mysql> select * from statements_with_full_table_scansG
*************************** 1. row ***************************
                   query: SELECT * FROM `world` . `Count ... guage` WHERE `isOfficial` = ?
                      db: world
              exec_count: 1
           total_latency: 739.87 us
     no_index_used_count: 1
no_good_index_used_count: 0
       no_index_used_pct: 100
               rows_sent: 746
           rows_examined: 984
           rows_sent_avg: 746
       rows_examined_avg: 984
              first_seen: 2016-09-05 19:51:31
               last_seen: 2016-09-05 19:51:31
                  digest: aa637cf0867616c591251fac39e23261
1 row in set (0.01 sec)

The above query doesn’t use an index because there was no good index to use, and thus was reported. See the explain output:

mysql> explain select * from world.CountryLanguage where isOfficial = 'F'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 984
        Extra: Using where

Note that the “query” field reports the query digest (more like a fingerprint) instead of the actual query.

In this case, the CountryLanguage table is missing an index over the “isOfficial” field. It is your job to decide whether it is worth it to add the index or not.
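
If you decide it’s worth it, adding the index is a one-liner (the index name idx_isofficial is arbitrary; note that with a two-value enum, the optimizer may still prefer a full scan when looking up the more common value):

```sql
ALTER TABLE world.CountryLanguage ADD INDEX idx_isofficial (IsOfficial);

-- Re-run EXPLAIN afterwards to confirm whether the optimizer picks it up
EXPLAIN SELECT * FROM world.CountryLanguage WHERE IsOfficial = 'F';
```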

4. Multiple column indexes order

It was explained before that a multiple-column index beats index merge in all cases where such an index can be used, even though you might sometimes have to use index hints to make it work.

But when using them, don’t forget that the order matters. MySQL will only use a multi-column index if at least one value is specified for the first column in the index.

For example, consider this table:

mysql> show create table CountryLanguageG
*************************** 1. row ***************************
       Table: CountryLanguage
Create Table: CREATE TABLE `CountryLanguage` (
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `Language` char(30) NOT NULL DEFAULT '',
  `IsOfficial` enum('T','F') NOT NULL DEFAULT 'F',
  `Percentage` float(4,1) NOT NULL DEFAULT '0.0',
  PRIMARY KEY (`CountryCode`,`Language`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `countryLanguage_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

A query against the field “Language” won’t use an index:

mysql> explain select * from CountryLanguage where Language = 'English'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 984
        Extra: Using where

Simply because it is not the leftmost prefix for the Primary Key. If we add the “CountryCode” field, now the index will be used:

mysql> explain select * from CountryLanguage where Language = 'English' and CountryCode = 'CAN'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: const
possible_keys: PRIMARY,CountryCode
          key: PRIMARY
      key_len: 33
          ref: const,const
         rows: 1
        Extra: NULL

Now, you’ll have to also consider the selectivity of the fields involved. Which is the preferred order?

In this case, the “Language” field has a higher selectivity than “CountryCode”:

mysql> select count(distinct CountryCode)/count(*), count(distinct Language)/count(*) from CountryLanguage;
+--------------------------------------+-----------------------------------+
| count(distinct CountryCode)/count(*) | count(distinct Language)/count(*) |
+--------------------------------------+-----------------------------------+
|                               0.2368 |                            0.4644 |
+--------------------------------------+-----------------------------------+

So in this case, if we create a multi-column index, the preferred order will be (Language, CountryCode).
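
A sketch of such a composite index (the name idx_lang_country is arbitrary):

```sql
ALTER TABLE world.CountryLanguage
    ADD INDEX idx_lang_country (Language, CountryCode);

-- With Language as the leftmost column, a filter on Language alone
-- can now use this index as well:
EXPLAIN SELECT * FROM world.CountryLanguage WHERE Language = 'English';
```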

Placing the most selective columns first is a good idea when there is no sorting or grouping to consider, and thus the sole purpose of the index is to optimize WHERE lookups. You might need to choose the column order so that it’s as selective as possible for the queries you’ll run most.

Now, is this good enough? Not really. What about special cases where the table doesn’t have an even distribution? When a single value is present way more times than all the others? In that case, no index will be good enough. Be careful not to assume that average-case performance is representative of special-case performance. Special cases can wreck performance for the whole application.

In conclusion, we depend heavily on proper indexes. Give them some love and care once in a while, and the database will be very grateful.

All the examples were done with the following MySQL and Sys Schema version:

mysql> select * from sys.version;
+-------------+-----------------+
| sys_version | mysql_version   |
+-------------+-----------------+
| 1.5.1       | 5.6.31-77.0-log |
+-------------+-----------------+

by Daniel Guzmán Burgos at September 09, 2016 05:44 PM

September 08, 2016

Peter Zaitsev

MySQL Replication Troubleshooting: Q & A

MySQL Replication Troubleshooting

In this blog, I will provide answers to the Q & A for the MySQL Replication Troubleshooting webinar.

First, I want to thank everybody for attending the August 25 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: Hi Sveta. One question: how is it possible to get N previous events using the SHOW BINLOG EVENTS command? For example, the position is 999 and I want to analyze the previous five events. Is it possible?

A: No, there is no such option. You cannot get the previous five events using SHOW BINLOG EVENTS. However, you can use mysqlbinlog with the --stop-position option and tail its output.
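
For example (the binlog file name and position are placeholders):

```shell
# Print events up to binlog position 999, then look at the last few
mysqlbinlog --stop-position=999 mysql-bin.000001 | tail -n 40
```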

Q: We are having issues with inconsistencies over time. We also have a lot of “waiting for table lock” statuses during high volume usage. Would changing these tables to InnoDB help the replicated database remain consistent?

A: Do you use MyISAM? Switching to InnoDB might help, but it depends on what types of queries you use. For example, if you often use the LOCK TABLE command, that will cause a "waiting for table lock" error for InnoDB too. Regarding data consistency between the master and slave, you need to use row-based replication.

Q: For semi-sync replication, what’s the master’s behavior when the master never received ACK from any of the slaves?

A: It will time out after rpl_semi_sync_master_timeout milliseconds, and then switch to asynchronous replication.

Q: We’re using MySQL on r3.4xlarge EC2 instances (16 CPU). We use RBR. innodb_read_io_threads and innodb_write_io_threads =4. We often experience lags. Would increasing these to eight offer better IO for slaves? What other parameters could boost slave IO?

A: Yes, an increased number of IO threads would most likely improve performance. Other parameters that could help are similar to the ones discussed in the "InnoDB Troubleshooting" and "Introduction to Troubleshooting Performance: What Affects Query Execution?" webinars. You need to pay attention to InnoDB options that affect IO (innodb_thread_concurrency, innodb_flush_method, innodb_flush_log_at_trx_commit, innodb_flush_log_at_timeout) and general IO options, such as sync_binlog.

Q: How many masters can I have working together?

A: What do you mean by “how many masters can [you] have working together”? Do you mean circular replication or a multi-master setup? In any case, the only limitation is hardware. For a multi-master setup you should ensure that the slave has enough resources to process all requests. For circular replication, ensure that each of the masters in the chain can handle the increasing number of writes as they replicate down the chain, and do not lead to permanently increasing slave lags.

Q: What’s the best way to handle auto_increment?

A: Follow the advice in the user manual: set auto_increment_offset to a unique value on each of the servers, set auto_increment_increment to the number of servers, and never update auto-incremented columns manually.
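
For a setup with exactly two masters, that advice might look like this (illustrative values; persist them in each server's my.cnf as well):

```sql
-- Server 1:
SET GLOBAL auto_increment_increment = 2;  -- total number of servers
SET GLOBAL auto_increment_offset    = 1;  -- unique per server
-- Server 2 uses the same increment with auto_increment_offset = 2
```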

Q: I configured multi-threaded replication. Sometimes the replication lag keeps increasing while the slave is doing “invalidating query cache entries(table)”. What should I do to fine-tune it?

A: The status "invalidating query cache entries(table)" means that the query cache is invalidating entries for tables changed by a command currently being executed by the slave SQL thread. To avoid this issue, keep the query cache small (not larger than 512 MB) and de-fragment it from time to time using the FLUSH QUERY CACHE command.
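
A sketch of that tuning (the 512MB cap comes from the answer above):

```sql
SET GLOBAL query_cache_size = 512 * 1024 * 1024;  -- keep the cache small
FLUSH QUERY CACHE;  -- defragment without removing cached results
```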

Q: Sometimes when IO is slow, and during lag, we see this info while reading events from the relay log: “Waiting for master to send event”. How do we troubleshoot this to get more details?

A: The "Waiting for master to send event" state shows that the slave IO thread has sent a request for a new event, and is waiting for the event from the master. If you believe it hasn’t received the event in a timely fashion, check the error log files on both the master and slave for connection errors. If there is no error message, or if the message doesn’t provide enough information to solve the issue, use the network troubleshooting methods discussed in the “Troubleshooting hardware resource usage” webinar.


by Sveta Smirnova at September 08, 2016 06:53 PM

Percona is Hiring: Director of Platform Engineering

percona is hiring

Percona is hiring a Director of Platform Engineering. Find out more!

At Percona, we recognize you need much more than just a database server to successfully run a database-powered infrastructure. You also need strong tools that deploy, manage and monitor the software. Percona’s Platform Engineering group is responsible just for that. They build next-generation open source solutions for the deployment, monitoring and management of open source databases.

This team is currently responsible for products such as Percona Toolkit, Percona Monitoring Plugins and Percona Monitoring and Management.

Percona builds products that advance state-of-the-art open source software. Our products help our customers monitor and manage their databases. They help our services team serve customers faster, better and more effectively.

The leader of the Platform Engineering group needs a strong vision, as well as an understanding of market trends and best practices for automation, monitoring and management – in the cloud and on premises. This person must have some past technical operations background and experience building and leading engineering teams that have efficiently delivered high-quality software. The ideal candidate will also understand the nature of open source software development and have experience working with distributed teams.

This position is for a “player coach” – you will get your hands dirty writing code, performing quality assurance, producing great documentation and assisting customers with troubleshooting.

We are not looking for extensive experience with a particular programming language, but qualified candidates should be adept at learning new programming languages. Currently, our teams use a combination of Perl, Python, Go and JavaScript.

The Director of Platform Engineering reports to Vadim Tkachenko, CTO and VP of Engineering. They will also work closely with myself, other senior managers and experts at Percona.

Interested? Please apply here on Percona’s website.

by Peter Zaitsev at September 08, 2016 06:20 PM

Jean-Jerome Schmidt

Planets9s - Try the new ClusterControl 1.3.2 with its new deployment wizard

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Try the new ClusterControl 1.3.2 with its new deployment wizard

This week we’re delighted to announce the release of ClusterControl 1.3.2, which includes the following features: a new alarm viewer and new deployment wizard for MySQL, MongoDB & PostgreSQL making it ever easier to deploy your favourite open source databases; it also includes deployment of MongoDB sharded clusters as well as MongoDB advisors. If you haven’t tried it out yet, now is the time to download this latest release and provide us with your feedback.

Download the new ClusterControl

New partnership with WooServers helps start-ups challenge Google, Amazon and Microsoft

In addition to announcing the new ClusterControl 1.3.2 this week, we’ve also officially entered into a new partnership with WooServers to bring ClusterControl to web hosting. WooServers is a web hosting platform, used by 5,500 businesses, such as WhiteSharkMedia and SwiftServe to host their websites and applications. With ClusterControl, WooServers makes available a managed service that includes comprehensive infrastructure automation and management of MySQL-based database clusters. The service is available on WooServers data centers, as well as on Amazon Web Services and Microsoft Azure.

Find out more

Sign up for Part 2 of our MySQL Query Tuning Trilogy: Indexing and EXPLAIN

You can now sign up for Part 2 of our webinar trilogy on MySQL Query Tuning. In this follow up webinar to the one on process and tools, we’ll cover topics such as SQL tuning, indexing, the optimizer and how to leverage EXPLAIN to gain insight into execution plans. More specifically, we’ll look at how B-Tree indexes are built, indexes MyISAM vs. InnoDB, different index types such as B-Tree, Fulltext and Hash, indexing gotchas and an EXPLAIN walkthrough of a query execution plan.

Sign up today

How to set up read-write split in Galera Cluster using ProxySQL

ProxySQL is an SQL-aware load balancer for MySQL and MariaDB. A scheduler was recently added, making it possible to execute external scripts from within ProxySQL. In this new blog post, we’ll show you how to take advantage of this new feature to perform read-write splits on your Galera Cluster.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

by Severalnines at September 08, 2016 02:23 PM

Colin Charles

Speaking at Percona Live Europe Amsterdam

I’m happy to speak at Percona Live Europe Amsterdam 2016 again this year (just look at the awesome schedule). On my agenda:

I’m also signed up for the Community Dinner @ Booking.com, and I reckon you should as well – only 35 spots remain!

Go ahead and register now. You should be able to search Twitter or the Percona blog for discount codes :-)

by Colin Charles at September 08, 2016 11:20 AM

September 07, 2016

Peter Zaitsev

Percona Live Europe featured talk with Igor Canadi — Everything you wanted to know about MongoRocks

percona live europe featured talk

Welcome to another Percona Live Europe featured talk with one of the Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Igor Canadi, Software Engineer at Facebook, Inc. His talk will be on Everything you wanted to know about MongoRocks. MongoRocks is MongoDB with RocksDB storage engine. It was developed by Facebook, where it’s used to power mobile backend as a service provider Parse.

I had a chance to speak with Igor and learn a bit more about these questions:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it?

Igor: After I finished my undergrad at the University of Zagreb in Croatia, I joined the University of Wisconsin-Madison’s Masters program. Even though UW-Madison is famous for its work on databases, during my two years there I worked in a different area. However, when I joined Facebook after school, I heard of a cool new project called RocksDB. Everything about building a new storage engine sounded exciting to me, although I had zero idea how thrilling the ride would actually be. The best part was working with and getting to know amazing people from Facebook, Parse, MongoDB, Percona, and many other companies that are using or experimenting with RocksDB.

Percona: Your talk is called “Everything you wanted to know about MongoRocks.” Briefly, what is MongoRocks and why did it get developed?

Igor: Back in 2014 MongoDB announced that they are building a pluggable storage engine API, which would enable MongoDB users to seamlessly choose a storage engine that works best for their workload. Their first prototype was actually using RocksDB as a storage engine, which was very exciting for us. However, they bought WiredTiger soon after, another great storage engine, and decided to abandon MongoDB+RocksDB project. At the same time, Parse was running into scaling challenges with their MongoDB deployment. We decided to help out and take over the development of MongoRocks. We started rolling it out at Parse in March of 2015 already and completed the rollout in October. Running MongoRocks instead of MongoDB with the MMap storage engine resulted in much greater efficiency and lower latencies in some scenarios. Some of the experiences are captured in Parse’s blog posts: http://blog.parse.com/announcements/mongodb-rocksdb-parse/ and http://blog.parse.com/learn/engineering/mongodb-rocksdb-writing-so-fast-it-makes-your-head-spin/

Percona: What are the workloads and database environments that are best suited for a MongoRocks deployment? Do you see and expansion of the solution to encompass other scenarios?

Igor: Generally speaking, MongoRocks should compress really well. Over the years of using LSM engines, we learned that its compression rates are hard to beat. The difference can sometimes be substantial. For example, many benchmarks of MyRocks, which is a MySQL with RocksDB storage engines, have shown that compressed InnoDB uses two times as much space as compressed RocksDB. With better compression, more of your data fits in memory, which could also improve read latencies and lower the stress on storage media. However, this is a tricky question to answer generally. It really depends on the metrics you care about. One great thing about Mongo and different storage engines is that the replication format is the same across all of them, so it’s simple to try it out and see how it performs under your workload. You can just add an additional node in your replica set that’s using RocksDB and monitor the metric you care about on that node.

Percona: What are the unique database requirements at Facebook that keep you awake at night? What would you most like to see feature-wise in MongoDB in the near future (or any database technology)?

Igor: One of the most exciting database projects that we're working on at Facebook is MyRocks, which I mentioned previously. Currently, we use MySQL with InnoDB to store our Facebook graph, and we are experimenting with replacing that with MyRocks. The main motivation behind the project is 2x better compression rates, but we also see better performance in some areas. If you're attending Percona Live Europe, I encourage you to attend either Mark Callaghan's talk on MyRocks or Yoshinori's 3-hour tutorial to learn more.

Percona: What are you looking forward to most at Percona Live Europe this year?

Igor: The best part of attending conferences is the people. I am looking forward to seeing old friends and meeting new ones. If you like to talk storage engines, hit me up!

You can read more about Igor's thoughts on MongoRocks on his Twitter feed.

Want to find out more about Igor, Facebook and MongoRocks? Register for Percona Live Europe 2016, and come see his talk Everything you wanted to know about MongoRocks.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals, bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

by Dave Avery at September 07, 2016 05:47 PM

Get MySQL Passwords in Plain Text from .mylogin.cnf

This post will tell you how to get MySQL passwords in plain text using the .mylogin.cnf file.

Since MySQL 5.6.6, it has been possible to store MySQL credentials in an encrypted login path file named .mylogin.cnf, using the mysql_config_editor tool. This is better than storing them in plain text, anyway.

What if I need to read this password in plain text?

Perhaps because I didn’t save it? It might be that I don’t need it for long (as I can reset it), but it’s important that I get it. 😎

Unfortunately (or intentionally), mysql_config_editor doesn't allow it.

[root@db01 ~]# cat /root/.mylogin.cnf
????uUd????ٞN??3k??ǘ);??Ѻ0
                         ?'?(??W.???Xܽ<'?C???ha?$
??
r(?q`?+[root@db01 ~]#
[root@db01 ~]#
[root@db01 ~]# mysql_config_editor print --all
[client]
user = root
password = *****
[root@db01 ~]#

I wrote this blog post because I just faced this issue. I needed to get the password out of there. Surprisingly, it is simpler than I thought. While looking for an answer, I found that some people had created scripts to decrypt it (as it uses the AES-128 ECB algorithm), sometimes digging into the MySQL source code or simply using a scripting language.

However, it turns out to be very simple: mysql_config_editor does not provide this option, but my_print_defaults does!

[root@db01 ~]# my_print_defaults -s client
--user=root
--password=Xb_aOh1-R33_,_wPql68
[root@db01 ~]#

my_print_defaults is a standard tool in the MySQL server package. Keep your passwords safe!
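
For the curious, the scripts mentioned earlier work because .mylogin.cnf is obfuscation rather than real protection: the AES key is derived from a seed stored in the file itself. Below is a Python sketch of the key derivation and file layout. The AES-128-ECB decryption step itself is omitted, and the layout (4 unused bytes, a 20-byte seed, then length-prefixed ciphertext chunks) is an assumption based on common descriptions of the format, so verify it against your MySQL version.

```python
import io
import struct

def mylogin_key(seed: bytes) -> bytes:
    """Fold the 20-byte seed from the file header into a 16-byte AES key
    by XOR-ing each seed byte into the key buffer round-robin."""
    key = bytearray(16)
    for i, b in enumerate(seed):
        key[i % 16] ^= b
    return bytes(key)

def read_mylogin(data: bytes):
    """Parse a .mylogin.cnf blob: return the 20-byte seed and the raw
    AES-128-ECB ciphertext chunks (one per line of the option file)."""
    f = io.BytesIO(data)
    f.read(4)                      # unused bytes
    seed = f.read(20)              # key seed
    chunks = []
    while True:
        head = f.read(4)
        if len(head) < 4:          # end of file
            break
        (length,) = struct.unpack("<i", head)
        chunks.append(f.read(length))
    return seed, chunks
```

Decrypting each chunk with the derived key in any AES-128-ECB implementation and stripping the trailing padding yields the clear-text option file, which is why my_print_defaults can print it.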

by Roman Vynar at September 07, 2016 04:25 PM

Jean-Jerome Schmidt

Press Release: Severalnines and WooServers bring ClusterControl to web hosting

New partnership helps start-ups challenge Google, Amazon and Microsoft

Stockholm, Sweden and anywhere else in the world - 07 September 2016 - Severalnines, the provider of database automation and management software for open source databases, today announced its latest partnership with WooServers. WooServers is a web hosting platform, used by 5,500 businesses, such as WhiteSharkMedia and SwiftServe to host their websites and applications.

In this partnership with Severalnines, WooServers will make available a managed service that includes comprehensive infrastructure automation and management of MySQL-based database clusters. The service is available in WooServers' data centers, as well as on Amazon Web Services and Microsoft Azure. The main target group is online businesses that rely on solid IT infrastructure but wish to avoid the operational heavy lifting of managing highly available databases across data centers, or even across different cloud providers.

The Severalnines and WooServers’ partnership began in August 2015. WooServers felt there was a gap in the market for providing managed database services across different cloud providers. A large portion of their own clients were using more than one cloud provider to take advantage of features offered by one provider over another. However, without a distributed database infrastructure, it is hard to be cloud agnostic. Ultimately, where the data is determines where the service or application resides. And the larger the data, the harder it is to move.

Businesses can now quickly be up and running with a distributed database across the cloud services of their choice, knowing the service is managed by hosting and database experts. It is powered by MySQL clusters, with ClusterControl to automate deployment, and operational tasks such as configuration, topology changes, patching, backups, and failure recovery.

Aleksey Krasnov, co-founder of WooServers, said, “Cloud providers are trying hard to differentiate from each other, and a hyper competitive market is good news for clients. But there is very little in terms of cross-cloud compatibility, especially for database services. Being tied to a particular service may not be optimal in the long run, because if the costs escalate and you want to move to a more cost-effective vendor, you’re in for a big surprise.

We provide choice and flexibility to our clients, and our service is cloud agnostic. Since we’re using ClusterControl, it means you could even bring the data in-house and run on your own infrastructure.“

Vinay Joosery, Severalnines CEO, said, “Operational management of databases is not something you’d want your developers to spend too much time on, which makes a managed service a very logical choice. However, yielding too much control over your data to any single cloud vendor is scary. WooServers provides a service that allows businesses to benefit from cloud economics, while keeping control over their data.”

One customer who is relying on this partnership is digital advertising services provider Plexiads. After its launch in 2015, investing in IT development was a priority for Plexiads, to ensure they can compete with Google Adsense by delivering the highest quality service to its customers all over the world, especially in Africa and the Middle East. A highly available database that is resistant to outages, is vital to keep customers’ advertising running and capitalise on every revenue opportunity.

Abdelkhalek Baou, Co-CEO and Founder of Plexiads, said: “Our products challenge web giants such as Google Adsense and Adwords, and it’s important to us to keep customers satisfied in a market which is worth $60 billion. Data is a critical part of our business, and we were looking for a partner to help us build and manage a state of the art data tier for our applications. We have been very happy with WooServers and Severalnines, and feel confident we will be able to expand the infrastructure as the business grows.”

About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skill levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. The company has enabled over 8,000 deployments to date via its popular online database configurator, and currently counts BT, Orange, Cisco, CNRS, Technicolour, AVG, Ping Identity and Paytrail among its customers. Severalnines is a private company headquartered in Stockholm, Sweden, with offices in Singapore and Tokyo, Japan. To see who is using Severalnines today, visit http://www.severalnines.com/company.

About WooServers

Since 2009, WooServers has offered web hosting and virtual and dedicated servers with full management for the price of its competitors' unmanaged solutions. Serving more than 5,500 clients, WooServers has always strived to take all the system administration hassle off the client's hands and provide hands-on support unmatched in the industry.

During the past 3 years, WooServers made a considerable expansion into the SMB client segment in order to utilize its infrastructure and DB management experience and offer custom solutions to clients with more complicated requirements. As a result, a number of new products were introduced, such as Microsoft Azure Management and High-Availability Database Clusters in partnership with Severalnines.

by Severalnines at September 07, 2016 02:35 PM

September 06, 2016

Peter Zaitsev

MyRocks Docker images

In this post, I'll point you to MyRocks Docker images with binaries, allowing you to install and play with the software.

During the @Scale conference, Facebook announced that MyRocks is mature enough that it has been installed on 5% of Facebook’s MySQL slaves. This has saved 50% of the space on these slaves, which allows them to decrease the number of servers by half. Check out the announcement here:  https://code.facebook.com/posts/190251048047090/myrocks-a-space-and-write-optimized-mysql-database/

Those are pretty impressive numbers, so I decided to take a serious look at MyRocks. The biggest showstopper is usually binary availability, since Facebook only provides the source code: https://github.com/facebook/mysql-5.6.

To make experimenting easier, we prepared a Docker image with compiled binaries. You can get the image from https://hub.docker.com/r/perconalab/myrocks/.

To start MyRocks:

docker run -d --name myr -P  perconalab/myrocks

To access it, use a regular MySQL client. Note that with -P, Docker publishes the container's port 3306 on a random host port; docker port myr 3306 shows the mapping, and you pass it to the client:

mysql -h127.0.0.1 -P<port>

From there you should see RocksDB installed:

show engines;
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine | Support | Comment | Transactions | XA | Savepoints |
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| ROCKSDB | DEFAULT | RocksDB storage engine | YES | YES | YES |

I hope this makes it easier to start experimenting with MyRocks!

by Vadim Tkachenko at September 06, 2016 08:28 PM

MongoDB at Percona Live Europe

This year, you will find a great deal about MongoDB at Percona Live Europe.

As we continue to work on growing the independent MongoDB ecosystem, this year’s Percona Live Europe in Amsterdam includes many talks about MongoDB. If your company uses MongoDB technologies, is focused exclusively on developing with MongoDB or MongoDB operations, or is just evaluating MongoDB, attending Percona Live Europe will prove a valuable experience.  

As always with Percona Live conferences, the focus is squarely on the technical content — not sales pitches. We encourage our speakers to tell the truth: the good, the bad and the ugly. There is never a “silver bullet” when it comes to technology — only tradeoffs between different solution options.

As someone who has worked in database operations for more than 15 years, I recognize and respect the value of “negative information.” I like knowing what does not work, what you should not do and where trouble lies. Negative information often proves more valuable than knowing how great the features of a specific technology work — especially since the product’s marketing team tends to highlight those very well (and they seldom require independent coverage).

For MongoDB at this year’s Percona Live Europe:
  • We have talks about MongoRocks, a RocksDB powered storage engine for MongoDB — the one you absolutely need to know about if you’re looking to run the most efficient MongoDB deployment at scale!  
  • We will cover MongoDB Backups best practices, as well as several talks about MongoDB monitoring and management  (1, 2, 3) — all of them with MongoDB Community Edition and Percona Server for MongoDB (so they don’t require a MongoDB Enterprise subscription).

There will also be a number of talks about how MongoDB interfaces with other technologies. We show how ToroDB can use the MongoDB protocol while storing data in a relational database (and why that might be a good idea), we contrast and compare MySQL and MongoDB geospatial features, and we examine MongoDB from a MySQL DBA's point of view.

We also show how to use Apache Spark to unify data from MongoDB, MySQL, and Redis, and discuss best practices for choosing databases for different application needs.

Finally, if you’re just starting with MongoDB and would like a jump start before attending more detailed MongoDB talks, we’ve got a full day MongoDB 101 tutorial for you.

Join us for the full conference, or register for just one day if that is all your schedule allows. But come to Percona Live Europe in Amsterdam on October 3-5 to get the best and latest MongoDB information.

by Peter Zaitsev at September 06, 2016 03:28 PM

Jean-Jerome Schmidt

ClusterControl 1.3.2 Released with Key MongoDB Features - Sharded Deployments, Cluster-Consistent Backups, Advisors and more

The Severalnines team is pleased to announce the release of ClusterControl 1.3.2.

This release contains new features, such as deploying MongoDB sharded clusters and scheduling cluster-consistent backups, MongoDB Advisors, a new alarm viewer and new deployment wizard for MySQL, MongoDB & PostgreSQL, along with performance improvements and bug fixes.

Highlights

  • For MongoDB
    • Deploy or add existing MongoDB sharded clusters
    • Support for Percona consistent MongoDB backup
    • Manage MongoDB configurations
    • MongoDB Advisors
    • New overview page for sharded clusters and performance graphs
  • For MySQL, MongoDB & PostgreSQL
    • New Alarm Viewer
    • New Deployment Wizard

For more details and resources:

Deploy or add existing MongoDB sharded clusters

Not only can users now deploy MongoDB sharded clusters, but adding an existing sharded cluster to ClusterControl is as easy as adding a replica set: all shard routers in the cluster need to be specified with their credentials, and ClusterControl will automatically discover all shards and replica sets in the cluster. It supports Percona Server for MongoDB and MongoDB Inc. v3.2.

Manage MongoDB configurations

MongoDB configuration management includes functionality such as: change the configuration, import configurations for all nodes and define/alter templates. Users can immediately change the whole configuration file and write this configuration back to the database node. With this latest release of ClusterControl, users can manage MongoDB configurations even more intuitively than before.

New overview page for sharded clusters and performance graphs

The MongoDB stats and performance overview can be found under the Performance tab of your ClusterControl instance. Mongo Stats is an overview of the output of mongostat and the Performance overview gives a good graphical overview of the MongoDB opcounters. It now includes a per replicaSet/Config Server/router view for sharded clusters, alongside performance graphs.

Support for Cluster Consistent MongoDB backup

ClusterControl now supports Percona’s Consistent MongoDB backup, if installed on the ClusterControl controller node. Percona’s Consistent MongoDB backup is able to create a consistent cluster backup across many separate shards. It auto-discovers healthy members for backup by considering replication lag, replication 'priority' and by preferring 'hidden' members.

New Alarm Viewer

The enhanced viewer gives the user a better overview of events, especially if you have multiple clusters. Alarms and jobs for all clusters are now consolidated in a single view with a timeline of each event/alarm. Click on each event name to view related logs to the event/alarm and take the relevant actions required to avoid potential issues.

New Deployment Wizard

It is now possible to create entire master-slave setups in one go via our new deployment wizard. In previous versions, one had to first create a master, and afterwards, add slaves to it. You can now do it all in one go. The wizard supports MySQL Replication, MySQL Galera, MySQL/NDB, MongoDB ReplicaSet, MongoDB Shards and PostgreSQL.

There is a bunch of other improvements that we have not mentioned here. You can find all details in the ChangeLog.

We encourage you to test this latest release and provide us with your feedback. If you’d like a demo, feel free to request one.

With over 8,000 users to date, ClusterControl is the leading, platform independent automation and management solution for MySQL, MariaDB, Percona, MongoDB and PostgreSQL.

Thank you for your ongoing support, and happy clustering!

For additional tips & tricks, follow our blog: http://www.severalnines.com/blog/.

by Severalnines at September 06, 2016 12:27 PM

Shlomi Noach

gh-ost 1.0.17: Hooks, Sub-second lag control, Amazon RDS and more

gh-ost version 1.0.17 is now released, with various additions and fixes. Here are some notes of interest:

Hooks

gh-ost now supports hooks. These are your own executables that gh-ost will invoke at particular points of interest (validation pass, about to cut-over, success, failure, status, etc.)

gh-ost will set various environment variables for your executables to pick up, passing along such information as migrated/ghost table name, elapsed time, processed rows, migrated host etc.
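
For illustration, a minimal hook could be a small shell script that logs progress. The GH_OST_* variable names below follow gh-ost's hooks documentation, but treat them as assumptions and verify them against your version:

```shell
#!/bin/bash
# Hypothetical gh-ost status hook: append a one-line progress report
# to a log file every time gh-ost invokes it.
log_status() {
  echo "migrating ${GH_OST_DATABASE_NAME:-?}.${GH_OST_TABLE_NAME:-?}" \
       "on ${GH_OST_MIGRATED_HOST:-?}: ${GH_OST_ELAPSED_SECONDS:-0}s elapsed"
}
log_status >> /tmp/gh-ost-status.log
```

Drop the script into the directory you pass via --hooks-path, name it after the hook point (e.g. gh-ost-on-status), and make it executable.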

Sub-second lag control

At GitHub we're very strict about replication lag. We keep it well under 1 second at most times. gh-ost can now identify sub-second lag on replicas (well, you need to supply it with the right query). Our current production migrations are set by default with --max-lag-millis=500 or less, and our most intensive migrations keep replication lag well below 1sec or even below 500ms.
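
For illustration, such a lag query typically reads a heartbeat table that an event on the master updates at sub-second intervals. The table and column names below are made up, and whether the result is interpreted as seconds or milliseconds depends on your gh-ost version, so check the documentation for --replication-lag-query before copying this:

```
gh-ost \
  --max-lag-millis=500 \
  --replication-lag-query="SELECT UNIX_TIMESTAMP(NOW(6)) - UNIX_TIMESTAMP(ts) FROM meta.heartbeat" \
  ...
```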

No SUPER

The SUPER privilege is required to set global binlog_format='ROW' and for STOP SLAVE; START SLAVE;

If you know your replica has RBR, you can pass --assume-rbr and skip those steps.

RDS

Hooks + No Super = RDS, as seems to be the case. For --test-on-replica you will need to supply your own gh-ost-on-stop-replication hook, to stop your RDS replica at cut-over phase. See this tracking issue

master-master

While active-active is still not supported, you now have greater control over master-master topologies by being able to explicitly pick your master (gh-ost otherwise arbitrarily picks one of the co-masters). Do so by passing --assume-master-host. See the cheatsheet.

tungsten replicator

Similarly, gh-ost cannot crawl your tungsten topology, and you are able to specify --tungsten --assume-master-host=the.master.com. See cheatsheet.

Concurrent-rowcount

--exact-rowcount is awesomeness, keeping quite an accurate estimate of progress. With --concurrent-rowcount we begin the migration with a rough estimate, and execute select count(*) from your_table in parallel, updating our estimate later on throughout the migration.

Stricter, safer

gh-ost works in STRICT_ALL_TABLES mode, meaning it would fail rather than set the wrong value to a column.

In addition to unit tests and continuous testing in production, a set of local tests is growing, hopefully to run as CI tests later on.

Fixed problems

Fixed a time_zone-related bug and a high-unsigned-values bug; added a strict check for triggers, relaxed config file parsing, and more. Thank you to the community contributors for PRs, from ipv6 to typos!

Known issues

Issues coming and going at all times -- thank you for reporting Issues!

We have a confirmed bug with non-UTF charsets at this time. Some other minor issues and feature requests are open -- we'll take them as we go along.

Feedback requests

We are not testing gh-ost on RDS ourselves. We appreciate community feedback on this tracking issue.

We are not testing gh-ost on Galera/XtraDB cluster ourselves. We appreciate community feedback on this tracking issue.

We value submitted Issues and questions.

Speaking

We will be presenting gh-ost in the next month:

Hope to see you there, and thank you again to all contributors!

by shlomi at September 06, 2016 09:44 AM

September 05, 2016

Jean-Jerome Schmidt

How to set up read-write split in Galera Cluster using ProxySQL

Edited on Sep 12, 2016 to correct the description of how ProxySQL handles session variables. Many thanks to Francisco Miguel for pointing this out.


ProxySQL is becoming more and more popular as an SQL-aware load balancer for MySQL and MariaDB. In previous blog posts, we covered the installation of ProxySQL and its configuration in a MySQL replication environment. We've also covered how to set up ProxySQL to perform failovers executed from ClusterControl. At that time, Galera support in ProxySQL was a bit limited - you could configure Galera Cluster and split traffic across all nodes, but there was no easy way to implement read-write split of your traffic. The only way to do that was to create a daemon which would monitor the Galera state and update the weights of the backend servers defined in ProxySQL - a much more complex task than writing a small bash script.

In one of the recent ProxySQL releases, a very important feature was added - a scheduler, which allows you to execute external scripts from within ProxySQL as often as every millisecond (well, as long as your script can execute within this time frame). This feature creates an opportunity to extend ProxySQL and implement setups which were not possible to build easily in the past due to the low granularity of cron scheduling. In this blog post, we will show you how to take advantage of this new feature and create a Galera Cluster with read-write split performed by ProxySQL.

First, we need to install and start ProxySQL:

[root@ip-172-30-4-215 ~]# wget https://github.com/sysown/proxysql/releases/download/v1.2.1/proxysql-1.2.1-1-centos7.x86_64.rpm

[root@ip-172-30-4-215 ~]# rpm -i proxysql-1.2.1-1-centos7.x86_64.rpm
[root@ip-172-30-4-215 ~]# service proxysql start
Starting ProxySQL: DONE!

Next, we need to download a script which we will use to monitor Galera status. Currently, it has to be downloaded separately, but in the next release of ProxySQL it should be included in the rpm. The script needs to be located in /var/lib/proxysql.

[root@ip-172-30-4-215 ~]# wget https://raw.githubusercontent.com/sysown/proxysql/master/tools/proxysql_galera_checker.sh

[root@ip-172-30-4-215 ~]# mv proxysql_galera_checker.sh /var/lib/proxysql/
[root@ip-172-30-4-215 ~]# chmod u+x /var/lib/proxysql/proxysql_galera_checker.sh

If you are not familiar with this script, you can check what arguments it accepts by running:

[root@ip-172-30-4-215 ~]# /var/lib/proxysql/proxysql_galera_checker.sh
Usage: /var/lib/proxysql/proxysql_galera_checker.sh <hostgroup_id write> [hostgroup_id read] [number writers] [writers are readers 0|1} [log_file]

As we can see, we need to pass a couple of arguments - the hostgroups for writers and readers, and the number of writers which should be active at the same time. We also need to pass information on whether writers can be used as readers and, finally, the path to a log file.

Next, we need to connect to ProxySQL's admin interface. For that, you need to know the credentials - you can find them in the configuration file, typically located at /etc/proxysql.cnf:

admin_variables=
{
        admin_credentials="admin:admin"
        mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
#       refresh_interval=2000
#       debug=true
}

Knowing the credentials and interfaces on which ProxySQL listens, we can connect to the admin interface and begin configuration.

[root@ip-172-30-4-215 ~]# mysql -P6032 -uadmin -padmin -h 127.0.0.1

First, we need to fill mysql_servers table with information about our Galera nodes. We will add them twice, to two different hostgroups. One hostgroup (with hostgroup_id of 0) will handle writes while the second hostgroup (with hostgroup_id of 1) will handle reads.

MySQL [(none)]> INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (0, '172.30.4.238', 3306), (0, '172.30.4.184', 3306), (0, '172.30.4.67', 3306);
Query OK, 3 rows affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (1, '172.30.4.238', 3306), (1, '172.30.4.184', 3306), (1, '172.30.4.67', 3306);
Query OK, 3 rows affected (0.00 sec)

Next, we need to add information about the users which will be used by the application. We used a plain text password here, but ProxySQL also accepts hashed passwords in MySQL format.

MySQL [(none)]> INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('sbtest', 'sbtest', 1, 0);
Query OK, 1 row affected (0.00 sec)

What’s important to keep in mind is the default_hostgroup setting - we set it to ‘0’, which means that, unless one of the query rules says otherwise, all queries will be sent to hostgroup 0 - our writers.

At this point we need to define query rules which will handle read/write split. First, we want to match all SELECT queries:

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
Query OK, 1 row affected (0.00 sec)

It is important to make sure you get the regex right. It is also crucial to note that we set the ‘apply’ column to ‘0’. This means that our rule won’t be the final one - a query, even if it matches the regex, will be tested against the next rule in the chain. You can see why we’ve done that when you look at our second rule:

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
Query OK, 1 row affected (0.00 sec)

We are looking for SELECT … FOR UPDATE queries - that’s why we couldn’t just finish checking our SELECT queries on the first rule. SELECT … FOR UPDATE should be routed to our write hostgroup, where the UPDATE will happen.

Those settings will work fine if autocommit is enabled and no explicit transactions are used. If your application uses transactions, one of the methods to make them work safely in ProxySQL is to use the following set of queries:

SET autocommit=0;
BEGIN;
...

The transaction is created and it will stick to the host where it was opened. You also need to have a query rule for BEGIN which would route it to the hostgroup for writers - in our case, we leverage the fact that, by default, all queries executed as the ‘sbtest’ user are routed to the writers’ hostgroup (‘0’), so there’s no need to add anything.

A second method would be to enable persistent transactions for our user (the transaction_persistent column in the mysql_users table should be set to ‘1’).

ProxySQL’s handling of other SET statements and user-defined variables is another thing we’d like to discuss a bit here. ProxySQL works on two levels of routing. First - query rules. You need to make sure all your queries are routed according to your needs. Then there is connection multiplexing - even when routed to the same host, every query you issue may in fact use a different connection to the backend. This makes things hard for session variables. Luckily, ProxySQL treats all queries containing the ‘@’ character in a special way - once it detects it, it disables connection multiplexing for the duration of that session. Thanks to that, we don’t have to worry that the next query won’t know a thing about our session variable.

The only thing we need to make sure of is that we end up in the correct hostgroup before connection multiplexing is disabled. To cover all cases, the ideal hostgroup in our setup would be the one with writers. This requires a slight change in the way we set our query rules (you may need to run ‘DELETE FROM mysql_query_rules’ if you already added the query rules we mentioned earlier).

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '.*@.*', 0, 1);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
Query OK, 1 row affected (0.00 sec)
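
To make the chaining concrete, here is a small, hypothetical Python simulation of how these three rules route queries. It is a simplification: real ProxySQL evaluates rules by rule_id and matches many more attributes than a single regex.

```python
import re

# The three rules above, in order: (match_pattern, destination_hostgroup, apply).
# A matching rule sets the destination; only a rule with apply=1 stops evaluation.
RULES = [
    (r".*@.*",               0, True),   # session variables -> writers, final
    (r"^SELECT.*",           1, False),  # reads -> readers, keep evaluating
    (r"^SELECT.*FOR UPDATE", 0, True),   # locking reads -> writers, final
]
DEFAULT_HOSTGROUP = 0  # default_hostgroup of the 'sbtest' user

def route(query: str) -> int:
    """Return the hostgroup a query would be sent to."""
    hostgroup = DEFAULT_HOSTGROUP
    for pattern, destination, apply_rule in RULES:
        if re.search(pattern, query):
            hostgroup = destination
            if apply_rule:
                break
    return hostgroup
```

A plain SELECT lands on the readers' hostgroup (1), while SELECT … FOR UPDATE, anything containing ‘@’ and everything else stays on the writers (0).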

Those two cases could become a problem in our setup, but as long as you are not affected by them (or if you use the proposed workarounds), we can proceed with the configuration. We still need to set up our script to be executed from ProxySQL:

MySQL [(none)]> INSERT INTO scheduler (id, active, interval_ms, filename, arg1, arg2, arg3, arg4, arg5) VALUES (1, 1, 1000, '/var/lib/proxysql/proxysql_galera_checker.sh', 0, 1, 1, 1, '/var/lib/proxysql/proxysql_galera_checker.log');
Query OK, 1 row affected (0.01 sec)

Additionally, because of the way Galera handles dropped nodes, we want to increase the number of attempts ProxySQL will make before it decides a host cannot be reached.

MySQL [(none)]> SET mysql-query_retries_on_failure=10;
Query OK, 1 row affected (0.00 sec)

Finally, we need to apply all changes we made to the runtime configuration and save them to disk.

MySQL [(none)]> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK; LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK; LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK; LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK; LOAD MYSQL VARIABLES TO RUNTIME; SAVE MYSQL VARIABLES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 64 rows affected (0.05 sec)

Ok, let’s see how things work together. First, verify that our script works by looking at /var/lib/proxysql/proxysql_galera_checker.log:

Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.184:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.238:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Changing server 0:172.30.4.238:3306 to status ONLINE
Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.67:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Changing server 0:172.30.4.67:3306 to status ONLINE
Fri Sep  2 21:43:15 UTC 2016 Check server 1:172.30.4.184:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Check server 1:172.30.4.238:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:16 UTC 2016 Check server 1:172.30.4.67:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:16 UTC 2016 Number of writers online: 3 : hostgroup: 0
Fri Sep  2 21:43:16 UTC 2016 Number of writers reached, disabling extra write server 0:172.30.4.238:3306 to status OFFLINE_SOFT
Fri Sep  2 21:43:16 UTC 2016 Number of writers reached, disabling extra write server 0:172.30.4.67:3306 to status OFFLINE_SOFT
Fri Sep  2 21:43:16 UTC 2016 Enabling config

Looks OK. Next, we can check the mysql_servers table:

MySQL [(none)]> select hostgroup_id, hostname, status from mysql_servers;
+--------------+--------------+--------------+
| hostgroup_id | hostname     | status       |
+--------------+--------------+--------------+
| 0            | 172.30.4.238 | OFFLINE_SOFT |
| 0            | 172.30.4.184 | ONLINE       |
| 0            | 172.30.4.67  | OFFLINE_SOFT |
| 1            | 172.30.4.238 | ONLINE       |
| 1            | 172.30.4.184 | ONLINE       |
| 1            | 172.30.4.67  | ONLINE       |
+--------------+--------------+--------------+
6 rows in set (0.00 sec)

Again, everything looks as expected - one host is taking writes (172.30.4.184), all three are handling reads. Let’s start sysbench to generate some traffic and then we can check how ProxySQL will handle failure of the writer host.

[root@ip-172-30-4-215 ~]# while true ; do sysbench --test=/root/sysbench/sysbench/tests/db/oltp.lua --num-threads=6 --max-requests=0 --max-time=0 --mysql-host=172.30.4.215 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=6033 --oltp-tables-count=32 --report-interval=1 --oltp-skip-trx=on --oltp-read-only=off --oltp-table-size=100000  run ;done

We are going to simulate a crash by killing the mysqld process on host 172.30.4.184. This is what you’ll see on the application side:

[  45s] threads: 6, tps: 0.00, reads: 4891.00, writes: 1398.00, response time: 23.67ms (95%), errors: 0.00, reconnects:  0.00
[  46s] threads: 6, tps: 0.00, reads: 4973.00, writes: 1425.00, response time: 25.39ms (95%), errors: 0.00, reconnects:  0.00
[  47s] threads: 6, tps: 0.00, reads: 5057.99, writes: 1439.00, response time: 22.23ms (95%), errors: 0.00, reconnects:  0.00
[  48s] threads: 6, tps: 0.00, reads: 2743.96, writes: 774.99, response time: 23.26ms (95%), errors: 0.00, reconnects:  0.00
[  49s] threads: 6, tps: 0.00, reads: 0.00, writes: 1.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  50s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  51s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  52s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  53s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  54s] threads: 6, tps: 0.00, reads: 1235.02, writes: 354.01, response time: 6134.76ms (95%), errors: 0.00, reconnects:  0.00
[  55s] threads: 6, tps: 0.00, reads: 5067.98, writes: 1459.00, response time: 24.95ms (95%), errors: 0.00, reconnects:  0.00
[  56s] threads: 6, tps: 0.00, reads: 5131.00, writes: 1458.00, response time: 22.07ms (95%), errors: 0.00, reconnects:  0.00
[  57s] threads: 6, tps: 0.00, reads: 4936.02, writes: 1414.00, response time: 22.37ms (95%), errors: 0.00, reconnects:  0.00
[  58s] threads: 6, tps: 0.00, reads: 4929.99, writes: 1404.00, response time: 24.79ms (95%), errors: 0.00, reconnects:  0.00

There’s a ~5 second break but otherwise no errors were reported. Of course, your mileage may vary - it all depends on Galera settings and your application. Such a feat might not be possible if you use transactions in your application.
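If your application cannot tolerate even that short gap, one option is to wrap writes in a small retry loop so the client rides out the writer promotion. A minimal sketch, not tied to any particular MySQL driver (flaky_insert is a stand-in for a real statement execution):

```python
import time

def run_with_retry(execute, retries=5, delay=1.0):
    """Retry a statement a few times before giving up, which is useful
    while a proxy is promoting a new writer after a node failure."""
    for attempt in range(retries):
        try:
            return execute()
        except ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# Simulate a writer that is unreachable for the first two attempts:
state = {"calls": 0}
def flaky_insert():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("writer not available yet")
    return "ok"

assert run_with_retry(flaky_insert, retries=5, delay=0) == "ok"
assert state["calls"] == 3
```

Note that this only helps for auto-committed single statements; a failed multi-statement transaction has to be replayed as a whole.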

To summarize, we showed you how to configure read-write split in Galera Cluster using ProxySQL. There are a couple of limitations due to the way the proxy works, but as long as none of them are a blocker, you can use it and benefit from other ProxySQL features like caching or query rewriting. Please also keep in mind that the script we used for setting up read-write split is just an example which comes from ProxySQL. If you’d like it to cover more complex cases, you can easily write one tailored to your needs.

by Severalnines at September 05, 2016 02:02 PM

MariaDB AB

Real-time Data Streaming to Kafka with MaxScale CDC

Markus Mäkelä

The new Change Data Capture (CDC) protocol modules in MaxScale 2.0.0 can be used to convert binlog events into easy to stream data. These streams can be guided to other systems for further processing and in-depth analysis.

In this article, we set up a simple Kafka broker on CentOS 7 and publish changes in the database as JSON with the help of the new CDC protocol in MaxScale.

The tools we'll be using require Python 3, the pip Python package manager and the kafka-python package. You can install them with the following commands on CentOS 7.

sudo yum install epel-release
sudo yum install python34
curl https://bootstrap.pypa.io/get-pip.py|sudo python3.4
sudo pip3 install kafka-python

Note: the CentOS 7 Python 3 package is broken and requires manual installation of the pip3 package manager (seen in step three).

The environment consists of one MariaDB 10.0 master, MaxScale and one Kafka broker inside a docker container. We'll be running all the components (mysqld, maxscale, docker) on the same machine to make the setup and testing easier.

Configuring the Database

We start off by configuring the master server with row based replication by adding the following lines to its configuration file.

log-bin=binlog
binlog_format=row
binlog_row_image=full

With row based replication, all replication events contain the modified data. This information can be used to reconstruct each change in data as it happens. The MaxScale CDC protocol router, avrorouter, reads this information from the binary logs and converts it into the easy-to-process and compact Avro format.

Configuring MaxScale

The next step is to configure MaxScale. We add two services to the configuration, one for reading the binary logs from the master and another for converting them into JSON and streaming them to the clients.

The first service, named Binlog Service, registers as a slave to the MariaDB master and starts to read replication events. These events are stored to a local cache where the second service, named CDC Service, converts them into Avro format files (we'll configure that later).

Here's the configuration entry for the Binlog Service.

[Binlog Service]
type=service
router=binlogrouter
router_options=server-id=4000,binlogdir=/var/lib/maxscale,mariadb10-compatibility=1
user=maxuser
passwd=maxpwd

[Binlog Listener]
type=listener
service=Binlog Service
protocol=MySQLClient
port=3306

The maxuser:maxpwd credentials are used to connect to the master server and query for details about database users. Read the MaxScale Tutorial on how to create them.

The router_options parameter is the main way the binlogrouter module is configured. The server-id option is the server ID given to the master, binlogdir is the directory where the binlog files are stored and mariadb10-compatibility enables MariaDB 10.0 support in the binlogrouter. For more details on the binlogrouter options, read the Binlogrouter Documentation.

We also configured a MySQL protocol listener so that we can start the replication with the mysql cli.

After configuring the Binlog Service, we'll set up the CDC router and listeners.

[CDC Service]
type=service
router=avrorouter
source=Binlog Service
router_options=filestem=binlog
user=maxuser
passwd=maxpwd

[CDC Listener]
type=listener
service=CDC Service
protocol=CDC
port=4001

The configuration for the CDC Service is very simple. We only need two settings: the source parameter, which names the service to use as the source of binary logs, and the router_option filestem, which gives the prefix of the binary log files. The CDC Service reads the Binlog Service configuration, gathers all the required information from there and starts the conversion process. For more details on how to fine-tune the avrorouter for optimal conversion speed, read the Avrorouter Documentation.

Setting up Kafka in Docker

After we have MaxScale configured, we'll start the Kafka broker inside Docker. We'll use the spotify/kafka image to set up a quick single node Kafka broker on the same machine MaxScale is running on.

sudo docker run -d --name kafka -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=192.168.0.100 --env ADVERTISED_PORT=9092 spotify/kafka

This command will start the Kafka broker inside a Docker container. The container also contains command line utilities that we can use to read the queued messages from the broker to confirm our setup is working.

Next, we'll have to create a topic where we can publish messages. We'll use the packaged console utilities in the docker container to create it and we'll call it CDC_DataStream.

sudo docker exec -ti kafka /opt/kafka_2.11-0.8.2.1/bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --topic CDC_DataStream --replication-factor 1 --partitions 1

We are using the docker container version of Kafka to simplify the setup and make this easy to reproduce. Read the Kafka Quickstart guide for information on how to set up your own Kafka cluster, and for more details on the tools used inside the container.

Starting Up MaxScale

The final step is to start the replication in MaxScale and stream events into the Kafka broker using the cdc and cdc_kafka_producer tools included in the MaxScale installation.

After starting MaxScale we connect to the Binlog Service on port 3306 and start replication.

CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3306, MASTER_USER='maxuser', MASTER_PASSWORD='maxpwd', MASTER_LOG_POS=4, MASTER_LOG_FILE='binlog.000001';
START SLAVE;

The CDC service allows us to query it for changes in a specific table. For the purpose of this article, we've created an extremely simple test table using the following statement and populated it with some data.

CREATE TABLE test.t1 (id INT);
INSERT INTO test.t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);

After we have created and populated the table on the master, the table and the changes in it will be replicated to MaxScale. MaxScale will cache the binlogs locally and convert them into the Avro format for quick streaming. After a short while, MaxScale should've read and converted enough data so that we can start querying it using the cdc tool.

Next, we'll query the CDC Service for change records on the test.t1 table. Since we didn't configure any extra users, we'll use the service user we configured for the service.

cdc -u maxuser -pmaxpwd -h 127.0.0.1 -P 4001 test.t1

We'll get a continuous stream of JSON objects printed to standard output, which we can use as the source for our Kafka streamer, the cdc_kafka_producer utility. All we have to do is pipe the output of the cdc program into cdc_kafka_producer to push it to the broker.

cdc -u maxuser -pmaxpwd -h 127.0.0.1 -P 4001 test.t1 | cdc_kafka_producer --kafka-broker=127.0.0.1:9092 --kafka-topic=CDC_DataStream

We send the changes to the broker listening on the port 9092 on the local host and publish them on the CDC_DataStream topic we created earlier. When we start the console consumer in another terminal, we'll see the events arriving as they are published on the broker.

[vagrant@maxscale ~]$ sudo docker exec -ti kafka /opt/kafka_2.11-0.8.2.1/bin/kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic CDC_DataStream
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "fields": [{"type": "int", "name": "domain"}, {"type": "int", "name": "server_id"}, {"type": "int", "name": "sequence"}, {"type": "int", "name": "event_number"}, {"type": "int", "name": "timestamp"}, {"type": {"symbols": ["insert", "update_before", "update_after", "delete"], "type": "enum", "name": "EVENT_TYPES"}, "name": "event_type"}, {"type": "int", "name": "id"}], "name": "ChangeRecord"}
{"domain": 0, "event_number": 1, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 1}
{"domain": 0, "event_number": 2, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 2}
{"domain": 0, "event_number": 3, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 3}
{"domain": 0, "event_number": 4, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 4}
{"domain": 0, "event_number": 5, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 5}
{"domain": 0, "event_number": 6, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 6}
{"domain": 0, "event_number": 7, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 7}
{"domain": 0, "event_number": 8, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 8}
{"domain": 0, "event_number": 9, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 9}

The first JSON object is the schema of the table and the objects following it are the actual changes. If we insert, delete or modify data on the database, we'll see the changes in JSON only seconds after they happen.
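A consumer on the Kafka side can tell the schema object apart from change records by its shape. A small sketch of such parsing, using plain Python and lines shaped like the output above (no Kafka client involved; a real consumer would read these lines from the CDC_DataStream topic):

```python
import json

# Example lines as emitted by the cdc tool: first the schema record,
# then the individual change records (abridged from the output above).
lines = [
    '{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", '
    '"fields": [{"type": "int", "name": "id"}], "name": "ChangeRecord"}',
    '{"domain": 0, "event_number": 1, "event_type": "insert", '
    '"server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 1}',
    '{"domain": 0, "event_number": 2, "event_type": "insert", '
    '"server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 2}',
]

schema = None
inserts = []
for line in lines:
    record = json.loads(line)
    if record.get("type") == "record":   # only the schema object has this
        schema = record
        continue
    if record["event_type"] == "insert":
        inserts.append(record["id"])

assert schema["name"] == "ChangeRecord"
assert inserts == [1, 2]
```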

And that's it, we've successfully created a real-time stream representation of the database changes with the help of MaxScale CDC protocol and Kafka.

About the Author

Markus Mäkelä's picture

Markus Mäkelä is a Software Engineer working on MariaDB MaxScale. He graduated from Metropolia University of Applied Sciences in Helsinki, Finland.

by Markus Mäkelä at September 05, 2016 12:00 AM

September 02, 2016

Peter Zaitsev

InnoDB Troubleshooting: Q & A

In this blog, I will provide answers to the Q & A for the InnoDB Troubleshooting webinar.

First, I want to thank everybody for attending the August 11 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: What’s a good speed for buffer pool speed/size for maximum query performance?

A: I am sorry, I don’t quite understand the question. InnoDB buffer pool is an in-memory buffer. In an ideal case, your whole active dataset (rows that are accessed by application regularly) should be in the buffer pool. There is a good blog post by Peter Zaitsev describing how to find the best size for the buffer pool.
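As a rough illustration of the sizing logic (the 75% ceiling is a common rule of thumb for a dedicated database server, not an exact recommendation; the numbers are made up):

```python
def suggested_buffer_pool_gb(ram_gb, active_dataset_gb, ram_fraction=0.75):
    """Start from the active dataset size, but cap the pool so the OS,
    connections and other buffers still have headroom."""
    ceiling = ram_gb * ram_fraction
    return min(ceiling, active_dataset_gb)

# 119 GB of RAM, 40 GB of hot data: the dataset fits comfortably.
assert suggested_buffer_pool_gb(119, 40) == 40
# 16 GB of RAM, 40 GB of hot data: capped by available memory.
assert suggested_buffer_pool_gb(16, 40) == 12.0
```

When the dataset does not fit, the blog post linked above describes how to measure the working set instead of guessing.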

Q: Any maximum range for these InnoDB options?

A: I am again sorry, I only see questions after the webinar and don’t know which slide you were on when you asked about options. But generally speaking, the maximum ranges are limited by hardware: the size of the InnoDB buffer pool is limited by the amount of physical memory you have, innodb_io_capacity by the number of IOPS your disk can handle, and the number of concurrent threads by the number of CPU cores.

Q: On a AWS r3.4xlarge, 16 CPU, 119GB of RAM, EBS volumes, what innodb_thread_concurrency, innodb_read_io_threads, innodb_write_io_threads would you recommend? and innodb_read_io_capacity?

A: innodb_thread_concurrency = 16, innodb_read_io_threads = 8, innodb_write_io_threads = 8. The right innodb_io_capacity depends on the speed of your disks; as far as I know, AWS offers disks with different speeds. You should consult the IOPS your disks can handle when setting innodb_io_capacity, and “Max IOPS” when setting innodb_io_capacity_max.

Q: About InnoDB structures and parallelism: Are there InnoDB settings that can prevent or reduce latching (causes semaphore locks and shutdown after 600s) that occur trying to add an index object to memory but only DML queries on the primary key are running?

A: Unfortunately, semaphore locks for the CREATE INDEX command are not avoidable. You can only affect other factors that speed up index creation: for example, how fast you write records to disk or how many concurrent queries you run. Kill queries that have been waiting for a lock too long. There is an old feature request asking to handle long semaphore waits gracefully. Consider clicking the “Affects Me” button to bring it to the developers’ attention.
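The "kill queries waiting too long" advice can be scripted. A sketch of the selection logic over SHOW PROCESSLIST-style rows (the row shape and the threshold are illustrative; a real script would connect to MySQL and issue KILL for each returned id):

```python
def queries_to_kill(processlist, max_wait_seconds=60):
    """Return the connection ids of queries stuck waiting on a lock."""
    return [
        row["id"]
        for row in processlist
        if "lock" in row["state"].lower() and row["time"] > max_wait_seconds
    ]

rows = [
    {"id": 11, "state": "Waiting for table metadata lock", "time": 120},
    {"id": 12, "state": "Sending data", "time": 300},  # busy, not blocked
    {"id": 13, "state": "Waiting for row lock", "time": 5},
]
assert queries_to_kill(rows) == [11]
```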

Q: How can we check these threads?

A: I assume you are asking about InnoDB threads? You can find information about running threads in SHOW ENGINE INNODB STATUS:

--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 1; buffer pool: 0
529 OS file reads, 252 OS file writes, 251 OS fsyncs
0.74 reads/s, 16384 avg bytes/read, 7.97 writes/s, 7.94 fsyncs/s

And in the Performance Schema THREADS table:

mysql> select thread_id, name, type from performance_schema.threads where name like '%innodb%';
+-----------+----------------------------------------+------------+
| thread_id | name                                   | type       |
+-----------+----------------------------------------+------------+
|         2 | thread/innodb/io_handler_thread        | BACKGROUND |
|         3 | thread/innodb/io_handler_thread        | BACKGROUND |
|         4 | thread/innodb/io_handler_thread        | BACKGROUND |
|         5 | thread/innodb/io_handler_thread        | BACKGROUND |
|         6 | thread/innodb/io_handler_thread        | BACKGROUND |
|         7 | thread/innodb/io_handler_thread        | BACKGROUND |
|         8 | thread/innodb/io_handler_thread        | BACKGROUND |
|         9 | thread/innodb/io_handler_thread        | BACKGROUND |
|        10 | thread/innodb/io_handler_thread        | BACKGROUND |
|        11 | thread/innodb/io_handler_thread        | BACKGROUND |
|        13 | thread/innodb/srv_lock_timeout_thread  | BACKGROUND |
|        14 | thread/innodb/srv_monitor_thread       | BACKGROUND |
|        15 | thread/innodb/srv_error_monitor_thread | BACKGROUND |
|        16 | thread/innodb/srv_master_thread        | BACKGROUND |
|        17 | thread/innodb/srv_purge_thread         | BACKGROUND |
|        18 | thread/innodb/page_cleaner_thread      | BACKGROUND |
|        19 | thread/innodb/lru_manager_thread       | BACKGROUND |
+-----------+----------------------------------------+------------+
17 rows in set (0.00 sec)

Q: Give brief on InnoDB thread is not same as connection thread.

A: You create a MySQL connection thread each time the client connects to the server. Generally, the lifetime of this thread is the same as the connection (I won’t discuss the thread cache and thread pool plugin here to avoid unnecessary complexity). This way, if you have 100 connections you have 100 connection threads. But not all of these threads are doing something: some are actively querying MySQL, while others are sleeping. You can find the number of threads actively doing something by examining the status variable Threads_running. InnoDB doesn’t need to create as many threads as there are connections to perform its job effectively. It creates fewer threads (ideally, the same number as CPU cores). So, for example, just 16 InnoDB threads can handle 100 or more connection threads effectively.
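The same pattern is easy to demonstrate with a worker pool: many submitters, few executors. A purely illustrative sketch using plain Python threads:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# 100 "connections" submit work, but only 16 worker threads execute it,
# analogous to InnoDB using far fewer internal threads than there are
# connection threads.
seen_workers = set()
lock = threading.Lock()

def do_query(i):
    with lock:
        seen_workers.add(threading.current_thread().name)
    return i * 2

with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(do_query, range(100)))

assert len(results) == 100          # all 100 "connections" were served
assert len(seen_workers) <= 16      # by at most 16 worker threads
```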

Q: How can we delete bulk data in Percona XtraDB Cluster?  without affecting production? nearly 6 million records worth 40 GB size table

A: You can use the utility pt-archiver. It deletes rows in chunks. While your database will still have to handle all these writes, the option --max-flow-ctl pauses the purge job if the cluster spends too much time paused for flow control.
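The chunking idea behind pt-archiver can be sketched in a few lines (illustrative only; the real tool issues bounded DELETE batches against the server and watches replication health between them):

```python
def delete_in_chunks(rows, should_delete, chunk_size=1000):
    """Delete matching rows a bounded batch at a time so each
    transaction stays small and replication can keep up."""
    deleted = 0
    while True:
        batch = [r for r in rows if should_delete(r)][:chunk_size]
        if not batch:
            break
        for r in batch:          # stands in for one DELETE ... LIMIT chunk
            rows.remove(r)
        deleted += len(batch)
        # A real tool would sleep or check flow control here
        # (compare pt-archiver's --max-flow-ctl option).
    return deleted

table = list(range(5000))
assert delete_in_chunks(table, lambda r: r % 2 == 0, chunk_size=1000) == 2500
assert all(r % 2 == 1 for r in table)
```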

Q: Why do we sometimes get “–tc-heuristic-recover” message in error logs? Especially when we recover after a crash? What does this indicate? And should we commit or rollback?

A: This means you used two transactional engines that support XA in the same transaction, and mysqld crashed in the middle of the transaction. Now mysqld cannot determine which strategy to use when recovering transactions: either COMMIT or ROLLBACK. Strangely, this option is documented as “not used”. It certainly is used, however: the test case for bug #70860 proves it. I reported documentation bug #82780.

Q: Which parameter controls the InnoDB thread count?

A: The main parameter is innodb_thread_concurrency. For fine tuning, use innodb_read_io_threads, innodb_write_io_threads, innodb_purge_threads and innodb_page_cleaners.

Q: At what frequency will the InnoDB status be dumped in a file by using innodb-status-file?

A: Approximately every 15 seconds, but it can vary slightly depending on the server load.

Q: I faced an issue that once disk got detached from running server due to some issue on AWS ec2. MySQL went to default mode. After MySQL stopped and started, we observed slave skipped some around 15 mins data. We got it by foreign key relationship issue. Can you please explain why it was skipped data in slave?

A: Amazon Aurora supports two kinds of replication: physical as implemented by Amazon (this is the default for replicas in the same region), and the regular asynchronous replication for cross-region replication. If you use the former, I cannot help you because this is a closed-source Amazon feature. You need to report a bug to Amazon. If you used the latter, this looks buggy too. In my experience, it should not happen. With regular replication you need to check which transactions were applied (best if you use GTIDs, or at least the log-slave-updates option) and which were not. If you find a gap, report a bug at bugs.mysql.com.

Q: Can you explain more about adaptive hash index?

A: InnoDB stores its indexes on disk as a B-Tree. While B-Tree indexes are effective in general, some queries can take advantage of much simpler hash indexes. While your server is in use, InnoDB analyzes the queries it is currently processing and builds an in-memory hash index inside the buffer pool (using a prefix of the B-Tree key). The adaptive hash index generally works well: “with some workloads, the speedup from hash index lookups greatly outweighs the extra work to monitor index lookups and maintain the hash index structure”. One issue with the adaptive hash index is that, until version 5.7.8, it was protected by a single latch, which could be a contention point under heavy workloads. Since 5.7.8, the adaptive hash index can be partitioned. The number of parts is controlled by the option innodb_adaptive_hash_index_parts.
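The point-lookup advantage of a hash index over a B-Tree can be sketched with a dict versus binary search (conceptual only; InnoDB's on-disk and in-memory structures are far more involved):

```python
import bisect

# O(1) hash lookup vs O(log n) ordered lookup. InnoDB's adaptive hash
# index builds the hash side automatically for frequently accessed keys.
keys = list(range(0, 1_000_000, 2))              # sorted "B-Tree" keys
hash_index = {k: i for i, k in enumerate(keys)}  # "adaptive hash index"

def btree_lookup(k):
    i = bisect.bisect_left(keys, k)
    return i if i < len(keys) and keys[i] == k else None

def hash_lookup(k):
    return hash_index.get(k)

assert btree_lookup(123456) == hash_lookup(123456) == 61728
assert btree_lookup(7) is None and hash_lookup(7) is None
```

The trade-off the quote above describes is exactly the cost of keeping hash_index in sync with every change, which is why InnoDB only builds it for hot lookups.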


by Sveta Smirnova at September 02, 2016 09:12 PM

MHA Quick Start Guide

MHA

MHA (Master High Availability Manager and tools for MySQL) is one of the most important pieces of our managed services. When properly set up, it can check replication health, move writer and reader virtual IPs, perform failovers, and have its output constantly monitored by Nagios. It is easy to deploy and follows the KISS (Keep It Simple, Stupid) philosophy that I love so much.

This blog post is a quick start guide to try it out and play with it in your own testing environment. I assume that you already know how to install software, deal with SSH keys and setup replication in MySQL. The post just covers MHA configuration.

Testing environment

Taken from /etc/hosts

192.168.1.116	mysql-server1
192.168.1.117   mysql-server2
192.168.1.118   mysql-server3
192.168.1.119   mha-manager

mysql-server1: Our master MySQL server with 5.6
mysql-server2: Slave server
mysql-server3: Slave server
mha-manager: The server that monitors replication and from where we manage MHA. Its installation also requires some Perl dependencies.

We just introduced some new concepts, the MHA Node and MHA Manager:

MHA Node

It is installed and runs on each MySQL server. This is the piece of software that is invoked by the manager every time we want to do something, for example a failover or a check.

MHA Manager

As explained before, this is our operations center. The manager monitors the servers and replication, and includes several administrative command-line tools.

Pre-requisites

  • Replication must already be running. MHA manages replication and monitors it, but it is not a tool to deploy it. So MySQL and replication need to be running already.
  • All hosts should be able to connect to each other using public SSH keys.
  • All nodes need to be able to connect to each other’s MySQL servers.
  • All nodes should have the same replication user and password.
  • In the case of multi-master setups, only one writable node is allowed. All others need to be configured with read_only.
  • MySQL version has to be 5.0 or later.
  • Candidates for master failover should have binary log enabled. The replication user must exist there too.
  • Binary log filtering variables should be the same on all servers (replicate-wild%, binlog-do-db…).
  • Disable automatic relay-log purge and do it regularly from a cron task. You can use an MHA-included script called “purge_relay_logs”.

While that is a large list of prerequisites, I think they are pretty standard and logical.

MHA installation

As explained before, the MHA Node needs to be installed on all the nodes. You can download it from this Google Drive link.

This post shows you how to install it using the source code, but there are RPM packages available. Deb too, but only for older versions. Use the installation method you prefer. This is how to compile it:

tar -xzf mha4mysql-node-0.57.tar.gz
perl Makefile.PL
make
make install

The commands included in the node package are save_binary_logs, filter_mysqlbinlog, purge_relay_logs, apply_diff_relay_logs. Mostly tools that the manager needs to call in order to perform a failover, while trying to minimize or avoid any data loss.

On the manager server, you need to install MHA Node plus MHA Manager. This is due to MHA Manager's dependence on a Perl library that comes with MHA Node. The installation process is the same.

Configuration

We only need one configuration file on the Manager node. The example below is a good starting point:

# cat /etc/app1.cnf
[server default]
# mysql user and password
user=root
password=supersecure
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
[server1]
hostname=mysql-server1
candidate_master=1
[server2]
hostname=mysql-server2
candidate_master=1
[server3]
hostname=mysql-server3
no_master=1

So pretty straightforward. It specifies that there are three servers, two that can be master and one that can’t be promoted to master.

Let’s check if we meet some of the pre-requisites. We are going to test if replication is working, can be monitored, and also if SSH connectivity works.

# masterha_check_ssh --conf=/etc/app1.cnf
[...]
[info] All SSH connection tests passed successfully.

It works. Now let’s check MySQL:

# masterha_check_repl --conf=/etc/app1.cnf
[...]
MySQL Replication Health is OK.

Start the manager and operations

Everything is setup, we meet the pre-requisites. We can start our manager:

# masterha_manager --remove_dead_master_conf --conf=/etc/app1.cnf
[...]
[info] Starting ping health check on mysql-server1(192.168.1.116:3306)..
[info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

The manager found our master and it is now actively monitoring it using a SELECT command. --remove_dead_master_conf tells the manager that if the master goes down, it must edit the config file and remove the master’s configuration from it after a successful failover. This avoids the “there is a dead slave” error when you restart the manager. All servers listed in the conf should be part of the replication and in good health, or the manager will refuse to work.
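The ping-until-dead behaviour can be sketched as a loop (ping stands in for running the SELECT health check against the master; max_failures and the interval are illustrative, and real MHA also performs secondary checks before declaring a master dead):

```python
import time

def monitor(ping, max_failures=3, interval=0.0):
    """Ping the master until several consecutive checks fail."""
    failures = 0
    while True:
        if ping():
            failures = 0                 # one success resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                return "master is dead, start failover"
        time.sleep(interval)

# Two healthy pings, then three consecutive failures:
responses = iter([True, True, False, False, False])
assert monitor(lambda: next(responses)) == "master is dead, start failover"
```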

Automatic and manual failover

Good, everything is running as expected. What happens if the MySQL master dies!?!

[...]
[warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
[info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/log/mysql, up to mysql-bin.000002
[info] HealthCheck: SSH to mha-server1 is reachable.
[...]

First, it tries to connect by SSH to read the binary log and save it. MHA can apply the missing binary log events to the remaining slaves so they are up to date with all the before-failover info. Nice!

These different phases follow:

* Phase 1: Configuration Check Phase..
* Phase 2: Dead Master Shutdown Phase..
* Phase 3: Master Recovery Phase..
* Phase 3.1: Getting Latest Slaves Phase..
* Phase 3.2: Saving Dead Master's Binlog Phase..
* Phase 3.3: Determining New Master Phase..
[info] Finding the latest slave that has all relay logs for recovering other slaves..
[info] All slaves received relay logs to the same position. No need to resync each other.
[info] Starting master failover..
[info]
From:
mysql-server1(192.168.1.116:3306) (current master)
 +--mysql-server2(192.168.1.117:3306)
 +--mysql-server3(192.168.1.118:3306)
To:
mysql-server2(192.168.1.117:3306) (new master)
 +--mysql-server3(192.168.1.118:3306)
* Phase 3.3: New Master Diff Log Generation Phase..
* Phase 3.4: Master Log Apply Phase..
* Phase 4: Slaves Recovery Phase..
* Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
* Phase 4.2: Starting Parallel Slave Log Apply Phase..
* Phase 5: New master cleanup phase..

The phases are pretty self-explanatory. MHA tries to get all the data possible from the master’s binary log and the slaves’ relay logs (the most advanced one) to avoid losing any data or promoting a slave that was far behind the master. In other words, it tries to promote a slave with data as current as possible. We see that server2 has been promoted to master, because in our configuration we specified that server3 shouldn’t be promoted.
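The candidate-selection idea can be sketched as follows (hostnames match the setup above, but the log positions are made up and real MHA weighs much more than this):

```python
def pick_new_master(slaves):
    """Promote the eligible slave with the most advanced relay log
    position, skipping hosts marked no_master."""
    candidates = [s for s in slaves if not s.get("no_master")]
    return max(candidates, key=lambda s: (s["log_file"], s["log_pos"]))["host"]

slaves = [
    {"host": "mysql-server2", "log_file": "mysql-bin.000002", "log_pos": 500},
    # server3 is further ahead, but our config says no_master=1:
    {"host": "mysql-server3", "log_file": "mysql-bin.000002", "log_pos": 900,
     "no_master": True},
]
assert pick_new_master(slaves) == "mysql-server2"
```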

After the failover, the manager service stops itself. If we check the config file, the failed server is not there anymore. Now the recovery is up to you. You need to get the old master back in the replication chain, then add it again to the config file and start the manager.

It is also possible to perform a manual failover (if, for example, you need to do some maintenance on the master server). To do that you need to:

  • Stop masterha_manager.
  • Run masterha_master_switch --master_state=alive --conf=/etc/app1.cnf. The line says that you want to switch the master, but the actual master is still alive, so there is no need to mark it as dead or remove it from the conf file.

And that’s it. Here is part of the output. It shows the tool making the decision on the new topology and asking the user for confirmation:

[info]
From:
mysql-server1(192.168.1.116:3306) (current master)
 +--mysql-server2(192.168.1.117:3306)
 +--mysql-server3(192.168.1.118:3306)
To:
mysql-server2(192.168.1.117:3306) (new master)
 +--mysql-server3(192.168.1.118:3306)
Starting master switch from mha-server1(192.168.1.116:3306) to mha-server2(192.168.1.117:3306)? (yes/NO): yes
[...]
[info] Switching master to mha-server2(192.168.1.117:3306) completed successfully.

You can also employ some extra parameters that are really useful in some cases:

--orig_master_is_new_slave: if you want to make the old master a slave of the new one.

--running_updates_limit: if the current master executes write queries that take longer than this parameter’s setting, or if any of the MySQL slaves behind the master lag more than this, the master switch aborts. By default, it’s 1 (1 second). All these checks are for safety reasons.

--interactive=0: if you want to skip all the confirmation requests and questions masterha_master_switch could ask.

Check this link in case you use GTID and want to avoid problems with errant transactions during the failover:

https://www.percona.com/blog/2015/12/02/gtid-failover-with-mysqlslavetrx-fix-errant-transactions/

Custom scripts

Since this is a quick guide to start playing around with MHA, I won’t cover advanced topics in detail. But I will mention a few:

    • Custom scripts. MHA can move IPs around, shut down a server and send you a report in case something happens. It needs a custom script for that, however. MHA comes with some example scripts, but you would need to write one that fits your environment. The directives are master_ip_failover_script, shutdown_script and report_script. With them configured, MHA will send you an email or a message to your mobile device in the case of a failover, shut down the server and move IPs between servers. Pretty nice!

Hope you found this quickstart guide useful for your own tests. Remember one of the most important things: don’t overdo automation!  😉 These tools are good for checking health and performing the initial failover, but you must still investigate what happened and why, fix it, and work to keep it from happening again. In high availability (HA) environments, automating everything can cause them to stop being HA.

Have fun!

by Miguel Angel Nieto at September 02, 2016 07:22 PM