Peter Zaitsev: Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue (18.7.2014, 14:38 UTC)

The Percona Managed Services team recently faced a somewhat peculiar client issue. We’d receive pages about their MySQL service being unreachable. However, studying the logs showed nothing out of the ordinary… for the most part it appeared to be a normal shutdown, and there was nothing suspicious in anyone’s command history, nor any cron task to speak of.

This is one of those obscure and peculiar (read: unique) issues that triggered an old memory; I’d seen this behavior before and I had just the tool to catch the culprit in the act.

Systemtap made diagnosing this issue possible, and I can’t state enough what a powerful and often under-utilized tool set systemtap really is.

cat > signals.stp << EOF
probe signal.send {
  if (sig_name == "SIGKILL" || sig_name == "SIGTERM")
    printf("[%s] %s was sent to %s (pid:%d) by %s uid:%d\n",
      ctime(gettimeofday_s()), sig_name, pid_name, sig_pid, execname(), uid())
}
EOF

sudo stap ./signals.stp > signals.log 2>signals.err

grep mysqld signals.log
[Wed Jun 11 19:03:23 2014] SIGKILL was sent to mysqld (pid:8707) by cfagent uid:0
[Fri Jun 13 21:37:27 2014] SIGKILL was sent to mysqld (pid:6583) by cfagent uid:0
[Sun Jun 15 05:05:34 2014] SIGKILL was sent to mysqld (pid:19818) by cfagent uid:0
[Wed Jul 9 07:03:47 2014] SIGKILL was sent to mysqld (pid:4802) by cfagent uid:0

Addendum: It had been so long since I had used this tooling that I could not remember the original source from which I derived the module above; some cursory searching while writing this blog post turned up the original source by Eugene Teo of Red Hat, made available under GPLv2.

From this we were able to show that cfagent was killing the mysqld process, presumably via a misconfigured job. This information was returned to the client, and at the client’s request the probe has continued to run in production for two months now with no issues to speak of.
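If you ever need to run something like this yourself, a slightly narrower variation of the script above (a sketch, not the exact probe we ran) reports only signals aimed at mysqld, which keeps the log small on busy hosts; pid() here is the sender’s PID:

probe signal.send {
  if (pid_name == "mysqld" && (sig_name == "SIGKILL" || sig_name == "SIGTERM"))
    printf("[%s] %s was sent to %s (pid:%d) by %s (pid:%d) uid:%d\n",
      ctime(gettimeofday_s()), sig_name, pid_name, sig_pid, execname(), pid(), uid())
}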

This is by no means the limit of what systemtap can achieve; you can also hook into functions. You may need to install the debug packages first, though, and to find out which functions are available you can run, for example:

sudo stap -L 'process("/usr/sbin/mysqld").function("*")' > /tmp/mysql_stapfunc
...
head /tmp/mysql_stapfunc
process("/usr/sbin/mysqld").function("TIME_from_longlong_date_packed")
process("/usr/sbin/mysqld").function("TIME_from_longlong_datetime_packed")
process("/usr/sbin/mysqld").function("TIME_from_longlong_time_packed")
process("/usr/sbin/mysqld").function("TIME_set_hhmmss")
process("/usr/sbin/mysqld").function("TIME_set_yymmdd")
process("/usr/sbin/mysqld").function("TIME_to_longlong_date_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_datetime_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_time_packed")
process("/usr/sbin/mysqld").function("TIME_to_ulonglong")
...

This is also true of the kernel, using sudo stap -L 'kernel.function("*")' > /tmp/kernel_stapfunc, however you must be booted into a debug kernel for this to work.
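Once you know which functions exist, you can attach probes to them. As a rough sketch (the function name mysql_execute_command is an assumption here; pick whatever shows up in your own listing, and the debug packages must be installed for user-space probing):

global hits
probe process("/usr/sbin/mysqld").function("mysql_execute_command") { hits++ }
probe end { printf("mysql_execute_command was entered %d times\n", hits) }

Run it with sudo stap for a while, then stop it with Ctrl-C to see the counter printed by the end probe.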

Systemtap is more than a worthy tool to have at your disposal with plenty of examples available.

Finally I invite you to join me July 23 at 10 a.m. Pacific time for my webinar, “What Every DBA Needs to Know About MySQL Security.” This detailed technical webinar provides insight into best security practices for either setting up a new MySQL environment or upgrading the security of an existing one. I hope to see you there!


Link
Peter ZaitsevQ&A: Even More Deadly Mistakes of MySQL Development (17.7.2014, 16:57 UTC)

On Wednesday I gave a presentation on “How to Avoid Even More Common (but Deadly) MySQL Development Mistakes” for Percona MySQL Webinars. If you missed it, you can still register to view the recording and my slides.

Thanks to everyone who attended, and especially to the folks who asked the great questions. I answered as many as we had time for during the session, but here are all the questions with my complete answers:

Q: Disk bandwidth also not infinite ;-)

Indeed, you’re right!

We discussed in the webinar the impact on network bandwidth from using column wildcards in queries like SELECT *, but it’s also possible that using SELECT * can impact disk operations. Varchar, Blob, or Text columns can be stored on extra pages in the database, and if you include those columns in your query needlessly, it can cause the storage engine to do a lot of seeks and page reads unnecessarily.
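For example, with a hypothetical articles table that has a large TEXT body column, the fix is simply being explicit about the columns the page needs:

-- Pulls every column, including any off-page TEXT/BLOB data:
SELECT * FROM articles WHERE id = 42;
-- Reads only the columns the application actually uses:
SELECT id, title, published_at FROM articles WHERE id = 42;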

For more details on string storage in InnoDB, see Peter Zaitsev’s blog on Blob Storage in Innodb.

Q: How many tables can be joined in a single query? What is the optimal number of joins?

MySQL has a limit of 63 table references in a given query. This limits how many JOIN operations you can do, and also limits the number of UNIONs. Actually, you can go over this limit if your JOIN or UNION doesn’t reference any tables, that is, if it creates a derived table of one row of expressions.

If you do join a lot of tables (or even self-join the same table many times), you’re likely to hit a practical scaling limit long before you reach 63 table references. The practical limit in your case depends on many factors, including the length of the tables, the data types, the type of join expressions in your queries, and your physical server’s capabilities. It’s not a fixed limit I can cite for you.

If you think you need dozens of table references in a single query, you should probably step back and reconsider your database design or your query design.

I often see this type of question (“what is the limit on the number of joins?”) when people try to use key/value tables, also called Entity-Attribute-Value, and they’re trying to pivot attributes from rows into columns, as if the table were stored in a conventional way with one column per attribute. This is a broken design for many reasons, and the scalability of many-way joins is just one problem with it.
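To illustrate, with a hypothetical entities/attributes (EAV) schema, this is the kind of pivot that adds one self-join per attribute and quickly creeps toward the table-reference limit:

SELECT e.entity_id,
       a1.value AS color,
       a2.value AS size,
       a3.value AS weight
FROM entities e
JOIN attributes a1 ON a1.entity_id = e.entity_id AND a1.name = 'color'
JOIN attributes a2 ON a2.entity_id = e.entity_id AND a2.name = 'size'
JOIN attributes a3 ON a3.entity_id = e.entity_id AND a3.name = 'weight';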

Q: How many indexes can be created in a single table? Any limitation? What is the optimal number of indexes?

All MySQL storage engines support at least 16 indexes per table.

As far as the optimal number of indexes, I don’t pay attention to the number of indexes (as long as it remains lower than the max of 16). I try to make sure I have the right indexes for my queries. If you put an arbitrary cap of for example 8 or 10 indexes on a given table, then you might be running queries that lack a needed index, and the unnecessary extra cost of running that query is probably greater than the cost of maintaining the one extra index it needs.

That said, there are cases where you have such variation in query types that there’s no way to have optimal indexes to cover every possible case. Given that you can have multi-column indexes, and multi-column indexes with columns in different orders, there are n-factorial possible indexes on a table with n columns.

Q: There is a table with 3 columns: id(int), user_id(int), day(date). There is a high chance the same user_id will ‘exist’ for every day. I read data by “where user_id = some_id” (very high throughput) and delete all entries once a day by cron using “where sent_date = ’2014-01-01′ “. Having approx 6M rows per day, deletion is pretty painful. Will partitioning by column ‘day’ help me delete those bulks faster? If yes – how much faster? How much will it slow down SELECTs? – not all entries are deleted, but only entries for some specific old day, e.g. ‘ WHERE day =
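The full answer is cut off by the feed below, but as a sketch of what the question describes (the column names follow the question; the table name and exact layout are assumptions, not the original answer), daily RANGE partitioning would look roughly like this:

CREATE TABLE user_days (
  id      INT NOT NULL,
  user_id INT NOT NULL,
  day     DATE NOT NULL,
  KEY (user_id)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS(day) (
  PARTITION p20140101 VALUES LESS THAN ('2014-01-02'),
  PARTITION p20140102 VALUES LESS THAN ('2014-01-03'),
  PARTITION p20140103 VALUES LESS THAN ('2014-01-04')
);
-- Dropping a whole day is a metadata operation instead of a multi-million-row DELETE:
ALTER TABLE user_days DROP PARTITION p20140101;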

Truncated by Planet PHP, read more at the original (another 8798 bytes)

Link
Peter Zaitsev: High Availability with mysqlnd_ms on Percona XtraDB Cluster (16.7.2014, 14:11 UTC)

This is the second part of my series on High Availability with mysqlnd_ms. In my first post, “Simple MySQL Master HA with mysqlnd_ms,” I showed a simple HA solution using asynchronous MySQL replication. This time we will see how to leverage an all-primary cluster where you can write to all nodes. In this post I used Percona XtraDB Cluster, but you should also be able to do the same with MySQL NDB Cluster or Tungsten Replicator.

To start with, here is the mysqlnd_ms configuration I used, mysqlnd_ms_mm.ini. All of these files are available from my Github repository. Below, I have three Percona XtraDB Cluster nodes, all defined as masters and no slaves. I’ve configured a roundrobin filter where all connections will happen on the first node, in this case 192.168.56.44. In case the first node fails, the second node will be used and so forth until no more nodes are available. Another interesting configuration option here is the loop_before_master strategy, whereby if a connection or a statement to the current server fails, it will be retried silently on the remaining nodes before returning an error to the user; more on this below.
{
   "primary": {
     "master": {
       "master_1": {
         "host": "192.168.56.44",
         "port": "3306"
       },
       "master_2": {
         "host": "192.168.56.43",
         "port": "3306"
       },
       "master_3": {
         "host": "192.168.56.42",
         "port": "3306"
       }
     },
     "slave": { },
     "filters": { "roundrobin": [ ] },
     "failover": { "strategy": "loop_before_master", "remember_failed": true }
   }
 }

Similar to my previous post, I also used a custom INI file for PHP to use, this time aptly named master-master.ini:
mysqlnd_ms.enable = 1
mysqlnd_ms.disable_rw_split = 1
mysqlnd_ms.multi_master = 1
mysqlnd_ms.force_config_usage = 1
mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_mm.ini

A new addition to this configuration is mysqlnd_ms.multi_master; when enabled it allows you to use all nodes, or just one and treat the others as passive. The PHP script I used this time is called master-master.php; it is largely similar to master-slave-ng.php with a few differences:
  1. There is no need for the /tmp/PRIMARY_HAS_FAILED sentinel as all nodes were writable.
  2. There is no need for the /*ms=master*/ SQL hint when validating a connection from the connect_mysql function since all nodes act as master.
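As a minimal sketch of how a script talks to the cluster through the plugin (this is not the author’s master-master.php; the credentials and database name are placeholders), note that the “host” passed to mysqli is matched against a section name from the JSON configuration, “primary” in this case:

<?php
// "primary" refers to the mysqlnd_ms config section above, not a real hostname;
// user/password/test are placeholder credentials.
$link = new mysqli("primary", "user", "password", "test");
// mysqlnd_ms uses lazy connections, so issue a statement to actually connect
// and let the roundrobin filter pick a node.
$res = $link->query("SELECT @@hostname, CONNECTION_ID()");
list($host, $thread_id) = $res->fetch_row();
printf("Connected to %s, thread id %d\n", $host, $thread_id);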

So here is a quick test, first with the roundrobin filter: after 4 INSERTs, I shut down 192.168.56.44, which sends my connection to the next server in the configuration, 192.168.56.43. When I started 192.168.56.44 back up again, the script resumed connections there. Pretty cool, right?
[revin@forge phpugph201407]$ php -c master-master.ini master-master.php
Last value 3564 from host 192.168.56.44 via TCP/IP and thread id 19
Last value 3565 from host 192.168.56.44 via TCP/IP and thread id 20
Last value 3566 from host 192.168.56.44 via TCP/IP and thread id 21
Last value 3567 from host 192.168.56.44 via TCP/IP and thread id 22
Warning: connect_mysql(): MySQL server has gone away in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
Warning: connect_mysql(): Error while reading greeting packet. PID=23464 in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
ERRROR: 192.168.56.43 via TCP/IP [2006] MySQL server has gone away on line 30
Last value 0 from host  and thread id 0
Last value 3568 from host 192.168.56.43 via TCP/IP and thread id 1552
Last value 3569 from host 192.168.56.43 via TCP/IP and thread id 1553
[...]
Last value 3584 from host 192.168.56.43 via TCP/IP and thread id 1568
Last value 3585 from host 192.168.56.44 via TCP/IP and thread id 18

Here’s another test using the random filter, which allows you to write to all nodes; on my mysqlnd_ms_mm.ini above, I just changed roundrobin to random. As you can see, all three nodes were being used, of course in random, at the same time yo

Truncated by Planet PHP, read more at the original (another 4136 bytes)

Link
Daniël van Eeden: Oracle Critical Patch Update for MySQL (16.7.2014, 08:25 UTC)
Oracle has released the Critical Patch Update (CPU) for July 2014. The Oracle MySQL Risk Matrix lists 10 security fixes.

It took me some time to understand the subcomponent names. So here is the list with the full name of each subcomponent:

Subcomponent  Full name
SRFTS         Server: Full Text Search
SRINFOSC      Server: INFORMATION_SCHEMA
SRCHAR        Server: Character sets
ENARC         Engine: Archive
SROPTZR       Server: Optimizer
SRREP         Server: Replication
SRSP          Server: Stored Procedure
ENFED         Engine: Federated

I don't think there is anything really important in the list, but it might be a good trigger to update to the latest release.

Upgrading should be easy especially if you're using the APT or YUM repositories from Oracle. If you're upgrading to a new major release (e.g. from 5.5 to 5.6) then you should read the instructions in the reference manual.
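For example, assuming the official MySQL repositories are already configured (the package names below are from Oracle’s community repositories and may differ per distribution):

sudo yum update mysql-community-server
# or, on Debian/Ubuntu with the MySQL APT repository:
sudo apt-get update && sudo apt-get install mysql-community-server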
Link
Erkan Yanar: DockerHaterHipster@Docker Meetup (15.7.2014, 11:12 UTC)
Ahoi,

I’m giving a DockerHaterHipster talk at the first Docker Meetup Frankfurt.
It is about hating, of course. But frankly, it is more about understanding that Docker is not just a replacement. It is more a rethinking of infrastructure.

Not that you've got to love it of course  :)

Have Fun
Erkan


Link
Peter Zaitsev: TokuDB tips: MySQL backups (15.7.2014, 10:00 UTC)

In my recent post, “TokuDB gotchas: slow INFORMATION_SCHEMA TABLES,” I saw a couple questions and tweets asking if we use TokuDB in production. Actually I mentioned it in that post and we also blogged about it in a couple of other recent posts:

So, yes, we are using Percona Server + TokuDB as a main storage engine in Percona Cloud Tools to store timeseries data.

And, yes, Percona Server + TokuDB is GA, available as Percona Server 5.6.19-67.0 with TokuDB (GA).

Just having good performance is not enough to make it into production; there are also operational questions and one such question is about backups. I want to explain how we do backups for Percona Server + TokuDB in Percona Cloud Tools.

I should say up front, that we DO NOT have support for TokuDB in Percona XtraBackup. TokuDB internals are significantly different from InnoDB/XtraDB, so it will be a major project to add this to Percona XtraBackup and we do not have any plans at the moment to work on this.

It does not mean that TokuDB users do not have options for backups. There is Tokutek Hot Backup, included in the Tokutek Enterprise Subscription. And there is the method we use in Percona Cloud Tools: LVM backups. We use the mylvmbackup scripts for this task and they work fairly well for us.

There are, however, some gotchas to be aware of. If you understand the mechanics of LVM backups, restoring from one is basically a managed crash recovery process.

Now we need to go into a little more detail on TokuDB. To support transactions that involve both the TokuDB and InnoDB engines, TokuDB uses a two-phase commit mechanism in MySQL. When it is involved, the two-phase commit requires the binary logs to be present for a proper recovery procedure.

But now we need to take a look at how we set up the binary logs in Percona Cloud Tools. We use SSD for the main data storage (the LVM partition is there) and a hardware RAID1 over two hard drives for the binary logs. We chose this setup because we care about SSD lifetime: in write-intensive workloads, binary logs produce a lot of write operations, and by our calculations we would just burn through these SSDs, so we have to store them on something less expensive.

So the problem is that when we take an LVM snapshot of the main storage, we do not have a consistent view of the binary logs (although it is possible to modify the backup scripts to copy the current binary log under a FLUSH TABLES WITH READ LOCK operation, and this is probably what we will do next). But the binary logs are needed for recovery; without them we face this kind of error when restoring from a backup:

2014-DD-MM 02:15:16 16414 [Note] Found 1 prepared transaction(s) in TokuDB
2014-DD-MM 02:15:16 16414 [ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2014-DD-MM 02:15:16 16414 [ERROR] Aborting

The error message actually hints at a way out. Unfortunately, it seems that we are the first ones to have ever tried this option, as tc-heuristic-recover is totally broken in current MySQL and not supposed to work… it would have been noticed if someone had really tried it before us (which gives me the impression that Oracle/MySQL never properly tested it, but that is a different story).

We will fix this in Percona Server soon.

So the way to handle a recovery from an LVM backup without binary logs is to start mysqld with the --tc-heuristic-recover switch (unfortunately I have not yet figured out whether it should be the COMMIT or ROLLBACK value, hehe).
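Going back to the workaround mentioned in the parenthetical above, copying the current binary logs while FLUSH TABLES WITH READ LOCK is held during the snapshot, a very rough sketch of that idea could look like the following; the volume and directory names are made up and this is not the script we run in production:

mysql -uroot <<'SQL'
FLUSH TABLES WITH READ LOCK;
-- the lock stays held for this whole client session
system cp -a /binlog-raid/mysql-bin.* /backup/binlogs/
system lvcreate --snapshot --size 10G --name mysql-snap /dev/vg_mysql/lv_data
UNLOCK TABLES;
SQL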

The proper way to use LVM

Truncated by Planet PHP, read more at the original (another 1322 bytes)

Link
Sergey Petrunia: Priority queue optimization for filesort is now visible in MariaDB 10.0 (14.7.2014, 17:11 UTC)

TL;DR: Priority queue optimization for filesort with small LIMIT is now visible in MariaDB: there is a status variable and you can also see it in the slow query log (KB page link).

A longer variant:
One of the new optimizations in MySQL 5.6 is the ability to use a priority queue instead of sorting for ORDER BY … LIMIT queries. The optimization was ported into MariaDB 10.0 long ago, but we still get questions about if/when it will be ported. I guess the reason for this is that, besides the query speed, you can’t see this optimization: neither EXPLAIN, nor EXPLAIN FORMAT=JSON, nor PERFORMANCE_SCHEMA, nor the status variables give any indication of whether filesort used a priority queue or the regular quicksort+merge algorithm.

MySQL 5.6 has only one way to check whether filesort used a priority queue. You need to enable optimizer_trace (set optimizer_trace=1), and then run the query (not EXPLAIN, but the query itself). Then you can look into the optimizer trace and find something like this:

...
          "filesort_priority_queue_optimization": {
            "limit": 10,
            "rows_estimate": 198717,
            "row_size": 215,
            "memory_available": 262144,
            "chosen": true
          },
...

MariaDB doesn’t support optimizer_trace at the moment. Even if it did, I think it would be wrong to require one to look into the optimizer trace to find out about the picked query plan.

The natural place to show the optimization would be the EXPLAIN output. We could show something like “Using filesort (priority queue)”. This was my initial intent. After looking into the source code, this turned out to be difficult to do. The logic that makes the choice between using quicksort+merge and using a priority queue is buried deep inside the query execution code. (As if the mess caused by late optimizations of ORDER BY and UNIONs didn’t teach anybody in the MySQL team anything.)

As for query execution, there are two facilities where one could record execution-time details about the query plan. They are the status variables and the slow query log.

Status variables

We’ve added Sort_priority_queue_sorts status variable. Now, the list of sort-related status variables is:

MariaDB [test]> show status like 'Sort%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| Sort_merge_passes         | 0     |
| Sort_priority_queue_sorts | 1     |
| Sort_range                | 0     |
| Sort_rows                 | 11    |
| Sort_scan                 | 1     |
+---------------------------+-------+

(Sort_range + Sort_scan) gives the total number of sorts. Sort_priority_queue_sorts gives the number of sorts that were done using a priority queue.
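A quick way to see it from a session, reusing the query from the slow-log example below:

FLUSH STATUS;
SELECT * FROM t1 WHERE col1 BETWEEN 10 AND 20 ORDER BY col2 LIMIT 100;
SHOW STATUS LIKE 'Sort_priority_queue_sorts';

If filesort picked the priority queue for this query, the counter goes up by one.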

Slow query log

Percona’s Extended statistics in the slow query log shows Filesort/Filesort_on_disk fields. We thought that adding information about priority queue use would be appropriate. Now, slow query log entries look like this:

# Time: 140714 18:30:39
# User@Host: root[root] @ localhost []
# Thread_id: 3  Schema: test  QC_hit: No
# Query_time: 0.053857  Lock_time: 0.000188  Rows_sent: 11  Rows_examined: 100011
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: Yes
SET timestamp=1405348239;
select * from t1 where col1 between 10 and 20 order by col2 limit 100;

pt-query-digest is able to parse slow query logs with the new field.

What about PERFORMANCE_SCHEMA

What about PERFORMANCE_SCHEMA? After all, it is the most powerful tool for tracking query execution. It has “absorbed” some status variables into events_statements_history table. For sorting, it has these columns:

| SORT_MERGE_PASSES       | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_RANGE              | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_ROWS               | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_SCAN               | bigint(20) unsigned              | NO   |     | NULL    |       |

Should we add a SORT_PRIORITY_QUEUE_SORTS column there? We didn’t add it into 10.0 right now because of compatibility concerns. Some tools may rely on the structure of PERFORMANCE_SCHEMA tables. Also, PERFORMANCE_SCHEMA table definitions are stored on disk, and one would have to run mysql_fix_privilege_tables after a minor upgrade, which is not good.

Link
Peter Zaitsev: Simple MySQL Master HA with mysqlnd_ms (14.7.2014, 14:51 UTC)

I had the pleasure of presenting to the PHP Users Group Philippines a few days ago about mysqlnd_ms. The mysqlnd plugin, MySQL Master Slave, is a transparent layer on top of the mysqlnd extension. It allows you to do read-write splitting and slave-read load balancing without needing to change anything in your application. But did you know you can also achieve a form of high availability with this plugin? I shared two forms in my presentation: the first uses asynchronous MySQL replication, in either master-slave or master-master configuration, while the second is an all-primary cluster where you can write to all nodes.

This first part demonstrates how you can achieve a simple HA solution using the first form. First, all the sample code here can be found in my GitHub repository. The mysqlnd_ms plugin uses an additional external configuration file in JSON format. This configuration file defines your master and slave nodes, failover properties and any filters (connection selection methods) you want, which dictate how the algorithm will hand you a connection.

Let’s start with the mysqlnd_ms configuration I used, mysqlnd_ms_ms.ini:
{
  "primary": {
    "master": {
      "master_1": {
        "host": "127.0.0.1",
        "port": "33001"
      }
    },
    "slave": {
    }
  },
  "standby": {
    "master": {
      "master_1": {
        "host": "127.0.0.1",
        "port": "33002"
      }
    },
    "slave": {
    }
  }
}

Here, I have two applications defined, one called “primary” and another called “standby”; I have not defined any slaves for simplicity. The two MySQL instances running on ports 33001 and 33002 are in master-master configuration.

mysqlnd_ms.enable = 1
mysqlnd_ms.disable_rw_split = 1
mysqlnd_ms.force_config_usage = 1
mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_ms.ini

This is the custom INI file I used for the tests, master-slave.ini. The first line simply enables the plugin for use. The second line, mysqlnd_ms.disable_rw_split, instructs the plugin to send all queries to the master, because I only have masters for this test.

As for the PHP script, the full copy can be found here; as it is a bit lengthy, I will just explain the logic of what it does.

  1. To start the test, it bootstraps the test table via DROP and then CREATE queries.
  2. It then enters a for loop where it will execute an INSERT followed by a SELECT to validate the newly inserted row and additional information like the current active server id and the connection id.
  3. For every iteration of the loop, a new mysqli object is created to simulate non-persistent connections to the database server.
  4. To create the new connection, a call to the connect_mysql function is made, which returns a mysqli object when successful. An important thing to remember here is that mysqlnd_ms uses lazy connections by default; this means that when the mysqli object is created, it is not really connected to the server yet. One has to issue a statement like 'SELECT 1' to start the connection manually, or call mysqli::real_connect. Even mysqli::ping does not work without the former; I’ve opened this bug.
  5. After the mysqli object is returned, the INSERT statement will trigger mysqlnd_ms to actually establish the connection and then execute the statement. This is where the good part is: if the connection cannot be made, the query_write_mysql function will know and will re-request the connection from connect_mysql. This time, within the connect_mysql function, the connection to the primary will be retried at least 10 times if the type of error from the previous failure is connection-related, such as error numbers 2002 and 2003. If the connection cannot be established after 10 retries, the application creates a sentinel file at /tmp/PRIMARY_HAS_FAILED and retries the connection against the secondary (slave or passive-master).
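A hedged sketch of that retry loop (this is not the author’s exact connect_mysql, which lives in the GitHub repository; the credentials here are placeholders):

<?php
function connect_mysql($app = 'primary', $retries = 10) {
    for ($i = 0; $i < $retries; $i++) {
        // "primary" maps to a section in mysqlnd_ms_ms.ini, not to a hostname
        $link = @new mysqli($app, 'user', 'password', 'test');
        // lazy connection: the hint + statement below forces the real connect
        if (@$link->query('/*ms=master*/ SELECT 1')) {
            return $link;
        }
        if (!in_array($link->errno, array(2002, 2003))) {
            break; // not a connection-class error, no point in retrying
        }
        sleep(1);
    }
    // at this point the real script drops /tmp/PRIMARY_HAS_FAILED and
    // falls back to the "standby" section
    return false;
}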

Here is an example run; my primary has a server id of 101 while my standby is 102:

[revin@forge phpugph201407]$ ph

Truncated by Planet PHP, read more at the original (another 3058 bytes)

Link
Cédric PEINTRE: Wake up European DBA, call for papers for Percona Live London 2014 is open! (13.7.2014, 09:19 UTC)

The call for papers for Percona Live London 2014 is open. For the fourth consecutive year, PLUK is going to be one of the best community events in Europe.
I have the honour of being the conference committee chairman, and the hard task of reviewing the talks with my colleagues on the committee.

First, let me introduce the committee members:

  • Art van Scheppingen (Spil Games)
  • Nicolai Plum (Booking.com)
  • Luis Motta Campos (Ebay Classifieds Group)
  • Colin Charles (MariaDB)
  • David Busby (Percona)
  • Morgan Tocker (Oracle)
  • Cédric PEINTRE (Dailymotion)

Amazing, isn’t it?! I think we couldn’t have a better committee for a community event.
I’m very glad to take part in the adventure with you guys!

And if you wonder what the committee does, please, read this review from Shlomi.
You should know that the committee is fully independent. We are working hard to offer you the best conference and to be as fair as possible, with our beliefs, and with fun.

Now it’s time to submit your talks and tutorials. Like I said last year, if you are using databases, you might have something to say.
We are looking for the best use cases related to these topics, so surprise us!

  • High Availability
  • DevOps
  • Programming
  • Performance Optimization
  • Replication & Backup
  • MySQL in the Cloud
  • MySQL and NoSQL
  • MySQL Case Studies
  • Security
  • What’s New in MySQL

Tell us about your life, I’m sure you have a ton of stories to tell us.
We would love to have new faces, new companies and new topics this year again.
Don’t be shy, everyone is eager to drink your words.

Read the guidelines and submit your talk now!

Looking forward to meeting you in London in November.
Cheers

Link
Peter Zaitsev: Managing shards of MySQL databases with MySQL Fabric (11.7.2014, 09:00 UTC)

This is the fourth post in our MySQL Fabric series. In case you’re joining us now, we started with an introductory post, and then discussed High Availability (HA) using MySQL Fabric here (Part 1) and here (Part 2). Today we will talk about how MySQL Fabric can help you scale out MySQL databases with sharding.

Introduction

At the time of writing, MySQL Fabric includes support for range- and hash-based sharding. As with HA, the functionality is split between the client side, through a MySQL Fabric-aware connector, and the server side, through the mysqlfabric utility and the XML-RPC server we’ve talked about before.

In this post, we’ll go through the process of setting up a sharded table for use with MySQL Fabric, and then go through some usage examples, again using the Python connector.

In our next post, we’ll talk about shard management operations, and go into more detail about how we can combine the Sharding and HA features of MySQL Fabric.

The architecture

For our examples, we’ll be using a sharding branch from our vagrant-fabric repository. If you have been following previous posts and already have a local copy of the repo, you can get this one just by running the following command:

git checkout sharding

from the root of your copy. Bear in mind that the node names are the same in the Vagrantfile, so while in theory just running vagrant provision should be enough, you may have to run vagrant destroy and vagrant up again if you hit unexpected behavior.
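For reference, the blunt reset path is simply the following, run from the repository root (it recreates the VMs from scratch, so it takes a while):

vagrant destroy -f
vagrant up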

The only difference between this branch and the original one is that you’ll have two mysqld instances per node: one on port 3306 and one on port 13306. This will let us achieve high availability for our shard groups. But don’t worry about that for now, it’s something we’ll discuss more in depth in our next post.

In today’s examples, we’ll be using the three group architecture described by this diagram:

Fabric Sharding Structure

The blue boxes represent shard-groups and the green box represent the global-group. The red arrows indicate the flow of replication and the violet arrows represent client connections.

Setting up sharding

The official documentation about sharding with MySQL Fabric can be found here. We’ll be using the same example employees database and shard the salaries table.

As we said, to keep things simple for the introduction, we’ll create all the groups but only add one instance to each one of them. In our next post, we’ll use two instances per group to evaluate how MySQL Fabric can make our shards highly available, and how it can rearrange replication topologies automatically after a failure.

To start, let’s create three groups:

[vagrant@store ~]$ mysqlfabric group create salaries-global
Procedure :
{ uuid = 390aa6c0-acda-40e2-ad52-8c0869613635,
finished = True,
success = True,
return = True,
activities =
}
[vagrant@store ~]$ for i in 1 2; do mysqlfabric group create salaries-$i; done
Procedure :
{ uuid = 274742a2-5e84-49b8-8446-5a8fc55f1899,
finished = True,
success = True,
return = True,
activities =
}
Procedure :
{ uuid = 408cfd6a-ff3a-493e-b39b-a3241d83fda6,
finished = True,
success = True,
return = True,
activities =
}

 

The global group will be used to propagate schema changes and to store unpartitioned data. Think of configuration tables that don’t need to be sharded, for example.

The other two groups will host shards, that is, tables that will have the same structure across all the nodes, but not the same data (and that will be empty in the global group’s nodes).

Now, let’s add one instance to each group:

[vagrant@store ~]$ mysqlfabric group add salaries-global node1:3306
Procedure :
{ uuid = 0d0f657c-9304-4e3f-bf5b-a63a5e2e4390,
finished = True

Truncated by Planet PHP, read more at the original (another 11565 bytes)

Link