Peter Zaitsev: Managing shards of MySQL databases with MySQL Fabric (11.7.2014, 09:00 UTC)

This is the fourth post in our MySQL Fabric series. In case you’re joining us now, we started with an introductory post, and then discussed High Availability (HA) using MySQL Fabric here (Part 1) and here (Part 2). Today we will talk about how MySQL Fabric can help you scale out MySQL databases with sharding.

Introduction

At the time of writing, MySQL Fabric includes support for range- and hash-based sharding. As with HA, the functionality is split between the client side, through a MySQL Fabric-aware connector, and the server side, through the mysqlfabric utility and the XML-RPC server we’ve talked about before.

In this post, we’ll go through the process of setting up a sharded table for use with MySQL Fabric, and then go through some usage examples, again using the Python connector.

In our next post, we’ll talk about shard management operations, and go into more detail about how we can combine the Sharding and HA features of MySQL Fabric.

The architecture

For our examples, we’ll be using a sharding branch from our vagrant-fabric repository. If you have been following previous posts and already have a local copy of the repo, you can get this one just by running the following command:

git checkout sharding

from the root of your copy. Bear in mind that the node names are the same in the Vagrantfile, so while in theory just running vagrant provision should be enough, you may have to run vagrant destroy and vagrant up again if you hit unexpected behavior.

The only difference between this branch and the original one is that you’ll have two mysqld instances per node: one on port 3306 and one on port 13306. This will let us achieve high availability for our shard groups. But don’t worry about that for now; it’s something we’ll discuss in more depth in our next post.

In today’s examples, we’ll be using the three-group architecture described by this diagram:

Fabric Sharding Structure

The blue boxes represent the shard groups and the green box represents the global group. The red arrows indicate the flow of replication and the violet arrows represent client connections.

Setting up sharding

The official documentation about sharding with MySQL Fabric can be found here. We’ll be using the same employees sample database as the documentation, and shard the salaries table.
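
For reference, this is the salaries table as defined in the standard employees sample database (reproduced from that schema as distributed at the time of writing):

CREATE TABLE salaries (
    emp_no     INT   NOT NULL,
    salary     INT   NOT NULL,
    from_date  DATE  NOT NULL,
    to_date    DATE  NOT NULL,
    FOREIGN KEY (emp_no) REFERENCES employees (emp_no) ON DELETE CASCADE,
    PRIMARY KEY (emp_no, from_date)
);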

As we said, to keep things simple for the introduction, we’ll create all the groups but only add one instance to each one of them. In our next post, we’ll use two instances per group to evaluate how MySQL Fabric can make our shards highly available, and how it can rearrange replication topologies automatically after a failure.

To start, let’s create three groups:

[vagrant@store ~]$ mysqlfabric group create salaries-global
Procedure :
{ uuid = 390aa6c0-acda-40e2-ad52-8c0869613635,
finished = True,
success = True,
return = True,
activities =
}
[vagrant@store ~]$ for i in 1 2; do mysqlfabric group create salaries-$i; done
Procedure :
{ uuid = 274742a2-5e84-49b8-8446-5a8fc55f1899,
finished = True,
success = True,
return = True,
activities =
}
Procedure :
{ uuid = 408cfd6a-ff3a-493e-b39b-a3241d83fda6,
finished = True,
success = True,
return = True,
activities =
}

The global group will be used to propagate schema changes and to store unpartitioned data. Think of configuration tables that don’t need to be sharded, for example.

The other two groups will host the shards: tables that have the same structure across all the nodes, but not the same data (and that will be empty on the global group’s nodes).

Now, let’s add one instance to each group:

[vagrant@store ~]$ mysqlfabric group add salaries-global node1:3306
Procedure :
{ uuid = 0d0f657c-9304-4e3f-bf5b-a63a5e2e4390,
finished = True

Truncated by Planet PHP, read more at the original (another 11565 bytes)

Link
Peter Zaitsev: Percona Toolkit 2.2.9 is now available (10.7.2014, 11:05 UTC)

Percona is glad to announce the release of Percona Toolkit 2.2.9 on July 10, 2014 (downloads are available here and from the Percona Software Repositories). This release is the current GA (Generally Available) stable release in the 2.2 series.

Bugs Fixed:

  • Fixed bug 1335960: pt-query-digest could not parse the binlogs from MySQL 5.6 because the binlog format was changed.
  • Fixed bug 1315130: pt-online-schema-change did not find child tables as expected. It could incorrectly match tables that reference a table with the same name in a different schema, and could miss tables referencing the altered table if they were in a different schema.
  • Fixed bug 1335322: pt-stalk would fail when the variable or threshold was a non-integer.
  • Fixed bug 1258135: pt-deadlock-logger was inserting older deadlocks into the deadlock table even if they were already there, thereby creating unnecessary noise. For example, if a deadlock happened a year ago and MySQL still kept it in memory, pt-deadlock-logger would INSERT it into the percona.deadlocks table every minute until the server was restarted. This was fixed by comparing against the last deadlock fingerprint before issuing the INSERT query.
  • Fixed bug 1329422: pt-online-schema-change with --alter-foreign-keys-method=none could break foreign key constraints in a way that is hard to recover from. Although this method of handling foreign key constraints is provided so that the database administrator can disable the tool’s built-in functionality if desired, the tool now issues a warning and asks for confirmation when this option is used.

Percona Toolkit is free and open-source. Details of the release can be found in the release notes and the 2.2.9 milestone at Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

The post Percona Toolkit 2.2.9 is now available appeared first on MySQL Performance Blog.

Link
Jean-Jerome Schmidt: How to Configure ClusterControl to run on nginx (10.7.2014, 10:24 UTC)
July 10, 2014
By Severalnines

ClusterControl uses the Apache HTTP Server to serve its web interface, but it is also possible to use nginx. nginx with PHP FastCGI is well known for running with a smaller memory footprint than standard Apache with PHP DSO.

In this post, we will show you how to run ClusterControl on the nginx web server by swapping out the default Apache web server installed during the initial deployment. This blog post does not mean that we officially support nginx; it is just an alternative that a portion of our users have been interested in. For instance, Phil Bayfield wrote a blog on the same topic a while back.

Apache Configuration

Before we jump into the nginx configuration, let’s look at how the ClusterControl web application is configured with the Apache web server. ClusterControl consists of two web components, a web UI and the CMONAPI. These components are located in Apache’s document root, which might vary depending on the operating system.

1. Make sure the ClusterControl UI and CMONAPI exist in the document root. The document root for RedHat/CentOS and Ubuntu 14.04 LTS (Apache 2.4) is /var/www/html, while for Debian and Ubuntu 12.04 and lower it is /var/www. ClusterControl will be installed under this root directory and you should see something like this:

$ ls -l /var/www/html
drwxr-xr-x. 6 apache apache 4096 Jul  9 16:31 cc-cmonapi-1.2.6
drwxr-xr-x. 5 apache apache 4096 Jul  9 16:31 cc-ui-1.2.6
lrwxrwxrwx. 1 root   root     26 Jul  9 16:31 clustercontrol -> /var/www/html//cc-ui-1.2.6
drwxr-xr-x. 4 apache apache 4096 Jul  8 18:34 cmon
lrwxrwxrwx. 1 root   root     31 Jul  9 16:31 cmonapi -> /var/www/html//cc-cmonapi-1.2.6

2. Apache must be able to read the custom configuration file (.htaccess) under the document root directory, which requires setting the AllowOverride option to All. Example in /etc/httpd/conf/httpd.conf:

<Directory "/var/www/html">
    Options Indexes FollowSymLinks
    AllowOverride All
    Order allow,deny
    Allow from all
</Directory>

3. The following PHP modules must be installed and enabled:

  • common
  • mysql
  • ldap
  • gd
  • curl

The standard Apache installation via the package manager will install PHP to run as a dynamic shared object (DSO). Running in this mode requires you to restart Apache whenever the PHP configuration changes.

read more

Link
Peter Zaitsev: TokuDB gotchas: slow INFORMATION_SCHEMA TABLES (9.7.2014, 16:01 UTC)

We are using Percona Server with the TokuDB engine extensively in Percona Cloud Tools and getting real operational experience with this engine. So I want to share some findings we came across, in the hope that they may help someone in their work with TokuDB.

So, one problem I faced is that SELECT * FROM INFORMATION_SCHEMA.TABLES is quite slow when I have thousands of tables in TokuDB. How slow? For example…

select * from information_schema.tables limit 1000;
...
1000 rows in set (18 min 31.93 sec)

This is very similar to what InnoDB faced a couple of years back. InnoDB solved it by adding the variable innodb_stats_on_metadata.
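
For comparison, the InnoDB-side mitigation is a one-liner; innodb_stats_on_metadata is dynamic, so a sketch like this takes effect immediately:

SET GLOBAL innodb_stats_on_metadata = OFF;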

So what happens with TokuDB? There is an explanation from Rich Prohaska at Tokutek: “TokuDB has too much overhead for table opens. TokuDB does a calculation on the table when it is opened to see if it is empty. This calculation can be disabled when 'tokudb_empty_scan=disabled'.”
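
Flipping it on a running server is equally simple; a sketch, assuming the Percona Server builds we used, where tokudb_empty_scan is dynamic and takes the values 'lr' (the default), 'rl' and 'disabled':

SET GLOBAL tokudb_empty_scan = 'disabled';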

So let’s see what we have with tokudb_empty_scan=disabled

select * from information_schema.tables limit 1000;
...
1000 rows in set (3 min 4.59 sec)

An impressive improvement, but still somewhat slow. Tokutek promises a fix to improve it in the next TokuDB 7.2 release.

The post TokuDB gotchas: slow INFORMATION_SCHEMA TABLES appeared first on MySQL Performance Blog.

Link
Peter Zaitsev: Check for MySQL slave lag with Percona Toolkit plugin for Tungsten Replicator (9.7.2014, 13:28 UTC)

A while back, I made some changes to the plugin interface for pt-online-schema-change which allow custom replication checks to be written. As I was adding this functionality, I also added the --plugin option to pt-table-checksum. This was released in Percona Toolkit 2.2.8.

With these additions, I spent some time writing a plugin that allows Percona Toolkit tools to use Tungsten Replicator to check for slave lag. You can find the code at https://github.com/grypyrg/percona-toolkit-plugin-tungsten-replicator

Requirements

The plugin uses the Perl JSON::XS module (perl-JSON-XS rpm package, http://search.cpan.org/dist/JSON-XS/XS.pm); make sure it’s available or the plugin will not work.

Preparation

We need to use --recursion-method=dsns, as the Percona Toolkit tools are not able to automatically find the Tungsten Replicator slaves that are connected to the master database. (I did add a blueprint on Launchpad to make this possible: https://blueprints.launchpad.net/percona-toolkit/+spec/plugin-custom-recursion-method)

The dsns recursion-method gets the list of slaves from a database table you specify:

CREATE TABLE `percona`.`dsns` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) DEFAULT NULL,
  `dsn` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
);

Here one slave node3 is replicating from the master:

node1 mysql> select * from percona.dsns;
+----+-----------+---------+
| id | parent_id | dsn     |
+----+-----------+---------+
|  2 |      NULL | h=node3 |
+----+-----------+---------+
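
That row can be added with a plain INSERT; a sketch using the host from this example:

INSERT INTO percona.dsns (parent_id, dsn) VALUES (NULL, 'h=node3');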

Configuration

Currently, it is not possible to specify extra options for the plugin with Percona Toolkit, so some manual editing of the Perl file is still necessary to configure it.

So before we can run a checksum, we need to configure the plugin:

## CONFIGURATION
# trepctl command to run
my $trepctl="/opt/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/trepctl";
# what tungsten replicator service to check
my $service="bravo";
# what user does tungsten replicator use to perform the writes?
# See Binlog Format for more information
my $tungstenusername = 'tungsten';

Running A Checksum

Here I did a checksum of a table with pt-table-checksum. During the checksum process, I brought the slave node offline and brought it back online again:

# pt-table-checksum \
        -u checksum \
        --no-check-binlog-format \
        --recursion-method=dsn=D=percona,t=dsns \
        --plugin=/vagrant/pt-plugin-tungsten_replicator.pl \
        --databases app \
        --check-interval=5 \
        --max-lag=10
Created plugin from /vagrant/pt-plugin-tungsten_replicator.pl.
PLUGIN get_slave_lag: Using Tungsten Replicator to check replication lag
Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting
Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting
Replica node3 is stopped.  Waiting.
Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting
Replica lag is 125 seconds on node3.  Waiting.
Replica lag is 119 seconds on node3.  Waiting.
Checksumming app.large_table:  22% 00:12 remain
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
07-03T10:49:54      0      0  2097152       7       0 213.238 app.large_table

I recommend setting the check-interval higher than the default

Truncated by Planet PHP, read more at the original (another 5352 bytes)

Link
Peter Zaitsev: TIMESTAMP Columns, Amazon RDS 5.6, and You (8.7.2014, 14:18 UTC)

This comes from an issue that I worked on recently, wherein a customer reported that their application was working fine under stock MySQL 5.6 but producing erroneous results when they tried running it on Amazon RDS 5.6. They had a table which, on the working server, contained two TIMESTAMP columns, one which defaulted to CURRENT_TIMESTAMP and the other which defaulted to '0000-00-00 00:00:00', like so:

CREATE TABLE mysql56 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ts1 TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ts2 TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00'
);

However, under Amazon RDS, the same table looked like this:

CREATE TABLE rds56 ( 
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ts1 TIMESTAMP NULL DEFAULT NULL,
  ts2 TIMESTAMP NULL DEFAULT NULL
);

They mentioned that their schema contains TIMESTAMP column definitions without any modifiers for nullability or default values. In other words, they were doing something like this:

CREATE TABLE foo56 (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ts1 TIMESTAMP,
    ts2 TIMESTAMP
);

It’s a known issue (or change, or difference, whatever we choose to call it) that MySQL is deprecating the implicit defaults for TIMESTAMP columns that don’t have any nullability or default-value specifiers; this is covered in the 5.6 documentation under the explicit_defaults_for_timestamp variable. However, the docs also mention that the default value for this setting is OFF – i.e., if you create a table with TIMESTAMP columns without any defaults, it will fill them in for you, similarly to what I’ve described above.
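
A quick way to see which behavior a given server will use (the variable name is as documented in 5.6; the check works the same on stock MySQL and RDS):

SHOW VARIABLES LIKE 'explicit_defaults_for_timestamp';

If it reports ON, implicit TIMESTAMP columns come out as NULL DEFAULT NULL; if OFF, you get the traditional NOT NULL and CURRENT_TIMESTAMP filling described above.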

As it turns out, the RDS default for this setting is ON, hence the “NULL DEFAULT NULL” modifiers when creating the table under RDS. We changed the parameter group, restarted the instance (note that this variable is NOT dynamic), and their schema-creation script created the tables in the proper way.

So, what have we learned here?

  • Migrating from standalone MySQL to Amazon RDS sometimes has hidden pitfalls that aren’t always readily apparent. Many times it will “just work” – but sometimes it doesn’t. Percona is, of course, happy to help review your configurations and assist with any Amazon RDS implementation plans you might have.
  • When in doubt, fully specify your TIMESTAMP columns, as in the sketch below. If you want them NOT NULL, say so. If you want a default value or an on-update value, set it. Even the configuration variable explicit_defaults_for_timestamp is deprecated and slated for removal in a future version, so eventually it won’t be possible to get the old pre-5.6 behavior at all.
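
For example, here is a fully specified version of the first table above (the table name is illustrative); it creates the same columns regardless of how explicit_defaults_for_timestamp is set:

CREATE TABLE mysql56_explicit (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ts1 TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ts2 TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00'
);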

The post TIMESTAMP Columns, Amazon RDS 5.6, and You appeared first on MySQL Performance Blog.

Link
Open Query: Munin graphing of MySQL (8.7.2014, 01:48 UTC)

While there are many graphing tools out there, we’ve used Munin for a while now.

The MySQL plugin for Munin had fallen out of date, and the SHOW ENGINE INNODB STATUS output changed in 5.5, making some bits of the plugin simply not work any more. Also, SHOW GLOBAL STATUS has some extra variables, so there was a need to create new graphs.

All of these are now in the 2.1.8+ development releases of Munin.

Here are samples of the new/updated graphs.

[Graph samples: Tables; Table Definitions; InnoDB Buffer Pool Activity; InnoDB Buffer Pool Internal Breakdown; InnoDB Insert Buffer; InnoDB Buffer Pool; InnoDB Semaphores; InnoDB Master Thread; InnoDB Adaptive Hash Index; InnoDB Queries and Transactions]

Truncated by Planet PHP, read more at the original (another 6235 bytes)

Link
Jean-Jerome Schmidt: ClusterControl Module for Puppet (7.7.2014, 15:47 UTC)
July 7, 2014
By Severalnines

If you are automating your infrastructure using Puppet, then this blog is for you. We are glad to announce the availability of a Puppet module for ClusterControl. For those using Chef, we already published Chef cookbooks for Galera Cluster and ClusterControl some time back.  

ClusterControl on Puppet Forge

The initial release of the ClusterControl module is available on Puppet Forge; installing the module is as easy as:

$ puppet module install severalnines-clustercontrol

If you haven’t changed the default module path, this module will be installed under /etc/puppet/modules/clustercontrol on your Puppet master host. ClusterControl supports the following database clusters:

  • Galera Cluster
    • MySQL Galera Cluster by Codership
    • Percona XtraDB Cluster by Percona
    • MariaDB Galera Cluster by MariaDB
  • MySQL Cluster
  • MySQL Replication
  • MongoDB or TokuMX Clusters
    • Sharded Cluster
    • Replica Set

Severalnines Package Repository

This module makes use of the Severalnines repository for yum and apt packages. This repository hosts the latest stable release of ClusterControl and all of its components.

ClusterControl and all of its components require post-installation procedures, like setting up MySQL, granting users, and setting up Apache. This module will automate most of these.

If you look up the Severalnines package repository, you will find the following packages:

  • clustercontrol - Severalnines ClusterControl Web Application. Frontend for clustercontrol-controller. Previously known as cc-ui.
  • clustercontrol-cmonapi - Severalnines ClusterControl REST API. Previously known as cc-cmonapi.
  • cmon-agent - Agent for ClusterControl. Manage and monitor MySQL, MySQL Cluster and Galera Cluster for MySQL
  • cmon-controller - ClusterControl Controller. Manage and monitor MySQL, MySQL Cluster and Galera Cluster for MySQL

read more

Link
Peter Zaitsev: Looking out for max values in integer-based columns in MySQL (7.7.2014, 10:00 UTC)

Yay! My first blog post! As long as at least 1 person finds it useful, I’ve done my job. ;) Recently, one of my long-term clients noticed that while their INSERTs were succeeding, a particular column counter was not incrementing. A quick investigation determined the column was of type int(11) and had reached the maximum value of 2147483647. We fixed this by using pt-online-schema-change to change the column to int(10) unsigned, thus allowing values up to 4294967295.

My client was now concerned about all of their other integer-based columns and wanted me to check them all. So I wrote a quick-n-dirty script in Go that compares each integer-based column’s current value to the maximum allowed for that column type.
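
For a single known column, the core check is a one-liner in plain SQL (a sketch; the table and column names are illustrative, and 4294967295 is the int(10) unsigned ceiling mentioned above):

SELECT MAX(counter_col) AS current_value,
       ROUND(MAX(counter_col) / 4294967295 * 100, 2) AS pct_of_max
  FROM mydb.mytable;

The script simply automates this check across every integer-based column in a schema.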

You can find the full source code in my git repo.

Here’s a quick overview; the code is pretty simple.

First we connect to MySQL and verify the connection:

db, err := sql.Open("mysql", fmt.Sprintf("%s:%s@tcp(%s:3306)/%s", mysqlUn, mysqlPw, hostToCheck, dbToCheck))
if err != nil {
        fmt.Printf("Error connecting to MySQL on '%s': %s\n", hostToCheck, err)
        db.Close()
        os.Exit(1)
}
// Check connection is alive.
err = db.Ping()
if err != nil {
        fmt.Printf("Unable to ping mysql at '%s': %s\n", hostToCheck, err)
        db.Close()
        os.Exit(1)
}

Next, we query the information_schema.columns table for the names of all integer-based columns and calculate what their maximum value can be (credit for the clever SQL goes to Peter Boros).

// Construct our base i_s query
var tableExtraSql string
if tableToCheck != "" {
        tableExtraSql = fmt.Sprintf("AND TABLE_NAME = '%s'", tableToCheck)
}
baseSql := fmt.Sprintf(`
        SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, (CASE DATA_TYPE
          WHEN 'tinyint' THEN 255
          WHEN 'smallint' THEN 65535
          WHEN 'mediumint' THEN 16777215
          WHEN 'int' THEN 4294967295
          WHEN 'bigint' THEN 18446744073709551615
        END >> IF(LOCATE('unsigned', COLUMN_TYPE) > 0, 0, 1)) AS MAX_VALUE
        FROM information_schema.columns
        WHERE TABLE_SCHEMA = '%s' %s
        AND DATA_TYPE IN ('tinyint', 'smallint', 'int', 'mediumint', 'bigint')`, dbToCheck, tableExtraSql)

Now that we have this list of columns to check, we simply loop over this result set, get the MAX() of each column and print a pretty report.

// Loop over rows received from i_s query above.
for columnsToCheck.Next() {
        err := columnsToCheck.Scan(&tableName, &columnName, &columnType, &maxValue)
        if err != nil {
                log.Fatal("Scanning Row Error: ", err)
        }
        // Check this column
        query := fmt.Sprintf("SELECT MAX(%s), ROUND((MAX(%s)/%d)*100, 2) AS ratio FROM %s.%s",
                columnName, columnName, maxValue, dbToCheck, tableName)
        err = db.QueryRow(query).Scan(&currentValue, &ratio)
        if err != nil {
                fmt.Printf("Couldn't get MAX(%s.%s): %s\n", tableName, columnName, err)
                fmt.Println("SQL: ", query)
                continue
        }
        // Print report
        if ratio.Valid && ratio.Float64 >= float64(reportPct) {
                fmt.Printf("'%s'.'%s' - Type: '%s' - ", tableName, columnName, columnType)
                fmt.Printf("ColumnMax: '%d'", maxValue)
                fmt.Printf(" - CurVal: '%d'", currentValue.Int64)
                fmt.Printf(" - FillRatio: '%.2f'\n", ratio.Float64)
        }
}

There are more options to the app that allow you to silence some of the verbosity and to only print report lines where the value-to-max ratio is above a user-defined threshold. If you have frequently changing schemas, this should allow you to cron the app and only receive email reports when there is a potential problem. Otherwise, this tool could be useful to run once a month/quarter, just to verify things are in good standing.

Like I said before, hopefully this helps at least 1 person catch a potential problem sooner rather than later.

The post Looking out for max values in integer-based columns in MySQL appeared first on MySQL Performance Blog.

Link
Daniël van Eeden: The MySQL 6.0 goodybag (6.7.2014, 20:39 UTC)
After MySQL 5.1 was released, work started on MySQL 5.2, but this was then renamed to MySQL 6.0. Many new features were introduced in 6.0, but stabilizing the branch became a huge task. Eventually the decision was made to start off with a stable branch and backport the new features from 6.0. This is how many of the 6.0 features landed in 5.5 and 5.6.

So let's see which features 6.0 brought and where they landed. I'll use the What Is New in MySQL 6.0 section of the MySQL 6.0 Reference Manual for this.

  • The Falcon storage engine. This never landed anywhere as far as I know. It's not even included in the list of storage engines in the MariaDB knowledge base. As both InnoDB and MySQL are now part of Oracle, I don't see any reason for Falcon to exist anymore.
  • 4-byte utf8 and support for utf16 and utf32. This is included in MySQL 5.5 together with many other Unicode enhancements.
  • Database backup with SQL. This allows you to make backups by executing 'BACKUP DATABASE' SQL statements. This has not landed anywhere as far as I know, but some of the code might have made it into MySQL Enterprise Backup (both use backup_history and backup_progress tables in the mysql database). This might be an interesting thing to have, but with MEB there is no real need for it.
  • Subquery enhancements, BKA and MRR. This all made it into MySQL 5.6.
  • LOAD XML. This made it into MySQL 5.5, but I don't think it is used often.
  • The Maria storage engine. This is a transactional MyISAM storage engine. It is not used in Oracle MySQL. It is included in MariaDB, but renamed to Aria as it might otherwise cause confusion. As far as I know there is not much development being done on Aria.
  • Foreign Keys. Of course MySQL supports foreign keys (now even with NDB!), but those are implemented in the storage engine, not in the server. This was on the roadmap for 6.x, but I don't know if it was actually implemented. Implementing this in the server could make writing storage engines easier and would probably also make it easier to combine partitioning and foreign keys.

Did I forget any 6.x features? Did you ever try 6.x? Let me know in the comments!

Link