Peter Zaitsev: OpenStack: A MySQL DBA Perspective – Sept. 17 webinar (11.9.2014, 20:58 UTC)

OpenStack: A MySQL DBA Perspective

I’ll have the pleasure of presenting a webinar titled “OpenStack: A MySQL DBA Perspective” next Wednesday, September 17, at 10 a.m. PDT (1 p.m. EDT). Everyone is invited.

The webinar will be divided into two parts. The first part will cover how MySQL can be used by the OpenStack infrastructure, including the expected load, high-availability solutions and geo-DR.

The second part will focus on the use of MySQL within an OpenStack cloud. We’ll look into the various options that are available, both the traditional ones and Trove. We’ll also discuss the block device options with regard to MySQL performance and, finally, we’ll review the high-availability implications of running MySQL in an OpenStack cloud.

Register here. I look forward to your questions, and if you have any related to OpenStack that I can help with in advance of the webinar please feel free to post those in the comments section below. I’ll write a followup post after the webinar to recap all related questions and answers. I’ll also provide the slides.

See you next Wednesday!

Cédric PEINTRE: How to shrink the ibdata file by transporting tables with Trite (11.9.2014, 16:41 UTC)

You’ve probably had trouble with the shared InnoDB tablespace stored in the ibdata file, especially when it has grown for some reason and reached a critical size.

This can happen, for example, after excessive rollback segment growth, or during a migration from a single shared tablespace to a file-per-table configuration.

In this post, I would like to explain how to shrink the ibdata file after unwanted growth in a file-per-table configuration.
Note that the process could be done without Trite, but the tool saves you from writing the table-transport script yourself.

Initial situation

Here is a sample of the InnoDB configuration:

innodb_data_file_path = ibdata1:100M:autoextend
innodb_file_per_table

And the status of your datafiles in the datadir directory:

drwx------ 2 mysql mysql 4.0K Dec 20  2012 performance_schema
drwxrwx--- 2 mysql mysql 4.0K Dec 20  2012 mysql
drwxrwx--- 2 mysql mysql 4.0K Jun 10 23:58 DB_INNO_FPT_1
drwxrwx--- 2 mysql mysql 4.0K Sep  8 15:53 DB_INNO_FPT_2
-rwxrwx--- 1 mysql mysql 317G Sep  8 23:37 ibdata1

The ibdata file size is 317GB and you want to recover this lost space.

What is Trite?

Trite is a client/server tool that provides customizable transport of binary InnoDB table files.
The tool was developed by Joshua Prunier and is available on GitHub.

The tool lets you connect a client to your database backup (made with XtraBackup) and stream the table files to another database.
Again, you can do that without Trite, but the tool automates the manual process.

Also, this procedure is much quicker than a traditional mysqldump restore when tables become very large.

Finally, Trite can be useful in many scenarios; it deserves your attention.

Why Trite?

Because in this case the dataset is too big to use a traditional export/import with mysqldump.
We have to copy very large InnoDB tables and Trite can help us do that with ease.

Also, we don’t simply want to restore a backup, we want to end up with an ibdata file of reasonable size.
So, we have to copy each table’s files from the backup to a new, clean MySQL instance.

Pre-requisites

You need a spare server to restore the backed-up files.
The source server should be master-ready (binlog, server_id…).
Note that you can use a single server, but the load generated by the process could be a problem on a production server.

Trite is written in Go, so you need Go and a Go MySQL driver to install it on your server.
The Trite installation process is clearly detailed on the GitHub page of the project.

All tables must be InnoDB with the innodb_file_per_table configuration activated.
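
A quick way to verify this prerequisite is to ask information_schema for any table that is not InnoDB; a sketch (adjust the list of excluded system schemas to your version):

SELECT table_schema, table_name, engine
  FROM information_schema.tables
 WHERE engine <> 'InnoDB'
   AND table_type = 'BASE TABLE'
   AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');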

XtraBackup is needed to perform the physical database backup.
Note that the target MySQL instance must be Percona Server (5.1 to 5.6), Oracle MySQL 5.6 or MariaDB (5.5 to 10).

Overview of the procedure

We’ll copy the InnoDB tables from the source server (aka S), which holds the oversized ibdata file, to the target server (aka T).
I assume that MySQL is installed and configured on both servers.

Here is an overview of the procedure:

  • Install Trite on both servers (not detailed here)
  • [S] Make a backup of the source database with XtraBackup
  • [S] Apply logs with the --export option
  • [S] Make a dump of your database objects with Trite
  • [S] Start a Trite server pointing at the previous backup and dump directories
  • [T] Use the Trite client to restore all the tables in a brand new MySQL instance
  • [T] Configure the spare server as a slave and wait for the replication lag (not detailed here)
  • Switch your application from [S] to [T] or copy the fresh database on the source server (not detailed here)

Detailed procedure

Backup the source database with XtraBackup

On the source server, you have to make a full physical backup of your databases.
The backup process is not detailed here; please refer to the documentation for details on installation and use.

Output example:

# innobackupex --user=root --password=pass /backup
... output truncated ...
innobackupex: Backup created in directory '/backup/2014-01-25_14-22-06'
140125 14:30:29 innobackupex: Connection to database server closed
140125 14:30:29 innobackupex: completed OK!

Truncated by Planet PHP, read more at the original (another 5156 bytes)

Peter Zaitsev: OpenStack users shed light on Percona XtraDB Cluster deadlock issues (11.9.2014, 15:27 UTC)

I was fortunate to attend an Ops discussion about databases at the OpenStack Summit Atlanta this past May as one of the panelists. The discussion was about deadlock issues OpenStack operators see with Percona XtraDB Cluster (of course this is applicable to any Galera-based solution). I asked them to describe what they were seeing, and as it turned out, Nova and Neutron use the SELECT … FOR UPDATE SQL construct quite heavily. This is a topic I thought was worth writing about.

Write set replication in a nutshell (with oversimplification)

Any node is writable, and replication happens in write sets. A write set is practically a row-based binary log event or events plus “some additional stuff.” The “some additional stuff” is good for two things.

  • Two write sets can be compared to determine whether they conflict.
  • A write set can be checked against a database to determine whether it is applicable.

Before committing on the originating node, the write set is transferred to all other nodes in the cluster. The originating node checks that the transaction does not conflict with any of the transactions in the receive queue and checks that it is applicable to the database. This process is called certification. After the write set is certified, the transaction is committed. The remote nodes perform certification asynchronously compared to the local node; since certification is deterministic, they will get the same result. For the same reason, the write set can also be applied later on the remote nodes. This kind of replication is called virtually synchronous, which means that the data transfer is synchronous, but the actual apply is not.

We have a nice flowchart about this.

Since the write set is only transferred before commit, InnoDB row level locks, which are held locally, are not held on remote nodes (if these were escalated, each row lock would take a network round trip to acquire). This also means that by default, if multiple nodes are used, the ability to read your own writes is not guaranteed. In that case, a certified transaction, which is already committed on the originating node, can still sit in the receive queue of the node the application is reading from, waiting to be applied.
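
If an application does need read-your-own-writes semantics on such a node, Galera-based servers of this generation expose a session variable for it. A minimal sketch, assuming PXC 5.5/5.6 where the variable is named wsrep_causal_reads:

-- Make this session wait until the node has applied all write sets
-- certified before the read was issued (trades latency for consistency).
SET SESSION wsrep_causal_reads = ON;
SELECT * FROM t WHERE id = 1;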

SELECT … FOR UPDATE

The SELECT … FOR UPDATE construct reads the given records in InnoDB, and locks the rows that are read from the index the query used, not only the rows that it returns. Given how write set replication works, the row locks of SELECT … FOR UPDATE are not replicated.
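
As a hypothetical illustration of the “locks what it reads” point: if the WHERE clause is not covered by an index, InnoDB scans, and therefore locks, every row in the table, not just the rows that match. Using the test table t built below (which has no index on ts):

-- Full table scan on the non-indexed column ts: every row read is
-- locked, even though only a subset is returned.
START TRANSACTION;
SELECT * FROM t WHERE ts >= '2014-06-26 21:37:02' FOR UPDATE;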

Putting it together

Let’s create a test table.

CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

And some records we can lock.

pxc1> insert into t values();
Query OK, 1 row affected (0.01 sec)
pxc1> insert into t values();
Query OK, 1 row affected (0.01 sec)
pxc1> insert into t values();
Query OK, 1 row affected (0.01 sec)
pxc1> insert into t values();
Query OK, 1 row affected (0.00 sec)
pxc1> insert into t values();
Query OK, 1 row affected (0.01 sec)
pxc1> select * from t;
+----+---------------------+
| id | ts                  |
+----+---------------------+
|  1 | 2014-06-26 21:37:01 |
|  4 | 2014-06-26 21:37:02 |
|  7 | 2014-06-26 21:37:02 |
| 10 | 2014-06-26 21:37:03 |
| 13 | 2014-06-26 21:37:03 |
+----+---------------------+
5 rows in set (0.00 sec)
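
Note the gaps in the id column (1, 4, 7, …): they are a side effect of wsrep_auto_increment_control, which on a 3-node cluster sets auto_increment_increment to 3 so that nodes do not generate conflicting values. You can verify this on any node; the values below are what a 3-node cluster would typically report, not captured output:

SHOW VARIABLES LIKE 'auto_increment%';
-- auto_increment_increment: 3 (the cluster size)
-- auto_increment_offset:    1, 2 or 3, depending on the node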

On the first node, lock the record.

pxc1> start transaction;
Query OK, 0 rows affected (0.00 sec)
pxc1> select * from t where id=1 for update;
+----+---------------------+
| id | ts                  |
+----+---------------------+
|  1 | 2014-06-26 21:37:01 |
+----+---------------------+
1 row in set (0.00 sec)

On the second, update it with an autocommit transaction.

pxc2> update t set ts=now() where id=1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0
pxc1> select * from t;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Truncated by Planet PHP, read more at the original (another 5638 bytes)

Jean-Jerome Schmidt: Webinar Replay & Slides: Performance Tuning of HAProxy for Database Load Balancing (11.9.2014, 11:55 UTC)
September 11, 2014
By Severalnines

Thanks to everyone who attended and participated in this week’s webinar on ‘Performance Tuning of HAProxy for Database Load Balancing’. And special thanks to our guest speaker, Baptiste Assmann of HAProxy Technologies. 

If you missed the sessions or would like to watch the webinar again & browse through the slides, they are now available online.

Watch the replay of this webinar to learn what HAProxy can tell you about your application and database instances, and to understand the difference between short-lived and persistent connections and how they affect the performance and availability of your applications.

Watch the replay

Performance Tuning of HAProxy for Database Load Balancing from Severalnines AB

Read the slides

Performance Tuning of HAProxy for Database Load Balancing - Slides from Severalnines AB

This webinar replay covers the performance tuning basics for HAProxy and explains how to take advantage of some of the new features in 1.5, released in June 2014 after 4 years of development work.

Full Agenda

  • New features in HAProxy 1.5
  • What can HAProxy tell you about your application and your database
  • OS/Network performance tuning for databases with short-lived connections
  • Handling persistent connections
  • Dynamic re-configuration
  • HAProxy multi-process: advantages and limitations
  • HAProxy active/active failover setups
  • Security considerations

read more

Henrik Ingo: Selling Open Source 101: The sales funnel and its variables (11.9.2014, 08:56 UTC)

Since I joined MongoDB it seems I have mostly been doing technical blogs. Yesterday I had a conversation with a long time friend from the open source database scene, which inspired me to jot down some observations on my long time favorite topic: open source business strategy.

In fact, this will be very much a Selling Open Source 101 blog. I’ve come to realize that while what I’m about to write is well known to open source oldtimers (those of us who were lucky to work at Red Hat, MySQL and other first-generation open source companies), these ideas are not necessarily well known to many executives and sales managers working in open source today.

read more

Jean-Jerome Schmidt: Puppet Module for ClusterControl - Adding Management and Monitoring to your Existing Database Clusters (10.9.2014, 13:45 UTC)
September 10, 2014
By Severalnines

If you are automating your infrastructure using Puppet, then this blog is for you. We are glad to announce the availability of a Puppet module for ClusterControl. For those using Chef, we already published Chef cookbooks for Galera Cluster and ClusterControl some time back.  

ClusterControl on Puppet Forge

The initial release of the ClusterControl module is available on Puppet Forge; installing the module is as easy as:

$ puppet module install severalnines-clustercontrol

If you haven’t changed the default module path, this module will be installed under /etc/puppet/modules/clustercontrol on your Puppet master host. ClusterControl supports the following database clusters:

  • Galera Cluster
    • MySQL Galera Cluster by Codership
    • Percona XtraDB Cluster by Percona
    • MariaDB Galera Cluster by MariaDB
  • MySQL Cluster
  • MySQL Replication
  • MongoDB or TokuMX Clusters
    • Sharded Cluster
    • Replica Set

Severalnines Package Repository

This module makes use of the Severalnines repository for yum and apt packages. This repository hosts the latest stable release of ClusterControl and all of its components.

ClusterControl and all of its components require post-installation procedures, such as setting up MySQL, granting users, and setting up Apache. This module will automate most of these.

If you look up the Severalnines package repository, you will find the following packages:

  • clustercontrol - Severalnines ClusterControl Web Application. Frontend for clustercontrol-controller. Previously known as cc-ui.
  • clustercontrol-cmonapi - Severalnines ClusterControl REST API. Previously known as cc-cmonapi.
  • clustercontrol-agent - Agent for ClusterControl. Manage and monitor MySQL, MySQL Cluster and Galera Cluster for MySQL
  • clustercontrol-controller - ClusterControl Controller. Manage and monitor MySQL, MySQL Cluster and Galera Cluster for MySQL

read more

Peter Zaitsev: Generating test data from the mysql> prompt (10.9.2014, 08:00 UTC)

There are a lot of tools that generate test data. Many of them have complex XML scripts or GUI interfaces that let you specify characteristics of the data. For testing query performance and many other applications, however, a simple, quick-and-dirty data generator that can be constructed at the MySQL command line is useful.

First, let’s talk about what kind of data you can easily create with MySQL function calls:

You can generate a decimal number between zero and another number using the MySQL RAND() function like the following query (here between 0 and 10000):

SELECT RAND() * 10000;

Similarly, you can generate a random integer by adding the FLOOR() function:

SELECT FLOOR(RAND() * 10000);

You can generate a random string of 32 characters using MD5():

SELECT MD5(RAND() * 10000);

You can return a random integer between 500 and 1000 with the following:

SELECT FLOOR(500 + RAND() * (1000 - 500));

You can return a random string from a list of strings by using a table to hold the list. A subselect can select a random name from the list of names.

create table names(id int auto_increment primary key, name varchar(20));
insert into names (name) values ('Justin'),('Jerry'),('James'),('Josh'),('Julien');
select (select name from names order by rand() limit 1);

Now we can generate a “fact” table with many rows using fairly simple SQL statements.

First create a table to generate data into:

CREATE TABLE fact (
  dim1 int,
  dim2 int,
  name varchar(20),
  hash varchar(32),
  measure1 double
);

Seed the table with one initial row:

INSERT INTO fact
VALUES (1,1,'Justin',md5(''), .1);

Now grow the table by selecting from the table but providing new random values for the inserted rows:

INSERT INTO fact
SELECT FLOOR(1+ rand()*9999),
       FLOOR(1 + rand()*499),
       (select name from names order by rand() limit 1),
       MD5(1+rand()*9999),
       rand()
 FROM fact;

As you repeat the INSERT … SELECT, the table will grow exponentially, since each pass doubles the row count. You may want to add a LIMIT clause to the INSERT … SELECT to cap the amount of data generated as the table grows, as sketched below.
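
For instance, a capped growth step might look like this (the LIMIT value is an arbitrary choice for illustration):

INSERT INTO fact
SELECT FLOOR(1 + rand()*9999),
       FLOOR(1 + rand()*499),
       (select name from names order by rand() limit 1),
       MD5(1+rand()*9999),
       rand()
  FROM fact
 LIMIT 100000;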

You will create a table with an even data distribution for each column. You can then add some queries to add skew, either using INSERT … SELECT or UPDATE, for example:

INSERT INTO fact
SELECT 1,1,'Justin',md5(''), .1
  FROM fact
 LIMIT 10000;

That will skew the values by creating many rows with the same data as our initial row.

Using these simple tools, you can generate a data set that is great for testing purposes. For example, dim1 might be a customer_id and dim2 a product_id, and you would populate the corresponding dimension tables with 10000 and 500 rows, respectively.

Peter Zaitsev: Multi-Valued INSERTs, AUTO_INCREMENT & Percona XtraDB Cluster (9.9.2014, 14:38 UTC)

A common migration path from standalone MySQL/Percona Server to a Percona XtraDB Cluster (PXC) environment involves some measure of time where one node in the new cluster has been configured as a slave of the production master that the cluster is slated to replace. In this way, the new cluster acts as a slave of the production environment – traditional replication takes care of getting the data into the cluster, and then Galera replication handles the intra-cluster traffic. This often works without issue, although there is one case that I’ve encountered recently where special care must be taken to properly configure the stream to ensure that replication does not break. If you use multi-valued inserts with auto-increment columns, then this post is for you.

For purposes of our discussion, assume that we have a basic 3-node PXC cluster that we’ve set up using the PXC Reference Architecture document, and that we’re replicating from an asynchronous master (call it “server A”) into one of the PXC nodes. Without loss of generality, we’ll pick PXC03. Also, for purposes of our discussion, we’ll be working with the following table definition:

serverA> show create table auto_inc_test;
CREATE TABLE `auto_inc_test` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `stuff` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
serverA> SELECT * FROM auto_inc_test;
Empty set (0.00 sec)

If we insert rows into this table one at a time, we have no issues.

serverA> INSERT INTO auto_inc_test(stuff) VALUES ('first row');
serverA> INSERT INTO auto_inc_test(stuff) VALUES ('second row');
serverA> INSERT INTO auto_inc_test(stuff) VALUES ('third row');
serverA> SELECT * FROM auto_inc_test;
+---+------------+
| i | stuff      |
+---+------------+
| 1 | first row  |
| 2 | second row |
| 3 | third row  |
+---+------------+
PXC03> SELECT * FROM auto_inc_test;
+---+------------+
| i | stuff      |
+---+------------+
| 1 | first row  |
| 2 | second row |
| 3 | third row  |
+---+------------+

But if we start doing multi-valued inserts, we can run into a problem.

serverA> INSERT INTO auto_inc_test(stuff) VALUES('first row'),('second row'),('third row');
serverA> INSERT INTO auto_inc_test(stuff) VALUES('fourth row'),('fifth row');
serverA> SELECT * FROM auto_inc_test;
+---+------------+
| i | stuff      |
+---+------------+
| 1 | first row  |
| 2 | second row |
| 3 | third row  |
| 4 | fourth row |
| 5 | fifth row  |
+---+------------+
PXC03> SELECT * FROM auto_inc_test;
+---+------------+
| i | stuff      |
+---+------------+
| 1 | first row  |
| 2 | second row |
| 5 | third row  |
+---+------------+
PXC03> SHOW SLAVE STATUS;
... output elided ...
Last_SQL_Errno: 1062
Last_SQL_Error: Error 'Duplicate entry '5' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO auto_inc_test (stuff) VALUES ('fourth row'),('fifth row')'
... output elided ...

Uh oh. Replication is broken and our data is now inconsistent. So why does this happen and how can we prevent it?

binlog_format

The astute observer will note that I have not yet said anything about the binary log format on the master. If the binary log format on the master is already set to ROW, then the above error will not occur. RBR will properly replicate multi-valued INSERTs to the PXC cluster without issue, and the data will be consistent. Problem solved. However, there may be reasons that the master is not using or cannot use RBR, such as disk space or IOPS limitations, and thus it’s running in MIXED or STATEMENT mode. In that case, we need to look elsewhere….
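
If switching to RBR is an option, it is a single dynamic change; note that only sessions started after the change pick it up, and the SUPER privilege is required:

SET GLOBAL binlog_format = 'ROW';
-- and make it persistent across restarts in my.cnf:
-- binlog_format = ROW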

wsrep_auto_increment_control

When set to ON (the default), this variable has the effect of automatically specifying values for auto_increment_increment and auto_increment_offset based on the cluster size. The idea behind it is to help prevent auto-increment value conflicts when writing to multiple nodes. However, what it also means is that in a multi-node cluster, the auto-increment values generated by any given node will never be consecutive and the “next” auto-increment value on the slave cluster node will always be higher than what the master believes it should be. For example:

serverA> INSERT INTO auto_inc_test (stuff) VALUES ('first row'),('second row'),('third row');
serverA> SHOW CREATE TABLE auto_inc_test;
CREATE TABLE `auto_inc_test` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `stuff` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB AU

Truncated by Planet PHP, read more at the original (another 3010 bytes)

Jean-Jerome Schmidt: Announcing Severalnines Package Repository (8.9.2014, 16:45 UTC)
September 9, 2014
By Severalnines

We are excited to announce the availability of YUM/APT repositories for ClusterControl, making new releases of ClusterControl easily accessible using the YUM or APT package managers. The repo is found at http://repo.severalnines.com, with instructions provided on the landing page. Our Cluster Configurators will be using these repositories, and users upgrading with s9s_upgrade_cmon, starting from version 1.2.8, will be configured with the package repository.

ClusterControl requires extra post-installation setup steps, such as generating an API token, configuring the cmon/dcps database schemas, granting privileges on the cmon schema, setting up SSL and so on. We provide a post-installation script for this purpose at [Apache document root]/clustercontrol/app/tools/setup-cc.sh. If you are installing for the first time, you are required to run this script to ensure ClusterControl is properly set up.

In this blog post, we will show you how to use the Severalnines repository to install and manage ClusterControl packages in your infrastructure. 

New Package Naming

To streamline repository-based installations, we have changed the package names as follows:

Package                        | Old Package Name | New Package Name
ClusterControl CMON Controller | cmon-controller  | clustercontrol-controller
Truncated by Planet PHP, read more at the original (another 4475 bytes)
