Peter Zaitsev: Store UUID in an optimized way (19.12.2014, 14:00 UTC)

A few years ago Peter Zaitsev, in a post titled “To UUID or not to UUID,” wrote: “There is timestamp based part in UUID which has similar properties to auto_increment and which could be used to have values generated at same point in time physically local in BTREE index.”

For this post I rearranged the timestamp part of the UUID (Universally Unique Identifier) and ran some benchmarks.

Many people store UUIDs as CHAR(36) and use them as row identity values (PRIMARY KEY) because they are unique across every table, every database and every server, and allow easy merging of records from different databases. But using a UUID as the PRIMARY KEY causes the problems described below.

Problems with UUID

  • UUID has 36 characters which makes it bulky.
  • InnoDB stores data in PRIMARY KEY order, and all secondary keys also contain the PRIMARY KEY. So having a UUID as PRIMARY KEY makes the indexes bigger, and they may no longer fit in memory.
  • Inserts are random and the data is scattered.

Despite these problems, people still prefer UUIDs because they are unique across every table and can be generated anywhere. In this post, I will explain how to store a UUID in an efficient way by rearranging the timestamp part of the UUID.

Structure of UUID

MySQL uses UUID version 1, which is a 128-bit number represented by a utf8 string of five hexadecimal numbers separated by dashes:

  • The first three numbers are generated from a timestamp.
  • The fourth number preserves temporal uniqueness in case the timestamp value loses monotonicity (for example, due to daylight saving time).
  • The fifth number is an IEEE 802 node number that provides spatial uniqueness. A random number is substituted if the latter is not available (for example, because the host computer has no Ethernet card, or we do not know how to find the hardware address of an interface on your operating system). In this case, spatial uniqueness cannot be guaranteed. Nevertheless, a collision should have very low probability.

The timestamp is mapped as follows:
When the timestamp has the 60-bit hexadecimal value 1d8eebc58e0a7d7, the following parts of the UUID are set: 58e0a7d7-eebc-11d8-9669-0800200c9a66. The 1 before the most significant digits of the timestamp (in 11d8) indicates the UUID version; for time-based UUIDs this is 1.

The fourth and fifth parts are mostly constant if the UUIDs are generated from a single server, and the first three numbers are based on the timestamp, so they will be monotonically increasing. Let’s rearrange the sequence to make the UUID closer to sequential: this makes inserts, and look-ups of recent data, faster. The dashes (‘-’) carry no information, so let’s remove them.
58e0a7d7-eebc-11d8-9669-0800200c9a66 => 11d8eebc58e0a7d796690800200c9a66
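The timestamp layout and the rearrangement above can be verified with Python’s standard uuid module (a sketch for checking the byte shuffle only; the benchmark tables in this post are populated with SQL):

```python
import uuid

u = uuid.UUID("58e0a7d7-eebc-11d8-9669-0800200c9a66")

# It is a version-1 UUID and its 60-bit timestamp is 1d8eebc58e0a7d7
print(u.version)            # 1
print(format(u.time, "x"))  # 1d8eebc58e0a7d7

# Rearrange: time_hi (version nibble included) + time_mid + time_low
# + clock sequence + node, so the timestamp bits lead the value
s = u.hex                   # dashes already removed
ordered = s[12:16] + s[8:12] + s[0:8] + s[16:]
print(ordered)              # 11d8eebc58e0a7d796690800200c9a66
```

In SQL, the same shuffle on a dashed CHAR(36) value is commonly written as `UNHEX(CONCAT(SUBSTR(uuid, 15, 4), SUBSTR(uuid, 10, 4), SUBSTR(uuid, 1, 8), SUBSTR(uuid, 20, 4), SUBSTR(uuid, 25)))`, which yields the ordered BINARY(16).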


I created three tables:

  • events_uuid – UUID binary(16) PRIMARY KEY
  • events_int – an additional BIGINT auto-increment column as the PRIMARY KEY, with an index on the UUID column
  • events_uuid_ordered – Rearranged UUID binary(16) as PRIMARY KEY

I created three stored procedures which insert 25K random rows at a time into the respective tables. There are three more stored procedures which call the random-insert procedures in a loop and record the time taken to insert each 25K rows, plus the data and index size after each loop. In total I inserted 25M records.

  • Data Size (chart)
    Horizontal axis – number of inserts x 25,000
    Vertical axis – data size in MB
    The data size for the UUID table is larger than for the other two tables.
  • Index Size (chart)
    Horizontal axis – number of inserts x 25,000
    Vertical axis – index size in MB
  • Total Size (chart)
    Horizontal axis – number of inserts x 25,000
    Vertical axis – total size in MB

Truncated by Planet PHP, read more at the original (another 5128 bytes)

Oli Sennhauser: Avoid temporary disk tables with MySQL (19.12.2014, 06:38 UTC)

For processing SELECT queries, MySQL sometimes needs the help of temporary tables. These temporary tables can be created either in memory or on disk.

The number of creations of such temporary tables can be found with the following command:

mysql> SHOW GLOBAL STATUS LIKE 'created_tmp%tables';
| Variable_name           | Value |
| Created_tmp_disk_tables | 4     |
| Created_tmp_tables      | 36    |

There are two different reasons why MySQL creates a temporary disk table instead of a temporary memory table:
  • The result is bigger than the smaller one of the MySQL variables max_heap_table_size and tmp_table_size.
  • The result contains columns of type BLOB or TEXT.
In the following example we can see how the temporary disk table can be avoided without changing the column types:
mysql> CREATE TABLE test (
         id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
       , data TEXT
       , state TINYINT UNSIGNED  -- column reconstructed; the original listing was truncated
       );

mysql> INSERT INTO test VALUES (NULL, 'State is green', 1), (NULL, 'State is green', 1)
, (NULL, 'State is red', 3), (NULL, 'State is red', 3)
, (NULL, 'State is red', 3), (NULL, 'State is orange', 2);

mysql> EXPLAIN SELECT data, COUNT(*) FROM test GROUP BY data;
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
|  1 | SIMPLE      | test  | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using temporary; Using filesort |

mysql> SHOW SESSION STATUS LIKE 'created_tmp%tables';
| Variable_name           | Value |
| Created_tmp_disk_tables | 0     |
| Created_tmp_tables      | 3     |

mysql> SELECT data, COUNT(*) FROM test GROUP BY data;
| data            | count(*) |
| State is green  |        2 |
| State is orange |        1 |
| State is red    |        3 |

mysql> SHOW SESSION STATUS LIKE 'created_tmp%tables';
| Variable_name           | Value |
| Created_tmp_disk_tables | 1     |
| Created_tmp_tables      | 4     |

mysql> SELECT SUBSTR(data, 1, 32), COUNT(*) FROM test GROUP BY SUBSTR(data, 1, 32);
mysql> SHOW SESSION STATUS LIKE 'created_tmp%tables';
| Variable_name           | Value |
| Created_tmp_disk_tables | 1     |
| Created_tmp_tables      | 5     |

This method can be used when changing the column type from TEXT to VARCHAR, or the use of a RAM disk, are not possible solutions.

Shlomi Noach: Orchestrator 1.2.9 GA released (18.12.2014, 16:24 UTC)

Orchestrator 1.2.9 GA has been released. Noteworthy:

  • Added "ReadOnly" (true/false) configuration param. You can have orchestrator completely read-only
  • Added "AuthenticationMethod": "multi": works like BasicAuth (your normal HTTP user+password) only it also accepts the special user called "readonly", which, surprise, can only view and not modify
  • Centralized/serialized most backend database writes (with hundreds/thousands of monitored servers it was possible or probable that high concurrency led to too many connections opened on the backend database)
  • Fixed evil evil bug that would skip some checks if binary logs were not enabled
  • Better hostname resolve (now also asking MySQL server to resolve hostname; resolving is cached)
  • Pseudo-GTID (read here, here, here) support now considered stable (apart from being tested it has already been put to practice multiple times in production at Outbrain, in different planned and unplanned crash scenarios)

I continue developing orchestrator as free and open source at my new employer.



Peter Zaitsev: Making HAProxy 1.5 replication lag aware in MySQL (18.12.2014, 15:48 UTC)

HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post, explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slave setup to spread the read load across multiple slaves. However, with MySQL replication another factor comes into play: replication lag. In this case the approach mentioned for Percona XtraDB Cluster does not work that well, as the check we presented only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.

Agent checks in HAProxy

HAProxy 1.5 allows us to run an agent check, which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be not only ‘up’ or ‘down’, but also a weight.

What is an agent? It is simply a program that can be accessed from a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:

  • Mark the server as down in HAProxy if replication is not working
  • Set the weight to 100% if the replication lag is < 10s
  • Set the weight to 50% if the replication lag is >= 10s and < 60s
  • Set the weight to 5% in all other situations
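The lag-to-weight decision itself is tiny; here it is as a Python sketch (for illustration only; the actual agent in this post is a PHP script):

```python
def set_weight(lag):
    """Map replication lag in seconds (None if replication is broken)
    to an HAProxy agent-check answer."""
    if lag is None:       # replication not working: mark the server down
        return "down"
    if lag < 10:
        return "up 100%"
    if lag < 60:          # 10s <= lag < 60s
        return "up 50%"
    return "up 5%"

print(set_weight(0))      # up 100%
print(set_weight(25))     # up 50%
print(set_weight(3600))   # up 5%
```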

We can use a script like this:

$ less agent.php
<?php
$port       = $argv[1];
$mysql_port = $argv[2];
$mysql      = "/usr/bin/mysql";
$user       = "haproxy";
$password   = "haproxy_pwd";
$query      = "SHOW SLAVE STATUS";
// Map the replication lag (in seconds) to an agent-check answer
function set_weight($lag) {
        if ($lag < 10)               { return "up 100%"; }
        if ($lag >= 10 && $lag < 60) { return "up 50%"; }
        return "up 5%";
}
$socket = stream_socket_server("tcp://0.0.0.0:$port", $errno, $errstr);
if (!$socket) {
        echo "$errstr ($errno)\n";
} else {
        while ($conn = stream_socket_accept($socket, 9999999999999)) {
                $cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee \"$query\" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '";
                exec($cmd, $lag);
                // An empty or NULL lag means replication is broken
                $weight = (!isset($lag[0]) || $lag[0] === "NULL") ? "down" : set_weight((int)$lag[0]);
                unset($lag);
                fputs($conn, $weight);
                fclose($conn);
        }
}

If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:

$ php agent.php 6789 3306

You will also need a dedicated MySQL user:

mysql> GRANT REPLICATION CLIENT ON *.* TO 'haproxy'@'' IDENTIFIED BY 'haproxy_pwd';

When the agent is started, you can check that it is working properly:

# telnet 6789
Connected to
Escape character is '^]'.
up 100%
Connection closed by foreign host.

Assuming it is run locally on the app server, that 2 replicas are available, and that the application will send all reads to port 3307, you will define a frontend and a backend in your HAProxy configuration like this:

frontend read_only-front
bind *:3307
mode tcp
option tcplog
log global
default_backend read_only-back
backend read_only-back
mode tcp
balance leastconn
server slave1 weight 100 check agent-check agent-port 6789 inter 1000  rise 1  fall 1 on-marked-down shutdown-sessions
server slave2 weight 100 check agent-check agent-port 6789 inter 1000  rise 1  fall 1 on-marked-down shutdown-sessions


Now that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.

No lag

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# HAProxy
$ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight

Slave1 lagging

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 25
# Slave2
$ my

Truncated by Planet PHP, read more at the original (another 2025 bytes)

Peter Zaitsev: InnoDB’s multi-versioning handling can be Achilles’ heel (17.12.2014, 16:05 UTC)

I believe the InnoDB storage engine architecture is great for a lot of online workloads; however, there are no silver bullets in technology and all design choices have their trade-offs. In this blog post I’m going to talk about one important InnoDB limitation that you should consider.

InnoDB is a multiversion concurrency control (MVCC) storage engine, which means many versions of a single row can exist at the same time. In fact there can be a huge number of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of the longest currently running SELECT query.

In most cases this is not a big deal – if you have many short transactions happening you will have only a few row versions to deal with. If you just use the system for reporting queries but do not modify data aggressively at the same time you also will not have many row versions. However, if you mix heavy updates with slow reporting queries going at the same time you can get into a lot of trouble.

Consider for example an application with a hot row (something like an actively updated counter) which gets 1,000 updates per second, together with some heavy batch job that takes 1,000 seconds to run. In such a case we will have 1M row versions to deal with.
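The arithmetic behind that example, spelled out (a quick sketch with the numbers from the paragraph above):

```python
updates_per_second = 1000  # hot counter row being updated
batch_job_seconds = 1000   # runtime of the concurrent batch job

# Every update made while the batch job's read view is open leaves
# behind an old row version that cannot be purged yet:
row_versions = updates_per_second * batch_job_seconds
print(row_versions)        # 1000000
```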

Let’s now talk about how those old row versions are stored in InnoDB – they are stored in the undo space as essentially a linked list, where each row version points to the previous row version, together with transaction visibility information that helps to decide which version will be visible to this query. Such a design favors short new queries, which will typically need to see one of the newer rows, so they do not have to go far down this linked list. This might not be the case with reporting queries that need to read rather old row versions corresponding to the time when the query was started, or logical backups that use consistent reads (think mysqldump or mydumper), which often need to access such very old row versions.
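To see why old read views make this walk long, the version chain can be modeled as a toy linked list (a Python sketch of the idea only, not InnoDB’s actual on-disk structures; transaction ids stand in for the visibility information):

```python
class RowVersion:
    def __init__(self, value, trx_id, prev=None):
        self.value = value    # row contents for this version
        self.trx_id = trx_id  # transaction that created this version
        self.prev = prev      # pointer to the previous (older) version

def visible_version(newest, read_view_trx_id):
    """Walk from the newest version toward older ones until a version
    the read view is allowed to see; the older the view, the longer
    the walk."""
    steps = 0
    v = newest
    while v is not None and v.trx_id > read_view_trx_id:
        v = v.prev
        steps += 1
    return v, steps

# Build a chain of 1000 versions of one hot row (trx ids 1..1000)
head = None
for trx in range(1, 1001):
    head = RowVersion(value=trx, trx_id=trx, prev=head)

# A fresh query (read view 1000) sees the newest version immediately...
v, steps = visible_version(head, 1000)
print(steps)           # 0
# ...while an old reporting query (read view 10) walks 990 links
v, steps = visible_version(head, 10)
print(v.value, steps)  # 10 990
```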

So going through the linked list of versions is expensive, but how expensive can it get? A lot depends on whether the UNDO space fits in memory, in which case the list will be traversed efficiently – or it does not, in which case you might be looking at massive disk IO. Keep in mind the undo space is not clustered by PRIMARY KEY, as normal data in InnoDB tables is, so if you’re updating multiple rows at the same time (the typical case) you will be looking at row-version chains stored in many pages, often as little as one row version per page, requiring either massive IO or a large number of UNDO space pages to be present in the InnoDB buffer pool.

Where it can get even worse is the index scan. This is because indexes in InnoDB are structured to include all row versions corresponding to a key value, current and past. This means, for example, that the index for KEY=5 will contain pointers to all rows that either have the value 5 now or had the value 5 at some time in the past and have not been purged yet. Where it can really bite is the following: InnoDB needs to know which of the values stored for the key are visible to the current transaction – and that might mean going through all the long version chains for each of the keys.

This is all theory, so let’s see how we can simulate such workloads and see how bad things really can get in practice.

I have created a 1 billion row “sysbench” table which takes some 270GB of space, and I will use a small buffer pool – 6GB. I will run sysbench with 64 threads and a pareto distribution (hot rows) while running a full table scan query concurrently: select avg(k) from sbtest1. Here is the exact sysbench run, done after the prepare step.

sysbench --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=1000000000 --mysql-user=root --mysql-password=password --test=/usr/share/doc/sysbench/tests/db/oltp.lua run

Here is the explain for the “reporting” query, which you would expect to be a rather efficient index scan. With just 4 bytes per value, 1 billion values would be just 4GB (really more because of InnoDB overhead) – not a big deal for modern systems:

mysql> explain select avg(k) from sbtest1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 953860873
        Extra: Using index
1 row in set (0.00 sec)

2 days have passed and the “reporting” query is still running… furthermore the foreground workload started to

Truncated by Planet PHP, read more at the original (another 13078 bytes)

Jean-Jerome Schmidt: New Webinar: A DevOps Guide to Database Infrastructure Automation for eCommerce (17.12.2014, 15:10 UTC)
December 17, 2014
By Severalnines

For an online shop, the website is the cash register. It has to be open for business 24 hours a day.

As the ops person on duty, when you get a call at 3am after your website went down, your priority number one is to restore the service asap. But why can we not have our application stack automatically recover, and not have the pager wake us at all? Why do we still stick to tedious manual processes, which take up time and resources, and hinder future growth?

Infrastructure automation isn’t easy, but it’s not rocket science either, says Riaan Nolan. Riaan has been in operations for the past decade, and has built over a dozen eCommerce properties. Automation is a worthwhile investment for retailers serious about eCommerce, but deciding on which tools to invest in can be a confusing and overwhelming process.

Join us for this webinar to understand the key pain points that online retailers experience which indicate it’s time to invest in database automation. Our guest speaker this time will be Riaan Nolan of Foodpanda, Rocket Internet’s global online food delivery marketplace operating in over 40 countries. 







Tuesday, February 17th 2015 at 09:00 GMT (UK) / 10:00 CET (Germany, France, Sweden)

Register Now


North America/LatAm

Tuesday, February 17th 2015 at 9:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now



  • eCommerce infrastructure challenges in 2014
  • Provisioning of test/QA and highly available production environments across multi-datacenter and multi-cloud environments
  • Building and maintaining configuration management systems such as Puppet and Chef
  • Enabling self-service infrastructure services to internal dev teams
  • Health and performance monitoring 
  • Capacity analysis and planning
  • Elastic scaling 
  • Automating failure handling
  • Disaster recovery




     Riaan Nolan

  • Expert Live Systems Administrator, foodpanda | Hellofood
  • Senior Systems Administrator / Infrastructure Lead, Rocket Internet GmbH
  • Senior Technology Manager, Africa Internet Accelerator
  • Uses Amazon EC2, VPC and Autoscale with Cloudformation.
  • First Puppet Labs Certified Professional in South Africa. 
  • Specialties: Puppet Automation, Cloud Deployments, eCommerce, eMarketing, Specialized Linux Services, Windows, Process making, Budgets, Asset Tracking, Procurement



read more

Peter Zaitsev: OpenStack Live tutorials & sessions to bring OpenStack users up to speed (16.12.2014, 17:28 UTC)

I attended the OpenStack Paris summit last month (Percona had a booth there). It was my first opportunity to meet face-to-face with this thriving community of developers and users. I’m proud that Percona is part of this open source family and look forward to reconnecting with many of the developers and users I met in Paris – as well as meeting new faces – at OpenStack Live in Silicon Valley April 13-14.

OpenStack summits, generally held twice a year, are the place where (for the most part) developers meet and design “in the open,” as the OpenStack organization says. OpenStack Live 2015, held in parallel with the annual Percona Live MySQL Conference and Expo, will be a unique opportunity for users and enthusiasts to learn from leading OpenStack experts in the field about top cloud strategies, improving overall cloud performance, and operational best practices for managing and optimizing OpenStack and its MySQL database core.

OpenStack Live will also provide some serious classroom-style learning. Percona announced the OpenStack Live tutorial sessions a couple days ago. Most sessions are three hours long and, because they really are “hands-on,” require that you bring your laptop – and a power cord (not to be confused with a “power chord,” though those are also welcome).

Let’s take a closer look at the OpenStack Live tutorial sessions.

“Barbican: Securing Your Secrets.” Join Rackspace gurus Douglas Mendizábal, Chelsea Winfree and Steve Heyman on a tour through the magical world of Barbican (yes, they are dedicated members of the Barbican project).

Don’t be intimidated if you don’t have any previous experience with Barbican (and if you’ve never heard of it, all the more reason to attend!). A basic understanding of security components (such as keys and certificates) and a basic understanding of REST is helpful, but not required.

By the end of the class you will know:
1)   Importance of secret storage
2)   How to store & retrieve secrets with Barbican
3)   How to submit an order with Barbican
4)   How to create a container
5)   Use cases for Barbican / Examples
6)   The future of Barbican – Ordering SSL Certs

“Deploying, Configuring and Operating OpenStack Trove.” As the title suggests, these three hours focus squarely on Trove. The tutorial – led by Tesora founder & CTO Amrith Kumar, along with Doug Shelley, the company’s vice president of product development – will begin with a quick overview of OpenStack and the various services.

If you attend this tutorial you’ll actually deploy your own OpenStack environment – and create and manage a Nova (compute) instance using a command line and a graphical user interface (Horizon). And the fun continues! You’ll then install and configure Trove, and create and manage a single MySQL instance. Finally, pupils will create and operate a simple replicated MySQL instance pair and ensure that data is being properly replicated from master to slave.

“Essential DevStack.” DevStack is an opinionated script to quickly create an OpenStack development environment. It can also be used to demonstrate starting/running OpenStack services and provides examples of using them from a command line. The power of DevStack lies in a few small tricks that, once understood, can hugely improve contribution effectiveness, quality and the time required. This three-hour tutorial will be led by Red Hat senior software engineer Swapnil Kulkarni.

“OpenStack Networking Introduction,” with PLUMgrid’s Valentina Alaria and Brendan Howes. Buckle your seat belts! Designed for IT professionals looking to expand their O

Truncated by Planet PHP, read more at the original (another 2884 bytes)

Jean-Jerome Schmidt: Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison - Webinar Replay & Slides (16.12.2014, 15:33 UTC)
December 16, 2014
By Severalnines


Thanks to everyone who attended and participated in last week’s webinar on 'Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison'. If you missed the sessions or would like to watch the webinar again & browse through the slides, they are now available online.


In this webinar, Severalnines VP of Products, Alex Yu, who was part of the team at Ericsson who originally developed the NDB storage engine in 2001, gave an overview of the two clustering architectures and discussed their respective strengths and weaknesses: 

  1. MySQL Cluster architecture: strengths and limitations
  2. Galera Architecture: strengths and limitations
  3. Deployment scenarios
  4. Data migration
  5. Read and write workloads (Optimistic/pessimistic locking)
  6. WAN/Geographical replication
  7. Schema changes
  8. Management and monitoring


Watch the replay

Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison from Severalnines AB


Read the slides

Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison from Severalnines AB


read more

Peter Zaitsev: MySQL Tutorials: A time to learn at Percona Live 2015 (15.12.2014, 08:00 UTC)

The many hours of intensive tutorials, led by some of the top minds in MySQL, have always been a major draw each year to the Percona Live MySQL Conference and Expo. And 2015’s event will be no exception.

Percona Live 2015 runs April 13-16 in Santa Clara, Calif. and the first day is dedicated to the classroom – so bring your laptops for the combined 45 hours of learning. MySQL tutorials are included with the full-conference pass but a “tutorial-only pass” is also available. Super-saver registration discounts have been extended until Dec. 19. Here’s a look at this year’s tutorials lineup. (There will be a couple more announced in January).

And that’s just on Monday! There will be much more over the four days of the Percona Live MySQL Conference and Expo 2015. I posted a sneak peek of the full (initial) Percona Live roster a couple weeks ago. And remember, super-saver registration discounts have been extended until Dec. 19, so register now – and don’t forget your laptop (and power cord)!

The post MySQL Tutorials: A time to learn at Percona Live 2015 appeared first on MySQL Performance Blog.

Oli Sennhauser: Making HAProxy High Available for MySQL Galera Cluster (14.12.2014, 17:37 UTC)

After properly installing and testing a Galera Cluster we see that the set-up is not finished yet. It needs something in front of the Galera Cluster that balances the load over all nodes.
So we install a load balancer in front of the Galera Cluster. Typically nowadays HAProxy is chosen for this purpose. But then we find, that the whole Galera Cluster is still not high available in case the load balancer fails or dies. So we need a second load balancer for high availability.
But how should we properly fail over when the HAProxy load balancer dies? For this purpose we put a Virtual IP (VIP) in front of the HAProxy load balancer pair. The Virtual IP is controlled, and failed over, by Keepalived.


Installation of HAProxy and Keepalived

First some preparations: For installing socat we need the repoforge repository:

shell> cd /tmp
shell> wget
shell> yum localinstall rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm 
shell> yum update
shell> yum install socat

Then we can start installing HAProxy and Keepalived:

shell> yum install haproxy keepalived

shell> chkconfig haproxy on
shell> chkconfig keepalived on

We can check the installed HAProxy and Keepalived versions as follows:

shell> haproxy -v
HA-Proxy version 1.5.2 2014/07/12

shell> keepalived --version
Keepalived v1.2.13 (10/15,2014)

Configuration of HAProxy

You can find more details in the HAProxy documentation.

shell> cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak
shell> cat << _EOF >/etc/haproxy/haproxy.cfg
# /etc/haproxy/haproxy.cfg

# Global settings
  # to have these messages end up in /var/log/haproxy.log you will
  # need to:
  # 1) configure syslog to accept network log events.  This is done
  #    by adding the '-r' option to the SYSLOGD_OPTIONS in
  #    /etc/sysconfig/syslog
  # 2) configure local2 events to go to the /var/log/haproxy.log
  #   file. A line like the following can be added to
  #   /etc/sysconfig/syslog
  #    local2.*                       /var/log/haproxy.log
  log local2

  chroot      /var/lib/haproxy
  pidfile     /var/run/
  maxconn     1020   # See also: ulimit -n
  user        haproxy
  group       haproxy

  # turn on stats unix socket
  stats socket /var/lib/haproxy/stats.sock mode 600 level admin
  stats timeout 2m

# common defaults that all the 'frontend' and 'backend' sections will
# use if not designated in their block

Truncated by Planet PHP, read more at the original (another 4916 bytes)
