Planet MariaDB

July 30, 2015

Peter Zaitsev

Why base64-output=DECODE-ROWS does not print row events in MySQL binary logs

Lately I have seen many cases where users specified the option --base64-output=DECODE-ROWS to print a statement representation of row events in MySQL binary logs, only to get nothing. The reason is simple: --base64-output=DECODE-ROWS does not convert row events into their string representation; that is the job of the --verbose option. But why do users mix up these two options so often? This blog post is the result of my investigation.

There are already two great blog posts about printing row events on the Percona blog: “Debugging problems with row based replication” by Justin Swanhart and “Identifying useful info from MySQL row-based binary logs” by Alok Pathak.

Both authors run mysqlbinlog with the options --base64-output=decode-rows -vv and demonstrate how this combination produces human-readable output of row events. However, one thing that is still not clear is how these options differ. I want to underline the differences in this post.

Let’s check the user manual first.

--base64-output=value

This option determines when events should be displayed encoded as base-64 strings using BINLOG statements. The option has these permissible values (not case sensitive):

    AUTO (“automatic”) or UNSPEC (“unspecified”) displays BINLOG statements automatically when necessary (that is, for format description events and row events). If no --base64-output option is given, the effect is the same as --base64-output=AUTO.
    Note

    Automatic BINLOG display is the only safe behavior if you intend to use the output of mysqlbinlog to re-execute binary log file contents. The other option values are intended only for debugging or testing purposes because they may produce output that does not include all events in executable form.

    NEVER causes BINLOG statements not to be displayed. mysqlbinlog exits with an error if a row event is found that must be displayed using BINLOG.

    DECODE-ROWS specifies to mysqlbinlog that you intend for row events to be decoded and displayed as commented SQL statements by also specifying the --verbose option. Like NEVER, DECODE-ROWS suppresses display of BINLOG statements, but unlike NEVER, it does not exit with an error if a row event is found.

For examples that show the effect of --base64-output and --verbose on row event output, see Section 4.6.8.2, “mysqlbinlog Row Event Display”.

Literally, --base64-output=DECODE-ROWS just suppresses the BINLOG statement and does not print anything in its place.

To test its effect I ran the command

insert into t values (2, 'bar');

on an InnoDB table while the binary log was using ROW format. As expected, if I specify no option I receive unreadable output:

$mysqlbinlog var/mysqld.1/data/master-bin.000002
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150720 15:19:15 server id 1  end_log_pos 120 CRC32 0x3d52aee2  Start: binlog v 4, server v 5.6.25-73.1-debug-log created 150720 15:19:15
BINLOG '
Q+esVQ8BAAAAdAAAAHgAAAAAAAQANS42LjI1LTczLjEtZGVidWctbG9nAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAeKu
Uj0=
'/*!*/;
# at 120
#150720 15:19:21 server id 1  end_log_pos 192 CRC32 0xbebac59d  Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1437394761/*!*/;
SET @@session.pseudo_thread_id=2/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 192
#150720 15:19:21 server id 1  end_log_pos 239 CRC32 0xe143838b  Table_map: `test`.`t` mapped to number 70
# at 239
#150720 15:19:21 server id 1  end_log_pos 283 CRC32 0x75523a2d  Write_rows: table id 70 flags: STMT_END_F
BINLOG '
SeesVRMBAAAALwAAAO8AAAAAAEYAAAAAAAEABHRlc3QAAXQAAgMPAv8AA4uDQ+E=
SeesVR4BAAAALAAAABsBAAAAAEYAAAAAAAEAAgAC//wCAAAAA2Jhci06UnU=
'/*!*/;
# at 283
#150720 15:19:21 server id 1  end_log_pos 314 CRC32 0xd183c769  Xid = 14
COMMIT/*!*/;
# at 314
#150720 15:19:22 server id 1  end_log_pos 362 CRC32 0x892fe43b  Rotate to master-bin.000003  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

The INSERT is here:

BINLOG '
SeesVRMBAAAALwAAAO8AAAAAAEYAAAAAAAEABHRlc3QAAXQAAgMPAv8AA4uDQ+E=
SeesVR4BAAAALAAAABsBAAAAAEYAAAAAAAEAAgAC//wCAAAAA2Jhci06UnU=
'/*!*/;

But this string is not for humans.

What will happen if I add the option --base64-output=DECODE-ROWS?

$mysqlbinlog var/mysqld.1/data/master-bin.000002 --base64-output=DECODE-ROWS
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150720 15:19:15 server id 1  end_log_pos 120 CRC32 0x3d52aee2  Start: binlog v 4, server v 5.6.25-73.1-debug-log created 150720 15:19:15
# at 120
#150720 15:19:21 server id 1  end_log_pos 192 CRC32 0xbebac59d  Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1437394761/*!*/;
SET @@session.pseudo_thread_id=2/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 192
#150720 15:19:21 server id 1  end_log_pos 239 CRC32 0xe143838b  Table_map: `test`.`t` mapped to number 70
# at 239
#150720 15:19:21 server id 1  end_log_pos 283 CRC32 0x75523a2d  Write_rows: table id 70 flags: STMT_END_F
# at 283
#150720 15:19:21 server id 1  end_log_pos 314 CRC32 0xd183c769  Xid = 14
COMMIT/*!*/;
# at 314
#150720 15:19:22 server id 1  end_log_pos 362 CRC32 0x892fe43b  Rotate to master-bin.000003  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

The row event was simply suppressed!

Let's now check the option --verbose:

--verbose, -v

Reconstruct row events and display them as commented SQL statements. If this option is given twice, the output includes comments to indicate column data types and some metadata.

For examples that show the effect of --base64-output and --verbose on row event output, see Section 4.6.8.2, “mysqlbinlog Row Event Display”.

Surprisingly, --base64-output=DECODE-ROWS is not even needed:

$mysqlbinlog var/mysqld.1/data/master-bin.000002 --verbose
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150720 15:19:15 server id 1  end_log_pos 120 CRC32 0x3d52aee2  Start: binlog v 4, server v 5.6.25-73.1-debug-log created 150720 15:19:15
BINLOG '
Q+esVQ8BAAAAdAAAAHgAAAAAAAQANS42LjI1LTczLjEtZGVidWctbG9nAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAeKu
Uj0=
'/*!*/;
# at 120
#150720 15:19:21 server id 1  end_log_pos 192 CRC32 0xbebac59d  Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1437394761/*!*/;
SET @@session.pseudo_thread_id=2/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 192
#150720 15:19:21 server id 1  end_log_pos 239 CRC32 0xe143838b  Table_map: `test`.`t` mapped to number 70
# at 239
#150720 15:19:21 server id 1  end_log_pos 283 CRC32 0x75523a2d  Write_rows: table id 70 flags: STMT_END_F
BINLOG '
SeesVRMBAAAALwAAAO8AAAAAAEYAAAAAAAEABHRlc3QAAXQAAgMPAv8AA4uDQ+E=
SeesVR4BAAAALAAAABsBAAAAAEYAAAAAAAEAAgAC//wCAAAAA2Jhci06UnU=
'/*!*/;
### INSERT INTO `test`.`t`
### SET
###   @1=2
###   @2='bar'
# at 283
#150720 15:19:21 server id 1  end_log_pos 314 CRC32 0xd183c769  Xid = 14
COMMIT/*!*/;
# at 314
#150720 15:19:22 server id 1  end_log_pos 362 CRC32 0x892fe43b  Rotate to master-bin.000003  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

The INSERT statement was successfully reconstructed as:

### INSERT INTO `test`.`t`
### SET
###   @1=2
###   @2='bar'
# at 283

So why do the bloggers mentioned above suggest using --base64-output=DECODE-ROWS? Let's try both options together:

$mysqlbinlog var/mysqld.1/data/master-bin.000002 --base64-output=DECODE-ROWS --verbose
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150720 15:19:15 server id 1  end_log_pos 120 CRC32 0x3d52aee2  Start: binlog v 4, server v 5.6.25-73.1-debug-log created 150720 15:19:15
# at 120
#150720 15:19:21 server id 1  end_log_pos 192 CRC32 0xbebac59d  Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1437394761/*!*/;
SET @@session.pseudo_thread_id=2/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 192
#150720 15:19:21 server id 1  end_log_pos 239 CRC32 0xe143838b  Table_map: `test`.`t` mapped to number 70
# at 239
#150720 15:19:21 server id 1  end_log_pos 283 CRC32 0x75523a2d  Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `test`.`t`
### SET
###   @1=2
###   @2='bar'
# at 283
#150720 15:19:21 server id 1  end_log_pos 314 CRC32 0xd183c769  Xid = 14
COMMIT/*!*/;
# at 314
#150720 15:19:22 server id 1  end_log_pos 362 CRC32 0x892fe43b  Rotate to master-bin.000003  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

In this case the row event was suppressed and the statement representation was printed. Also, the resulting file cannot be used to re-apply events, because the statements are commented out. This is very useful when the binary log is big and you just need to investigate what it contains, not re-apply the events.

This is not the main purpose of this post, but you can also get information about column metadata if you specify the --verbose option twice:

$mysqlbinlog var/mysqld.1/data/master-bin.000002 --base64-output=DECODE-ROWS --verbose --verbose
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150720 15:19:15 server id 1  end_log_pos 120 CRC32 0x3d52aee2  Start: binlog v 4, server v 5.6.25-73.1-debug-log created 150720 15:19:15
# at 120
#150720 15:19:21 server id 1  end_log_pos 192 CRC32 0xbebac59d  Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1437394761/*!*/;
SET @@session.pseudo_thread_id=2/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 192
#150720 15:19:21 server id 1  end_log_pos 239 CRC32 0xe143838b  Table_map: `test`.`t` mapped to number 70
# at 239
#150720 15:19:21 server id 1  end_log_pos 283 CRC32 0x75523a2d  Write_rows: table id 70 flags: STMT_END_F
### INSERT INTO `test`.`t`
### SET
###   @1=2 /* INT meta=0 nullable=1 is_null=0 */
###   @2='bar' /* VARSTRING(255) meta=255 nullable=1 is_null=0 */
# at 283
#150720 15:19:21 server id 1  end_log_pos 314 CRC32 0xd183c769  Xid = 14
COMMIT/*!*/;
# at 314
#150720 15:19:22 server id 1  end_log_pos 362 CRC32 0x892fe43b  Rotate to master-bin.000003  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Note that this is, again, the job of --verbose, not --base64-output=DECODE-ROWS.

To conclude:

  • If you want to see a statement representation of row events, use the option --verbose (-v).
  • If you want to see column metadata as well, specify --verbose twice: --verbose --verbose or -vv.
  • If you want to suppress the output of row events, specify the option --base64-output=DECODE-ROWS.
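
For reference, a typical combined invocation could look like the following sketch (the binary log path is just the one from this post's test setup, and the output file name is made up):

$mysqlbinlog --base64-output=DECODE-ROWS -vv var/mysqld.1/data/master-bin.000002 > decoded.sql
# decoded.sql now holds the row events as commented pseudo-SQL with column
# metadata; since they are commented out, the file cannot be replayed with mysql.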

The post Why base64-output=DECODE-ROWS does not print row events in MySQL binary logs appeared first on MySQL Performance Blog.

by Sveta Smirnova at July 30, 2015 07:00 AM

Jean-Jerome Schmidt

Webinar Replay & Slides: Become a MySQL DBA - Designing High Availability for MySQL

Thanks to everyone who joined us yesterday for this live session on designing HA for MySQL led by Krzysztof Książek, Senior Support Engineer at Severalnines. The replay and slides to the webinar are now available to watch and read online via the links below.

Watch the replay:

 

Read the slides:

 

AGENDA

  • HA - what is it?
  • Caching layer
  • HA solutions
    • MySQL Replication
    • MySQL Cluster
    • Galera Cluster
    • Hybrid Replication
  • Proxy layer
    • HAProxy
    • MaxScale
    • Elastic Load Balancer (AWS)
  • Common issues
    • Split brain scenarios
    • GTID-based failover and Errant Transactions

 

SPEAKER

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. This webinar builds upon recent blog posts and related webinar series by Krzysztof on how to become a MySQL DBA.

For further blogs in this series visit: http://www.severalnines.com/blog-categories/db-ops


by Severalnines at July 30, 2015 02:05 AM

July 29, 2015

Shlomi Noach

Pseudo GTID, ASCENDING

Pseudo GTID is a technique where we inject Globally Unique entries into MySQL, gaining GTID abilities without using GTID. It is supported by orchestrator and described in more detail here, here and here.

Quick recap: we can join two slaves to replicate from one another even if they never were in a parent-child relationship, based on our uniquely identifiable entries, which can be found in the slaves' binary logs or relay logs. Having Pseudo-GTID injected and controlled by us allows us to optimize failovers into quick operations, especially where a large number of servers is involved.

Ascending Pseudo-GTID further speeds up this process for delayed/lagging slaves.

Recap, visualized

(but do look at the presentation):

pseudo-gtid-quick

  1. Find last pseudo GTID in slave’s binary log (or last applied one in relay log)
  2. Search for exact match on new master’s binary logs
  3. Fast forward both through successive identical statements until end of slave’s applied entries is reached
  4. Point slave into cursor position on master

What happens if the slave we wish to reconnect is lagging? Or perhaps it is a delayed replica, set to run 24 hours behind its master?

The naive approach would expand bullet #2 into:

  • Search for exact match on master’s last binary logs
  • Unfound? Move on to previous (older) binary log on master
  • Repeat

The last Pseudo-GTID executed by the slave was issued by the master over 24 hours ago. Suppose the master generates one binary log per hour. This means we would need to full-scan 24 binary logs of the master where the entry will not be found; to only be matched in the 25th binary log (it's an off-by-one problem, don't hold the exact number against me).

Ascending Pseudo GTID

Since we control the generation of Pseudo-GTID, and since we control the search for Pseudo-GTID, we are free to choose the form of Pseudo-GTID entries. We recently switched into using Ascending Pseudo-GTID entries, and this works like a charm. Consider these Pseudo-GTID entries:

drop view if exists `meta`.`_pseudo_gtid_hint__asc:55B364E3:0000000000056EE2:6DD57B85`
drop view if exists `meta`.`_pseudo_gtid_hint__asc:55B364E8:0000000000056EEC:ACF03802`
drop view if exists `meta`.`_pseudo_gtid_hint__asc:55B364ED:0000000000056EF8:06279C24`
drop view if exists `meta`.`_pseudo_gtid_hint__asc:55B364F2:0000000000056F02:19D785E4`

The above entries are ascending in lexical order. The above is generated using a UTC timestamp, along with other watchdog/random values. For a moment let's trust that our generation is indeed always ascending. How does that help us?

Suppose the last entry found in the slave is

drop view if exists `meta`.`_pseudo_gtid_hint__asc:55B364E3:0000000000056EE2:6DD57B85`

And this is what we are to search for in the master's binary logs. Starting with the optimistic hope that the entry is in the master's last binary log, we start reading. By the nature of binary logs we have to scan them sequentially from start to end. As we read the binary log entries, we soon meet the first Pseudo-GTID injection, and it reads:

drop view if exists `meta`.`_pseudo_gtid_hint__asc:55B730E6:0000000000058F02:19D785E4`

 

At this stage we know we can completely skip scanning the rest of the binary log. Our entry will not be there: this entry is larger than the one we're looking for, and entries will only get larger as we go along in the binary log. It is therefore safe to ignore the rest of this file and move on to the next-older binary log on the master, to repeat our search there.

Binary logs that cannot contain the entry are only briefly examined: orchestrator will probably read no more than the first 1,000 entries or so (I can't give you an exact number, it depends on your workload) before giving up on the binary log.
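
As a rough illustration only (this is not orchestrator's code; the binary log name and the grep pattern are assumptions for the sketch), the per-binlog check boils down to a lexical comparison against the first Pseudo-GTID entry found in the file:

# take the slave's last applied Pseudo-GTID hint (from the example entries above)
target='_pseudo_gtid_hint__asc:55B364E3:0000000000056EE2:6DD57B85'
# grab the first Pseudo-GTID hint appearing in a candidate master binlog
first=$(mysqlbinlog master-bin.000025 | grep -o -m 1 '_pseudo_gtid_hint__asc:[0-9A-F:]*')
# if that first hint is already lexically larger, the target cannot be in this
# file: give up on it quickly and move on to the next-older binary log
if [[ "$first" > "$target" ]]; then
    echo "skip this binlog, try an older one"
fi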

On every topology chain we have 2 delayed replica slaves, to help us out in the case we make a grave mistake of DELETing the wrong data. These slaves would take, on some chains, 5-6 minutes to reconnect to a new master using Pseudo-GTID, since it required scanning many, many GBs of binary logs. This is no longer the case; we've reduced scan time for such servers to about 25s at worst, and much quicker on average. There can still be dozens of binary logs to open, but all but one are given up on very quickly. I should stress that those 25s are nonblocking for other slaves which are more up to date than the delayed replicas.

Can there be a mistake?

Notice that the above algorithm does not require each and every entry to be ascending; it just compares the first entry in each binlog to determine whether our target entry is there or not. This means if we've messed up our Ascending order and injected some out-of-order entries, we can still get away with it -- as long as those entries are not the first ones in the binary log, nor are they the last entries executed by the slave.

But why be so negative? We're using UTC timestamp as the major sorting order, and inject Pseudo-GTID every 5 seconds; even with leap second we're comfortable.

On my TODO is to also include a "Plan B" full-scan search: if the Ascending algorithm fails, we can still opt for the full scan option. So there would be no risk at all.

Example

We inject Pseudo-GTID via event-scheduler. These are the good parts of the event definition:

create event if not exists
  create_pseudo_gtid_event
  on schedule every 5 second starts current_timestamp
  on completion preserve
  enable
  do
    begin
      set @connection_id := connection_id();
      set @now := now();
      set @rand := floor(rand()*(1 << 32));
      set @pseudo_gtid_hint := concat_ws(':', lpad(hex(unix_timestamp(@now)), 8, '0'), lpad(hex(@connection_id), 16, '0'), lpad(hex(@rand), 8, '0'));

      set @_create_statement := concat('drop ', 'view if exists `meta`.`_pseudo_gtid_', 'hint__asc:', @pseudo_gtid_hint, '`');
      PREPARE st FROM @_create_statement;
      EXECUTE st;
      DEALLOCATE PREPARE st;

We accompany this by the following orchestrator configuration:

 "PseudoGTIDPattern": "drop view if exists .*?`_pseudo_gtid_hint__",
 "PseudoGTIDMonotonicHint": "asc:",

"PseudoGTIDMonotonicHint" notes a string; if that string ("asc:") is found in the slave's Pseudo-GTID entry, then the entry is assumed to have been injected as part of ascending entries, and the optimization kicks in.

The Manual has more on this.

by shlomi at July 29, 2015 10:59 AM

Peter Zaitsev

Multi-source replication in MySQL 5.7 vs Tungsten Replicator

MySQL 5.7 comes with a new set of features, and multi-source replication is one of them. In a few words, this means that one slave can replicate from different masters simultaneously.

During the last couple of months I’ve been playing a lot with this trying to analyze its potential in a real case that I’ve been facing while working with a customer.

This was motivated by the fact that my customer is already using multi-sourced slaves with Tungsten Replicator, and I wanted to do a side-by-side comparison between Tungsten Replicator and multi-source replication in MySQL 5.7.

Consider the following scenario:

mixed
DB1 is our main master, handling mostly writes from several applications; it also needs to serve read traffic, which is pushing its capacity close to the limit. It has 6 replication slaves attached using regular replication.
A1, A2, A3, B1, B2 and DB7 are reporting slaves used to offload some reads from the master and also working on some offline ETL processes.

Since they had some idle capacity, the customer decided to go further and set up a different architecture:
A1 and B1 also became masters of other slaves using Tungsten Replicator; in this case group A is a set of servers for a statistics application and group B serves a finance application, so A2, A3 and B2 became multi-sourced slaves.
New applications write directly to A1 and B1 without impacting the write capacity of the main master.

Pros and Cons of this approach

Pros

  • It just works. We’ve been running this way for a long time now and we haven’t suffered major issues.
  • Tungsten Replicator has some built in tools and scripts to make slave provision easy.

Cons

  • Tungsten Replicator is a great product but bigger than needed for this architecture. In some cases we had to configure the Java Virtual Machine with 4GB of RAM to make it work properly.
  • Tungsten is a complex tool that needs some extra expertise to deploy it, make it work and troubleshoot issues when errors happen (e.g. handling duplicate key errors).

With all this in mind we moved a step forward and started to test if we can move this architecture to use legacy replication only.

New architecture design:

We added some storage capacity to DB7 for our testing purposes, and the goal here is to replace all Tungsten-replicated slaves with a single server where all databases are consolidated.

Because of some data dependencies we weren't able to completely separate the A1 and B1 servers to become master-only, so they are currently acting as masters of DB7 and slaves of DB1. By data dependency I mean that DB1 replicates its schemas to all of its direct slaves, including DB7. DB7 also gets replication of the finance DB running locally on B1 and the stats DB running locally on A1.

Some details about how this was done and how multi-source replication is implemented:

  • The main difference from regular replication, as known up to version 5.6, is that now you have replication channels; each channel means a different source, in other words each master has its own replication channel.
  • Replication needs to be set up as crash safe, meaning that both the master_info_repository and
    relay_log_info_repository variables need to be set to TABLE (see the sketch right after this list).
  • We haven’t considered GTID because servers acting as masters have different versions than our test multi-sourced slave.
  • log_slave_updates needs to be disabled in A1 and B2 to avoid having duplicate data in DB7 due to the replication flow.
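
A minimal sketch of the crash-safe settings mentioned in the list above (they can equally be set in my.cnf; when changing them at runtime, replication should be stopped first):

STOP SLAVE;
SET GLOBAL master_info_repository = 'TABLE';
SET GLOBAL relay_log_info_repository = 'TABLE';
START SLAVE;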

Pros and Cons of this approach

Pros

  • MySQL 5.7 can replicate from masters of different versions; we tested multi-source replication working with 5.5 and 5.6 simultaneously and didn't suffer problems besides the known changes to timestamp-based fields.
  • Administration becomes easier. Any DBA already familiar with legacy replication can adapt to handle multiple channels without much learning; some new variables and a couple of new tables and you're ready to go.

Cons

  • 5.7 is not production ready yet. At this point we don't have a GA release date, which means we can expect bugs to appear in the short/mid term.
  • Multi-source is still tricky for some special cases: database and table filtering works globally (you can't set per-channel filters), and administration commands like sql_slave_skip_counter are still global, which means you can't easily skip a statement in a particular channel.

Now the funny part: The How

It was easier than you think. First of all we needed to start from a backup of the data coming from our masters. Due to the versions used in production (main master is 5.5, A1 and B1 are 5.6) we started from a logical dump so we avoided dealing with mysql_upgrade issues.

Disclaimer: this does not pretend to be a guide on how to set up multi-source replication

For our case we did the backup/restore using mydumper/myloader as follows:

[root@db1]$ mydumper -l 600 -v 3 -t 8 --outputdir /mnt/backup_db1/20150708 --less-locking --regex="^(database1.|database2.|database3.)"
[root@a1]$ mydumper -l 600 -v 3 -t 8 --outputdir /mnt/backup_a1/20150708 --less-locking --regex="^(tungsten_stats.|stats.)"
[root@b1]$ mydumper -l 600 -v 3 -t 8 --outputdir /mnt/backup_b1/20150708 --less-locking --regex="^(tungsten_finance.|finance.)"

Notice each command was run on its respective master server. Now the restore part:

[root@db7]$ myloader -d /mnt/backup_db1/20150708  -o -t 8 -q 10000 -h localhost
[root@db7]$ myloader -d /mnt/backup_a1/20150708 -o -t 8 -q 10000 -h localhost
[root@db7]$ myloader -d /mnt/backup_b1/20150708 -o -t 8 -q 10000 -h localhost

So at this point we have a new slave with a copy of the databases from 3 different masters. Just for context, we need to dump/restore the tungsten* databases because they are constantly updated by the Replicator (which at this point is still in use). Pretty easy, right?

Now the most important part of this whole process: setting up replication. The procedure is very similar to regular replication, but now we need to consider which binlog position is necessary for each replication channel. This is very easy to get from each backup, in this case by reading the metadata file created by mydumper. With the well-known backup methods (either logical or physical) you have a way to get the binlog coordinates, for example --master-data=2 in mysqldump or the xtrabackup_binlog_info file in xtrabackup.

Once we have the replication info (and have created a replication user on the masters), we only need to run the well-known CHANGE MASTER TO and START SLAVE commands, but now we have a new way to do it:

db7:information_schema> change master to master_host='db1', master_user='rep', master_password='rep', master_log_file='db1-bin.091487', master_log_pos=74910596 FOR CHANNEL 'main_master';
       Query OK, 0 rows affected (0.02 sec)
db7:information_schema> change master to master_host='a1', master_user='rep', master_password='rep', master_log_file='a1-bin.394460', master_log_pos=56004 FOR CHANNEL 'a1_slave';
       Query OK, 0 rows affected (0.02 sec)
db7:information_schema> change master to master_host='b1', master_user='rep', master_password='rep', master_log_file='b1-bin.1653245', master_log_pos=2563356 FOR CHANNEL 'b1_slave';
       Query OK, 0 rows affected (0.02 sec)

Replication is set and now we are good to go:

db10:information_schema> START SLAVE FOR CHANNEL 'main_master';
       Query OK, 0 rows affected (0.00 sec)
db10:information_schema> START SLAVE FOR CHANNEL 'a1_slave';
       Query OK, 0 rows affected (0.00 sec)
db10:information_schema> START SLAVE FOR CHANNEL 'b1_slave';
       Query OK, 0 rows affected (0.00 sec)

The new commands include the FOR CHANNEL 'channel_name' option to handle replication channels independently.

At this point we have a slave running 3 replication channels from different sources. We can check the status of replication with the well-known SHOW SLAVE STATUS command (TL;DR):

db10:information_schema> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: db1
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1-bin.077011
          Read_Master_Log_Pos: 15688468
               Relay_Log_File: db7-relay-main_master.000500
                Relay_Log_Pos: 18896705
        Relay_Master_Log_File: db1-bin.076977
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table: mysql.%,temp.%
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 18896506
              Relay_Log_Space: 2260203264
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 31047
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1004
                  Master_UUID: 65107c0c-7ab5-11e4-a85a-bc305bf01f00
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name: main_master
*************************** 2. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: a1
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: a1-bin.072336
          Read_Master_Log_Pos: 10329256
               Relay_Log_File: db7-relay-db3_slave.000025
                Relay_Log_Pos: 10329447
        Relay_Master_Log_File: a1-bin.072336
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table: mysql.%,temp.%
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 10329256
              Relay_Log_Space: 10329697
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 4000
                  Master_UUID: 0f061ec4-6fad-11e4-a069-a0d3c10545b0
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name: a1_slave
*************************** 3. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: b1.las1.fanops.net
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: b1-bin.093214
          Read_Master_Log_Pos: 176544432
               Relay_Log_File: db7-relay-db8_slave.000991
                Relay_Log_Pos: 176544623
        Relay_Master_Log_File: b1-bin.093214
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table: mysql.%,temp.%
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 176544432
              Relay_Log_Space: 176544870
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1001
                  Master_UUID:
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name: b1_slave
3 rows in set (0.00 sec)

Yeah I know, the output is too large; the Oracle guys noticed it too, so they have created a set of new tables in the performance_schema database to help us retrieve this information in a friendlier manner. Check this link for more information. We could also run SHOW SLAVE STATUS FOR CHANNEL 'b1_slave', for instance.
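
As a hedged sketch (table and column names as found in MySQL 5.7's performance_schema; they may differ between 5.7 milestones):

db7:information_schema> SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_MESSAGE FROM performance_schema.replication_connection_status;
db7:information_schema> SHOW SLAVE STATUS FOR CHANNEL 'b1_slave'\G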

Some limitations found during tests:

  • As mentioned, some configurations are still global and can't be set per replication channel, for instance replication filters, which can be set without restarting MySQL but will affect all replication channels, as you can see here.
  • Replication events are somehow serialized on the slave side, just like a global counter that is not well documented yet. In practice this means that you need to be very careful when troubleshooting issues, because you may hit unexpected behavior: for instance, if you have 2 replication channels failing with a duplicate key error, it is not easy to predict which event you will skip when running set global sql_slave_skip_counter=1 (a brief illustration follows this list).
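
To illustrate that last point, a minimal sketch (the channel name is taken from the examples above; which channel's event actually gets skipped is exactly what is hard to predict here):

db7:information_schema> SET GLOBAL sql_slave_skip_counter = 1;
db7:information_schema> START SLAVE SQL_THREAD FOR CHANNEL 'a1_slave';
-- the counter is global, so with two channels stopped on duplicate key errors
-- it is not obvious which channel's event will actually be skipped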

Conclusions
So far this new feature looks very nice and provides some extra flexibility to slaves, which helps to reduce architecture complexity when we want to consolidate databases from different sources into a single server. After some time testing it, I'd say that I prefer this type of replication over Tungsten Replicator in this kind of scenario due to its simplicity of administration; for example, pt-table-checksum and pt-table-sync will work here without the limitations they have with Tungsten.

With the exception of some limitations that need to be addressed, I believe this new feature is game changing and will definitely make the DBA's life easier. I still have a lot to test, but that is material for a future post.

The post Multi-source replication in MySQL 5.7 vs Tungsten Replicator appeared first on MySQL Performance Blog.

by Francisco Bordenave at July 29, 2015 07:00 AM

July 28, 2015

Jean-Jerome Schmidt

Become a MySQL DBA blog series - Database upgrades

Database vendors typically release patches with bug/security fixes on a monthly basis, so why should we care? The news is full of reports of security breaches and hacked systems, so unless security is of no concern to you, you might want to have the most current security fixes on your systems. Major versions are rarer, and usually harder (and riskier) to upgrade to. But they might bring along some important features that make the upgrade worth the effort.

In this blog post, we will cover one of the most basic tasks of the DBA - minor and major database upgrades.  

This is the sixth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include Replication Topology Changes, Schema Changes, High Availability, Backup & Restore, Monitoring & Trending.

MySQL upgrades

Once every couple of years, a MySQL version becomes outdated and is no longer supported by Oracle. It happened to MySQL 5.1 on December 4, 2013, and earlier to MySQL 5.0 on January 9, 2012. It will also happen to MySQL 5.5 somewhere in 2018, 8 years after the GA release. It means that for both MySQL 5.0 and MySQL 5.1, users cannot rely on fixes - not even for serious security bugs. This is usually the point where you really need to plan an upgrade of MySQL to a newer version.

You won’t be dealing only with major version upgrades, though - it’s more likely that you’ll be upgrading to minor versions more often, like 5.6.x -> 5.6.y. Most likely, the newest version brings fixes for bugs that affect your workload, but it can be for any other reason.

There is a significant difference in the way you perform a major and a minor version upgrade.

Preparations

Before you can even think about performing an upgrade, you need to decide what kind of testing you need to do. Ideally, you have a staging/development environment where you do tests for your regular releases. If that is the case, the best way of doing pre-upgrade tests will be to build a database layer of your staging environment using the new MySQL version. Once that is done, you can proceed with a regular set of tests. More is better - you want to focus not only on the “feature X works/does not work” aspect but also performance.

On the database side, you can also do some generic tests. For that you would need a list of queries in slow log format. Then you can use pt-upgrade to run them on both the old and the new MySQL version, comparing the response times and result sets. In the past, we have noticed that pt-upgrade returns a lot of false positives - it may report a query as slow while in fact, the query is perfectly fine on both versions. For that, you may want to introduce some additional sanity checks - parse pt-upgrade's output, grab the slow queries it reported, execute them once more on the servers and compare the results again. What you need to keep in mind is that you should connect to both the old and new database servers in the same way (a socket connection will be faster than TCP).
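
A hedged example of such a pt-upgrade run (host names and file names are placeholders; check the pt-upgrade documentation for the options available in your Percona Toolkit version):

$ pt-upgrade slow.log h=old-55-host h=new-56-host > pt-upgrade-report.txt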

Typical results from such generic tests are queries where the execution plan has changed - usually it’s enough to add some indexes or force the optimizer to pick the correct one. You can also see queries with discrepancies in the result set - most likely the result of a missing explicit ORDER BY in the query - you can’t rely on rows being sorted in a particular way if you didn’t sort them explicitly.

Minor version upgrades

A minor upgrade is relatively easy to perform - most of the time, all you need to do is install the new version using the package manager of your distribution. Once you do that, you need to ensure that MySQL has been started after the upgrade and then run the mysql_upgrade script. This script goes through the tables in your database and ensures all of them are compatible with the current version. It may also fix your system tables if required.
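
As a sketch, on a Debian/Ubuntu-style system with the vendor repository already configured, this could look roughly like the following (the package and service names are assumptions and will differ per distribution and MySQL flavour):

$ apt-get update
$ apt-get install --only-upgrade percona-server-server-5.6
$ service mysql restart    # only if the package did not restart MySQL itself
$ mysql_upgrade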

Obviously, installing the new version of a package requires the service to be stopped. Therefore you need to plan the upgrade process. It may differ slightly depending on whether you use Galera Cluster or MySQL replication.

MySQL replication

When we are dealing with MySQL replication, the upgrade process is fairly simple. You need to upgrade slave by slave, taking them out of rotation for the time required to perform the upgrade (it is a short time if everything goes right, not more than a few minutes of downtime). For that you may need to make some temporary changes in your proxy configuration to ensure that the traffic won’t be routed to the slave that is under maintenance. It’s hard to give any details here because it depends on your setup. In some cases, it might not even be needed to make any changes as the proxy can adapt to topology changes on its own and detect which node is available and which is not. That’s how you should configure your proxy, by the way.

Once every slave has been updated, you need to execute a planned failover. We discussed the process in an earlier blog post. The process may also depend on your setup. It doesn’t have to be a manual one if you have tools to automate it for you (MHA for example). Once a new master is elected and the failover is completed, you should perform the upgrade on the old master which, at this point, should be slaving off the new master. This will conclude the minor version upgrade for the MySQL replication setup.

Galera Cluster

With Galera, it is somewhat easier to perform upgrades - you need to stop the nodes one by one, upgrade the stopped node and then restart it before moving to the next. If your proxy needs some manual tweaks to ensure traffic won’t hit nodes which are undergoing maintenance, you will have to make those changes. If it can detect everything automatically, all you need to do is to stop MySQL, upgrade and restart. Once you’ve gone over all nodes in the cluster, the upgrade is complete.

Major version upgrades

A major version upgrade in MySQL would be 5.x -> 5.y or even 4.x -> 5.y. Such an upgrade is more tricky and complex than the minor upgrades we just covered in the earlier paragraphs.

The recommended way of performing the upgrade is to dump and reload the data - this requires some time (depending on the database size) but it’s usually not feasible to do it while the slave is out of rotation. Even when using mydumper/myloader, the process will take too long. In general, if the dataset is larger than a hundred gigabytes, it will probably require additional preparations.

While it might be possible to do just a binary upgrade (install new packages), it is not recommended as there could be some incompatibilities in binary format between the old version and the new one, which, even after mysql_upgrade has been executed, may still cause some problems. We’ve seen cases where a binary upgrade resulted in some weird behavior of the optimizer, or caused instability. All those issues were solved by performing the dump/reload process. So, while you may be ok running a binary upgrade, you may also run into serious problems - it’s your call and eventually it’s your decision. If you decide to perform a binary upgrade, you need to do detailed (and time-consuming) tests to ensure it does not break anything. Otherwise you are at risk. That’s why dump and reload is the officially recommended way to upgrade MySQL and that’s why we will focus on this approach to the upgrade.

MySQL replication

If our setup is based on MySQL replication, we will build a slave on the new MySQL version. Let’s say we are upgrading from MySQL 5.5 to MySQL 5.6. As we have to perform a long dump/reload process, we may want to build a separate MySQL host for that. The simplest way would be to use xtrabackup to grab the data from one of the slaves along with the replication coordinates. That data will allow you to slave the new node off the old master. Once the new node (still running MySQL 5.5 - xtrabackup just moves the data so we have to use the same, original, MySQL version) is up and running, it’s time to dump the data. You can use any of the logical backup tools that we discussed in our earlier post on Backup and Restore. It doesn’t matter which, as long as you can restore the data later.

After the dump has been completed, it’s time to stop MySQL, wipe out the current data directory, install MySQL 5.6 on the node, initialize the data directory using the mysql_install_db script and start the new MySQL version. Then it’s time to load the dumps - a process which also may take a lot of time. Once done, you should have a new and shiny MySQL 5.6 node. It’s time now to sync it back with the master - you can use the coordinates collected by xtrabackup to slave the node off a member of the production cluster running MySQL 5.5. What’s important to remember here is that, as you want to eventually slave the node off the current production cluster, you need to ensure that the binary logs won’t rotate out. For large datasets, the dump/reload process may take days, so you want to adjust expire_logs_days accordingly on the master. You also want to confirm you have enough free disk space for all those binlogs.
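
For example, a minimal sketch on the 5.5 master (7 days is only a placeholder; pick a value that comfortably covers your expected dump/reload time, and mirror it in my.cnf so a restart does not revert it):

SET GLOBAL expire_logs_days = 7;
SHOW BINARY LOGS;    -- keep an eye on the number and total size of the binlogs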

Once we have a MySQL 5.6 node slaving off the MySQL 5.5 master, it’s time to go over the 5.5 slaves and upgrade them. The easiest way now would be to leverage xtrabackup to copy the data from the 5.6 node. So, we take a 5.5 slave out of rotation, stop the MySQL server, wipe out the data directory, upgrade MySQL to 5.6, and restore the data from the other 5.6 slave using xtrabackup. Once that’s done, you can set up the replication again and you should be all set.

This process is much faster than doing dump/reload for each of the slaves - it’s perfectly fine to do it once per replication cluster and then use physical backups to rebuild other slaves. If you use AWS, you can rely on EBS snapshots instead of xtrabackup. Similar to the logical backup, it doesn’t really matter how you rebuild the slaves as long as it will work.

Finally, once all of the slaves have been upgraded, you need to failover from the 5.5 master to one of the 5.6 slaves. At this point it may happen that you won’t be able to keep the 5.5 node in the replication topology (even if you set up master - master replication between them). In general, replicating from a new version of MySQL to an older one is not supported - replication might break. One way or another, you’ll want to upgrade and rebuild the old master using the same process as with the slaves.

Galera Cluster

Compared to MySQL Replication, Galera is, at the same time, both trickier and easier to upgrade. A cluster created with Galera should be treated as a single MySQL server. This is crucial to remember when discussing Galera upgrades - it’s not a master with some slaves or many masters connected to each other - it’s like a single server. To perform an upgrade of a single MySQL server you need to either do the offline upgrade (take it out of rotation, dump the data, upgrade MySQL to 5.6, load the data, bring it back into rotation) or create a slave, upgrade it and finally failover to it (the process we described in the previous section, while discussing MySQL replication upgrade).

The same thing applies to a Galera cluster - you either take everything down for the upgrade (all nodes) or you have to build a slave - another Galera cluster connected via MySQL replication.

An online upgrade process may look as follows. For starters, you need to create the slave on MySQL 5.6 - the process is exactly the same as above: create a node with MySQL 5.5 (it can be a Galera node but it’s not required), use xtrabackup to copy the data and replication coordinates, dump the data using a logical backup tool, wipe out the data directory, upgrade MySQL to 5.6 Galera, bootstrap the cluster, load the data, and slave the node off the 5.5 Galera cluster.

At this point you should have two Galera clusters - 5.5 and a single node of Galera 5.6, both connected via replication. Next step will be to build the 5.6 cluster to a production size. It’s hard to tell how to do it - if you are in the cloud, you can just spin up new instances. If you are using colocated servers in a datacenter, you may need to move some of the hardware from the old to the new cluster. You need to keep in mind the total capacity of the system to make sure it can cope with some nodes taken out of rotation. While hardware management may be tricky, what is nice is that you don’t have to do much regarding building the 5.6 cluster - Galera will use SST to populate new nodes automatically.

In general, the goal of this phase is to build a 5.6 cluster that’s large enough to handle the production workload. Once it’s done, you need to failover to 5.6 Galera cluster - this will conclude the upgrade. Of course, you may still need to add some more nodes to it but it’s now a regular process of provisioning Galera nodes, only now you use 5.6 instead of 5.5.

 


by Severalnines at July 28, 2015 12:56 PM

Peter Zaitsev

MySQL QA Episode 9: Reducing Testcases for Experts: multi-threaded reducer.sh

Welcome to MySQL QA Episode 9. This episode will go more in-depth into reducer.sh: Reducing Testcases for Experts: multi-threaded reducer.sh

We will explore how to use reducer.sh to do true multi-threaded testcase reduction – a world’s first.

Topics:

  1. Expert configurable variables & their default reducer.sh settings
    1. PQUERY_MULTI
    2. PQUERY_MULTI_THREADS
    3. PQUERY_MULTI_CLIENT_THREADS
    4. PQUERY_MULTI_QUERIES
    5. PQUERY_REVERSE_NOSHUFFLE_OPT

Full-screen viewing @ 720p resolution recommended.

The post MySQL QA Episode 9: Reducing Testcases for Experts: multi-threaded reducer.sh appeared first on MySQL Performance Blog.

by Roel Van de Paar at July 28, 2015 10:00 AM

Henrik Ingo

It was my fault

Last Friday noonish, I was back at PDX. I had decided to invest in the Thursday night parties - to strengthen those bonds of friendship that are the backbone of the open source community - then sleep, pack and take the light rail to the airport in the morning, skipping the remaining Friday morning conference sessions. I had already been at the convention center 6 days in a row, figured it would be enough for now.

read more

by hingo at July 28, 2015 06:56 AM

July 27, 2015

MariaDB Foundation

MariaDB 10.1.6 now available

Download MariaDB 10.1.6

Release Notes Changelog What is MariaDB 10.1?

MariaDB APT and YUM Repository Configuration Generator

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.6. This is a Beta release.

See the Release Notes and Changelog for detailed information on this release and the What is MariaDB 10.1? page in the MariaDB Knowledge Base for general information about the MariaDB 10.1 series.

Thanks, and enjoy MariaDB!

by Daniel Bartholomew at July 27, 2015 03:10 PM

July 25, 2015

Shlomi Noach

What makes a MySQL server failure/recovery case?

Or: How do you reach the conclusion your MySQL master/intermediate-master is dead and must be recovered?

This is an attempt at making a holistic diagnosis of our replication topologies. The aim is to cover obvious and not-so-obvious crash scenarios, and to be able to act accordingly and heal the topology.

At Booking.com we are dealing with very large numbers of MySQL servers. We have many topologies, and many servers in each topology. See past numbers to get a feel for it. At these numbers, failures happen frequently. Typically we would see normal slaves failing, but occasionally -- and far more frequently than we would like to be paged for -- an intermediate master or a master would crash. But our current (and ever in transition) setup also includes SANs, DNS records, VIPs, any of which can fail and bring down our topologies.

Tackling issues of monitoring, disaster analysis and recovery processes, I feel safe to claim the following statements:

  • The fact your monitoring tool cannot access your database does not mean your database has failed.
  • The fact your monitoring tool can access your database does not mean your database is available.
  • The fact your database master is unwell does not mean you should fail over.
  • The fact your database master is alive and well does not mean you should not fail over.

Bummer. Let's review a simplified topology with a few failure scenarios. Some of these scenarios you will find familiar. Some others may be caused by setups you're not using. I would love to say I've seen it all but the more I see the more I know how strange things can become.

We will consider the simplified case of a master with three replicas: we have M as master, A, B, C as slaves.

mysql-topologies-failures

 

A common monitoring scheme is to monitor each machine's IP, availability of MySQL port (3306) and responsiveness to some simple query (e.g. "SELECT 1"). Some of these checks may run local to the machine, others remote.

Now consider your monitoring tool fails to connect to your master.

mysql-topologies-failures (1)

I've marked the slaves with question marks as the common monitoring scheme does not associate the master's monitoring result with the slaves'.  Can you safely conclude your master is dead? Are you feeling comfortable with initiating a failover process? How about:

  • Temporary network partitioning; it just so happens that your monitoring tool cannot access the master, though everyone else can.
  • DNS/VIP/name cache/name resolving issue. Sometimes similar to the above; does your monitoring tool's host think the master's IP is what it really is? Has something just changed? Has some cache expired? Is some cache stale?
  • MySQL connection rejection. This could be due to a serious "Too many connections" problem on the master, or due to accidental network noise.

Now consider the following case: a first tier slave is failing to connect to the master:

mysql-topologies-failures (2)

The slave's IO thread is broken; do we have a problem here? Is the slave failing to connect because the master is dead, or because the slave itself suffers from a network partitioning glitch?

A holistic diagnosis

In the holistic approach we couple the master's monitoring with that of its direct slaves. Before I continue to describe some logic, the previous statement is something we must reflect upon.

We should associate the master's state with that of its direct slaves. Hence we must know which are its direct slaves. We might have slaves D, E, F, G replicating from B, C. They are not in our story. But slaves come and go. Get provisioned and de-provisioned. They get repointed elsewhere. Our monitoring needs to be aware of the state of our replication topology.

My preferred tool for the job is orchestrator, since I author it. It is not a standard monitoring tool and does not serve metrics; but it observes your topologies and records them. And notes changes. And acts as a higher level failure detection mechanism which incorporates the logic described below.

We continue our discussion under the assumption we are able to reliably claim we know our replication topology. Let's revisit our scenarios from above and then add some.

We will further only require MySQL client protocol connection to our database servers.

Dead master

A "real" dead master is perhaps the clearest failure. MySQL has crashed (signal 11); or the kernel panicked; or the disks failed; or power went off. The server is really not serving. This is observed as:

mysql-topologies-failures (3)

In the holistic approach, we observe that:

  • We cannot reach the master (our MySQL client connection fails).
  • But we are able to connect to the slaves A, B, C
  • And A, B, C are all telling us they cannot connect to the master

We have now cross referenced the death of the master with its three slaves. Funny thing is the MySQL server on the master may still be up and running. Perhaps the master is suffering from some weird network partitioning problem (when I say "weird", I mean we have it; discussed further below). And perhaps some application is actually still able to talk to the master!

And yet our entire replication topology is broken. Replication is not there for beauty; it serves our application code. And it's turning stale. Even if by some chance things are still operating on the master, this still makes for a valid failover scenario.

Unreachable master

Compare the above with:

mysql-topologies-failures (4)

Our monitoring scheme cannot reach our master. But it can reach the slaves, and they're all saying: "I'm happy!"

This gives us suspicion enough to avoid failing over. We may not actually have a problem: it's just us that are unable to connect to the master.

Right?

There are still interesting use cases. Consider the problem of "Too many connections" on the master. You are unable to connect; the application starts throwing errors; but the slaves are happy. They were there first. They started replicating at the dawn of time, long before there was an issue. Their persistent connections are good to go.

Or the master may suffer a deadlock. A long, blocking ALTER TABLE. An accidental FLUSH TABLES WITH READ LOCK. Or whatever occasional bug we hit. Slaves are still connected; but new connections are hanging; and your monitoring query is unable to complete.

And still our holistic approach can find that out: since we are able to connect to our slaves, we are also able to ask them: well, what do your relay logs have to say about this? Are we progressing in replication position? Do we actually find application content in the slaves' relay logs? We can do all this via the MySQL protocol ("SHOW SLAVE STATUS", "SHOW RELAYLOG EVENTS").
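
As a rough illustration (not orchestrator's actual implementation), these are the kinds of statements such a per-slave check can issue over a regular MySQL client connection; the relay log name below is a placeholder:

SHOW SLAVE STATUS\G
-- Slave_IO_Running / Last_IO_Error: can this slave reach the master?
-- Relay_Log_File / Exec_Master_Log_Pos: is the replication position advancing between samples?
SHOW RELAYLOG EVENTS IN 'mysqld-relay-bin.000002' LIMIT 10;
-- 'mysqld-relay-bin.000002' is hypothetical; take the real name from Relay_Log_File above
-- and check whether application writesets are still arriving from the master.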

Understanding the topology gives you greater insight into your failure case; you have increasing levels of confidence in your analysis. Strike that: in your automated analysis.

Dead master and slaves

They're all gone!

mysql-topologies-failures (5)

You cannot reach the master and you cannot reach any of its slaves. Once you are able to associate your master and slaves you can conclude you either have a complete DC power failure problem (or is this cross DC?) or you are having a network partitioning problem. Your application may or may not be affected -- but at least you know where to start. Compare with:

Failed DC

mysql-topologies-failures (6)

I'm stretching it now, because when a DC fails all the red lights start flashing. Nonetheless, if M, A, B are all in one DC and C is on another, you have yet another diagnosis.

Dead master and some slaves

mysql-topologies-failures (7)

Things start getting complicated when you're unable to get an authoritative answer from everyone. What happens if the master is dead as well as one of its slaves? We previously expected all slaves to say "we cannot replicate". For us, the master being unreachable, some slaves being dead and all the others complaining about their IO threads is a good enough indication that the master is dead.

All first tier slaves not replicating

mysql-topologies-failures (9)

Not a failover case, but it certainly needs to ring the bells. All of the master's direct slaves are failing replication on some SQL error or are simply stopped. Our topology is turning stale.

Intermediate masters

With intermediate masters the situation is not all that different. In the diagram below:

Untitled presentation

The servers E, F, G replicating from C provide us with the holistic view on C. D provides the holistic view on A.

Reducing noise

Intermediate master failover is a much simpler operation than master failover. Changing masters requires name resolution changes (of some sort), whereas moving slaves around the topology affects no one.

This implies:

  • We don't mind over-reacting on failing over intermediate masters
  • We pay with more noise

Sure, we don't mind failing over D elsewhere, but as D is the only slave of A, it's enough for D to hiccup for us to get an alert ("all" of the intermediate master's slaves are not replicating). To that effect orchestrator treats single-slave scenarios differently than multiple-slave scenarios.

Not so fun setups and failures

At Booking.com we are in transition between setups. We have some legacy configuration, we have a roadmap, two ongoing solutions, some experimental setups, and/or all of the above combined. Sorry.

Some of our masters are on SAN. We are moving away from this; for those masters on SANs we have cold standbys in an active-passive mode; so master failure -> unmount SAN -> mount SAN on cold standby -> start MySQL on cold standby -> start recovery -> watch some TV -> go shopping -> end recovery.

Only, SANs fail, too. When the master fails, switching over to the cold standby is pointless if the origin of the problem is the SAN. And given that some other masters share the same SAN... whoa. As I said, we're moving away from this setup in favor of Pseudo GTID and then Binlog Servers.

The SAN setup also implied using VIPs for some servers. The slaves reference the SAN master via VIP, and when the cold standby starts up it assumes the VIP, and the slaves know nothing about this. The same setup goes for DC masters. What happens when the VIP goes down? MySQL is running happily, but slaves are unable to connect. Does that make for a failover scenario? For intermediate masters we're pushing it to be so, failing over to a normal local-disk based server; this improves our confidence in non-SAN setups (which we have plenty of, anyhow).

Double checking

You sample your server once every X seconds. But in a failover scenario you want to make sure your data is up to date. When orchestrator suspects a dead master (i.e. cannot reach the master) it immediately contacts its direct slaves and checks their status.

Likewise, when orchestrator sees a first tier slave with broken IO thread, it immediately contacts the master to check if everything is fine.

For intermediate masters orchestrator is not so concerned and does not issue emergency checks.

How to fail over

Different story. Some other time. But failing over involves complex decisions, based on who the replicating slaves are; with/without log-slave-updates; with/without GTID; with/without Pseudo-GTID; whether binlog servers are available; which slaves are available in which data centers. Or you may be using Galera (we're not), which answers most of the above.

Anyway, we use orchestrator for that; it knows our topologies, knows what they should look like, understands how to heal them, knows MySQL replication rules, and invokes external processes to do the stuff it doesn't understand.

by shlomi at July 25, 2015 07:00 AM

July 24, 2015

Peter Zaitsev

InnoDB vs TokuDB in LinkBench benchmark

Previously I tested Tokutek’s Fractal Trees (TokuMX & TokuMXse) as MongoDB storage engines – today let’s look into the MySQL area.

I am going to use modified LinkBench in a heavy IO-load.

I compared InnoDB without compression, InnoDB with 8k compression, and TokuDB with quicklz compression.
The uncompressed data size is 115GiB, and the cache size is 12GiB for InnoDB and 8GiB + 4GiB OS cache for TokuDB.

It is important to note that I used tokudb_fanout=128, which is only available in our latest Percona Server release.
I will write more later on Fractal Tree internals and what tokudb_fanout means. For now let's just say it changes the shape of the fractal tree (compared to the default tokudb_fanout=16).
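
For reference, a minimal my.cnf sketch of how such a setting might be applied server-wide (assuming a Percona Server build that ships the tokudb_fanout variable; the rest of the configuration is unchanged):

[mysqld]
tokudb_fanout = 128   # fanout used for TokuDB fractal trees; the default is 16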

I am using two storage options:

  • Intel P3600 PCIe SSD 1.6TB (marked as “i3600” on charts) – as a high end performance option
  • Crucial M500 SATA SSD 900GB (marked as “M500” on charts) – as a low end SATA SSD

The full results and engine options are available here

Results on Crucial M500 (throughput, more is better)

Crucial M500

    Engine Throughput [ADD_LINK/10sec]

  • InnoDB: 6029
  • InnoDB 8K: 6911
  • TokuDB: 14633

Here TokuDB outperforms InnoDB by almost a factor of two, but it also shows great variance in results, which I attribute to checkpoint activity.

Results on Intel P3600 (throughput, more is better)

Intel P3600

    Engine Throughput [ADD_LINK/10sec]

  • InnoDB: 27739
  • InnoDB 8K: 9853
  • TokuDB: 20594

To understand why InnoDB shines on fast storage, let's review the IO usage of all engines.
The following chart shows the reads in KiB that each engine performs, on average, per client request.

IO Reads

The following chart shows the writes in KiB that each engine performs, on average, per client request.

IO Writes

Here we can make the interesting observation that TokuDB on average performs half as many writes as InnoDB, and this is what allows TokuDB to be better on slow storage. On fast storage, where there is no performance penalty for many writes, InnoDB is able to get ahead, as InnoDB is still better at using CPUs.

Though it is worth remembering that:

  • On fast, expensive storage, TokuDB provides better compression, which allows you to store more data in limited capacity
  • TokuDB still writes half as much as InnoDB, which means twice the lifetime for an SSD (still expensive).

Also, looking at the results, I conclude that InnoDB compression is inefficient in its implementation, as it is not able to benefit: first, from doing fewer reads (well, it helps it do better than uncompressed InnoDB, but not by much); and, second, from fast storage.

The post InnoDB vs TokuDB in LinkBench benchmark appeared first on MySQL Performance Blog.

by Vadim Tkachenko at July 24, 2015 02:12 PM

July 23, 2015

Peter Zaitsev

The Q&A: Creating best-in-class backup solutions for your MySQL environment

Thank you for attending my July 15 webinar, “Creating Best in Class Backup solutions for your MySQL environment.” Due to the amount of content we discussed and some minor technical difficulties faced near the end of the webinar, we have decided to cover the final two slides of the presentation, along with the questions asked by attendees during the webinar, in this blog post.

The slides are available for download. And you can watch the webinar in its entirety here.

The final two slides were about our tips for having a good backup and recovery strategy. Let's review the bullet points along with what their explanation would have been during the webinar:

  • Use the three types of backups
    • Binary for full restores, new slaves
      • Binary backups are easy to restore and take the least amount of time to restore. The mean time to recover is mostly bound by the time it takes to transfer the backup to the appropriate target server.
    • Logical for partial restores
      • Logical backups, especially when done table by table, come in handy when you want to restore one or a few smaller tables.
    • Binlog for point in time recovery
      • Very often the need is for Point In Time Recovery. A full backup of any type (logical or binary) is only half the story; we still need the DML statements processed on the server in order to bring it to the latest state, and that is where binary log (binlog) backups come into the picture (see the sketch after this list).
  • Store on more than one server and off-site
    •  Store your backups in more than one location -- what if the backup server goes down? Offsite storage like Amazon S3 and Glacier, with weekly or monthly backup retention, can be a cheaper option.
  • Test your backups!!!!
    • Testing your backups is very important; it's always great to know backups are recoverable and not corrupted. Spin up an EC2 instance if you want, copy and restore the backup there, and roll forward a day's worth of binlogs just to be sure.
  • Document restore procedures, script them and test them!!!
    • Also, when you test your backups, make sure to document the steps to restore the backup to avoid a last-minute hassle over which commands to use.
  • If taking from a slave run pt-table-checksum
    • Backups are mostly taken from slaves, so make sure to checksum them regularly; you don't want to back up inconsistent data.
  • Configuration files, scripts
    • Data is not the only thing you should be backing up; back up your config files, scripts and user access details to a secure location.
  • Do you need to back up everything every day?
    • For very large instances, doing a logical backup is tough. In such cases evaluate your backup needs: do you want to back up all the tables? Most of the time the smaller tables are the more important ones and need partial restores -- back up only those.
  • Hardlinking backups can save a lot of disk space in some circumstances
    • There are schemas which contain only a few high-activity tables; the rest of them are probably updated once a week or by an archiver job that runs monthly. Make sure to hardlink those files against the previous backup; it can save a good amount of space in such scenarios.
  • Monitor your Backups
    • Lastly, monitor your backups. You do not want to find out that your backups had been failing the whole time. Even a simple email notification from your backup scripts can help reduce the chance of failure.
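
As a rough, hypothetical sketch of the point-in-time recovery flow referenced above (paths, binlog names and the stop time are placeholders; adapt them to your own backup layout):

# restore the most recent binary backup
innobackupex --apply-log /backups/full
innobackupex --copy-back /backups/full
chown -R mysql:mysql /var/lib/mysql
# then roll forward the binlogs taken after that backup, up to the desired point in time
mysqlbinlog --start-position=<position from xtrabackup_binlog_info> \
            --stop-datetime="2015-07-15 12:00:00" \
            mysql-bin.000042 mysql-bin.000043 | mysql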

Now let's try to answer some of the questions asked during the webinar:

Q: --use-memory=2G, is that pretty standard? If we have more memory, should we use a higher value?
Usually we would evaluate the value based on the size of xtrabackup_logfile (the amount of transactions to apply). If you have more free memory, feel free to provide it to --use-memory; you don't want memory to be a bottleneck in the restore process.

Q: Which is the best backup option for an 8TB DB?
Usually it depends on the type of data you have and the business requirements for the backups. For example, a full xtrabackup plus incrementals on the weekdays would be a good idea. The time required for backups plays an important role here: backing up to a slow NAS share can be time consuming, and it will make xtrabackup record a lot of transactions, which will further increase your restore time. Also look into backing up very important medium and small tables via logical backups.

Q: I'm not sure if this will be covered, but if you have a 3-node master-master-master cluster using haproxy, is it recommended to run the backup from the haproxy server or directly on a specific server? Would it be wise to have a 4th server which would be part of the cluster, but not read from, to perform the backups?
I am assuming this is a Galera cluster setup, in which case you can do backups locally on any of the nodes using tools like Percona XtraBackup; however, the best solution would be spinning off a slave from one of the nodes and running backups there.

Q: With Mydumper, can we stream the data over SSH or netcat to another server? Or would one have to use something like NFS? I've used mysqldump and piped it over netcat before... curious if we can do that with Mydumper?
Mydumper is similar in nature to other MySQL client tools. It can be run remotely (--host option), which means you can run mydumper from another server to back up the master or a slave. Mydumper can be piped for sure too.

Q: Is Mydumper still maintained? It hasn't had a release since March of last year.
Indeed, Max Bubenick from Percona is currently maintaining the project. He has actually added new features to the tool which make it more comprehensive and feature rich. He is planning the next release soon; stay tuned for the blog post.

Q: Is MyDumper open source? Are prepare and restore the same?
Absolutely. Right now we need to download the source and compile it; however, very soon we will have packages built for it too. Prepare and Restore are common terminologies used in backup lingo. In the webinar, Restore means copying the backup files back from their storage location to the destination location, whereas Prepare means applying the transactions to the backup and making it ready to restore.

Q: Is binlog mirroring needed on Galera (PXC)?
It is a good idea to keep binlog mirroring. Even though IST and SST will do their job to join the node, the binlogs could play a role in case you wanted to roll forward a particular schema on a slave or QA instance.

Q: We know that Percona XtraBackup takes full and incremental backups. Does MyDumper help in taking incremental backups?
At this moment we do not have the ability to take incremental backups with mydumper or with any other logical backup tool. However, weekly full (logical) backups and daily binlog backups can serve the same purpose as other incremental backup solutions, plus they are easy to restore :)

Q: Is it possible to encrypt the output file? What would be the best methodology to back up data with a database size of 7 to 8GB that increases 25% each day? What is the difference between innobackupex and mydumper?
Indeed it's possible to encrypt the backup files; as a matter of fact, we encrypt backup files with GPG keys before uploading them to offsite storage. The best method to back up a 7 to 8GB instance would be implementing all 3 types of backups we discussed in the webinar; your scenario requires planning for the future, so it's always best to have different solutions available as the data grows. Innobackupex is part of the Percona XtraBackup toolkit and is a script which does binary backups of databases, whereas MyDumper is a logical backup tool which creates backups as text files.

Q: How can I optimize a MySQL dump of a large database? The main bottleneck while taking a MySQL dump of a large database is that if any table is found to be corrupted, the dump never gets past it by temporarily skipping the corrupted table. Can we take a backup of a large database without using a locking mechanism, i.e. does someone know how to make the backup without locking the tables? Is there any tool that would be faster for backup and restore, or how can we use MySQL dump to work around this kind of issue during crash recovery?
Mysqldump is a logical backup tool, and as such it executes full table scans to back up the tables and write them down in the output file; hence it is very difficult to improve the performance of mysqldump (query-wise). Assuming that the corruption you refer to affects MyISAM tables, it is highly recommended that you repair them before backing up; also, to make sure mysqldump doesn't fail due to an error on such a corrupt table, try using the --force option to mysqldump. If you are using MyISAM tables, the first recommendation would be to switch to InnoDB; with most of the tables on InnoDB, locking can be greatly reduced, to the point where it is negligible -- look into --single-transaction. Faster backup and recovery can be achieved with binary backups; look into using the Percona XtraBackup tool, we have comprehensive documentation to get you started.

We hope this was a good webinar and that we have answered most of your questions. Stay tuned for more such webinars from Percona.

The post The Q&A: Creating best-in-class backup solutions for your MySQL environment appeared first on MySQL Performance Blog.

by Akshay Suryawanshi at July 23, 2015 01:55 PM

MySQL QA Episode 8: Reducing Testcases for Engineers: tuning reducer.sh

Welcome to MySQL QA Episode 8: Reducing Testcases for Engineers: tuning reducer.sh

  1. Advanced configurable variables & their default/vanilla reducer.sh settings
    1. FORCE_SKIPV
    2. FORCE_SPORADIC
    3. TIMEOUT_COMMAND & TIMEOUT_CHECK
    4. MULTI_THREADS
    5. MULTI_THREADS_INCREASE
    6. QUERYTIMEOUT
    7. STAGE1_LINES
    8. SKIPSTAGE
    9. FORCE_KILL
  2. Some examples
    1. FORCE_SKIPV/FORCE_SPORADIC
    2. TIMEOUT_COMMAND/TIMEOUT_CHECK

Full-screen viewing @ 720p resolution recommended.

The post MySQL QA Episode 8: Reducing Testcases for Engineers: tuning reducer.sh appeared first on MySQL Performance Blog.

by Roel Van de Paar at July 23, 2015 10:00 AM

July 22, 2015

Peter Zaitsev

SELinux and the MySQL init script

I recently worked with a customer who had a weird issue: when their MySQL server was already running (Percona Server 5.5), if they ran service mysql start a second time, the init script was not able to detect that an instance was already running. As a result, it tried to start a second instance with the same settings as the first one. Of course this fails and creates a mess. What was the issue? A missing rule in SELinux. At least that is what it looks like.

Summary

If SELinux is set to enforcing and you are using Percona Server on CentOS/RHEL 6 (other versions could be affected), service mysql start doesn't work properly and the fix is simple to run:

# grep mysqld_safe /var/log/audit/audit.log | audit2allow -M mysqld_safe
# semodule -i mysqld_safe.pp
# service mysql restart

Other options are:

  • Set SELinux to permissive
  • Use the CentOS/RHEL standard MySQL init script (note I didn’t extensively check if that could trigger other errors)

How did we see the issue?

That was pretty easy: if an instance is already running and if you run service mysql start again, you should see something like this in the MySQL error log:

150717 08:47:58 mysqld_safe A mysqld process already exists

But if you instead see tons of error messages like:

2015-07-17 08:47:05 27065 [ERROR] InnoDB: Unable to lock ./ibdata1, error: 11
2015-07-17 08:47:05 27065 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.

it means that the init script is broken somewhere.

Investigation

When the issue was brought to my attention, I tried to reproduce it on my local box, but with no luck. What was so special in the configuration used by the customer?

The only thing that was slightly out of the ordinary was SELinux which was set to enforcing. Then we set SELinux to permissive, and guess what? service mysql start was now working properly and it didn’t allow 2 concurrent instances to be run!

Next step was to look at the SELinux logs to find any error related to MySQL, and we discovered messages like:

type=SYSCALL msg=audit(1437121845.464:739): arch=c000003e syscall=62 success=no exit=-13
a0=475 a1=0 a2=0 a3=7fff0e954130 items=0 ppid=1 pid=5732 auid=500 uid=0 gid=0 euid=0 suid=0
fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=5 comm="mysqld_safe" exe="/bin/bash"
subj=unconfined_u:system_r:mysqld_safe_t:s0 key=(null)

At this point, we knew that a rule was missing for mysqld_safe, we needed to add a new one.

Deeper investigation

Actually what happens is that SELinux prevents this condition from being executed in mysqld_safe:

if kill -0 $PID > /dev/null 2> /dev/null

and then the script assumes that this means the mysqld process is not running. That’s why a second mysqld is started.

However users of Oracle MySQL will probably never experience that issue, simply because the init script is a bit different: before calling mysqld_safe, the init script tries to ping a potential mysqld instance and if it gets a positive reply or an Access denied error, it concludes that mysqld is already running and it doesn’t invoke mysqld_safe.

The fix

Fortunately, this is quite simple. You can generate the corresponding rule with audit2allow:

grep mysqld_safe /var/log/audit/audit.log | audit2allow -M mysqld_safe

And after checking the corresponding .te file, we were able to load that new module:

semodule -i mysqld_safe.pp

After stopping MySQL, you can now use service mysql start normally.

Conclusion

This issue was quite interesting to work on because finding the culprit was not that easy. Also it only triggers when SELinux is enabled and Percona Server is used. Now should the init script of Percona Server be fixed? I’m not sure of the potential problems that could occur if we did so, but of course feel free to leave your feedback in the comments.

The post SELinux and the MySQL init script appeared first on MySQL Performance Blog.

by Stephane Combaudon at July 22, 2015 05:09 PM

July 21, 2015

Peter Zaitsev

Percona now offering 24/7 support for MongoDB and TokuMX

Today Percona announced the immediate availability of 24/7, enterprise-class support for MongoDB and TokuMX. The new support service helps organizations achieve maximum application performance without database bloat. Customers have round-the-clock access (365 days a year) to the most trusted team of database experts in the open source community.

The news means that Percona now offers support across the entire open-source database ecosystem, including the entire LAMP stack (Linux, Apache, MySQL, and PHP/Python/Perl), providing a single, expert, proven service provider for companies to turn to in good times (always best to be proactive) – and during emergencies, too.

Today’s support announcement follows Percona’s acquisition of Tokutek, which included the Tokutek distribution of MongoDB – making Percona the first vendor to offer both MySQL and MongoDB software and solutions.

Like Percona’s other support services, support for MongoDB and TokuMX enables organizations to talk directly with Percona’s support experts at any time, day or night.

The Percona Support team is always ready to help resolve database and server instability, initiate data recovery, optimize performance, deal with response and outage issues – and ensure proactive system monitoring and alert responses. Percona also offers support across on-premises, cloud, and hybrid deployments.

The post Percona now offering 24/7 support for MongoDB and TokuMX appeared first on MySQL Performance Blog.

by Tom Diederich at July 21, 2015 07:22 PM

MySQL QA Episode 7: Reducing Testcases for Beginners – single-threaded reducer.sh!

Welcome to MySQL QA Episode #7 – Reducing Testcases for Beginners: single-threaded reducer.sh!

In this episode we’ll learn how to use reducer.sh. Topics discussed;

  1. reducer.sh introduction/concepts
  2. Basic configurable variables & their default reducer.sh settings
    1. INPUTFILE options
    2. MODE=x
    3. TEXT=”text”
    4. WORKDIR_LOCATION & WORKDIR_M3_DIRECTORY
    5. MYEXTRA
    6. MYBASE
    7. PQUERY_MOD & PQUERY_LOC
    8. MODE5_COUNTTEXT, MODE5_ADDITIONAL_TEXT & MODE5_ADDITIONAL_COUNTTEXT
    9. How to learn more about each of the settings
  3. Manual example
  4. Introduction to the script’s self-recursion concept – subreducer
  5. Quick setup re-cap, details of an already executed QA run
  6. Examples from pquery-prep-red.sh (including some issue reviews)
  7. Gotcha’s
  8. QUERYTIMEOUT & STAGE1_LINES Variables

Full-screen viewing @ 720p resolution recommended.

If the speed is too slow for you, consider setting YouTube to 1.25 playback speed.

The post MySQL QA Episode 7: Reducing Testcases for Beginners – single-threaded reducer.sh! appeared first on MySQL Performance Blog.

by Roel Van de Paar at July 21, 2015 10:00 AM

July 20, 2015

Peter Zaitsev

Percona Live Amsterdam discounted pricing ends July 26!

The Percona Live Data Performance Conference in Amsterdam is just two months away and it’s going to be an incredible event. With a new expanded focus on MySQL, NoSQL, and Data in the Cloud, this conference will be jam-packed with talks from some of the industry’s leading experts from MongoDB, VMware, Oracle, MariaDB, Facebook, Booking.com, Pythian, Google, Rackspace, Yelp (and many more, including of course Percona).

Early Bird pricing ends this Sunday (July 26)! So if you want to save €25, then you’d better register now. And for all of my readers, you can take an additional 10% off the entire registration price by using the promo code “10off” at checkout.

It’s also important to book your room at the Mövenpick Hotel for a special rate – but hurry because that deal ends July 27 and the rooms are disappearing fast due to some other big events going on in Amsterdam that week.

Sponsorship opportunities are also still available for Percona Live Amsterdam. Event sponsors become part of a dynamic and fast-growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors and entrepreneurs who typically attend the event. This year’s conference will feature expanded accommodations and turnkey kiosks. Current sponsors include:

We’ve got a fantastic conference schedule on tap. Sessions, which will follow each morning’s keynote addresses, feature a variety of topics related to MySQL and NoSQL, High Availability, DevOps, Programming, Performance Optimization, Replication and Backup, MySQL in the Cloud, MySQL Case Studies, Security, and What’s New in MySQL and MongoDB.

Sessions Include:
  • “InnoDB: A Journey to the Core,” Jeremy Cole, Sr. Systems Engineer, Google, Inc. and Davi Arnaut, Software Engineer, LinkedIn
  • “MongoDB Patterns and Antipatterns for Dev and Ops,” Steffan Mejia, Principal Consulting Engineer, MongoDB, Inc.
  • “NoSQL’s Biggest Lie: SQL Never Went Away,” Matthew Revell, Lead Developer Advocate, Couchbase
  • “The Future of Replication is Today: New Features in Practice,” Giuseppe Maxia, Quality Assurance Architect, VMware
  • “What’s New in MySQL 5.7,” Geir Høydalsvik, Senior Software Development Director, Oracle
Tutorials include:
  • “Best Practices for MySQL High Availability,” Colin Charles, Chief Evangelist, MariaDB
  • “Mongo Sharding from the Trench: A Veterans Field Guide,” David Murphy, Lead DBA, Rackspace Data Stores
  • “Advanced Percona XtraDB Cluster in a Nutshell, La Suite: Hands on Tutorial Not for Beginners!,” Frederic Descamps, Senior Architect, Percona

The conference’s evening events will be a perfect way to network, relax and have FUN while seeing the beautiful city of Amsterdam!

Monday night, September 21, after the tutorial sessions conclude, attendees are invited to the Delirium Cafe located across the street from the conference venue. With more than 500 beers on tap and great food, this will be the perfect way to kick off the Conference.

Tuesday night, September 22, Booking.com will be hosting the Community dinner of the year at their very own headquarters located in historic Rembrandt Square in the heart of the city. Hop on one of the sponsored canal boats that will pick you up right outside of the Mövenpick for your chance to see the city from the water on the way to the community dinner! You’ll be dropped off right next to Booking.com’s offices!

Wednesday night, September 23, there will be a closing reception taking place at the Mövenpick for your last chance to visit with our exhibitors and to wrap up what promises to be an amazing conference!

See you in Amsterdam!

The post Percona Live Amsterdam discounted pricing ends July 26! appeared first on MySQL Performance Blog.

by Kortney Runyan at July 20, 2015 09:32 PM

Fractal Tree library as a Key-Value store

As you may know, Tokutek is now part of Percona and I would like to explain some internals of TokuDB and TokuMX – what performance benefits they bring, along with further optimizations we are working on.

However, before going into deep details, I feel it is needed to explain the fundamentals of Key-Value store, and how Fractal Tree handles it.

Before that, allow me to say that I have heard opinions that the “Fractal Tree” name does not reflect the internal structure and looks more like a marketing term than a technical one. I will not go into this discussion and will keep using the name “Fractal Tree”, simply out of respect for its inventors. I think they are in a position to name their invention whatever they want.

So with that said, the Fractal Tree library implements a new data structure for more efficient handling (with a main focus on insertion, but more on this later) of a Key-Value store.

You may question how Key-Value relates to SQL transactional databases -- it sounds more like something from the NoSQL world. Partially this is true, and the Fractal Tree Key-Value library is successfully used in the Percona TokuMX (based on MongoDB 2.4) and Percona TokuMXse (storage engine for MongoDB 3.0) products.

But if we look at a Key-Value store in general, it may actually be a good fit for structured databases as well. To explain this, let's take a look at the Key-Value details.

So what is Key-Value data structure?

We will use the notation (k,v), or key => value, which basically means we associate some value "v" with a key "k". For software developers the following analogies may be close:
key-value access is implemented as a dictionary in Python, an associative array in PHP or a map in C++.
(More details in Wikipedia)

I will define key-value structure as a list of pairs (k,v).

It is important to note that both key and value need not be scalars (single values); they can be compound.
That is, "k1, k2, k3 => v1, v2", which we can read as "give me two values by a 3-part key".

This brings us closer to a database table structure.
If we apply the additional requirement that all keys (k) in the list (k,v) must be unique, this represents
a PRIMARY KEY for a traditional database table.
To understand this better, let's take a look at the following table:
CREATE TABLE metrics (
ts timestamp,
device_id int,
metric_id int,
cnt int,
val double,
PRIMARY KEY (ts, device_id, metric_id),
KEY metric_id (metric_id, ts),
KEY device_id (device_id, ts)
)

We can state that the Key-Value structure (ts, device_id, metric_id => cnt, val), with the requirement
that "ts, device_id, metric_id" be unique, represents the PRIMARY KEY for this table; in fact this is how InnoDB (and TokuDB, for that matter) stores data internally.

Secondary indexes can also be represented in Key => Value notation, for example, again, as used in TokuDB and InnoDB:
(secondary_index_key => primary_key), where the key of a secondary index points to a primary key (so later we can get the values by looking up the primary key). Please note that secondary_index_key may not be unique (unless we add a UNIQUE constraint to the secondary index).

Or if we take again our table, the secondary keys are defined as
(metric_id, ts => ts, device_id, metric_id)
and
(device_id, ts => ts, device_id, metric_id)
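
As a hedged illustration of that mapping (the values are made up), a query filtering on metric_id conceptually performs a secondary-index lookup followed by a primary-key lookup:

SELECT cnt, val
  FROM metrics
 WHERE metric_id = 7
   AND ts >= '2015-07-20 00:00:00';
-- step 1: the (metric_id, ts) secondary index yields the matching primary keys (ts, device_id, metric_id)
-- step 2: each primary key is then looked up to fetch the values (cnt, val)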

It is expected that a Key-Value store supports basic data manipulation and extraction operations, such as:

        – Add or Insert: add a (key => value) pair to the collection
        – Update: change (key => value) to (key => value2), that is, update the "value" assigned to "key"
        – Delete: remove(key), i.e. delete the pair (key => value) from the collection
        – Lookup (select): return the "value" assigned to "key"

and I want to add a fifth operation:

        – Range lookup: return all values for keys defined by a range, such as "key > 5" or "key >= 10 and key < 15"

The way software implements the internal structure of a Key-Value store defines the performance of the mentioned operations, especially when the data size of the store grows beyond memory capacity.

For decades, the most popular data structure to represent a Key-Value store on disk has been the B-Tree, and with good reason. I won't go into B-Tree details (see for example https://en.wikipedia.org/wiki/B-tree), but it provides probably the best possible time for Lookup operations. However, it has challenges when it comes to Insert operations.

And this is the area where the newcomers, Fractal Tree and LSM-tree (https://en.wikipedia.org/wiki/Log-structured_merge-tree), propose structures which provide better performance for Insert operations (often at the expense of Lookup/Select operations, which may become slower).

To get familiar with the LSM-tree (this is the structure used by RocksDB) I recommend http://www.benstopford.com/2015/02/14/log-structured-merge-trees/. As for the Fractal Tree, I am going to cover the details in upcoming posts.

The post Fractal Tree library as a Key-Value store appeared first on MySQL Performance Blog.

by Vadim Tkachenko at July 20, 2015 01:50 PM

MariaDB AB

Five things you must know about parallel replication in MariaDB 10.x

guillaumelefranc

When MariaDB 10.0 was launched as GA in 2014 it introduced a major feature: Parallel Replication. Parallel Replication is a fantastic addition to the long list of new MariaDB features, along with Global Transaction IDs. However, I don't think many people understand this feature very well or know how to leverage its potential to the fullest extent. This blog post will explain a few of the gotchas you may run into while setting up Parallel Replication.

1. Single-thread replication can be slower in MariaDB 10.x than in MariaDB 5.5

The addition of new features in a new branch often comes at a cost. If you have a huge replication workload, you may find that MariaDB 10.x replication is somewhat slower than its counterpart in MariaDB 5.5. Thankfully, that can be counterbalanced by setting up Parallel Replication, and the benefits will be much bigger than the ones you'd get by sticking with 5.5.

2. Parallel replication is not enabled by default.

Newsflash: you have to enable it, for example in your my.cnf (or with SET GLOBAL; the variable is dynamic).

slave_parallel_threads = 8
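
Or, since the variable is dynamic, something along these lines (a sketch; in MariaDB the slave must be stopped while changing it):

STOP SLAVE;
SET GLOBAL slave_parallel_threads = 8;
START SLAVE;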

I've seen customers setting it to unnecessarily large values, like 24 or 32. Don't do that (unless you have 64 CPU cores). You might introduce more contention, and the potential for parallel replication is not that big, which leads us to our third bullet point.

3. There needs to be a potential for parallel replication.

I won't go in depth into the specifics of parallel replication (I'll point to a couple of resources at the end of this blog post). Basically, you must have group commit happening on the master; in other words, write events which are committed as a group. To check if you have group commit happening, look at the two following status variables on the master:

MariaDB(db-01)[(none)]> show global status like 'binlog_%commits';
+----------------------+------------+
| Variable_name        | Value      |
+----------------------+------------+
| Binlog_commits       | 3790021298 |
| Binlog_group_commits | 3000740090 |
+----------------------+------------+

The bigger the difference between these two values, the more transactions are being grouped per group commit. For example, above I have 1 group commit happening per 1.26 transactions; not fantastic, but still OK.
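
If you prefer a single query, a hedged sketch of the same calculation (the 1.26 figure above is simply Binlog_commits divided by Binlog_group_commits):

SELECT ROUND(c.VARIABLE_VALUE / g.VARIABLE_VALUE, 2) AS trx_per_group_commit
  FROM information_schema.GLOBAL_STATUS c
  JOIN information_schema.GLOBAL_STATUS g
    ON c.VARIABLE_NAME = 'BINLOG_COMMITS'
   AND g.VARIABLE_NAME = 'BINLOG_GROUP_COMMITS';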

4. You can make parallel replication faster by making commits slower

To increase the group commit potential, you might actually need to make commits slower on the master. Don't close this blog post yet and call me crazy. We can actually introduce a very small amount of latency on write transactions by using two configuration variables: binlog_commit_wait_count and binlog_commit_wait_usec.

The first variable, binlog_commit_wait_count, will hold back transactions until the given number of them are ready to commit together. You can control this delay with the second variable, which is an interval in microseconds representing the maximum time those transactions will wait. So with those two variables, you can fine-tune your group commit very precisely while taking a negligible performance hit on write commit time.

Of course you have to start with sensible values and increase/decrease them as needed. Typical values for testing are binlog_commit_wait_count=10 and binlog_commit_wait_usec=5000. This means that up to 10 transactions will wait a maximum of 5ms to commit together. As you can guess, a 5ms delay is not an issue for most real-world applications.
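
Both variables are dynamic, so a sketch of trying those values on a running master (and disabling the wait again by setting the count back to 0) could look like:

SET GLOBAL binlog_commit_wait_count = 10;
SET GLOBAL binlog_commit_wait_usec  = 5000;   -- 5ms maximum wait
-- if the impact on commit latency is too high, disable the wait again:
-- SET GLOBAL binlog_commit_wait_count = 0;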

5. You can make things even better with GTID and Domain Identifiers

Parallel Replication works without GTID, but turning on GTID has major advantages. Among other nice features like crash-safe replication and easy topology management, you can use multiple replication streams, each identified by a different Domain ID. Imagine you have two totally different applications on your server; you just have to implement the following in your code:

Application ACME -> SET gtid_domain_id=1
Application BETA -> SET gtid_domain_id=2

By using different Domain Identifiers, you make sure those transactions are always allowed to execute out of order with respect to each other, making use of two different replication threads on the slaves.
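
In SQL terms, this can be as simple as each application issuing its own domain right after connecting (the domain numbers here are arbitrary examples):

-- connection opened by application ACME:
SET SESSION gtid_domain_id = 1;
-- connection opened by application BETA:
SET SESSION gtid_domain_id = 2;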

Bibliography

Try MariaDB 10

About the Author

guillaumelefranc's picture

Guillaume Lefranc is managing the MariaDB Remote DBA Services Team, delivering performance tuning and high availability services worldwide. He's a believer in DevOps culture, Agile software development, and Craft Brewing.

by guillaumelefranc at July 20, 2015 12:09 PM

Daniël van Eeden

Inserting large rows in MySQL and MariaDB

As the maximum storage size for a LONGBLOB in MySQL is 4GB and the maximum max_allowed_packet size is 1GB, I was wondering how it is possible to use the full LONGBLOB.

So I started testing this. I wrote a Python script with MySQL Connector/Python and used MySQL Sandbox to bring up an instance of MySQL 5.6.25.

One of the first settings I had to change was max_allowed_packet, which was expected. I set it to 1GB.

The next setting was less expected, it was innodb_log_file_size. The server enforces that the transaction has to fit in 10% of the InnoDB log files. So I had to set it to 2 files of 5G to be able to insert one record of (almost) 1GB.
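
A minimal my.cnf sketch of the two settings described above (sizes chosen for this particular almost-1GB-row experiment, not as general advice):

[mysqld]
max_allowed_packet        = 1G
innodb_log_file_size      = 5G   # with 2 log files, a transaction must fit in 10% of 2 x 5G = 1G
innodb_log_files_in_group = 2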

So that worked for a row of a bit less than 1GB; this is because there is some overhead in the packet and the total has to fit in 1GB.

For the next step (>1GB) I switched from Python to C so I could use mysql_stmt_send_long_data() which allows you to upload data in multiple chunks.

I expected that to work, but it didn't. This is because in MySQL 5.6 and up max_long_data_size was replaced by max_allowed_packet. But max_allowed_packet can only be set to a maximum of 1GB, while max_long_data_size could be set to 4GB.

So I switched from MySQL 5.6 to MariaDB 10.1, because MariaDB still has max_long_data_size. That worked, now I could upload rows of up to (almost) 4GB.

I also noticed InnoDB will complain in the error log about large tuples:
InnoDB: Warning: tuple size very big: 1100000025

So you can insert CD ISO images in your database. For small DVD images this could work if your connector uses the COM_STMT_SEND_LONG_DATA command.

But it's best to avoid this and keep the size of rows smaller.

The scripts I used for my tests (and references to the bugs I found):
https://github.com/dveeden/mysql_supersize

by Daniël van Eeden (noreply@blogger.com) at July 20, 2015 05:47 AM

July 17, 2015

MariaDB AB

Greater Developer Automation and Efficiency with MariaDB Enterprise Summer 2015

diptijoshi

In the last two releases of MariaDB Enterprise, we have provided enhanced performance with the introduction of certified MariaDB binaries for POWER8 and optimized binaries for the x86 platform. This Summer we make it more efficient and automated for developers and DBAs to use our high performance MariaDB binaries.

Ease of Use

MariaDB Enterprise now comes with Docker images as well as Chef recipes and cookbooks, so developers can easily deploy and run their database applications. Using Docker images from the MariaDB Enterprise repository,

  • you can run multiple instances of MariaDB Enterprise Server with different configurations on the same physical server, and
  • easily move your MariaDB Enterprise Server instance from one machine to another.

Greater Automation

MariaDB MaxScale™, an additional part of the MariaDB Enterprise subscription, allows DBAs to launch external scripts when MaxScale detects a change in the state of any backend server. This applies to MariaDB master-slave replication cluster, as well as MariaDB Enterprise Galera Cluster. DBAs can now automatically trigger failover by promoting a slave to master when the master fails, or can simply opt to be notified of a master or slave failure to take appropriate corrective action. You can also launch scripts with separate behaviour when a database server node is down, versus the database server being unreachable. In MariaDB Enterprise Galera Cluster, in addition to database nodes going up and down, you can also launch external scripts when the sync status of a Galera node changes.

Learn more about MariaDB Enterprise and MariaDB MaxScale, or simply download them to experience the efficiency and automation in this new release.

About the Author

diptijoshi's picture

Dipti Joshi is Sr Product Manager at MariaDB, Corp.

by diptijoshi at July 17, 2015 07:47 AM

July 16, 2015

Peter Zaitsev

Bypassing SST in Percona XtraDB Cluster with binary logs

In my previous post, I used incremental backups in Percona XtraBackup as a method for rebuilding a Percona XtraDB Cluster (PXC) node without triggering an actual SST. Practically this reproduces the SST steps, but it can be handy if you already had backups available to use.

In this post, I want to present another methodology for this that also uses a full backup, but instead of incrementals uses any binary logs that the cluster may be producing.

Binary logs on PXC

Binary logs are not strictly needed in PXC for replication, but you may be using them for backups or for asynchronous slaves of the cluster.  To set them up properly, we need the following settings added to our config:

server-id=1
log-bin
log-slave-updates

As I stated, none of these are strictly needed for PXC.

  • server-id=1 — We recommend PXC nodes share the same server-id.
  • log-bin — actually enable the binary log
  • log-slave-updates — log ALL updates to the cluster to this server’s binary log

This doesn’t need to be set on every node, but likely you would set these on at least two nodes in the cluster for redundancy.

Note that this strategy should work with or without 5.6 asynchronous GTIDs.

Recovering data with backups and binary logs

This methodology is conventional point-in-time backup recovery for MySQL.  We have a full backup that was taken at a specific binary log position:

... backup created in the past...
# innobackupex --no-timestamp /backups/full
# cat /backups/full/xtrabackup_binlog_info
node3-bin.000002	735622700

We have this binary log and all binary logs since:

-rw-r-----. 1 root root 1.1G Jul 14 18:53 node3-bin.000002
-rw-r-----. 1 root root 1.1G Jul 14 18:53 node3-bin.000003
-rw-r-----. 1 root root 321M Jul 14 18:53 node3-bin.000004

Recover the full backup

We start by preparing the backup with –apply-log:

# innobackupex --apply-log --use-memory=1G /backups/full
...
xtrabackup: Recovered WSREP position: 1663c027-2a29-11e5-85da-aa5ca45f600f:60072936
...
InnoDB: Last MySQL binlog file position 0 735622700, file name node3-bin.000002
...
# innobackupex --copy-back /backups/full
# chown -R mysql.mysql /var/lib/mysql

The output confirms the same binary log file and position that we knew from before.

Start MySQL without Galera

We need to start mysql, but without Galera so we can apply the binary log changes before trying to join the cluster. We can do this simply by commenting out all the wsrep settings in the MySQL config.

# grep wsrep /etc/my.cnf
#wsrep_cluster_address           = gcomm://pxc.service.consul
#wsrep_cluster_name              = mycluster
#wsrep_node_name                 = node3
#wsrep_node_address              = 10.145.50.189
#wsrep_provider                  = /usr/lib64/libgalera_smm.so
#wsrep_provider_options          = "gcache.size=8G; gcs.fc_limit=1024"
#wsrep_slave_threads             = 4
#wsrep_sst_method                = xtrabackup-v2
#wsrep_sst_auth                  = sst:secret
# systemctl start mysql

Apply the binary logs

We now check our binary log starting position:

# mysqlbinlog -j 735622700 node3-bin.000002 | grep Xid | head -n 1
#150714 18:38:36 server id 1  end_log_pos 735623273 CRC32 0x8426c6bc 	Xid = 60072937

We can compare the Xid at this binary log position to that of the backup. The Xid in a binary log produced by PXC will be the seqno of the GTID of that transaction. The starting position in the binary log shows us the next Xid is one increment higher, so this makes sense: we can start at this position in the binary log and apply all changes as far forward as we can to bring the datadir up to a more current position.

# mysqlbinlog -j 735622700 node3-bin.000002 | mysql
# mysqlbinlog node3-bin.000003 | mysql
# mysqlbinlog node3-bin.000004 | mysql

This action isn't particularly fast, as the binlog events are applied by a single connection thread. Remember that if the cluster is taking writes while this is happening, the amount of time you have is limited by the size of the gcache and the rate at which it is filling up.

Prime the grastate

Once the binary logs are applied, we can check the final log’s last position to get the seqno we need:

[root@node3 backups]# mysqlbinlog node3-bin.000004 | tail -n 500
...
#150714 18:52:52 server id 1  end_log_pos 335782932 CRC32 0xb983e3b3 	Xid = 63105191
...

This is indeed the seqno we need to put in our grastate.dat. As in the last post, we can copy a grastate.dat from another node to get the proper format. However, this time we must put the proper seqno into place:

# cat grastate.dat
# GALERA saved state
version: 2.1
uuid:    1663c027-2a29-11e5-85da-aa5ca45f600f
seqno:   63105191
cert_index:

Be sure the grastate.dat has the proper permissions, uncomment the wsrep settings and restart mysql on the node:

# chown mysql.mysql /var/lib/mysql/grastate.dat
# grep wsrep /etc/my.cnf
wsrep_cluster_address           = gcomm://pxc.service.consul
wsrep_cluster_name              = mycluster
wsrep_node_name                 = node3
wsrep_node_address              = 10.145.50.189
wsrep_provider                  = /usr/lib64/libgalera_smm.so
wsrep_provider_options          = "gcache.size=8G; gcs.fc_limit=1024"
wsrep_slave_threads             = 4
wsrep_sst_method                = xtrabackup-v2
wsrep_sst_auth                  = sst:secret
# systemctl restart mysql

The node should now attempt to join the cluster with the proper GTID:

2015-07-14 19:28:50 4234 [Note] WSREP: Found saved state: 1663c027-2a29-11e5-85da-aa5ca45f600f:63105191

This, of course, still does not guarantee an IST. See my previous post for more details on the conditions needed for that to happen.

The post Bypassing SST in Percona XtraDB Cluster with binary logs appeared first on MySQL Performance Blog.

by Jay Janssen at July 16, 2015 10:00 AM

Bypassing SST in Percona XtraDB Cluster with incremental backups

Beware the SST

In Percona XtraDB Cluster (PXC) I often run across users who are fearful of SSTs on their clusters. I've always maintained that if you can't cope with an SST, PXC may not be right for you, but that doesn't change the fact that SSTs involving multiple terabytes of data can be quite costly.

SST, by current definition, is a full backup from a Donor to a Joiner. The most popular method is Percona XtraBackup, so we're talking about a donor node that must:

  1. Run a full XtraBackup that reads its entire datadir
  2. Keep up with Galera replication to it as much as possible (though laggy donors don’t send flow control)
  3. Possibly still be serving application traffic if you don’t remove Donors from rotation.

So, I’ve been interested in alternative ways to work around state transfers and I want to present one way I’ve found that may be useful to someone out there.

Percona XtraBackup and Incrementals

It is possible to use Percona XtraBackup full and incremental backups to build a datadir that might be able to IST. First we'll focus on the mechanics of the backups, preparing them and getting the Galera GTID, and then later discuss when it may be viable for IST.

Suppose I have a fairly recent full XtraBackup backup and one or more incremental backups that I can apply on top of it to get VERY close to realtime on my cluster (more on that 'VERY' later).

# innobackupex --no-timestamp /backups/full
... sometime later ...
# innobackupex --incremental /backups/inc1 --no-timestamp --incremental-basedir /backups/full
... sometime later ...
# innobackupex --incremental /backups/inc2 --no-timestamp --incremental-basedir /backups/inc1

In my proof of concept test, I now have a full and two incrementals:

# du -shc /backups/*
909M	full
665M	inc1
812M	inc2
2.4G	total

To recover this data, I follow the normal Xtrabackup incremental apply process:

# cp -av /backups/full /backups/restore
# innobackupex --apply-log --redo-only --use-memory=1G /backups/restore
...
xtrabackup: Recovered WSREP position: 1663c027-2a29-11e5-85da-aa5ca45f600f:35694784
...
# innobackupex --apply-log --redo-only /backups/restore --incremental-dir /backups/inc1 --use-memory=1G
# innobackupex --apply-log --redo-only /backups/restore --incremental-dir /backups/inc2 --use-memory=1G
...
xtrabackup: Recovered WSREP position: 1663c027-2a29-11e5-85da-aa5ca45f600f:46469942
...
# innobackupex --apply-log /backups/restore --use-memory=1G

I can see that as I roll forward through my incrementals, I get a higher and higher GTID. Galera's GTID is stored in the InnoDB recovery information, so XtraBackup extracts it after every batch it applies to the datadir we're restoring.

We now have a datadir that is ready to go; we need to copy it into the datadir of our joiner node and set up a grastate.dat. Without a grastate, starting the node would force an SST no matter what.

# innobackupex --copy-back /backups/restore
# ... copy a grastate.dat from another running node ...
# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    1663c027-2a29-11e5-85da-aa5ca45f600f
seqno:   -1
cert_index:
# chown -R mysql.mysql /var/lib/mysql/

If I start the node now, it should see the grastate.dat with the -1 seqno and run --wsrep_recover to extract the GTID from InnoDB (I could also have just put that GTID directly into my grastate.dat).

This will allow the node to startup from merged Xtrabackup incrementals with a known Galera GTID.

But will it IST?

That's the question. IST happens when the selected donor still has, inside its gcache, all the transactions the joiner needs to get fully caught up. There are several implications of this:

  • A gcache is mmap allocated and does not persist across restarts on the donor.  A restart essentially purges the mmap.
  • You can query the oldest GTID seqno on a donor by checking the status variable 'wsrep_local_cached_downto' (see the example after this list). This variable is not available on 5.5, so there you are forced to guess whether you can IST or not.
  • Most PXC 5.6 versions will auto-select a donor based on IST candidacy. Prior to that (i.e., 5.5), donor selection was not based on IST candidacy at all, meaning you had to be much more careful and do donor selection manually.
  • There’s no direct mapping from the earliest GTID in a gcache to a specific time, so knowing at a glance if a given incremental will be enough to IST is difficult.
  • It’s also difficult to know how big to make your gcache (set in MB/GB/etc.)  with respect to your backups (which are scheduled by the day/hour/etc.)
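
On PXC 5.6 / Galera 3.x, checking a candidate donor's gcache coverage mentioned above is a single statement (a sketch; compare the returned seqno against the joiner's seqno in its grastate.dat):

SHOW GLOBAL STATUS LIKE 'wsrep_local_cached_downto';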

All that being said, we're still talking about backups here. The above method will work if and only if:

  • You do frequent incremental backups
  • You have a large gcache (hopefully more on this in a future blog post)
  • You can restore a backup faster than it takes for your gcache to overflow

The post Bypassing SST in Percona XtraDB Cluster with incremental backups appeared first on MySQL Performance Blog.

by Jay Janssen at July 16, 2015 09:00 AM

July 15, 2015

Jean-Jerome Schmidt

How to Avoid SST when adding a new node to Galera Cluster for MySQL or MariaDB

State Snapshot Transfer (SST) is a way for Galera to transfer a full data copy from an existing node (donor) to a new node (joiner). If you come from a MySQL replication background, it is similar to taking a backup of a master and restoring on a slave. In Galera Cluster, the process is automated and is triggered depending on the joiner state.

SST can be painful on some occasions, as it can block the donor node (with SST methods like mysqldump or rsync) and burden it when backing up the data and feeding it to the joiner. For a dataset of a few hundred gigabytes or more, the syncing process can take hours to complete, even if you have a fast network. It might be advisable to avoid SST, e.g., when running in WAN environments with slower connections and limited bandwidth, or if you just want a very fast way of introducing a new node into your cluster.

In this blog post, we’ll show you how to avoid SST. 

SST Methods

Through the variable wsrep_sst_method, it is possible to set the following methods:

  • mysqldump
  • rsync
  • xtrabackup/xtrabackup-v2

xtrabackup is non-blocking for the donor. Use xtrabackup-v2 (and not xtrabackup) if you are running on MySQL version 5.5.54 and later. 

Incremental State Transfer (IST)

To avoid SST, we’ll make use of IST and gcache. IST is a method to prepare a joiner by sending only the missing writesets available in the donor’s gcache. gcache is a file where a Galera node keeps a copy of writesets. IST is faster than SST; it is non-blocking and has no significant performance impact on the donor. It should be the preferred option whenever possible.

IST can only be achieved if all changes missed by the joiner are still in the gcache file of the donor. You will see the following in the donor’s MySQL error log:

WSREP: async IST sender starting to serve tcp://10.0.0.124:4568 sending 689768-761291

And on the joiner side:

150707 17:15:53 [Note] WSREP: Signalling provider to continue.
150707 17:15:53 [Note] WSREP: SST received: d38587ce-246c-11e5-bcce-6bbd0831cc0f:689767
150707 17:15:53 [Note] WSREP: Receiving IST: 71524 writesets, seqnos 689767-761291

 

Determining a good gcache size

Galera uses a pre-allocated gcache file of a specific size to store writesets in circular buffer style. By default, its size is 128MB. We have covered this in detail here. It is important to determine the right gcache size, as it can influence the data synchronization performance among Galera nodes.

The below gives an idea of the amount of data replicated by Galera. Run the following statements on one of the Galera nodes during peak hours (this works on MariaDB 10 and PXC 5.6, Galera 3.x):

mysql> set @start := (select sum(VARIABLE_VALUE/1024/1024) from information_schema.global_status where VARIABLE_NAME like 'WSREP%bytes');
mysql> do sleep(60);
mysql> set @end := (select sum(VARIABLE_VALUE/1024/1024) from information_schema.global_status where VARIABLE_NAME like 'WSREP%bytes');
mysql> set @gcache := (select SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(variable_value,';',29),';',-1),'=',-1),'M',1) from information_schema.global_variables where variable_name like 'wsrep_provider_options');
mysql> select round((@end - @start),2) as `MB/min`, round((@end - @start),2) * 60 as `MB/hour`, @gcache as `gcache Size(MB)`, round(@gcache/round((@end - @start),2),2) as `Time to full(minutes)`;

+--------+---------+-----------------+-----------------------+
| MB/min | MB/hour | gcache Size(MB) | Time to full(minutes) |
+--------+---------+-----------------+-----------------------+
|   7.95 |  477.00 |  128            |                 16.10 |
+--------+---------+-----------------+-----------------------+

We can tell that the Galera node can have approximately 16 minutes of downtime without requiring SST to join (unless Galera cannot determine the joiner state). If this is too short a time and you have enough disk space on your nodes, you can change wsrep_provider_options="gcache.size=<value>" to an appropriate value. In this example, setting gcache.size=1G allows us to have 2 hours of node downtime with a high probability of IST when the node rejoins.

Avoiding SST on New Node

Sometimes, SST is unavoidable. This can happen when Galera fails to determine the joiner state when a node is joining. The state is stored inside grastate.dat. Should any of the following scenarios happen, SST will be triggered:

  • grastate.dat does not exist under the MySQL data directory - it could be a new node with a clean data directory, or e.g., the DBA manually deleted the file and intentionally forces Galera to perform SST
  • The grastate.dat file has no seqno or group ID - for example, the node crashed during DDL.
  • The seqno inside grastate.dat shows -1 while the MySQL server is still down, which means unclean shutdown or MySQL crashed/aborted due to database inconsistency. (Thanks to Jay Janssen from Percona for pointing this out)
  • grastate.dat is unreadable, due to lack of permissions or a corrupted file system.

To avoid SST, we would get a full backup from one of the available nodes, restore the backup on the new node and create a Galera state file so Galera can determine the node’s state and skip SST. 

In the following example, we have two Galera nodes with a garbd. We are going to convert the garbd node to a MySQL Galera node (Db3). We have a full daily backup created using xtrabackup. We’ll create an incremental backup to get as close to the latest data before IST.

1. Install MySQL server for Galera on Db3. We will not cover the installation steps here. Please use the same Galera vendor as for the existing Galera nodes (Codership, Percona or MariaDB). If you are running ClusterControl, you can just use the ‘Add Node’ function and disable the cluster/node auto recovery beforehand (as we don’t want ClusterControl to automatically join the new node, which would trigger SST):

2. The full backup is stored on the ClusterControl node (refer to the diagram above). Firstly, copy and extract the full backup to Db3:

[root@db3]$ mkdir -p /restore/full
[root@db3]$ cd /restore/full
[root@db3]$ scp root@clustercontrol:/root/backups/mysql_backup/BACKUP-1/backup-full-2015-07-08_113938.xbstream.gz .
[root@db3]$ gunzip backup-full-2015-07-08_113938.xbstream.gz
[root@db3]$ xbstream -x < backup-full-2015-07-08_113938.xbstream

3. Increase the gcache size on all nodes to increase the chance of IST. Append the gcache.size parameter in wsrep_provider_options line in the MySQL configuration file:

wsrep_provider_options="gcache.size=1G"

Perform a rolling restart, one Galera node at a time (ClusterControl users can use Manage > Upgrades > Rolling Restart):

$ service mysql restart

4. Before creating an incremental backup on Db1 to get the latest data since the last full backup, we need to copy back the xtrabackup_checkpoints file from the extracted full backup on Db3. The incremental backup, when applied, will bring us closer to the current database state. Since we have a 2-hour buffer after increasing the gcache size, we should have enough time to restore the backup and create the necessary files to skip SST.

Create a base directory, which in our case will be /root/temp. Copy xtrabackup_checkpoints from Db3 into it:

[root@db1]$ mkdir -p /root/temp
[root@db1]$ scp root@db3:/restore/full/xtrabackup_checkpoints /root/temp/

5. Create a target directory for the incremental backup on Db3:

[root@db3]$ mkdir -p /restore/incremental

6. Now it is safe to create an incremental backup on Db1 based on information inside /root/temp/xtrabackup_checkpoints and stream it over to Db3 using SSH:

[root@db1]$ innobackupex --user=root --password=password --incremental --galera-info --incremental-basedir=/root/temp --stream=xbstream ./ 2>/dev/null | ssh root@db3 "xbstream -x -C /restore/incremental"

If you don’t have a full backup, you can generate one by running the following command on Db1 and stream it directly to Db3:

[root@db1]$ innobackupex --user=root --password=password --galera-info --stream=tar ./ | pigz | ssh root@db3 "tar xizvf - -C /restore/full"

7. Prepare the backup files:

[root@db3]$ innobackupex --apply-log --redo-only /restore/full
[root@db3]$ innobackupex --apply-log /restore/full --incremental-dir=/restore/incremental

Ensure you see the following line at the end of the output as an indication that the above steps succeeded:

150710 14:08:20  innobackupex: completed OK!

8. Build a Galera state file under the /restore/full directory based on the latest information from the incremental backup. You can get the information inside /restore/incremental/xtrabackup_galera_info:

[root@db3]$ cat /restore/full/xtrabackup_galera_info
d38587ce-246c-11e5-bcce-6bbd0831cc0f:1352215

Create a new file called grastate.dat under the full backup directory:

[root@db3]$ vim /restore/full/grastate.dat

And add the following lines (based on the xtrabackup_galera_info):

# GALERA saved state
version: 2.1
uuid:    d38587ce-246c-11e5-bcce-6bbd0831cc0f
seqno:   1352215
cert_index:

9. The backup is prepared and ready to be copied over to the MySQL data directory. Clear the existing path, copy the prepared data and assign correct ownership:

[root@db3]$ rm -Rf /var/lib/mysql/*
[root@db3]$ innobackupex --copy-back /restore/full
[root@db3]$ chown -Rf mysql.mysql /var/lib/mysql

10. If you installed garbd through ClusterControl, remove it by going to Manage > Load Balancer > Remove Garbd > Remove and skip the following command. Otherwise, stop garbd service on this node:

[root@db3]$ killall -9 garbd

11. The new node is now ready to join the rest of the cluster. Fire it up:

[root@db3]$ service mysql start

Voila, the new node should bypass SST and sync via IST. Monitor the output of MySQL error log and ensure you see something like below:

150710 14:49:58 [Note] WSREP: Signalling provider to continue.
150710 14:49:58 [Note] WSREP: SST received: d38587ce-246c-11e5-bcce-6bbd0831cc0f:1352215
150710 14:49:58 [Note] WSREP: Receiving IST: 4921 writesets, seqnos 1352215-1357136

 


by Severalnines at July 15, 2015 11:51 AM

Henrik Ingo

Speaking at CLS and Oscon and shouting at Portland Timbers next week

Vacation is almost over - and it's still 17 degrees outside :-( It's time to start packing for Portland.

I'm as excited as ever, since this year I'm delivering 2 talks. Both fall into the category which is a long time passion of mine - open source community and business. It's refreshing to not have to talk about databases for once :-)

CLS

The Community Leadership Summit is mostly an unconference, but in recent years has started adding short 15-minute pre-arranged talks. (Kind of like morning keynotes, even if they don't call them that.) On Sunday the 19th, I will be doing a talk called Open Source Governance Models Revisited.


by hingo at July 15, 2015 09:49 AM

July 13, 2015

Jean-Jerome Schmidt

s9s Tools and Resources: The 'Become a MySQL DBA' Series, ClusterControl 1.2.10, Advisors and More!

Check Out Our Latest Technical Resources for MySQL, MariaDB, Postgres and MongoDB

This blog is packed with all the latest resources and tools we’ve recently published! Please do check it out and let us know if you have any comments or feedback.

Live Technical Webinar

In this webinar, we will look at some of the most widely used HA alternatives in the MySQL world and discuss their pros and cons. Krzysztof is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. This webinar builds upon recent blog posts by Krzysztof on OS and database monitoring.

Register for the webinar

Product Announcements & Resources

ClusterControl 1.2.10 Release

We were pleased to announce a milestone release of ClusterControl in May, which includes several brand new features, making it a fully programmable DevOps platform to manage leading open source databases.

ClusterControl Developer Studio Release

With ClusterControl 1.2.10, we introduced our new, powerful ClusterControl DSL (Domain Specific Language), which allows you to extend the functionality of your ClusterControl platform by creating Advisors, Auto Tuners, or “mini Programs”. Check it out and start creating your own advisors! We’d love to hear your feedback!

Technical Webinar - Replay

We recently started a ‘Become a MySQL DBA’ blog and webinar series, which we’re extending throughout the summer. Here are the first details of that series:

Become a MySQL DBA - Deciding on a relevant backup solution

In this webinar, we discussed the multiple ways to take backups, which method best fits specific needs and how to implement point in time recovery.

Watch the replay and view the slides here

 

Technical Blogs

Here is a listing of our most recent technical blogs. Do check them out and let us know if you have any questions.

Become a MySQL DBA Blog Series

Further Technical Blogs:

We are hiring!

We’re looking for an enthusiastic frontend developer! If you know of anyone, who might be interested, please do let us know.

We trust these resources are useful. If you have any questions on them or on related topics, please do contact us!

Your Severalnines Team


by Severalnines at July 13, 2015 11:21 AM

July 09, 2015

Jean-Jerome Schmidt

Become a MySQL DBA - Webinar series: Which High Availability Solution?

There are many approaches to MySQL high availability - from traditional, loosely-coupled database setups based on asynchronous replication to more modern, tightly-coupled architectures based on synchronous replication. These offer varying degrees of protection, and DBAs almost always have to choose a trade-off between high availability and cost.

In this webinar, we will look at some of the most widely used HA alternatives in the MySQL world and discuss their pros and cons.

DATE & TIME

Europe/MEA/APAC
Tuesday, July 28th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)
Register Now

North America/LatAm
Tuesday, July 28th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

AGENDA

  • HA - what is it?
  • Caching layer
  • HA solutions
    • MySQL Replication
    • MySQL Cluster
    • Galera Cluster
    • Hybrid Replication
  • Proxy layer
    • HAProxy
    • MaxScale
    • Elastic Load Balancer (AWS)
  • Common issues
    • Split brain scenarios 
    • GTID-based failover and Errant Transactions

SPEAKER

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. This webinar builds upon recent blog posts and related webinar series by Krzysztof on how to become a MySQL DBA.

We look forward to “seeing” you there and to insightful discussions!


by Severalnines at July 09, 2015 03:28 PM

Colin Charles

#PerconaLive Amsterdam – schedule now out

The schedule is out for Percona Live Europe: Amsterdam (September 21-23 2015), and you can see it at: https://www.percona.com/live/europe-amsterdam-2015/program.

From MariaDB Corporation/Foundation, we have 1 tutorial: Best Practices for MySQL High Availability – Colin Charles (MariaDB)

And 5 talks:

  1. Using Docker for Fast and Easy Testing of MariaDB and MaxScale – Andrea Tosatto (Colt Engine s.r.l.) (I expect Maria Luisa is giving this talk together – she’s a wonderful colleague from Italy)
  2. Databases in the Hosted Cloud – Colin Charles (MariaDB)
  3. Database Encryption on MariaDB 10.1 – Jan Lindström (MariaDB Corporation), Sergei Golubchik (Monty Program Ab)
  4. Meet MariaDB 10.1 – Colin Charles (MariaDB), Monty Widenius (MariaDB Foundation)
  5. Anatomy of a Proxy Server: MaxScale Internals – Ivan Zoratti (ScaleDB Inc.)

OK, Ivan is from ScaleDB now, but he was the ex-CTO of SkySQL Ab, and one of the primary architects behind MaxScale! We may have more talks as there are some TBD holes to be filled, but the current schedule looks pretty amazing already.

What are you waiting for, register now!

by Colin Charles at July 09, 2015 02:39 AM

July 08, 2015

Jean-Jerome Schmidt

Webinar Replay & Slides: Become a MySQL DBA - Deciding on a relevant backup solution

Thanks to everyone who joined us last week for this live session on backup strategies for MySQL and Galera clusters led by Krzysztof Książek, Senior Support Engineer at Severalnines. The replay and slides to the webinar are now available to watch and read online via the links below.

Watch the replay

 
Read the slides

Backup and restore is one of the most important aspects of database administration. If a database crashed and there was no way to recover it, any resulting data loss might lead to devastating results to a business. As the DBA operating a MySQL or Galera cluster in production, you need to ensure your backups are scheduled, executed and regularly tested.

In this webinar, we discussed the multiple ways to take backups, which method best fits specific needs and how to implement point in time recovery.

AGENDA

  • Logical and Physical Backup methods
    • Tools
    • mysqldump
    • mydumper
    • xtrabackup
    • snapshots
  • How backups are done in ClusterControl
  • Best practices
  • Example Setups
    • On premises / private datacenter
    • Amazon Web Services

SPEAKER

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. This webinar builds upon recent blog posts and related webinar series by Krzysztof on how to become a MySQL DBA.



by Severalnines at July 08, 2015 01:38 PM

July 07, 2015

Jean-Jerome Schmidt

Become a MySQL DBA blog series - Common operations - Replication Topology Changes

MySQL replication has been available for years, and even though a number of new clustering technologies showed up recently, replication is still very common among MySQL users. It is understandable as replication is a reliable way of moving your data between MySQL instances. Even if you use Galera or NDB cluster, you still may have to rely on MySQL replication to distribute your databases across WAN.

This is the fifth installment in the ‘Become a MySQL DBA’ blog series, and discusses one of the most common operations a DBA has to handle - replication topology changes and planned failover. Our previous posts in the DBA series include Schema Changes, High Availability, Backup & Restore, Monitoring & Trending.

Replication topology changes

In a previous blog post, we discussed the schema upgrade process; one of the ways to execute it is to perform a rolling upgrade - an operation that requires changes in replication topology. We’ll now see how this process is actually performed, and what you should keep an eye on. The whole thing is really not complex - what you need to do is to pick a slave that will become the master later on, reslave the remaining slaves off it, and then fail over. Let’s get into the details.

Topology changes using GTID

First things first, the whole process depends on whether you use Global Transaction ID or regular replication. If you use GTID, you are in much better position as GTID allows you to move a host into any position in the replication chain. There’s no need for preparations, you just move the slaves around using:

STOP SLAVE;
CHANGE MASTER TO master_host='host', master_user='user', master_password='password', master_auto_position=1;
START SLAVE;

We’ll describe the failover process later in more detail, but what needs to be said now is that, once you make a failover, you’ll end up with a master host that is out of sync. There are ways to avoid that (and we’ll cover them), but if you use GTID, the problem can be easily fixed - all you need to do is to slave the old master off any other host using the command above. The old master will connect, retrieve any missing transactions, and get back in sync.

Topology changes using standard replication

Without GTID, things are definitely more complex as you can’t rely on a slave being aware of the transactions that are missing. The most important rule to keep in mind is that you have to ensure your slaves are in a known position to each other before any topology change is performed. Consider the following example.

Let’s assume the following, rather typical, replication topology: one master and three slaves.

Let’s also assume that you are executing a rolling schema change and you’d like to promote  “DB2” to become the new master. At the end you’d like the replication topology to look like this:

What needs to be accomplished is to slave DB3 and DB4 off DB2 and then finally, after the failover, to slave DB1 off DB2. Let’s start with DB3 and DB4.

If you plan to slave DB3 and DB4 off DB2, you need to enable binlogs on that host and enable the log-slave-updates option. Otherwise it won’t record events from DB1 in its binary logs.

What’s required in the reslaving process is to have all of the involved nodes stopped at the same transaction relative to the master. There are a couple of ways to achieve that. One of them is to use START SLAVE UNTIL ... to stop them at a known position. Here is how you do that. We need to check SHOW MASTER STATUS on the master host (DB1 in our case):

mysql> show master status\G
*************************** 1. row ***************************
             File: mysql-bin.000119
         Position: 448148420
     Binlog_Do_DB:
 Binlog_Ignore_DB:

Then, on all involved hosts (DB2, DB3 and DB4), we need to stop replication and then start it again, this time using START SLAVE UNTIL and setting it to stop at the first event, two binary logs later.

You can find a position of the first event by running mysqlbinlog on one of the binary logs:

mysqlbinlog /mysqldata/mysql-bin.000112 | head -n 10
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150705 23:45:22 server id 153011  end_log_pos 120 CRC32 0xcc4ee3be     Start: binlog v 4, server v 5.6.24-72.2-log created 150705 23:45:22
BINLOG '
ksGZVQ+zVQIAdAAAAHgAAAAAAAQANS42LjI0LTcyLjItbG9nAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAb7j
Tsw=

In our case, the first event is at position 4, therefore we want to start the slaves in the following manner:

START SLAVE UNTIL master_log_file='mysql-bin.000121', master_log_pos=4;

All slaves should catch up and proceed with the replication. The next step is to stop them - you can do it by issuing FLUSH LOGS on the master two times. This will rotate binlogs and eventually open the mysql-bin.000121 file. All slaves should stop at the same position (4) of this file. In this way we managed to bring all of them to the same position.
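As a sketch of that step (file names follow the example above):

-- on the master: rotate the binary logs twice so mysql-bin.000121 becomes the current file
mysql> FLUSH LOGS;
mysql> FLUSH LOGS;
-- on each slave: confirm the SQL thread stopped where expected
mysql> SHOW SLAVE STATUS\G
-- Relay_Master_Log_File should read mysql-bin.000121 and Exec_Master_Log_Pos should be 4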

Once that’s done, the rest is simple - you need to ensure that the binlog position (checked using SHOW MASTER STATUS) doesn’t change on any of the nodes. It shouldn’t, as the replication is stopped. If it does change, something is issuing writes to the slaves, which is a very bad position to be in - you need to investigate before you can perform any further changes. If everything is ok, then all you need is to grab the current stable binary log coordinates of DB2 (the future master) and then execute CHANGE MASTER TO … on DB3 and DB4 using those positions, slaving those hosts off DB2. Once that’s done, you can commence replication on DB2. At this point you should have the following replication topology:
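A minimal sketch of that reslaving step (host name and credentials are hypothetical; the file and position are whatever SHOW MASTER STATUS reports on DB2):

-- on DB2 (the future master): note the stable coordinates
mysql> SHOW MASTER STATUS;
-- on DB3 and DB4: repoint replication to DB2
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO master_host='db2', master_user='repl_user', master_password='repl_password', master_log_file='mysql-bin.000034', master_log_pos=120;
mysql> START SLAVE;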

As we are talking about planned failover, we want to ensure that after it’s done, we can slave DB1 off DB2. For that, we need to confirm that the writes from DB2 (which will happen after the failover) will end up in DB1 as well. We can do this by setting up master-master replication between those two nodes. It’s a very simple process, as long as DB2 is not yet written to. If it is, then you might be in trouble - you’ll need to identify the source of these writes and remove it. One of the ways to check it is to convert binary logs to the plain text format using the mysqlbinlog utility and then look for the server ids. You should not see any ids other than that of the master.

If there are no writes hitting DB2, then you can go ahead and execute CHANGE MASTER TO … on DB1, pointing it to DB2 and using any recent coordinates - there’s no need to stop the replication as it is expected that no writes will be executed on DB1 while we fail over. Once you slave DB1 off DB2, you can monitor replication for any unexpected writes coming from DB2 by watching Exec_Master_Log_Pos in SHOW SLAVE STATUS. On DB1 it should be constant as there should be nothing to execute. At this time, we have the following replication topology ready for the failover:

Failover process

The failover process is tricky to describe as it is strongly tied to the application - your requirements and procedures may vary from what we describe here. Still, we think it’s a good idea to go over this process and point out some important bits that should be common to many applications.

Assuming that you have your environment in the state we described above (all slaves slaved off the master candidate and master in master-master replication with the master candidate), there’s not much else to do on the database side - you are well prepared. The rest of the process is all about ensuring that there are no violations of consistency during the failover. For that, the best way is to stop the application. Unfortunately, it is also the most expensive way.

When your application is down, you want to ensure that the database does not handle any writes and there are no new connections getting through. Writes can be verified by checking the SHOW MASTER STATUS output; connections by checking either the processlist or Com_* counters. In general, as long as there are no writes, you should be just fine - it is not that big of a problem if there is a forgotten connection that is executing SELECTs. It would be a problem if it executes DML from time to time.
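A simple sketch of that verification, run on the master (the exact counters you watch may vary):

mysql> SHOW MASTER STATUS;                    -- run twice, a few seconds apart; the position must not move
mysql> SHOW PROCESSLIST;                      -- look for connections that could still issue DML
mysql> SHOW GLOBAL STATUS LIKE 'Com_insert';  -- Com_update and Com_delete can be checked the same way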

Once you have verified that no DML hits the database, you need to repoint your application to the new master. In our example, that would be DB2. Again, everything depends on how exactly you have your environment set up. If you have a proxy layer, you may need to implement some changes there. You should strive to automate this process, though, using scripts that detect whether a node is a master or a slave. This speeds things up and results in fewer mistakes. A common practice is to use the read_only setting to differentiate the master from slaves. Proxies can then detect whether the node is a master or not, and route traffic accordingly. If you use this method, then as soon as you confirm no writes are coming, you can just set read_only=1 on DB1 and then set read_only=0 on DB2 - it should be enough to repoint the proxy to the correct host.
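Assuming the proxy layer routes writes based on the read_only flag, the switch itself can be as simple as:

-- on DB1 (the old master): stop accepting writes
mysql> SET GLOBAL read_only = 1;
-- on DB2 (the new master): start accepting writes
mysql> SET GLOBAL read_only = 0;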

No matter how you repoint the app (by changing the read_only setting, proxy or app configuration, or a DNS entry), once you’re done with it, you should test whether the application works correctly after the change. In general, it’s great to have a “test-only” mode for an app - an option to keep it offline from public access but allow you to do some testing and QA before going live after such a significant change. During those tests you may want to keep an eye on the old master (DB1 in our case). It should not take any writes, yet it is not uncommon to see that some forgotten piece of code is hardcoded to connect directly to a given database, and that will cause problems. If you have DB1 and DB2 in master-master replication, you should be good in terms of data consistency. If not, this is something you need to fix before going live again.

Finally, once you verified you are all good, you can go back live and monitor the system for a while. Again, you want to keep an eye on the old master to ensure there are no writes hitting it.

As you can imagine, replication topology changes and failover processes are common operations, albeit complex. In future posts, we will discuss rolling MySQL upgrades and migrations between different environments. As we mentioned earlier, even if you use Galera or NDB Cluster you may need to use replication to connect different datacenters or providers over a WAN,  and eventually perform the standard planned failover process that we described above. 


by Severalnines at July 07, 2015 11:05 AM

July 03, 2015

Stephane Varoqui

Social Networking Using OQGraph

I was given the chance to experiment with a typical social networking query on an existing 60 million edge dataset.

How You're Connected


Such algorithms and others are simply hardcoded into OQGraph.

With the upgrade of OQGraph v3 into MariaDB 10, we can proceed directly on top of the existing tables holding the edges - a kind of virtual VIEW feature.



CREATE OR REPLACE TABLE `relations` (
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `id1` int(10) unsigned NOT NULL,
  `id2` int(10) unsigned NOT NULL,
  `relation_type` tinyint(3) unsigned DEFAULT NULL,
  KEY `id1` (`id1`),
  KEY `id2` (`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

oqgraph=# select count(*) from relations;

+----------+
| count(*) |
+----------+
| 59479722 |
+----------+
1 row in set (23.05 sec)

Very nice integration of table discovery that saves me from referring to the documentation to find out all the column definitions.

CREATE TABLE `oq_graph`
ENGINE=OQGRAPH `data_table`='relations' `origid`='id1' `destid`='id2';

oqgraph=# SELECT * FROM oq_graph WHERE latch='breadth_first' AND origid=175135 AND destid=7;
+---------------+--------+--------+--------+------+--------+
| latch         | origid | destid | weight | seq  | linkid |
+---------------+--------+--------+--------+------+--------+
| breadth_first | 175135 |      7 |   NULL |    0 | 175135 |
| breadth_first | 175135 |      7 |      1 |    1 |      7 |
+---------------+--------+--------+--------+------+--------+
2 rows in set (0.00 sec)


oqgraph=# SELECT * FROM oq_graph WHERE latch='breadth_first' AND origid=175135 AND destid=5615775;
+---------------+--------+---------+--------+------+----------+
| latch         | origid | destid  | weight | seq  | linkid   |
+---------------+--------+---------+--------+------+----------+
| breadth_first | 175135 | 5615775 |   NULL |    0 |   175135 |
| breadth_first | 175135 | 5615775 |      1 |    1 |        7 |
| breadth_first | 175135 | 5615775 |      1 |    2 | 13553091 |
| breadth_first | 175135 | 5615775 |      1 |    3 |  1440976 |
| breadth_first | 175135 | 5615775 |      1 |    4 |  5615775 |
+---------------+--------+---------+--------+------+----------+
5 rows in set (0.44 sec)

What we first highlight is that the underlying table indexes KEY `id1` (`id1`) and KEY `id2` (`id2`) are used by OQGraph to navigate the vertices via a number of key reads and range scans; such a 5-level relation took around 2,689 jumps and 77,526 range accesses to the table.

Meaning the breadth of the graph was around 2,500, with an average of 30 edges per vertex.

# MyISAM

oqgraph=# SELECT * FROM oq_graph_myisam WHERE latch='breadth_first' AND origid=175135 AND destid=5615775;
+---------------+--------+---------+--------+------+----------+
| latch         | origid | destid  | weight | seq  | linkid   |
+---------------+--------+---------+--------+------+----------+
| breadth_first | 175135 | 5615775 |   NULL |    0 |   175135 |
| breadth_first | 175135 | 5615775 |      1 |    1 |        7 |
| breadth_first | 175135 | 5615775 |      1 |    2 | 13553091 |
| breadth_first | 175135 | 5615775 |      1 |    3 |  1440976 |
| breadth_first | 175135 | 5615775 |      1 |    4 |  5615775 |
+---------------+--------+---------+--------+------+----------+
5 rows in set (0.11 sec)

I need to investigate this speed difference with MyISAM further. Ideas are welcome!

by Stephane Varoqui (noreply@blogger.com) at July 03, 2015 11:05 PM

Slave Election is welcoming GTID

Slave election is a popular HA architecture; the first MySQL/MariaDB toolkit to manage switchover and failover in a correct way was introduced by Yoshinori Matsunobu in MHA.

Failover and switchover in asynchronous clusters require caution:

- The CAP theorem needs to be satisfied. Getting strong consistency requires the slave election to reject transactions that ended up on the old master when electing the candidate master.

- Slave election needs to take care that all events from the old master are applied to the candidate master before switching roles.

- It should be instrumented to find a good candidate master and make sure it is set up to take the master role.

- It needs topology detection; a master role can't be predefined, as the role moves around nodes.

- It needs monitoring to escalate switchover to failover.

MHA was coded at a time when no unique event id was possible in a cluster; each event was tracked as an independent coordinate on each node, forcing the MHA architecture to have an internal way to rematch coordinates on all nodes.

With the introduction of GTID, MHA carries this heritage and looks unnecessarily complex, with an agent-based solution and the requirement of SSH connections to all nodes.

A lighter MHA was needed for MariaDB when replication uses GTID, and that's what my colleague Guillaume Lefranc has been addressing in a new MariaDB toolkit.

In MariaDB GTID usage is as simple as:

#>stop slave;change master to master_use_gtid=current_pos;start slave; 

As a bonus, the code is in golang and does not require any external dependencies.
We can enjoy a single command line procedure in interactive mode.

mariadb-repmgr -hosts=9.3.3.55:3306,9.3.3.56:3306,9.3.3.57:3306 -user=admin:xxxxx -rpluser=repl:xxxxxx -pre-failover-script="/root/pre-failover.sh" -post-failover-script="/root/post-failover.sh" -verbose -maxdelay 15    
Don't be afraid: the default is to run in interactive mode and it does not launch anything yet.


In my post-failover script I usually update some HAProxy configuration stored on a NAS or a SAN and reload (or shoot in the head) all proxies.

Note that the newly elected master will be passed as the second argument of the script.

I strongly advise not to attempt auto-failover based on some monitoring; get a good replication monitoring tool and analyze all master status alerts, checking for false positive situations before enjoying pre-coded failover.

Lossless semi-synchronous replication in MDEV-162 and multiple performance improvements of semi-synchronous replication in MDEV-7257 have made it to MariaDB 10.1; they can be used to greatly improve the chances of zero data loss in case of failure. Combined with parallel replication, it's now possible to have an HA architecture that is as robust as asynchronous replication can be, and, under replication delay control, is crash safe as well.

Galera, aka MariaDB Cluster, has a write speed limit bound by the network speed, but it comes with the advantage of always offering crash-safe consistency. Slave election HA is limited by the master's disk speed and does not suffer from lower network speed, but it loses consistency in failover when the slaves can't catch up.

Interesting times ahead to see how flash storage adoption favors one or the other architecture.

by Stephane Varoqui (noreply@blogger.com) at July 03, 2015 11:04 PM

Chris Calender

MariaDB 10.0.20 Overview and Highlights

MariaDB 10.0.20 was recently released, and is available for download here:

https://downloads.mariadb.org/mariadb/10.0.20/

This is the eleventh GA release of MariaDB 10.0, and 21st overall release of MariaDB 10.0.

There were no major functionality changes, but there was one security fix, 6 crashing bugs fixed, some general upstream fixes, and quite a few bug fixes, so let me cover the highlights:

  • Security Fix: Client command line option --ssl-verify-server-cert (and MYSQL_OPT_SSL_VERIFY_SERVER_CERT option of the client API) when used together with --ssl will ensure that the established connection is SSL-encrypted and the MariaDB server has a valid certificate. This fixes CVE-2015-3152.
  • Crashing Bug: mysql_upgrade crashes the server with REPAIR VIEW (MDEV-8115).
  • Crashing Bug: Server crashes in intern_plugin_lock on concurrent installing semisync plugin and setting rpl_semi_sync_master_enabled (MDEV-363).
  • Crashing Bug: Server crash on updates with joins still on 10.0.18 (MDEV-8114).
  • Crashing Bug: Too large scale in DECIMAL dynamic column getter crashes mysqld (MDEV-7505).
  • Crashing Bug: Server crashes in get_server_from_table_to_cache on empty name (MDEV-8224).
  • Crashing Bug: FreeBSD-specific bug that caused a segfault on FreeBSD 10.1 x86 (MDEV-7398).
  • XtraDB upgraded to 5.6.24-72.2
  • InnoDB updated to InnoDB-5.6.25
  • Performance Schema updated to 5.6.25
  • TokuDB upgraded to 7.5.7

Given the security fix, you may want to consider upgrading if that particular CVE is of concern to you. Also, please review the crashing bugs to see if they might affect you, and upgrade if so. Also, if running TokuDB, XtraDB, InnoDB, or Performance Schema, you may also want to benefit from those fixes, as well as the new MariaDB fixes (139 in all).

You can read more about the 10.0.20 release here:

https://mariadb.com/kb/en/mariadb-10020-release-notes/

And if interested, you can review the full list of changes in 10.0.20 (changelogs) here:

https://mariadb.com/kb/en/mariadb-10020-changelog/

Hope this helps.

by chris at July 03, 2015 03:18 PM

July 01, 2015

MariaDB AB

MariaDB with Galera available on the IBM Power8 platform

anderskarlsson

It has been a very long time since I wrote something in this blog, but I have been very busy this spring with MariaDB on Power. This has been a lot of work, but also a lot of fun. So, what is this MariaDB on Power thing all about? Well, I wrote an introduction to the Power platform late last year. Since then a lot of things have happened.

One thing is that several service providers out there have adopted Power8 as a platform. To be honest, this really isn't sexy, but it is useful, and as a user of one of these services, you will just see the same old Linux you are used to, but potentially it is more powerful and reliable. One such provider is OVH, whose service is better known as RunAbove. If you want to try it, you can do so for free for 7 days, just go there and off you go.

Another important thing is that MariaDB is now available on Power8 running RedHat, SUSE or Ubuntu Linux. To get access to this, pop by MariaDB and if you are not yet signed up, then do this now and then go to "My Portal", further to "Downloads" and then select "MariaDB Enterprise and MariaDB Enterprise Cluster". You are now ready to install using the operating system of your choice, but on Power you are, as I said before, limited to SUSE, RedHat and Ubuntu, and if you want to test MariaDB Enterprise Cluster, i.e. MariaDB with Galera, you have to go with Ubuntu.

Installing MariaDB Enterprise Cluster on Power8 is no more complex than on Intel. There are a few things to adjust before you can get started, after having installed the software. The first node has, as usual, to be configured with wsrep_cluster_address set to gcomm:// to ensure that this first node will bootstrap without having to connect to a cluster. Once the cluster is up and running though, this variable is set to the cluster addresses. In my case, this is what the Galera settings look like in /etc/mysql/my.cnf, which is the location of this file on Ubuntu.

# Galera
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="GaleraPower1"
wsrep_cluster_address=gcomm://92.127.22.124
wsrep_node_address=92.127.22.121
wsrep_node_name=galera3
binlog_format=ROW

Note in particular the binlog_format setting. This MUST be set to ROW for Galera to work. But the fact is that these settings are not particular to MariaDB on Power, this is the same even on Intel.

If this isn't enough to convince you about the advantages of running MariaDB on IBM Power, then see what Foedus in Italy has to say about the combination in this video.

There is more to say about running MariaDB on Power and there is more to come here: I'll look at some performance data, we'll have a look at MaxScale on Power (this is not official yet, but that isn't stopping me), as well as a blog on how to run a Power8 emulation on Intel, which I have promised before.

So, don't touch that dial!

/Karlsson

Originally posted on Karlsson on databases and stuff

About the Author


Anders Karlsson is a Sales Engineer with a long experience from the field of database software. Anders has worked for many of the major database software companies.

by anderskarlsson at July 01, 2015 12:54 PM

June 29, 2015

Chris Calender

MariaDB 5.5.44 Overview and Highlights

MariaDB 5.5.44 was recently released (it is the latest MariaDB 5.5), and is available for download here:

https://downloads.mariadb.org/mariadb/5.5.44/

This is a maintenance release with no major changes, so there are only several noteworthy items (among them a security fix and five potential crashing bugs):

  • Security Fix: Client command line option --ssl-verify-server-cert (and MYSQL_OPT_SSL_VERIFY_SERVER_CERT option of the client API) when used together with --ssl will ensure that the established connection is SSL-encrypted and the MariaDB server has a valid certificate. This fixes CVE-2015-3152.
  • Crashing Bug: mysql_upgrade crashes the server with REPAIR VIEW (MDEV-8115).
  • Crashing Bug: Server crashes in intern_plugin_lock on concurrent installing semisync plugin and setting rpl_semi_sync_master_enabled (MDEV-363).
  • Crashing Bug: Server crash on updates with joins still on 10.0.18 (MDEV-8114).
  • Crashing Bug: Too large scale in DECIMAL dynamic column getter crashes mysqld (MDEV-7505).
  • Crashing Bug: Server crashes in get_server_from_table_to_cache on empty name (MDEV-8224).
  • XtraDB upgraded to 5.5.42-37.2
  • TokuDB upgraded to 7.5.7

Given the security fix, you may want to review the CVE to see if this is something you need to address. Also, please review the crashing bugs to see if they might affect you, and upgrade if so. Also, if running TokuDB or XtraDB, you may also want to benefit from those fixes, as well as the new MariaDB fixes (59 in all).

If interested, the official MariaDB 5.5.44 release notes are here:

https://mariadb.com/kb/en/mariadb/development/release-notes/mariadb-5544-release-notes/

And the full list of fixed bugs and changes in MariaDB 5.5.44 can be found here:

https://mariadb.com/kb/en/mariadb/development/changelogs/mariadb-5544-changelog/

Hope this helps.

by chris at June 29, 2015 10:52 PM

MariaDB 10.1.5 Overview and Highlights

MariaDB 10.1.5 was recently released, and is available for download here:

https://downloads.mariadb.org/mariadb/10.1.5/

This is the 3rd beta, and 6th overall, release of MariaDB 10.1. There were not many major changes in this release, but a few notable items, as well as many overall bugs fixed (I counted 306).

Since it’s a beta, I’ll only cover the major changes and additions, and omit covering general bug fixes (feel free to browse them all here).

To me, these are the highlights:

Of course it goes without saying that do not use this for production systems since it is still only beta. However, I definitely recommend installing it on a test server and testing it out. And if you happen to be running a previous version of 10.1, then you should definitely upgrade to this latest release.

You can read more about the 10.1.5 release here:

https://mariadb.com/kb/en/mariadb-1015-release-notes/

And if interested, you can review the full list of changes in 10.1.5 (changelogs) here:

https://mariadb.com/kb/en/mariadb-1015-changelog/

Hope this helps.

by chris at June 29, 2015 09:33 PM

Jean-Jerome Schmidt

Deploying Galera Cluster for MySQL using Vagrant

Setting up environments, starting processes, and monitoring these processes on multiple machines can be time consuming and error prone - stale settings from previous test runs, wrong configurations, wrong commands, package conflicts, etc. - quite a few things can go wrong. If you are using Galera Cluster, you would probably want application developers to have a proper development environment on their local computers. Proper here means testing your code on a local Galera Cluster, not on a single-instance MySQL. Galera Cluster differs from a single-instance MySQL, so this allows you to catch these differences early in the project. But how can you quickly roll out mini test clusters to your application developers, without having them waste time setting these up? This is where Vagrant comes in.

Vagrant is a system that allows you to easily create and move development environments from one machine to another. Simply define what type of VM you want in a file called Vagrantfile and then fire them up with a single command. It integrates well with virtual machine providers like VirtualBox, VMware and AWS. In this blog, we’ll show you how to expedite the deployment of your development environment using some Vagrant boxes we’ve put together.

Our Vagrantfile deploys 4 instances on VirtualBox platform, three for Galera nodes plus one for ClusterControl. It requires the following Vagrant boxes available on our site:

  • s9s-cc (505 MB) - Ubuntu 14.04.x, ClusterControl 1.2.10
  • s9s-galera (407 MB) - Ubuntu 14.04.x, Percona XtraDB Cluster 5.6

Here are the main steps:

  1. Install Vagrant and Virtualbox
  2. Download the related Vagrant boxes and Vagrantfile
  3. Launch the instances
  4. Bootstrap the Galera cluster
  5. Add the cluster to ClusterControl.

The following architecture diagram shows what you will get once everything is deployed:

Ensure that you have Vagrant and VirtualBox installed. We are not going to cover the installation of these in this blog post.

Deploying the Cluster

1.  Download and install the Vagrant boxes:

$ vagrant box add s9s-cc http://severalnines.com/downloads/cmon/s9s-cc.box
$ vagrant box add s9s-galera http://severalnines.com/downloads/cmon/s9s-galera.box

Make sure you keep the box names s9s-cc and s9s-galera, otherwise you’ll need to change the corresponding values in the Vagrantfile.

2. Create a directory and download the Vagrantfile:

$ mkdir s9s-cc
$ cd s9s-cc
$ wget http://severalnines.com/downloads/cmon/Vagrantfile

3. Launch 4 instances, each requires 768 MB of memory:

$ vagrant up

4. Verify if all instances are up with:

$ vagrant status

5. SSH to vm2 (n2) and run the start-node.sh script located under the s9s directory. This will copy the relevant my.cnf file and bootstrap the Galera cluster:

$ vagrant ssh vm2
vagrant@n2:~$ cd s9s
vagrant@n2:~$ ./start-node.sh

6. Execute the same on vm3 (n3) and vm4 (n4). This will copy the relevant my.cnf file and start the node to join n2:

$ vagrant ssh vm3
vagrant@n3:~$ cd s9s
vagrant@n3:~$ ./start-node.sh
$ vagrant ssh vm4
vagrant@n4:~$ cd s9s
vagrant@n4:~$ ./start-node.sh

At this point, our Galera cluster should be up and running. You should be able to access each MySQL server on its respective IP address and port. The default MySQL root password is root123 while the ‘cmon’ password is cmon.

Adding Galera Cluster into ClusterControl

Once Galera Cluster is running, add it to ClusterControl. Open a web browser and point it to http://localhost:8080/clustercontrol. Create a default admin user with a valid email address and password, and click ‘Register & Create User’. 

Once logged in, click on ‘Add Existing Server/Cluster’, and enter the following details:

Click ‘Add Cluster’ and monitor the output of the cluster jobs. Once done, you should be able to see the Galera Cluster listed:

That’s it! Quick, simple and works every time :-)


by Severalnines at June 29, 2015 04:29 PM

June 27, 2015

MariaDB Foundation

MariaDB Galera Cluster 5.5.44 and 10.0.20 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Galera Cluster 10.0.20 and MariaDB Galera Cluster 5.5.44. These are Stable (GA) releases.


Download MariaDB Galera Cluster 10.0.20

Release Notes Changelog What is MariaDB Galera Cluster?


Download MariaDB Galera Cluster 5.5.44

Release Notes Changelog What is MariaDB Galera Cluster?


MariaDB APT and YUM Repository Configuration Generator

See the Release Notes and Changelogs for detailed information on these releases and the What is MariaDB Galera Cluster? page in the MariaDB Knowledge Base for general information about MariaDB Galera Cluster.

Thanks, and enjoy MariaDB!

by Daniel Bartholomew at June 27, 2015 04:11 PM

June 25, 2015

Jean-Jerome Schmidt

Become a MySQL DBA blog series - Common operations - Schema Changes

Database schema changes are not popular among DBAs, not when you are operating production databases and cannot afford to switch off the service during a maintenance window. These are unfortunately frequent and necessary, especially when introducing new features to existing applications. 

Schema changes can be performed in different ways, with tradeoffs such as complexity versus performance or availability. For instance, some methods would trigger a full table rewrite which could lead to high server load. This in turn would lead to degraded performance and increased replication lag in master-slave replication setups. 

This is the fourth installment in the ‘Become a MySQL DBA’ series, and discusses the different approaches to schema changes in MySQL. Our previous posts in the DBA series include High Availability, Backup & Restore and Monitoring & Trending.

Schema changes in MySQL

A common obstacle when introducing new features to your application is making a MySQL schema change in production, in the form of additional columns or indexes. Traditionally, a schema change in MySQL was a blocking operation - a table had to be locked for the duration of the ALTER. This is unacceptable for many applications - you can’t just stop receiving writes as this causes your application to become unresponsive. In general, “maintenance breaks” are not popular - databases have to be up and running most of the time. The good news is that there are ways to make this an online process. 

Rolling schema update in MySQL Replication setups

MySQL replication is an easy way of setting up high availability, but managing schema updates is tricky. Some ALTERs may lock writes on the master and create replication lag - the lag will happen for any ALTER statement. The reason is simple - MySQL replication is single-threaded and if the SQL thread is executing an ALTER statement, it won’t execute anything else. It is also important to understand that the slave is able to start replicating the schema change only after it has completed on the master. This results in a significant amount of time needed to complete changes on the slave: time needed for a change on the master + time needed for a change on the slave.

All of this sounds bad but replication can be used to help a DBA manage some of the schema changes. The plan is simple - take one of the slaves out of rotation, execute ALTERs, bring it back, rinse and repeat until all slaves have been updated. Once that’s done, promote one of the slaves to master, run ALTER on the old master, bring it back as a slave.

This is a simple yet efficient way of implementing schema changes. Failover requires some downtime but it is much less impacting than running all of the changes through the replication chain, starting from the master. The main limitation of this method is that the new schema has to be compatible with the current schema - remember, master (where all writes happen) has it unchanged until almost the end. This is a significant limitation, especially if you use row-based binary log format. While statement-based replication (SBR) is pretty flexible, row-based replication (RBR) is much more demanding when it comes to the schema consistency. For example, adding a new column in any place other than the end of the table won’t work in RBR. With SBR, it is not an issue. Be sure you checked the documentation and verified that your schema change is compatible. Last but not least, if you use mixed binlog format, keep in mind that while it uses mostly statement-based binlog format, it will use row-based binlog format for those queries which are not deterministic. Thus, it may cause similar problems as RBR.

MySQL-based functionality for online schema change

As we mentioned earlier, some of the operations may not be blocking in MySQL and thus can be executed on a live system. This is especially true with MySQL 5.6, which brought a number of improvements in this area. Unfortunately, it doesn’t solve problems with replication lag - ALTERs will still cause this type of problem. Still, this is a great choice for smaller tables where the lag created is acceptable. Of course, it is application-dependent but usually it’s not a big deal if the slave lag is a couple of seconds, and this may mean that tables even up to a couple of gigabytes (hardware-dependent) may be within range. If your application cannot accept even such small lag, then we’d strongly suggest rethinking the design. Slaves will lag, it is just a matter of when it will happen.
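For example, in MySQL 5.6 you can explicitly request an in-place, non-blocking change and have the statement fail instead of silently falling back to a blocking table copy (table and index names are hypothetical):

mysql> ALTER TABLE users ADD INDEX idx_email (email), ALGORITHM=INPLACE, LOCK=NONE;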

Other tools for Online schema change

There are a couple of tools that perform online schema change in MySQL. The best known is probably pt-online-schema-change, which is part of Percona Toolkit. Another one is “Online Schema Change” developed by Facebook.

Those tools work in a similar way (a minimal example invocation is sketched after this list):

  • create a new table with the desired schema;
  • create triggers on the old table that will mirror all changes and store them in the new table;
  • copy data from the old table into the new one in batches;
  • once it’s done, rename tables and drop the old one.
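A minimal sketch of such an invocation with pt-online-schema-change (database, table and column are hypothetical; --max-lag makes the tool throttle itself when slaves fall behind):

$ pt-online-schema-change \
    --alter "ADD COLUMN last_seen DATETIME" \
    --max-lag 10 \
    --execute \
    D=mydb,t=users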

Those tools give the DBA great flexibility - you don’t have to do a time-consuming rolling upgrade, it’s enough to run pt-online-schema-change and it will take care of your ALTER. It’s even replication-aware and, as such, it can throttle itself down when a lag is detected on one of the slaves. It’s not without limitations, though. 

You need to be aware that the “copy” operation is basically a number of low priority inserts. They will impact the overall performance - it’s inevitable. The process of moving millions of rows takes time - online schema change is much slower than the direct ALTER executed on the same table. By “much” we mean even an order of magnitude. Of course, it all depends on your hardware (disk throughput is the most important factor) and table schema, but it is not uncommon to see changes which literally take days to finish. Another limitation is the fact that this tool cannot be used on a table where triggers already exist. For now MySQL allows only a single trigger of a given type per table. This will probably change in MySQL 5.7 (the relevant worklog is marked as completed) but it doesn’t help much if you run on MySQL 5.6. 

Another problem is with foreign keys - they are linked to a given table and if you create a new one and then swap it with the old table, foreign keys will have to be updated to point to the new table. Pt-online-schema-change gives you two options to deal with it but, frankly, none of them is good. 

The first option, fast but risky, is to drop the old table instead of renaming it. The main problem here is two-fold - first, for a while there’s no table - renaming a table is an atomic operation, dropping it is not. Second, as the old table has been dropped, there’s no rollback if an error occurs after the drop. 

The second option requires executing ALTERs on the tables linked by foreign keys - those tables are basically altered and new FKs are created. This is fine as long as those tables are small because the change is executed as a normal ALTER with all it’s consequences (replication lag, for example).

Metadata locking is another problem that you may experience while using pt-online-schema-change. Pt-osc has to create triggers and this operation requires a metadata lock. On a busy server with plenty of long-running transactions, this could be hard to acquire. It is possible to increase timeouts and, in that way, increase the chances of acquiring the lock. But we’ve seen servers where it’s virtually impossible to run pt-online-schema-change due to this problem.

Given this long list of problems and limitations, you might think that this tool is not worth your time. Well, on the contrary. The list is so long because almost every MySQL DBA will rely on pt-online-schema-change heavily and, in the process, will learn all of its dark sides. This tool is one of the most useful tools in the DBA’s toolkit. Even though it has some limitations, it gives you a great degree of flexibility regarding how to approach schema changes in MySQL.

Schema changes in Galera Cluster

Galera cluster brings another layer of complexity when it comes to schema changes. As it is a ‘virtually’ synchronous cluster, having a consistent schema is even more important than in regular MySQL connected via replication. Galera brings two methods of running schema changes, and we’ll discuss them and the repercussions of using them below.

TOI (Total Order Isolation)

The default one, TOI - Total Order Isolation, works in a way that the change happens at exactly the same time on all of the nodes in the cluster. This is great for consistency and allows you to run any kind of change, even non-compatible ones. But it comes with a huge cost - all other writes have to wait until the ALTER finishes. This, of course, makes long-running ALTERs not feasible to execute because every one of them will cause significant downtime for the whole application. This mode can be used successfully for quick, small changes which do not take more than a second (unless you are ok with some ‘stalls’ in your application or you have a maintenance window defined for such changes).

What is also important is that MySQL’s online ALTERs do not help here. Even a change which you could easily run on the master without blocking it (being concerned only about slaves lagging) will cause all writes to halt.

RSU (Rolling Schema Upgrade)

The second option that Galera offers is RSU - Rolling Schema Upgrade. This is somewhat similar to the approach we discussed above (see the section on rolling schema upgrades in MySQL replication setups). There we were pulling out slaves one by one and finally executed a master change; here we’ll be taking the Galera nodes out of rotation.

The whole process is partially automated - set the wsrep_OSU_method variable to RSU, and all you need to do is to proceed with the ALTER. The node will switch to the Desynced state and flow control will be disabled, ensuring that the ALTER will not affect the rest of the cluster. If your proxy layer is set up in a way that the Desynced state means no traffic will reach this node (and that’s how you should set up your proxy), such an operation is transparent to the application. Once the ALTER finishes, the node is brought back in sync with the cluster.
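
Executed on one node at a time, the procedure looks roughly like this (a sketch; the database and index names are made up, and we assume TOI was the previous setting):

# on the node taken out of rotation
mysql -e "SET GLOBAL wsrep_OSU_method='RSU';"
mysql -e "ALTER TABLE mydb.orders ADD INDEX idx_created (created_at);"
mysql -e "SET GLOBAL wsrep_OSU_method='TOI';"

# verify the node is back in the Synced state before moving on to the next one
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"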

This has several repercussions that you need to keep in mind. First of all, similar to the rolling schema upgrade on MySQL replication, changes have to be compatible with the old schema. As Galera uses row-based format for replication, it is very strict regarding changes that can be done online. You should verify every change you plan to make (see MySQL documentation) to ensure it is indeed compatible. If you performed an incompatible schema change, Galera won’t be able to apply writesets and it will complain about a node not being consistent with the rest of the cluster. This will result in Galera wiping out the offending node and executing SST.

You also need to be aware of the fact that, for the duration of the change, the altered node does not process writesets. It will ask for them later, once it finishes the ALTER process. If it cannot find the writesets on any of the other synced nodes in the cluster, it will execute SST, removing the change completely. You have to ensure that gcache is large enough to store the data for the duration of the ALTER. This can be tricky and problematic, as gcache size is only one of the factors - another one is the workload. You may have increased gcache, but if the amount (and size) of the writesets in a given time increases too, you may still run out of space in the cache.

Generic scenarios of the schema change

Now, let’s look at some real-life scenarios and how you could approach them. We hope this will make the strong and weak points of each method clearer. Please note that we are adding an estimated time to each of these scenarios. It is critical that the DBA, before executing a change, knows how long it will take to complete. We cannot stress this enough - you have to know what you’ll be executing and how long it will take.

There are a couple of ways in which you can estimate the performance. First, you can (and should) have a development environment with a copy of your production data. This data should be as close to the real production copy as possible in terms of size. Sure, sometimes you have to scrub it for security reasons, but still - closer to production means better estimates. If you have such an environment, you can execute a change and assess the performance.

Another way, even more precise, is to run the change on a host that is connected to the production setup via replication. It is more precise because, for example, pt-online-schema-change executes numerous inserts and they can be slowed down by the regular traffic. Having the regular traffic flowing in via replication helps to make a good assessment.

Finally, it’s all about the experience of the DBA - knowledge about the system’s performance and workload patterns. From our experience we’d say that when in doubt, add 50% to the estimated time. In the best case, you’ll be happy. In the worst, you should be about right, maybe a bit over the ETA.

Scenario - Small table, alter takes up to 10s

MySQL Replication

In this case it’s a matter of answering the question - does your application allow some lag? If yes, and if the change is non-blocking, you can run a direct ALTER. On the other hand, pt-online-schema-change shouldn’t take more than a couple of minutes on such a table and it won’t cause any lag-related issues. It’s up to you to decide which approach is better. Of course, if the change is blocking on the MySQL version you have installed, online schema change is the only option.

Galera Cluster

In this case, we’d say the only feasible way of executing the change is to use pt-online-schema-change. Obviously we don’t want to use TOI, as we’d be locked for a couple of seconds. We could use RSU if the change is compatible, but it creates the additional overhead of running the change node by node, keeping an eye on their status and ensuring the proxy layer takes nodes out of rotation. It’s doable, but if we can use online schema change and just let it run, why not do that?

Scenario - Medium-sized table, from 20 - 30 minutes up to 1h

Replication and Galera Cluster

This is where pt-online-schema-change shines. Changes take too long for a direct ALTER to be feasible, yet the table is not too big and pt-osc should be able to finish the process within several hours at the most. It may take a while but it will eventually be done. It’s also much less cumbersome than executing a rolling schema upgrade.

Scenario - Large tables, more than 1h, up to 6 -12h

MySQL Replication

Such tables can become tricky ones. On the one hand, pt-online-schema-change will work fine, but problems may start to appear. As pt-osc may need even 36 - 48h to finish such a change, you need to consider the impact on performance (the row-copying inserts have to be executed, after all). You also need to assess whether you have enough disk space. This is true for most of the methods we described (except maybe for online ALTERs), but it’s even more true for pt-osc, as the inserts will significantly increase the size of the binary logs. Therefore you may want to try a Rolling Schema Upgrade - downtime will be required, but the overall impact may be lower than using pt-osc.

Galera Cluster

In Galera, the situation is somewhat similar. You can also use pt-online-schema-change if you are ok with some performance impact. You may also use RSU mode and execute the changes node by node. Keep in mind that a gcache able to hold 12 hours’ worth of writesets on a busy cluster may require a significant amount of space. What you can do is to monitor the wsrep_last_committed and wsrep_local_cached_downto counters to estimate how long the gcache is able to store data in your case.
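
A quick way to check how far back gcache currently reaches is to compare those two standard wsrep status counters (a sketch):

mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';"
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_cached_downto';"
# the difference between the two is the number of writesets still held in gcache;
# watch how fast it moves under your regular workload to estimate the time window it covers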

Scenario - Very large tables, more than 12h

First of all, why do you need such a large table? :-) Is it really required to have all this data in a single table? Maybe it’s possible to archive some of this data in a set of archive tables (one per year/month/week, depending on their size) and remove it from the “main” table?

If it’s not possible to decrease the size (or it’s too late, as this process would take weeks while the ALTER has to be executed now), you need to get creative. For MySQL replication you’ll probably use a rolling schema upgrade as the method of choice, with a slight change though. Instead of running the ALTER over and over again, you may want to use xtrabackup or even snapshots (if you have LVM or run on EBS volumes in EC2) to propagate the change through the replication chain. It will probably be faster to run the ALTER once and then rebuild the slaves from scratch using the new data, rather than executing the ALTER on every host.

Galera Cluster may suffer from problems with gcache. If you can fit 24 hours or even more of data into gcache, good for you - you can use RSU. If not, though, you will have to improvise. One way would be to take a backup of the cluster and use it to build a new Galera cluster, which will be connected to the production one via replication. Once that is done, run the change on the ‘other’ cluster and, finally, fail over to it.

As you can see, schema changes may become a serious problem to deal with. This is a good point to keep in mind that schema design is very important in relational databases - once you push data into tables, things may become hard to change. Therefore you need to design table schemas to be as time-proof as possible (including indexing any access pattern that may be used by queries in the future). Also, before you start inserting data into your tables, you need to plan how to archive this data. Partitions maybe? Separate archive tables? As long as you can keep the tables reasonably small, you won’t have problems with adding a new index.

Of course, your mileage may vary - we used time as the main differentiating factor because an ALTER on a 10GB table may take minutes or hours. You also need to remember that pt-online-schema-change has its limitations - if a table has triggers, you may need to use a rolling schema upgrade on it. Same with foreign keys. This is another question to answer while designing the schema - do you need triggers? Can it be done from within the application? Are foreign keys required, or can you have some consistency checks in the application? It is very likely that developers will push for using all those database features, and that’s perfectly understandable - they are there to be used. But you, as a DBA, will have to assess all of the pros and cons and help them decide whether the pros of using all those database features outweigh the cons of maintaining a database that is full of triggers and foreign keys. Schema changes will happen and eventually you’ll have to perform them. Not having the option to run pt-online-schema-change may significantly limit your possibilities.

by Severalnines at June 25, 2015 03:07 PM

June 23, 2015

Oli Sennhauser

FromDual Backup Manager for MySQL 1.2.2 has been released

FromDual has the pleasure to announce the release of the new version 1.2.2 of the popular Backup Manager for MySQL and MariaDB (fromdual_bman).

You can download the FromDual Backup Manager from here.

In the inconceivable case that you find a bug in the Backup Manager please report it to our Bugtracker.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Upgrade from 1.2.x to 1.2.2

# cd ${HOME}/product
# tar xf /download/fromdual_brman-1.2.2.tar.gz
# rm -f fromdual_brman
# ln -s fromdual_brman-1.2.2 fromdual_brman

Changes in FromDual Backup Manager 1.2.2

FromDual Backup Manager

It mainly contains fixes for the brman catalog and physical backups.

You can verify your current FromDual Backup Manager version with the following command:

fromdual_bman --version

  • Archiving with physical backup bug fixed.
  • Connect replaced by OO style and error exit fixed.
  • Create catalog fixed.
  • Archivedir without archive option does not make sense.

by Shinguz at June 23, 2015 09:33 AM

June 22, 2015

Oli Sennhauser

We are looking for you: MySQL/MariaDB DBA for FromDual Support

Who are we?

FromDual is the leading independent consulting and services company for MySQL, Galera Cluster, MariaDB and Percona Server in Europe, headquartered in Switzerland.

Our customers come mainly from Europe and range from small start-ups to European top-500 companies. They receive support from us for database problems, hands-on interventions as remote DBAs, training for their DBAs and developers, as well as consulting on architecture and design decisions. In addition, we develop tools around MySQL, write blog articles and give talks at conferences.

Since our high-quality services attract more and more customers, we need colleagues (m/f) who want to grow themselves and with us.

Job description

We are looking for German-speaking employees (her or him) at junior or senior level for full-time work on services around MySQL (mainly support and remote-DBA work). Primarily, you should make sure that our customers' business-critical MySQL databases run like clockwork - and if they don't, get them running again quickly...


Unser/e "Wunschkandidat/in"

  • hat Erfahrung im Betrieb kritischer und hoch verfügbarer produktiver Datenbanken hauptsächlich auf Linux,
  • kennt Replikation in allen Variationen aus der täglichen Arbeit,
  • weiß, wie die meist verbreiteten MySQL-HA-Setups funktionieren und wie man sie wieder effizient repariert, wenn ein Problem auftritt,
  • ist sattelfest in SQL,
  • bringt Erfahrung mit Galera Cluster mit,
  • kann Bash skripten und einfache Programme in mindestens einer verbreiteten Programmier-/Skripting-Sprache (PHP, Bash, ...) erstellen.

Wir suchen Verstärkung, die von soliden Grundlagen aus auf dem Weg zu diesem Ideal ist.


What we expect from you:

  • knowledge of MySQL, Percona Server or MariaDB, or the willingness to acquire it
  • knowing how to operate critical database systems
  • an understanding of what can go wrong when operating databases
  • an independent (remote) way of working, communicating via IRC, Skype, mail and phone
  • knowledge of the Linux system

DBA or DevOps experience, for example, would be a good professional basis.


You appreciate direct contact with customers, have a good feel for their problems, can listen and quickly find the real issues. You are used to acting proactively before something happens, and you lead the customer back onto the right path.


To be able to do your work, you work in a European time zone. You can arrange your working hours flexibly, in line with the operational situation. We expect you to contribute your share to the on-call duty. FromDual will most likely not have office space in your home town. However, relocation is not necessary: we enable you to work from home or support you in finding a suitable workspace near you. Good written and spoken English skills are required.

What we offer you:

  • A salary appropriate to your performance.
  • The opportunity to develop into a top MySQL database specialist.
  • Independent work.
  • Taking responsibility for your projects and customers.
  • Good camaraderie in the team, as well as a relaxed and pleasant atmosphere.
  • Job-related training opportunities.
  • Participation in open source events.
  • Working from your preferred place of residence.

You should be able to work, think and act independently most of the time and to acquire new knowledge on your own (through web searches, the MySQL documentation, trying things out, etc.). Should you ever get stuck, your colleagues at FromDual will be happy to help you.


If you need someone to hold your hand all the time, FromDual is not the right choice.


What happens next

If you are interested in this opportunity and you think you are the right candidate, we would be happy to hear from you. We know that nobody fits this job description 100%!

Please send your unembellished CV with your salary expectations to jobs@fromdual.com. If you want to learn more about this position or if you would like to talk to me personally, please call me at +41 79 830 09 33 (Oli Sennhauser, CTO). Applicants only, NO headhunters, please!

After you have sent us your CV, you will get to prove your skills in a small MySQL test. After you pass the test, we will invite you to the final interviews.

by Shinguz at June 22, 2015 11:51 AM

Chris Calender

MySQL 5.6.25 Overview and Highlights

MySQL 5.6.25 was recently released (it is the latest GA release in the MySQL 5.6 series), and is available for download here.

For this release, there are 2 “Functionality Added or Changed” items of note:

  • Functionality Added/Changed: MySQL distributions now include an innodb_stress suite of test cases. Thanks to Mark Callaghan for the contribution. (Bug #76347)
  • Functionality Added/Changed: my_print_defaults now masks passwords. To display passwords in cleartext, use the new --show option.

In addition to those, there were 55 other bug fixes:

  • 10 InnoDB
  •   8 Replication
  •   3 Partitioning (one overlaps w/ an InnoDB bug fix)
  • 35 Miscellaneous (and 6 of those were specifically for “MySQL Enterprise Firewall”)

The highlights for me are 5 of the replication bugs, 1 partitioning bug, 1 performance-related bug, 1 wrong results bug, and 9 crashing bugs:

  • Replication: When using semisynchronous replication performance was degrading when the number of threads increased beyond a certain threshold. To improve performance, now only the thread which is committing is responsible for deleting the active transaction node. All other operations do not touch this active transaction list. (Bug #75570)
  • Replication: When binary logging was enabled, using stored functions and triggers resulting in a long running procedure that inserted many records caused the memory use to increase rapidly. This was due to memory being allocated per variable. The fix ensures that in such a situation, memory is allocated once and the same memory is reused. (Bug #75879)
  • Replication: If an error was encountered while adding a GTID to the received GTID set, the log lock was not being correctly released. This could cause a deadlock. (Bug #75781)
  • Replication: When master_info_repository=TABLE the receiver thread stores received event information in a table. The memory used in the process of updating the table was not being freed correctly and this could lead to an out of memory error. The fix ensures that after an event is flushed to the relay log file by a receiver thread, the memory used is freed. (Bug #72885, Bug #69848)
  • Replication: Using mysqlbinlog to process log events greater than 1.6GB failed with an out of memory error. This was caused by an internal error converting the length variable. The fix upgrades the length variable to avoid overflow in both encoding and decoding functions. (Bug #74734)
  • Partitioning: Executing an ALTER TABLE on a partitioned table on which a write lock was in effect could cause subsequent SQL statements on this table to fail. (Bug #74288).
  • Performance-related: Certain queries for the INFORMATION_SCHEMA TABLES and COLUMNS tables could lead to excessive memory use when there were large numbers of empty InnoDB tables. (Bug #72322)
  • Incorrect Results: Queries that included a HAVING clause based on nondeterministic functions could produce incorrect results. (Bug #69638)
  • Crashing Bug: For small values of the read_rnd_buffer_size system variable, internal caching of temporary results could fail and cause query execution failure.
  • Crashing Bug: A failed FLUSH PRIVILEGES statement followed by statements to create or drop accounts could cause a server exit.
  • Crashing Bug: SHOW VARIABLES mutexes were being locked twice, resulting in a server exit.
  • Crashing Bug: For join queries with a large number of tables, the server could exit converting the join to a semi-join.
  • Crashing Bug: Deleting rows from mysql.user followed by granting privileges to a new account could result in a server exit.
  • Crashing Bug: Within a stored procedure, access to view columns after DDL or FLUSH TABLES statements in the procedure could cause a server exit.
  • Crashing Bug: Execution of certain BINLOG statements while temporary tables were open by HANDLER statements could cause a server exit.
  • Crashing Bug: For a prepared statement with an ORDER BY that refers by column number to a GROUP_CONCAT() expression that has an outer reference, repeated statement execution could cause a server exit.
  • Crashing Bug: Specifying --general_log_file= (with an empty value) at server startup caused the server to fail and exit.

Conclusions:

So while there were no major changes, the partitioning fix could resolve a potentially serious issue if you think you might encounter it (some partitioning use cases involve frequent ALTERs), the replication fixes could be important for you, and the numerous crashing (and performance-related and wrong-results) bugs are important if you perform the operations that trigger them. So read through these, and if you’ll be affected by any of the above, or think you might be, then I’d recommend upgrading.

On a side note, there are several serious “MySQL Enterprise Firewall” bug fixes in this release, which I omitted above since the general public doesn’t have access to it, but if you are using it, you should upgrade due to the number of potentially serious bugs that exist in prior versions.

The full 5.6.25 changelogs can be viewed here (which has more details about all of the bugs listed above):

http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-25.html

Hope this helps. :)

by chris at June 22, 2015 12:53 AM

June 18, 2015

MariaDB Foundation

MariaDB 10.0.20 now available

Download MariaDB 10.0.20

Release Notes Changelog What is MariaDB 10.0?

MariaDB APT and YUM Repository Configuration Generator

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.0.20. This is a Stable (GA) release.

See the Release Notes and Changelog for detailed information on this release and the What is MariaDB 10.0? page in the MariaDB Knowledge Base for general information about the MariaDB 10.0 series.

Thanks, and enjoy MariaDB!

by Daniel Bartholomew at June 18, 2015 09:23 PM

June 17, 2015

Jean-Jerome Schmidt

Become a MySQL DBA - webinar series: deciding on a relevant backup solution

Backup and restore is one of the most important aspects of database administration. If a database crashed and there was no way to recover it, any resulting data loss might lead to devastating results to a business. As the DBA operating a MySQL or Galera cluster in production, you need to ensure your backups are scheduled, executed and regularly tested.

There are multiple ways to take backups, but which method fits your specific needs? How do you implement point-in-time recovery?

Join us for this live session on backup strategies for MySQL and Galera clusters led by Krzysztof Książek, Senior Support Engineer at Severalnines.

DATE & TIME

Europe/MEA/APAC
Tuesday, June 30th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)
Register Now

North America/LatAm
Tuesday, June 30th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

AGENDA

  • Logical and Physical Backup methods
  • Tools
    • mysqldump
    • mydumper
    • xtrabackup
    • snapshots
  • How backups are done in ClusterControl
  • Best practices
  • Example Setups
    • On premises / private datacenter
    • Amazon Web Services

SPEAKER

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience in managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. 

This webinar ‘Become a MySQL DBA - deciding on a relevant backup solution’ will show you the pros and cons of different backup options and help you pick one that goes well with your environment.

We look forward to “seeing” you there and to insightful discussions!

by Severalnines at June 17, 2015 03:29 PM

June 15, 2015

Jean-Jerome Schmidt

Become a MySQL DBA blog series - Database High Availability

There are many approaches to MySQL high availability - from traditional, loosely-coupled database setups based on asynchronous replication to more modern, tightly-coupled architectures based on synchronous replication. These offer varying degrees of protection, and DBAs almost always have to choose a tradeoff between high availability and cost.

This is the third installment in the ‘Become a MySQL DBA’ series, and discusses the pros and cons of different approaches to high availability in MySQL. Our previous posts in the DBA series include Backup and Restore and Monitoring & Trending.

High Availability - what does it mean?

Availability is somewhat self-explanatory. If your database can be queried by your application, it is available. High, on the other hand, is a separate story. For some organizations, ‘high’ means at most several minutes of downtime over the year. For others, it might mean a few hours per month. If you’ve read the previous blogs in this series, you may have noticed a pattern - “it depends on the business requirements”. This applies here as well - you need to know your requirements in terms of how much downtime you can accept, as this may limit your HA options significantly. What you need to keep in mind is that the length of a database incident, one that causes some disturbance in database access, may be related to the HA method you choose. Whether this disturbance affects end users, on the other hand, is a different matter. For starters - does your application use a cache? How often does it need to be refreshed? Is it acceptable for your application to show stale data for some period of time? And for how long?

Caching Layer - for database reads and writes?

A cache that sits between the application and the database might be a way of decoupling those two from each other. 

For reads you can use one of many cache solutions - memcached, Redis, Couchbase. Cache refreshes can be performed by a background thread which, when needed, gets the data out of MySQL and stores it in the caching layer. It could be that the data is outdated because the database is not reachable and the background thread is not able to refresh the cache. While the database is down, the application serves the data out of the cache - as long as it’s ok to serve stale data for some time, you are just fine and users may not even experience any issues.

With writes, it is a similar story - you may want to cache writes in a queue. In the background, you would have threads that read the data out of the queue and store it in the database. Ideally those background threads keep the queue empty and any write request is handled immediately. If the database is down, the queue can serve as a write buffer - the application can still make modifications to the data, but the results are not immediately stored in the database - they will be later on, when the database gets back online and the background threads start working on the backlog.

There are many ways to keep users happy and unaware of the issues behind the scenes - all user-related modifications can be immediately presented to the user, to give an impression that everything is just fine. Other users will not see those changes until the write queue is flushed to the database. Of course, it depends on what kind of data we are talking about - in many cases (e.g., social media site, web forum, chat engine, comment engine), it might be just fine. One way or another, this “illusion” can be maintained only for some period of time though. Eventually, the database has to be brought up again. Let’s talk now about our options for database high availability.

Block-level replication (DRBD)

We’ll start with DRBD - Distributed Replicated Block Device. In short, imagine that you could create a RAID1 over the network. This is, more or less, what DRBD does. You have two nodes (or three in the latest versions), each of which has a block device dedicated to storing data. One of them is in active mode, mounted and basically works as a database server. The rest are in passive standby mode - any changes made on the active node’s block device are replicated to the passive nodes and applied. Replication can be synchronous, asynchronous or memory synchronous. The point of this exercise is that, should the active node fail, the passive nodes have an exact copy of the data (if you use replication in synchronous mode, that is). You can then promote a passive node to active, mount the block volume, start the services you want (MySQL, for example), and you have a replacement node up and running.

There are a couple of disadvantages in the DRBD setup. One of them is the active - passive approach. It’s a problem on multiple layers. For starters, you have to have two nodes while you can use only one of them. You cannot use the passive node for ad-hoc or reporting queries, and you cannot take backups off it. Additionally, failover amounts to starting a crashed MySQL (as if someone just pulled the power plug) - InnoDB recovery will kick in and, while data may not be lost (subject to InnoDB’s durability settings), the process may take a significant amount of time, depending on the workload. Once the node is up, it will need some time to warm up - you can’t prewarm it as it is not active. Last but not least, we are talking about 1:1 or 1:2 setups - only one active node and one or two copies. Theoretically you could use DRBD to keep a copy of a master -> slave setup, but we haven’t seen it in production, nor does it make sense from a cost point of view.

MySQL replication

MySQL replication is one of the oldest and probably the most popular way of achieving MySQL high availability. The concept is simple - you have a master that replicates to one or more slaves. If a slave goes down, you use another slave. If the master is down, you promote one of the slaves to act as a new master. When you get into details, though, things become more complex.

Master failover consists of several phases:

  1. You need to locate the most advanced slave
  2. If there are more of them, pick one as a new master and reslave the rest to the new master
  3. If there is only one “most advanced” slave, you should try to identify missing transactions and replay them on the rest of the slaves to get them in sync
  4. If #3 is not possible, you’ll have to rebuild slaves from scratch, using the data from the new master
  5. Perform the switch (change proxy configuration, move virtual IP, anything you need to move the traffic to the new master)

This is a cumbersome process and, while it’s possible to perform all the steps manually, it’s very easy to make mistakes. There are options to automate it, though. One of the best solutions is MHA - a tool which handles failover, whether forced or planned. It is designed to find the slave that is the most up to date compared with the master. It will also try to apply any missing transactions to this slave (if the binary logs on the master are available). Finally, it should reslave all of the slaves, wherever possible, to the new master.
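
For reference, a typical MHA session boils down to a couple of commands (a sketch; the configuration file path and the host name are made up):

# verify replication health and MHA prerequisites
masterha_check_repl --conf=/etc/masterha/app1.cnf

# run the monitoring daemon that performs automated failover when the master dies
masterha_manager --conf=/etc/masterha/app1.cnf

# or perform a planned (online) master switch
masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=db2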

Along with MySQL 5.6, Oracle introduced Global Transaction Identifiers, and this opened a whole new world of HA possibilities in MySQL replication. For starters, you can easily reslave any slave off any master - something which had not been possible with regular replication. There is no need to check binlog positions; all you need is CHANGE MASTER TO … MASTER_AUTO_POSITION=1; Even though the reslaving part is easy, you still have to keep an eye on the slaves’ status and determine which one will be the best candidate for a master. Regarding tooling: MHA can be used with GTID replication in a similar way as with regular replication. In addition, in such a setup it is possible to use binlog servers as a source of missing transactions. Oracle also created a tool - mysqlfailover - which performs periodic or constant health checks of the system and supports both automated and user-initiated failover.
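
With GTIDs enabled, pointing a slave at the new master is as simple as this (a sketch; the host name and credentials are placeholders):

mysql -e "STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='new-master.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='replpass',
  MASTER_AUTO_POSITION=1;
START SLAVE;"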

The main issue with standard MySQL replication is that by default it is asynchronous, which means, in short, that in the event of the master’s crash, it is possible that not all transactions were replicated to at least one of the slaves. If the master is not accessible (so tools like MHA can’t parse its binlogs to extract missing data), that data is lost. To eliminate this problem, semi-sync replication was added to MySQL. It ensures that at least one of the slaves got the transaction and wrote it to its relay logs. The slave may be lagging but the data is there. Therefore, if you use MySQL replication, you may consider setting up one of your slaves as a semi-sync slave. This is not without impact, though - commits will be slower since the master needs to wait for the semi-sync slave to log the transactions. Still, it’s something that you may want to consider - it is possible that for your workload it won’t make a visible difference. By default, ClusterControl works in this mode with MySQL replication. If you are using GTID-based failover, you should also be aware of Errant Transactions.
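
Enabling semi-synchronous replication is a matter of loading the plugins and flipping a couple of variables (a sketch for a master and one chosen slave; the 1000 ms timeout is an arbitrary example):

# on the master
mysql -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;"

# on the chosen slave
mysql -e "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;"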

Clustering

The ultimate solution to HA is to use a synchronous (or at least “virtually” synchronous) cluster. This leads us to MySQL Cluster and Galera (in all its flavors).

MySQL Cluster is based on the NDB engine and delivers great performance for point selects and inserts. It provides internal redundancy for the data as well as in the connectivity layer. This is one of the best solutions, as long as it is feasible to use in your particular case. This is also its main issue - it is not your regular MySQL/InnoDB and behaves differently. The way it stores data (partitioned across multiple data nodes) makes some of the queries much more expensive, as there is quite a bit of network activity needed to grab the data and prepare a result. More information in our MySQL Cluster training slides.

Galera, be it Codership’s vanilla version, MariaDB Cluster or Percona XtraDB Cluster, much more closely resembles MySQL with InnoDB. Actually, it does use InnoDB as the storage engine. There are a couple of things to keep an eye on (very big transactions, DDLs) but for most cases, it is the same MySQL/InnoDB that we are used to. Galera does not split the data; it uses multiple nodes, each of which has a full copy of the dataset - a concept similar to master/slave. The main difference is that the replication protocol is “virtually” synchronous, which means that the data is almost immediately available across the cluster - there is no slave lag. Another important aspect, when comparing Galera to NDB cluster, is the fact that every node has the full dataset available. It makes it harder to scale (you can’t add more nodes to increase the data capacity of the cluster) but, on the other hand, it is easier to run all kinds of queries, reporting included - no need to move the data across the network. More information in this online tutorial for Galera Cluster.

Both clusters, practically speaking (there are some exceptions on both sides), work as a single instance. Therefore it is not important which node you connect to as long as you get connected - you can read and write from any node.

Of those options, Galera is the more likely choice for the common user - its workload patterns are mostly close to standalone MySQL, and maintenance is also somewhat similar to what users are used to. This is one of the biggest advantages of using Galera. MySQL Cluster (NDB) may be a great fit for your needs, but you have to do some testing to ensure it’s indeed the case. This webinar discusses the differences between Galera and NDB.

Proxy layer

Having MySQL set up one way or another is not enough to achieve high availability. The next step is to solve another problem - how should I connect to the database layer so that I always connect to hosts which are up and available?

Here, a proxy layer can be very useful. There are a couple of options to pick from.

HAProxy

HAProxy is probably the most popular software proxy out there, at least in the MySQL world. It is fast, easy to configure and there are numerous howtos and config snippets on the Internet, which makes it easy to set up. On the other hand, HAProxy does not have any sophisticated database logic and is not aware of what’s going on in MySQL or Galera Cluster. It can check MySQL’s port but that’s all - it’s either up or down. This can be a serious problem for both regular replication and setups based on Galera Cluster.

Regular replication has two types of hosts - a master, serving reads and writes, and read-only slaves. If we set up an automated failover using, for example, MHA, it may happen that the master is no longer a master and one of the slaves is no longer a slave. Proxy configuration has to be changed, ideally dynamically. Galera Cluster, on the other hand, has nodes which may be in various states. A node can be a donor, serving data to a joining node. A node can be joining the cluster. A node can also be desynced manually (for example, while you’re taking a backup). Finally, a node can be in non-Primary state. It is not a 0/1 situation - we may want to avoid nodes which are in the donor state, as they do a significant amount of I/O and can impact production. We also do not want to use joining nodes, as they most likely are not up to date in terms of executed writesets. More details can be found in this webinar on HAProxy.

HAProxy, out of the box, does not have any options to handle such cases. It does have a feature which we may utilize to enhance its abilities - the HTTP check. Basically, instead of checking if a given port is open or closed, HAProxy may make an HTTP connection to a given port. If it receives a 200 code, it assumes that the service is up. Any other code, let’s say 503 (which is pretty popular in scripts), will trigger the ‘service down’ state. This, along with xinetd and a simple (or more complex) script, allows a DBA to implement more complex logic behind the scenes. The script may check the MySQL replication topology and return the correct error code depending on whether a host is a slave or not, depending on which backend is used (usually we define one backend for the master and one for all slaves, as described here). For Galera, it may check the node’s state and, based on some logic, decide whether it’s ok to serve reads from the node or not.
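
A minimal sketch of such a check script for Galera, assuming it is exposed by xinetd on a port that HAProxy probes with its httpchk option (the port and the “synced nodes only” policy are assumptions; production scripts like Percona’s clustercheck do quite a bit more):

#!/bin/bash
# return HTTP 200 when the local Galera node is Synced, HTTP 503 otherwise
STATE=$(mysql -N -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state'" 2>/dev/null | awk '{print $2}')
if [ "$STATE" = "4" ]; then     # 4 means Synced
    printf "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\nGalera node is synced\r\n"
else
    printf "HTTP/1.1 503 Service Unavailable\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\nGalera node is not synced\r\n"
fi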

MaxScale

One of the latest additions to the MySQL ecosystem is MaxScale, a proxy developed by MariaDB Corporation. The main difference over HAProxy is that MaxScale is database-aware. It was designed to work with MySQL and it gives a DBA more flexibility. It also has a significant number of features, in addition to being a proxy. For example, should you need a binlog server, MaxScale may help you here. From an HA point of view though, the most important feature is its ability to understand MySQL states. If you use regular replication, MaxScale will be able to determine which node is the master and which one is a slave. In case of failover, this makes one less config change to keep in mind. In case of Galera Cluster, MaxScale has the ability to understand which node is joined and which is not. This helps to keep traffic away from nodes which are, for example, receiving incremental state transfer. If you have Galera, MaxScale also picks up one of the nodes as a “master” even though there is no “master” in a sense of normal replication. It is still very useful - in case you’d like to perform a read/write split (to avoid deadlocks, for example), you can rely on the proxy to direct your writes to a single node in the cluster while the reads will hit the other nodes. We previously blogged about how to deploy/configure MaxScale.  

There are also some issues with MaxScale that you need to be aware of. Even though it is GA, it is relatively new software. Therefore detailed tests should be carried out to check whether the features that you will rely upon work as advertised. Another, somewhat related, problem is that MaxScale uses quite a bit of CPU. This is understandable, as some of the features require processing power, but it may be a limitation for environments with larger traffic. We assume that eventually this will be optimized, but for now it is something you need to keep in mind. You might want to check out this performance benchmark of MaxScale vs HAProxy.

HA for proxies

So, here we are, our database and proxy layers are up and running. Proxies are configured to split the workload across the database layer, ensuring that traffic is served even if some of the  database instances are down. Next problem to solve is - what happens if your proxy goes down? How do you route traffic to your databases?

If you use Amazon Web Services, Elastic Load Balancer (ELB) is a great tool to solve this problem. All you need to do is to set it up with proxy nodes as backend and you are all good. Under the hood AWS will create several ELB instances that will be highly available and will route the traffic to those proxy nodes which are up.

If you do not use AWS, you may need some other method. One of them could be to have a virtual IP assigned to one of the proxy instances. If that instance is down, the IP will be moved to another proxy. Keepalived is one of the tools that can provide this kind of functionality, but there are others as well. One of the advantages of this setup is that you only have two proxy nodes on which you need to introduce configuration changes (as compared to a number of instances, as described in the next paragraph). Two nodes is the minimal requirement for HA. The disadvantage is that only one of them will be serving traffic at any given time - this could be a limitation if the workload is high.

Another approach could be to colocate proxy servers on the application servers. Then you can configure the application to connect to the database nodes using a proxy installed on localhost. The reasoning behind it is that by sharing hardware we minimize the chance that the proxy will be down while the application server is up. It is more probable that both services will be either up or down together, and if a given application instance works, it will be able to connect to the proxy. The main advantage of this setup is that we have multiple proxy nodes, which helps to scale. On the other hand, it is more cumbersome to maintain - any configuration change has to be introduced on every node.

Do we need a proxy layer?

While a proxy layer is useful, it is not required. This is especially true if we are talking about Galera Cluster. In such a case you can just as well read and write to any of the nodes, and if a given node doesn’t respond, you can skip it and move on to the next one. You may encounter issues with deadlocks, but as long as you are ok with that (or you can work around them), there’s no need to add additional complexity. If you’d like to perform an automated failover in MySQL replication, things are different - you have a single point where you can write - the master. One of the possibilities is to use a virtual IP as the point where the application writes. Then you can move it from host to host, following the replication chain changes, ensuring that it always points to the current master.

Split-brain scenarios

There are cases where issues in communication between data replicas may lead to two separate data sets, each one randomly serving applications without coordinating with the other one.

Let’s take a look at the simplest example - one master, two slaves, VIP pointing to the master, automated failover. 

  • The master loses its network connection
  • Failover is deemed necessary
  • One of the slaves is staged to be the new master
  • The other slave is reslaved
  • The VIP is assigned to the new master

So far so good. There’s a ticking bomb hidden in the basement, though. The old master lost its network connection - this was the main reason for the failover, but it also means that it was not possible to connect to it and take down the VIP. If its connection recovers, you’ll end up with two hosts having the same VIP. For a while at least, as you would probably have some scripts to detect such a situation and take down the VIP on the old master. During this short time, some of the writes will hit the old master, creating a data mismatch.

It’s hard to protect against such a situation. What you want to do is to have STONITH implemented (Shoot The Other Node In The Head, one of the nicest acronyms in IT). Basically, you want to ensure that after a successful failover, the former master is down as in “down and will never come back up”. There are numerous ways to achieve this and it mostly depends on your environment. Bare-metal servers are more flexible here.

You may want to use a separate network to form a “backup” link - one switch, a couple of patch cords. Something disconnected from the main network, routers etc. You can use such a connection to check the health of the other node - maybe it’s just the primary network that failed? Such a dedicated connection can also be used for IPMI or some other KVM-ish access. Maybe you have access to a manageable power strip and you can turn off a power outlet? There are many ways to shut down a server remotely if you are in the datacenter. In a cloud environment, things are different, but the least you could do is to utilize different NICs and create a bonded interface (keeping fingers crossed that, behind the scenes, they do not use exactly the same hardware). If using AWS, you can also try to stop the node using the EC2 CLI.

We are aware that this topic is more suitable for a book than a mere blog post. High availability in MySQL is a complex topic which requires plenty of research and depends heavily on the environment that you use. We’ve tried to cover some of the main aspects, but do not hesitate to hit the comment button and let us know your thoughts.

by Severalnines at June 15, 2015 01:14 PM

June 11, 2015

MariaDB Foundation

MariaDB 5.5.44 now available

Download MariaDB 5.5.44

Release Notes Changelog What is MariaDB 5.5?

MariaDB APT and YUM Repository Configuration Generator

The MariaDB project is pleased to announce the immediate availability of MariaDB 5.5.44. This is a Stable (GA) release.

See the Release Notes and Changelog for detailed information on this release and the What is MariaDB 5.5? page in the MariaDB Knowledge Base for general information about the MariaDB 5.5 series.

Thanks, and enjoy MariaDB!

by Daniel Bartholomew at June 11, 2015 02:35 PM

June 08, 2015

Jean-Jerome Schmidt

Become a MySQL DBA blog series - Backup and Restore

It is not uncommon that developers, network/system administrators, or DevOps folks with general backgrounds, find themselves in a DBA role at some point in their career. So, what does a DBA do? In the previous post, we covered monitoring and trending practices, as well as some popular tools that you might find handy in your day to day work. 

We’ll continue this blog series with another basic but crucial DBA responsibility - taking backups of your data. Backup and restore is one of the most important aspects of database administration. If a database crashed and there was no way to recover it, any resulting data loss might lead to devastating results for a business. One could argue that you can protect against crashes by replicating to multiple servers or data centers. But if it is an application error that propagates to all instances, or a human dropping a part of a database by mistake, you will probably need to restore from backup.

Different backup methodologies

There are multiple ways to take a backup of a MySQL database, but we can divide these methods into two groups - logical and physical.

Logical backups contain data that is exported using SQL commands and stored in a file. This can be, for example, a set of SQL commands (INSERTs) that, when executed, will result in restoring the contents of the database. It does not have to be SQL code; it can be anything that is restorable - you can as well use SELECT … INTO OUTFILE to generate a file with your database contents. With some modifications to the output file’s syntax, you can store your backup in CSV files.

Physical backups are copies of physical database files. Here, we would make a binary copy of a whole database by, for example, copying all of the files or by making a snapshot of a volume where data directory is located.

A logical backup is usually slower than a physical one, because of the overhead of executing SQL commands to get the data out and then another set of SQL commands to get the data back into the database. This is a severe limitation that tends to prevent the logical backup from being the sole backup method on large (high tens or hundreds of gigabytes) databases. On the other hand, a major advantage of the logical backup is the fact that, having all data in SQL format, you can restore single rows.

Physical backups are not that flexible - while some of the methods make it possible to restore separate tables, you cannot go down to the row level. On the other hand, this is the fastest way to backup and restore your database - you are limited only by the performance of your hardware - disk speed and network throughput will be the main limiting factors.

One more important concept, when it comes to MySQL backups, is point-in-time recovery. A backup, whether logical or physical, takes place at a given moment in time. This is not enough - you have to be able to restore your database to any point in time, including a point between two backups. In MySQL, the main way to handle point-in-time recovery is to use binary logs to replay the workload. With that in mind, a backup is not complete unless you make a copy of the binlogs along with it.
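
For instance, after restoring the last backup you would replay the binary logs created since then, stopping just before the event you want to avoid (the file names, position and timestamp below are made up):

# replay everything from the backup's binlog position up to just before the bad statement
mysqlbinlog --start-position=107 --stop-datetime="2015-06-08 09:59:00" \
    mysql-bin.000412 mysql-bin.000413 | mysql -u root -p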

Logical backup methods

mysqldump

The best-known method is definitely mysqldump, a CLI tool that enables the DBA to create an SQL dump of the database. Mysqldump is a single-threaded tool and this is its most significant drawback - performance is ok for small databases, but it quickly becomes unacceptable if the data set grows to tens of gigabytes. If you plan to use mysqldump as a means of taking backups, you need to keep a few things in mind. First, by default mysqldump doesn’t include routines and events in its output - you have to explicitly set the --routines (-R) and --events (-E) flags. Second, if you want to take a consistent backup, things become tricky. As long as you use InnoDB only, you can use the --single-transaction flag and you should be all set. You can also use --apply-slave-statements to get the CHANGE MASTER statements at the beginning of the dump if you plan to create a slave using the backup. If you have other, non-transactional tables (MyISAM for example), then mysqldump will have to lock the whole database to ensure consistency. This is a serious drawback and may be one of the reasons why mysqldump won’t work for you.
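
A typical consistent dump of an InnoDB-only server could then look like this (a sketch; --master-data=2 records the binlog coordinates as a comment, which is handy for point-in-time recovery):

mysqldump --single-transaction --routines --events --triggers \
    --master-data=2 --all-databases | gzip > full_dump_$(date +%F).sql.gz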

By default, mysqldump creates a file where you’ll first find the SQL to create the schema and then the SQL to restore the data. To have more flexibility, you may change this behavior and script the backup in such a way that it creates a schema dump first and then the rest of the data. Additionally, you may also want to script the backup process so that it stores separate tables in separate SQL files. This will come in handy when you need to restore several rows or to compare current data with the previous day’s data. It’s all about the file size: separate dumps, created per table, are likely to be smaller and more manageable. E.g., in case you want to use a CLI tool to find a given row in the SQL file.
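
A simple sketch of such a script (the database name is made up; note that dumping table by table in a loop gives you per-table consistency only, not a single consistent snapshot of the whole database):

DB=mydb
mysqldump --no-data --routines --events ${DB} > ${DB}_schema.sql
for TABLE in $(mysql -N -e "SHOW TABLES" ${DB}); do
    mysqldump --single-transaction --no-create-info ${DB} ${TABLE} > ${DB}.${TABLE}.sql
done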

SELECT … INTO OUTFILE

This is more a mode of mysqldump than a separate backup method, but it’s distinct enough to be included here. Mysqldump can be executed in a mode where, instead of SQL syntax, it generates the backup in some other format. In general, the format is similar to CSV, with the difference that the actual format can be defined by the user. By default, it is tab-separated instead of comma-separated.
This format is faster to load than an SQL dump (you can use LOAD DATA INFILE to make it happen), but it is also harder to use to restore a single row. Most people probably don’t remember the LOAD DATA INFILE syntax, while almost everybody can run SQL.
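
The same idea expressed directly in SQL looks roughly like this (paths and table names are made up; the MySQL server itself writes and reads the file, so the location has to be accessible to it):

mysql -e "SELECT * FROM mydb.orders INTO OUTFILE '/tmp/orders.txt'
          FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';"

# and to load the data back in:
mysql -e "LOAD DATA INFILE '/tmp/orders.txt' INTO TABLE mydb.orders
          FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';"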

Mydumper/myloader

These tools work as a pair to overcome the main pain point of mysqldump - the single thread. Mydumper can be used to generate a backup of the data (and data only; you also need to use mysqldump --no-data to get a dump of the schema) and then load it. Both processes can use multiple threads. You can either split the workload per table, or you can define the size of a chunk and then large tables will also be worked on by numerous threads. It’s still a logical backup, so the process may still take a while. Based on numbers reported by different users, mydumper can load data up to 2-3 times faster. The process may still take days, though - depending on the database size, row size etc.
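
A rough example of the pair in action (a sketch; the directory, database name, chunk size and thread counts are arbitrary, and the exact option set depends on the mydumper version you run):

# dump with 4 threads, splitting large tables into chunks of 500k rows
mydumper --database=mydb --outputdir=/backups/mydb --threads=4 --rows=500000 --compress

# load it back, also with 4 threads
myloader --directory=/backups/mydb --threads=4 --overwrite-tables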

Even if the restore time is not acceptable for your data set, you still may be interested in mydumper because of periodical MySQL upgrades. For any major version upgrade (like 5.5 -> 5.6 or upcoming 5.6 -> 5.7), the recommended way for an upgrade is to perform a logical dump of the data and then load it back up. In such a case, time is not that crucial but it is still much better to finish the restore in 2-3 days using mydumper/myloader rather than 6 - 9 days using mysqldump.

Physical backup methods

xtrabackup

Percona’s xtrabackup is the backup method for MySQL. It is a tool that allows the DBA to take a (virtually) non-blocking snapshot of an InnoDB database. It works by physically copying the data files from one volume to another location. You can also stream the backup over the network, to a separate backup host where the backup will be stored. While copying the data, it keeps an eye on the InnoDB redo log and writes down any changes that happened in the meantime. At the end, it executes FLUSH TABLES WITH READ LOCK (that’s why we used the word ‘virtually’) and finalizes the backup. Thanks to this last lock, the backup is consistent. If you use MyISAM tables, xtrabackup is more intrusive, as the non-transactional tables have to be copied over the network while FTWRL is in place - this, depending on the size of those tables, may take a while. During that time, no query will be executed on the host.

Restore is pretty simple - especially if you have already applied the redo logs to the backup taken. Theoretically speaking, you could as well start MySQL without any further actions, but then InnoDB recovery will have to be performed at startup. This process takes time. Preparing the backup first (by applying the redo logs) can be done in its own time; when the backup needs to be (quickly) restored, you won’t have to go through this process. To speed up the backup preparation phase (using --apply-log) you may increase the memory available to xtrabackup using the --use-memory flag. As long as you have several gigabytes of free memory, you can use them here to speed up the process significantly.
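
A basic backup-and-prepare cycle could look like this (a sketch using the innobackupex wrapper; the user, paths and the timestamped directory name are placeholders):

# take the backup (copies the datafiles while tracking the redo log)
innobackupex --user=backup --password=backuppass /backups/

# prepare it (apply the redo log) so it can be restored without lengthy InnoDB recovery
innobackupex --apply-log --use-memory=4G /backups/2015-06-08_02-00-05/

# to restore: stop MySQL, empty the datadir, then copy the files back and fix ownership
innobackupex --copy-back /backups/2015-06-08_02-00-05/
chown -R mysql:mysql /var/lib/mysql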

Xtrabackup is probably the most popular tool out there, and it’s not without reason. It is very flexible: you can use multiple threads to copy the files quicker (as long as your hardware permits it), and you can use compression to minimize the size of the backup. As we mentioned, it is possible to create a backup locally or stream it over the network using (for example) an SSH tunnel or netcat. Xtrabackup allows you to create incremental backups which take significantly less disk space than a full one and won’t take as much time. When restoring, though, it is a slower process as the deltas have to be applied one after another, and it may take a significant amount of time.
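
Streaming to a backup host, for example, could be done like this (a sketch; the host name and directories are made up, and the compressed files have to be decompressed with --decompress/qpress before the prepare step):

innobackupex --user=backup --password=backuppass --stream=xbstream --compress /tmp/ | \
    ssh backup-host "xbstream -x -C /backups/node1/"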

Another feature of xtrabackup is its ability to back up single schemas or even tables. This has its uses, but also limitations. First of all, it can be used to restore several rows that were dropped accidentally. It is still a less efficient way of doing this than restoring the data from an SQL dump, as you’d have to create a separate host, restore the given table, dump the missing rows and load them onto the production server - you cannot restore the whole table, because you’d be missing the changes that happened after the backup was taken. It is possible to work it out with binary logs, but it would take too much time to be feasible. On the other hand, if a whole table or schema is missing, you should be able to restore it pretty easily.

The main advantage of xtrabackup over logical backups is its speed - performance is limited by your disk or network throughput. On the other hand, it’s much harder to recover single rows from the database. The ideal use case for xtrabackup is to recover a whole host from scratch or to provision a new server. It comes with options to store information about MySQL replication or Galera writeset replication along with the backup. This is very useful if you need to provision a new replication slave or a new node in a cluster.

Snapshots

We’ll be talking here about backing up MySQL using snapshots - it does not matter much how you take those snapshots. It can be LVM installed on a host (using LVM is not an uncommon way of setting up MySQL servers) or it could be a “cloudish” snapshot - an EBS snapshot or its equivalent in your environment. If you use a SAN as storage for your MySQL server and you can generate a snapshot of a volume, it also belongs here. We will focus mostly on AWS, though - it’s the most popular cloud environment.

In general, snapshots are a great way of backing up any data - they are quick and, while they add some overhead, this method has definitely more pros than cons. The main problem with backing up MySQL using a snapshot is consistency - taking a snapshot of a running server is comparable to a forced power off. If you run your MySQL server in full durability mode, you should be just fine. If not, it is possible that some of the transactions won’t make it to disk and, as a result, you will lose data. Of course, there are ways of dealing with this issue. First of all, you can switch to the most durable settings (SET GLOBAL innodb_flush_log_at_trx_commit=1, SET GLOBAL sync_binlog=1) prior to the snapshot and then revert to the original settings after the snapshot has been started. This is the least impacting way of making sure your snapshot is consistent. Another method involves stopping a slave (if replication is the only means of modifying data on a given host) and then running FLUSH TABLES. You can also stop all activity by using FLUSH TABLES WITH READ LOCK to get a consistent state of the database. What is important to keep in mind, though, is that no matter which approach you take, you will end up with data in a “crashed” state - if you’d like to use this data to create a new MySQL server, MySQL will have to perform InnoDB recovery procedures at the first start. InnoDB recovery may take a while, even hours - depending on the amount of modifications.
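
As a minimal sketch, assuming the host normally runs with relaxed settings such as innodb_flush_log_at_trx_commit=2 and sync_binlog=0, the sequence around the snapshot could look like this:

SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- make every commit hit disk
SET GLOBAL sync_binlog = 1;                     -- same for the binary log
-- trigger the LVM/EBS snapshot from the outside at this point
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- revert to the original (assumed) values
SET GLOBAL sync_binlog = 0;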

One way around this problem is to take cold backups. As they involve stopping MySQL before taking the snapshot, you can be sure the data is consistent, and getting a new server up is just a matter of starting MySQL. No recovery is needed, because the data came from a server which did a clean shutdown. Of course, stopping MySQL servers is not an ideal way to handle backups, but sometimes it is feasible. For example, maybe you have a slave dedicated to ad-hoc queries, executed manually, which does not have to be up all the time? You could use such a server as a backup host as well, shutting down MySQL from time to time in order to take a clean snapshot of its data.

As we discussed above, getting a consistent snapshot may be tricky at times. On the pro side, snapshots are a great way of provisioning new instances. This is true especially in the cloud, where you can create a new node with a few clicks or API calls. That is all true as long as you use a single volume for your data directory. Until recently, the only way to get decent I/O performance in EC2 was to use multiple EBS volumes and set up a RAID0 over them, because of the limit on how many pIOPS a single EBS volume could deliver. This limit has increased significantly (to 20k pIOPS), but even now there are still reasons to use a RAIDed approach. In such a setup, you can’t just take snapshots and hope for the best - such snapshots will be inconsistent at the RAID level, not to mention the MySQL level. A cold backup will still work, as MySQL is down and no disk activity should happen (as long as the MySQL data directory is located on a separate device). For more “hot” approaches, you may want to look at ec2-consistent-snapshot - a tool that gives you some options for performing a consistent snapshot of a RAIDed volume with several EBS volumes under the hood. It can help you automate some MySQL tasks like stopping a slave and running FLUSH TABLES WITH READ LOCK, and it can also freeze the filesystem at the operating system level. ec2-consistent-snapshot is tricky to set up and needs thorough testing, but it is one of the options to pick from.

Good practices and guidelines

We covered some of the ways in which you can take a backup of a MySQL database. It is time to put it all together and discuss how you could set up an efficient backup process.

The main problem is that every backup method has its pros and cons. Each also differs in its requirements and in how it affects regular workloads. As usual, how you make backups depends on your business requirements, environment and resources. We’d still like to share some guidelines with you.

First of all, you want the ability to perform point-in-time recovery. That means you have to copy the binary logs along with your backup. It can be either a disk-to-disk copy or an EBS snapshot of the volume where the binlogs are located - you just have to have them available.
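
One simple way to make sure that closed binary log files can be copied away safely (just a sketch; the actual copy happens outside of MySQL) is to rotate the log first:

FLUSH BINARY LOGS;   -- close the current binary log and start a new one
SHOW BINARY LOGS;    -- everything except the newest file can now be archived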

Second - you probably want the ability to restore single rows. Here, everything depends on your environment. One way would be to take a logical backup of your system, but that may be hard to execute on a large data set. On the other hand, if you can easily restore a database from a physical backup (for example, click to create a new EBS volume out of the snapshot, click to create a new EC2 instance, click to attach the EBS volume to it), you may be just fine with that process and not have to worry about logical backups at all.

For larger databases you will be forced to use one of the physical backup methods, because of the time needed to perform a logical one. The next question - how often do you want to perform a backup? You have binary logs, so, theoretically speaking, it should be fine to take a backup once per day and restore the rest of the data from the binlogs. In the real world, though, replaying binlogs is a slow and painful process. Of course, your mileage may vary - it all depends on the amount of modifications to the database. So, you need to test it: how quickly can you process and replay binary logs in your environment? How does that compare to your business requirements, which determine the maximum allowed downtime? If you use snapshots - how long does the recovery process take? Or, if you use a cold backup approach, how often can you stop MySQL and take a snapshot? Even on a dedicated instance, you can’t really do it more often than once per 15 - 30 minutes, workload and traffic permitting. Remember, a cold backup means replication lag, no matter whether you use regular replication or Galera Cluster (in Galera it’s just called differently - the node is in Desync state and applying missing writesets after IST). The backup node has to be able to catch up between backups.

Xtrabackup is a great tool for taking backups - using its incremental backup feature, you can easily take deltas every five minutes or so. On the other hand, restoring those increments may take a long time and is error-prone - there is a bunch of not-yet-discovered bugs in both xtrabackup and InnoDB which sometimes corrupt backups and render them useless. If one of the incremental backups is corrupted, the rest will not be usable. This leads us to another important point - how good is the backup data?

You have to test your backups. We mentioned it in a previous post - as part of your health checks you should verify that the backup, whichever method you choose to use, looks sane. Looking at file sizes is not enough, though. From time to time, for example on a monthly basis (but again, it depends on your business requirements), you should perform a full restore test - get a test server, install MySQL, restore the data from the backup, and check that you can join the Galera cluster or slave it off the master. Having backups is not enough - you need to ensure you have working backups.
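
A minimal sketch of the “slave it off the master” step of such a restore test could look like this (host name, credentials and binlog coordinates below are placeholders, not values from this post):

CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_USER='repl', MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4;
START SLAVE;
SHOW SLAVE STATUS\G   -- Seconds_Behind_Master should keep shrinking towards 0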

We hope this introduction to MySQL backup methods helps you find your own solution for safeguarding your data. The main thing to keep in mind is that you should not be afraid of testing - if you don’t know whether your backup process design makes sense, test it. As long as you have a working backup process that fulfills your organization’s requirements, there is no bad way of designing it. Just remember to test the restore from time to time and ensure you can still restore the database in a timely manner - databases change, and so does their content. Usually it grows. What was acceptable one year ago may not be acceptable today - you need to take that into consideration as well.

by Severalnines at June 08, 2015 03:10 PM

June 05, 2015

Shlomi Noach

Orchestrator visual cheatsheet

Orchestrator is growing. Supporting automatic detection of topologies, simple refactoring of topology trees, complex refactoring via Pseudo-GTID, failure detection and automated discovery, it is becoming larger and larger by the day.

One of the problems with growing projects is how to properly document them. Orchestrator enjoys a comprehensive manual, but as manuals get more and more detailed, it becomes difficult to get oriented and pointed in the right direction. I've done my best to call out the simple use cases throughout the manual.

One thing that is difficult to put into words is topologies. Explaining "failover of an intermediate master S1 that has S2,...,Sn slaves onto a sibling of S1 provided that..." is too verbose. So here's a quick visual cheatsheet for (current) topology refactoring commands. Refactoring commands are a mere subset of overall orchestrator commands, but they're great to play with and perfect for visualization.

The "move" and related commands use normal replication commands (STOP SLAVE; CHANGE MASTER TO; START SLAVE UNTIL;"...).

The "match" and related commands utilize Pseudo-GTID and use more elaborate MySQL commands (SHOW BINLOG EVENTS, SHOW RELAYLOG EVENTS).

So without further ado, here's what each command does (and do run "orchestrator" from the command line to get a man-like explanation of everything, or just go to the manual).

[18 cheatsheet images: orchestrator-cheatsheet-visualized-1 through orchestrator-cheatsheet-visualized-18]

by shlomi at June 05, 2015 12:19 PM

June 04, 2015

MariaDB Foundation

MariaDB 10.1.5 now available

Download MariaDB 10.1.5

Release Notes | Changelog | What is MariaDB 10.1?

MariaDB APT and YUM Repository Configuration Generator

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.5. This is a Beta release.

See the Release Notes and Changelog for detailed information on this release and the What is MariaDB 10.1? page in the MariaDB Knowledge Base for general information about the MariaDB 10.1 series.

Thanks, and enjoy MariaDB!

by Daniel Bartholomew at June 04, 2015 05:49 PM

June 02, 2015

MariaDB AB

Protect Your Data: Row-level Security in MariaDB 10.0

geoff_montee_g

Most MariaDB users are probably aware of the privilege system available in MariaDB and MySQL. Privileges control what databases, tables, columns, procedures, and functions a particular user account can access. For example, if an application stored credit card data in the database, this data should probably be protected from most users. To make that happen, the DBA might disallow access to the table or column storing this sensitive data.

However, sometimes the privilege system isn't sufficient to secure data. Sometimes data needs to be secured beyond tables and columns. In those cases, row-level security (sometimes abbreviated RLS) may be necessary. Possible use cases for row-level security are:

  • A government agency might only allow a user to see a row based on classification (CONFIDENTIAL, SECRET, TOP SECRET) and other factors.
  • An e-commerce site storing credit card information might only allow users to see the credit cards tied to their account.
  • A hospital or clinic might only allow staff to see records for patients that they are authorized to see.
  • A regional store manager might only be able to see employment records for employees and inventory records for stores in their region.

MariaDB's privilege system does not support row-level privileges, so developers and DBAs need to find another way to implement row-level security.

Sometimes, the row-level security logic is taken care of by the application. Other times, it can be more effective or better design to put the row-level security logic into the database. For example, if multiple applications use the same database, it might be better for the database to handle security. That way, the security functionality only has to be designed once, and it works the same for every application.

In this blog post, I will show a very simple way to implement row-level security in MariaDB 10.0 using the following features:

  • views defined with SQL SECURITY DEFINER and WITH CHECK OPTION
  • stored functions
  • the SESSION_USER() function

Of course, this is just a simple example. This is not the only way to implement row-level security in MariaDB.

Security Labels and Policies

To implement row-level security, you need two things:

  • Some way to label the data. This might be the name of the owner of the data, or a classification level (CONFIDENTIAL, SECRET, TOP SECRET), or it might be something else entirely.
  • Some rules or policies that outline which users can see data labelled with each security label.

Real world security labels and policies can be very complicated. There might be a hierarchical system of labels, or there might be several groups of labels that contribute different authorization information to the policy.

In this example, we will use a very simple labelling system. Data will be labelled using colors. For a user to access data labelled with the red security label, the user needs to be granted access to the red security label. For the user to access data labelled blue, the user needs to be granted access to the blue security label. The labels of each color work exactly the same way.

Now, let's start creating the database objects.

First, let's create a database to store access information.

CREATE DATABASE accesses;

Second, let's store the possible security labels. Bitstrings can be a good way to efficiently store a lot of security labels. Each label is assigned a bit field, and then bitwise operations can be used to get/set individual labels from the bitstring.

We will use bitstrings to store the labels that a user can access, so let's also store the bit field of the label in a BIT column.

CREATE TABLE accesses.security_labels (
	id INT AUTO_INCREMENT PRIMARY KEY,
	security_label VARCHAR(50),
	label_value BIT(5)
);

INSERT INTO accesses.security_labels (security_label, label_value) VALUES
	('red', b'00001'),
	('blue', b'00010'),
	('green', b'00100'),
	('yellow', b'01000'),
	('purple', b'10000');

Third, let's store the actual access levels for the user accounts.

CREATE TABLE accesses.user_accesses (
	id INT AUTO_INCREMENT PRIMARY KEY,
	user VARCHAR(50),
	access_label_values BIT(5)
);

INSERT INTO accesses.user_accesses (user, access_label_values) VALUES
	('root@localhost', b'11111'),
	('alice@localhost', b'00011'),
	('bob@localhost', b'11100'),
	('trudy@localhost', b'00000');
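
As a hypothetical follow-up (not part of the original setup), the bitstring layout also makes later changes cheap: granting alice@localhost access to the green label as well only requires a bitwise OR on the stored value.

UPDATE accesses.user_accesses
SET access_label_values = access_label_values | b'00100'   -- b'00100' is the green label
WHERE user = 'alice@localhost';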

Fourth, let's create a stored function to represent our row-level security policy.

The function takes user name X and security label Y, and it returns true if the user is allowed to access the label. Notice that the function uses the bitwise AND (&) operator to get the individual label's bit field from the bitstring column.

DELIMITER //

CREATE FUNCTION accesses.access_check (v_user VARCHAR(50), v_security_label VARCHAR(50)) 
RETURNS BOOLEAN
NOT DETERMINISTIC
READS SQL DATA
SQL SECURITY INVOKER
BEGIN
	SELECT label_value INTO @v_label_value
	FROM accesses.security_labels
	WHERE security_label = v_security_label;
	
	SELECT @v_label_value & access_label_values INTO @v_label_check
	FROM accesses.user_accesses 
	WHERE user = v_user;

	IF @v_label_check = @v_label_value THEN
		RETURN true;
	ELSE
		RETURN false;
	END IF;
END
//

DELIMITER ;

Now, let's test out the function with a few user and label combinations.

MariaDB [(none)]> SELECT accesses.access_check('alice@localhost', 'red');
+-------------------------------------------------+
| accesses.access_check('alice@localhost', 'red') |
+-------------------------------------------------+
|                                               1 |
+-------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT accesses.access_check('alice@localhost', 'blue');
+--------------------------------------------------+
| accesses.access_check('alice@localhost', 'blue') |
+--------------------------------------------------+
|                                                1 |
+--------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT accesses.access_check('alice@localhost', 'green');
+---------------------------------------------------+
| accesses.access_check('alice@localhost', 'green') |
+---------------------------------------------------+
|                                                 0 |
+---------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT accesses.access_check('bob@localhost', 'red');
+-----------------------------------------------+
| accesses.access_check('bob@localhost', 'red') |
+-----------------------------------------------+
|                                             0 |
+-----------------------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT accesses.access_check('bob@localhost', 'blue');
+------------------------------------------------+
| accesses.access_check('bob@localhost', 'blue') |
+------------------------------------------------+
|                                              0 |
+------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT accesses.access_check('bob@localhost', 'green');
+-------------------------------------------------+
| accesses.access_check('bob@localhost', 'green') |
+-------------------------------------------------+
|                                               1 |
+-------------------------------------------------+
1 row in set (0.00 sec)

Protecting the Data

Now that the user accounts' accesses are set up, let's set up some data to protect.

First, let's create a regular table with some labeled data.

CREATE DATABASE unprotected;

CREATE TABLE unprotected.important_data (
	id INT AUTO_INCREMENT PRIMARY KEY,
	data VARCHAR(50),
	security_label VARCHAR(50)
);

INSERT INTO unprotected.important_data (data, security_label) VALUES
	('correct', 'red'),
	('horse', 'blue'),
	('battery', 'green'),
	('stapler', 'yellow'),
	('correcter', 'purple');

Second, let's create a view that queries the unprotected table in a secure manner.

CREATE DATABASE protected;

CREATE 
SQL SECURITY DEFINER
VIEW protected.important_data
AS
	SELECT *
	FROM unprotected.important_data uid
	WHERE accesses.access_check(SESSION_USER(), uid.security_label)
WITH CHECK OPTION;

Some things to note here:

  • The protected.important_data view queries the unprotected.important_data table.
  • The view adds a WHERE clause that filters the results based on the accesses of SESSION_USER().
  • SESSION_USER() has to be used, rather than CURRENT_USER(), since the view is defined with SQL SECURITY DEFINER.
  • SQL SECURITY DEFINER has to be used, since the view's invoker (i.e. a normal user) usually won't have privileges to directly access the unprotected.important_data table or the accesses.access_check function. (Giving regular users direct access to these objects may allow ways to bypass the security mechanisms.)
  • The WITH CHECK OPTION makes it so that users can only insert and update data that they are authorized to see. Depending on the type of data, if a user is inserting data that they aren't authorized to see, it could mean that a security incident of some kind (potentially outside the database) has already occurred, which allowed the user to receive that data. (A quick sketch of this behavior follows the list.)
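
For instance, once the test accounts from the next section exist, an INSERT like the following, executed as alice (who can only access red and blue), should be rejected by the CHECK OPTION rather than silently creating a row she cannot see. This is just a sketch, not part of the original test run:

INSERT INTO protected.important_data (data, security_label)
VALUES ('plains', 'green');   -- expected to fail with a CHECK OPTION error for alice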

Testing the Interface

Now that everything is set up, let's create some user accounts and test it out.

First, create an anonymous account and grant it access to the protected database.

CREATE USER ''@'localhost';
GRANT SELECT, INSERT, UPDATE, DELETE ON protected.* TO ''@'localhost';

Now we can log in as any user to this database.

[gmontee@localhost ~]$ mysql -u alice --execute="SELECT SESSION_USER(), CURRENT_USER();"
+-----------------+----------------+
| SESSION_USER()  | CURRENT_USER() |
+-----------------+----------------+
| alice@localhost | @localhost     |
+-----------------+----------------+
[gmontee@localhost ~]$ mysql -u bob --execute="SELECT SESSION_USER(), CURRENT_USER();"
+----------------+----------------+
| SESSION_USER() | CURRENT_USER() |
+----------------+----------------+
| bob@localhost  | @localhost     |
+----------------+----------------+

Now let's test out some queries using different user accounts.

[gmontee@localhost ~]$ mysql -u root --execute="SELECT * FROM protected.important_data"
+----+-----------+----------------+
| id | data      | security_label |
+----+-----------+----------------+
|  1 | correct   | red            |
|  2 | horse     | blue           |
|  3 | battery   | green          |
|  4 | stapler   | yellow         |
|  5 | correcter | purple         |
+----+-----------+----------------+
[gmontee@localhost ~]$ mysql -u alice --execute="SELECT * FROM protected.important_data"
+----+---------+----------------+
| id | data    | security_label |
+----+---------+----------------+
|  1 | correct | red            |
|  2 | horse   | blue           |
+----+---------+----------------+
[gmontee@localhost ~]$ mysql -u bob --execute="SELECT * FROM protected.important_data"
+----+-----------+----------------+
| id | data      | security_label |
+----+-----------+----------------+
|  3 | battery   | green          |
|  4 | stapler   | yellow         |
|  5 | correcter | purple         |
+----+-----------+----------------+
[gmontee@localhost ~]$ mysql -u trudy --execute="SELECT * FROM protected.important_data"
[gmontee@localhost ~]$ mysql -u alice --execute="SELECT * FROM protected.important_data WHERE security_label='purple'"
[gmontee@localhost ~]$ mysql -u alice --execute="SELECT * FROM protected.important_data WHERE security_label='red'"
+----+---------+----------------+
| id | data    | security_label |
+----+---------+----------------+
|  1 | correct | red            |
+----+---------+----------------+

The row-level security mechanism built into the view appears to work great. But what happens if these users try to query the actual table, rather than the view?

[gmontee@localhost ~]$ mysql -u root --execute="SELECT * FROM unprotected.important_data"
+----+-----------+----------------+
| id | data      | security_label |
+----+-----------+----------------+
|  1 | correct   | red            |
|  2 | horse     | blue           |
|  3 | battery   | green          |
|  4 | stapler   | yellow         |
|  5 | correcter | purple         |
+----+-----------+----------------+
[gmontee@localhost ~]$ mysql -u alice --execute="SELECT * FROM unprotected.important_data"
ERROR 1142 (42000) at line 1: SELECT command denied to user ''@'localhost' for table 'important_data'
[gmontee@localhost ~]$ mysql -u bob --execute="SELECT * FROM unprotected.important_data"
ERROR 1142 (42000) at line 1: SELECT command denied to user ''@'localhost' for table 'important_data'
[gmontee@localhost ~]$ mysql -u trudy --execute="SELECT * FROM unprotected.important_data"
ERROR 1142 (42000) at line 1: SELECT command denied to user ''@'localhost' for table 'important_data'

The root account can query the original table, but our other accounts don't have sufficient privileges.

Performance Concerns of Stored Functions

It was pointed out that using a non-deterministic stored function as the row-level security policy doesn't give the optimizer much power to optimize many queries. Even if indexes on expressions are added in a future version of MariaDB, it might not be safe to create an index on a non-deterministic function.

I chose to use stored functions as a way to allow for abstraction and code reuse. The stored function represents a single row-level security policy, which can be reused for all tables that need to be protected. If the policy ever needs to change, it only needs to be changed in one place.

If performance of this method is not acceptable, it may be necessary to remove the abstraction of the stored function. In that case, it is usually very easy to put the row-level security policy directly in the view. In our example, that would look like this:

CREATE 
SQL SECURITY DEFINER
VIEW protected.important_data
AS
	SELECT uid.*
	FROM unprotected.important_data uid
	JOIN accesses.user_accesses aua
	ON aua.user = SESSION_USER()
	JOIN accesses.security_labels asl
	ON asl.security_label = uid.security_label
	WHERE asl.label_value & aua.access_label_values = asl.label_value
WITH CHECK OPTION;

Here we are trading slightly more complex code for the ability to let MariaDB optimize better.

Conclusion

Although MariaDB 10.0 doesn't have a built-in row-level security mechanism, it is still fairly easy to implement row-level security with built-in features.

Has anyone been using row-level security implementations in MariaDB? Do you have any suggestions on how to improve MariaDB to make this better?

About the Author

geoff_montee_g's picture

Geoff Montee is a Support Engineer with MariaDB. He has previous experience as a Database Administrator/Software Engineer with the U.S. Government, and as a System Administrator and Software Developer at Florida State University.

by geoff_montee_g at June 02, 2015 07:07 PM

June 01, 2015

Chris Calender

MySQL 5.5.44 Overview and Highlights

MySQL 5.5.44 was recently released (it is the latest MySQL 5.5, is GA), and is available for download here:

http://dev.mysql.com/downloads/mysql/5.5.html

This release, similar to the last 5.5 release, is mostly uneventful.

There were 0 “Functionality Added or Changed” items this time, and just 15 overall bugs fixed.

Out of the 15 bugs, there were 5 InnoDB bugs (1 of which also spans partitioning), 1 security-related bug, 1 performance-related, and 3 additional potential crashing bugs. Here are the ones worth noting:

  • InnoDB: An assertion was raised on shutdown due to XA PREPARE transactions holding explicit locks.
  • InnoDB: Removal of a foreign key object from the data dictionary cache during error handling caused the server to exit.
  • InnoDB: SHOW ENGINE INNODB STATUS output showed negative reservation and signal count values due to a counter overflow error.
  • InnoDB: Estimates that were too low for the size of merge chunks in the result sorting algorithm caused a server exit.
  • InnoDB; Partitioning: The CREATE_TIME column of the INFORMATION_SCHEMA.TABLES table now shows the correct table creation time for partitioned InnoDB tables. The CREATE_TIME column of the INFORMATION_SCHEMA.PARTITIONS table now shows the correct partition creation time for a partition of partitioned InnoDB tables. The UPDATE_TIME column of the INFORMATION_SCHEMA.TABLES table now shows when a partitioned InnoDB table was last updated by an INSERT, DELETE, or UPDATE. The UPDATE_TIME column of the INFORMATION_SCHEMA.PARTITIONS table now shows when a partition of a partitioned InnoDB table was last updated. (Bug #69990)
  • Security-related: A user with a name of event_scheduler could view the Event Scheduler process list without the PROCESS privilege.
  • Performance-related: Certain queries for the INFORMATION_SCHEMA TABLES and COLUMNS tables could lead to excessive memory use when there were large numbers of empty InnoDB tables. (Bug #72322)
  • Crashing Bug: SHOW VARIABLES mutexes were being locked twice, resulting in a server exit.
  • Crashing Bug: Under certain conditions, the libedit command-line library could write outside an array boundary and cause a client program crash.
  • Crashing Bug: For a prepared statement with an ORDER BY that refers by column number to a GROUP_CONCAT() expression that has an outer reference, repeated statement execution could cause a server exit.

I don't think I'd call any of these urgent for everyone, but if you are running 5.5, especially an older 5.5 release, you should consider upgrading.

For reference, the full 5.5.44 changelog can be viewed here:

http://dev.mysql.com/doc/relnotes/mysql/5.5/en/news-5-5-44.html

Hope this helps.

by chris at June 01, 2015 11:06 PM

MariaDB 10.1.4 Overview and Highlights

MariaDB 10.1.4 was recently released, and is available for download here:

https://downloads.mariadb.org/mariadb/10.1.4/

This is the 2nd beta, and 5th overall, release of MariaDB 10.1. Now that it is beta, there were not as many major changes in this release (compared to 10.1.3), but there were a few notable items as well as many overall bugs fixed (I counted 367).

Since it’s beta, I’ll only cover the major changes and additions, and omit covering general bug fixes (feel free to browse them all here).

To me, these are the highlights:

Of course, it goes without saying that you should not use this for production systems, as it is only the 2nd beta release of 10.1. However, I definitely recommend installing it on a test server and testing it out. And if you happen to be running a previous version of 10.1, then you should definitely upgrade to this latest release.

You can read more about the 10.1.4 release here:

https://mariadb.com/kb/en/mariadb-1014-release-notes/

And if interested, you can review the full list of changes in 10.1.4 (changelogs) here:

https://mariadb.com/kb/en/mariadb-1014-changelog/

Hope this helps.

by chris at June 01, 2015 05:48 PM

May 31, 2015

Valeriy Kravchuk

Fun with Bugs #36 - Bugs fixed in MySQL 5.6.25

Two days ago Oracle released MySQL 5.6.25, so it's time to check which bugs reported by the MySQL Community are fixed there. As usual, I'll mention both the bug reporter and the engineer who verified the bug. Please pay attention to the fixes in replication and partitioning - if you use these features (or run queries against INFORMATION_SCHEMA with a lot of complex tables in your database), please consider upgrading ASAP.

The following InnoDB related bugs were fixed:
  • Bug #69990 - CREATE_TIME and UPDATE_TIME are wrong for partitioned tables. Finally this bug reported by my colleague Justin Swanhart and verified by Umesh (almost immediately after it was reported) is fixed!
  • Bug #75790 - memcached SET command accepts negative values for expire time. This bug (that Oracle put into the InnoDB section of the release notes) was reported and verified by Umesh.
  • Bug #74686  - Wrong relevance ranking for InnoDB full text searches under certain conditions. This bug was reported by Tim McLaughlin and verified by Miguel Solorzano.
  • Last but not least, the new innodb_stress test suite by Mark Callaghan is now included, thanks to Bug #76347 reported by Viswanatham Gudipati.
Oracle fixed several more memcached and InnoDB-related bugs in 5.6.25, but as they were reported only internally, they are out of the scope of my posts.

A set of related bugs in Partitioning category was fixed:
  • Bug #74288 - Assertion `part_share->partitions_share_refs->num_parts >= m_tot_parts' failed. It was reported by my colleague Roel Van de Paar and verified by Umesh.
  • Bug #74634 - this bug is still private, so we do not see the details.
  • Bug #74451 - this bug is also private. We can probably assume that in the case of private bugs there were assertion failures or crashes on non-debug builds. So, if you use partitioning a lot, please consider upgrading to 5.6.25 ASAP.
A lot of replication related bugs were fixed in 5.6.25:
  • Bug #75879 - memory consumed quickly while executing loop in procedure. It was reported by Zhai Weixiang (who had also provided a patch) and verified by Shane Bester. If you ask me, based on his contributions over the last 2 years, it's about time for Percona to hire Zhai Weixiang into our development team, before Oracle approaches him first. He is a really brilliant engineer!
  • Bug #75781 - log lock may not be unlocked if add_logged_gtid failed. It was reported by Fangxin Flou (who had provided a patch as well) and verified by Sinisa Milivojevic.
  • Bug #75769 - this bug is still private. The release notes describe the problem as follows: "A slave running MySQL 5.6.24 or earlier could not connect to a master running MySQL 5.7.6 and later that had gtid_mode=OFF_PERMISSIVE or gtid_mode=ON_PERMISSIVE." I wonder why such a bug can be private. Either it was reported like that, or we do not see all the details about the impact.
  • Bug #75574 - Can not execute change master after Error occurred in MTS mode. It was reported by Zhang Yingqiang and verified by Sveta Smirnova (while she still worked in Oracle).
  • Bug #75570 - semi-sync replication performance degrades with a high number of threads. The problem was studied in details and reported by Rene' Cannao' and verified by Umesh.
  • Bug #74734  - mysqlbinlog can't decode events > ~1.6GB. It was reported by Hartmut Holzgraefe and verified by Umesh.
  • Bug #69848 - mysql 5.6 slave out of memory error. It was reported by Jianjun Yang and verified by Sveta Smirnova. Bug #72885 (where Shane Bester had clearly identified the memory leak) was declared a duplicate. If you use master-info-repository = TABLE on your 5.6.x slaves, please consider upgrading to 5.6.25 ASAP.
  • Bug #70711 - mysqlbinlog prints invalid SQL from relay logs when GTID is enabled. This bug was reported by Yoshinori Matsunobu and probably verified formally by Luis Soares.
 There are several fixes in other categories:
  • Bug #75740 - Fix errors detected by ASan at runtime. It was reported and verified by Anitha Gopi based on a request from the WebScaleSQL team.
  • Bug #76612 - would like ability to throttle firewall ACCESS DENIED messages in error log. This feature was requested by Shane Bester. Should I tell you again how happy I am when I see public bug reports from Oracle employees?
  • Bug #76552 - Cannot shutdown MySQL using JDBC driver. This regression bug was reported by Davi Arnaut (who provided a patch as well) and verified by Umesh.
  • Bug #76019 is private. Release notes say: "Inappropriate -Werror options could appear in mysql_config --cflags output." Why on earth anyone would set or leave this bug as private is beyond my imagination.
  • Bug #74517 - thread/sql/main doesn't change state/info after startup. PERFORMANCE_SCHEMA was meant to be perfect already, but still some fixes are needed. The bug was reported by Kolbe Kegel and verified by Umesh.
  • Bug #72322 - Query to I_S.tables and I_S.columns leads to huge memory usage. Now I am impressed and I want to check the fix ASAP (as the release notes do not say much)! If this bug (reported by my colleague Przemyslaw Malkowski just a few weeks ago, on April 11, and verified by Umesh) is really fixed, it's a huge step forward in making INFORMATION_SCHEMA usable.
  • Bug #69638 - Wrong results when running a SELECT that includes a HAVING based on a function. The only optimizer bug from Community fixed in this version was reported by Roger Esteban and verified by Umesh.
  • Bug #69453 - Prepared statement is written to general query log after its execution is finish. It was reported by my colleague Sergei Glushchenko and verified by Umesh.
  • Bug #68999 - SSL_OP_NO_COMPRESSION not defined. It was reported by Remi Collet and verified probably by Georgi Kodinov.
To summarize, 24 or so bug reports from the public bugs database were fixed in 5.6.25; among them, the fixes for replication, partitioned tables and INFORMATION_SCHEMA look really important and impressive. At least 10 of these bug reports were verified by Umesh. 4 bugs remain private, which I think is probably wrong.

by Valeriy Kravchuk (noreply@blogger.com) at May 31, 2015 12:58 PM

May 27, 2015

Chris Calender

MariaDB 10.0.19 Overview and Highlights

MariaDB 10.0.19 was recently released, and is available for download here:

https://downloads.mariadb.org/mariadb/10.0.19/

This is the tenth GA release of MariaDB 10.0, and 20th overall release of MariaDB 10.0.

This was a quick release in order to get a fix for a mysql_upgrade bug (MDEV-8115) introduced in 10.0.18, so there is that, and only 9 other bug fixes.

Here are the main items of note:

  • Fixed the server crash caused by mysql_upgrade (MDEV-8115)
  • Connect upgraded to 1.03.0007

Due to the mysql_upgrade bug fix as well as all of the fixes in MariaDB 10.0.18 (including 5 Security fixes), I would definitely recommend upgrading to this if you are running a prior version of MariaDB 10.0, especially 10.0.18.

You can read more about the 10.0.19 release here:

https://mariadb.com/kb/en/mariadb-10019-release-notes/

And if interested, you can review the full list of changes in 10.0.19 (changelogs) here:

https://mariadb.com/kb/en/mariadb-10019-changelog/

Hope this helps.

 

by chris at May 27, 2015 02:21 AM

MariaDB 10.0.18 Overview and Highlights

MariaDB 10.0.18 was recently released, and is available for download here:

https://downloads.mariadb.org/mariadb/10.0.18/

This is the ninth GA release of MariaDB 10.0, and 19th overall release of MariaDB 10.0.

There were no major functionality changes, but there were some general improvements, several security fixes, plus a 10.0.18 mysql_upgrade caution, and quite a few bug fixes, so let me cover what I feel are the main items of note:

  • Security Fixes: Fixes for the following security vulnerabilities:
  • InnoDB upgraded to 5.6.24
  • XtraDB upgraded to 5.6.23-72.1
  • Spider upgraded to 3.2.21
  • mroonga upgraded to 5.02
  • Performance Schema upgraded to 5.6.24
  • Connect upgraded to 1.03.0006
  • Deprecation Notice: As per the MariaDB Deprecation Policy, this will be the final release of MariaDB 5.5 for Fedora 19 “Schrödinger’s Cat”, Ubuntu 10.04 LTS “Lucid”, and Mint 9 LTS “Isadora”. When the next version of MariaDB 5.5 is released, repositories for these distributions will go away.
  • Important mysql_upgrade Caution: This version introduced a serious bug in mysql_upgrade. If you are already running a MariaDB 5.5.x version, then you can safely skip running mysql_upgrade. However, if you are migrating from MySQL to MariaDB 5.5, then take note of this bug. For this specific bug, the problem appears if the targeted databases include data structures such as views with binary or text blobs; the malfunction is in the REPAIR VIEW statement which the script calls. (A quick way to check for such views is sketched after this list.)
    • The fix will appear in MariaDB 5.5.44, which will be available soon (MariaDB 5.5.44 includes all MySQL 5.5.44 fixes, so it will be available very shortly after MySQL 5.5.44 is released).
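
A quick and admittedly rough way to check whether your databases contain views exposing BLOB or TEXT columns, i.e. whether this REPAIR VIEW issue could bite you during a migration, is a query along these lines (just a sketch):

SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME, c.DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS c
JOIN INFORMATION_SCHEMA.VIEWS v
  ON v.TABLE_SCHEMA = c.TABLE_SCHEMA AND v.TABLE_NAME = c.TABLE_NAME
WHERE c.DATA_TYPE LIKE '%blob%' OR c.DATA_TYPE LIKE '%text%';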

Given the security fixes, if you are running a prior version of 10.0, I would definitely recommend upgrading. However, due to the mysql_upgrade bug in this version, I recommend upgrading to 10.0.19 instead (as it contains the fix for this bug).

You can read more about the 10.0.18 release here:

https://mariadb.com/kb/en/mariadb-10018-release-notes/

And if interested, you can review the full list of changes in 10.0.18 (changelogs) here:

https://mariadb.com/kb/en/mariadb-10018-changelog/

Hope this helps.

 

by chris at May 27, 2015 02:19 AM

MariaDB 5.5.43 Overview and Highlights

MariaDB 5.5.43 was recently released (it is the latest MariaDB 5.5), and is available for download here:

https://downloads.mariadb.org/mariadb/5.5.43/

This is a maintenance release, and so there were not too many major changes, but definitely a few worth mentioning, as well as one *important* caution:

  • Security Fixes: Fixes for the following security vulnerabilities:
  • Deprecation Notice: As per the MariaDB Deprecation Policy, this will be the final release of MariaDB 5.5 for Fedora 19 “Schrödinger’s Cat”, Ubuntu 10.04 LTS “Lucid”, and Mint 9 LTS “Isadora”. When the next version of MariaDB 5.5 is released, repositories for these distributions will go away.
  • Includes all bugfixes and updates from MySQL 5.5.43 (MySQL 5.5.43 Overview and Highlights)
  • TokuDB upgraded to 7.5.6
  • XtraDB upgraded to 5.5.42-37.1
  • Important mysql_upgrade Caution: This version introduced a serious bug in mysql_upgrade. If you are already running a MariaDB 5.5.x version, then you can safely skip running mysql_upgrade. However, if you are migrating from MySQL to MariaDB 5.5, then take note of this bug. For this specific bug, the problem appears if the targeted databases include data structures such as views with binary or text blobs; the malfunction is in the REPAIR VIEW statement which the script calls.
    • The fix will appear in MariaDB 5.5.44, which will be available soon (MariaDB 5.5.44 includes all MySQL 5.5.44 fixes, so it will be available very shortly after MySQL 5.5.44 is released).

Given the security fixes, you may want to review the CVEs to see if this is something you need to address. Also, if running TokuDB or XtraDB, you may also want to benefit from those fixes, as well as the new MariaDB fixes. However, if you plan on migrating from MySQL and the above bug is relevant to you, then you should either migrate to MariaDB 5.5.42, wait for 5.5.44, or possibly upgrade to MariaDB 10.0 (10.0.19 also contains the fix).

If interested, the official MariaDB 5.5.43 release notes are here:

https://mariadb.com/kb/en/mariadb/development/release-notes/mariadb-5543-release-notes/

And the full list of fixed bugs and changes in MariaDB 5.5.43 can be found here:

https://mariadb.com/kb/en/mariadb/development/changelogs/mariadb-5543-changelog/

Hope this helps.

 

by chris at May 27, 2015 01:30 AM

May 22, 2015

MariaDB AB

Optimizing Out-of-order Parallel Replication with MariaDB 10.0

geoff_montee_g

Out-of-order parallel replication is a great feature in MariaDB 10.0 that improves replication performance by committing independent transactions in parallel on a slave. If slave_parallel_threads is greater than 0, then the SQL thread will instruct multiple worker threads to concurrently apply transactions with different domain IDs.

If an application is setting the domain ID, and if parallel replication is enabled in MariaDB, then out-of-order parallel replication should mostly work automatically. However, depending on an application's transaction size and the slave's lag behind the master, slave_parallel_max_queued may have to be adjusted. In this blog post, I'll show an example where this is the case.

Configure the master and slave

For our master, let's configure the following settings:

[mysqld]
max_allowed_packet=1073741824
log_bin
binlog_format=ROW
sync_binlog=1
server_id=1

For our slave, let's configure the following:

[mysqld]
server_id=2
slave_parallel_threads=2
slave_domain_parallel_threads=1
slave_parallel_max_queued=1KB

In our test, we plan to use two different domain IDs, so slave_parallel_threads is set to 2. Also, notice how small slave_parallel_max_queued is here: it is only set to 1 KB. With such a small value, it will be easier to see the behavior I want to demonstrate.

Set up replication on master

Now, let's set up the master for replication:

MariaDB [(none)]> CREATE USER 'repl'@'192.168.1.46' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.1.46';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> RESET MASTER;
Query OK, 0 rows affected (0.22 sec)

MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: master-bin.000001
        Position: 313
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT BINLOG_GTID_POS('master-bin.000001', 313);
+-------------------------------------------+
| BINLOG_GTID_POS('master-bin.000001', 313) |
+-------------------------------------------+
|                                           |
+-------------------------------------------+
1 row in set (0.00 sec)

If you've set up GTID replication with MariaDB 10.0 before, you've probably used BINLOG_GTID_POS to convert a binary log position to its corresponding GTID position. On newly installed systems like my example above, this GTID position might be blank.

Set up replication on slave

Now, let's set up replication on the slave:

MariaDB [(none)]> SET GLOBAL gtid_slave_pos ='';
Query OK, 0 rows affected (0.09 sec)

MariaDB [(none)]> CHANGE MASTER TO master_host='192.168.1.45', master_user='repl', master_password='password', master_use_gtid=slave_pos;
Query OK, 0 rows affected (0.04 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.45
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 313
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 601
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 313
              Relay_Log_Space: 898
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 
1 row in set (0.00 sec)

Create some test tables on master

Let's set up some test tables on the master. These will automatically be replicated to the slave. We want to test parallel replication with two domains, so we will set up two separate, but identical tables, in two different databases:

MariaDB [(none)]> CREATE DATABASE db1;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> CREATE TABLE db1.test_table (
    -> id INT AUTO_INCREMENT PRIMARY KEY,
    -> file BLOB
    -> );
Query OK, 0 rows affected (0.12 sec)

MariaDB [(none)]> CREATE DATABASE db2;
Query OK, 1 row affected (0.01 sec)

MariaDB [(none)]> CREATE TABLE db2.test_table (
    -> id INT AUTO_INCREMENT PRIMARY KEY,
    -> file BLOB
    -> );
Query OK, 0 rows affected (0.06 sec)

Stop SQL thread on slave

For the test, we want the slave to fall behind the master, and we want its relay log to grow. To make this happen, let's stop the SQL thread on the slave:

MariaDB [(none)]> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.02 sec)

Insert some data on master

Now, in a Linux shell on the master, let's create a random 1 MB file:

[gmontee@master ~]$ dd if=/dev/urandom of=/tmp/file.out bs=1MB count=1
1+0 records in
1+0 records out
1000000 bytes (1.0 MB) copied, 0.144972 s, 6.9 MB/s
[gmontee@master ~]$ chmod 0644 /tmp/file.out

Now, let's create a script to insert the contents of the file into both of our tables in db1 and db2 with different values of gtid_domain_id:

tee /tmp/domain_test.sql <<EOF
SET SESSION gtid_domain_id=1;
BEGIN;
INSERT INTO db1.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));
COMMIT;
SET SESSION gtid_domain_id=2;
BEGIN;
INSERT INTO db2.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));
COMMIT;
EOF

After that, let's run the script a bunch of times. We can do this with a bash loop:

[gmontee@master ~]$ { for ((i=0;i<1000;i++)); do cat /tmp/domain_test.sql; done; } | mysql --max_allowed_packet=1073741824 --user=root

Restart SQL thread on slave

Now the relay log on the slave should have grown quite a bit. Let's restart the SQL thread and watch the transactions get applied. To do this, let's open up two shells on the slave.

On the first shell on the slave, connect to MariaDB and restart the SQL thread:

MariaDB [(none)]> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)

On the second shell, let's look at SHOW PROCESSLIST output in a loop:

[gmontee@slave ~]$ for i in {1..1000}; do mysql --user=root --execute="SHOW PROCESSLIST;"; sleep 1s; done;

Take a look at the State column for the slave's SQL thread:

+----+-------------+-----------+------+---------+--------+-----------------------------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time   | State                                         | Info             | Progress |
+----+-------------+-----------+------+---------+--------+-----------------------------------------------+------------------+----------+
|  3 | system user |           | NULL | Connect |    139 | closing tables                                | NULL             |    0.000 |
|  4 | system user |           | NULL | Connect |    139 | Waiting for work from SQL thread              | NULL             |    0.000 |
|  6 | system user |           | NULL | Connect | 264274 | Waiting for master to send event              | NULL             |    0.000 |
| 10 | root        | localhost | NULL | Sleep   |     43 |                                               | NULL             |    0.000 |
| 21 | system user |           | NULL | Connect |     45 | Waiting for room in worker thread event queue | NULL             |    0.000 |
| 54 | root        | localhost | NULL | Query   |      0 | init                                          | SHOW PROCESSLIST |    0.000 |
+----+-------------+-----------+------+---------+--------+-----------------------------------------------+------------------+----------+

With such a low slave_parallel_max_queued value, it will probably say "Waiting for room in worker thread event queue" most of the time. The SQL thread does not have enough memory allocated to read ahead further in the relay log, which can prevent it from providing enough work for all of the worker threads. The worker threads will then probably show a State value of "Waiting for work from SQL thread" more often.
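
As a quick experiment (just a sketch; slave_parallel_max_queued is a global variable, and bouncing the SQL thread makes sure the new value is picked up cleanly), you could raise the limit on the slave and watch the thread states change:

STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_max_queued = 16777216;  -- 16 MB, an arbitrary larger value
START SLAVE SQL_THREAD;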

Conclusion

If you expect to be able to benefit from parallel slave threads, but you find that the State column in SHOW PROCESSLIST often shows "Waiting for room in worker thread event queue" for your SQL thread, you should try increasing slave_parallel_max_queued to see if that helps. The default slave_parallel_max_queued value of 132 KB will probably be acceptable for most workloads. However, if you have large transactions or if your slave falls behind the master often, and you hope to use out-of-order parallel replication, you may have to adjust this setting. Of course, most users probably want to avoid large transactions and slave lag for other reasons as well.

Has anyone run into this problem before? Were you able to figure out a solution on your own?

About the Author

geoff_montee_g's picture

Geoff Montee is a Support Engineer with MariaDB. He has previous experience as a Database Administrator/Software Engineer with the U.S. Government, and as a System Administrator and Software Developer at Florida State University.

by geoff_montee_g at May 22, 2015 07:19 AM

May 20, 2015

Open Query

More Cores or Higher Clock Speed?

This is a little quiz (it could be a discussion). I know what we tend to prefer (and why), but we’re interested in hearing other opinions!

Given the way MySQL/MariaDB is architected, what would you prefer to see in a new server, more cores or higher clock speed? (presuming other factors such as CPU caches and memory access speed are identical).

For example, you might have a choice between

  • 2x 2.4GHz 6 core, or
  • 2x 3.0GHz 4 core

which option would you pick for a (dedicated) MySQL/MariaDB server, and why?

And, do you regard the “total speed” (N cores * GHz) as relevant in the decision process? If so, when and to what degree?

by Arjen Lentz at May 20, 2015 01:12 AM

May 18, 2015

MariaDB Foundation

MariaDB Galera Cluster 10.0.19 and 5.5.43 now available

Download MariaDB Galera Cluster 10.0.19

Release Notes | Changelog | What is MariaDB Galera Cluster?

Download MariaDB Galera Cluster 5.5.43

Release Notes | Changelog | What is MariaDB Galera Cluster?

MariaDB APT and YUM Repository Configuration Generator

The MariaDB project is pleased to announce the immediate availability of MariaDB Galera Cluster 10.0.19 and 5.5.43. These are Stable (GA) releases.

See the Release Notes and Changelogs for detailed information on each release and the What is MariaDB Galera Cluster? page in the MariaDB Knowledge Base for general information about MariaDB Galera Cluster.

Thanks, and enjoy MariaDB!

by Daniel Bartholomew at May 18, 2015 05:31 PM

May 14, 2015

Chris Calender

MySQL 5.6.24 Overview and Highlights

MySQL 5.6.24 was recently released (it is the latest MySQL 5.6, is GA), and is available for download here.

For this release, there are 4 “Functionality Added or Changed” items:

  • Functionality Added/Changed: CMake support was updated to handle CMake version 3.1.
  • Functionality Added/Changed: The server now includes its version number when it writes the initial “starting” message to the error log, to make it easier to tell which server instance error log output applies to. This value is the same as that available from the version system variable. (Bug #74917)
  • Functionality Added/Changed: ALTER TABLE did not take advantage of fast alterations that might otherwise apply to the operation to be performed, if the table contained temporal columns found to be in pre-5.6.4 format (TIME, DATETIME, and TIMESTAMP columns without support for fractional seconds precision). Instead, it upgraded the table by rebuilding it. Two new system variables enable control over upgrading such columns and provide information about them:
    • avoid_temporal_upgrade controls whether ALTER TABLE implicitly upgrades temporal columns found to be in pre-5.6.4 format. This variable is disabled by default. Enabling it causes ALTER TABLE not to rebuild temporal columns and thereby be able to take advantage of possible fast alterations.
    • show_old_temporals controls whether SHOW CREATE TABLE output includes comments to flag temporal columns found to be in pre-5.6.4 format. Output for the COLUMN_TYPE column of the INFORMATION_SCHEMA.COLUMNS table is affected similarly. This variable is disabled by default. (A short sketch of both variables in use follows this list.)
  • Functionality Added/Changed: Statement digesting as done previously by the Performance Schema is now done at the SQL level regardless of whether the Performance Schema is compiled in and is available to other aspects of server operation that could benefit from it. The default space available for digesting is 1024 bytes, but can be changed at server startup using the max_digest_length system variable.
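
Relating to the avoid_temporal_upgrade and show_old_temporals variables described above, a minimal sketch of how they could be used follows; the table name legacy.orders is hypothetical:

SET SESSION show_old_temporals = ON;     -- flag pre-5.6.4 temporal columns in SHOW CREATE TABLE output
SHOW CREATE TABLE legacy.orders\G        -- old-format columns should now carry an explanatory comment
SET GLOBAL avoid_temporal_upgrade = ON;  -- subsequent ALTER TABLEs no longer rebuild such tables implicitly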

In addition to those, there were 50 other bug fixes:

  • 15 InnoDB
  •   4 Replication
  •   1 Partitioning
  • 30 Miscellaneous

The highlights for me are the Partitioning bug and 2 of the Replication bugs (of the 15 InnoDB bugs, 5 were related to full-text search, 6 were related to the Memcached plugin, and the other 4 were mostly obscure):

  • Partitioning: A number of ALTER TABLE statements that attempted to add partitions, columns, or indexes to a partitioned table while a write lock was in effect for this table were not handled correctly.
  • Replication: When gtid_mode=ON and slave_net_timeout was set to a low value, the slave I/O thread could appear to hang. This was due to the slave heartbeat not being sent regularly enough when the dump thread found many events that could be skipped. The fix ensures that the heartbeat is sent correctly in such a situation.
  • Replication: When replicating from a MySQL 5.7.6 or later server to a MySQL 5.6.23 or earlier server, if the older version applier thread encountered an Anonymous_gtid_log_event it caused an assert. The fix ensures that these new log events added in MySQL 5.7.6 and later do not cause this problem with MySQL 5.6.24 and later slaves.

Conclusions:

So while there were no major changes, the partitioning fix covered a number of bugs, the replication fixes could potentially be important for you, and the numerous InnoDB full-text and Memcached fixes would be important if you’re using either of those. Thus if you rely on any of this, I’d consider upgrading.

The full 5.6.24 changelogs can be viewed here (which has more details about all of the bugs listed above):

http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-24.html

Hope this helps. :)

 

by chris at May 14, 2015 08:56 PM

Oli Sennhauser

Controlling worldwide manufacturing plants with MySQL

A MySQL customer of FromDual has several manufacturing plants spread across the globe. They are operated by local companies. FromDual's customer wants to keep the manufacturing receipts centralized in a MySQL database at the headquarters in Europe. Each manufacturing plant should only see its own data.

gtid_replication_customer.png

Manufacturing log information should be reported back to the MySQL database at the European headquarters.

The process was designed as follows:

gtid_replication_production_plant.png

Preparation of Proof of Concept (PoC)

To simulate all cases we need different schemas: some which should be replicated, and some which should NOT be replicated:

CREATE DATABASE finance;

CREATE TABLE finance.accounting (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data` varchar(255) DEFAULT NULL,
  `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `data_rename` (`data`)
);


CREATE DATABASE crm;

CREATE TABLE crm.customer (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data` varchar(255) DEFAULT NULL,
  `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `data_rename` (`data`)
);


CREATE DATABASE erp;

-- Avoid specifying Storage Engine here!!!
CREATE TABLE erp.manufacturing_data (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT
, manufacture_plant VARCHAR(32)
, manufacture_info VARCHAR(255)
, PRIMARY KEY (id)
, KEY (manufacture_plant)
);

CREATE TABLE erp.manufacturing_log (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT
, manufacture_plant VARCHAR(32)
, log_data VARCHAR(255)
, PRIMARY KEY (id)
, KEY (manufacture_plant)
);

MySQL replication architecture

Before you start with such complicated MySQL set-ups it is recommended to make a little sketch of what you want to build:

gtid_replication_master_slave.png

Preparing the Production Master database (Prod M1)

To make use of all the new and cool features of MySQL we used the new GTID replication. First we set up a Master (Prod M1) and its fail-over system (Prod M2) at the customer's headquarters:

# /etc/my.cnf

[mysqld]

binlog_format            = row          # optional
log_bin                  = binary-log   # mandatory, also on Slave!
log_slave_updates        = on           # mandatory, also on Slave!
gtid_mode                = on           # mandatory, also on Slave!
enforce_gtid_consistency = on           # mandatory, also on Slave!
server-id                = 39           # mandatory, also on Slave!

This step requires a system restart (one minute downtime).
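
After the restart, a quick sanity check (a minimal sketch, assuming the MySQL 5.6 variable names) confirms that GTID replication is really active before the dump for Prod M2 is taken:

SHOW GLOBAL VARIABLES LIKE 'gtid_mode';
SHOW GLOBAL VARIABLES LIKE 'enforce_gtid_consistency';
SHOW MASTER STATUS;   -- Executed_Gtid_Set should grow with every new transaction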

Preparing the Production Master standby database (Prod M2)

On Master (Prod M1):

GRANT REPLICATION SLAVE ON *.* TO 'replication'@'192.168.1.%' IDENTIFIED BY 'secret';

mysqldump -u root --set-gtid-purged=on --master-data=2 --all-databases --triggers --routines --events > /tmp/full_dump.sql

On Slave (Prod M2):

CHANGE MASTER TO MASTER_HOST='192.168.1.39', MASTER_PORT=3306
, MASTER_USER='replication', MASTER_PASSWORD='secret'
, MASTER_AUTO_POSITION=1;
RESET MASTER;   -- On SLAVE!
system mysql -u root < /tmp/full_dump.sql
START SLAVE;

To make it easier for a Slave to connect to its master we set a VIP in front of those 2 database servers (VIP Prod). This VIP should be used by all applications in the headquarters and also by the filter engines.

Set-up filter engines (Filter BR and Filter CN)

To make sure every manufacturing plant sees only the data it is allowed to see we need a filtering engine between the production site and the manufacturing plant (Filter BR and Filter CN).

To keep this filter engine lean we use a MySQL instance with all tables converted to the Blackhole Storage Engine:

# /etc/my.cnf

[mysqld]

binlog_format            = row          # optional
log_bin                  = binary-log   # mandatory, also on Slave!
log_slave_updates        = on           # mandatory
gtid_mode                = on           # mandatory
enforce_gtid_consistency = on           # mandatory
server-id                = 36           # mandatory
default_storage_engine   = blackhole

On the production master (Prod M1) we get the data as follows:

mysqldump -u root --set-gtid-purged=on --master-data=2 --triggers --routines --events --no-data --databases erp > /tmp/erp_dump_nd.sql

The Filter Engines (Filter BR and Filter CN) are set up as follows:

-- Here we can use the VIP!
CHANGE MASTER TO master_host='192.168.1.33', master_port=3306
, master_user='replication', master_password='secret'
, master_auto_position=1;
RESET MASTER;   -- On SLAVE!

system cat /tmp/erp_dump_nd.sql | sed 's/ ENGINE=[a-zA-Z]*/ ENGINE=blackhole/' | mysql -u root

START SLAVE;

Do not forget to also create the replication user on the filter engines.

GRANT REPLICATION SLAVE ON *.* TO 'replication'@'192.168.1.%' IDENTIFIED BY 'secret';

Filtering out all non-ERP schemata

We only want the erp schema to be replicated to the manufacturing plants, not the crm or the finance application. We achieve this with the following options on the filter engines:

# /etc/my.cnf

[mysqld]

replicate_do_db                = erp
replicate_ignore_table         = erp.manufacturing_log
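
Whether the filters are really in place can be verified on a running filter engine; the corresponding fields of SHOW SLAVE STATUS should reflect them (a sketch, output abbreviated):

SHOW SLAVE STATUS\G
-- Replicate_Do_DB: erp
-- Replicate_Ignore_Table: erp.manufacturing_log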

MySQL row filtering

To achieve row filtering we use TRIGGERS. Make sure they are not replicated further down the hierarchy:

SET SESSION SQL_LOG_BIN = 0;

use erp

DROP TRIGGER IF EXISTS filter_row;

delimiter //

CREATE TRIGGER filter_row
BEFORE INSERT ON manufacturing_data
FOR EACH ROW
BEGIN

  IF ( NEW.manufacture_plant != 'China' ) THEN

    SIGNAL SQLSTATE '45000'
    SET MESSAGE_TEXT      = 'Row was filtered out.'
      , CLASS_ORIGIN      = 'FromDual filter trigger'
      , SUBCLASS_ORIGIN   = 'filter_row'
      , CONSTRAINT_SCHEMA = 'erp'
      , CONSTRAINT_NAME   = 'filter_row'
      , SCHEMA_NAME       = 'erp'
      , TABLE_NAME        = 'manufacturing_data'
      , COLUMN_NAME       = ''
      , MYSQL_ERRNO       = 1644
    ;
  END IF;
END;
//

delimiter ;

SET SESSION SQL_LOG_BIN = 1;

This filter must be applied for Brazil on the Brazil Filter node as well.

Up to now this would cause replication to stop for every filtered row. To avoid this we tell the filtering Slaves to skip this error number:

# /etc/my.cnf

[mysqld]

slave_skip_errors = 1644

Attaching production manufacturing Slaves (Man BR M1 and Man CN M1)

When we have finished everything at our headquarters site, we can start with the manufacturing sites (BR and CN):

On Master (Prod M1):

mysqldump -u root --set-gtid-purged=on --master-data=2 --triggers --routines --events --where='manufacture_plant="Brazil"' --databases erp > /tmp/erp_dump_br.sql

mysqldump -u root --set-gtid-purged=on --master-data=2 --triggers --routines --events --where='manufacture_plant="China"' --databases erp > /tmp/erp_dump_cn.sql

On the Manufacturing Masters (Man BR M1 and Man CN M1) we do NOT use a VIP, because we think a Blackhole Storage Engine instance is robust enough as a master:

CHANGE MASTER TO master_host='192.168.1.43', master_port=3306
, master_user='replication', master_password='secret'
, master_auto_position=1;
RESET MASTER;   -- On SLAVE!

system cat /tmp/erp_dump_br.sql | mysql -u root

START SLAVE;

The standby manufacturing databases (Man BR M2 and Man CN M2) are created in the same way as the production manufacturing databases on the masters.

Testing replication from HQ to manufacturing plants

First we make sure that crm and finance are not replicated out and that replication does not stop (run on Prod M1):

INSERT INTO finance.accounting VALUES (NULL, 'test data over VIP', NULL);
INSERT INTO finance.accounting VALUES (NULL, 'test data over VIP', NULL);
INSERT INTO crm.customer VALUES (NULL, 'test data over VIP', NULL);
INSERT INTO crm.customer VALUES (NULL, 'test data over VIP', NULL);
UPDATE finance.accounting SET data = 'Changed data';
UPDATE crm.customer SET data = 'Changed data';
DELETE FROM finance.accounting WHERE id = 1;
DELETE FROM crm.customer WHERE id = 1;

SELECT * FROM finance.accounting;
SELECT * FROM crm.customer;
SHOW SLAVE STATUS\G

The schema filter seems to work correctly. Then we check whether the row filter also works correctly. For this we have to run the queries in statement-based replication (SBR)! Otherwise the trigger would not fire:

use mysql

-- We are in RBR so row filter trigger does not apply:
INSERT INTO erp.manufacturing_data VALUES (NULL, 'China', 'Highly secret manufacturing info as RBR.');
INSERT INTO erp.manufacturing_data VALUES (NULL, 'Brazil', 'Highly secret manufacturing info as RBR.');

-- This needs SUPER privilege... :-(
SET SESSION binlog_format = STATEMENT;

-- Caution those rows will NOT be replicated!!!
-- See filter rules for SBR
INSERT INTO erp.manufacturing_data VALUES (NULL, 'China', 'Highly secret manufacturing info as SBR lost.');
INSERT INTO erp.manufacturing_data VALUES (NULL, 'Brazil', 'Highly secret manufacturing info as SBR lost.');

use erp

INSERT INTO manufacturing_data VALUES (NULL, 'China', 'Highly secret manufacturing info as SBR.');
INSERT INTO manufacturing_data VALUES (NULL, 'Brazil', 'Highly secret manufacturing info as SBR.');
INSERT INTO manufacturing_data VALUES (NULL, 'Germany', 'Highly secret manufacturing info as SBR.');
INSERT INTO manufacturing_data VALUES (NULL, 'Switzerland', 'Highly secret manufacturing info as SBR.');

SET SESSION binlog_format = ROW;

SELECT * FROM erp.manufacturing_data;

Production data back to head quarter

Now we have to take care of the production data on its way back to the HQ. To achieve this we use the new MySQL 5.7 feature called multi-source replication. For multi-source replication the replication repositories must be kept in tables instead of files:

# /etc/my.cnf

[mysqld]

master_info_repository    = TABLE   # mandatory
relay_log_info_repository = TABLE   # mandatory
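
Before configuring the channels, a quick check (a minimal sketch) confirms that the repositories really are table based:

SELECT @@GLOBAL.master_info_repository, @@GLOBAL.relay_log_info_repository;
-- both should return TABLE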

Then we have to configure 2 replication channels from Prod M1 to their specific manufacturing masters over the VIP (VIP BR and VIP CN):

CHANGE MASTER TO MASTER_HOST='192.168.1.98', MASTER_PORT=3306
, MASTER_USER='replication', MASTER_PASSWORD='secret'
, MASTER_AUTO_POSITION=1
FOR CHANNEL "manu_br";

CHANGE MASTER TO MASTER_HOST='192.168.1.99', MASTER_PORT=3306
, MASTER_USER='replication', MASTER_PASSWORD='secret'
, MASTER_AUTO_POSITION=1
FOR CHANNEL "manu_cn";

START SLAVE FOR CHANNEL 'manu_br';
START SLAVE FOR CHANNEL 'manu_cn';

SHOW SLAVE STATUS FOR CHANNEL 'manu_br'\G
SHOW SLAVE STATUS FOR CHANNEL 'manu_cn'\G

Avoid configuring and activating the channels on Prod M2 as well.

Testing back replication from manufacturing plants

Brazil on Man BR M1:

INSERT INTO manufacturing_log VALUES (1, 'Production data from Brazil', 'data');

China on Man CN M1:

INSERT INTO manufacturing_log VALUES (2, 'Production data from China', 'data');

For testing:

SELECT * FROM manufacturing_log;

Make sure you do not run into conflicts (Primary Keys, AUTO_INCREMENT values), and make sure the filtering is defined correctly!
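
One common way to avoid AUTO_INCREMENT collisions between the two manufacturing masters is to interleave their key ranges. A sketch; the concrete values are only an illustration (they can also be put into my.cnf):

-- On Man BR M1:
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 1;

-- On Man CN M1:
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 2;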

To check the different channel states you can use the following command:

SHOW SLAVE STATUS\G

or

SELECT ras.channel_name, ras.service_state AS 'SQL_thread', ras.remaining_delay
     , CONCAT(user, '@', host, ':', port) AS user
     , rcs.service_state AS IO_thread, REPLACE(received_transaction_set, '\n', '') AS received_transaction_set
  FROM performance_schema.replication_applier_status AS ras
  JOIN performance_schema.replication_connection_configuration AS rcc ON rcc.channel_name = ras.channel_name
  JOIN performance_schema.replication_connection_status AS rcs ON ras.channel_name = rcs.channel_name
;

Troubleshooting

Inject empty transaction

If you try to skip a transaction as you used to (with SQL_SLAVE_SKIP_COUNTER), you will face some problems:

STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with @@GLOBAL.GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction

To skip the next transaction you first have to find out which ones have been applied so far:

SHOW SLAVE STATUS\G
...
Executed_Gtid_Set: c3611091-f80e-11e4-99bc-28d2445cb2e9:1-20

then tell MySQL to skip this by injecting a new empty transaction:

SET SESSION GTID_NEXT='c3611091-f80e-11e4-99bc-28d2445cb2e9:21';

BEGIN;
COMMIT;

SET SESSION GTID_NEXT='AUTOMATIC';

SHOW SLAVE STATUS\G
...
Executed_Gtid_Set: c3611091-f80e-11e4-99bc-28d2445cb2e9:1-21

START SLAVE;

Revert from GTID-based replication to file/position-based replication

If you want to fall back from MySQL GTID-based replication to file/position-based replication, this is quite simple:

CHANGE MASTER TO MASTER_AUTO_POSITION = 0;
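
If you also want to pin the slave to explicit binary log coordinates at the same time, a hedged sketch could look like this (file name and position are example values taken from SHOW SLAVE STATUS):

STOP SLAVE;
SHOW SLAVE STATUS\G   -- note Relay_Master_Log_File and Exec_Master_Log_Pos
CHANGE MASTER TO MASTER_AUTO_POSITION = 0
, MASTER_LOG_FILE = 'binary-log.000007'   -- example value
, MASTER_LOG_POS  = 734;                  -- example value
START SLAVE;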

MySQL Support and Engineering

If you need some help or support, our MySQL support and engineering team is happy to help you.

by Shinguz at May 14, 2015 07:43 PM

May 11, 2015

Colin Charles

Upcoming opportunities to talk MySQL/MariaDB in May 2015

May is quickly shaping up to be a month filled with activity in the MySQL/MariaDB space. Just a quick note to talk about where I'll be; I'm looking forward to meeting folks to talk shop.

  1. The London MySQL Meetup Group – May 13 2015 – organized by former colleague & friend Ivan Zoratti, we will be doing a wrap up of recent announcements at Percona Live Santa Clara, and I’ll be showing off some of the spiffy new features we are building into MariaDB 10.
  2. MariaDB Roadshow London – May 19 2015 – I’m going to give an overview of our roadmap, and there will be many excellent talks by colleagues there. I believe MariaDB Corporation CEO Patrik Sallner and Stu Schmidt, President at Zend will also be there. Should be a fun filled day. 
  3. Internet Society (ISOC) Hong Kong World Internet Developer Summit – May 21-22 2015 – I’ll be giving a keynote about MariaDB and how we are trying to make it important Internet infrastructure as well as making it developer friendly. 
  4. O’Reilly Velocity 2015 – May 27-29 2015 – In 90 minutes I will attempt to give attendees (over 100 have already pre-registered) a tutorial overview of MySQL High Availability options and what their choices are in 2015. Expect a lot of talk on replication improvements from both MySQL & MariaDB, Galera Cluster, as well as tools around the ecosystem.

by Colin Charles at May 11, 2015 10:43 AM

MariaDB Foundation

Fortran and MariaDB

Introduction

Fortran (FORmula TRANslating System) is a general-purpose, imperative programming language that is especially suited to numeric computation and scientific computing. The history of FORTRAN can be traced back to late 1953, when John W. Backus submitted a proposal to his superiors at IBM. The first FORTRAN compiler appeared in April 1957.

Some notable historical steps were:

  • FORTRAN II in 1958
  • FORTRAN III in 1958
  • FORTRAN IV in 1962
  • FORTRAN 66 or X3.9-1966 became the first industry standard
  • FORTRAN 77 or X3.9-1978. This is the version of Fortran I learned in 1996.
  • Fortran 90 was released as ISO/IEC standard 1539:1991 and as an ANSI standard in 1992
  • Fortran 95 was released as ISO/IEC standard 1539-1:1997
  • Fortran 2003 was released as ISO/IEC 1539-1:2004
  • Fortran 2008, released as ISO/IEC 1539-1:2010, is the most recent standard
  • Fortran 2015 is planned for late 2016.

A more comprehensive history and introduction can be found, e.g., at http://en.wikipedia.org/wiki/Fortran.

Thus the Fortran programming language is not dead! I used Fortran on the very day I started writing this blog (05/07/2015). There is a historical reason why I decided to learn Fortran. In the Department of Computer Science, University of Helsinki, there is a course named Software Project where students design, implement and test a larger application. I participated in this course in 1996, and my application was ringing software for the Ringing Centre of the Natural History Museum, University of Helsinki. Their original software used magnetic tapes and Fortran 66/77 programs. Our assignment was to transform it to use an Oracle database and UNIX. At that time we decided to use Fortran 77 (with some Fortran 90 extensions, mainly structures) and the ProFortran precompiler from Oracle.

Compilers

There is a GNU Fortran compiler named GFortran. The GFortran compiler is fully compliant with the Fortran 95 standard and includes legacy F77 support. In addition, a significant number of Fortran 2003 and Fortran 2008 features are implemented.

In my experience GFortran is a very good compiler and really includes most of the legacy support you need (and a lot of new stuff I really do not need). However, I found one feature that is useful but not supported: variable-length formats. Consider the following:

cnt = max_year - min_year + 1
       WRITE (*, 20) (i, i = min_year, max_year)
   20  FORMAT ('Reng', <cnt>(2X, I4), 2X, '  Total')

Here the format (2X, I4) is repeated <cnt> times, where <cnt> depends on runtime values. This can be transformed to:

cnt = max_year - min_year + 1
       WRITE(fmt,'(A,I2,A,A)') '(A,',cnt,'(2X,I4)',',A)'
       WRITE (*, fmt) 'Reng', (i, i = min_year, max_year), ' Total'

This works because the format can be a string variable, and the above produces the format (A,44(2X,I4),A) (assuming years 1971 and 2014). But in my opinion the first one is clearer and simpler. Additionally, I learned to use the pre-Fortran 90 STRUCTURE and RECORD extensions, like:

STRUCTURE /TVERSION/
    CHARACTER *80  VERSION
  END STRUCTURE

  RECORD /TVERSION/ MARIADB

  MARIADB.VERSION = ''

This can naturally be expressed using TYPE:

TYPE t_version
    CHARACTER *80  :: version
  END TYPE

  TYPE(t_version) mariadb
  mariadb%version = ' '

I mostly use Fortran 90 and free-form source (longer line lengths than allowed by standard Fortran 77) but only a limited amount of the new features. Thus the code mostly still looks like Fortran 77:

50  CONTINUE
   55  FORMAT(I10,1X, A1, 1X, A)
       READ (10, 55, END = 70, ERR=800, IOSTAT = readstat, IOMSG=emsg) pesaid, rlaani, rkunta
       plaani(pesaid) = rlaani
       pkunta(pesaid) = rkunta
       GOTO 50

Naturally, there are a number of commercial Fortran compilers, like Intel Fortran (https://software.intel.com/en-us/fortran-compilers) and NAG (http://www.nag.com/nagware/np.asp).

Clearly one of the bad features of Fortran is implicit typing. If a variable is undeclared, Fortran 77 uses a set of implicit rules to establish the type. This means all variables starting with the letters i-n are integers and all others are real. Many old Fortran 77 programs use these implicit rules, but you should not! The probability of errors in your program grows dramatically if you do not consistently declare your variables. Therefore, always put the following at the start of your Fortran program:

PROGRAM myprogram

        IMPLICIT NONE  ! No implicit rules used, compiler error instead

SQL and Fortran

Fortran does not natively understand SQL statements, but you can use e.g. embedded SQL. Embedded SQL means SQL statements embedded inside a host language like Fortran. Let's take an example:

EXEC SQL BEGIN DECLARE SECTION
       CHARACTER *24 HTODAY
      EXEC SQL END DECLARE SECTION
      EXEC SQL INCLUDE SQLCA
      EXEC ORACLE OPTION (ORACA = YES)
      EXEC SQL INCLUDE ORACA
      EXEC SQL CONNECT :UID1 IDENTIFIED BY :UID2
      EXEC SQL SELECT TO_CHAR(SYSDATE,'YYYYMMDD HH24:MI:SS')
     -      INTO :HTODAY
     -      FROM DUAL

Naturally, a normal Fortran compiler will not understand clauses starting with EXEC SQL. Thus, you first need to use a precompiler. The precompiler expands the embedded SQL clauses: the INCLUDE clauses above are copied into the resulting file, and the other SQL clauses are transformed into CALL statements to the database server's API. Naturally, this means that your software will only work with the database provider it was precompiled (and then compiled) for.

Currently, there are precompilers at least for Oracle and DB2 databases (see http://en.wikipedia.org/wiki/Embedded_SQL). However, OS support is diminishing. E.g. the Oracle Fortran Precompiler no longer works on 64-bit Linux when using Oracle >10g. In my opinion this is bad, because porting your Fortran software from Oracle to e.g. DB2 is not trivial, especially if you have an application with 100,000 lines of Fortran code.

In my experience this fact has led to situations where part of the system is re-implemented using Java, and part of the code is modified to pure Fortran so that it reads its input from files (generated using plain SQL), removing all embedded SQL clauses.

Fortran and MariaDB

There are no connectors for Fortran to MariaDB/MySQL. You could use ODBC, but the free ODBC modules FLIBS and fodbc fail to compile on my 64-bit Linux, and after some hacking with fodbc, it did not really work. Naturally, you could write your own fodbc for MariaDB/MySQL, but currently I do not have a real need or enough free time to do so. An alternative way of doing this is to create a C-language interface between the Fortran code and the ODBC driver.

Let's take a very simple example where a Fortran program connects to a MariaDB database, selects the version and disconnects.

PROGRAM myodbc_test

  INTEGER :: RC

  TYPE t_version
    CHARACTER *80  :: version
  END TYPE

  TYPE(t_version) mariadb

  RC = 0

  RC = connect()
  mariadb%version='select version()'//char(0)
  RC = version(mariadb)
  CALL mstrc2f(mariadb%version)
  WRITE (*,'(A)') mariadb%version
  RC = disconnect()

  STOP

  END PROGRAM

  SUBROUTINE mstrc2f(STR)

      IMPLICIT NONE

      CHARACTER *(*) STR
      INTEGER   *4 MAX
      INTEGER   *4 IND
      CHARACTER *1  EOS
      EOS  = CHAR(0)
      MAX  = LEN(STR)
      IND = MAX
  100 CONTINUE
      IF ( IND .GE. 1 ) THEN
          IF ( STR(IND:IND) .EQ. EOS) THEN
              GO TO 200
          ENDIF

          STR(IND:IND) = ' '
          IND = IND - 1

          GO TO 100
      ENDIF


  200 CONTINUE

      IF (IND .GE. 1) THEN
          STR(IND:IND) = ' '
      ENDIF

      RETURN

      END

As you can see, string variables need special handling, because Fortran strings are fixed-length and not null-terminated. Therefore, we need to append the C string terminator before calling the C routines, and remove it again (padding with blanks) before using the string in Fortran. And here is a simple C interface (no real error handling):

#include <stdio.h>   /* fprintf */
#include <string.h>  /* strcpy */
#include <sql.h>     /* ODBC core API */
#include <sqlext.h>  /* ODBC extended API */

SQLHENV env;
SQLHDBC dbc;

int connect_(void) {

  SQLHSTMT stmt;
  SQLRETURN ret;
  SQLCHAR outstr[1024];
  SQLSMALLINT outstrlen;

  SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
  SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void *) SQL_OV_ODBC3, 0);
  SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
  ret = SQLDriverConnect(dbc, NULL, "DSN=test;", SQL_NTS,
			 outstr, sizeof(outstr), &outstrlen,
			 SQL_DRIVER_COMPLETE);

  return ret;
}

int disconnect_(void) {
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    fprintf(stderr, "Disconnected...\n");
    return 0;
}

typedef struct {
	char version[80];
} t_version;

int version_(t_version *version) {
	SQLHSTMT stmt;
	SQLRETURN ret;
	SQLSMALLINT columns;
	char buf[80];
	SQLLEN indicator;
	SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);

	fprintf(stderr, "Selecting version...\n");

	ret = SQLPrepare(stmt,
           version->version, SQL_NTS);
	ret = SQLExecute(stmt);
	ret = SQLFetch(stmt);
	ret = SQLGetData(stmt, 1, SQL_C_CHAR, buf, sizeof(buf), &indicator);
	strcpy(version->version, buf);
	return ret;
}

And if you compile these and run the resulting program, you might see something like the following:

$ gcc myodbc.c -c -g -l myodbc5
$ gfortran myodbc_test.f90 myodbc.o -l myodbc5 -g
$ ./a.out
Selecting version...
10.0.18-MariaDB-debug                                                           
Disconnected...

Future of Fortran?

There is clearly a need for languages like Fortran. It has some very nice features like formatted I/O and mathematical functions. However, learning Fortran might be up to you, because it is not taught as the first (or second) programming language at most universities or similar schools. Thus the number of people who can program in Fortran or teach it is decreasing rapidly. However, my experience is that learning Fortran is simple if you have mastered at least one programming language (ok, I had already learned C/C++, COBOL, PL/I and Basic in my earlier studies). So you want to learn Fortran? If Internet resources are not enough, there are a number of books. The book I have used is obsolete (Fortran 77, and a Finnish-language version for Fortran 90/95), but e.g. http://www.amazon.com/Introduction-Programming-Fortran-With-Coverage/dp/0857292323 is a good one.

by Jan Lindstrom at May 11, 2015 07:20 AM

May 09, 2015

MariaDB Foundation

MariaDB 10.0.19 now available

Download MariaDB 10.0.19

Release Notes Changelog What is MariaDB 10.0?

MariaDB APT and YUM Repository Configuration Generator

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.0.19. This is a Stable (GA) release.

See the Release Notes and Changelog for detailed information on this release and the What is MariaDB 10.0? page in the MariaDB Knowledge Base for general information about the MariaDB 10.0 series.

Thanks, and enjoy MariaDB!

by Daniel Bartholomew at May 09, 2015 12:20 PM

May 07, 2015

MariaDB Foundation

MariaDB 10.0.18 now available

Download MariaDB 10.0.18

Release Notes Changelog What is MariaDB 10.0?

MariaDB APT and YUM Repository Configuration Generator

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.0.18. This is a Stable (GA) release.

See the Release Notes and Changelog for detailed information on this release and the What is MariaDB 10.0? page in the MariaDB Knowledge Base for general information about the MariaDB 10.0 series.

Thanks, and enjoy MariaDB!

by Daniel Bartholomew at May 07, 2015 06:03 PM

News from the third MariaDB Foundation Board Meeting this year

The MariaDB Foundation Board has been meeting monthly since February and on Monday this week had the third meeting of the year. Here is an update on a couple of things from the meeting.

We're happy to announce that Booking.com has renewed their support for the Foundation. As a major corporate sponsor, Booking.com has been offered a seat on the Foundation board. Booking.com nominated Eric Herman. Eric has a history with MySQL dating from 2004, when he joined MySQL working on the server and tools. In 2010, Eric joined Booking.com, where he works on database scaling challenges and BigData. As a community member, he has contributed to the perl MySQL client driver, the perl interpreter, and other Free Software. To represent community and industry interests in line with the Foundation mission, Eric Herman has joined the Board.

The current Members of the Board ordered by last name are:

  • Sergei Golubchik, Chief Architect, MariaDB Corporation
  • Eric Herman, Principal Developer, Booking.com
  • Espen Håkonsen, CIO of Visma and Managing Director of Visma IT & Communications
  • Rasmus Johansson (Chair), VP Engineering, MariaDB Corporation
  • Michael Widenius, CTO, MariaDB Foundation
  • Jeremy Zawodny, Software Engineer, Craigslist

Last but not least, the secretary of the Board is the Foundation's CEO, Otto Kekäläinen.

The list of corporate sponsors so far this year are:

In case your company is interested to support the MariaDB project through the MariaDB Foundation please contact ”foundation ‘at’ mariadb (dot) org”.

It might be of interest that the mariadb.org website is getting a facelift, both to look more appealing and to include more relevant information about the project and the Foundation. More about that later.

by rasmus at May 07, 2015 10:28 AM

May 05, 2015

MariaDB AB

Information on the SSL connection vulnerability of MySQL and MariaDB

rasmusjohansson

Last week, an SSL connection security vulnerability was reported for MySQL and MariaDB. The vulnerability states that since MariaDB and MySQL do not enforce SSL when SSL support is enabled, it is possible to launch Man In The Middle (MITM) attacks. MITM attacks can capture the secure connection and turn it into an unsecure connection, revealing data going back and forth to the server.

Issue resolution in MariaDB is visible through the corresponding ticket in MariaDB’s tracking system (JIRA): https://mariadb.atlassian.net/browse/MDEV-7937

The vulnerability affects the client library of the database server in both MariaDB and MySQL. However, the vulnerability does not affect all the libraries, drivers or connectors for establishing SSL connections with the server.

The vulnerability exists when the connection to the server is made through the client library libmysqlclient. This client library is provided with the database server and is a fork of the corresponding client library in MySQL. The client library is used by what is probably the most-used tool, the MySQL Command-Line tool, of which a forked version is shipped with MariaDB.

In addition to libmysqlclient, the MariaDB project provides the following connectors:

These connectors also support SSL connections to the database server and make use of similar parameters to establish secure connections. Here is an update on whether the connectors are affected or not:

  • Affected - MariaDB Connector/C is vulnerable in the same way as libmysqlclient
  • Not affected - MariaDB Connector/J works properly, aborting any unsecure connections if SSL is in use
  • Not affected - MariaDB Connector/ODBC does not currently support SSL

For MySQL’s Connector/J, it is worth mentioning that it has two properties: “useSSL” and “requireSSL”. If “requireSSL” is selected, then unsecure connections are aborted.
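
For libmysqlclient-based clients, which cannot require SSL in the affected versions, one small client-side sanity check (a sketch, not a fix for the vulnerability) is to confirm after connecting that the session really is encrypted:

SHOW SESSION STATUS LIKE 'Ssl_cipher';
-- an empty Value means the connection is NOT encrypted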

Many of the tools that are used to connect to MariaDB or MySQL make use of libmysqlclient. Thus, when using these tools over an untrusted network, it is highly recommended that you use best practices to restrict network access as much as possible, even if you’re using SSL to connect to MariaDB or MySQL. Some best practices that are easy to put in place for decreasing the risk of MITM attacks include:

Finally, since we are in the middle of fixing the vulnerability in MariaDB, we appreciate your input regarding which versions of MariaDB should get the fix backported. For background, the SSL support in MySQL (up until 5.7) and MariaDB is not enforceable. This is the intended MySQL behavior, implemented back in 2000, and clearly documented in the MySQL reference manual as:

 

“For the server, this option specifies that the server permits but does not require SSL connections.

 

For a client program, this option permits but does not require the client to connect to the server using SSL. Therefore, this option is not sufficient in itself to cause an SSL connection to be used. For example, if you specify this option for a client program but the server has not been configured to permit SSL connections, an unencrypted connection is used.”

MariaDB 5.5 and 10.0 are stable versions and behave as documented -- they permit SSL, but they do not require it. Enforcing SSL, when the appropriate options are given, will change the behavior of, and might break, existing applications where a mix of SSL and non-SSL connections is used. In MariaDB 10.1 this is not a problem since MariaDB 10.1 is still in beta, although it is very close to release candidate status. There we will introduce the fix. As for MariaDB 5.5 and 10.0, we are collecting input to determine whether we should change the behavior of 5.5 and 10.0. Please visit our website for more details, and share your feedback here.

The initial reports on the vulnerability can be found through these sources:

About the Author

Rasmus has worked with MariaDB since 2010 and was appointed VP Engineering in 2013. As such, he takes overall responsibility for the architecting and development of MariaDB Server, MariaDB Galera Cluster and MariaDB Enterprise.

by rasmusjohansson at May 05, 2015 06:52 AM

May 04, 2015

MariaDB AB

Query and Password Filtering with the MariaDB Audit Plugin

ralfgebhardt

The MariaDB Audit Plugin has been included in MariaDB Server by default since version 5.5.37 and 10.0.9. It's also pre-loaded in MariaDB Enterprise. The Audit Plugin as of version 1.2.0 includes new filtering options which are very useful. This article explains some aspects of them. However, if you haven't installed and used the plugin, you may want to read first a few other documents:

Filtering by Event Type

To appreciate the new features in the MariaDB Audit Plugin, you'll need to understand how this plugin handles filtering in general. The filtering options in version 1.1.x are based on the type of an event. Which event types are used for logging can be configured using the global server variable server_audit_events. There are three event types: CONNECT; TABLE, which is available in MariaDB only; and QUERY.

The CONNECT event type handles connecting to a server or disconnecting from it. If this event type is defined in server_audit_events, connects, disconnects, and failed connects, including the related error code, are logged in an audit log file or system log.

Using the TABLE event type, the Audit Plugin will log several activities related to tables: when table objects are opened for read or write, and when table objects are created, altered, renamed, or dropped. It will log these actions without having to do complex parsing of the queries. To use this event type, you'll have to make some changes on the server itself. This feature is available only with version 5.5.31 or a newer version of MariaDB Server.

An audit at the table level will allow you to log access to the real table objects used by a query, even when the query itself does not directly reference them. This includes, for example, queries that use views or stored procedures.

The QUERY event type is used to log the queries themselves. All queries sent to the server are handled by this event type and logged. The full queries are always logged, together with any error codes. The query statements aren't parsed, though. This keeps the overhead of the audit plugin to a minimum.

If you don't want to log all of these long queries, or if you're only interested in the creation, change, or removal of objects, that is, in DDL (Data Definition Language) statements, the Audit Plugin can do just that to some extent. You can get what you want by logging only the TABLE and CONNECT events. In this way, any CREATE, ALTER, and RENAME statements for table objects are logged. If you are also interested in other DDL (e.g., CREATE DATABASE), or if you're using the Audit Plugin with MySQL rather than MariaDB Server, you'll need MariaDB Audit Plugin version 1.2.0. Just remember that only MariaDB Server can provide the TABLE events.
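
As a sketch (assuming the plugin is installed and that server_audit_events can be set dynamically), logging object-level activity without full query texts could look like this:

SET GLOBAL server_audit_logging = ON;
SET GLOBAL server_audit_events  = 'CONNECT,TABLE';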

New Filtering Options for Queries

The MariaDB Audit Plugin has two new options for the server_audit_events server variable: QUERY_DML and QUERY_DDL. These options can be used, instead of the QUERY option, to log only DML (Data Manipulation Language) or DDL statements in the audit log or system log. Using one of these options will result in parsing of the query strings. This does require a small amount of overhead for the audit plugin.
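
For example, to log connections plus DDL statements only, or DDL and DML together, a sketch (the same values can also go into my.cnf as server_audit_events):

SET GLOBAL server_audit_events = 'CONNECT,QUERY_DDL';
-- or, if DML should be captured as well:
SET GLOBAL server_audit_events = 'CONNECT,QUERY_DDL,QUERY_DML';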

The option QUERY can still be used. It's not equivalent, though, to using both QUERY_DML and QUERY_DDL: there are many queries that are neither DDL nor DML (e.g., GRANT statements). By using the old option, QUERY, you can avoid parsing and thereby reduce some overhead.

Password Filtering

As already mentioned, version 1.1 of the MariaDB Audit Plugin logs queries without any parsing of the queries. This means that passwords included in queries are logged as plain text in the audit log or system log. That's a security vulnerability. This has been changed, though, in version 1.2.0. Passwords are now replaced by asterisks (i.e., "*****") in the logs.

Be aware, though, that passwords given with functions PASSWORD() or OLD_PASSWORD() in DML statements will still be logged as plain text in queries. Key strings used with encrypt functions like ENCODE() and AES_ENCRYPT() are also still logged in plain text.
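
To illustrate the distinction (a sketch; the user, table and column names are made up):

SET PASSWORD FOR 'app'@'%' = PASSWORD('s3cret');                      -- password should be masked in the log
GRANT SELECT ON shop.* TO 'app'@'%' IDENTIFIED BY 's3cret';           -- password should be masked in the log
UPDATE shop.users SET pw = PASSWORD('s3cret') WHERE login = 'alice';  -- DML: the key string is logged in plain text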

Download and Install

You can download and install the MariaDB Audit Plugin from mariadb.com/resources/downloads. If you're using the newest version of MariaDB Server, you won't have to download MariaDB Audit Plugin 1.2.0 separately, as it is included in MariaDB Server already. With MariaDB Enterprise the Audit-Plugin is pre-loaded and auditing just has to be activated.

For more information about the Audit Plugin refer to https://mariadb.com/kb/en/mariadb/about-the-mariadb-audit-plugin/

by ralfgebhardt at May 04, 2015 10:58 AM

May 02, 2015

Erkan Yanar

Cargo Cult aka Docker

Docker is kinda awesome, as it releases a lot of creativity and lets us rethink infrastructure. Think about upgrading an application (Docker container): we just stop the old one and start the new container. Rollback is as easy as stopping the new container and starting again from the old image.

Let's have a look at nginx. Within the Docker ecosystem, in a world where the backends come and go, you profit from writing the nginx configuration in a dynamic way, most likely using confd or consul-template.

After that you stop the container and start a new one from the image.

Kinda silly!

Why?

Sending nginx a SIGHUP would have told it to simply reread the configuration, spawning new worker processes without stopping the service.

Nginx even has a nice trick for upgrades: on SIGUSR2 it spawns a new master process running the new binary.

In a standard Docker workflow you don't use these features.

Regards yours

DockerHipster \o/

by erkan at May 02, 2015 07:39 AM

April 28, 2015

Shlomi Noach

"awesome-mysql" curated list created, open for pull requests

Following up on popular "awesome-*" lists (e.g. awesome-python, awesome-golang etc.), I've created the awesome-mysql curated list.

This is a list of technologies (and resources) in and around MySQL, and it is meant to serve as a place to find reliable software and info. I recently happened to notice that there are some tools I'm familiar with that are unknown to others, and tools unknown to me that are in good use.

The list is naturally and intentionally incomplete. I wish this to be a community-based creation, so I put in some categories and some tools. I left many out, deliberately. Please assist by filling in the missing projects, tools and libraries! Additions are gladly accepted via pull requests. Do note the contribution guidelines (somewhat lengthy, I apologize).

I will moderate out FUD, promotional and commercial content, etc.; otherwise it may take some days for me to merge requests.

The work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

by shlomi at April 28, 2015 12:48 PM

April 27, 2015

MariaDB AB

LinuxFest Northwest was HUGE this year!

Marc Sherwood

This weekend Team MariaDB attended the 15th annual LinuxFest Northwest. This event has been growing every year, but this year it crossed over to become a HUGE event. There were over 1800 attendees this year, and we had the chance to talk to many of them at our booth and in our talks. Monty talked about what was new in MariaDB 10.1 and Max spoke about Sharding MariaDB and MySQL with Spider. Both of these talks were well attended and had many great questions raised.

The big party, sponsored by Microsoft, was moved from the Bellingham Spark Museum, where in years past Monty got in a Faraday Cage beside a Tesla Coil that outputs 4,000,000 volts, to the Whatcom Museum. With 1800 attendees they had run out of tickets before we were able to make our way to the Microsoft booth to collect ours. I heard it was a great party, but I also appreciated the good night's sleep that not going afforded me.

The exhibit hall was as action-packed as in years gone by, and this gave us a great chance to talk to people about MariaDB. A trend that I have been noticing at events is that we are now hearing from more and more people who are currently using MariaDB. Even as recently as last year I would have many conversations with those who were considering moving to MariaDB but had not yet made the decision to do so. I think this has been changing due in part to the fact that MariaDB is now packaged as the default in most Linux distributions.

Another record from this year's event was the number of MariaDB Poppers® that we handed out. We shipped several hundred to the event and went home with none. It was pretty wild to look around the halls and show floor and see dozens of people walking around bouncing our MariaDB Poppers®. Kolbe shot some sweet slow-motion video of a popper in action.

If you are looking for an excellent event to attend next year, I highly recommend LinuxFest Northwest 2016. They have it planned for the last weekend in April, so mark your calendars now.

To see a full list of which events we will be attending this year have a look at our Events Section. If there is an event that you think that Team MariaDB should attend that is not listed please let us know. We hope to see you at an event soon.

About the Author

Marc Sherwood is North American Marketing Manager.

by Marc Sherwood at April 27, 2015 05:52 PM

April 24, 2015

Serge Frezefond

Happy to see this MySQL JSON momentum !

Last week at Percona Live, Facebook presented Docstore for the first time, which is a native JSON implementation in MySQL. Oracle also presented their MySQL 5.7 lab release that includes the implementation of a native JSON type. This is an important move, as MySQL was behind other RDBMSs regarding JSON (PostgreSQL already [...]

by Serge at April 24, 2015 10:24 AM

April 23, 2015

MariaDB AB

Configuring PAM Authentication and User Mapping with MariaDB

geoff_montee_g

User accounts in MariaDB have traditionally been completely separate from operating system accounts. However, MariaDB has included a PAM authentication plugin since version 5.2.10. With this plugin, DBAs can configure MariaDB user accounts to authenticate via PAM, allowing users to use their Linux username and password to log into the MariaDB server.

However, even when using the PAM authentication plugin, the user account still needs to exist in MariaDB, and the account needs to have privileges. Creating these MariaDB accounts and making sure the privileges are correct can be a lot of work. To decrease the amount of work involved, some users would like to be able to map a Linux user to a different MariaDB user. For example, let's say that "alice" and "bob" are both DBAs. It would be nice if each of them could log into MariaDB with their own Linux username and password, while MariaDB sees both of them as the same "dba" user. That way, there is only one MariaDB account to keep track of.

Luckily, both PAM and MariaDB support exactly that kind of use case. In this blog post, I will walk you through how to set up this kind of authentication.

Set up the user mapper PAM plugin

MariaDB's git repository has a simple user mapper PAM plugin. Downloading, compiling, and installing it is simple:

wget https://raw.githubusercontent.com/MariaDB/server/10.1/plugin/auth_pam/mapper/pam_user_map.c
gcc pam_user_map.c -shared -lpam -fPIC -o pam_user_map.so
sudo install --mode=0755 pam_user_map.so /lib64/security/

Set up the PAM policy

We want to configure the PAM policy so that:

  • Users authenticate with their Linux user names and passwords (i.e. use the pam_unix.so PAM module);
  • Login attempts go into the system's audit logs;
  • "Real" user names will be mapped to MariaDB user names (i.e. use the pam_user_map.so PAM module).

We can create a PAM policy to do all of the above with:

sudo tee /etc/pam.d/mysql <<EOF
auth required pam_unix.so audit
account required pam_unix.so audit
auth required pam_user_map.so
EOF

Create some test accounts

Let's create some Linux accounts to test things out:

# generic "dba" account to map other users to
sudo useradd dba
# a "real" account for Alice
sudo useradd alice
sudo passwd alice
# a "real" account for Bob
sudo useradd bob
sudo passwd bob

Configuring the user account mapping

By default, the pam_user_map.so module looks at /etc/security/user_map.conf for the mappings. Let's map both "alice" and "bob" to the "dba" user:

sudo tee /etc/security/user_map.conf <<EOF
alice: dba
bob: dba
EOF

Turn off SELinux

Even with SELinux set to permissive mode, you can still run into issues while trying to use MariaDB and PAM together. You may want to disable SELinux entirely. Otherwise, you could have messages like this show up in your system logs:

Apr 14 12:37:45 localhost setroubleshoot: SELinux is preventing /usr/sbin/mysqld from execute access on the file . For complete SELinux messages. run sealert -l 807c6372-91d9-4445-b944-79113756d6c2
Apr 14 12:37:45 localhost python: SELinux is preventing /usr/sbin/mysqld from execute access on the file .

*****  Plugin catchall_labels (83.8 confidence) suggests   *******************

If you want to allow mysqld to have execute access on the  file
Then you need to change the label on $FIX_TARGET_PATH
Do
# semanage fcontext -a -t FILE_TYPE '$FIX_TARGET_PATH'
where FILE_TYPE is one of the following: abrt_helper_exec_t, bin_t, boot_t, etc_runtime_t, etc_t, ld_so_t, lib_t, mysqld_exec_t, prelink_exec_t, shell_exec_t, src_t, system_conf_t, system_db_t, textrel_shlib_t, usr_t.
Then execute:
restorecon -v '$FIX_TARGET_PATH'

*****  Plugin catchall (17.1 confidence) suggests   **************************

If you believe that mysqld should be allowed execute access on the  file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# grep mysqld /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp

Apr 14 12:37:59 localhost setroubleshoot: Plugin Exception restorecon_source
Apr 14 12:37:59 localhost setroubleshoot: SELinux is preventing /usr/sbin/unix_chkpwd from execute access on the file . For complete SELinux messages. run sealert -l c56fe6e0-c78c-4bdb-a80f-27ef86a1ea85
Apr 14 12:37:59 localhost python: SELinux is preventing /usr/sbin/unix_chkpwd from execute access on the file .

*****  Plugin catchall (100. confidence) suggests   **************************

If you believe that unix_chkpwd should be allowed execute access on the  file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# grep unix_chkpwd /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp

Open up access to /etc/shadow

The pam_unix.so PAM module usually uses the unix_chkpwd utility to handle the authentication. This utility requires read access to /etc/shadow, which is usually unreadable for security reasons. To get PAM authentication to work with MariaDB, you will probably have to allow the mysql user to read this file. This is very easy to do:

sudo groupadd shadow
sudo usermod -a -G shadow mysql
sudo chown root:shadow /etc/shadow
sudo chmod g+r /etc/shadow

Of course, opening up access to /etc/shadow to some users is a security risk. However, if you try to use PAM together with MariaDB without opening up this access, you are likely to see messages like this in the system logs:

Apr 14 12:56:23 localhost unix_chkpwd[3332]: check pass; user unknown
Apr 14 12:56:23 localhost unix_chkpwd[3332]: password check failed for user (alice)
Apr 14 12:56:23 localhost mysqld: pam_unix(mysql:auth): authentication failure; logname= uid=991 euid=991 tty= ruser= rhost=  user=alice

Set up everything in MariaDB

Finally, let's set up everything in MariaDB:

-- Install the plugin
INSTALL SONAME 'auth_pam';

-- Create the "dba" user
CREATE USER 'dba'@'%' IDENTIFIED BY 'strongpassword';
GRANT ALL PRIVILEGES ON *.* TO 'dba'@'%';

-- Create an anonymous catch-all user that will use the PAM plugin and the mysql policy
CREATE USER ''@'%' IDENTIFIED VIA pam USING 'mysql';

-- Allow the anonymous user to proxy as the dba user
GRANT PROXY ON 'dba'@'%' TO ''@'%';

Since we changed the mysql user's group membership, we also have to restart the MariaDB service:

sudo service mysql restart

Try it out

Now, let's try it out. Even though we log in as "alice", our MariaDB privileges are actually those of the "dba" user:

[gmontee@localhost ~]$ mysql -u alice -h 127.0.0.1
[mariadb] Password:  
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.0.17-MariaDB-log MariaDB Server

Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SELECT USER(), CURRENT_USER();
+-----------------+----------------+
| USER()          | CURRENT_USER() |
+-----------------+----------------+
| alice@localhost | dba@%          |
+-----------------+----------------+
1 row in set (0.00 sec)

Thoughts?

Is anyone using a setup like this? If so, how does it work for you? Can you think of any ways to improve this functionality?

About the Author

Geoff Montee is a Support Engineer with MariaDB. He has previous experience as a Database Administrator/Software Engineer with the U.S. Government, and as a System Administrator and Software Developer at Florida State University.

by geoff_montee_g at April 23, 2015 08:17 PM

April 22, 2015

MariaDB AB

Connecting to MariaDB through an SSH Tunnel

martinbrampton

When you want to connect a client to a database server through an insecure network, there are two main choices: use SSL or use an SSH tunnel. Although SSL often may seem to be the best option, SSH tunnels are in fact easier to implement and can be very effective. Traffic through an SSH tunnel is encrypted with all of the security of the SSH protocol, which has a strong track record against attacks.

There are various ways to implement an SSH tunnel. This article suggests a simple approach which is adequate in many situations. For the examples here, let's assume that there is a database server running on a host named server.example.com, with an IP address of 1.2.3.4. Suppose further that the client is on a host named client.example.com, with an IP address of 5.6.7.8. We'll also suppose that there are tightly configured iptables firewalls on both systems.

Dealing with Firewalls

The first step is to open the firewall for SSH communications between the systems. Let’s use the standard port for SSH (i.e., port 22). The tunnel will be instigated by the client. So the iptables script on the server might contain something like this:

IP_CLIENT=5.6.7.8
IPTABLES=/sbin/iptables
# Accept inbound packets that are 
# part of previously-OK’ed sessions

$IPTABLES -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
...
$IPTABLES -A INPUT -s $IP_CLIENT -p tcp -j ACCEPT --dport 22 -m state --state NEW

On the client side, the iptables script might include the following entries:

IP_SERVER=1.2.3.4
IPTABLES=/sbin/iptables
# Accept inbound packets that are part of previously-OK’ed sessions
$IPTABLES -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
…
$IPTABLES -A OUTPUT -d $IP_SERVER -p tcp --dport 22 -m state --state NEW -j ACCEPT

It's useful to establish dedicated users on each machine. We'll assume we've done that and called them simply tunnel on both. They shouldn't have any special privileges, but they need to have a home directory (assumed to be /home/tunnel) and the ability to run a command shell.

Preparing SSH Keys

The user on the client side will need to create SSH keys. This can be done by executing the ssh-keygen utility while logged in as the tunnel user like so:

ssh-keygen -t dsa -b 1024 -C tunnel@client.example.com

This example uses DSA, although you could use RSA. The last parameter, indicated by -C, is purely a comment and doesn't affect the use of the keys. The comment will be added to the end of the public key, and is useful for keeping track of what key belongs to what system.

When you execute the ssh-keygen command, it will offer to create the keys in the /home/tunnel/.ssh directory. This is fine; just accept this choice. It will also ask for a password, but this isn’t needed. So we’ll ignore it and press return. The result will be two files in the /home/tunnel/.ssh directory called id_dsa and id_dsa.pub. The first is the secret key, which should be kept secure. The second file is the public key, which can be distributed freely.

Now we need to place a copy of the public key on to the server system. It needs to go into the file called, /home/tunnel/.ssh/authorized_keys. Assuming this is the first key to be used on the server system for the tunnel user, the id_dsa.pub file can be copied into the server directory /home/tunnel/.ssh and renamed to authorized_keys. If not, you can append it to the end of the file by executing something like this at the command-line from the directory where you’ve uploaded the id_dsa.pub file:

cat id_dsa.pub >> /home/tunnel/.ssh/authorized_keys

You could also use a simple text editor to copy the contents of the id_dsa.pub file to the end of the authorized_keys file. Just put what you paste on a separate line in that file.

Testing the SSH Connection

Once the keys have been created and put where they belong, we should be able to log into the server with the tunnel user from the client, without having to enter a password. We would do that by executing this from the command-line:

ssh tunnel@server.example.com

The first time you do this, there should be a message that says that it is an unknown server. Just confirm that you want to go ahead with the connection. After this first time, you won’t get this message. If it connects successfully, you have proved that the tunnel user can make a connection to the server.

To make the SSH tunnel robust, it’s helpful to run a utility called autossh. This monitors an SSH tunnel and re-establishes it if it fails. You can find it in the standard repositories for Debian and Ubuntu or may need to add one of the well known additional repositories for other distributions. Once you’ve done that, autossh can be installed using the standard package management tools for the distribution (e.g., aptitude or yum).

Establishing an SSH Tunnel

We're now ready to establish the SSH tunnel. In a Debian-based installation, probably the best place to put the command to establish the tunnel is the directory /etc/network/if-up.d. For CentOS/Red Hat, it could go in the /etc/rc.local file.

You would execute something like this from the command-line:

su - tunnel -c 'autossh -M 0 -q -f -N -o "ServerAliveInterval 60" -o \
"ServerAliveCountMax 3" -L 4002:localhost:3306 tunnel@server.example.com'

Once we’ve executed that, we will have established port 4002 on the client and it will be connected to port 3306 on the server. If the command is run manually, the software invoked will run in the background and the terminal can be closed. The command can be placed in a script, though, that will run automatically at startup.

Connecting to MariaDB

Assuming the server has MariaDB running on the default port and we have the MariaDB client installed on the client machine, we can now connect to MariaDB on the server. We would enter something like the first line below at the command-line on the client, and should see a message in response similar to the one that follows:

mysql -u root -p --host='127.0.0.1' --port=4002

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 164575
Server version: 10.1.1-MariaDB-1~wheezy-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4123

Copyright (c) 2000, 2014, Oracle, SkySQL Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> 

Conclusion

Generating SSH keys is a simpler process than the creation of SSL certificates, and the deployment is easier too. From my experience, there have also been fewer vulnerabilities with SSH than SSL. There is obviously some overhead in using an SSH tunnel, compared with an unencrypted connection. However, the overhead seems to be about the same as that imposed by SSL. The gain in security, though, is considerable.

About the Author

Martin Brampton is Principal Software Engineer at MariaDB

by martinbrampton at April 22, 2015 08:05 AM

April 18, 2015

Shlomi Noach

Percona Live 2015: Reflections

Some personal reflections on PerconaLive 2015:

Percona acquires Tokutek

Well done! Tokutek develops the TokuDB storage engine for MySQL and TokuMX engine for MongoDB. I will discuss the MySQL aspect only.

TokuDB was released as open source in 2013. It has attained a lot of traction and I have used it myself for some time. I ran into locking issues and other operational difficulties, which I reported, but otherwise was fascinated by such features as great compression, online schema changes, and more.

Recently another company, InfiniDB, which had also released its MySQL-based codebase as open source, went out of business. I was afraid the same might happen to Tokutek.

I see Percona's purchase as a very good move for the community. I have seen a lot of TokuDB interest at Percona for some time now, and the company is clearly interested in the technology. I expect they will bring their own hands-on experience into the development of more operations-friendly features and put effort into solving the locking issues (it's been a while since I last checked; some of these may have been addressed by now). I am guessing they will work on a Galera/TokuDB integration and offer a "Toku-XtraDB-Cluster".

TokuDB can compete with InnoDB in many places, while in others each will have its distinct advantage.

I see this as good news for the community.

Community Awards and Lightning Talks

On a completely different subject, I believe it is commonly accepted that this year's setup for the community awards & lightning talks was unsuccessful. The noise was astounding, the foot traffic was disruptive, and overall it was a poor experience. We (Giuseppe Maxia, Kortney Runyan and myself) held a quick, informal brainstorming session on this and came up with a couple of ideas, one of which we hope to try at the upcoming Percona Live Europe - Amsterdam.

We apologize to the speakers for the difficulties.

Percona Live Europe - Amsterdam

Haha! Having recently relocated to the Netherlands I'm of course very happy. But regardless, Percona Live London was fun - and yet running on low fuel. I think it was a great idea to change location (and more locations expected in the future). This is the path taken by such conferences as OSCon, Velocity, Strata and more. Amsterdam in particular, as I've recently learned, is especially appreciated by many. I think this conf will do great!

Woz

And now for something completely different. Woz' talk was that. I'm happy he came; I appreciate that he discussed education; and it was fun.

by shlomi at April 18, 2015 01:41 AM

Percona Live 2015: Reflections; the Apache CCLA offer

Facebook, Google, Twitter, LinkedIn, Alibaba, MariaDB, Percona team up and offer Oracle all public changes under the Apache CCLA

Read again please.

My one word summary of this is: Romantic. In the most positive sense.

Disclaimer: I am not a lawyer; this is my understanding of the current status and of the offer.

Summarizing the deal: the teams participating in WebScaleSQL would like to push code upstream. Current legal issues limit their options. Existing patches/contributions from Percona & MariaDB are licensed under GPLv2, which Oracle cannot import because it distributes a commercial, closed-source edition in addition to its open source MySQL community edition.

So what happens is that there is a lot of free code, great patches, and new features out there that are only available via MariaDB, WebScaleSQL or Percona Server, but not in the Oracle MySQL code base. This, in turn, means Oracle re-implements many features originating from said companies. And, more importantly, said companies need to routinely rebase their code on new Oracle releases, repeating tedious work.

The offer is that Oracle agrees to the Apache CCLA as a license under which it would be able to incorporate contributions. Oracle would then be able to use the incorporated code in both the open source and commercial editions. Oracle will choose what code to incorporate; hopefully many patches will be accepted upstream, and the community will benefit from a feature-rich, rapidly developed MySQL server.

Clearly a lot of work, persuasion, lawyer time, discussions etc. have been invested in this effort. I would like to add my humble +1/like/favorite/whathaveyou. You may add yours by letting Oracle know your opinion on the subject. Media tools are great for this.

by shlomi at April 18, 2015 01:11 AM

MySQL Community Awards 2015: the Winners

The MySQL Community Awards initiative is an effort to acknowledge and thank individuals and corporates for their contributions to the MySQL ecosystem. It is a from-the-community, by-the-community and for-the-community effort. The committee is composed of an independent group of community members of different orientation and opinion, themselves past winners or known contributors to the community.

The 2015 community awards were presented on April 15th, 2015, during the community event at the Percona Live conference. The winners are:

MySQL Community Awards: Community Contributor of the year 2015

  • Daniël van Eeden
    Daniël has done great work on MySQL security and has continued to fantastically support MySQL User Group NL. He has also logged a lot of bugs (and submitted patches) across all sorts of different MySQL products and has done a great deal to help improve the quality of MySQL. Daniël consistently provides extremely good feedback on a wide range of features and products, from MySQL server security, through InnoDB and partitioning, and even on other products such as MySQL Enterprise Backup and MySQL Enterprise Monitor. His bug reports are always of high quality, and many times he even includes a contribution to fix the bug.
  • Justin Swanhart
    Justin has worked tirelessly for the past few years on some amazing projects of his own design, Shard-Query and Flexviews. Cross-shard aggregation is an extremely complex thing to get right, and Shard-Query takes an interesting approach to it. Flexviews provides a materialized view framework, which is something MySQL lacks, to many people's annoyance. Additionally, Justin has built some performance_schema related tools, reported many MySQL bugs, and has been a public speaker about MySQL in a "can do" style.
  • Morgan Tocker
    In his day job, Morgan is Community Manager at Oracle. While some of his community interaction has been because of his job, he has gone far beyond his corporate responsibilities. He is one of the most prolific writers on the MySQL Planet, he has been the most public face of MySQL, and he is always asking for feedback and showing a sincere concern for the open source community. For example, Morgan's community polls on what defaults should be changed in MySQL 5.7 put some of the MySQL product decision making directly into the hands of the community. He is a key player in keeping the community and the MySQL developers at Oracle in touch with each other.

MySQL Community Awards: Application of the year 2015

  • sys schema
    As PERFORMANCE_SCHEMA matured in the MySQL ecosystem, Mark Leith identified the need to condense its information into a more user-friendly and visible form. Out of this, the ps_helper and eventually MySQL SYS projects were born. This actively developed, one-man project has become a standard integration in the MySQL distribution today.
  • VividCortex
    VividCortex brings fresh and challenging ideas to the monitoring space, originally targeting MySQL. Through aggressive sampling and original statistical methods, it provides near-real-time information that was previously deemed unattainable, raising the bar for monitoring high-performance data stores at scale.

MySQL Community Awards: Corporate Contributor of the year 2015

  • WebScaleSQL Contributors: Facebook, Google, Twitter, LinkedIn, Alibaba
    It’s a common misconception, but WebScaleSQL is not a competitor to MySQL – it is strongly rooted in Oracle’s MySQL and closely follows its “upstream” MySQL codebase. Instead, it is intended as a place for several companies that were already collaborating on scalability improvements in MySQL to do so in a quicker and more succinct manner.

Congrats to all winners!

Committee members are:

  • Baron Schwartz
  • Colin Charles
  • Frederic Descamps
  • Geoffrey Anderson
  • Giuseppe Maxia
  • Marc Delisle
  • Mark Leith
  • Philip Stoev
  • Ronald Bradford
  • Santiago Lertora
  • Jeremy Cole (Secretary)
  • Shlomi Noach (Secretary)

Special thanks

Thank you to this year's anonymous sponsor for donating the goblets!

Thank you to Colin Charles for acquiring and transporting the goblets!

Thank you to Santiago Lertora for working out the new awards website!

by shlomi at April 18, 2015 12:19 AM

April 16, 2015

Stephane Varoqui

Howto - Move a table to different schema with no outage

I remember a time when there was debate about whether views could be useful for a web-oriented workload.

This post is about one good use case:

The story is that some tables were created in a schema and used by the application within the same connection.

Later on, more schemas were added to separate the data for multiple application domains, but they kept using the original tables as a kind of cross-domain universal table.

With the addition of many new domains, a new global schema was created to store freshly created universal tables.

The question was how to move the old table into the correct new schema without interrupting the availability of the service.

We decided to use a view that points to the physical table: change the application to use the view, and later atomically swap the table and the view.


Here is the test case for doing that:


-- Create schemas
CREATE DATABASE schema1;
CREATE DATABASE schema2;

-- Create table in schema 1
CREATE TABLE schema1.t1 (
  id int
);

-- Create views in schema 2
CREATE VIEW schema2.t1 AS SELECT * FROM schema1.t1;
-- Create dummy view on view in schema 1 
CREATE VIEW schema1.t1_new AS SELECT * FROM schema2.t1;

-- Changing the API 

-- Switch schema 1 table and schema 2 view
RENAME TABLE schema2.t1 TO schema2.t1_old,
  schema1.t1 TO schema2.t1,
  schema1.t1_new TO schema1.t1;
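
Once the application only references schema2.t1 (or the compatibility view schema1.t1), the leftover object from the switch can be dropped. A possible clean-up, based on the test case above:

-- schema2.t1_old is the original cross-schema view; it now resolves through
-- the view schema1.t1 back to the base table and is no longer needed.
DROP VIEW schema2.t1_old;
-- Optionally, once no client uses the old name any more:
-- DROP VIEW schema1.t1;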

Is there some other path? Surely triggers plus INSERT IGNORE, as done in OAK or Percona's online ALTER TABLE tool, would work too, but I also remember a time when there was debate about whether triggers could be useful for a web-oriented workload :)


Thanks to Nicolas @ccmbenchmark for contributing the test case.

by Stephane Varoqui (noreply@blogger.com) at April 16, 2015 06:29 PM

April 13, 2015

Valeriy Kravchuk

Fun with Bugs #35 - Bugs fixed in MySQL 5.6.24

I had not reviewed bug fixes in MySQL 5.6 for quite some time, so I decided to check what bugs reported by the MySQL Community were fixed in the recently released MySQL 5.6.24. I'll mention both the bug reporter and the engineer who verified the bug in the list below, because I still think that in the MySQL world names should matter.

So, MySQL 5.6.24 includes fixes for the following bugs from http://bugs.mysql.com. I'll start with the InnoDB and memcached-related fixes:
  • Bug #72080 - truncate temporary table crash: !DICT_TF2_FLAG_IS_SET(table, DICT_TF2_TEMPORARY). Reported by Doug Warner and verified by Shane Bester after a lot of testing. Note how fast it was fixed after verification!
  • Bug #75755 - Fulltext search behaviour with MyISAM vs. InnoDB (wrong result with InnoDB). Reported by Elena Stepanova from MariaDB and confirmed by my former boss Miguel Solorzano, this wrong results bug was also promptly fixed.
  • Bug #70055 - Expiration time ignored. This memcached-related bug was reported by Miljenko Brkic and verified by Umesh
  • Bug #74956 - Can not stop mysql with memcached plugin. This regression bug was reported by my colleague Nilnandan Joshi and verified by Umesh
  • Bug #75200 - MySQL crashed because of append operation. Reported by the already famous bug reporter (and developer) Zhai Weixiang, it was verified by Umesh and fixed fast enough.
    As you can see, MySQL 5.6.24 fixed several more memcached-related bugs (reported internally), so if you use memcached it really makes sense to upgrade.
  • Bug #73361 - mutex contention caused by dummy table/index creation/free. Reported by Zhai Weixiang (who also suggested a patch) and verified by my dear friend and teacher Sinisa Milivojevic.  
Let's move on to partitioning. There are just a couple of fixes there, but they resolved a long list of bugs reported by Percona QA engineers:
  •  Bug #74841 - handle_fatal_signal (sig=11) in cmp_rec_and_tuple | sql/sql_partition.cc:7610. This was reported by Percona's recent QA super star, Ramesh Sivaraman, and verified by Miguel Solorzano.
  • Bug #74860 - handle_fatal_signal (sig=11) in generate_partition_syntax. This was reported by Percona's all times QA superstar, Roel Van de Paar, and verified by Umesh.
  • Bug #74869 - handle_fatal_signal (sig=11) in ha_partition::handle_opt_partitions. It was reported by Ramesh Sivaraman, and verified by Miguel Solorzano.
  • Bug #74288 - Assertion `part_share->partitions_share_refs->num_parts >= m_tot_parts' failed. Reported by Roel Van de Paar and verified by Umesh.
  • Several other bugs mentioned remain private and not visible to us: Bug #74451, Bug #74478, Bug #74491, Bug #74560, Bug #74746, Bug #74634. I am not sure why they are private (or why the previous ones are still public, and for how long). Let's assume they were reported as private (and/or security ones) by my colleagues.
Now, only one replication bug reported at http://bugs.mysql.com was fixed, but it is a serious one:
  • Bug #74607 - slave io_thread may get stuck when using GTID and low slave_net_timeouts. This bug was reported by Santosh Praneeth Banda and verified by Umesh.
There were several other bugs fixed in several categories:
  • Bug #74037 - group_concat_max_len=18446744073709547520 not accepted in my.cnf. It was reported by Leandro Morgado from Oracle and probably verified by him as well. I am always happy to see Oracle engineers reporting bugs in public.
  • Bug #73373 - Warning message shows old variable name. This was reported by Tsubasa Tanaka and verified by Miguel Solorzano.
  • Bug #71634 - P_S digest looks wrong for system variables, shown as @ @ variable... Reported by Simon Mudd and verified by the author of PERFORMANCE_SCHEMA, Marc Alff.
  • Bug #69744 - ssl.cmake silently chooses bundled yassl instead of erroring for old openssl ver. Good old build problem reported and verified by Shane Bester.
  • Bug #69423 - Double close() on the same file descriptor inside mysql_real_connect(). Reported by Yao Deng and verified by Igor Solodovnikov.
  • Bug #60782 - Audit plugin API: no MYSQL_AUDIT_GENERAL_LOG notifications with general log off. This one was reported by Olle Nilsson and verified (as a feature request) by ...yours truly almost 4 years ago.
A couple of issues were also fixed by introducing new server behavior:
  • Bug #74917 - Failed restarts contain no version details. Reported by my Oracle colleague Shawn Green and probably verified by him as well. Now server version is mentioned in the new error log file with the first message.
  • Bug #72997 - "fast" ALTER TABLE CHANGE on enum column triggers full table rebuild. Reported by famous Oracle customer Simon Mudd and verified by even more famous Shane Bester. The (temporary?) fix introduced two new variables, avoid_temporal_upgrade to control conversion to new "temporal" columns format (and rebuilding the table for any ALTER as a result), and show_old_temporals to control adding comments about old format of "temporal" column in SHOW CREATE TABLE output and corresponding INFORMATION_SCHEMA.COLUMNS.COLUMN value. Both variables are immediately declared as deprecated, so they may disappear in 5.7 (or 5.8? I am a bit lost with recent deprecation practices of Oracle).
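
For those hit by such rebuilds, here is a minimal sketch of checking and enabling these two variables; as far as I can tell both are dynamic, but verify this on your own 5.6.24 build:

SHOW GLOBAL VARIABLES LIKE '%temporal%';
-- Keep the old temporal column format, avoiding a full rebuild on ALTER:
SET GLOBAL avoid_temporal_upgrade = ON;
-- Add a comment for old-format temporal columns in SHOW CREATE TABLE output:
SET GLOBAL show_old_temporals = ON;
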
That's all fixes for bugs reported at http://bugs.mysql.com in 5.6.24. Stay tuned, maybe one day we'll discuss MySQL 5.7.7 as well.

by Valeriy Kravchuk (noreply@blogger.com) at April 13, 2015 11:48 PM

April 11, 2015

Oli Sennhauser

Logging Galera Cluster conflicts

We typically suggest that our customers use our MySQL/Galera Cluster my.cnf configuration template to avoid MySQL configuration and performance problems.

And we are paranoid as well. Thus we enable all useful logging:

wsrep_log_conflicts = 1
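
If you do not want to wait for a restart, the same setting can also be switched on at runtime (as far as I know wsrep_log_conflicts is a dynamic variable, but check this on your version first):

SET GLOBAL wsrep_log_conflicts = ON;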

But this also has the consequence of more visibility...

If you monitor your Galera Cluster carefully, for example with the FromDual Performance Monitor for MySQL and MariaDB, you will probably see some strange values increasing from time to time:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_%r_s';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| wsrep_local_cert_failures | 42    |
| wsrep_local_bf_aborts     | 13    |
+---------------------------+-------+

Those values are indicators that some transactions (Galera write sets) did not succeed and were aborted by Galera. In this case the paranoid logging helps to find out what exactly was aborted, and possibly whether it can or should be fixed:

150410  1:44:18 [Note] WSREP: cluster conflict due to certification failure for threads:
150410  1:44:18 [Note] WSREP: Victim thread:
   THD: 151856, mode: local, state: executing, conflict: cert failure, seqno: 30399304
   SQL: UPDATE login SET lTsexpire = UNIX_TIMESTAMP(NOW()) + lTimeout WHERE lSessionId = 'va3ta7besku82k56ncv3bnhlj5'

*** Priority TRANSACTION:
TRANSACTION 464359568, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
1 lock struct(s), heap size 360, 0 row lock(s)
MySQL thread id 4, OS thread handle 0x7f1c0916c700, query id 8190690 Update_rows_log_event::find_row(30399302)

*** Victim TRANSACTION:
TRANSACTION 464359562, ACTIVE 0 sec
mysql tables in use 1, locked 1
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 151856, OS thread handle 0x7f1c09091700, query id 8190614 172.20.100.11 sam_angiz query end
UPDATE login SET lTsexpire = UNIX_TIMESTAMP(now()) + lTimeout WHERE lSessionId = 'va3ta7besku82k56ncv3bnhlj5'
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 835205 page no 3 n bits 72 index `PRIMARY` of table `fromdual`.`login` trx table locks 1 total table locks 2  trx id 464359562 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0
150410  1:44:18 [Note] WSREP: cluster conflict due to high priority abort for threads:
150410  1:44:18 [Note] WSREP: Winning thread:
   THD: 4, mode: applier, state: executing, conflict: no conflict, seqno: 30399302
   SQL: (null)
150410  1:44:18 [Note] WSREP: Victim thread:
   THD: 151856, mode: local, state: committing, conflict: no conflict, seqno: -1
   SQL: UPDATE login SET lTsexpire = UNIX_TIMESTAMP(now()) + lTimeout WHERE lSessionId = 'va3ta7besku82k56ncv3bnhlj5'

In the above Galera conflict, two login transactions were running at the same time. They both came with the same session ID and wanted to update the expiry timestamp. Now, how to solve or fix this:

  • First, check whether this table has a primary key (tables without a PK cause full table scans, which can take a long time and increase the chance of conflicts).
  • Second, check whether there is a (UNIQUE?) index on lSessionId. A missing index leads to full table scans, which increases the chance of conflicts. See the sketch below.
  • Third, check why two logins with the same session ID can arrive at the same time (within one second) on two different Galera nodes (Ajax requests, etc.). Try to avoid such situations.
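
A minimal sketch of the first two checks, using the table and column names from the conflict log above (the index name is just an example, and whether it should really be UNIQUE depends on the application):

SHOW CREATE TABLE fromdual.login;
-- If lSessionId is not indexed yet (example index name):
ALTER TABLE fromdual.login ADD INDEX idx_lsessionid (lSessionId);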

by Shinguz at April 11, 2015 10:30 AM

Galera Cluster last inactive check and VMware snapshots

From time to time, at Galera Cluster customer engagements, we see the following (to me scary) warning in the MySQL error log:

[Warning] WSREP: last inactive check more than PT1.5S ago (PT7.06159S), skipping check

We mostly see this in VMware set-ups. Some further enquiry with the Galera developers did not give a satisfying answer:

This can be seen on bare metal as well - with poorly configured mysqld, O/S, or simply being overloaded. All it means is that this thread could not get CPU time for 7.1 seconds. You can imagine that access to resources in virtual machines is even harder (especially I/O) than on bare metal, so you will see this in virtual machines more often.

This is not a Galera specific issue (it just reports being stuck, other mysqld threads are equally stuck) so there is no configuration options for that. You simply must make sure that your system and mysqld are properly configured, that there is enough RAM (buffer pool not over provisioned), that there is swap, that there are proper I/O drivers installed on guest and so on.

Basically, Galera runs in virtual machines as well as the virtual machines approximate bare metal.

We still suspected that this was somehow VMware related. This week we had the chance to investigate... At 01:36 am, node Galera2 lost connection to the cluster and became NON-PRIMARY. This is basically a bad sign:

150401  1:36:15 [Warning] WSREP: last inactive check more than PT1.5S ago (PT5.08325S), skipping check
150401  1:36:15 [Note] WSREP: (09c6b2f2, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.42.2:4567
150401  1:36:16 [Note] WSREP: view(view_id(NON_PRIM,09c6b2f2,30) memb {
        09c6b2f2,0
} joined {
} left {
} partitioned {
        ce6bf2e1,0
        d1f9bee0,0
})
150401  1:36:16 [Note] WSREP: view(view_id(NON_PRIM,09c6b2f2,31) memb {
        09c6b2f2,0
} joined {
} left {
} partitioned {
        ce6bf2e1,0
        d1f9bee0,0
})
150401  1:36:16 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
150401  1:36:16 [Note] WSREP: Flow-control interval: [16, 16]
150401  1:36:16 [Note] WSREP: Received NON-PRIMARY.
150401  1:36:16 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 26304132)
150401  1:36:16 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
150401  1:36:16 [Note] WSREP: Flow-control interval: [16, 16]
150401  1:36:16 [Note] WSREP: Received NON-PRIMARY.
150401  1:36:16 [Warning] WSREP: Send action {(nil), 328, TORDERED} returned -107 (Transport endpoint is not connected)
150401  1:36:16 [Note] WSREP: New cluster view: global state: dcca768c-b5ad-11e3-bbc0-fb576fb3c451:26304132, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
150401  1:36:17 [Note] WSREP: (09c6b2f2, 'tcp://0.0.0.0:4567') reconnecting to d1f9bee0 (tcp://192.168.42.1:4567), attempt 0

After some investigation with the FromDual Performance Monitor for MySQL and MariaDB, I suspected that the database backup (mysqldump) could be the reason. It was not. But the customer explained that after the database backup they take a VMware snapshot.

And when we compared our problem with the backup log file:

2015/04/01 01:35:08 [3] backup.fromdual.com: Creating a snapshot of galera3
2015/04/01 01:35:16 [3] backup.fromdual.com: Created a snapshot of galera3
2015/04/01 01:35:23 [3] backup.fromdual.com: galera3: backup the changed blocks of disk 'Festplatte 1' using NBD transport
2015/04/01 01:36:10 [3] backup.fromdual.com: galera3: saving the Change Block Tracking's reference for disk 'Festplatte 1'
2015/04/01 01:36:10 [3] backup.fromdual.com: Removing Arkeia's snapshot of galera3

we can see that our problem started pretty much with the end of the VMware snapshot (01:36:10 + 5.08 s ≈ 01:36:15). By the way: for this kind of investigation it is always good to have an ntp daemon running for time synchronization. Otherwise the investigation becomes much harder...
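
A quick way to check this on each node (a sketch, assuming a classic ntpd setup) is:

# The '*' in the first column marks the peer the daemon is currently synchronized to
ntpq -p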

Some more and deeper investigation shows that we lose nodes from time to time during VMware snapshots (galera3). But they recover quickly because they can do an IST. In the worst case we could lose two nodes, and then the whole Galera Cluster would be gone.

192.168.42.3 / node Galera3

2015-04-10 01:44:00 [3] backup.fromdual.com: Creating a snapshot of galera3
2015-04-10 01:44:08 [3] backup.fromdual.com: Created a snapshot of galera3
2015-04-10 01:44:15 [3] backup.fromdual.com: galera3: backup the changed blocks of disk 'Festplatte 1' using NBD transport
2015-04-10 01:45:39 [3] backup.fromdual.com: galera3: saving the Change Block Tracking's reference for disk 'Festplatte 1'
2015-04-10 01:45:39 [3] backup.fromdual.com: Removing Arkeia's snapshot of galera3

150410  1:44:07 [Note] WSREP: (158f71de, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://galera1:4567 tcp://galera2:4567
150410  1:44:07 [Warning] WSREP: last inactive check more than PT1.5S ago (PT7.06159S), skipping check
150410  1:44:08 [Note] WSREP: Received NON-PRIMARY.
150410  1:44:10 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 30399299)
150410  1:44:11 [Warning] WSREP: Gap in state sequence. Need state transfer.
150410  1:44:11 [Note] WSREP: Prepared IST receiver, listening at: tcp://galera3:4568
150410  1:44:11 [Note] WSREP: Member 0.0 (galera3) requested state transfer from '*any*'. Selected 2.0 (galera2)(SYNCED) as donor.
150410  1:44:11 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 30399309)
150410  1:44:11 [Note] WSREP: Requesting state transfer: success, donor: 2
150410  1:44:11 [Note] WSREP: 2.0 (galera2): State transfer to 0.0 (galera3) complete.
150410  1:44:11 [Note] WSREP: Member 2.0 (galera2) synced with group.
150410  1:44:11 [Note] WSREP: Receiving IST: 8 writesets, seqnos 30399291-30399299
150410  1:44:11 [Note] WSREP: IST received: dcca768c-b5ad-11e3-bbc0-fb576fb3c451:30399299
150410  1:44:11 [Note] WSREP: 0.0 (galera3): State transfer from 2.0 (galera2) complete.
150410  1:44:11 [Note] WSREP: Shifting JOINER -> JOINED (TO: 30399309)
150410  1:44:11 [Note] WSREP: Member 0.0 (galera3) synced with group.
150410  1:44:11 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 30399309)
150410  1:44:11 [Note] WSREP: Synchronized with group, ready for connections
150410  1:44:13 [Note] WSREP: (158f71de, 'tcp://0.0.0.0:4567') turning message relay requesting off
150410  1:45:42 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.47388S), skipping check
150410  1:45:43 [Note] WSREP: (158f71de, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://galera1:4567 tcp://galera2:4567
150410  1:45:44 [Note] WSREP: (158f71de, 'tcp://0.0.0.0:4567') reconnecting to 54de92f8 (tcp://galera1:4567), attempt 0
150410  1:45:44 [Note] WSREP: (158f71de, 'tcp://0.0.0.0:4567') reconnecting to c9d964d3 (tcp://galera2:4567), attempt 0
150410  1:45:48 [Note] WSREP: (158f71de, 'tcp://0.0.0.0:4567') turning message relay requesting off

150410  1:47:26 [Note] WSREP: (158f71de, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://galera1:4567
150410  1:47:27 [Note] WSREP: (158f71de, 'tcp://0.0.0.0:4567') reconnecting to 54de92f8 (tcp://galera1:4567), attempt 0
150410  1:47:31 [Note] WSREP: (158f71de, 'tcp://0.0.0.0:4567') turning message relay requesting off

192.168.42.1 / node Galera1

2015-04-10 01:47:24 [3] backup.fromdual.com: Creating a snapshot of galera1
2015-04-10 01:47:29 [3] backup.fromdual.com: Created a snapshot of galera1
2015-04-10 01:47:40 [3] backup.fromdual.com: galera1: backup the changed blocks of disk 'Festplatte 1' using NBD transport
2015-04-10 01:48:43 [3] backup.fromdual.com: galera1: saving the Change Block Tracking's reference for disk 'Festplatte 1'
2015-04-10 01:48:44 [3] backup.fromdual.com: Removing Arkeia's snapshot of galera1
150410  1:44:02 [Note] WSREP: (54de92f8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://galera3:4567
150410  1:44:04 [Note] WSREP: (54de92f8, 'tcp://0.0.0.0:4567') reconnecting to 158f71de (tcp://galera3:4567), attempt 0
150410  1:44:12 [Note] WSREP: Member 0.0 (galera3) requested state transfer from '*any*'. Selected 2.0 (galera2)(SYNCED) as donor.

150410  1:45:43 [Note] WSREP: (54de92f8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://galera3:4567
150410  1:45:44 [Note] WSREP: (54de92f8, 'tcp://0.0.0.0:4567') reconnecting to 158f71de (tcp://galera3:4567), attempt 0
150410  1:45:48 [Note] WSREP: (54de92f8, 'tcp://0.0.0.0:4567') turning message relay requesting off

150410  1:47:27 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.66452S), skipping check
150410  1:47:27 [Note] WSREP: (54de92f8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://galera3:4567
150410  1:47:30 [Note] WSREP: (54de92f8, 'tcp://0.0.0.0:4567') turning message relay requesting off

192.168.42.2 / node Galera2

2015-04-10 02:09:55 [3] backup.fromdual.com: Creating a snapshot of galera2
2015-04-10 02:09:58 [3] backup.fromdual.com: Created a snapshot of galera2
2015-04-10 02:10:05 [3] backup.fromdual.com: galera2: backup the changed blocks of disk 'Festplatte 1' using NBD transport
2015-04-10 02:10:53 [3] backup.fromdual.com: galera2: saving the Change Block Tracking's reference for disk 'Festplatte 1'
2015-04-10 02:10:54 [3] backup.fromdual.com: Removing Arkeia's snapshot of galera2

150410  1:44:02 [Note] WSREP: (c9d964d3, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://galera3:4567
150410  1:44:03 [Note] WSREP: (c9d964d3, 'tcp://0.0.0.0:4567') reconnecting to 158f71de (tcp://galera3:4567), attempt 0

150410  1:44:08 [Warning] WSREP: discarding established (time wait) 158f71de (tcp://192.168.42.3:4567)
150410  1:44:11 [Note] WSREP: Member 0.0 (galera3) requested state transfer from '*any*'. Selected 2.0 (galera2)(SYNCED) as donor.
150410  1:44:13 [Note] WSREP: (c9d964d3, 'tcp://0.0.0.0:4567') turning message relay requesting off

150410  1:45:43 [Note] WSREP: (c9d964d3, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://galera3:4567
150410  1:45:44 [Note] WSREP: (c9d964d3, 'tcp://0.0.0.0:4567') reconnecting to 158f71de (tcp://galera3:4567), attempt 0
150410  1:45:48 [Note] WSREP: (c9d964d3, 'tcp://0.0.0.0:4567') turning message relay requesting off

150410  1:47:26 [Note] WSREP: (c9d964d3, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://galera1:4567
150410  1:47:27 [Note] WSREP: (c9d964d3, 'tcp://0.0.0.0:4567') reconnecting to 54de92f8 (tcp://galera1:4567), attempt 0
150410  1:47:30 [Note] WSREP: (c9d964d3, 'tcp://0.0.0.0:4567') turning message relay requesting off

150410  2:09:57 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.83618S), skipping check

The backups are done with the 2 options:

enabled.

Possibly this is the reason, and one should disable those features in combination with Galera. Further investigation is ongoing. In the worst case, VMware snapshotting with Galera should be avoided.

by Shinguz at April 11, 2015 09:46 AM