Planet MariaDB

July 23, 2017

Valeriy Kravchuk

Why Thread May Hang in "Waiting for table level lock" State - Part I

The last time I had to use gdb to check what was going on in a MySQL server, and found something worth sharing in a blog post, was back in April 2017, and I already miss that kind of experience. So today I decided to try it again, in the hope of getting some insights in cases where other tools may not apply or may not be as helpful as one could expect. Here is a long enough story, based on a recent customer issue I worked on this week.
* * *
Have you seen anything like this output of the SHOW PROCESSLIST statement:

Id  User        Host               db         Command  Time  State                         Info                                                Progress
...
28  someuser01  xx.xx.xx.xx:39644  somedb001  Sleep     247                                NULL                                                   0.000
29  someuser01  xx.xx.xx.yy:44100  somedb001  Query     276  Waiting for table level lock  DELETE FROM t1 WHERE (some_id = 'NNNNN') AND ...       0.000
...
33  someuser01  xx.xx.zz.tt:35886  somedb001  Query     275  Waiting for table level lock  DELETE FROM t2 WHERE (some_id = 'XXXXX')               0.000
...
38  someuser01  xx.xx.xx.xx:57055  somedb001  Query     246  Waiting for table level lock  DELETE FROM t3 WHERE (some_id in (123456)) AND ...     0.000
...
recently? That is, many threads accessing InnoDB(!) tables and hanging in the "Waiting for table level lock" state for a long time without any obvious reason in the SHOW PROCESSLIST or SHOW ENGINE INNODB STATUS?

I've seen it this week, along with a statement from the customer that the server was basically unusable. Unlike in many other cases, I could not find any obvious problem in the outputs mentioned above (other than high concurrency, normal InnoDB row lock waits for other sessions, and so on). There were no unexpected table level locks reported by InnoDB, and the longest running transactions were those in that waiting state.

Unfortunately fine MySQL manual is not very helpful when describing this thread state:
"Waiting for lock_type lock
The server is waiting to acquire a THR_LOCK lock or a lock from the metadata locking subsystem, where lock_type indicates the type of lock.
This state indicates a wait for a THR_LOCK:
  • Waiting for table level lock
These states indicate a wait for a metadata lock:
  • Waiting for event metadata lock
  • Waiting for global read lock
    ...
    "
What that "THR_LOCK" should tell the reader is beyond me (and probably beyond most MySQL users). Fortunately, I know from experience that this state usually means the thread is trying to access some MyISAM table. The tables mentioned in the currently running statements were all InnoDB, though. The server had performance_schema enabled, but only with the default instrumentation and consumers defined, so I had no chance to get anything related to metadata locks, nor much hope of seeing all the previous statements executed by each thread. I surely insisted on checking the source code etc., but this rarely gives immediate results.
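For reference, this is roughly what enabling metadata lock instrumentation at runtime looks like (a sketch for MySQL 5.7's performance_schema; availability and exact instrument/consumer names should be double-checked on MariaDB 10.2.x):

-- make performance_schema.metadata_locks get populated
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME = 'wait/lock/metadata/sql/mdl';
-- keep some per-thread statement history to see what each session executed recently
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME IN ('events_statements_history', 'events_statements_history_long');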

So, to check my assumption, try to prove that MyISAM tables were involved, and get some hint about which sessions held the locks, I suggested running mysqladmin debug, a command that is rarely used these days, with output that Oracle explicitly refused to document properly (see my Bug #71346)! I do know what to expect there (from the days when MyISAM was widely used), and when in doubt I can always check the source code.

The mysqladmin debug command outputs a lot of information into the error log. Among other things I found the following section:

Thread   database.table_name       Locked/Waiting    Lock_type
...
928      somedb001.somewhat_seq    Waiting - write   High priority write lock
...
12940    somedb001.somewhat_seq    Locked - write    High priority write lock

where somedb001.somewhat_seq surely was a MyISAM table. There were actually many tables mentioned as locked, with waits on them, and with names ending with the _seq suffix. One can now check what the MySQL thread with id=12940 is doing at all levels, where it comes from, whether it should be killed, etc.
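For instance, the obvious first step (nothing fancy, just the standard information_schema view) would be:

SELECT * FROM information_schema.PROCESSLIST WHERE ID = 12940;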

Now it became clearer what was going on. Probably the application developers had tried to implement sequences with MyISAM tables and, probably, triggers to use them when no data is provided for the column! The idea is quite popular and described in many blog posts. Check this old blog post by Peter Zaitsev, for example. This assumption was confirmed, and essentially they did something like I do in the following deliberately primitive example:
-- this is our sequence table
mysql> show create table misam\G
*************************** 1. row ***************************
       Table: misam
Create Table: CREATE TABLE `misam` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql> create table inno like misam;
Query OK, 0 rows affected (0.30 sec)

mysql> alter table inno engine=InnoDB;
Query OK, 0 rows affected (2.73 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table inno\G
*************************** 1. row ***************************
       Table: inno
Create Table: CREATE TABLE `inno` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.03 sec)

mysql> select * from misam;
Empty set (0.00 sec)

-- let's set up the first value for the sequence
mysql> insert into misam values(1);
Query OK, 1 row affected (0.05 sec)

-- now let's create a trigger to insert the value from the sequence
mysql> delimiter //
mysql> create trigger tbi before insert on inno
    -> for each row
    -> begin
    -> if ifnull(new.id,0) = 0 then
    ->   update misam set id=last_insert_id(id+1);
    ->   set new.id = last_insert_id();
    -> end if;
    -> end
    -> //

Query OK, 0 rows affected (0.25 sec)

mysql> delimiter ;
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into inno values(0);
Query OK, 1 row affected (0.20 sec)

mysql> insert into inno values(0);
Query OK, 1 row affected (0.10 sec)

mysql> select * from inno;
+----+
| id |
+----+
|  2 |
|  3 |
+----+
2 rows in set (0.00 sec)

mysql> select * from misam;
+----+
| id |
+----+
|  3 |
+----+
1 row in set (0.00 sec)
So, this is how it is supposed to work, and, you know, it works from two concurrent sessions without any noticeable locking etc.

This is the end of the real story, for now. We have a lot of things to discuss with the customer, including how to find out the previous commands executed in each session, what useful details we can get from performance_schema, etc. The root cause was clear, but we still have to find out why the sessions spend so much time holding the blocking locks on MyISAM tables and what we can do about that, or about an architecture based on such an implementation of sequences (which I kindly ask you to avoid whenever possible! Use auto_increment values, please, if you care about concurrency). I expect more blog posts eventually inspired by that real story, but now it's time to move on to more generic technical details.
* * *
I have two problems with this story if I try to approach it in a more generic way.

First, do we have any other way to see MyISAM table level locks, besides the mysqladmin debug command? (Oracle kindly plans to replace it with something; let me quote Shane Bester's comment on the bug that originates from his internal feature request: "We need to replace it with appropriate performance_schema or information_schema tables.")

Second, how exactly do we reproduce the situation the customer reported? My initial attempts to get such a status with the trigger in place failed: I was not able to capture threads spending any notable time in the "Waiting for table level lock" state, even with concurrent single-row inserts and the trigger above in place. I was thinking about an explicit LOCK TABLES misam WRITE etc., but I doubt real code uses that. Fortunately, the following INSERT running from one session:
mysql> insert into inno values(sleep(100));
Query OK, 1 row affected (1 min 40.20 sec)
allowed me to run an INSERT in the other session that got blocked:
mysql> insert into inno values(0);
Query OK, 1 row affected (1 min 36.36 sec)
and while it was blocked I saw the thread state I wanted in the output of SHOW PROCESSLIST:
mysql> show processlist;
+----+------+-----------+------+---------+------+------------------------------+-------------------------------------+-----------+---------------+
| Id | User | Host      | db   | Command | Time | State                        | Info                                | Rows_sent | Rows_examined |
+----+------+-----------+------+---------+------+------------------------------+-------------------------------------+-----------+---------------+
| 16 | root | localhost | test | Query   |   17 | User sleep                   | insert into inno values(sleep(100)) |         0 |             0 |
| 17 | root | localhost | test | Query   |   13 | Waiting for table level lock | insert into inno values(0)          |         0 |             0 |
| 22 | root | localhost | test | Query   |    0 | starting                     | show processlist                    |         0 |             0 |
+----+------+-----------+------+---------+------+------------------------------+-------------------------------------+-----------+---------------+
3 rows in set (0.00 sec)
So, now I know exactly how to reproduce the problem and what it can be caused by. Any slow-running single INSERT statement (caused by something complex executed in a trigger, some slow function used in the list of values inserted, or maybe just an INSERT ... SELECT ... from a big table) will give us the desired thread state.

Coming back to the first generic problem mentioned above: is there any way, besides running mysqladmin debug and checking the output against the processlist etc., to identify the thread that holds MyISAM table level locks? One should expect performance_schema to help with this, at least as of 5.7 (I've used Percona Server 5.7.18-15 on my Ubuntu 14.04 netbook for testing today, while the original problem was on MariaDB 10.2.x. Percona is for fun, while MariaDB is for real work...). Indeed, we have table_handles there, and until recently I ignored its existence (it's new in 5.7, and I am not even sure if MariaDB 10.2.x has it already).

So, I reproduced the problem again and got the following there:
mysql> select * from performance_schema.table_handles;
+-------------+---------------+-------------+-----------------------+-----------------+----------------+---------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | OBJECT_INSTANCE_BEGIN | OWNER_THREAD_ID | OWNER_EVENT_ID | INTERNAL_LOCK | EXTERNAL_LOCK  |
+-------------+---------------+-------------+-----------------------+-----------------+----------------+---------------+----------------+
| TABLE       | test          | t           |       140392772351024 |               0 |              0 | NULL          | NULL           |
| TABLE       | test          | t1          |       140392772352560 |               0 |              0 | NULL          | NULL           |
| TABLE       | test          | ttime       |       140392772355632 |               0 |              0 | NULL          | NULL           |
| TABLE       | test          | misam       |       140392772358704 |              55 |             10 | WRITE         | WRITE EXTERNAL |
| TABLE       | test          | misam       |       140392896981552 |              54 |             10 | WRITE         | WRITE EXTERNAL |
| TABLE       | test          | inno        |       140392772361776 |              55 |             10 | NULL          | WRITE EXTERNAL |
| TABLE       | test          | inno        |       140392896983088 |              54 |             10 | NULL          | WRITE EXTERNAL |
| TABLE       | test          | inno        |       140392826836016 |               0 |              0 | NULL          | NULL           |
| TABLE       | test          | misam       |       140392826837552 |               0 |              0 | NULL          | NULL           |
+-------------+---------------+-------------+-----------------------+-----------------+----------------+---------------+----------------+
9 rows in set (0.00 sec)
What am I supposed to do with the above to find out the MySQL thread id of the blocking session? I am supposed to join to the performance_schema.threads table, looking for thread_id values that are the same as owner_thread_id above, 54 and 55. I get the following (note that processlist_id is what you are looking for in the processlist):
mysql> select * from performance_schema.threads where thread_id in (54, 55)\G
*************************** 1. row ***************************
          THREAD_ID: 54
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 27
   PROCESSLIST_USER: root
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Sleep
   PROCESSLIST_TIME: 90
  PROCESSLIST_STATE: NULL
   PROCESSLIST_INFO: NULL
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: Socket
       THREAD_OS_ID: 32252
*************************** 2. row ***************************
          THREAD_ID: 55
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 28
   PROCESSLIST_USER: root
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Sleep
   PROCESSLIST_TIME: 101
  PROCESSLIST_STATE: NULL
   PROCESSLIST_INFO: NULL
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: Socket
       THREAD_OS_ID: 27844
2 rows in set (0.10 sec)
I do not see a way to distinguish a lock wait from actually holding the lock. All I know at the moment is that there is an engine-level lock on the misam table (and the inno table, for that matter). If the MySQL thread id is NOT the one for the thread that is "Waiting for table level lock", then it must be the thread that holds the lock! Some smart join with a proper WHERE clause would let me find out what I need directly. Maybe in one of the next parts I'll even present it, but writing it off the top of my head in a critical situation is above my current skills related to performance_schema.
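Just to sketch the direction such a join could take (untested and written after the fact; filtering on the thread state to separate holders from waiters is my own assumption, not something verified in that investigation):

SELECT t.PROCESSLIST_ID, t.PROCESSLIST_INFO, t.PROCESSLIST_STATE,
       h.OBJECT_SCHEMA, h.OBJECT_NAME, h.INTERNAL_LOCK, h.EXTERNAL_LOCK
  FROM performance_schema.table_handles h
  JOIN performance_schema.threads t
    ON t.THREAD_ID = h.OWNER_THREAD_ID
 WHERE h.INTERNAL_LOCK IS NOT NULL
   AND (t.PROCESSLIST_STATE IS NULL
        OR t.PROCESSLIST_STATE <> 'Waiting for table level lock');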
* * *
Now, what should those of us do who do not have performance_schema enabled, or have to use a version before 5.7? Or those with access to gdb and 5 minutes to spare?

Surely we should attach gdb to the mysqld process and, if in doubt, read some parts of the source code to know where to start. I started with the following fragments of code that were easy to find (as long as you know that the COM_DEBUG command is what mysqladmin debug actually sends to the server):
openxs@ao756:~/git/percona-server$ grep -rni com_debug *
include/mysql/plugin_audit.h.pp:160:  COM_DEBUG,
include/mysql.h.pp:47:  COM_DEBUG,
include/my_command.h:39:  COM_DEBUG,
libmysql/libmysql.c:899:  DBUG_RETURN(simple_command(mysql,COM_DEBUG,0,0,0));
rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task_debug.h:62:  "[XCOM_DEBUG] ",
sql/sql_parse.cc:1884:  case COM_DEBUG:
^C
...
# from the line highlighted above...
  case COM_DEBUG:
    thd->status_var.com_other++;
    if (check_global_access(thd, SUPER_ACL))
      break;                                    /* purecov: inspected */
    mysql_print_status();
    query_logger.general_log_print(thd, command, NullS);
    my_eof(thd);
    break;

openxs@ao756:~/git/percona-server$ grep -rni mysql_print_status *
sql/sql_test.h:40:void mysql_print_status();
sql/sql_parse.cc:59:#include "sql_test.h"         // mysql_print_status
sql/sql_parse.cc:1888:    mysql_print_status();
sql/sql_test.cc:456:void mysql_print_status()
sql/mysqld.cc:69:#include "sql_test.h"     // mysql_print_status
sql/mysqld.cc:2387:        mysql_print_status();   // Print some debug info
openxs@ao756:~/git/percona-server$

# from the line highlighted above...

void mysql_print_status()
{
  char current_dir[FN_REFLEN];
  STATUS_VAR current_global_status_var;

  printf("\nStatus information:\n\n");
  (void) my_getwd(current_dir, sizeof(current_dir),MYF(0));
  printf("Current dir: %s\n", current_dir);
  printf("Running threads: %u  Stack size: %ld\n",
         Global_THD_manager::get_instance()->get_thd_count(),
         (long) my_thread_stack_size);
  thr_print_locks();                            // Write some debug info
...
  printf("\nTable status:\n\
...
  display_table_locks();
...

# finally, let's find the code of display_table_locks...
openxs@ao756:~/git/percona-server$ grep -rni display_table_locks *
sql/sql_test.cc:371:static void display_table_locks(void)
sql/sql_test.cc:501:  display_table_locks();
sql/mysqld.cc:9537:  { &key_memory_locked_thread_list, "display_table_locks", PSI_FLAG_THREAD},
openxs@ao756:~/git/percona-server$

static void display_table_locks(void)
{
  LIST *list;
  Saved_locks_array saved_table_locks(key_memory_locked_thread_list);
  saved_table_locks.reserve(table_cache_manager.cached_tables() + 20);

  mysql_mutex_lock(&THR_LOCK_lock);
  for (list= thr_lock_thread_list; list; list= list_rest(list))
  {
    THR_LOCK *lock=(THR_LOCK*) list->data;
    mysql_mutex_lock(&lock->mutex);
    push_locks_into_array(&saved_table_locks, lock->write.data, FALSE,
                          "Locked - write");
    push_locks_into_array(&saved_table_locks, lock->write_wait.data, TRUE,
                          "Waiting - write");
    push_locks_into_array(&saved_table_locks, lock->read.data, FALSE,
                          "Locked - read");
    push_locks_into_array(&saved_table_locks, lock->read_wait.data, TRUE,
                          "Waiting - read");
    mysql_mutex_unlock(&lock->mutex);
  }
...
That's enough to start. We'll finally have to study what THR_LOCK is. There is a global (doubly) linked list of all of them, thr_lock_thread_list, and this is where we can get the details from, in a way similar (let's hope) to what the server does while processing the COM_DEBUG command for us. So, let's attach gdb and let the fun begin:
openxs@ao756:~/git/percona-server$ sudo gdb -p 23841
[sudo] password for openxs:
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
...
Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
0x00007fafd93d9c5d in poll () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) set $list=(LIST *)thr_lock_thread_list
(gdb) set $lock=(THR_LOCK*) $list->data
(gdb) p *($lock)
$1 = {list = {prev = 0x0, next = 0x7fafd2fa4780, data = 0x7fafbd545c80},
  mutex = {m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
        __nusers = 0, __kind = 3, __spins = 0, __elision = 0, __list = {
          __prev = 0x0, __next = 0x0}},
      __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>,
      __align = 0}, m_psi = 0x7fafd1f97c80}, read_wait = {data = 0x0,
    last = 0x7fafbd545cc8}, read = {data = 0x0, last = 0x7fafbd545cd8},
  write_wait = {data = 0x0, last = 0x7fafbd545ce8}, write = {
    data = 0x7fafbd69f188, last = 0x7fafbd69f190}
, write_lock_count = 0,
  read_no_write_count = 0, get_status = 0xf84910 <mi_get_status>,
  copy_status = 0xf84a70 <mi_copy_status>,
  update_status = 0xf84970 <mi_update_status>,
  restore_status = 0xf84a50 <mi_restore_status>,
  check_status = 0xf84a80 <mi_check_status>}
In the above we started from the first item in the list. If needed, we can move on to the next one with set $list=(LIST *)($list->next) etc. We then interpret $list->data as a (THR_LOCK *). There we can see the read, read_wait, write and write_wait structures. One of them will have a data item that is not 0x0. This way we can identify whether it is a lock or a lock wait, and what kind of lock it is. In my case I was looking for a write lock, and here it is, the very first one. So, I continue:
(gdb) p *($lock)->write.data
$2 = {owner = 0x7fafbd468d38, next = 0x0, prev = 0x7fafbd545cf8,
  lock = 0x7fafbd545c80, cond = 0x0, type = TL_WRITE,
  status_param = 0x7fafbd69ee20, debug_print_param = 0x7fafbd515020,
  m_psi = 0x7fafa4c06cc0}
We can check the owner field:
(gdb) p *($lock)->write.data.owner
$3 = {thread_id = 16, suspend = 0x7fafbd469370}
and thread_id there is what we were looking for: the MySQL thread id of the blocking thread. If we want to get the details about the locked table, we can do this as follows, by studying debug_print_param in data:
(gdb) set $table=(TABLE *) &(*($lock)->write.data->debug_print_param)
(gdb) p $table
$4 = (TABLE *) 0x7fafbd515020
(gdb) p *($table)
$5 = {s = 0x7fafbd4d7030, file = 0x7fafbd535230, next = 0x0, prev = 0x0,
  cache_next = 0x0, cache_prev = 0x7fafc4df05a0, in_use = 0x7fafbd468000,
  field = 0x7fafbd4d7440, hidden_field_count = 0, record = {
    0x7fafbd4d7430 "\377", 0x7fafbd4d7438 "\n"}, write_row_record = 0x0,
  insert_values = 0x0, covering_keys = {map = 0}, quick_keys = {map = 0},
  merge_keys = {map = 0}, possible_quick_keys = {map = 0},
  keys_in_use_for_query = {map = 0}, keys_in_use_for_group_by = {map = 0},
  keys_in_use_for_order_by = {map = 0}, key_info = 0x7fafbd4d7508,
  next_number_field = 0x0, found_next_number_field = 0x0, vfield = 0x0,
  hash_field = 0x0, fts_doc_id_field = 0x0, triggers = 0x0,
  pos_in_table_list = 0x7fafbd47a9b8, pos_in_locked_tables = 0x7fafbd4e5030,
  group = 0x0, alias = 0x7fafbbbad120 "misam",
  null_flags = 0x7fafbd4d7430 "\377", bitmap_init_value = 0x0, def_read_set = {
    bitmap = 0x7fafbd4d75a8, n_bits = 1, last_word_mask = 4294967294,
    last_word_ptr = 0x7fafbd4d75a8, mutex = 0x0}, def_write_set = {
    bitmap = 0x7fafbd4d75ac, n_bits = 1, last_word_mask = 4294967294,
    last_word_ptr = 0x7fafbd4d75ac, mutex = 0x0}, tmp_set = {
    bitmap = 0x7fafbd4d75b0, n_bits = 1, last_word_mask = 4294967294,
    last_word_ptr = 0x7fafbd4d75b0, mutex = 0x0}, cond_set = {
    bitmap = 0x7fafbd4d75b4, n_bits = 1, last_word_mask = 4294967294,
    last_word_ptr = 0x7fafbd4d75b4, mutex = 0x0},
  def_fields_set_during_insert = {bitmap = 0x7fafbd4d75b8, n_bits = 1,
    last_word_mask = 4294967294, last_word_ptr = 0x7fafbd4d75b8, mutex = 0x0},
---Type <return> to continue, or q <return> to quit---
  read_set = 0x7fafbd515128, write_set = 0x7fafbd515148,
  fields_set_during_insert = 0x7fafbd5151a8, query_id = 0, quick_rows = {
    0 <repeats 64 times>}, const_key_parts = {0 <repeats 64 times>},
  quick_key_parts = {0 <repeats 64 times>}, quick_n_ranges = {
    0 <repeats 64 times>}, quick_condition_rows = 0, lock_position = 0,
  lock_data_start = 0, lock_count = 1, temp_pool_slot = 0, db_stat = 39,
  current_lock = 1, nullable = 0 '\000', null_row = 0 '\000',
  status = 3 '\003', copy_blobs = 0 '\000', force_index = 0 '\000',
  force_index_order = 0 '\000', force_index_group = 0 '\000',
  distinct = 0 '\000', const_table = 0 '\000', no_rows = 0 '\000',
  key_read = 0 '\000', no_keyread = 0 '\000', locked_by_logger = 0 '\000',
  no_replicate = 0 '\000', locked_by_name = 0 '\000',
  fulltext_searched = 0 '\000', no_cache = 0 '\000',
  open_by_handler = 0 '\000', auto_increment_field_not_null = 0 '\000',
  insert_or_update = 0 '\000', alias_name_used = 0 '\000',
  get_fields_in_item_tree = 0 '\000', m_needs_reopen = 0 '\000',
  created = true, max_keys = 0, reginfo = {join_tab = 0x0, qep_tab = 0x0,
    lock_type = TL_WRITE, not_exists_optimize = false,
    impossible_range = false}, mem_root = {free = 0x7fafbd4d7420,
    used = 0x7fafbd535220, pre_alloc = 0x0, min_malloc = 32, block_size = 992,
    block_num = 6, first_block_usage = 0, max_capacity = 0,
    allocated_size = 2248, error_for_capacity_exceeded = 0 '\000',
    error_handler = 0xc4b240 <sql_alloc_error_handler()>, m_psi_key = 106},
---Type <return> to continue, or q <return> to quit---
  blob_storage = 0x0, grant = {grant_table = 0x0, version = 0,
    privilege = 536870911, m_internal = {m_schema_lookup_done = true,
      m_schema_access = 0x0, m_table_lookup_done = true,
      m_table_access = 0x0}}, sort = {filesort_buffer = {m_next_rec_ptr = 0x0,
      m_rawmem = 0x0, m_record_pointers = 0x0, m_sort_keys = 0x0,
      m_num_records = 0, m_record_length = 0, m_sort_length = 0,
      m_size_in_bytes = 0, m_idx = 0}, io_cache = 0x0, merge_chunks = {
      m_array = 0x0, m_size = 0}, addon_fields = 0x0,
    sorted_result_in_fsbuf = false, sorted_result = 0x0,
    sorted_result_end = 0x0, found_records = 0}, part_info = 0x0,
  all_partitions_pruned_away = false, mdl_ticket = 0x7fafbd4c4f10,
  m_cost_model = {m_cost_model_server = 0x0, m_se_cost_constants = 0x0,
    m_table = 0x0}, should_binlog_drop_if_temp_flag = false}
We already see the table alias, but more details are in the s field (of TABLE_SHARE * type):
(gdb) p $table->s
$26 = (TABLE_SHARE *) 0x7fafbd4d7030
...
(gdb) p $table->s->table_cache_key
$27 = {str = 0x7fafbd4d73b8 "test", length = 11}
...
(gdb) p $table->s->db
$30 = {str = 0x7fafbd4d73b8 "test", length = 4}
(gdb) p $table->s->path
$31 = {str = 0x7fafbd4d73c8 "./test/misam", length = 12}
(gdb) p $table->s->table_name
$32 = {str = 0x7fafbd4d73bd "misam", length = 5}
# to remind you, this is how to get the thread id directly
(gdb) p (*($lock)->write.data.owner).thread_id
$37 = 16
I skipped some of my tests and intermediate results. To summarize, for me it took only a bit more time to search the code and try commands in gdb than to find out that metadata_locks does not help and that one has to use table_handles in performance_schema. Next time in gdb I'll do it instantly, and the number of keystrokes is far smaller :)

Surely, for production environments I'll have to write, test and note down somewhere proper queries against performance_schema. For the next attempts with gdb and, maybe, FOSDEM talks, I'll have to read the code to remember key details about the data structures used above... More posts to come, stay tuned!

by Valeriy Kravchuk (noreply@blogger.com) at July 23, 2017 06:20 PM

July 21, 2017

Peter Zaitsev

Faster Node Rejoins with Improved IST performance

IST Performance

In this blog, we’ll look at how improvements to Percona XtraDB Cluster improved IST performance.

Introduction

Starting with version 5.7.17-29.20, Percona XtraDB Cluster significantly improved performance. Depending on the workload, the increase in throughput is in the range of 3-10x (more details here). These optimization fixes also helped improve IST (Incremental State Transfer) performance. This blog is aimed at studying the IST impact.

IST

IST stands for incremental state transfer. When a node leaves the cluster for a short period of time and then rejoins, it needs to catch up with the cluster state. As part of this sync process, an existing node of the cluster (aka DONOR) donates the missing write-sets to the rejoining node (aka JOINER). In short, the flow involves applying the missing write-sets on the JOINER, just as during active workload replication.

Percona XtraDB Cluster / Galera already can apply write-sets in parallel using multiple applier threads. Unfortunately, due to commit contention, the commit action was serialized. This was fixed in the above Percona XtraDB Cluster release, allowing commits to proceed in parallel.

IST uses the same path for applying write-sets, except that it is more like a batch operation.

IST Performance

Let’s look at IST performance before and now.

Setup

  1. A two-node cluster (node-1 and node-2), with gcache configured large enough to avoid purging, since we need IST (see the configuration sketch after this list)
  2. Start workload against node-1 for 30 seconds
  3. Shutdown node-2
  4. Start a workload that performs 4M requests against node-1. The workload produces ~3.5M write-sets that are cached in gcache and used later for IST
  5. Start node-2 with N-applier threads
  6. Wait until IST is done
  7. Repeat steps 3-6 with different values of N.
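For reference only (the size below is arbitrary, and the available gcache options should be checked against your Galera version), a large gcache is typically configured via the wsrep provider options in my.cnf:

[mysqld]
wsrep_provider_options = "gcache.size=10G"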

Observations:

  • IST is 4x faster with PXC-5.7.17 (compared to previous releases)
  • Improved performance means a quicker node rejoin, and an overall increase in cluster productivity, as the joiner node is available to process the workload sooner

Conclusion

Percona XtraDB Cluster 5.7.17 significantly improved IST performance. A faster rejoin of the node effectively means better cluster productivity and more flexibility in planning maintenance windows. So what are you waiting for? Upgrade to Percona XtraDB Cluster 5.7.17 or the latest Percona XtraDB Cluster 5.7 release and experience the power!

by Krunal Bauskar at July 21, 2017 04:01 PM

Valeriy Kravchuk

Fun with Bugs #54 - On Some Bugs Fixed in MySQL 5.7.19

More than 3 months after 5.7.18, we've recently got MySQL 5.7.19 released. This is my quick review of the release notes, with interesting fixed bugs (reported in public) highlighted in the areas I am usually interested in.

Let's start with InnoDB. The following bug fixes attracted my attention:
  • Bug #85043 is still private. You know how much I hate those. At least we can see it was about the fact that "The server allocated memory unnecessarily for an operation that rebuilt the table." Let's hope this is no longer the case.
  • Bug #84038 - "Errors when restarting MySQL after FLUSH TABLES FOR EXPORT, RENAME and DROP.", was reported by Jean-François Gagné and verified by Umesh Shastry. Basically, InnoDB failed to update INNODB_SYS_DATAFILES internal data dictionary table properly while renaming tables.
  • Bug #80788 - "Reduce the time of looking for MLOG_CHECKPOINT during crash recovery". It was reported by  Zhai Weixiang (who had provided a patch) and quickly verified by Umesh Shastry.
  • Bug #83470 - "Reverse scan on a partitioned table does ICP check incorrectly, causing slowdown". This report that is related to both InnoDB, partitioning and optimizer, was created by my colleague from MariaDB, Sergey Petrunya, who had also suggested a patch under BSD license. Bug #84107 was considered a duplicate of this one it seems.
There were some public bugs fixed in group replication. While I do not really care, yet, about Group Replication, it's nice to see bugs in this area fixed so fast:
  • Bug #85667 - "GR node is in RECOVERING state if binlog_checksum configured on running server". This bug was reported by Ramana Yeruva.
  • Bug #85047 - "Secondaries allows transactions through ASYNC setup into the group". Yet another group replication bug recently fixed. It was reported by Narendra Chauhan who probably works for Oracle.
  • Bug #84728 - "Not Possible To Avoid MySQL From Starting When GR Cannot Start", was reported by Kenny Gryp from Percona.
  • Bug #84329 - "Some Class B private address is not permitted automatically". This funny bug was reported by Satoshi Mitani and verified by Umesh Shastry
  • Bug #84153 - "ASSERT "!contains_gtid(gtid)" rpl_gtid_owned.cc:94 Owned_gtids::add_gtid_owner". It was reported by Narendra Chauhan.
Usual async replication was also improved in 5.7.19, with the following related bugs fixed:
  • Bug #83184 - "Can't set slave_skip_errors > 3000". This bug was reported by Tsubasa Tanaka (who had provided a patch) and verified by Umesh Shastry.
  • Bug #82283 - "main.mysqlbinlog_debug fails with a LeakSanitizer error". Laurynas Biveinis from Percona found it and provided patches. The bug was verified by Umesh Shastry.
  • Bug #81232 - "Changing master_delay after stop slave results in loss of events". This regression bug (5.6.x was not affected) was reported by Parveez Baig. Bug #84249 (reported by Sergio Roysen) was declared a duplicate of this one.
  • Bug #77406 - "MTS must be able handle large packets with small slave_pending_jobs_size_max", was reported by my colleague Andrii Nikitin while he worked in Oracle.
  • Bug #84471 is still private. Why should it be so, when the bug is supposed to be about "...the master could write to the binary log a last_committed value which was smaller than it should have been. This could cause the slave to execute in parallel transactions which should not have been, leading to inconsistencies or other errors." Let's hide the details about all bugs that may lead to data corruption, why not?
  • Bug #83186 - "Request for slave_skip_errors = ER_INCONSISTENT_ERROR". Thanks to  Tsubasa Tanaka, who had provided a patch, we can do this now.
  • Bug #82848 - "Restarting a slave after seeding it with a mysqldump loses it's position". This serious problem with GTID-based replication was noted and resolved (with a patch) by Daniël van Eeden.
  • Bug #82209 is also private... Now, I am tired of them; just go read the release notes and hope they describe the problem and solution right. Great for any open source software, isn't it?
  • Bug #81119 - "multi source replication slave sql running error 1778". This serious bug (where GTIDs also play role) was reported by Mohamed Atef. Good to see it fixed finally.
I know that in the long term the query cache is doomed and Oracle is not going to work on it any more, but in the meantime, please check this important bug fix, Bug #86047 - "FROM sub query with 'group by' is cached by mistake", by Meiji Kimura. If you still use the query cache and have complex queries with derived tables, please consider upgrading. You might have been getting wrong results all this time...

If you use xtrabackup or Oracle's MySQL Enterprise Backup in the environment with XA transactions, please, check Bug #84442 - "XA PREPARE inconsistent with XTRABACKUP", by David Zhao, who had also provided a patch. Your backups may be inconsistent until you upgrade...

Now, on some optimizer bugs fixed:
  • Bug #81031 - "count(*) on innodb sometimes returns 0". This happened when index merge was used by the optimizer. The bug was reported by Ashraf Amayreh and verified by Shane Bester.
  • Bug #84320 - "DISTINCT clause does not work in GROUP_CONCAT". It was reported by Varun Gupta and verified by Bogdan Kecman. See Bug #68145 also (that is still "Verified").
  • Bug #79675 - "index_merge_intersection optimization causes wrong query results". This bug was reported by Saverio M and verified by Miguel Solorzano.
There were a lot more bugs fixed in 5.7.19. I'd say that if one uses replication (especially GTID-based or group replication), an upgrade is a must, but I have not tried this version myself yet.

by Valeriy Kravchuk (noreply@blogger.com) at July 21, 2017 10:32 AM

July 20, 2017

MariaDB AB

Do you need five 9s?


Finding the right high availability strategy for your business

When it comes to database high availability, how many 9s do you really need? The minimum for most organizations is three 9s. Will that suffice for you, or do you need four? Or is five 9s—less than 1 second of downtime per day—a must?

Finding the right mix of reliability and infrastructure complexity is vital to make the best use of your resources. Let’s take a high-level look at three common approaches, spanning from 99.9% to 99.999% uptime.

Master-slave replication: 99.9% or 99.99%

Master-slave replication is one of the most common methods for avoiding service disruption after a server fails. It allows you to mirror the contents of one or more master servers to one or more slave servers. Here’s how it works, in a nutshell: All database changes are written into the binary log as binlog events. The slaves then read the binary log from each master to access the data for replication. A slave server keeps track of the master binlog location for the most recent event applied on the slave, allowing for smooth failover, where the slave resumes from where the master left off.
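As a rough illustration only (host, credentials and the use of GTID positioning here are placeholders and assumptions, not a complete setup), pointing a slave at its master in MariaDB looks something like:

-- on the slave server
CHANGE MASTER TO
  MASTER_HOST = 'master1.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_USE_GTID = slave_pos;   -- resume from the recorded replication position
START SLAVE;
SHOW SLAVE STATUS\G              -- check Slave_IO_Running, Slave_SQL_Running and lag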

Failover can be handled manually or automatically. For non–mission-critical applications, which require only three 9s, a DBA or sysadmin can manually promote one of the slaves to master, but most organizations now choose automatic failover. Automating failover is faster and many database management systems provide tools to make it easy. Replication with automatic failover can grant you four 9s.

The MariaDB MaxScale database proxy supports automatic failover by guaranteeing that all your slaves have the same information and are therefore equally equipped to take over for the master. Additionally, because sudden spikes in the number of connections or queries can lead to unplanned downtime, MaxScale includes load-balancing features such as read/write splitting, connection-rate limiting and result-set limiting.

Master-slave replication can be asynchronous or semi-synchronous. We won’t go into detail on those topics here, but they’re a main focus of our upcoming Best Practices for High Availability with MariaDB webinar. That webinar will also delve into multi-master synchronous replication with the MariaDB Cluster, which we’ll take a quick look at now.

MariaDB Cluster: 99.999%

The MariaDB Cluster uses Galera technology to enable multi-master clustering solutions. It is a synchronous multi-master cluster that keeps all your server nodes in sync. When you conduct a transaction on one node, that transaction’s data is immediately available on all the other nodes in the cluster. Because there’s no slave lag and no chance for lost transactions (as is sometimes possible with simple replication), this schema allows for very fast failover and five 9s of uptime.
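For orientation only (the provider path and addresses are placeholders, and exact option names should be verified against your MariaDB Cluster version), a node joins such a cluster with settings along these lines in my.cnf:

[mysqld]
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_address = "gcomm://10.0.0.11,10.0.0.12,10.0.0.13"
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2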

There’s so much more!

Now that you know the basic options for high availability, you need to know the pros and cons of each approach. Register for our July 25 webinar, where we’ll cover that topic plus use cases, various topologies and disaster-recovery tools.



by Amy Krishnamohan at July 20, 2017 09:12 PM

Peter Zaitsev

Where Do I Put ProxySQL?


In this blog post, we’ll look at how to deploy ProxySQL.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. It was created for DBAs by René Cannaò, as a means of solving complex replication topology issues. When bringing up ProxySQL with my clients, I always get questions about where it fits into the architecture. This post should clarify that.

Before continuing, you might want to know why you should use this software. The features that are of interest include:

  • MySQL firewall
  • Connection pooling
  • Shard lookup and automated routing
  • Ability to read/write split
  • Automatically switch to another master in case of active master failure
  • Query cache
  • Performance metrics
  • Other neat features!

Initial Configuration

In general, you install it on nodes that do not have a running MySQL database. You manage it via the MySQL command line on another port, usually 6032. Once it is started, the configuration in /etc is not used, and you do everything within the CLI. The backend configuration database is actually SQLite, and the db file is stored in /var/lib/proxysql.

There are many guides out there on initializing and installing it, so I won’t cover those details here. It can be as simple as:

apt-get install proxysql
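To give a flavor of the admin interface mentioned above (a sketch only; the default admin credentials, hostgroup number and backend address are assumptions to adapt to your setup), registering a backend server looks something like:

$ mysql -u admin -padmin -h 127.0.0.1 -P 6032 --prompt='ProxySQLAdmin> '
ProxySQLAdmin> INSERT INTO mysql_servers (hostgroup_id, hostname, port)
            -> VALUES (0, '10.0.0.11', 3306);
ProxySQLAdmin> LOAD MYSQL SERVERS TO RUNTIME;
ProxySQLAdmin> SAVE MYSQL SERVERS TO DISK;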

ProxySQL Architecture

While most first think to install ProxySQL on a standalone node between the application and database, this has the potential to affect query performance due to the additional latency from network hops.

 


To have minimal impact on performance (and avoid the additional network hop), many recommend installing ProxySQL on the application servers. The application then connects to ProxySQL (acting as a MySQL server) on localhost, using Unix Domain Socket, and avoiding extra latency. It would then use its routing rules to reach out and talk to the actual MySQL servers with its own connection pooling. The application doesn’t have any idea what happens beyond its connection to ProxySQL.


Reducing Your Network Attack Surface

Another consideration is reducing your network attack surface. This means attempting to control all of the possible vulnerabilities in your network’s hardware and software that are accessible to unauthenticated users.

Percona generally suggests that you put a ProxySQL instance on each application host, like in the second image above. This suggestion is certainly valid for reducing latency in your database environment (by limiting network jumps). But while this is good for performance, it can be bad for security.

Every instance must be able to talk to:

  • Every master
  • Every slave

As you can imagine, this is a security nightmare. With every instance, you have x many more connections spanning your network. That’s x many more connections an attacker might exploit.

Instead, it can be better to have one or more ProxySQL instances that are between your application and MySQL servers (like the first image above). This provides a reasonable DMZ-type setup that prevents opening too many connections across the network.

That said, both architectures are valid production configurations – depending on your requirements.

by Manjot Singh at July 20, 2017 06:57 PM

July 19, 2017

Peter Zaitsev

Blog Poll: What Operating System Do You Run Your Development Database On?


In this post, we'll use a blog poll to find out what operating system you use to run your development database servers.

In our last blog poll, we looked at what OS you use for your production database. Now we would like to see what you use for your development database.

As databases grow to meet more challenges and expanding application demands, they must try and get the maximum amount of performance out of available resources. How they work with an operating system can affect many variables, and help or hinder performance. The operating system you use for your database can impact consumable choices (such as hardware and memory). The operating system you use can also impact your choice of database engine as well (or vice versa).

When starting new projects, applications or services, or when testing new architecture solutions, it makes sense to create a development environment in order to test and run scenarios before they hit production. Do you use the same OS in your development environment as you do in your production environment?

Please let us know what operating system you use to run your development database. For this blog poll, we’re asking which operating system you use to actually run your development database server (not the base operating system).

If you’re running virtualized Linux on Windows, please select Linux as the OS used for development. Pick up to three that apply. Add any thoughts or other options in the comments section:

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

Thanks in advance for your responses – they will help the open source community determine how database environments are being deployed.

by Dave Avery at July 19, 2017 10:05 PM

Multi-Threaded Slave Statistics


In this blog post, I'll talk about the multi-threaded slave statistics printed in the MySQL error log file.

MySQL version 5.6 and later allows you to execute replicated events using parallel threads. This feature is called Multi-Threaded Slave (MTS), and to enable it you need to modify the slave_parallel_workers variable to a value greater than 1.
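For example, a minimal sketch of enabling MTS on a running slave (the worker count of 4 is arbitrary; pick one that suits your workload):

STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;
START SLAVE SQL_THREAD;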

Recently, a few customers asked about the meaning of some new statistics printed in their error log files when they enable MTS. These error messages look similar to the example stated below:

[Note] Multi-threaded slave statistics for channel '': seconds elapsed = 123; events assigned = 57345; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 0 waited (count) when Workers occupied = 0 waited when Workers occupied = 0

The MySQL reference manual doesn't show information about these statistics. I've filed a bug report asking Oracle to add information about them to the MySQL documentation. I reported this bug as #85747.

Until they update the documentation, we can use the MySQL code to get insight into the meaning of these statistics. We can also determine how often they are printed in the error log file. Looking into the rpl_slave.cc file, we find that when you enable MTS, and the log-warnings variable is greater than 1 (log-error-verbosity greater than 2 for MySQL 5.7), these statistics are printed in the MySQL error log every 120 seconds. This is determined by a hard-coded constant. The code below shows this:

/*
  Statistics go to the error log every # of seconds when --log-warnings > 1
*/
const long mts_online_stat_period= 60 * 2;

Does this mean that every 120 seconds MTS prints statistics to your MySQL error log (if enabled)? The answer is no. MTS prints statistics in the mentioned period depending on the level of activity of your slave. The following line in MySQL code verifies the level of the slave’s activity to print the statistics:

if (rli->is_parallel_exec() && rli->mts_events_assigned % 1024 == 1)

From the above code, you need MTS enabled and the modulo operation between the mts_events_assigned variable and 1024 equal to 1 in order to print the statistics. The mts_events_assigned variable stores the number of events assigned to the parallel queue. If you're replicating a low level of events, or not replicating at all, MySQL won't print the statistics in the error log. On the other hand, if you're replicating a high number of events all the time, and the mts_events_assigned variable increases its value until the remainder of the division between this variable and 1024 is 1, MySQL prints the MTS statistics in the error log almost every 120 seconds.

You can find the explanation of these statistics below (collected from information in the source code):

  1. Worker queues filled over overrun level: MTS tends to load balance events between all parallel workers, and the slave_parallel_workers variable determines the number of workers. This statistic shows the level of saturation the workers are suffering. If a parallel worker queue is close to full, this counter is incremented and the worker replication event is delayed in order to avoid reaching the worker queue limits.
  2. Waited due to a Worker queue full: This statistic is incremented when the coordinator thread must wait because the worker queue is full.
  3. Waited due to the total size: This statistic shows the number of times that the coordinator thread slept due to reaching the limit of the memory available in the worker queue to hold unapplied events. If this statistic keeps increasing when printed in your MySQL error log, you should resize the slave_pending_jobs_size_max variable to a higher value to avoid the coordinator thread waiting time.
  4. Waited at clock conflicts: In the case of possible dependencies between transactions, this statistic shows the wait time corresponding to logical timestamp conflict detection and resolution.
  5. Waited (count) when Workers occupied: A counter of how many times the coordinator saw the workers filled up with "enough" assignments. The definition of "enough" depends on the scheduler type (per-database or clock-based).
  6. Waited when Workers occupied: These are statistics to compute the coordinator thread's waiting time for any worker to become available; they apply solely to the Commit-clock scheduler.
Conclusion

Multi-threaded slave is an exciting feature that allows you to replicate events faster and keep in sync with master instances. By changing the log-warnings variable to a value greater than 1, you can get information from the slave error log file about multi-threaded slave performance.

by Juan Arruti at July 19, 2017 05:02 PM

MariaDB AB

MariaDB 5.5.57 now available


The MariaDB project is pleased to announce the immediate availability of MariaDB 5.5.57. See the release notes and changelog for details.

Download MariaDB 5.5.57

Release Notes | Changelog | What is MariaDB 5.5?



by dbart at July 19, 2017 04:56 PM

July 18, 2017

MariaDB Foundation

MariaDB 5.5.57 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 5.5.57. This is a stable (GA) release. See the release notes and changelog for details.

Download MariaDB 5.5.57 | Release Notes | Changelog | What is MariaDB 5.5? | MariaDB APT and YUM Repository Configuration Generator

Thanks, and enjoy MariaDB!

The post MariaDB 5.5.57 now available appeared first on MariaDB.org.

by Ian Gilfillan at July 18, 2017 09:13 PM

Peter Zaitsev

Backups and Disaster Recovery


In this post, we'll look at strategies for backups and disaster recovery.

Note: I am giving a talk on Backups and Disaster Recovery Best Practices on July 27th.

When discussing disaster recovery, it’s important to take your business’ continuity plan into consideration. Backup and recovery processes are a critical part of any application infrastructure.

A well-tested backup and recovery system can be the difference between a minor outage and the end of your business.

You will want to take three things into consideration when planning your disaster recovery strategy: recovery time objective, recovery point objective and risk mitigation.

Recovery time objective (RTO) is how long it takes to restore your backups. Recovery point objective (RPO) is what point in time you want to recover (in other words, how much data you can afford to lose after recovery). Finally, you need to understand what risks you are trying to mitigate. Risks to your data include (but are not limited to) bad actors, data corruption, user error, host failure and data center failure.

Recommended Backup Strategies

We recommend that you use both physical (Percona XtraBackup, RDS/LVM Snapshots, MySQL Enterprise Backup) and logical backups (mysqldump, mydumper, mysqlpump). Logical backups protect against the loss of single data points, while physical backups protect against total data loss or host failure.

The best practice is running Percona XtraBackup nightly, followed by mysqldump (or in 5.7+, mysqlpump). Percona XtraBackup enables you to quickly restore a server, and mysqldump enables you to quickly restore data points. These address recovery time objectives.
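As a rough sketch of what that could look like (paths and options are illustrative only, not tuned recommendations, and credentials are omitted):

# nightly physical backup with Percona XtraBackup 2.4
xtrabackup --backup --target-dir=/backups/xtrabackup/$(date +%F)
# logical backup for restoring individual objects or rows
mysqldump --all-databases --single-transaction --routines --triggers --events > /backups/dump-$(date +%F).sql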

For point-in-time recovery, it is recommended that you download binlogs on a regular basis (once an hour, for example).

Another option is binlog streaming. You can find more information on binlog streaming in our blog: Backing up binary log files with mysqlbinlog.
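The streaming approach boils down to something like the following (host, credentials and the starting binary log name are placeholders):

mysqlbinlog --read-from-remote-server --host=master1.example.com --user=backup \
  --password --raw --stop-never mysql-bin.000001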

There is also a whitepaper that is the basis of my webinar here: MySQL Backup and Recovery Best Practices.

Delayed Slave

One way to save on operational overhead is to create a 24-hour delayed slave. This takes the place of the logical backup (mysqldump) as well as the binlog streaming. You want to ensure that you stop the delayed slave immediately following any issues. This ensures that the data does not get corrupted on the backup as well.

A delayed slave is created in 5.6 and above with:

CHANGE MASTER TO MASTER_DELAY = N;

After a disaster, you would issue:

STOP SLAVE;

Then, in order to get a point-in-time, you can use:

START SLAVE UNTIL MASTER_LOG_FILE = 'log_name', MASTER_LOG_POS = log_pos;

Restore

It is a good idea to test your backups at least once a quarter. Backups do not exist unless you know you can restore them. There are some recent high-profile cases where developers dropped tables or schemas, or data was corrupted in production, and in one case five different backup types were not viable to use to restore.

The best case scenario is an automated restore test that runs after your backup, and gives you information on how long it takes to restore (RTO) and how much data you can restore (RPO).

For more details on backups and disaster recovery, come to my webinar.

by Manjot Singh at July 18, 2017 05:06 PM

Upcoming Webinar Wednesday, July 19, 2017: Learning MySQL 5.7


MySQL 5.7Join Percona’s, Technical Services Manager, Jervin Real as he presents Learning MySQL 5.7 on Wednesday, July 19, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

MySQL 5.7 has a lot of new features. If you’ve dabbled with an older version of MySQL, it is worth learning what new features exist and how to use them. This webinar teaches you about new features such as multi-source replication, global transaction IDs (GTIDs), security improvements and more.

We’ll also discuss logical decoding. Logical decoding is one of the features under the BDR implementation, which allows bidirectional streams of data between Postgres instances. Also, it allows you to stream data outside Postgres into many other data systems.

Register for the webinar here.

Jervin Real, Technical Services Manager

Jervin is a member of Technical Services at Percona, where he partners with customers to build reliable and highly performant MySQL infrastructures. He also does other fun stuff like watching cat videos on the internet, reading bugs for novels and playing around with servers for numbers. Jervin joined Percona in April of 2010.

by Dave Avery at July 18, 2017 01:13 PM

Percona Live Europe 2017 Call for Papers Deadline Extended to July 24, 2017


We are extending the Percona Live Open Source Database Conference Europe 2017 call for papers deadline to Monday, July 24, 2017. Get your submissions in now!

Between our Conference Committee working hard to review all the outstanding talk ideas, and the many community requests for more time, we didn’t want to shortchange any of our applicants. If you procrastinated too long, didn’t complete your talk submission or just plain forgot to submit, this is your reprieve! You have one extra week to pull together a proposal for a talk on open source databases at Percona Live Europe 2017.

The theme of Percona Live Europe 2017 is "Championing Open Source Databases." There are sessions on MySQL, MariaDB, MongoDB and other open source database technologies, including time series databases, PostgreSQL and RocksDB. Are you:

  • Working with MongoDB as a developer?
  • Creating a new MySQL-variant time series database?
  • Deploying MariaDB in a novel way?
  • Using open source database technology to solve a business issue?

Share your open source database experiences with peers and professionals in the open source community! We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions:

  • Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
  • Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. Encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
  • Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

Submit your talk ideas now.

PLEASE NOTE: We have a new call for papers system that requires everyone to register for a new account. You can save proposals as drafts and edit them later. Once a proposal is submitted, it is final. You can NOT change a proposal once submitted.

Register for Percona Live Europe 2017 now! Early Bird registration lasts until August 8

Last year’s Percona Live Europe sold out, and we’re looking to do the same at this year’s conference. Don’t miss your chance to get your ticket at its most affordable price. Click here to register.

Percona Live 2017 sponsorship opportunities are available now

Percona Live Europe 2017 is just around the corner. Have you secured your sponsorship yet? Last year’s event sold out! Booth selection for our limited sponsorship opportunities is on a first-come, first-served basis. Click here to find out how to sponsor.

Don’t miss out on a special room rate at the Percona Live Europe 2017 venue

This year, Percona Live Europe is being held at the Radisson Blu Royal Hotel, Dublin. You can get a special room rate when you attend the conference. But hurry, rooms are going fast! To get the special room rate:

  1. Visit https://www.radissonblu.com/en/royalhotel-dublin.
  2. Click BOOK NOW at the top right.
  3. Enter your preferred check-in and check-out dates, and how many rooms.
  4. From the drop-down “Select Rate Type,” choose Promotional Code.
  5. Enter the code PERCON to get the discount.

This special deal includes breakfast each morning! The group rate only applies if used within the Percona Live Europe group block dates (September 25-27, 2017). The deadline for booking with this rate is July 24, 2017.

by Dave Avery at July 18, 2017 03:21 AM

July 17, 2017

Peter Zaitsev

Percona Server for MongoDB 3.2.14-3.4 is Now Available

Percona Server for MongoDB 3.2

Percona announces the release of Percona Server for MongoDB 3.2.14-3.4 on July 17, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database that supports the MongoDB v3.2 protocol and drivers. It extends MongoDB with MongoRocks, Percona Memory Engine, and PerconaFT storage engine, as well as enterprise-grade features like External Authentication, Audit Logging, Profiling Rate Limiting, and Hot Backup at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: This release deprecates the PerconaFT storage engine. It will not be available in future releases.

This release is based on MongoDB 3.2.14 and includes the following additional changes:

New Features

Bugs Fixed

  • #PSMDB-67: Fixed mongod service status messages.

Percona Server for MongoDB 3.2.14-3.4 release notes are available in the official documentation.

by Alexey Zhebel at July 17, 2017 07:47 PM

Valeriy Kravchuk

Fun with Bugs #53 - On Some Percona Software Bugs I've Reported

So far in this series I have written only (or mostly) about MySQL server bugs. Does it mean that there are no unique or interesting bugs in other MySQL-related software, or in MySQL forks and projects that use MySQL code?

Surely no, it doesn't. So, today I'd like to present a short list of currently active (that is, new or probably not yet resolved) bugs that I have reported for Percona's software at Launchpad over the years I've used it. The list is based on this link, but I excluded minor reports and those that are probably just forgotten, even though the problem was fixed a long time ago.

Let me start with bugs in XtraBackup, the tool that plays a key role in most MySQL environments that are not using Oracle's Enterprise subscriptions:
  • lp:1704636 - "xtrabackup 2.4.7 crashes while backing up MariaDB 10.2.x with ftwrl-* options". I reported it yesterday. Probably some background thread unique to MariaDB 10.2.x is not taken into account, and some calculations performed for it lead to a crash. Sorry, but there is no way to use the --ftwrl-* options of XtraBackup with MariaDB 10.2.x.
  • lp:1418150 - "Provide a way for xtrabackup to communicate with storage managers via XBSA API". This is a feature request created based on a customer's request to store backups in Tivoli Storage Manager. A similar request was recently noted from a MariaDB customer as well. MySQL's Enterprise Backup allows working with storage managers, even if not in the most direct way. I used TSM a lot in the past in production as an Informix DBA, so I care about this feature more than about many others...
Let's continue with another unique software package from Percona, Percona Toolkit. I still use some of the tools every day (and do not hesitate to write about this), and would like to see more attention to the toolkit in areas other than MongoDB. I've reported the following two bugs in the past that are still active:
  • lp:1566939 - "pt-stalk should not call mysqladmin debug by default". Having in mind documentation Bug #71346 that Oracle just refuses to fix, I'd say there may be a dozen people in the world who can make good use of the output even when it's correct, useful and really needed. For the rest of us it just pollutes error logs and pt-stalk data collections.
  • lp:1533271 - "pt-online-schema-change does not report anything when it waits for slave to replicate new table creation". The problem is clear enough from the description. Without PTDEBUG=1, there is nothing useful to help find out the reason for the hang in this case.
Finally, let's check several Percona Server bugs. Note that in many cases I had to file a bug for Percona Server even if upstream MySQL was also affected, in the hope that Percona might fix the problem faster (and this happened sometimes). The following bugs were not inherited from MySQL directly:
  • lp:1455432 - "Audit log plugin should allow to include or exclude logging for specific users". This was a request for the same functionality that was originally provided in MariaDB's implementation. The bug was actually fixed in Percona Server 5.6.32-78.0 and 5.7.14-7, but it remains in the list as it is still just "Triaged" for the 5.5.x branch. Time to close it completely, I'd say.
  • lp:1408971 - "Manual does not describe Innodb_current_row_locks in any useful way". Nothing changed, check the manual.
  • lp:1401528 - "Add option to switch to "old" get_lock() behavior". It's hard to introduce new features in a minor release during the GA period and do it right. Somebody had to think about backward compatibility, upgrades and downgrades, and about the completeness of the implementation (see lp:1396336 about that). In this specific case Percona failed to do it right.
  • lp:1396035 - "Percona-Server-test-56 RPM package does NOT check /usr/share/percona-server directory and thus is unusable "as is". Packaging is another thing that is hard to do right. I was not lazy today, so I re-checked this on CentOS 6.x with 5.7.18, and ./mtr is somewhat useful there:

    [root@centos mysql-test]# ls -l ./mtr
    lrwxrwxrwx. 1 root root 19 Jun  7 18:20 ./mtr -> ./mysql-test-run.pl
    [root@centos mysql-test]# perl ./mtr analyze
    Logging: ./mtr  analyze
    MySQL Version 5.7.18
    Checking supported features...
     - SSL connections supported
    Collecting tests...
    Checking leftover processes...
    Removing old var directory...
    Creating var directory '/usr/share/mysql-test/var'...
    Installing system database...
    Using parallel: 1

    ==============================================================================

    TEST                                      RESULT   TIME (ms) or COMMENT
    --------------------------------------------------------------------------

    worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
    worker[1] mysql-test-run: WARNING: running this script as _root_ will cause some tests to be skipped
    main.analyze                             [ pass ]  10920
    --------------------------------------------------------------------------
    The servers were restarted 0 times
    Spent 10.920 of 58 seconds executing testcases

    Completed: All 1 tests were successful.
  • lp:1263968 - "DOCS: http://www.percona.com/doc/percona-server/5.6/installation.html does not provide enough details on further steps after installing .deb". I'd still say that some of the requested details are missing. Writing proper manuals is hard.
I have surely reported a lot more bugs, but many of them have related upstream MySQL ones, so there is no reason to discuss them separately now. Also, note that in Percona I was involved in setting priorities for the bugs, and mostly reported them based on customer issues or complaints. So, no wonder that my bug reports were fixed fast, and not so many active ones are left by now.

There was yet another Percona tool that I spent a lot of time on, building it from current sources, using it and reporting problems with it. It's Percona Playback. Based on this, they seem to be working on another, similar tool now. Still, I consider this request useful for any tool that allows replaying a specific load: lp:1365538.

by Valeriy Kravchuk (noreply@blogger.com) at July 17, 2017 09:40 AM

July 14, 2017

Peter Zaitsev

Percona Monitoring and Management 1.2.0 is Now Available

Percona Monitoring and Management (PMM)

Percona announces the release of Percona Monitoring and Management 1.2.0 on July 14, 2017.

For installation instructions, see the Deployment Guide.


Changes in PMM Server

PMM Server 1.2.0 introduced the following changes:

New Features

  • PMM-737: New graphs in System Overview dashboard:
      • Memory Advanced Details
      • Saturation Metrics

  • PMM-1090: Added ESXi support for PMM Server virtual appliance.

UI Fixes

  • PMM-707: Fixed QPS metric in MySQL Overview dashboard to always show queries per second regardless of the selected interval.
  • PMM-708: Fixed tooltips for graphs that displayed incorrectly.
  • PMM-739, PMM-797: Fixed PMM Server update feature on the landing page.
  • PMM-823: Fixed arrow padding for collapsible blocks in QAN.
  • PMM-887: Disabled the Add button when no table is specified for showing query info in QAN.
  • PMM-888: Disabled the Apply button in QAN settings when nothing is changed.
  • PMM-889: Fixed the switch between UTC and local time zone in the QAN time range selector.
  • PMM-909: Added message No query example when no example for a query is available in QAN.
  • PMM-933: Fixed empty tooltips for Per Query Stats column in the query details section of QAN.
  • PMM-937: Removed the percentage of total query time in query details for the TOTAL entry in QAN (because it is 100% by definition).
  • PMM-951: Fixed the InnoDB Page Splits graph formula in the MySQL InnoDB Metrics Advanced dashboard.
  • PMM-953: Enabled stacking for graphs in MySQL Performance Schema dashboard.
  • PMM-954: Renamed Top Users by Connections graph in MySQL User Statistics dashboard to Top Users by Connections Created and added the Connections/sec label to the Y-axis.
  • PMM-957: Refined titles for Client Connections and Client Questions graphs in ProxySQL Overview dashboard to mention that they show metrics for all host groups (not only the selected one).
  • PMM-961: Fixed the formula for Client Connections graph in ProxySQL Overview dashboard.
  • PMM-964: Fixed the gaps for high zoom levels in MySQL Connections graph on the MySQL Overview dashboard.
  • PMM-976: Fixed Orchestrator handling by supervisorctl.
  • PMM-1129: Updated the MySQL Replication dashboard to support new connection_name label introduced in mysqld_exporter for multi-source replication monitoring.
  • PMM-1054: Fixed typo in the tooltip for the Settings button in QAN.
  • PMM-1055: Fixed link to Query Analytics from Metrics Monitor when running PMM Server as a virtual appliance.
  • PMM-1086: Removed HTML code that showed up in the QAN time range selector.

Bug Fixes

  • PMM-547: Added warning page to Query Analytics app when there are no PMM Clients running the QAN service.
  • PMM-799: Fixed Orchestrator to show correct version.
  • PMM-1031: Fixed initialization of Query Profile section in QAN that broke after upgrading Angular.
  • PMM-1087: Fixed QAN package building.

Other Improvements

  • PMM-348: Added daily log rotation for nginx.
  • PMM-968: Added Prometheus build information.
  • PMM-969: Updated the Prometheus memory usage settings to leverage new flag. For more information about setting memory consumption by PMM, see FAQ.

Changes in PMM Client

PMM Client 1.2.0 introduced the following changes:

New Features

  • PMM-1114: Added PMM Client packages for Debian 9 (“stretch”).

Bug Fixes

  • PMM-481, PMM-1132: Fixed fingerprinting for queries with multi-line comments.
  • PMM-623: Fixed mongodb_exporter to display correct version.
  • PMM-927: Fixed bug with empty metrics for MongoDB query analytics.
  • PMM-1126: Fixed promu build for node_exporter.
  • PMM-1201: Fixed node_exporter version.

Other Improvements

  • PMM-783: Directed mongodb_exporter log messages to stderr and excluded many generic messages from the default INFO logging level.
  • PMM-756: Merged upstream node_exporter version 0.14.0.
    PMM deprecated several collectors in this release:

    • gmond – Out of scope.
    • megacli – Requires forking, to be moved to textfile collection.
    • ntp – Out of scope.

    It also introduced the following breaking change:

    • Collector errors are now a separate metric: node_scrape_collector_success, not a label on node_exporter_scrape_duration_seconds
  • PMM-1011: Merged upstream mysqld_exporter version 0.10.0.
    This release introduced the following breaking change:

    • mysql_slave_... metrics now include an additional connection_name label to support MariaDB multi-source replication.

About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

Percona Monitoring and Management is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

by Alexey Zhebel at July 14, 2017 05:51 PM

A Little Trick Upgrading to MySQL 5.7

Upgrading to MySQL 5.7

In this blog post, I’ll look at a trick we use at Percona when upgrading to MySQL 5.7.

I’ll be covering this subject (and others) in my webinar Learning MySQL 5.7 on Wednesday, July 19, 2017.

We’ve been doing upgrades for quite a while here at Percona, and we try to optimize, standardize and improve this process to save time. When upgrading to MySQL 5.7, more often than not you need to run REPAIR or ALTER via mysql_upgrade on a number of MySQL tables. Sometimes a few hundred, sometimes hundreds of thousands.

One way to cut some time from testing or executing mysql_upgrade is to combine it with mysqlcheck. This identifies tables that need to be rebuilt or repaired. The first step is to capture the output of this process:

revin@acme:~$ mysqlcheck --check-upgrade --all-databases > mysql-check.log

This provides a lengthy output of what needs to be done to successfully upgrade our tables. On my test data, I get error reports like the ones below. I’ll need to take the specified action against them:

ads.agency
error    : Table upgrade required. Please do "REPAIR TABLE `agency`" or dump/reload to fix it!
store.categories
error    : Table rebuild required. Please do "ALTER TABLE `categories` FORCE" or dump/reload to fix it!

Before we run through this upgrade, let’s get an idea of how long it would take for a regular mysql_upgrade to complete on this dataset:

revin@acme:~$ time mysql_upgrade
Enter password:
Checking if update is needed.
Checking server version.
Running queries to upgrade MySQL server.
Checking system database.
mysql.columns_priv                                 OK
mysql.db                                           OK
...
mysql.user                                         OK
Upgrading the sys schema.
Checking databases.
ads.account_preference_assoc         OK
...
Repairing tables
...
ads.agency
Note     : TIME/TIMESTAMP/DATETIME columns of old format have been upgraded to the new format.
status   : OK
...
`store`.`categories`
Running  : ALTER TABLE `store`.`categories` FORCE
status   : OK
...
Upgrade process completed successfully.
Checking if update is needed.
real	25m57.482s
user	0m0.024s
sys	0m0.072s

On a cold server, my baseline above took about 25 minutes.

The second step on our time-saving process is to identify the tables that need some action (in this case, REPAIR and ALTER … FORCE). Generate the SQL statements to run them and put them into a single SQL file:

revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 REPAIR | egrep -v 'REPAIR|--');
	do echo "mysql -e 'REPAIR TABLE $t;'" >> upgrade.sql; done
revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 ALTER | egrep -v 'ALTER|--');
	do echo "mysql -e 'ALTER TABLE $t FORCE;'" >> upgrade.sql; done

My upgrade.sql file will have something like this:

mysql -e 'ALTER TABLE store.categories FORCE;'
mysql -e 'REPAIR TABLE ads.agency;'

Now we should be ready to run these commands in parallel as the third step in the process:

revin@acme:~$ time parallel -j 4 -- < upgrade.sql
...
real	17m31.448s
user	0m1.388s
sys	0m0.616s

Getting some parallelization is not bad, and the process improved by about 38%. If we are talking about multi-terabyte data sets, then it is already a big gain.

On the other hand, my dataset has a few tables that are bigger than the rest. Since mysqlcheck processes them in a specific order, one of the threads was processing most of them instead of spreading them out evenly to each thread by size. To fix this, we need to have an idea of the sizes of each table we will be processing. We can use a query from the INFORMATION_SCHEMA.TABLES for this purpose:

revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 ALTER | egrep -v 'ALTER|--');
	do d=$(echo $t|cut -d'.' -f1); tbl=$(echo $t|cut -d'.' -f2);
	s=$(mysql -BNe "select sum(index_length+data_length) from information_schema.tables where table_schema='$d' and table_name='$tbl';");
	echo "$s |mysql -e 'ALTER TABLE $t FORCE;'" >> table-sizes.sql; done
revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 REPAIR | egrep -v 'REPAIR|--');
	do d=$(echo $t|cut -d'.' -f1); tbl=$(echo $t|cut -d'.' -f2);
	s=$(mysql -BNe "select sum(index_length+data_length) from information_schema.tables where table_schema='$d' and table_name='$tbl';");
	echo "$s |mysql -e 'REPAIR TABLE $t;'" >> table-sizes.sql; done

Now my table-sizes.sql file will have contents like below, which I can sort and pass to the parallel command again and cut even more time!

32768 |mysql -e 'REPAIR TABLE ads.agency;'
81920 |mysql -e 'ALTER TABLE store.categories FORCE;'

revin@acme:~$ cat table-sizes.sql |sort -rn|cut -d'|' -f2 > upgrade.sql
revin@acme:~$ time parallel -j 4 -- < upgrade.sql
...
real	8m1.116s
user	0m1.260s
sys	0m0.624s

This go-around, my total execution time is 8 minutes – a good 65% improvement. To wrap it up, we will need to run mysql_upgrade one last time so that the system tables are also upgraded, the tables are checked again and then restart the MySQL server as instructed by the manual:

revin@acme:~$ time mysql_upgrade --force

The whole process should be easy to automate and script, depending on your preference. Lastly: YMMV. If you have one table that is more than half the size of your total data set, there might not be big gains.
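
For example, a minimal bash wrapper for the final steps could look like the sketch below. It assumes table-sizes.sql was built as shown above, and the -j 4 parallelism is just the value used in this post, so tune it for your hardware:

#!/bin/bash
# Hypothetical wrapper for the final steps described above
set -e
# Biggest tables first, then strip the size column to get the runnable command list
sort -rn table-sizes.sql | cut -d'|' -f2 > upgrade.sql
# Repair/rebuild the flagged tables four at a time
time parallel -j 4 -- < upgrade.sql
# Final pass: upgrade the system tables and re-check everything
time mysql_upgrade --force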

If you want to learn more about upgrading to MySQL 5.7, come to my webinar on Wednesday, July 19: Learning MySQL 5.7. This process is only one of the phases in a multi-step upgrade process when moving to 5.7. I will discuss them in more detail next week. Register now from the link below, and I’ll talk to you soon!

by Jervin Real at July 14, 2017 05:20 PM

July 13, 2017

Peter Zaitsev

Setting Up Percona PAM with Active Directory for External Authentication

Percona PAM

In this blog post, we’ll look at how to set up Percona PAM with Active Directory for external authentication.

In my previous article on Percona PAM, I demonstrated how to use Samba as a domain, and how easy it is to create domain users and groups via the samba-tool. Then we configured nss-pam-ldapd and nscd to enumerate user and group information via LDAP calls, and authenticate users from this source.

This time around, I will demonstrate two other ways of using Active Directory for external authentication by joining the domain via SSSD or Winbind. System Security Services Daemon (SSSD) allows you to configure access to several authentication hosts such as LDAP, Kerberos, Samba and Active Directory and have your system use this service for all types of lookups. Winbind, on the other hand, pulls data from Samba or Active Directory only. If you’re mulling over using SSSD or Winbind, take a look at this article on what SSSD or Winbind support.

For both methods, we’ll use realmd. That makes it easy to join a domain and enumerate users from it.

My testbed environment consists of two machines:

Samba PDC
OS: CentOS 7
IP Address: 172.16.0.10
Hostname: samba-10.example.com
Domain name: EXAMPLE.COM
DNS: 8.8.8.8(Google DNS), 8.8.4.4(Google DNS), 172.16.0.10(Samba)
Firewall: none

Note: Please follow the steps in the last article for setting up the Samba PDC environment.

Percona Server 5.7 with LDAP authentication via SSSD or Winbind
OS: CentOS 7
IP Address: 172.16.0.21
Hostname: ps-ldap-21.example.com
DNS: 172.16.0.10(Samba PDC)

Installing realmd and Its Dependencies

  1. First, we need to make sure that the time is in sync (since this is a requirement for joining domains). Install NTP and make sure that it starts up at boot time:
    [root@ps-ldap-21 ~]# yum -y install ntp
    * * *
    Installed:
    ntp.x86_64 0:4.2.6p5-25.el7.centos.2
    * * *
    [root@ps-ldap-21 ~]# ntpdate 0.centos.pool.ntp.org
     3 Jul 03:48:35 ntpdate[3708]: step time server 202.90.132.242 offset 1.024550 sec
    [root@ps-ldap-21 ~]# systemctl enable ntpd.service
    Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
    [root@ps-ldap-21 ~]# systemctl start ntpd.service
  2. Install realmd and its dependencies for SSSD or Winbind.
    For SSSD:
    yum -y install realmd oddjob oddjob-mkhomedir sssd adcli samba-common-tools

    For Winbind:
    yum -y install realmd oddjob oddjob-mkhomedir samba-winbind-clients samba-winbind samba-common-tools

Joining the Domain via SSSD and Preparing It for Percona PAM

  1. Run realm discover domain for realmd to discover what type of server it’s connecting to and what package dependencies need to be installed:
    [root@ps-ldap-21 ~]# realm discover example.com
    example.com
    type: kerberos
    realm-name: EXAMPLE.COM
    domain-name: example.com
    configured: no
    server-software: active-directory
    client-software: sssd
    required-package: oddjob
    required-package: oddjob-mkhomedir
    required-package: sssd
    required-package: adcli
    required-package: samba-common-tools

    Our Samba PDC is detected as an Active Directory Controller, and the packages required have been installed previously.
  2. The next step is to join the domain by running realm join domain. If you want to get more information, add the --verbose option. You could also add the -U user option if you want to use a different administrator account.
    [root@ps-ldap-21 ~]# realm join example.com --verbose
     * Resolving: _ldap._tcp.example.com
     * Performing LDAP DSE lookup on: 172.16.0.10
     * Successfully discovered: example.com
    Password for Administrator:
     * Required files: /usr/sbin/oddjobd, /usr/libexec/oddjob/mkhomedir, /usr/sbin/sssd, /usr/bin/net
     * LANG=C LOGNAME=root /usr/bin/net -s /var/cache/realmd/realmd-smb-conf.DM6W2Y -U Administrator ads join example.com
    Enter Administrator's password:
    Using short domain name -- EXAMPLE
    Joined 'PS-LDAP-21' to dns domain 'example.com'
     * LANG=C LOGNAME=root /usr/bin/net -s /var/cache/realmd/realmd-smb-conf.DM6W2Y -U Administrator ads keytab create
    Enter Administrator's password:
     * /usr/bin/systemctl enable sssd.service
    Created symlink from /etc/systemd/system/multi-user.target.wants/sssd.service to /usr/lib/systemd/system/sssd.service.
     * /usr/bin/systemctl restart sssd.service
     * /usr/bin/sh -c /usr/sbin/authconfig --update --enablesssd --enablesssdauth --enablemkhomedir --nostart && /usr/bin/systemctl enable oddjobd.service && /usr/bin/systemctl start oddjobd.service
     * Successfully enrolled machine in realm

    As you can see from the command above, the realm command simplifies SSSD configuration and uses existing tools such as net and authconfig to join the domain and use it as an identity provider.
  3. Let’s test if we can enumerate existing accounts by using the id command:
    [root@ps-ldap-21 ~]# id jervin
    id: jervin: no such user
    [root@ps-ldap-21 ~]# id jervin@example.com
    uid=343401115(jervin@example.com) gid=343400513(domain users@example.com) groups=343400513(domain users@example.com),343401103(support@example.com)

    As you can see, the user can be queried if the domain is specified. So if you want to log in as ‘jervin@example.com’, in Percona Server for MySQL you’ll need to create the user as ‘jervin@example.com’ and not ‘jervin’. For example:
    # Creating user 'jervin@example.com'
    CREATE USER 'jervin@example.com'@'%' IDENTIFIED WITH auth_pam;
    # Logging in as 'jervin@example.com'
    mysql -u 'jervin@example.com'

    If you want to omit the domain name when logging in, you’ll need to change “use_fully_qualified_names = True” to “use_fully_qualified_names = False” in /etc/sssd/sssd.conf, and then restart SSSD. If you do this, the user can be found without providing the domain:
    [root@ps-ldap-21 ~]# id jervin
    uid=343401115(jervin) gid=343400513(domain users) groups=343400513(domain users),343401103(support)
    [root@ps-ldap-21 ~]# id jervin@example.com
    uid=343401115(jervin) gid=343400513(domain users) groups=343400513(domain users),343401103(support)

    When you create the MySQL user, you don’t need to include the domain anymore:
    # Creating user 'jervin'
    CREATE USER 'jervin'@'%' IDENTIFIED WITH auth_pam;
    # Logging in as 'jervin'
    mysql -u jervin
  4. Optionally, you can specify which users and groups can log in by adding these settings to SSSD:
    Domain access filter
    Under the “[domain/example.com]” section in /etc/sssd/sssd.conf, you can add the following to specify that only users that are members of support and dba are allowed to use SSSD. For example:
    ad_access_filter = (|(memberOf=CN=dba,CN=Users,DC=example,DC=com)(memberOf=CN=support,CN=Users,DC=example,DC=com))

    Simple filters
    You can use realm permit or realm permit -g to allow particular users or groups. For example:
    realm permit jervin
    realm permit -g support
    realm permit -g dba

    You can check sssd.conf on how these ACLs are implemented:
    access_provider = simple
    simple_allow_groups = support, dba
    simple_allow_users = jervin
  5. Finally, configure Percona Server for MySQL to authenticate to SSSD by creating /etc/pam.d/mysqld with this content:
    auth required pam_sss.so
    account required pam_sss.so
  6. Done. All you need to do now is to install Percona Server for MySQL, enable the auth_pam and auth_pam_compat plugins, and add PAM users (a rough sketch follows this list). You can then check for authentication errors in /var/log/secure for troubleshooting. You could also get verbose logs by adding debug_level=[1-9] to [nss], [pam], or [domain] and then restarting SSSD. You can view the logs in /var/log/sssd.
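
A minimal sketch of that last step is shown below. The INSTALL PLUGIN statements use the usual shared-library names for the Percona PAM plugin, but treat them as an assumption to verify against the documentation for your Percona Server version; the account name simply reuses the example user from above.

# Enable the PAM authentication plugins (run once as a privileged MySQL user)
INSTALL PLUGIN auth_pam SONAME 'auth_pam.so';
INSTALL PLUGIN auth_pam_compat SONAME 'auth_pam_compat.so';
# Create a PAM-authenticated account, as in the examples above
CREATE USER 'jervin'@'%' IDENTIFIED WITH auth_pam;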

Joining the Domain via Winbind and Preparing it for Percona PAM

  1. The realm command assumes that SSSD is used. To change the client software, use --client-software=winbind instead:
    [root@ps-ldap-21 ~]# realm --client-software=winbind discover example.com
    example.com
        type: kerberos
        realm-name: EXAMPLE.COM
        domain-name: example.com
        configured: no  
        server-software: active-directory
        client-software: winbind
        required-package: oddjob-mkhomedir
        required-package: oddjob
        required-package: samba-winbind-clients
        required-package: samba-winbind
        required-package: samba-common-tools
  2. Since the required packages have already been installed, we can now attempt to join this host to the domain:
    [root@ps-ldap-21 ~]# realm --verbose --client-software=winbind join example.com
     * Resolving: _ldap._tcp.example.com
     * Performing LDAP DSE lookup on: 172.16.0.10
     * Successfully discovered: example.com
    Password for Administrator:
     * Required files: /usr/libexec/oddjob/mkhomedir, /usr/sbin/oddjobd, /usr/bin/wbinfo, /usr/sbin/winbindd, /usr/bin/net
     * LANG=C LOGNAME=root /usr/bin/net -s /var/cache/realmd/realmd-smb-conf.9YEO2Y -U Administrator ads join example.com
    Enter Administrator's password:
    Using short domain name -- EXAMPLE
    Joined 'PS-LDAP-21' to dns domain 'example.com'
     * LANG=C LOGNAME=root /usr/bin/net -s /var/cache/realmd/realmd-smb-conf.9YEO2Y -U Administrator ads keytab create
    Enter Administrator's password:
     * /usr/bin/systemctl enable winbind.service
    Created symlink from /etc/systemd/system/multi-user.target.wants/winbind.service to /usr/lib/systemd/system/winbind.service.
     * /usr/bin/systemctl restart winbind.service
     * /usr/bin/sh -c /usr/sbin/authconfig --update --enablewinbind --enablewinbindauth --enablemkhomedir --nostart && /usr/bin/systemctl enable oddjobd.service && /usr/bin/systemctl start oddjobd.service
     * Successfully enrolled machine in realm
  3. Let’s test if we can enumerate existing accounts by using the id command:
    [root@ps-ldap-21 ~]# id jervin
    id: jervin: no such user
    [root@ps-ldap-21 ~]# id jervin@example.com
    uid=10000(EXAMPLE\jervin) gid=10000(EXAMPLE\domain users) groups=10000(EXAMPLE\domain users),10001(EXAMPLE\support)

    Unfortunately for Winbind, users identified with their domain cannot log in to Percona Server for MySQL. We need to disable this behavior in the Samba config (performed in the next step).
  4. Edit /etc/samba/smb.conf, and change “winbind use default domain = no” to “winbind use default domain = yes”. Restart the Winbind service. For example:
    vi /etc/samba/smb.conf
    #Look for:
    "winbind use default domain = no"
    #Change to:
    "winbind use default domain = yes"
    systemctl restart winbind.service

    Try running id again:
    [root@ps-ldap-21 ~]# id jervin
    uid=10000(jervin) gid=10000(domain users) groups=10000(domain users),10001(support)
    [root@ps-ldap-21 ~]# id jervin@example.com
    id: jervin@example.com: no such user

    When you create the MySQL user, do not include the domain name. For example:
    # Creating user 'jervin'
    CREATE USER 'jervin'@'%' IDENTIFIED WITH auth_pam;
    # Logging in as 'jervin'
    mysql -u jervin
  5. Finally, configure Percona Server for MySQL to authenticate to Winbind by creating /etc/pam.d/mysqld with this content:
    auth required pam_winbind.so
    account required pam_winbind.so

You can debug authentication attempts by reviewing the logs at /var/log/secure. You may also change “auth required pam_winbind.so” to “auth required pam_winbind.so debug” in /etc/pam.d/mysqld to get verbose logging in the same file.

As for filtering who can authenticate with Winbind, you can add require_membership_of=group_name under the [global] section of /etc/security/pam_winbind.conf. You’ll need to restart the winbind daemon to apply the changes.
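
For illustration, a minimal /etc/security/pam_winbind.conf along those lines could look like the snippet below; the group name support is just the example group used earlier in this post, so substitute your own AD group:

[global]
require_membership_of=support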

Conclusion

Thanks to realmd, it’s easier to set up Active Directory as an identity provider. With minimal configuration tweaks, you can use the identity provider to authenticate MySQL users.

by Jaime Sicam at July 13, 2017 07:17 PM

MariaDB AB

Analysis of Financial Time Series Data Using MariaDB ColumnStore

Analysis of Financial Time Series Data Using MariaDB ColumnStore Satoru Goto Thu, 07/13/2017 - 11:31

MariaDB ColumnStore is a GPLv2 open source columnar database built on MariaDB Server. It is a fork and evolution of the former InfiniDB. It can be deployed in the cloud (optimized for Amazon Web Services) or on a local cluster of Linux servers using either local or networked storage. In this blog post, I will show some examples of analysis of financial time series data using MariaDB ColumnStore.

Free Forex Historical Data at HistData.com

First of all, we will download forex historical data (GBPUSD M1 Full 2016 Year Data), which is freely available on HistData.com. The HistData.com forex historical data looks like this:

20160103 170000;1.473350;1.473350;1.473290;1.473290;0
20160103 170100;1.473280;1.473360;1.473260;1.473350;0
20160103 170200;1.473350;1.473350;1.473290;1.473290;0
20160103 170300;1.473300;1.473330;1.473290;1.473320;0

The first column is the per-minute timestamp of the currency rate, but its format needs to be converted to fit the ColumnStore DATETIME data type. I modified the timestamp format using the following simple Ruby script:

convert.rb:
#!/usr/bin/env ruby
# Convert HistData.com M1 rows ("YYYYMMDD HHMMSS;open;high;low;close;volume")
# into "id,YYYY-MM-DD HH:MM,open,high,low,close" for the ColumnStore DATETIME column.
id = 0
while line = gets
  timestamp, open, high, low, close = line.split(";")  # the trailing volume field is dropped
  year, month, day, hour, minute, second = timestamp.unpack("a4a2a2xa2a2a2")
  id += 1
  print "#{id},#{year}-#{month}-#{day} #{hour}:#{minute},"
  puts [open, high, low, close].join(",")
end

For example, you can convert the historical data as follows:

ruby convert.rb DAT_ASCII_GBPUSD_M1_2016.csv > gbpusd2016.csv

The converted forex historical data will be:

1,2016-01-03 17:00,1.473350,1.473350,1.473290,1.473290
2,2016-01-03 17:01,1.473280,1.473360,1.473260,1.473350
3,2016-01-03 17:02,1.473350,1.473350,1.473290,1.473290
4,2016-01-03 17:03,1.473300,1.473330,1.473290,1.473320

This demo was performed on following setup:

  • CPU: Intel Core i7-7560U 2.4 GHz 2 physical cores / 4 logical cores
  • OS: CentOS 7.3 ( virtualized on VMware Workstation 12 Pro on Windows 10 Pro)
  • Memory: 4GB
  • ColumnStore: 1 UM & 1 PM on single node

Next, we create a database and a table with MariaDB monitor:

# mcsmysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.23-MariaDB Columnstore 1.0.9-1

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE DATABASE forex;
MariaDB [(none)]> USE forex;
MariaDB [forex]> CREATE TABLE gbpusd (
    id INT, 
    time DATETIME,
    open DOUBLE,
    high DOUBLE, 
    low DOUBLE, 
    close DOUBLE
    ) engine=ColumnStore default character set=utf8;

Notes:
  • With a standard ColumnStore installation, the MariaDB monitor command is mcsmysql (aliased to /usr/local/mariadb/columnstore/mysql/bin/mysql).
  • The storage engine must be ColumnStore.

Now it’s time to import the CSV data whose timestamps have been corrected. For bulk import, we use the cpimport command that ships with ColumnStore. With the -s (separator) option, we specify the delimiter character (in this example, a comma).

# cpimport -s ',' forex gbpusd gbpusd2016.csv
Locale is : C
Column delimiter : ,

Using table OID 3163 as the default JOB ID
Input file(s) will be read from : /home/vagrant/histdata
Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3163_D20170624_T103843_S950145_Job_3163.xml

…

2017-06-24 10:38:45 (29756) INFO : For table forex.gbpusd: 372480 rows processed and 372480 rows inserted.
2017-06-24 10:38:46 (29756) INFO : Bulk load completed, total run time : 2.11976 seconds

We could import 372,480 rows of forex data in about 2 seconds. LOAD DATA LOCAL INFILE can be used with ColumnStore as well.
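
As a rough equivalent of the cpimport call (assuming local_infile is enabled on both the client and the server, and using the same directory reported in the cpimport log above), the LOAD DATA variant would look like this:

MariaDB [forex]> LOAD DATA LOCAL INFILE '/home/vagrant/histdata/gbpusd2016.csv'
    -> INTO TABLE gbpusd
    -> FIELDS TERMINATED BY ',';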

Sample queries using functions

The reason GBPUSD M1 2016 data was chosen is to track the very quick moves of GBPUSD before and after the Brexit vote, which was held on 23rd June 2016.

gbpusd-h4.png

 

Now we look at the maximum, minimum and drop-off rate during 23rd – 24th June 2016, using the aggregate functions of ColumnStore.

MariaDB [forex]> SELECT MAX(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25');
+------------+
| MAX(close) |
+------------+
|    1.50153 |
+------------+
1 row in set (0.03 sec)

MariaDB [forex]> SELECT MIN(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25');
+------------+
| MIN(close) |
+------------+
|    1.32322 |
+------------+
1 row in set (0.03 sec)

MariaDB [forex]> SELECT MAX(close)/MIN(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25');
+-----------------------+
| MAX(close)/MIN(close) |
+-----------------------+
|    1.1347546399658233 |
+-----------------------+
1 row in set (0.03 sec)

GBPUSD fell from 1.50 to 1.32 (-13%) in 2 days (actually in half a day). 

Correlation GBPUSD – USDJPY on 23rd June 2016

The next graph shows a scatter plot of GBPUSD vs. USDJPY during 23rd – 24th June 2016. (USDJPY M1 2016 data was imported in the same way as GBPUSD.)

scatter-plot-gbpusd-usdjpy.png

Note: GBPUSD is multiplied by 100 and then 130 is subtracted; 110 is subtracted from USDJPY.

Now we calculate the Pearson correlation coefficient using statistical aggregation functions. The following SELECT statement was used:

SELECT
  (AVG(gbpusd.close*usdjpy.close) - AVG(gbpusd.close)*AVG(usdjpy.close)) / 
  (STDDEV(gbpusd.close) * STDDEV(usdjpy.close))
AS correlation_coefficient_population
FROM gbpusd
JOIN usdjpy ON gbpusd.time = usdjpy.time
WHERE
  gbpusd.time BETWEEN TIMESTAMP('2016-06-23') AND TIMESTAMP('2016-06-25');

Query result:

+------------------------------------+
| correlation_coefficient_population |
+------------------------------------+
|                 0.9647697302087848 |
+------------------------------------+
1 row in set (0.57 sec)

GBPUSD and USDJPY were highly correlated during the UK EU membership referendum.

Moving Average GBPUSD 23rd - 24th June 2016

In Forex trading, a moving average is often used to smooth out price fluctuations. The following query computes a moving average over a sliding 13-row window.

SELECT 
  time,
  close,
  AVG(close) OVER ( 
               ORDER BY time ASC
                 ROWS BETWEEN 
                   6 PRECEDING AND
                   6 FOLLOWING ) AS MA13,
  COUNT(close) OVER ( 
               ORDER BY time ASC
                 ROWS BETWEEN 
                   6 PRECEDING AND
                   6 FOLLOWING ) AS row_count
FROM gbpusd
WHERE time BETWEEN TIMESTAMP('2016-06-23') AND TIMESTAMP('2016-06-25');

GBPUSD-MA13.png

Summary

In this blog post, the following features of MariaDB ColumnStore were explained:

  • With the cpimport command, users can easily perform fast bulk imports into MariaDB ColumnStore tables.
  • MariaDB ColumnStore allows fast and easy data analytics using SQL:2003-compliant window functions and aggregates such as MAX, MIN, AVG and STDDEV.
  • No index management is needed for query performance tuning.

MariaDB ColumnStore can be downloaded here, detailed instructions to install and test MariaDB ColumnStore on Windows using Hyper-V can be found here.


by Satoru Goto at July 13, 2017 03:31 PM

Colin Charles

CFP for Percona Live Europe Dublin 2017 closes July 17 2017!

I’ve always enjoyed the Percona Live Europe events, because I consider them to be a lot more intimate than the event in Santa Clara. It started in London, had a smashing success last year in Amsterdam (conference sold out), and by design the travelling conference is now in Dublin from September 25-27 2017.

So what are you waiting for when it comes to submitting to Percona Live Europe Dublin 2017? The call for presentations closes on July 17 2017, and the conference has a pretty diverse topic structure (MySQL [and its diverse ecosystem including MariaDB Server naturally], MongoDB and other open source databases including PostgreSQL, time series stores, and more).

And I think we also have a pretty diverse conference committee in terms of expertise. You can also register now. Early bird registration ends August 8 2017.

I look forward to seeing you in Dublin, so we can share a pint of Guinness. Sláinte.

by Colin Charles at July 13, 2017 01:27 AM

July 12, 2017

MariaDB Foundation

MariaDB 10.2.7 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.2.7. This is a stable (GA) release. See the release notes and changelog for details. Download MariaDB 10.2.7 Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator Thanks, and enjoy MariaDB!

The post MariaDB 10.2.7 now available appeared first on MariaDB.org.

by MariaDB Foundation at July 12, 2017 08:59 PM

Peter Zaitsev

Gh-ost benchmark against pt-online-schema-change performance

gh-ost-benchmark

In this blog post, I will run a gh-ost benchmark against the performance of pt-online-schema-change.

When gh-ost came out, I was very excited. As MySQL ROW replication became commonplace, you could use it to track changes instead of triggers. This practice is cleaner and safer compared to Percona Toolkit’s pt-online-schema-change. Since gh-ost doesn’t need triggers, I assumed it would generate lower overhead and work faster. I frequently called it “pt-online-schema-change on steroids” in my talks. Finally, I’ve found some time to check my theoretical claims with some benchmarks.

DISCLAIMER: These benchmarks correspond to one specific ALTER TABLE on the table of one specific structure and hardware configuration. I have not set up a broad set of tests. If you have other results – please comment!

Benchmark Setup Details

  • pt-online-schema-change from Percona Toolkit 3.0.3
  • gh-ost 1.0.36
  • Percona Server 5.7.18 on Ubuntu 16.04 LTS
  • Hardware: 28CPU cores/56 Threads.  128GB Memory.   Samsung 960 Pro 512GB
  • Sysbench 1.0.7

Prepare the table by running:

sysbench --threads=40 --rate=0 --report-interval=1 --percentile=99 --events=0 --time=0 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest  /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 prepare

The table size is about 3GB (completely fitting to innodb_buffer_pool).

Run the benchmark in “full ACID” mode with:

  • sync_binlog=1
  • innodb_flush_log_at_trx_commit=1
  • innodb_doublewrite=1

This is important as this workload is heavily commit-bound, and extensively relies on group commit.
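
For reference, the corresponding my.cnf fragment for the “full ACID” settings listed above is simply:

[mysqld]
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite = 1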

This is the pt-online-schema-change command to alter table:

time pt-online-schema-change --execute --alter "ADD COLUMN c1 INT" D=sbtest,t=sbtest1

This the gh-ost command to alter table:

time ./gh-ost  --user="sbtest" --password="sbtest" --host=localhost --allow-on-master --database="sbtest" --table="sbtest1"  --alter="ADD COLUMN c1 INT" --execute

Tests Details

For each test the old sysbench table was dropped and a new one prepared. I tested alter table in three different cases:

  • When nothing else was running (“Idle Load”)   
  • When the system handled about 2% of the load it can handle at full capacity (“Light Background Load”)
  • When the system handled about 40% of the possible load, with sysbench injecting about 25% of the transactions/sec the system could handle at full load (“Heavy Background Load”)

I measured the alter table completion times for all cases, as well as the overhead generated by the alter (in other words, how much peak throughput is reduced by running alter table through the tools).
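
As a sketch of how one such run can be driven (reusing the prepare and alter commands already shown in this post; the reset helper itself is hypothetical, and the credentials reuse the sbtest account from the sysbench options above), the per-test cycle looks roughly like this:

# Hypothetical per-test reset, built from the commands shown in this post
reset_table() {
  mysql --user=sbtest --password=sbtest -e "DROP TABLE IF EXISTS sbtest.sbtest1"
  sysbench --threads=40 --rate=0 --report-interval=1 --percentile=99 --events=0 --time=0 \
    --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest \
    /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 prepare
}

# Time the same ALTER through each tool, starting from a freshly prepared table
reset_table
time pt-online-schema-change --execute --alter "ADD COLUMN c1 INT" D=sbtest,t=sbtest1

reset_table
time ./gh-ost --user="sbtest" --password="sbtest" --host=localhost --allow-on-master \
  --database="sbtest" --table="sbtest1" --alter="ADD COLUMN c1 INT" --execute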

Idle Load

gh-ost benchmark 1

For the Idle Load test, pt-online-schema-change completed nearly twice as fast as gh-ost. This was a big surprise for me. I haven’t looked into the reasons or details yet, though I can see most of the CPU usage for gh-ost is on the MySQL server side. Perhaps the differences relate to the SQL used to perform non-blocking alter tables.

Light Background Load

I generated the Light Background Load by running the sysbench command below. It corresponds to a roughly 4% load, as the system can handle some 2500 transactions/sec at this concurrency under full load. Adjust the --rate value to scale it for your system.

time sysbench --threads=40 --rate=100 --report-interval=1 --percentile=99 --events=0 --time=0 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest  /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 run

gh-ost benchmark 2

The numbers changed (as expected), but pt-online-schema-change is still approximately twice as fast as gh-ost.

What is really interesting in this case is how a relatively light background load affects the process completion time. It took both pt-online-schema-change and gh-ost about 2.7 times longer to finish!

Heavy Background Load

I generated the Heavy Background Load by running the sysbench command below. It corresponds to a roughly 40% load, as the system can handle some 2500 transactions/sec at this concurrency under full load. Adjust the --rate value to scale it for your system.

time sysbench --threads=40 --rate=1000 --report-interval=1 --percentile=99 --events=0 --time=0 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest  /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 run

gh-ost benchmark 3

What happened in this case? When the load gets higher, gh-ost can’t keep up with binary log processing, and just never finishes at all. While this may be surprising at first, it makes sense if you think more about how these tools work. pt-online-schema-change uses triggers, and while they have a lot of limitations and overhead they can execute in parallel. gh-ost, on the other hand, processes the binary log in a single thread and might not be able to keep up.   

In MySQL 5.6 we didn’t have parallel replication, which applies writes to the same table in parallel. For that version the gh-ost limitation probably isn’t as big a deal, as such a heavy load would also cause replication lag. MySQL 5.7 has parallel replication. This makes it much easier to quickly replicate workloads that are too heavy for gh-ost to handle.

I should note that the workload being simulated in this benchmark is a rather extreme case. The table being altered by gh-ost here is at the same time handling a background load so high it can’t be replicated in a single thread.

Future versions of gh-ost could improve this issue by applying binlog events in parallel, similar to what MySQL replicas do.

An excerpt from the gh-ost log shows how it is totally backed up trying to apply the binary log:

root@rocky:/tmp# time ./gh-ost  --user="sbtest" --password="sbtest" --host=localhost --allow-on-master --database="sbtest" --table="sbtest1"  --alter="ADD COLUMN c1 INT" --execute
2017/06/25 19:16:05 binlogsyncer.go:75: [info] create BinlogSyncer with config &{99999 mysql localhost 3306 sbtest sbtest  false false <nil>}
2017/06/25 19:16:05 binlogsyncer.go:241: [info] begin to sync binlog from position (rocky-bin.000018, 640881773)
2017/06/25 19:16:05 binlogsyncer.go:134: [info] register slave for master server localhost:3306
2017/06/25 19:16:05 binlogsyncer.go:568: [info] rotate to (rocky-bin.000018, 640881773)
2017-06-25 19:16:05 ERROR parsing time "" as "2006-01-02T15:04:05.999999999Z07:00": cannot parse "" as "2006"
# Migrating `sbtest`.`sbtest1`; Ghost table is `sbtest`.`_sbtest1_gho`
# Migrating rocky:3306; inspecting rocky:3306; executing on rocky
# Migration started at Sun Jun 25 19:16:05 -0400 2017
# chunk-size: 1000; max-lag-millis: 1500ms; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.sbtest.sbtest1.sock
Copy: 0/9872432 0.0%; Applied: 0; Backlog: 0/100; Time: 0s(total), 0s(copy); streamer: rocky-bin.000018:641578191; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 0; Backlog: 100/100; Time: 1s(total), 1s(copy); streamer: rocky-bin.000018:641626699; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 640; Backlog: 100/100; Time: 2s(total), 2s(copy); streamer: rocky-bin.000018:641896215; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 1310; Backlog: 100/100; Time: 3s(total), 3s(copy); streamer: rocky-bin.000018:642178659; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 1920; Backlog: 100/100; Time: 4s(total), 4s(copy); streamer: rocky-bin.000018:642436043; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 2600; Backlog: 100/100; Time: 5s(total), 5s(copy); streamer: rocky-bin.000018:642722777; State:
...
Copy: 0/9872432 0.0%; Applied: 120240; Backlog: 100/100; Time: 3m0s(total), 3m0s(copy); streamer: rocky-bin.000018:694142377; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 140330; Backlog: 100/100; Time: 3m30s(total), 3m30s(copy); streamer: rocky-bin.000018:702948219; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 160450; Backlog: 100/100; Time: 4m0s(total), 4m0s(copy); streamer: rocky-bin.000018:711775662; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 180600; Backlog: 100/100; Time: 4m30s(total), 4m30s(copy); streamer: rocky-bin.000018:720626338; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 200770; Backlog: 100/100; Time: 5m0s(total), 5m0s(copy); streamer: rocky-bin.000018:729509960; State: migrating; ETA: N/A

Online Schema Change Performance Impact

For this test I started the alter table, waited 60 seconds and then ran sysbench at full speed for five minutes. Then I measured how much the performance was impacted by running the tool:

sysbench --threads=40 --rate=0 --report-interval=1 --percentile=99 --events=0 --time=300 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest  /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 run

gh-ost benchmark 4

As we can see, gh-ost has negligible overhead in this case. pt-online-schema-change, on the other hand, had its performance reduced by 12%. It is worth noting though that pt-online-schema-change still makes progress in this case (though slowly), while gh-ost would never complete.

If anything, I was surprised at how little impact the pt-online-schema-change run had on sysbench performance.

It’s important to note that in this case we only measured the overhead for the “copy” stage of the online schema change. Another thing you should worry about is the impact to performance during “table rotation” (which I have not measured).

Summary

While gh-ost introduces a number of design advantages, and gives better results in some situations, I wouldn’t call it always superior to the tried and true pt-online-schema-change. At least in some cases, pt-online-schema-change offers better performance than gh-ost and completes a schema change when gh-ost is unable to keep up. Consider trying out both tools and see what works best in your situation.

by Peter Zaitsev at July 12, 2017 06:31 PM

MariaDB AB

MariaDB 10.2.7 now available

MariaDB 10.2.7 now available dbart Wed, 07/12/2017 - 11:21

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.2.7. See the release notes and changelog for details.

Download MariaDB 10.2.7

Release Notes Changelog What is MariaDB 10.2?


by dbart at July 12, 2017 03:21 PM

July 11, 2017

Peter Zaitsev

Thread_Statistics and High Memory Usage

thread_statistics

In this blog post, we’ll look at how using thread_statistics can cause high memory usage.

I was recently working on a high memory usage issue for one of our clients, and made some interesting discoveries: high memory usage with no bounds. It was really tricky to diagnose.

Below, I am going to show you how to identify that having thread_statistics enabled causes high memory usage on busy systems with many threads.

Part 1: Issue Background

I had a server with 55.0G of available memory. Percona Server for MySQL version:

                  Version | 5.6.35-80.0-log Percona Server (GPL), Release 80.0, Revision f113994f31
                 Built On | debian-linux-gnu x86_64

We have calculated approximately how much memory MySQL can use in a worst case scenario for max_connections=250:

mysql> select ((@@key_buffer_size+@@innodb_buffer_pool_size+@@innodb_log_buffer_size+@@innodb_additional_mem_pool_size+@@net_buffer_length+@@query_cache_size)/1024/1024/1024)+((@@sort_buffer_size+@@myisam_sort_buffer_size+@@read_buffer_size+@@join_buffer_size+@@read_rnd_buffer_size+@@thread_stack)/1024/1024/1024*250);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ((@@key_buffer_size+@@innodb_buffer_pool_size+@@innodb_log_buffer_size+@@innodb_additional_mem_pool_size+@@net_buffer_length+@@query_cache_size)/1024/1024/1024)+((@@sort_buffer_size+@@myisam_sort_buffer_size+@@read_buffer_size+@@join_buffer_size+@@read_rnd |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 12.445816040039 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

So in our case, this shouldn’t be more than ~12.5G.

After MySQL Server has been restarted, it allocated about 7G. After running for a week, it reached 44G:

# Memory #####################################################
       Total | 55.0G
        Free | 8.2G
        Used | physical = 46.8G, swap allocated = 0.0, swap used = 0.0, virtual = 46.8G
      Shared | 560.0k
     Buffers | 206.5M
      Caches | 1.0G
       Dirty | 7392 kB
     UsedRSS | 45.3G
  Swappiness | 60
 DirtyPolicy | 20, 10
 DirtyStatus | 0, 0

# Top Processes ##############################################
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2074 mysql     20   0 53.091g 0.044t   8952 S 126.0 81.7  34919:49 mysqld

I checked everything that could be related to the high memory usage (for example, operating system settings such as Transparent Huge Pages (THP), etc.). But I still didn’t find the cause (THP was disabled on the server). I asked my teammates if they had any ideas.
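
For reference, a quick way to confirm THP status on Linux is to read the standard sysfs switch (the path can differ on some older RHEL/CentOS kernels); the bracketed value is the active setting:

cat /sys/kernel/mm/transparent_hugepage/enabled
# e.g. "always madvise [never]" confirms THP is disabled, as it was on this server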

Part 2: The Team to the Rescue

After brainstorming and reviewing the status, metrics and profiles again and again, my colleague (Yves Trudeau) pointed out that User Statistics was enabled on the server.

User Statistics adds several INFORMATION_SCHEMA tables, several commands, and the userstat variable. The tables and commands can be used to better understand different server activity, and to identify the different load sources. Check out the documentation for more information.

mysql> show global variables like 'user%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| userstat      | ON    |
+---------------+-------+
mysql> show global variables like 'thread_statistics%';
+-------------------------------+---------------------------+
| Variable_name                 | Value                     |
+-------------------------------+---------------------------+
| thread_statistics             | ON                        |
+-------------------------------+---------------------------+

Since we saw many threads running, it was a good option to verify this as the cause of the issue.

Part 3: Cause Verification – Did It Really Eat Our Memory?

I decided to apply some calculations, and the following test cases to verify the cause:

  1. Looking at the THREAD_STATISTICS table in the INFORMATION_SCHEMA, we can see that for each connection there is a row like the following:
    mysql> select * from THREAD_STATISTICS limit 1G
    *************************** 1. row ***************************
    THREAD_ID: 3566
    TOTAL_CONNECTIONS: 1
    CONCURRENT_CONNECTIONS: 0
    CONNECTED_TIME: 30
    BUSY_TIME: 0
    CPU_TIME: 0
    BYTES_RECEIVED: 495
    BYTES_SENT: 0
    BINLOG_BYTES_WRITTEN: 0
    ROWS_FETCHED: 27
    ROWS_UPDATED: 0
    TABLE_ROWS_READ: 0
    SELECT_COMMANDS: 11
    UPDATE_COMMANDS: 0
    OTHER_COMMANDS: 0
    COMMIT_TRANSACTIONS: 0
    ROLLBACK_TRANSACTIONS: 0
    DENIED_CONNECTIONS: 0
    LOST_CONNECTIONS: 0
    ACCESS_DENIED: 0
    EMPTY_QUERIES: 0
    TOTAL_SSL_CONNECTIONS: 1
    1 row in set (0,00 sec)
  2. We have 22 columns, each of them BIGINT, which gives us ~176 bytes per row (22 × 8 bytes).
  3. Let’s calculate how many rows we have in this table at this time, and check once again in an hour:
    mysql> select count(*) from information_schema.thread_statistics;
    +----------+
    | count(*) |
    +----------+
    | 7864343  |
    +----------+
    1 row in set (15.35 sec)

    In an hour:

    mysql> select count(*) from information_schema.thread_statistics;
    +----------+
    | count(*) |
    +----------+
    | 12190801 |
    +----------+
    1 row in set (24.46 sec)

  4. Now let’s check on how much memory is currently in use:

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    2096 mysql 20 0 12.164g 0.010t 16036 S 173.4 18.9 2274:51 mysqld

  5. We have 12190801 rows in the THREAD_STATISTICS table, which works out to roughly 2G of memory (see the calculation sketch after this list).
  6. Issuing the following statement cleans up the statistics:

    mysql> flush thread_statistics;
    mysql> select count(*) from information_schema.thread_statistics;
    +----------+
    | count(*) |
    +----------+
    | 0        |
    +----------+
    1 row in set (0.00 sec)

  7. Now, let’s check again on how much memory is in use:

    PID USER     PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+  COMMAND
    2096 mysql   20   0 12.164g 7.855g 16036 S  99.7 14.3  2286:24  mysqld
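To sanity-check the ~2G figure referenced in step 5, here is a minimal back-of-the-envelope calculation run from the mysql client (176 bytes per row is a rough approximation that ignores any per-row overhead of the in-memory structures):

-- 22 BIGINT counters at 8 bytes each
SELECT 22 * 8 AS approx_row_bytes;                              -- 176

-- 12,190,801 accumulated rows at ~176 bytes each, expressed in GiB
SELECT ROUND(12190801 * 176 / POW(1024, 3), 2) AS approx_gib;   -- ~2.00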

As we can see, memory usage dropped by roughly the 2G that we had calculated earlier!

That was the root cause of the high memory usage in this case.

Conclusion

User Statistics (and Thread Statistics in particular) is a great feature that allows us to identify load sources and better understand server activity. At the same time, though, it can be dangerous (from the memory usage point of view) to use as a permanent monitoring solution, because it places no limits on its own memory usage.

As a reminder, thread_statistics is NOT enabled by default when you enable User Statistics. If you have enabled thread_statistics for monitoring purposes, please don't forget to keep an eye on its memory footprint.
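If thread_statistics was left enabled and memory keeps growing, a minimal mitigation sketch (assuming Percona Server, where userstat and thread_statistics are dynamic global variables) is to flush the accumulated statistics and stop collecting the per-thread ones at runtime:

-- Release the memory held by the accumulated per-thread statistics
FLUSH THREAD_STATISTICS;

-- Stop collecting per-thread statistics; the table-, index- and user-level
-- statistics controlled by userstat remain available
SET GLOBAL thread_statistics = OFF;

To keep the change across restarts, also set thread_statistics = OFF (or remove the option) in my.cnf.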

As a next step, we are considering submitting a feature request to implement some default limits that can prevent Out of Memory issues on busy systems.

by Alex Poritskiy at July 11, 2017 08:15 PM

MariaDB AB

New in Microsoft Azure: MariaDB TX 2.0


Now in the Microsoft Azure Marketplace: MariaDB TX 2.0 with MariaDB Server 10.2, configured in a MariaDB Galera Cluster topology, with MariaDB MaxScale 2.1, and Azure Managed Disks.

What is MariaDB TX 2.0? This is the new name of MariaDB's flagship subscription offering for transactional (OLTP) workloads. MariaDB TX 2.0 includes MariaDB Server 10.2, featuring the Galera clustering technology we use in this solution, as well as MariaDB MaxScale 2.1.

The MariaDB TX 2.0 solution in the Azure Marketplace deploys a cluster with three MariaDB Server data nodes and two MariaDB MaxScale proxy nodes. This provides HA at the data store layer as well as the connectivity layer. MaxScale can automatically split reads and writes between the backend data nodes in the MariaDB Cluster, and it can automatically choose a new master if the topology changes. An Azure Load Balancer sits in front of the MaxScale nodes to seamlessly manage a switchover or failover between the MaxScale nodes.

MariaDB Server 10.2 includes lots of really interesting functionality, including Window Functions, Common Table Expressions (CTEs), CHECK constraints, new JSON functions, and lots more. Take a look at our recent MariaDB Server 10.2 new features for the full list and links to details.

New in the MariaDB TX 2.0 solution in Azure is a move to Ubuntu 16.04 LTS from CentOS 7. This is mostly transparent to users, as all the same packages and functionality are available as in previous iterations of this solution. This move makes maintenance of the solution a bit easier, and it seems that there's better support in Azure for their Ubuntu VMs than for CentOS, so this may even bring some performance and stability improvements.

Another notable change is a move to Azure Managed Disks. This finally brings Azure storage in line with what can be found from other cloud providers. Azure Managed Disks are a relatively new feature of Azure, and they'll be a best practice going forward, so I'm happy to have been able to add early support for them to our MariaDB TX 2.0 solution in the Azure Marketplace. Azure Managed Disks remove the complexity of having to directly manage Storage Accounts, which means a little bit less configuration to kick off the deployment process. It also gives access to Snapshots, Azure Disk Encryption, increased resiliency, and the ability to upgrade from Standard (hard disk) to Premium (SSD) storage without having to re-create the VM. Learn more about Azure Managed Disks.

Find MariaDB TX 2.0 in the Azure Marketplace. From there you can click "Get It Now" to deploy the solution into your existing Azure Subscription, or you can click "Test Drive" to start a 2-hour Test Drive to kick the tires.

The free-of-charge Test Drive deploys the same topology, including MariaDB TX 2.0, MariaDB Server 10.2, MariaDB MaxScale 2.1, and the move to Ubuntu 16.04 LTS. The Test Drive doesn't use Managed Disks, however. To get a look at that experience, you'll need to deploy the MariaDB TX 2.0 solution yourself into your own Azure Subscription.

Maybe you just want to start simple, with a single instance of MariaDB Server? If so, you can also start a standalone MariaDB Server instance from the Azure Marketplace.

Documentation and more information for all of these solutions can be found in the MariaDB Knowledge Base.

Have questions? If you're a customer with a MariaDB TX or MariaDB Enterprise subscription, please contact our MariaDB support team. If you're not a customer yet, get in touch with us — we're looking forward to hearing from you.



by kolbekegel at July 11, 2017 04:56 PM

Peter Zaitsev

Webinar Wednesday July 12, 2017: MongoDB Index Types – How, When and Where Should They Be Used?


Join Percona's Senior Technical Services Engineer, Adamo Tonete, as he presents MongoDB Index Types: How, When and Where Should They Be Used? on Wednesday, July 12, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7).

MongoDB has 12 index types. Do you know how each works, or when you should use each of them? This talk will arm you with this knowledge, and help you understand how indexes impact performance, storage and even the sharding of your data. We will also discuss some solid index operational practices, as well as some settings for things like TTL you might not know exist. The contents of this webinar will make you a Rock Star!

Register for the webinar here.

Adamo Tonete, Senior Technical Services Engineer

Adamo joined Percona in 2015, after working as a MongoDB/MySQL database administrator for three years. As the main database admin of a startup, he was responsible for suggesting the best architecture and data flows for a worldwide company in a 24/7 environment. Before that, he worked as a Microsoft SQL Server DBA for a large e-commerce company, working mainly on performance tuning and automation. Adamo has almost eight years of experience working as a DBA, and in the past three years he has moved to NoSQL technologies (without giving up relational databases).

by Emily Ikuta at July 11, 2017 03:32 PM

July 10, 2017

Peter Zaitsev

Percona Server for MongoDB 3.4.5-1.5 is Now Available


Percona announces the release of Percona Server for MongoDB 3.4.5-1.5 on July 10, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.5 and includes the following additional changes:

New Features

Bugs Fixed

  • #PSMDB-67: Fixed mongod service status messages.

by Alexey Zhebel at July 10, 2017 06:37 PM

Webinar Tuesday July 11, 2017: Securing Your MySQL/MariaDB Data


Join Percona's Chief Evangelist, Colin Charles, as he presents Securing Your MySQL/MariaDB Data on Tuesday, July 11, 2017 at 7:00 am PDT / 10:00 am EDT (UTC-7).

This webinar will discuss the features of MySQL/MariaDB that, when enabled and used correctly, improve on the default security of a MySQL installation. Many cloud-based applications fail to:

  • Use appropriate filesystem permissions
  • Employ TLS/SSL for connections
  • Require TLS/SSL with MySQL replication
  • Use external authentication plugins (LDAP, PAM, Kerberos)
  • Encrypt all your data at rest
  • Monitor your database with the audit plugin
  • Review and reject SQL injections
  • Design application access using traditional firewall technology
  • Employ other MySQL/MariaDB security features

This webinar will demonstrate and advise on how to correctly implement the features above. We will end the presentation with some simple steps on how to hack a MySQL installation.
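As a small illustration of the TLS-related items above, here is a minimal sketch (the account name 'app'@'%' is hypothetical, and the ALTER USER syntax assumes MySQL 5.7 or MariaDB 10.2; on older versions, GRANT ... REQUIRE SSL achieves the same):

-- Force an application account to connect over TLS
ALTER USER 'app'@'%' REQUIRE SSL;

-- Verify that the current connection really is encrypted
SHOW SESSION STATUS LIKE 'Ssl_cipher';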

You can register for the webinar here.

Colin Charles, Percona Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team of MariaDB Server in 2009, worked at MySQL starting in 2005, and has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and OpenOffice.org projects. He is well known within open source communities in APAC, and has spoken at many conferences.

by Emily Ikuta at July 10, 2017 04:51 PM

July 09, 2017

Valeriy Kravchuk

MySQL Support Engineer's Chronicles, Issue #7

This week in Support was busy enough for me. Among other things, I had to study all possible reasons (other than the obvious query cache impact) for queries hanging in the "query end" status, and noted Bug #80652, related to binlog group commit and fixed only in MySQL 5.7.17+ and 8.0.1+. The case I had to review was related to Galera, though, and I suggest you note that "query end" may be related to a Galera replication stall. Studying this path further soon brought lp:1197771 - "Cluster stalls while distributing transaction" to my attention again, so I asked about the proper status for it on Facebook. As happens way too often recently, I got a few "Likes" but no further comments, neither from Percona nor from Codership. It seems the good old times of my efficient bug escalations via Facebook are gone...
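As a quick diagnostic sketch for this kind of situation (assuming only the standard "query end" thread state string), the affected threads can be listed straight from the processlist:

-- How many threads are stuck in "query end", and for how long
SELECT COUNT(*) AS stuck_threads, MAX(TIME) AS longest_wait_sec
  FROM information_schema.PROCESSLIST
 WHERE STATE = 'query end';

-- The statements involved, longest-waiting first
SELECT ID, USER, DB, TIME, INFO
  FROM information_schema.PROCESSLIST
 WHERE STATE = 'query end'
 ORDER BY TIME DESC;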

Does it mean I am going to stop writing and sharing posts about bugs on Facebook? It definitely does NOT! One of the recent reasons is this post about Bug #86902 that, after being shared by me, got a quick comment and clarification from Sveta Smirnova that the problem (a performance problem when binary logging is turned OFF and you use more than one transactional storage engine concurrently) is actually known - it was reported as Bug #80870 (which is still "Open", what a shame).

So, even if some Oracle or Percona employees continue to ignore new bug reports or my Facebook (re-)posts about MySQL bugs, at least the bug reporter gets more chances to get help from the Community, plus additional references and clarifications. So, stay tuned - I'll keep writing about MySQL bugs in this blog and, on a more regular basis and probably for a wider audience, on Facebook. As the popular Mahatma Gandhi quote says:
"First they ignore you, then they laugh at you, then they fight you, then you win."
In my case with public MySQL bugs and the way they are processed by Oracle it was not strictly in this order, and I have yet to see people laughing at me for this reason, but I think I have already won: since September 2012 the public MySQL bugs database still exists, and several Oracle engineers still pay attention to community bug reports there.

I had reported one bug myself this week, Bug #86930 - "Wrong results for queries with row constructors and information_schema". It was quickly verified by Miguel Solorzano. It's a funny bug (with workarounds), but not as scary as a few other bugs recently reported and described in this post by Jean-François Gagné. Note that, as he stated, there are more than just the 2 bugs he mentioned (Bug #86926 and Bug #86927). Everybody in Oracle who should know that already knows it, and I know it. I keep stating that the implementation of InnoDB persistent statistics gives us more troubles than benefits. The last time I cared and complained that much, it was about Bug #70629 - "InnoDB updated rec_per_key[] statistics not published to the optimizer enough often". But, trust me, this recent set of problems Jean-François found is way more serious.
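For readers who want to see what the InnoDB persistent statistics currently look like for a suspect table, here is a minimal sketch (test.t1 is a hypothetical table name used for illustration):

-- Is persistent statistics collection enabled, and does it auto-recalculate?
SHOW GLOBAL VARIABLES LIKE 'innodb_stats%';

-- What the optimizer currently sees for the table's indexes
SELECT index_name, stat_name, stat_value, last_update
  FROM mysql.innodb_index_stats
 WHERE database_name = 'test' AND table_name = 't1';

-- Force a recalculation if the numbers look stale or plain wrong
ANALYZE TABLE test.t1;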

I'd like to share a few references I found extremely useful while working on Support issues this week. First of all, if you set up SSL connections for MySQL, take time to review this old post by my former colleague from Percona, Roman Vynar, and another one by Matthew Boehm. This may save you some time and debugging effort. If you care about MariaDB ColumnStore performance (I have to), check this KB article. If you want to replicate data to a table with somewhat different column types on the slave, make sure to read the manual. Finally, if you care enough to keep SELinux enabled and use Galera clusters with any non-default ports or directories, make sure to read this blog post.

I've noted a couple of serious decisions made by Oracle in regard to MySQL. The first of them is to discontinue work on the MySQL Internals manual and somehow integrate it into the main manual. You can find related text added recently to some bug reports, like Bug #67989 - "MySQL Internals documentation missing 5.6 binlog protocol parts" by my colleague Andrew Hutchings:
"No more updates are made to the MySQL Internals documentation, because it's in the process of being replaced by https://dev.mysql.com/doc/dev/mysql-server/latest/."
Sounds promising, doesn't it?

The other one, which is going to be accepted way better, IMHO, is to discontinue work on the query cache. Check Bug #86046 - "FROM sub query is cached by mistake" by Meiji Kimura, for example:
"[7 Jul 9:09] Erlend Dahl

MySQL will no longer invest in the query cache, see:

So, those were some of the links I noted during this week. Some of the topics mentioned above may be continued and presented in more detail in my further posts.

by Valeriy Kravchuk (noreply@blogger.com) at July 09, 2017 05:45 PM

July 08, 2017

Peter Zaitsev

MongoDB Indexing Types: How, When and Where Should They Be Used?


In this blog post, we will talk about MongoDB indexing, and the different types of indexes that are available in MongoDB.

Note: We are hosting a webinar on July 12, 2017, where I will talk about MongoDB indexes and how to choose a good indexing plan.

MongoDB is a document-oriented NoSQL database. NoSQL databases share many features with relational databases, and one of them is indexes. The question is: how are such documents indexed in the database?

Remember that because MongoDB stores JSON documents, there is no predefined schema. The same field can be a string or an integer, depending on the document.

MongoDB, like other databases, uses B-trees for indexing. With some exceptions, the algorithm is the same as in a relational database.

The B-tree can use integers and strings together to organize data. The most important thing to know is that an index-less database must read all the documents to filter what you want, while an indexed database can – through indexes – find the documents quickly. Imagine you are looking for a book in a disorganized library. This is how the query optimizer feels when we are looking for something that is not indexed.

There are several different types of indexes available: single field, compound indexes, hashed indexes, geoIndexes, unique, sparse, partial, text… and so on. All of them help the database in some way, although they obviously also get in the way of write performance if used too much.

  • Single fields. Single fields are simple indexes that index only one field in a collection. MongoDB can usually only use one index per query, but in some cases the database can take advantage of more than one index to reply to a query (this is called index intersection). Also, $or actually executes more than one query at a time. Therefore $or and $in can also use more than one index.
  • Compound indexes. Compound indexes are indexes that have more than one indexed field, so ideally the most restrictive field should be to the left of the B-tree. If you want to index by sex and birth, for instance, the index should begin with birth, as it is much more restrictive than sex.
  • Hashed indexes. Shards use hashed indexes, and create a hash according to the field value to spread the writes across the sharded instances.
  • GeoIndexes. GeoIndexes are a special index type that allows a search based on location, distance from a point and many other different features.
  • Unique indexes. Unique indexes work as in relational databases. They guarantee that the value doesn’t repeat and raise an error when we try to insert a duplicated value. Unique doesn’t work across shards.
  • Text indexes. Text indexes can work better with indexes than a single indexed field. There are different flags we can use, like giving weights to control the results or using different collections.
  • Sparse/Partial indexes. Sparse and partial indexes seem very similar. However, sparse indexes will index only an existing field and not check its value, while partial indexes will apply a filter (like greater than) to a field to index. This means the partial index doesn’t index all the documents with the existing field, but only documents that match the create index filter.

We will discuss and talk more about indexes and how they work in my webinar MongoDB® Index Types: How, When and Where Should They Be Used? If you have questions, feel free to ask them in the comments below and I will try to answer all of them in the webinar (or in a complementary post).

I hope this blog post was useful. Please feel free to reach out to me on Twitter at @AdamoTonete or @percona.

by Adamo Tonete at July 08, 2017 10:18 PM

July 07, 2017

Peter Zaitsev

Last Resort: How to Use a Backup to Start a Secondary Instance for MongoDB?


In this blog post, I'll look at how you can use a backup to start a secondary instance for MongoDB.

Although the documentation says it is not possible to use a backup to start a secondary, sometimes this is the only possible way to start a new instance. In this blog post, we will explain how to bypass this limitation and use a backup to start a secondary instance.

The initial sync/rsync or snapshot approach works fine when the instances are in the same data center, but it can fail or be really slow - much slower than moving a compressed backup between data centers.

Not every backup can be used as a source for starting a replica set. The backup must include the oplog file. This means the backup must be taken on an already existing replica set, using the --oplog flag to get a point-in-time backup when dumping the collections. The time spent moving and restoring the backup file must be less than the oplog window.

Please follow the next steps to create a secondary from a backup:

  1. Create a backup using the --oplog option:
    mongodump --oplog -o /tmp/backup/
  2. Backup the replica set collection from the local database.

    mongodump -d local -c system.replset
  3. After backup finishes please confirm the oplog.rs file is in the backup folder.
  4. Use bsondump to convert the oplog to JSON. We will use the last oplog entry as a starting point for the replica set.
    bsondump oplog.rs > oplog.rs.txt
  5. Initiate the new instance without the replica set parameter configured. At this point, the instance will act as a single instance.
  6. Restore the database normally, using --oplogReplay to apply all the oplog entries that were recorded while the backup was running.
    mongorestore /tmp/backup --oplogReplay
  7. Connect to this server and use the local database to create the oplog.rs collection. Please use the same value as the other members (e.g., 20 GB).
    mongo
    use local
    db.runCommand( { create: "oplog.rs", capped: true, size: (20 * 1024 * 1024 * 1024) } )
  8. From the oplog.rs.txt file generated in step 4, get the last line and copy the values of the ts and h fields into a MongoDB document.

    tail -n 1 oplog.rs.txt
  9. Insert the JSON value to the oplog.rs collection that was created before.
    mongo
    use local
    db.oplog.rs.insert({ ts: Timestamp(12354454,1), h: NumberLong("-3434387384732843")})
  10. Restore the replset collection to the local database.
    mongorestore -d local -c system.replset ./backup/repliset.bson
  11. Stop the service and edit the parameter replica set name to match the existing replica set.
  12. Connect to the primary and add this new host. The new host will start catching up on the oplog and should get in sync after a few minutes/hours, depending on the number of operations the replica set handles. It is important to add this new secondary as a hidden secondary, without votes if possible, to avoid triggering an election. Once the secondary is added to the replica set, drivers will start using this host to perform reads; if you don't add the server with hidden: true, the application may read inconsistent (old) data.

    mongo
    PRIMARY>rs.add({_id : <your id>, host: "<newhost:27017>", hidden : true, votes : 0, priority :0})
  13. Check the replication lag, and once the seconds behind master is close to zero, change the host's parameters in the replica set to hidden: false and adjust its priority or votes.
    mongo
    PRIMARY> rs.printSlaveReplicationInfo()
  14. We are considering a replica set with three members, where the new secondary has the ID 2 in the member’s array. Use the following command to unhide the secondary and make it available for reads. The priority and votes depend on your environment. Please notice you might need to change the member ID.
    mongo
    PRIMARY> cfg = rs.config()
    cfg.members[2].hidden = false
    cfg.members[2].votes = 1
    cfg.members[2].priority = 1
    PRIMARY> rs.reconfig(cfg)

I hope this tutorial helps in an emergency situation. Please consider using initial sync, disk snapshots and hot backups before using this method.

Feel free to reach out to me on Twitter at @AdamoTonete or @percona.

by Adamo Tonete at July 07, 2017 06:56 PM

July 06, 2017

Peter Zaitsev

ClickHouse: One Year!


In this blog, we'll look at ClickHouse on its one-year anniversary.

It’s been a year already since the Yandex team released ClickHouse as open source software. I’ve had an interest in this project from the very start, as I didn’t think there was an open source analytical database that could compete with industry leaders like Vertica (for example).

This was an exciting year for ClickHouse early adopters. Let’s look at what it accomplished so far.

ClickHouse initially generated interest due to the Yandex name – the most popular search engine in Russia. It wasn’t long before jaw-dropping responses popped up: guys, this thing is crazy fast! Many early adopters who tried ClickHouse were really impressed.

Fast doesn't mean convenient, though. That was the main community concern about ClickHouse over the past months. Developed as an internal project for an internal customer (Yandex.Metrica), ClickHouse had a lot of limitations for general community use. It took several months before Yandex could restructure the team and mindset, establish proper communication with the community, and start addressing external needs. There are still a lot of things that need to be done. The public roadmap is not easily available, for example, but the wheels are pointed in the right direction. The ClickHouse team has added a lot of the features people were screaming for, and more are in ClickHouse's future plans.

The Yandex guys are actively attending international conferences, and they were:

They are speaking much more in Russia (no big surprise).

We were very excited by Yandex’s ClickHouse performance claims at Percona, and could not resist making our own benchmarks:

ClickHouse did very well in these benchmarks. There are many other benchmarks by other groups as well, including a benchmark against Amazon RedShift by Altinity.

The first ClickHouse production deployments outside of Yandex started in October-November 2016. Today, Yandex reports that dozens of companies around the world are using ClickHouse in production, with the biggest installations operating with up to several petabytes of data. Hundreds of other enterprises are deploying pilot installations or actively evaluating the software.

There are also interesting reports from Cloudflare (How Cloudflare analyzes 1M DNS queries per second) and from Carto (Geospatial processing with ClickHouse).

There are also various community projects around ClickHouse worth mentioning:

Percona is also working to adapt ClickHouse to our projects. We are using ClickHouse to handle Query Analytics and as long-term metrics storage inside a new version (under development) of Percona Monitoring and Management.

I also will be speaking about ClickHouse at BIG DATA DAY LA 2017 on August 5th, 2017. You are welcome to attend if you are in Los Angeles this day!

ClickHouse has the potential to become one of the leading open source analytical DBMSs - much like MySQL and PostgreSQL are leaders for OLTP workloads. We will see in the next couple of years whether that happens or not. Congratulations to the Yandex team on their one-year milestone!

by Vadim Tkachenko at July 06, 2017 06:17 PM

Jean-Jerome Schmidt

MySQL on Docker: Running Galera Cluster in Production with ClusterControl on Kubernetes

In our “MySQL on Docker” blog series, we continue our quest to make Galera Cluster run smoothly in different container environments. One of the most important things when running a database service, whether in containers or bare-metal, is to eliminate the risk of data loss. We will see how we can leverage a promising feature in Kubernetes called StatefulSet, which orchestrates container deployment in a more predictable and controllable fashion.

In our previous blog post, we showed how one can deploy a Galera Cluster within Docker with the help of Kubernetes as orchestration tool. However, it is only about deployment. Running a database in production requires more than just deployment - we need to think about monitoring, backups, upgrades, recovery from failures, topology changes and so on. This is where ClusterControl comes into the picture, as it completes the stack and makes it production ready. In simple words, Kubernetes takes care of database deployment and scaling while ClusterControl fills in the missing components including configuration, monitoring and ongoing management.

ClusterControl on Docker

This blog post describes how ClusterControl runs in a Docker environment. The Docker image has been updated and now comes with the standard ClusterControl packages from the latest stable branch, with additional support for container orchestration platforms like Docker Swarm and Kubernetes; we'll describe this further below. You can also use the image to deploy a database cluster on a standalone Docker host.

Details at the Github repository or Docker Hub page.

ClusterControl on Kubernetes

The updated Docker image now supports automatic deployment of database containers scheduled by Kubernetes. The steps are similar to the Docker Swarm implementation, where the user decides the specs of the database cluster and ClusterControl automates the actual deployment.

ClusterControl can be deployed as ReplicaSet or StatefulSet. Since it’s a single instance, either way works. The only significant difference is the container identification would be easier with StatefulSet, since it provides consistent identifiers like the container hostname, IP address, DNS and storage. ClusterControl also provides service discovery for the new cluster deployment.

To deploy ClusterControl on Kubernetes, the following setup is recommended:

  • Use centralized persistent volumes supported by Kubernetes plugins (e.g. NFS, iSCSI) for the following paths:
    • /etc/cmon.d - ClusterControl configuration directory
    • /var/lib/mysql - ClusterControl cmon and dcps databases
  • Create 2 services for this pod:
    • One for internal communication between pods (expose port 80 and 3306)
    • One for external communication to outside world (expose port 80, 443 using NodePort or LoadBalancer)

In this example, we are going to use simple NFS. Make sure you have an NFS server ready. For the sake of simplicity, we are going to demonstrate this deployment on a 3-host Kubernetes cluster (1 master + 2 Kubernetes nodes). For production use, please use at least 3 Kubernetes nodes to minimize the risk of losing quorum.

With that in place, we can deploy ClusterControl as follows:

On the NFS server (kube1.local), install NFS server and client packages and export the following paths:

  • /storage/pods/cc/cmon.d - to be mapped with /etc/cmon.d
  • /storage/pods/cc/datadir - to be mapped with /var/lib/mysql

Make sure to restart the NFS service to apply the changes. Then create PVs and PVCs, as shown in cc-pv-pvc.yml:

$ kubectl create -f cc-pv-pvc.yml

We are now ready to start a replica of the ClusterControl pod. Send cc-rs.yml to Kubernetes master:

$ kubectl create -f cc-rs.yml

ClusterControl is now accessible on port 30080 on any of the Kubernetes nodes, for example, http://kube1.local:30080/clustercontrol. With this approach (ReplicaSet + PV + PVC), the ClusterControl pod would survive if the physical host goes down. Kubernetes will automatically schedule the pod to the other available hosts and ClusterControl will be bootstrapped from the last existing dataset which is available through NFS.

Galera Cluster on Kubernetes

If you would like to use the ClusterControl automatic deployment feature, simply send the following YAML files to the Kubernetes master:

$ kubectl create -f cc-galera-pv-pvc.yml
$ kubectl create -f cc-galera-ss.yml

Details on the definition files can be found here - cc-galera-pv-pvc.yml and cc-galera-ss.yml.

The above commands tell Kubernetes to create 3 PVs, 3 PVCs and 3 pods running as StatefulSet using a generic base image called “centos-ssh”. In this example, the database cluster that we are going to deploy is MariaDB 10.1. Once the containers are started, they will register themselves to ClusterControl CMON database. ClusterControl will then pick up the containers’ hostname and start the deployment based on the variables that have been passed.

You can check the progress directly from the ClusterControl UI. Once the deployment has finished, our architecture will look something like this:

HAProxy as Load Balancer

Kubernetes comes with an internal load balancing capability via the Service component when distributing traffic to the backend pods. This is good enough if the balancing (least connections) fits your workload. In some cases, where your application needs to send queries to a single master due to deadlock or strict read-after-write semantics, you have to create another Kubernetes service with a proper selector to redirect the incoming connection to one and only one pod. If this single pod goes down, there is a chance of service interruption when Kubernetes schedules it again to another available node. What we are trying to highlight here is that if you’d want better control on what’s being sent to the backend Galera Cluster, something like HAProxy (or even ProxySQL) is pretty good at that.

You can deploy HAProxy as a two-pod ReplicaSet and use ClusterControl to deploy, configure and manage it. Simply post this YAML definition to Kubernetes:

$ kubectl create -f cc-haproxy-rs.yml

The above definition instructs Kubernetes to create a service called cc-haproxy and run two replicas of “severalnines/centos-ssh” image without automatic deployment (AUTO_DEPLOYMENT=0). These pods will then connect to the ClusterControl pod and perform automatic passwordless SSH setup. What you need to do now is to log into ClusterControl UI and start the deployment of HAProxy.

Firstly, retrieve the IP address of HAProxy pods:

$ kubectl describe pods -l app=cc-haproxy | grep IP
IP:        10.44.0.6
IP:        10.36.0.5

Then use the address as the HAProxy Address under ClusterControl -> choose the DB cluster -> Manage -> Load Balancer -> HAProxy -> Deploy HAProxy, as shown below:

Repeat the above step for the second HAProxy instance.

Once done, our Galera Cluster can be accessible through the “cc-haproxy” service on port 3307 internally (within Kubernetes network space) or port 30016 externally (outside world). The connection will be load balanced between these HAProxy instances. At this point, our architecture can be illustrated as the following:

With this setup, you have maximum control of your load-balanced Galera Cluster running on Docker. Kubernetes brings something good to the table by supporting stateful service orchestration.

Do give it a try. We would love to hear how you get along.

by ashraf at July 06, 2017 01:24 PM

July 05, 2017

Peter Zaitsev

Differences in PREPARE Statement Error Handling with Binary and Text Protocol (Percona XtraDB Cluster / Galera)


In this blog, we'll look at the differences in how a PREPARE statement handles errors in binary and text protocols.

Introduction

Since Percona XtraDB Cluster is a multi-master solution, when an application executes conflicting workloads, one of the workloads gets rolled back with a DEADLOCK error. While the same holds true even if you fire the workload through a PREPARE statement, there are differences between using the MySQL Connector API (with the binary protocol) and the MySQL client (with the text protocol). Let's look at these differences with the help of an example.

Base Workload

  • Say we have a two-node cluster (n1 and n2) with the following base schema and tables:
    use test;
    create table t (i int, k int, primary key pk(i)) engine=innodb;
    insert into t values (1, 10), (2, 20), (3, 30);
    select * from t;
  • Workload (w1) on node-1 (n1):
    prepare st1 from 'update t set k = k + 100 where i > 1';
    execute st1;
  • Workload (w2) on node-2 (n2):
    prepare st1 from 'update t set k = k + 100 where i > 1';
    execute st1;
  • The workloads are conflicting, and the first node to commit wins. Let’s assume n1 commits first, and the write-set is replicated to n2 while n2 is busy executing the local workload.

Different Scenarios Based on Configuration

wsrep_retry_autocommit > 0
  • n2 tries to apply the replicated workload, thereby forcefully aborting (brute-force abort) the local workload (w2).
  • The forcefully aborted workload is retried (based on wsrep_retry_autocommit value (default is 1)).
  • w2 succeeds on retry, and the new state of table t would be:
    (1, 10), (2, 220), (3, 230);
  • The user can increase wsrep_retry_autocommit to a higher value, though we don't recommend this, as a higher value increases pressure on the system to retry failed workloads. A retry doesn't ensure things always pass if additional conflicting replicated workloads are lined up in the certification queue. Also, it is important to note that a retry works only if autocommit=ON. For a begin-commit transaction, a retry is not applicable: if the transaction is aborted, the complete transaction is rolled back. The workload executor application should have the logic to expect workload failures due to conflicts and handle them accordingly (see the sketch at the end of this section).
wsrep_retry_autocommit = 0
  • Let’s say the user sets wsrep_retry_autocommit=0 on node-2 and executes the same workload.
  • This time it doesn’t retry the local workload (w2). Instead, it sends an error to the end-user:
    mysql> execute st1;
    ERROR 1213 (40001): WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction
    mysql> select * from t;
    +---+------+
    | i | k    |
    +---+------+
    | 1 |   10 |
    | 2 |  120 |
    | 3 |  130 |
    +---+------+
    3 rows in set (0.00 sec)
  • Only the replicated workload (w1) is applied, and local workload w2 fails with a DEADLOCK error.
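To see which of the two behaviours a given node currently uses, and to switch between them, here is a minimal sketch (assuming a build where wsrep_retry_autocommit is dynamic; otherwise set it in my.cnf and restart the node):

-- Current retry behaviour for brute-force-aborted autocommit statements
SHOW GLOBAL VARIABLES LIKE 'wsrep_retry_autocommit';

-- Surface the deadlock error to the application instead of retrying
SET GLOBAL wsrep_retry_autocommit = 0;

-- Restore the default of a single retry
SET GLOBAL wsrep_retry_autocommit = 1;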
Use of MySQL Connector API

Now let's try to execute these workloads through the Connector API. The Connector API uses the binary protocol, which means it directly invokes the PREPARE statement and executes the statement using dedicated command codes (COM_STMT_PREPARE and COM_STMT_EXECUTE).

While the internal abort of the conflicting transaction remains the same, the COM_QUERY command code (text protocol) handling corrects the "query interrupted" error and resets it to a "deadlock" error based on the wsrep_conflict_state value. It also has logic to retry an autocommit-enabled statement.

This error handling and retry logic is not present when the COM_STMT_PREPARE and COM_STMT_EXECUTE command codes are used. This means the user sees a different (non-masked) error when we try the same workload through an application using the MySQL Connector API:

Output of application from node-2:
Statement init OK!
Statement prepare OK!
Statement execution failed: Query execution was interrupted
exact error-code to be specific: ER_QUERY_INTERRUPTED - 1317 - "Query execution was interrupted"

As you can see above, the statement/workload execution fails with “Query execution was interrupted” error vs. “deadlock” error (as seen with the normal MySQL client).

Conclusion

While the core execution remains the same, error handling is different with the text and binary protocols. This issue/limitation was always present, but it has become more prominent since sysbench 1.0 started using prepared statements by default.

What does this mean for the end user/application?

  • Nothing has changed from the Percona XtraDB Cluster perspective, except that if your application plans to use the binary protocol (like the Connector API), then it should be able to handle both errors and retry the failed transaction accordingly.

by Krunal Bauskar at July 05, 2017 06:22 PM

Webinar Thursday July 6, 2017: Security and Encryption in the MySQL World


Join Percona's Solutions Engineer, Dimitri Vanoverbeke, as he presents Security and Encryption in the MySQL World on Thursday, July 6, 2017, at 7:00 am PDT / 10:00 am EDT (UTC-7).

 

MySQL and MariaDB Server provide many new features that help with security and encryption, both of which are extremely important in today’s world. Learn how to use these features, from roles to at-rest-encryption, to increase security. At the end of the webinar, you should understand how to have a securely configured MySQL instance!

Register for the webinar here.

Dimitri Vanoverbeke, Solutions Engineer

At the age of 7, Dimitri received his first computer. Since then, he's been addicted to anything with a digital pulse. Dimitri has been active in IT professionally since 2003, taking on various roles from internal system engineering to consulting. Before joining Percona, Dimitri worked as an open source consultant for a leading open source software consulting firm in Belgium. During his career, Dimitri became familiar with a broad range of open source solutions and with the devops philosophy. When not glued to his computer screen, he enjoys traveling, cultural activities, basketball and the great outdoors.

by Emily Ikuta at July 05, 2017 04:25 PM

July 04, 2017

MariaDB Foundation

MariaDB 10.1.25 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.25, MariaDB Connector/J 2.0.3, and MariaDB Connector/J 1.6.2. See the release notes and changelogs for details. Download MariaDB 10.1.25 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Connector/J 2.0.3 Release Notes Changelog About MariaDB Connector/J Download […]

The post MariaDB 10.1.25 now available appeared first on MariaDB.org.

by MariaDB Foundation at July 04, 2017 09:00 PM

MariaDB AB

MariaDB 10.1.25 now available


The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.25, MariaDB Connector/J 2.0.3, and MariaDB Connector/J 1.6.2. See the release notes and changelogs for details.

Download MariaDB 10.1.25

Release Notes Changelog What is MariaDB 10.1?


Download MariaDB Connector/J 2.0.3

Release Notes Changelog About MariaDB Connector/j


Download MariaDB Connector/J 1.6.2

Release Notes Changelog About MariaDB Connector/j



by dbart at July 04, 2017 03:07 PM

July 03, 2017

Kristian Nielsen

Hacking a box of 240x320 displays with the ESP8266

For the last few days, I have been playing with some small displays we have lying around in Labitat. We have ninety-odd of them from an old donation, and I thought it would be cool to be able to use them for some fun projects.

The displays are ET024002DMU with a built-in ST7787 controller. They really are quite nice displays. 240-by-320 resolution in a 42mm-by-60mm form factor. 18-bit colour and a nice and clear display at a high resolution. The protocol is reasonably well designed and described, and they have some nice extra features that should make them suitable for most tasks, even demanding ones.

The picture to the right shows a display soldered to a simple breakout board and connected to an old STM32F405 (ARM Cortex-M4 board) of mine. This is using the display's 16-bit parallel bus, which allows for really fast update. It should even be possible to connect the display to the memory controller of the STM32 and utilise DMA to update the display without tying up the CPU (though my test just used bit-banging on a GPIO port). Other nice features include the ability to sync with or control the vertical refresh (to avoid visible glitches/tearing during frame updates) and hardware scrolling. The display is spec'ed for 2.8V, which is somewhat inconvenient when most of my stuff is otherwise running 3.3V logic, but so far just running the display at 3.3V seems to work fine.

Driving the display over a 16-bit bus from an ARM MCU is cool for high-end stuff. Since we have so many of these displays lying around, it would also be cool to have something that is really easy, and really cheap, to use. So I extended the TFT_eSPI library for the ESP8266 to support the displays. The ESP8266 is very cheap and conveniently runs 3.3V logic (as opposed to the Arduino's usual 5V). The ESPs have become very popular, and since I had not used them before, this was also a good chance to learn a bit more about them.

The TFT_eSPI library is done by Bodmer, derived from some Adafruit libraries. It uses the serial interface mode of the display. This greatly reduces pin count (the ESP8266 seems quite starved for GPIOs), but also greatly reduces performance, of course. But the TFT_eSPI is well optimised, and should be adequate for most purposes. I think I managed to fully implement the ST7787 controller for the library, all of the demos seem to run well. And the performance should be on par with the other supported display controllers. Here is a video showing the display running some demos from the library:

While implementing the support in the library, I discovered a couple interesting details about the display that I wanted to write up. Generally, these kind of displays all seem to share a protocol, and work much the same way. However, the ST7787 only supports a 3-wire serial mode, while the other controllers supported by the library are using a 4-wire serial mode. The difference is that the 4-wire mode uses a dedicated pin DC to distinguish between command (DC=0) and data (DC=1) bytes. In the 3-wire mode, the DC pin is not available, and the data/command bit is instead included as a 9th bit with every byte. Here are diagrams showing this, taken from the datasheet of another display controller, the ILI9341:

The main work in extending the library to support the ST7787 was to re-write all the data transfers to handle the 3-wire mode, bit-stuffing the D/C bit into the SPI data stream instead of pulsing the D/C GPIO. The bit-stuffing probably introduces some overhead; on the other hand, now multiple commands can be given to the ESP8266's SPI peripheral at once, without the need to manually pulse D/C in-between transfers. So I expect the performance to be roughly equivalent; it will be interesting to run some benchmark comparisons against other displays.

I did discover one notable limitation of the 3-wire mode, though. This is when reading pixel data back out of the display's frame buffer memory. There is a command RAMRD to do this, and in the other modes it allows reading out all pixels in a previously defined read window at once in a single command, or a subset of pixels. But this did not appear to work in the 3-wire mode. It turns out that the display controller takes its data line into high-Z mode as soon as one pixel has been transferred, no matter the size of the defined read window. This means that frame buffer reads will be a lot slower on the ST7787, due to the overhead of reading out pixels one by one (fortunately, framebuffer read is not a common operation).

I am thinking that this limitation comes from not having the D/C pin available to distinguish data from commands and thereby terminating a transfer. Apparently, the read window size is not available to the SPI interface, or forcing the host to transfer the full window size was deemed undesirable. Thus, only single pixel transfers are supported. The data sheet is not very clear on this, to say the least.

A related limitation is seen in the RAMWR command, which writes a sequence of pixels to the display. RAMWR does support writing multiple pixels, which is fortunate, as this is one of the most used and most performance-critical operations. But what I found was that when writing less than the defined window size, some glitches can occur. It appears that the controller's SPI interface buffers up to 4 pixels before writing them to the display memory. When writing fewer than 4 pixels, the pixels would not be written to framebuffer memory immediately. Instead, they would be included in a later write, appearing in the wrong location!

This problem only occurs when writing less than the defined window size. So once understood, it was easy to work around by always setting the right window size before writing pixels - which only affected the line drawing algorithm. Again, this seems to be a limitation of the 3-wire protocol due to missing D/C input. Apparently, it is not able to utilise the D/C bit that is available to the controller in write mode, instead relying on the frame buffer handler to notify when the write window is full. Or maybe it is just a controller bug.

It was btw. interesting to see how similar these different controllers are. This is quite apparent also from similarities between the data sheets. There has clearly been a lot of copy-paste between them. For example, I found this gem in the ST7787 datasheet (in the description of the WRX pin):

-In 4-line serial interface, this pin is used as D/CX (data/ command select)

The ST7787 though does not have any 4-line serial interface (there is no other mention of this mode anywhere in the datasheet). So this is clearly a left-over from a copy-paste from another controller...

Otherwise the implementation of ST7787 support in the library was reasonably straight-forward. The library uses a lot of direct register access to the SPI hardware to optimise the transfer speed, so some care is needed to get the bit-stuffing right in all cases. One thing to be aware of is that the RAMRD command for reading a pixel needs 9 dummy clock pulses between sending the command and reading back the pixel data. A very careful reading of the datasheet might be said to hint at this, though it really is not very clear.

Another thing to be aware of is that the ST7787 uses a single shared data line for its communication; it does not have separate MOSI/MISO lines. This is not a problem in practice, as the controller never transmits and receives data at the same time. So the D0 pin of the ST7787 can simply be connected to both MOSI and MISO on the ESP8266 microcontroller. But the code then needs to temporarily disable the MOSI pin output while reading, to avoid contention on the data line.

So with the library working, it should be easy to control the displays from an ESP8266. The library has a lot of nice functionality, including really good text support with a lot of available fonts. The next step should be to make a couple nice PCBs with connector for the display and for an ESP8266, some power supply, handy pin breakouts, etc. Hopefully some cool projects will come of it in the coming months.

July 03, 2017 07:45 PM

Peter Zaitsev

Webinar Wednesday July 5, 2017: Indexes – What You Need to Know to Get the Most Out of Them


Join Percona's Senior Architect, Matthew Boehm, as he presents Indexes – What You Need to Know to Get the Most Out of Them on Wednesday, July 5, 2017 at 8:00 am PDT / 11:00 am EDT (UTC-7).

Proper indexing is key to database performance. Find out how MySQL uses indexes for query execution, and then how to come up with an optimal index strategy. In this session, you'll also learn how to know when you need an index, and how to get rid of indexes that you don't need, in order to speed up queries.
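As a small, hedged preview of that last point, the following sketch assumes MySQL 5.7 with performance_schema and the sys schema available (t1 and its column c are hypothetical names):

-- Will this query use an index? Check the key and rows columns of the plan
EXPLAIN SELECT * FROM t1 WHERE c = 42;

-- Candidate indexes that have not been used since the server started
SELECT * FROM sys.schema_unused_indexes;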

Register for the webinar here.

Matthew Boehm, Architect

Matthew joined Percona in the fall of 2012 as a MySQL consultant. He quickly rose to Architect and became one of Percona’s training instructors. His areas of knowledge include the traditional Linux/Apache/MySQL/PHP stack, memcached, MySQL Cluster (NDB), massive sharding topologies, PHP development and a bit of MySQL-C-API development.

by Emily Ikuta at July 03, 2017 05:00 PM

July 02, 2017

Kristian Nielsen

FVWM and Java application window focus

I found the solution to a problem that has been annoying me for quite some time. I am using FVWM as my window manager. A few programs written in Java, most notably the Arduino IDE, will not accept keyboard focus. As soon as the window is given focus using normal window-manager commands, it immediately loses the focus again, making it impossible to, e.g., edit code or use keyboard shortcuts. I could usually get it to focus by randomly clicking a few menus or some such, but it was hardly usable.

Turns out that FVWM has a Style option to handle this: Lenience. From the FVWM manual:

The ICCCM states that windows possessing the property WM_HINTS(WM_HINTS): Client accepts input or input focus: False should not be given the keyboard input focus by the window manager... A number of applications set this property, and yet expect the window manager to give them the keyboard focus anyway, so fvwm provides a window style, Lenience, which allows fvwm to overlook this ICCCM rule.

And indeed, the problem seems solved after I put the following in my FVWM configuration:

  Style "*Arduino*" Lenience

The Arduino IDE window now keeps focus when switching between virtual desktops, using window selection hotkeys, etc. It still seems to require explicitly clicking the window with the mouse to actually accept keyboard events, which is annoying, but at least it is no longer impossible to type into the window. Not sure if this is a fundamental "feature" of Arduino, or if there is a way to fix this also. I rarely use the Arduino IDE anyway.

From the FVWM documentation it would appear that this is a problem with the Arduino code (or the underlying Java window toolkit, I remember seeing similar problems in other Java applications). Not sure why it has never been fixed though - or if it is actually FVWM that is mis-interpreting or being overly zealous with its adherence to ICCCM compliance.

July 02, 2017 08:44 AM

June 30, 2017

Peter Zaitsev

Industry Recognition of Percona: Great Products, Great Team


Percona’s focus on customer success pushes us to deliver the very best products, services and support. Three recent industry awards reflect our continued success in achieving these goals.

We recently garnered a Bronze Stevie Award in the category "Company of the Year – Computer Software – Medium." Stevies are some of the most coveted awards in the tech industry. Each year, more than 10,000 organizations in more than 60 nations compete for Stevies. It's a huge honor to come away with a win.

Percona was also included in Database Trends and Applications' (DBTA) annual DBTA 100 list of the companies that matter most in data. The directory was compiled by the editorial staff of DBTA and spotlights the companies that deal with evolving market demands through innovation in software, services, and hardware.

Zippia, a new website dedicated to helping recent grads with their career choices, ranked Percona in the top 10 of the "20 Best Companies To Work For In Raleigh, NC." To compile the ranking, Zippia chose companies in a 25-mile radius that consistently get high reviews from career resource websites.

Finally, Percona co-founder and CEO Peter Zaitsev was honored as one of the Top Rated CEOs in Raleigh, North Carolina, by Owler. Owler looked at 167,000 leaders across 50 cities and 25 industries worldwide and identified the top 1,000. This is a huge honor for Peter!

Driven by Peter’s vision, all of us at Percona remain committed to building an excellent organization that is the champion of unbiased open source database solutions. Our continued success, growth and recognition from organizations like the above lead us to believe that the best is yet to come. Stay tuned!

by Brigit Valencia at June 30, 2017 10:07 PM

Oli Sennhauser

Storing BLOBs in the database

We sometimes have discussions with our customers about whether or not to store LOBs (Large Objects) in the database. To avoid rephrasing the arguments again and again, I have summarized them in the following lines.

The following items are more or less valid for all large data types (BLOB, TEXT and theoretically also for JSON and GIS columns) stored in a MySQL or MariaDB (or any other relational) database.

The idea of a relational, table-based data store is to store structured data (numbers, dates and short character strings) and to have quick write and read access to it.

And yes, you can also store other things like videos, huge texts (PDFs, emails) or similar in an RDBMS, but it is principally not designed for such a job and is thus not optimal for the task. Software vendors implement such features not mainly because they make sense, but because users want them and the vendors want to attract users (or their managers) with such features (USP, Unique Selling Proposition). Here is also one of my mantras: use the right tool for the right task:

[Image: the right tool for the right task]

The main topics to discuss related to LOBs are: operations, performance, economic reasons and technical limitations.

Disadvantages of storing LOBs in the database

  • The database will grow fast. Operations will become more costly and complicated.
  • Backup and restore will become more costly and complicated for the admin because of the increased size caused by LOBs.
  • Backup and restore will take longer because of the same reason.
  • Database and table management functions (OPTIMIZE, ALTER, etc.) will take longer on big LOB tables.
  • Smaller databases need less RAM/disk space and are thus cheaper.
  • Smaller databases fit better into your RAM and are thus potentially faster (RAM vs disk access).
  • RDBMS are a relatively slow technology (compared to others). Reading LOBs from the database is significantly slower than reading LOBs from a filer for example.
  • LOBs stored in the database will spoil your database cache (InnoDB Buffer Pool) and thus possibly slow down other queries (does not necessarily happen with more sophisticated RBDMS).
  • In MySQL/MariaDB, LOB size is limited to about 1 GB in practice (max_allowed_packet); the theoretical limit is 4 GB.
  • Expensive, fast database store (RAID-10, SSD) is wasted for something which can be stored better on a cheap slow file store (RAID-5, HDD).
  • It is programmatically often more complicated to get LOBs from a database than from a filer (depends on your libraries).

Advantages of storing LOBs in the database

  • Atomicity between data and LOB is guaranteed by transactions (is it really in MySQL/MariaDB?).
  • There are no dangling links (reference from data to LOB) between data and LOB.
  • Data and LOB are from the same point in time and can be included in the same backup.
  • Data and LOB can be transferred simultaneously to other machines, by database replication or dump/restore.
  • Applications can use the same mechanism to get the data and the LOB. Remote access needs no separate file transfer for the LOB.
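To make the trade-off concrete, here is a minimal sketch of the two approaches (the table and column names are made up for illustration): in the first variant the LOB lives inside the database and benefits from the points above; in the second, only a reference is stored and the object itself lives on a filer or object store:

-- Variant 1: LOB stored in the database
CREATE TABLE document_in_db (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name    VARCHAR(255) NOT NULL,
  content LONGBLOB NOT NULL
) ENGINE=InnoDB;

-- Variant 2: only a reference stored, the object lives outside the database
CREATE TABLE document_on_filer (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  path VARCHAR(1024) NOT NULL  -- e.g. NFS path or object-store URL
) ENGINE=InnoDB;

With the second variant the application must fetch the object from the filer in a separate step, and it becomes responsible for keeping the path and the object consistent.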

Conclusion

So basically you have to balance the advantages vs. the disadvantages of storing LOBs in the database and decide which arguments are more important in your case.

If you have some more good arguments pro or contra storing LOBs in the database please let me know.

Literature

Check also various articles on Google.


by Shinguz at June 30, 2017 12:18 PM

June 29, 2017

Peter Zaitsev

How Does Percona Software Compare: the Value of Percona Software and Services


In this blog post, I’ll discuss my experience as a Solutions Engineer in explaining the value of Percona software and services to customers.

The inspiration for this blog post was a recent conversation with a prospective client. They were exploring open source software solutions and professional services for the first time. This is a fairly common and recurring conversation with individuals and enterprises exploring open source for the first time. They generally find Percona through recommendations from colleagues, or organic Google searches. One of the most common questions Percona gets when engaging at this level is, “How does Percona’s software compare with << Insert Closed-source Enterprise DB Here >>?”

We’re glad you asked. Percona’s position on this question is to explain the value of Percona in the open source space. Percona does not push a particular flavor of open source software. We recommend an appropriate solution given each of our client’s unique business and application requirements. A client coming from this space is likely to have a lengthy list of compliance, performance and scalability requirements. We love these conversations. Our focus is always on providing our customers with the best services, support, and open-source technology solutions possible.

If there were a silver bullet for every database, Percona would not have been as successful over the past 10 (almost 11!) years. We know that no single one of MongoDB, MariaDB, MySQL Enterprise, MySQL Community, Percona Server for MySQL, etc., is the right answer 100% of the time. Our competitors in this space will most likely disagree with this assessment – because they have an incentive to sell their own software. The quality of all of these offerings is extremely high, and our customers use each of them to great success. Using the wrong database software for a particular use-case is like using a butter knife to cut a steak. You can probably make it work, but it will be a frustrating and unrewarding experience.

Usually, there are more efficient alternatives.

There is a better question to ask instead of software vs. software, which basically turns into an apples vs. oranges conversation. It’s: “Why should we partner with Percona as our business adopts more open-source software?” This is where Percona’s experience and expertise in the open-source database space shine.

Percona is invested in ensuring you are making the right decisions based on your needs. Choosing the wrong database solution or architecture can be catastrophic. A recent Ponemon study (https://planetaklimata.com.ua/instr/Liebert_Hiross/Cost_of_Data_Center_Outages_2016_Eng.pdf) estimates the average cost of downtime can be up to $8,800 per minute. Not only is downtime costly, but re-architecting your environment due to an errant choice in database solutions can be costly as well. Uber experienced these pains when using Postgres as their database solution and wrote an interesting blog post about their decision to switch back to MySQL (https://eng.uber.com/mysql-migration/). Postgres has, in turn, responded to Uber recently (http://thebuild.com/presentations/uber-perconalive-2017.pdf). The point of bringing up this high-profile switch is not to take sides, but to highlight the breadth of knowledge necessary to navigate the open source database environment. Given its complexity, the necessity of a trusted business partner with unbiased recommendations should be apparent.

Percona’s only investors are its customers, and our only customers are those using our services. Because of this, our core values are intertwined with customer success and satisfaction. This strategy trickles down through support to professional services and all the way to the engineering team, where efforts are focused on developing software and features that provide the most value to customers as a whole.

Focusing on a business partner in the open source space is a much more important and worthwhile effort than focusing on a software provider. Percona has hands-on, practical experience with a varied array of open-source technologies in the MySQL, MariaDB, and MongoDB ecosystems. We’ve tracked how much we improve the efficiency of our customers’ databases, and we see a 30% efficiency improvement on average, with some cases seeing a 10x boost in database performance (https://www.percona.com/blog/2016/07/20/value-database-support/). Open source software has proven itself time and again to be the choice of billion-dollar industries. Again, there is not one universal choice among companies. There are success stories that run the gamut of open source software solutions.

To summarize, there’s a reason we redirect the software vs. software question. It’s not because we don’t like talking about how great our software is (it is great, and holds its own with other open source offerings). It’s because the question does not highlight Percona’s value. Percona doesn’t profit from its software. Supporting the open-source ecosystem is our goal. We are heavily invested in providing our customers with the right open source software solution for their situation, regardless of whether it is ours or not. Our experience and expertise in this space make us the ideal partner for businesses adopting more open source technology in place of enterprise-licensed models.

by Barrett Chambers at June 29, 2017 10:22 PM

Shlomi Noach

What's so complicated about a master failover?

The more I work on orchestrator, and the more user input and production experience I get, the more insights I gain into MySQL master recoveries. I'd like to share the complexities in correctly running general-purpose master failovers; from picking the right candidates to finalizing the promotion.

The TL;DR is: we're often unaware of just how things can turn at the time of failover, and the impact of every single decision we make. Different environments have different requirements, and different users wish to have different policies. Understanding the scenarios can help you make the right choice.

The scenarios and considerations below are ones I picked while browsing through the orchestrator code and through Issues and questions. There are more. There are always more scenarios.

I discuss "normal replication" scenarios below; some of these also apply to synchronous replication setups (Galera, XtraDB Cluster, InnoDB Cluster) when they span DCs, use intermediate masters, or operate in an evolving environment.

orchestrator-wise, please refer to "MySQL High Availability tools" followup, the missing piece: orchestrator, an earlier post. Some notions from that post are re-iterated here.

Who to promote?

Largely covered by the missing piece post (skip this section if you've read said post), consider the following:

  • You run with a mixed versions setup. Your master is 5.6, most of your replicas are 5.6, but you've upgraded a couple of replicas to 5.7. You must not promote those 5.7 servers since you cannot replicate 5.7->5.6. You may lose these servers upon failover.
  • But perhaps by now you've upgraded most of your replicas to 5.7, in which case you prefer to promote a 5.7 server in the event the alternative is losing your 5.7 fleet.
  • You run with both STATEMENT based replication and ROW based. You must not promote a ROW based replica because a STATEMENT based server cannot replicate from it. You may lose ROW servers during the failover.
  • But perhaps by now you've upgraded most of your replicas to ROW, in which case you prefer to promote a ROW server in the event the alternative is losing your ROW fleet.
  • Some servers can't be promoted because they don't use binary logging or log_slave_updates. They could be lost in action.

Noteworthy that MHA solves the above by syncing relay logs across the replicas. I had an attempt at doing the same for orchestrator but was unsatisfied with the results and am wary of hidden assumptions. I do not expect to continue working on that.
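To make these constraints concrete, a minimal check one could run on each candidate replica before considering it for promotion might look like the following (assuming direct SQL access to the replicas; orchestrator gathers the same information for you):

-- Run on each promotion candidate. A server is only promotable if it has
-- binary logging and log_slave_updates enabled, and its version and
-- binlog format are compatible with the replicas that will have to follow it.
SELECT @@version           AS version,
       @@binlog_format     AS binlog_format,
       @@log_bin           AS log_bin,
       @@log_slave_updates AS log_slave_updates;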

Intermediate masters

Recovery of intermediate masters, while simpler, also adds many more questions to the table.

  • An intermediate master crashes. What is the correct handling? Should you move all of its orphaned replicas under another server? Or does this group of replicas form some pact that must stick together? Perhaps you insist on promoting one of them on top of its siblings.
  • On failure, you are likely to prefer a promoted server from the same DC; this will have the least impact on your application.
    • But are you willing to lose a server or two to make that so? Or do you prefer switching to a different DC and not lose any server in the process?
    • A specific, large orchestrator user actually wants to fail over to a different DC. Not only that, the customer then prefers flipping all other cluster masters to the other DC (a full-DC failover).
  • An intermediate master had replication filters (scenario: you were working to extract and refactor a subtree onto its own cluster, but the IM crashed before you did so)
    • What do you do? Did you have all subsequent replicas run with same filters?
    • If not, do you have the playbook to do so at time of failure?
  • An intermediate master was writable. It crashed. What do you do? Who is a legitimate replacement? How can you even reconnect the subtree with the main tree?

Candidates

  • Do you prefer some servers over others? Some servers have stronger hardware; you'd like them to be promoted if possible
    • orchestrator can juggle with that to some extent.
  • Are there servers you never want to promote? Servers used by developers; used for backups with open logical volumes; weaker hardware; a DC you don't want to failover to; ...
    • But then again, maybe it's fine if those servers act as intermediate masters? So that they must not be promoted as masters, but are good to participate in an intermediate master failover?
  • How do you even define the promotion types for those servers?
    • We strongly prefer to do this live. Service discovery dictates the type of "promotion rule"; a recurring cronjob keeps updating orchestrator with the server's choice of promotion rule.
    • We strongly discourage configuration based rules, unless for servers which are obviously-never-promote.

What to respond to?

What is a scenario that kicks a failover?

Relate to the missing piece post for the holistic approach orchestrator takes to make a reliable detection. But regardless, do you want to take action where:

  • The master is completely dead and everyone sees that and agrees? (resounding yes)
  • The master is dead to the application but replication seems to be working? (master is at deadlock, but replication threads seem to be happy)
  • The master is half-dead to the application? (no new connections; old connections including replication connections keep on running!)
  • A DC is network partitioned, the master is alive with some replicas in that DC; but the majority of the replicas are in other DCs, unable to see the master?
    • Is this a question of majority? Of DC preference? Is there at all an answer?

Upon promotion

Most people expect orchestrator to RESET SLAVE ALL and SET read_only=0 upon promotion. This is possible, but the default is not to do so. Why?

  • What if your promoted server still has unapplied relay logs? This can happen in the event all replicas were lagging at the time of master failure. Do you prefer:
    • To promote, RESET SLAVE ALL and lose all those relay logs? You gain availability at the expense of losing data.
    • To wait till SQL_THREAD has consumed the logs? You keep your data at the expense of availability.
    • To abort? You let a human handle this; this is likely to take more time.
  • What do you do with delayed/slow replicas? It could take a while to connect them back to the promoted master. Do you prefer:
    • Waiting for them to connect; delay promotion
    • Advertise new master, then asynchronously work to connect them: you may have improved availability, but at reduced capacity, which is an availability issue in itself.
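For reference, here is a minimal sketch of what the RESET SLAVE ALL / read_only path amounts to when performed by hand, assuming the old master is confirmed dead and a candidate has been chosen:

-- On the promoted candidate:
SHOW SLAVE STATUS\G
-- Wait until Exec_Master_Log_Pos = Read_Master_Log_Pos and
-- Relay_Master_Log_File = Master_Log_File, i.e. all fetched relay log
-- events have been applied (or accept losing them).
STOP SLAVE;
RESET SLAVE ALL;           -- forget the old master
SET GLOBAL read_only = 0;  -- start accepting writes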

Flapping

You wish to avoid flapping. A scenario could be that you're placing such load on your master that it crashes; the next server to promote as master will have the same load, will similarly crash. You do not wish to exhaust your fleet.

What makes a reasonable anti-flapping rule? Options:

  • Upon any failure in a cluster, block any other failover on that cluster for X minutes
    • What happens if a major power down issue requires two or three failovers on the same cluster?
  • Only block further master failovers, but allow intermediate master failovers as much as needed
    • There could be intermediate master exhaustion, as well
  • Only allow one of each (master and intermediate master), then block for X minutes?
  • Allow a burst of failovers for a duration of N seconds, then block for X minutes?

Detection spam

You don't always take action on failure detection. Maybe you're blocked via anti-flapping on an earlier failure; or have configured to not automatically failover.

Detection is the basis for failover, but is independent of it. You should have detection in place even if not failing over.

You wish to run detection continuously.

  • So if failover does not take place, detection will keep noticing the same problem again and again.
    • You get spammed by alerts
  • Only detect once?
    • Been there. When you really need that detection to alert you find out it alerted once 6 hours ago and you ignored it because it was not actionable at the time.
  • Only detect a change in diagnosis?
    • Been there. Diagnosis itself can flap back and forth. You get noise.
  • Block detection for X minutes?
    • What is a good tradeoff between noise and visibility?

Back to life

An old sub-tree comes back to life. Scenario: DC power failure.

This subtree claims "I'm cluster X". What happens?

  • Your infrastructure needs to have memory and understanding that there's already a running cluster called X.
  • Any failures within that subtree must not be interpreted as a "cluster X failure", or else you kick a cluster X failover when there is, in fact, a running cluster X. See this PR and related links. At this time orchestrator handles this scenario correctly.
  • When do you consider a previously-dead server to be alive and well?
    • I do mean, automatically and reliably so? See same PR for orchestrator's take.

User control

Failover tooling must always let a human decide something is broken. It must always allow for an urgent failover, even if nothing seems to be wrong to the system.

Isn't this just so broken? Isn't synchronous replication the answer?

The world is broken, and distributed systems are hard.

Synchronous replication is an answer, and solves many (I think) of the above issues, creating its own issues, but I'm not an expert on that.

However, it is noteworthy that when some people think about synchronous replication, they forget about cross-DC replication and cross-DC failovers, about upgrades and experiments. The moment you put intermediate masters at play, you're almost back to square one, with many of the above questions again applicable to your use case.

by shlomi at June 29, 2017 03:01 PM

June 28, 2017

Peter Zaitsev

MySQL Encryption at Rest – Part 2 (InnoDB)


Welcome to Part 2 in a series of blog posts on MySQL encryption at rest. This post covers InnoDB tablespace encryption.

At Percona, we work with a number of clients that require strong security measures for PCI, HIPAA and PHI compliance, where data managed by MySQL needs to be encrypted “at rest.” As with all things open source, there are several options for meeting the MySQL encryption at rest requirement. In this three-part series, we cover several popular options for encrypting data and present the pros and cons of each solution. You may want to evaluate which parts of these tutorials work best for your situation before using them in production.

Part one of this series covered implementing disk-level encryption using crypt+LUKS.

In this post, we discuss InnoDB tablespace encryption (TE). You can choose to use just LUKS, just InnoDB TE, or you can use both. It depends on your compliance needs and your configuration. For example, say you put your binary logs and InnoDB redo logs on partition X, and your InnoDB tablespaces (i.e., your .ibd files) on partition Y. InnoDB TE only handles encrypting the tablespaces. You would need to use LUKS on partition X to encrypt those files.

InnoDB Tablespace Encryption Tutorial

The manual describes InnoDB encryption as follows:

InnoDB tablespace encryption uses a two-tier encryption key architecture, consisting of a master encryption key and tablespace keys. When an InnoDB table is encrypted, a tablespace key is encrypted and stored in the tablespace header. When an application or authenticated user wants to access encrypted tablespace data, InnoDB uses a master encryption key to decrypt the tablespace key. The decrypted version of a tablespace key never changes, but you can change the master encryption key as required. This action is referred to as master key rotation.

As a real-world example, picture a high-school janitor’s key-chain. Each key is different; each key works for only one door (or in MySQL’s case, table). The keys themselves never change, but the label on each key can change based on how the janitor wants to note which key works on which door. At any time, the janitor can re-label the keys with special names, thwarting any hooligans from opening doors they shouldn’t be opening.

Configuration Changes

As noted above, InnoDB uses a keyring to manage encryption keys on a per-table basis. But you must load the keyring plugin before InnoDB. Add the following lines to your my.cnf under the [mysqld] section. You may substitute a different path for the keyring file, but be sure that mysqld has permissions to this path:

early-plugin-load = keyring_file.so
keyring_file_data = /var/lib/mysql-keyring/keyring

You do not need to restart MySQL at this time, but you can if you wish. These settings are for when you restart MySQL in the future.

Install the Keyring Plugin in MySQL

If you don’t wish to restart MySQL at this time, you can still continue. Run the following SQL statements within MySQL to load the plugin and configure the keyring file:

mysql> INSTALL PLUGIN keyring_file SONAME 'keyring_file.so';
mysql> SET GLOBAL keyring_file_data = '/var/lib/mysql-keyring/keyring';

You can now CREATE encrypted tables or ALTER existing tables to encrypt them with a simple command:

mysql> CREATE TABLE t1 (c1 INT) ENCRYPTION='Y';
Query OK, 0 rows affected (0.04 sec)
mysql> ALTER TABLE users ENCRYPTION='Y';
Query OK, 1 row affected (0.04 sec)

Be careful with the above ALTER: It will most likely block all other transactions during this conversion.
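To verify which tables ended up encrypted, you can query INFORMATION_SCHEMA: tables created or altered with an explicit ENCRYPTION clause show it in CREATE_OPTIONS:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME, CREATE_OPTIONS
    ->   FROM INFORMATION_SCHEMA.TABLES
    ->  WHERE CREATE_OPTIONS LIKE '%ENCRYPTION%';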

Encrypting Tables Using pt-online-schema-change

Using pt-online-schema-change is fairly straightforward. While it has many options, the most basic usage to encrypt a single table is the following:

pt-online-schema-change --execute --alter "engine=innodb encryption='Y'" D=mydb,t=mytab

pt-online-schema-change creates an encrypted, shadow copy of the table mytab. Once the table is populated with the original data, it drop-swaps the table and replaces the original table with the new one.

It is always recommended to first run pt-online-schema-change with the --dry-run flag, in order to validate the command and the change.

Encrypting All Tables in a Database

To encrypt all the tables in a database, we can simply take the above command and wrap it in a shell loop. The loop is fed a list of tables in the database from MySQL. You just need to change the dbname variable:

# dbname=mydb
# mysql -B -N -e "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='$dbname' and TABLE_TYPE='BASE TABLE'" |
while read line; do pt-online-schema-change --execute --alter "engine=innodb encryption='Y'" D=$dbname,t=$line; done

Be careful with the above command, as it will take off and blast through all the changes!

Rotating InnoDB Encryption Keys

Your compliance policy dictates how often you need to rotate the encryption keys within InnoDB:

mysql> ALTER INSTANCE ROTATE INNODB MASTER KEY;

This command creates a new key, decrypts all encrypted tablespace headers (where the per-table key is stored) using the previous key, and then re-encrypts the headers with the new key. Notice that the table itself is not re-encrypted. That would be an extremely intrusive and potentially dangerous operation. Recall above that each table's encryption key never changes (unless the entire table gets decrypted). What can change is the key used to encrypt that key within each table's header.

The keyring UDFs in MySQL

The keyring plugin can also be used to store sensitive data within MySQL for later retrieval. This functionality is beyond the scope of this blog as it doesn’t deal directly with InnoDB TE. You can read more about it in the MySQL manual.

Conclusion

InnoDB Tablespace Encryption makes it easy to use MySQL encryption at rest. Keep in mind the downsides of using only TE: InnoDB tablespace encryption doesn’t cover undo logs, redo logs or the main ibdata1 tablespace. Additionally, InnoDB encryption doesn’t cover binary logs or slow query logs.

by Manjot Singh at June 28, 2017 08:51 PM

Jean-Jerome Schmidt

What's New with ProxySQL in ClusterControl v1.4.2

ClusterControl v1.4.2 comes with a number of improvements around ProxySQL. The previous version, also known as “The ProxySQL Edition”, introduced a full integration of ProxySQL with MySQL Replication and Galera Cluster. With ClusterControl v1.4.2, we introduced some amazing features to help you run ProxySQL at scale in production. These features include running ProxySQL in high availability mode, keeping multiple instances in sync with each other, managing your existing ProxySQL instances and caching queries in one click.

High Availability with Keepalived

ProxySQL can be deployed as a distributed or centralized MySQL load balancer. Setting up a centralized ProxySQL usually requires two or more instances, coupled with a virtual IP address that provides a single endpoint for your database applications. This is a proven approach that we have been using with HAProxy. By having a primary ProxySQL instance and a backup instance, your load balancer is not a single point of failure. The following diagram shows an example architecture of centralized ProxySQL instances with Keepalived:

ClusterControl allows more than two ProxySQL instances to share the same virtual IP address. From the ClusterControl web interface, you can manage them from the Keepalived deployment wizard:

Note that Keepalived is automatically configured with a single-master-multiple-backups approach, where only one ProxySQL instance (assuming one instance per host) will hold the virtual IP address at a time. The rest will act as a hot-standby proxy, unless you explicitly connect to them via the instance IP address. Your application is simplified where you only need to connect to a single virtual IP address and the proxy will take care of load balancing connections on one of the available database nodes.

Add Existing ProxySQL

If you already installed ProxySQL manually, and want to manage it using ClusterControl, use the ‘Import ProxySQL’ feature to add it into ClusterControl. Before importing the instance, ensure ClusterControl is able to perform passwordless SSH to the target host. Once done, simply provide the ProxySQL host, the listening port (default to 6033) and ProxySQL administrator user credentials:

ClusterControl will then connect to the target host via SSH and perform some sanity checks before connecting to ProxySQL admin interface via local socket. You will then see it listed in the side menu of the Nodes tab, under ProxySQL section. That’s it, simple and straightforward. By ticking the ‘Import Configuration’, you can also choose to update the ProxySQL node with the configuration from an existing ProxySQL node.

Sync ProxySQL Instances

It is extremely common to deploy multiple instances of ProxySQL. One would deploy at least one primary and one standby for high availability purposes. Or the architecture might mandate multiple instances, for example one proxy instance per web server. Managing multiple ProxySQL instances can be a hassle though, as they need to be kept in sync - things like configurations, hostgroup definitions, query rules, backend/frontend users, global variables, and so on.

Consider the following architecture:

When you have a distributed ProxySQL deployment like the above, you need a way to synchronize configurations. Ideally, you would want to do this automatically and not rely on manually changing the configurations on each instance. ClusterControl v1.4.2 allows you to synchronize ProxySQL configurations between instances, or simply export and import the configuration from one instance to another.

When changing the configuration on a ProxySQL instance, you can use “Node Actions” -> “Synchronize Instances” feature to apply the configuration to another instance. The following configuration will be synced over:

  • Query rules
  • Hostgroups and servers
  • Users (backend and frontend)
  • Global variables
  • Scheduler
  • The content of ProxySQL configuration file (proxysql.cnf)

Take note that by clicking on “Synchronize”, the existing configuration on the target instance will be overwritten. There is a warning for that. A new job will be initiated by ClusterControl, and all changes will be flushed to disk on the target node to make it persistent across restarts.

Simplified Query Cache

ProxySQL allows you to cache specific read queries, as well as specify how long they should be cached. Result sets are cached in native MySQL packet format. This offloads the database servers and gives applications faster access to cached data. It also reduces the need for a separate caching layer (e.g., Redis or Memcached).

You can now cache any query digested by ProxySQL with a single click. Simply roll over the query that you would like to cache and click “Cache Query”:

ClusterControl will provide a shortcut to add a new query rule to cache the selected query:

Define the Rule ID (queries are processed based on this ordering), the destination hostgroup and the TTL value in milliseconds. Click “Add Rule” to start caching the query. You can then verify from ProxySQL’s Rules page whether the current incoming query matches the rule by looking at the “Hits” column:
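Under the hood, this is roughly equivalent to inserting a caching rule into ProxySQL's mysql_query_rules table yourself. A hand-rolled sketch on the ProxySQL admin interface (default port 6032) could look like the following; the rule_id, digest value and hostgroup are placeholders for illustration, not values ClusterControl would necessarily generate:

-- Connect to the ProxySQL admin interface, e.g. mysql -u admin -padmin -h 127.0.0.1 -P 6032
INSERT INTO mysql_query_rules (rule_id, active, digest, cache_ttl, destination_hostgroup, apply)
VALUES (100, 1, '0x1E6377E05F004D6D', 60000, 10, 1);  -- cache_ttl is in milliseconds
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;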

That’s it for now. You are welcome to install ClusterControl and try out the ProxySQL features. Happy proxying!

by ashraf at June 28, 2017 09:59 AM

MariaDB AB

Testing MariaDB ColumnStore on Windows using Hyper-V


Hyper-V is the native virtualization technology contained within Windows 10 Professional (and Windows Server & Windows 8 Pro). Using Hyper-V to create one or more virtual machines running Linux, it is possible to run and test multiple types of MariaDB ColumnStore deployment on a single Windows machine.

Networking

Hyper-V provides very flexible and powerful software-defined networking. However, some of this power is not available within the graphical Hyper-V Manager application and requires the use of PowerShell commands. The following procedure will enable a functional network that offers each Linux VM:

  • A Static IP address.

  • Ability to reach external resources such as package repositories.

  • Optionally, mapping a network port on the Windows host to a port on a VM instance.

Start an instance of PowerShell with administrator privileges (right click, run as administrator) and run the following three commands:

New-VMSwitch -SwitchName "NATSwitch" -SwitchType Internal
New-NetIPAddress -IPAddress 172.21.21.1 -PrefixLength 24 -InterfaceAlias "vEthernet (NATSwitch)"
New-NetNat -Name MyNATnetwork -InternalIPInterfaceAddressPrefix 172.21.21.0/24

The commands perform the following:

  1. Creates a new internal virtual switch named 'NATSwitch'

  2. Configures a gateway address of 172.21.21.1

  3. Configures a Network Address Translation (NAT) rule to allow the VM's running on the 172.21.21 subnet to communicate outbound to the Windows host network (and thus the internet).
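If you want to confirm that the pieces are in place, the corresponding Get- cmdlets can be run from the same elevated PowerShell session (the names assume the values used above):

Get-VMSwitch -Name "NATSwitch"
Get-NetIPAddress -InterfaceAlias "vEthernet (NATSwitch)"
Get-NetNat -Name "MyNATnetwork"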

Virtual Machine Setup

Once the virtual switch network has been created, create as many VM instances as you need. For running ColumnStore I recommend creating these with:

  • 4 GB memory (2 GB is the recommended lower limit per ColumnStore host if you have less physical memory).

  • 40GB disk (more or less depending on how much data you will use for testing, 10GB could be good for very small test data sets).

  • As many vCPU's as you have CPU cores.

  • Select 'NATSwitch' as the network virtual switch.

  • A non graphical Linux ISO install image, for example CentOS 7 Minimal.

  • Disable the Secure Boot checkbox in Security settings, as this is generally not compatible with downloaded Linux ISO images.

Start and connect to the VM from the Hyper-V application to install the Linux distribution on the VM. During the installer setup, or post-installation, configure a static IP for the host on the 172.21.21 subnet. To configure this post-installation, edit /etc/sysconfig/network-scripts/ifcfg-eth0 as root to be something like:

TYPE="Ethernet"
BOOTPROTO="static"
UUID="be5574cf-1728-4532-8495-727e1eb157b3"
DEVICE="eth0"
ONBOOT="yes"
IPADDR=172.21.21.2
NETMASK=255.255.255.0
GATEWAY=172.21.21.1
DNS1=8.8.8.8

Notes:

  • IPADDR is 172.21.21.2, this should be unique within the 172.21.21 subnet for each new virtual machine.

  • GATEWAY is set to 172.21.21.1 for all hosts.

  • DNS1 is set to 8.8.8.8 to use Google DNS (this may be changed to your preferred DNS host).
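After saving the file, restart networking on CentOS 7 so the static address takes effect, and optionally verify that outbound traffic flows through the NAT:

systemctl restart network
ping -c 1 8.8.8.8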

In addition, you can configure the /etc/hosts file on each Linux VM with hostnames to simplify access between servers, for example:

127.0.0.1   localhost localhost.localdomain
::1         localhost localhost.localdomain
172.21.21.2 centos1
172.21.21.3 centos2 
172.21.21.4 centos3

On the Windows host, the same can be done to ease access by opening an editor with administrator privileges and editing the file C:\Windows\System32\drivers\etc\hosts:

172.21.21.2 centos1
172.21.21.3 centos2
172.21.21.4 centos3

Using the Linux VMs

You can access each VM through the Hyper-V Manager; however, cut and paste doesn't work well between Windows and Linux VMs, so I recommend using the SSH client PuTTY in conjunction with MTPuTTY, which provides a multi-tab view on top of PuTTY.

Sometimes SSH logins to the VMs can be slow; the following edits to /etc/ssh/sshd_config will resolve this:

GSSAPIAuthentication no
UseDNS no

After this restart sshd, e.g. systemctl restart sshd on CentOS 7.

Optionally, an additional Hyper-V NAT rule can be configured to enable port forwarding from the Windows host to a given virtual machine port. Assuming 172.21.21.2 is a VM running a ColumnStore UM module, the following PowerShell command will allow SQL clients to connect to the Windows host IP address to issue queries to the ColumnStore instance running on that VM:

Add-NetNatStaticMapping -NatName "MyNATnetwork" -Protocol TCP -ExternalIPAddress 0.0.0.0 -InternalIPAddress 172.21.21.2 -InternalPort 3306 -ExternalPort 3306

Credit goes to Thomas Maurer's blog for the Hyper-V scripts.

Other Options

This deployment setup enables local development testing of a multi-node cluster of MariaDB ColumnStore; however, other options exist:

  1. Utilize another Virtualization technology such as VirtualBox. This will also allow you to run multiple VM's.

  2. Use Docker for Windows (Windows 10 Pro) or Docker Toolkit (other Windows versions) and utilize our reference Dockerfile for single server deployment. Some changes are being worked on in our upcoming 1.1 release with a goal of making it easier to support containers for multi server deployment.

  3. Use Bash for Windows (requires Windows 10 Creator's Edition) and follow the Ubuntu 16 installation instructions. This is limited to single server and a bash prompt must be running for Linux processes to remain running.

If you have a Mac, options 1 and 2 are the best approach.



by david_thompson_g at June 28, 2017 12:22 AM

June 27, 2017

MariaDB AB

Poorly Tuned Databases Exact a Heavy Toll: 3 MariaDB Tuning Tips


Optimizing your database performance is important, but it often falls by the wayside in the bustle of daily business. How big a deal is it?

Let’s look at potential consequences of a poorly tuned database:

Bad user experience: Obviously you never want customers to complain of bad performance from your application. That’s the first step toward…

Lost sales: Customers may walk away when your application is just too slow. But it gets worse. You’re more likely to run into downtime when your database is overloaded, and if the server that’s down is an ordering or credit card–processing system, you’re guaranteed to forfeit sales. That money talk leads us to…

Unnecessary spending: You can spend more money on hardware (RAM, etc.) to speed up your system, or you can simply set up your database to run efficiently.

You can avoid these unwanted outcomes! Here’s a quick look at three common issues, with easy-to-address tuning solutions.

Problem: Poor overall database performance

Tuning fix: Control your swappiness!

Swapping is the exchange of data between real memory (RAM) and virtual memory (disk). MariaDB's algorithms assume their memory is in RAM, not swap, and they become highly inefficient if it is swapped out. The swappiness value ranges from 0 to 100; we recommend setting swappiness to 1 for MariaDB databases. Some sources advise setting it to 0, but keeping that 1 point of emergency swap available allows you some scope to kill runaway processes.
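As an illustration, on a typical Linux host the value can be checked and changed like this (persisting it in /etc/sysctl.conf so it survives a reboot):

# Check the current value
cat /proc/sys/vm/swappiness
# Set it to 1 immediately
sysctl -w vm.swappiness=1
# Persist the setting across reboots
echo "vm.swappiness = 1" >> /etc/sysctl.conf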

Problem: Slow query execution

Tuning fix: Dive into the InnoDB buffer pool

innodb_buffer_pool_size is the most important of all system variables. The buffer pool caches data and indexes, and you usually want it as large as possible so you keep that information in RAM, minimizing disk IO. You can set the buffer pool size to 70–80% of available memory on a dedicated database server. However, there are two major considerations:

  • Total memory allocated is about 10% more than the value you specify, due to space being reserved for control structures and buffers.

  • If your buffer pool is too large, it’ll cause swapping—which more than offsets the benefit of a large buffer pool.

Good news: Starting with MariaDB Server 10.2.2, you can set the InnoDB buffer pool size dynamically.
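As a sketch, on a dedicated database server with roughly 10 GB of RAM you might reserve about 8 GB for the buffer pool (the 8G figure is only an example; size it for your own host):

# my.cnf, [mysqld] section - applied at the next restart
innodb_buffer_pool_size = 8G

On MariaDB Server 10.2.2 and later, the same change can be applied online:

SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;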

Problem: “Too many connections” error

Tuning fix: Check your max_connections server variable

Having excessive connections can cause high RAM usage and make your server lock up. Getting a too_many_connections error may be a symptom of slow queries and other bottlenecks, but if you’ve ruled those out, you can fix the issue by increasing the max_connections value. (To gauge your target number, check out the max_used_connections status variable to see the peak value.)
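For example, you could check the peak and raise the limit like this, and also add max_connections to your configuration file so the change survives a restart (500 is just an illustrative value):

SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
SET GLOBAL max_connections = 500;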

Performance tuning doesn’t stop with these three tips, of course. If you’re looking for more ways to whip your application into tip-top shape, watch the recorded database tuning and optimization webinar. We’ll cover available communication and monitoring tools, common bottlenecks and errors, best practices for sizing the InnoDB buffer pool and log files, and more.



by Amy Krishnamohan at June 27, 2017 09:44 PM

Peter Zaitsev

SSL Connections in MySQL 5.7


This blog post looks at SSL connections and how they work in MySQL 5.7.

Recently I was working on an SSL implementation with MySQL 5.7, and I made some interesting discoveries. I realized I could connect to the MySQL server without specifying the SSL keys on the client side, and the connection was still secured by SSL. I was confused and did not understand what was happening.

In this blog post, I am going to show you how SSL works in MySQL 5.7, and how it worked previously in MySQL 5.6.

Let’s start with an introduction of how SSL worked in 5.6.

SSL in MySQL 5.6

The documentation for SSL in MySQL 5.6 is quite detailed, and it explains how SSL works. But first let’s make one thing clear: MySQL supports secure (encrypted) connections between clients and the server using the TLS (Transport Layer Security) protocol. TLS is sometimes referred to as SSL (Secure Sockets Layer), but MySQL doesn’t actually use the SSL protocol for secure connections because it provides weak encryption.

So when we/someone says MySQL is using SSL, it really means that it is using TLS. You can check which protocol you use:

show status like 'Ssl_version';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| Ssl_version   | TLSv1.2 |
+---------------+---------+

So how does it work? Let me quote a few lines from the MySQL documentation:

TLS uses encryption algorithms to ensure that data received over a public network can be trusted. It has mechanisms to detect data change, loss, or replay. TLS also incorporates algorithms that provide identity verification using the X509 standard. X509 makes it possible to identify someone on the Internet. In basic terms, there should be some entity called a “Certificate Authority” (or CA) that assigns electronic certificates to anyone who needs them. Certificates rely on asymmetric encryption algorithms that have two encryption keys (a public key and a secret key). A certificate owner can present the certificate to another party as proof of identity. A certificate consists of its owner’s public key. Any data encrypted using this public key can be decrypted only using the corresponding secret key, which is held by the owner of the certificate.

It works with key pairs (private and public): the server holds its private key, and the client is given the corresponding certificates (public keys). Here is a link showing how we can generate these keys.

MySQL 5.6 Client

When the server is configured with SSL, the client has to have the client certificates. Once it gets those, it can connect to the server using SSL:

mysql -utestuser -p -h192.168.0.1 -P3306 --ssl-key=/root/mysql-ssl/client-key.pem --ssl-cert=/root/mysql-ssl/client-cert.pem

We have to specify the key and the certificate. Otherwise, we cannot connect to the server with SSL. So far so good, everything works as the documentation says. But how does it work in MySQL 5.7?

SSL in MySQL 5.7

The documentation was very confusing to me, and did not help much. It described how to create the SSL keys the same way (and even the server and client configuration) as in MySQL 5.6. So if everything is the same, why could I connect without specifying the client keys? I did not have an answer for that. One of my colleagues (Alexey Poritskiy) found the answer after digging through the manuals for a while; it finally clearly described this behavior:

As of MySQL 5.7.3, a client need specify only the --ssl option to obtain an encrypted connection. The connection attempt fails if an encrypted connection cannot be established. Before MySQL 5.7.3, the client must specify either the --ssl-ca option, or all three of the --ssl-ca, --ssl-key, and --ssl-cert options.

These lines are in the “CREATE USER” syntax manual.

After this, I re-read the SSL documentation and found the following:

5.7.3: On the client side, an explicit --ssl option is no longer advisory but prescriptive. Given a server enabled to support secure connections, a client program can require a secure connection by specifying only the --ssl option. The connection attempt fails if a secure connection cannot be established. Other --ssl-xxx options on the client side mean that a secure connection is advisory (the connection attempt falls back to an unencrypted connection if a secure connection cannot be established).

I still think the SSL manual could be more expressive and detailed.

Before MySQL 5.7.3, it used key pairs. After that, it works a bit like websites do with HTTPS: the client does not need its own keys, and the connection can still be secure. (I would still like to see more detailed documentation on how this really works.)
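For example, against a 5.7 server that has its certificates in place, the client can require an encrypted connection without shipping any client keys; depending on your client version you would use --ssl or (from 5.7.11) --ssl-mode=REQUIRED. A non-empty Ssl_cipher confirms the session is encrypted:

mysql -utestuser -p -h192.168.0.1 -P3306 --ssl-mode=REQUIRED

mysql> SHOW SESSION STATUS LIKE 'Ssl_cipher';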

Is This Good for Us?

In my opinion, this is a really good feature, but it didn’t get a lot of publicity. Prior to 5.7.3, if we wanted to use SSL we had to create the keys and use the client keys in every client application. It’s doable, but it is just a pain — especially if we are using different keys for every server with many users.

With this feature we can have a different key for every server, but on the client side we only have to enable the SSL connection. Most of the clients use it as the default if SSL is available.

I am pretty sure if people knew about this feature they would use it more often.

Limitations

I tested it with many client applications, and all worked without specifying the client keys. I only had issues with the MySQL 5.6 client, even though the server was 5.7. In that case, I had to specify the client keys. Without them, I couldn’t connect to the server.

It is possible some older applications won’t support this feature.

Conclusion

This is a great feature: it is easy to use, and it makes SSL implementations much easier. But I think the documentation should be clearer and more precise in describing how this new feature actually works. I already submitted a request to update the documentation.

by Tibor Korocz at June 27, 2017 09:16 PM

Webinar Thursday June 29, 2017: Choosing a MySQL® High Availability Solution

Join Percona’s Principal Technical Services Engineer, Marcos Albe, as he presents Choosing a MySQL High Availability Solution on Thursday, June 29, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

The MySQL world is full of tradeoffs, and choosing a high availability (HA) solution is no exception. Learn to think about high availability “the Percona way,” and how to use the solutions that we deploy on a regular basis. In this webinar, we will cover:

  • Percona XtraDB Cluster
  • DRBD
  • MHA
  • MySQL Orchestrator

Register for the webinar here.

Marcos Albe, Principal Technical Services Engineer

After 12 years of working as a PHP/JS developer for local and remote firms, Marcos decided to pursue his true love and become a full-time DBA. He has provided MySQL support for Percona for the past six years. He helps leading web properties with advice on anything MySQL, as well as in-depth system performance diagnostic help.


by Emily Ikuta at June 27, 2017 06:40 PM

Jean-Jerome Schmidt

Announcing ClusterControl 1.4.2 - the DevOps Edition

Today we are pleased to announce the 1.4.2 release of ClusterControl - the all-inclusive database management system that lets you easily deploy, monitor, manage and scale highly available open source databases - and load balancers - in your infrastructure.

Release Highlights

For MySQL

Set up transparent failover of ProxySQL with Keepalived and Virtual IP

Keep query rules, users and other settings in sync across multiple instances

For PostgreSQL

New primary - standby deployment wizard for streaming replication

Automated failover and slave to master promotion

For MySQL, MongoDB & PostgreSQL

New Integrations with communications or incident response management systems such as Pagerduty, VictorOps, Telegram, Opsgenie and Slack

New Web SSH Console

And more! Read about the full details below.

Download ClusterControl

View release details and resources

Release description

This maintenance release of ClusterControl is all about consolidating the popular database management features our users have come to appreciate. And we have some great new features aimed at DevOps teams!

Our new integration with popular incident management and chat services lets you customise the alarms and get alerted in the ops tools you are already using - e.g., Pagerduty, VictorOps, Telegram, Opsgenie and Slack. You can also run any command available in the ClusterControl CLI from your CCBot-enabled chat.

ProxySQL can now be deployed in active standby HA mode with Keepalived and Virtual IP. It is also possible to export and synchronize configurations across multiple instances, which is an essential feature in a distributed environment.

And we’re introducing automatic failover and replication management of your PostgreSQL replication setups.

In more detail …

ChatOps with ClusterControl’s CCBot

In our previous ClusterControl release we included the new ClusterControl command line client (CLI). We have now made a new and improved CCBot available that has full integration with the CLI. This means you can use any command available in the CLI from your CCBot-enabled chat!

The command line client is intuitive and easy to use, and if you are a frequent command line user, it will be quick to get accustomed to. However, not everyone has command line access to the hosts installed with ClusterControl, and if external connections to this node are prohibited, the CLI will not be able to send commands to the ClusterControl backend. Also, some users may not be accustomed to working on the command line. Adding the CLI to our chatbot, CCBot, addresses both problems: it empowers those users to send commands to ClusterControl that they otherwise wouldn't have been able to.

New Integrations with popular notification systems

Alarms and Events can now easily be sent to incident management services like PagerDuty and VictorOps, or to chat services like Slack and Telegram. You can also use Webhooks if you want to integrate with other services to act on status changes in your clusters. The direct connections with these popular incident communication services allow you to customize how you are alerted from ClusterControl when something goes wrong with your database environments.

  • Send Alarms and Events to:
    • PagerDuty, VictorOps, and OpsGenie
    • Slack and Telegram
    • User registered Webhooks

Automated failover for PostgreSQL

Starting from ClusterControl 1.4.2, you can deploy an entire PostgreSQL replication setup in the same way as you would deploy MySQL and MongoDB: you can use the “Deploy Cluster” menu to deploy a primary and one or more PostgreSQL standby servers. Once the replication setup is deployed, ClusterControl will manage the setup and automatically recover failed servers.

Another feature is “Rebuild Replication Slave” job which is available for all slaves (or standby servers) in the replication setup. This is to be used for instance when you want to wipe out the data on the standby, and rebuild it again with a fresh copy of data from the primary. It can be useful if a standby server is not able to connect and replicate from the primary for some reason.

You can now easily check which queries are responsible for the load on your PostgreSQL setup. You’ll see some basic performance data here: how many queries of a given type have been executed? What was their maximum and average execution time? What does the total execution time for that query look like? Download ClusterControl to get started.

ProxySQL Improvements

In this release we have improvements for ProxySQL to help you deploy active/standby setups with Keepalived and Virtual IP. This improved integration with Keepalived and Virtual IP brings high availability and automatic failover to your load balancing.

And you can also easily synchronize a ProxySQL configuration that has query rules, users, and host groups with other instances to keep them identical.

  • Copy, Export and Import ProxySQL configurations to/from other instances to keep them in sync
  • Add Existing standalone ProxySQL instance
  • Add Existing Keepalived in active/passive setups
  • Deploy up to 3 ProxySQL instances with a Keepalived active/passive setup
  • Simplified Query Cache creation

New Web-based SSH Console

From the ClusterControl GUI, you now have SSH access to any of the database nodes right from your browser. This can be very useful if you need to quickly log into a database server and access the command line. Communication is based on HTTPS, so it is possible to access your servers from behind a firewall that restricts Internet access to only port 443. Access to WebSSH is configurable by the ClusterControl admin through the GUI.

  • Open a terminal window to any cluster nodes
    • Only supported with Apache 2.4+

There are a number of other features and improvements that we have not mentioned here. You can find all details in the ChangeLog.

We encourage you to test this latest release and provide us with your feedback. If you’d like a demo, feel free to request one.

Thank you for your ongoing support, and happy clustering!

PS.: For additional tips & tricks, follow our blog: https://severalnines.com/blog/.

by Severalnines at June 27, 2017 01:22 PM

June 26, 2017

Peter Zaitsev

Webinar Tuesday June 27, 2017: MariaDB® Server 10.2 – The Complete Guide


Join Percona’s Chief Evangelist, Colin Charles as he presents MariaDB Server 10.2: The Complete Guide on Tuesday, June 27, 2017, at 7:00 am PDT / 10:00 am EDT (UTC-7).

The new MariaDB Server 10.2 release is out. It has some interesting new features, but beyond just a list of features we need to understand how to use them. This talk will go over everything new that MariaDB 10.2 has to offer.

In this webinar, we’ll learn about Window functions, common table expressions, finer-grained CREATE USER statements, and more – including getting mysqlbinlog up to parity with MySQL.

There are also unique MariaDB features that don’t exist in MySQL like encryption at rest, integrated Galera Cluster, threadpool, InnoDB defragmentation, roles, extended REGEXP, etc.

The webinar describes all the new features, both MySQL-compatible and MariaDB-only ones, and shows usage examples and practical use cases. We’ll also review the feature roadmap for MariaDB Server 10.3.

Register for the webinar here.

Colin Charles, Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously part of the founding team of MariaDB Server (2009), worked at MySQL since 2005, and has been a MySQL user since 2000. Before joining MySQL, he actively worked on the Fedora and OpenOffice.org projects. He’s well known within open source communities in APAC and has spoken at many conferences.

by Emily Ikuta at June 26, 2017 09:37 PM

MariaDB AB

Schema Sharding with MariaDB MaxScale 2.1 - Part 2


In this second installment on schema sharding with MariaDB MaxScale, which combines the SchemaRouter and ReadWriteSplit MaxScale routers, we'll go through the details of implementing it in order to shard databases among many pairs of master/slave servers.

Before we move forward, I would like to highlight that you need to remove any duplicate database schema, that is, a schema that is present on both shards. Each schema should exist on just one of the shards: when the SchemaRouter plugin starts, it reads all the database schemas that exist on all the shards in order to know where each query should be routed. Take care to use mysql_secure_installation and remove the test database schema, to avoid the error below when sending queries to MaxScale that it will resolve against the backend database servers (this rule about avoiding duplicate databases among shards does not apply to the mysql system database):

2017-03-06 16:12:23   error  : [schemarouter] Database 'test' found on servers 'Shard-A' and 'Shard-B' for user appuser@127.0.0.1.
2017-03-06 16:12:23   error  : [schemarouter] Duplicate databases found, closing session.

We can now execute some simple tests for SchemaRouter and for ReadWriteSplit, as below:

#: schemarouter
#: existing databases on shards
[root@maxscale log]# mysql -u appuser -p123456 -h 127.0.0.1 -P 3306 -e "show databases;"
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| shard_a            | <—— shard_a, box01
| shard_b            | <—— shard_b, box03
| test               |
+--------------------+ 
 
#: creating tables on both existing shards/schemas
[root@maxscale log]# mysql -u appuser -p123456 -h 127.0.0.1 -P 3306 -e "create table shard_a.t1(i int, primary key(i));"
[root@maxscale log]# mysql -u appuser -p123456 -h 127.0.0.1 -P 3306 -e "create table shard_b.t1(i int, primary key(i));"
[root@maxscale log]# mysql -u appuser -p123456 -h 127.0.0.1 -P 3306 -e "show tables from shard_a;"
+-------------------+
| Tables_in_shard_a |
+-------------------+
| t1                |
+-------------------+
[root@maxscale log]# mysql -u appuser -p123456 -h 127.0.0.1 -P 3306 -e "show tables from shard_b;"
+-------------------+
| Tables_in_shard_b |
+-------------------+
| t1                |
+-------------------+
 
#: testing the readwritesplit
#: let’s write to master node of shard_a and get the hostname to check if it’s writing to the right node
[root@maxscale log]# mysql -u appuser -p123456 -h 127.0.0.1 -P 3306 -e "insert into shard_a.t1 set i=1,server=(select @@hostname);”
 
#: let’s check if it’s reading form the right node
[root@maxscale log]# mysql -u appuser -p123456 -h 127.0.0.1 -P 3306 -e "select @@hostname,i,server from shard_a.t1;"
+------------+---+--------+
| @@hostname | i | server |
+------------+---+--------+
| box02      | 1 | box01  |
+------------+---+--------+
 
#: let’s do the same for shard_b
[root@maxscale log]# mysql -u appuser -p123456 -h 127.0.0.1 -P 3306 -e "insert into shard_b.t1 set i=1,server=(select @@hostname);"
[root@maxscale log]# mysql -u appuser -p123456 -h 127.0.0.1 -P 3306 -e "select @@hostname,i,server from shard_b.t1;"
+------------+---+--------+
| @@hostname | i | server |
+------------+---+--------+
| box04      | 1 | box03  |
+------------+---+--------+

In the above example, it’s valid to say that, you need to have server replicating in Row Based Replication, where binary log format should be configured as ROW. After these tests above, we can see some internal movement on MaxScale that makes it to increase statistic numbers/registers as you can see below:

[root@maxscale log]# maxadmin show service "Shard-A-Router"
     Service:                             Shard-A-Router
     Router:                              readwritesplit
     State:                               Started
 
     use_sql_variables_in:      all
     slave_selection_criteria:  LEAST_CURRENT_OPERATIONS
     master_failure_mode:       fail_instantly
     max_slave_replication_lag: -1
     retry_failed_reads:        true
     strict_multi_stmt:         true
     disable_sescmd_history:    true
     max_sescmd_history:        0
     master_accept_reads:       false
 
     Number of router sessions:                24
     Current no. of router sessions:           1
     Number of queries forwarded:               47
     Number of queries forwarded to master:     19 (40.43%)
     Number of queries forwarded to slave:      28 (59.57%)
     Number of queries forwarded to all:        46 (97.87%)
     Started:                             Mon Mar  6 17:03:19 2017
     Root user access:                    Disabled
     Backend databases:
          192.168.50.11:3306    Protocol: MySQLBackend    Name: server1
          192.168.50.12:3306    Protocol: MySQLBackend    Name: server2
     Total connections:                   25
     Currently connected:                 1
[root@maxscale log]# maxadmin show service "Shard-B-Router"
     Service:                             Shard-B-Router
     Router:                              readwritesplit
     State:                               Started
 
     use_sql_variables_in:      all
     slave_selection_criteria:  LEAST_CURRENT_OPERATIONS
     master_failure_mode:       fail_instantly
     max_slave_replication_lag: -1
     retry_failed_reads:        true
     strict_multi_stmt:         true
     disable_sescmd_history:    true
     max_sescmd_history:        0
     master_accept_reads:       false
 
     Number of router sessions:                24
     Current no. of router sessions:           1
     Number of queries forwarded:               25
     Number of queries forwarded to master:     18 (72.00%)
     Number of queries forwarded to slave:      7 (28.00%)
     Number of queries forwarded to all:        46 (184.00%)
     Started:                             Mon Mar  6 17:03:19 2017
     Root user access:                    Disabled
     Backend databases:
          192.168.50.13:3306    Protocol: MySQLBackend    Name: server3
          192.168.50.14:3306    Protocol: MySQLBackend    Name: server4
     Total connections:                   25
     Currently connected:                 1
[root@maxscale log]# maxadmin show service "Shard-Service"
Invalid argument: Shard-Service
[root@maxscale log]# maxadmin show service "Sharding-Service"
     Service:                             Sharding-Service
     Router:                              schemarouter
     State:                               Started
 
Session Commands
Total number of queries: 60
Percentage of session commands: 36.67
Longest chain of stored session commands: 0
Session command history limit exceeded: 0 times
Session command history: enabled
Session command history limit: unlimited
 
Session Time Statistics
Longest session: 0.00 seconds
Shortest session: 0.00 seconds
Average session length: 0.00 seconds
Shard map cache hits: 9
Shard map cache misses: 13
 
     Started:                             Mon Mar  6 17:03:25 2017
     Root user access:                    Disabled
     Backend databases:
          127.0.0.1:4006    Protocol: MySQLBackend    Name: Shard-A
          127.0.0.1:4007    Protocol: MySQLBackend    Name: Shard-B
     Total connections:                   23
     Currently connected:                 1

So, at this point everything is configured and the two modules are combined. The last part of this blog is to run sysbench individually against each of the shards, a and b, hitting ports 4006 and 4007 respectively (the two services abstracted as servers in the MaxScale configuration file), so that we get more meaningful numbers reported. The sysbench runs are below:

[root@maxscale log]# sysbench --max-requests=0 --mysql-user=appuser --mysql-password=XXXXXX --mysql-db=shard_a --oltp-table-size=1000000 --oltp-read-only=off --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=off --db-ps-mode=disable --max-time=60 --oltp-reconnect-mode=transaction --mysql-host=127.0.0.1 --num-threads=16  --mysql-port=4006 --db-driver=mysql --test=oltp run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
 
Running the test with following options:
Number of threads: 16
 
Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 15 times)
Done.
 
OLTP test statistics:
    queries performed:
        read:                            10886
        write:                           54430
        other:                           32658
        total:                           97974
    transactions:                        10886  (181.24 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 65316  (1087.45 per sec.)
    other operations:                    32658  (543.73 per sec.)
 
Test execution summary:
    total time:                          60.0633s
    total number of events:              10886
    total time taken by event execution: 960.5471
    per-request statistics:
         min:                                 18.25ms
         avg:                                 88.24ms
         max:                                692.52ms
         approx.  95 percentile:             158.38ms
 
Threads fairness:
    events (avg/stddev):           680.3750/4.04
    execution time (avg/stddev):   60.0342/0.02
 
[root@maxscale log]# sysbench --max-requests=0 --mysql-user=appuser --mysql-password=XXXXXX --mysql-db=shard_b --oltp-table-size=1000000 --oltp-read-only=off --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=off --db-ps-mode=disable --max-time=60 --oltp-reconnect-mode=transaction --mysql-host=127.0.0.1 --num-threads=16  --mysql-port=4007 --db-driver=mysql --test=oltp run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
 
Running the test with following options:
Number of threads: 16
 
Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 15 times)
Done.
 
OLTP test statistics:
    queries performed:
        read:                            12638
        write:                           63190
        other:                           37914
        total:                           113742
    transactions:                        12638  (210.50 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 75828  (1262.99 per sec.)
    other operations:                    37914  (631.49 per sec.)
 
Test execution summary:
    total time:                          60.0386s
    total number of events:              12638
    total time taken by event execution: 960.2365
    per-request statistics:
         min:                                 18.38ms
         avg:                                 75.98ms
         max:                                316.19ms
         approx.  95 percentile:             120.72ms
 
Threads fairness:
    events (avg/stddev):           789.8750/6.69
    execution time (avg/stddev):   60.0148/0.01

And now we can have a look at the maxadmin statistics for MaxScale:

[root@maxscale log]# maxadmin list services
Services.
--------------------------+-------------------+--------+----------------+-------------------
Service Name              | Router Module     | #Users | Total Sessions | Backend databases
--------------------------+-------------------+--------+----------------+-------------------
Shard-A-Router            | readwritesplit    |      1 |              1 | server1, server2
Shard-B-Router            | readwritesplit    |      1 |              1 | server3, server4
Sharding-Service          | schemarouter      |      1 |              1 | Shard-A, Shard-B
CLI                       | cli               |      1 |              1 |
MaxAdmin                  | cli               |      3 |              4 |
--------------------------+-------------------+--------+----------------+-------------------
 
[root@maxscale log]# maxadmin show service "Sharding-Service"
     Service:                             Sharding-Service
     Router:                              schemarouter
     State:                               Started
 
Session Commands
Total number of queries: 66
Percentage of session commands: 36.36
Longest chain of stored session commands: 0
Session command history limit exceeded: 0 times
Session command history: enabled
Session command history limit: unlimited
 
Session Time Statistics
Longest session: 0.00 seconds
Shortest session: 0.00 seconds
Average session length: 0.00 seconds
Shard map cache hits: 10
Shard map cache misses: 14
 
     Started:                             Mon Mar  6 17:03:25 2017
     Root user access:                    Disabled
     Backend databases:
          127.0.0.1:4006    Protocol: MySQLBackend    Name: Shard-A
          127.0.0.1:4007    Protocol: MySQLBackend    Name: Shard-B
     Total connections:                   25
     Currently connected:                 1
 
[root@maxscale log]# maxadmin show service "Shard-A-Router"
     Service:                             Shard-A-Router
     Router:                              readwritesplit
     State:                               Started
 
     use_sql_variables_in:      all
     slave_selection_criteria:  LEAST_CURRENT_OPERATIONS
     master_failure_mode:       fail_instantly
     max_slave_replication_lag: -1
     retry_failed_reads:        true
     strict_multi_stmt:         true
     disable_sescmd_history:    true
     max_sescmd_history:        0
     master_accept_reads:       false
 
     Number of router sessions:                10933
     Current no. of router sessions:           1
     Number of queries forwarded:               87248
     Number of queries forwarded to master:     87213 (99.96%)
     Number of queries forwarded to slave:      35 (0.04%)
     Number of queries forwarded to all:        10957 (12.56%)
     Started:                             Mon Mar  6 17:03:19 2017
     Root user access:                    Disabled
     Backend databases:
          192.168.50.11:3306    Protocol: MySQLBackend    Name: server1
          192.168.50.12:3306    Protocol: MySQLBackend    Name: server2
     Total connections:                   10934
     Currently connected:                 1
[root@maxscale log]# maxadmin show service "Shard-B-Router"
     Service:                             Shard-B-Router
     Router:                              readwritesplit
     State:                               Started
 
     use_sql_variables_in:      all
     slave_selection_criteria:  LEAST_CURRENT_OPERATIONS
     master_failure_mode:       fail_instantly
     max_slave_replication_lag: -1
     retry_failed_reads:        true
     strict_multi_stmt:         true
     disable_sescmd_history:    true
     max_sescmd_history:        0
     master_accept_reads:       false
 
     Number of router sessions:                42504
     Current no. of router sessions:           1
     Number of queries forwarded:               190647
     Number of queries forwarded to master:     190637 (99.99%)
     Number of queries forwarded to slave:      10 (0.01%)
     Number of queries forwarded to all:        42528 (22.31%)
     Started:                             Mon Mar  6 17:03:19 2017
     Root user access:                    Disabled
     Backend databases:
          192.168.50.13:3306    Protocol: MySQLBackend    Name: server3
          192.168.50.14:3306    Protocol: MySQLBackend    Name: server4
     Total connections:                   42505
     Currently connected:                 1

Conclusion

We saw that MariaDB MaxScale can scale the database servers on your shards very well, since MaxScale itself routes the traffic across the database nodes based on the database/schema names; this is done by the schemarouter, one of the routers supported by MariaDB MaxScale 2.1.3. In this blog post, I combined SchemaRouter with ReadWriteSplit to make the setup even more scalable, as reads are routed to the slaves and writes to each shard's master. It is also worth saying that, if you need more than one slave per shard, it is just a matter of adding another slave and MariaDB MaxScale will start routing queries to it. I would like to thank the MariaDB Engineering team for the support and for checking/adding features to make this combination of routers available in the new MariaDB MaxScale 2.1.3 version.

In this second installment of schema sharding with MariaDB MaxScale, combining the SchemaRouter and ReadWriteSplit MaxScale routers, we'll go through the details of implementing it in order to shard databases among many pairs of master/slave servers.


by Wagner Bianchi at June 26, 2017 01:15 PM

MariaDB Foundation

Embrace the Community, Fly the Open Source Dream

This article is by the Tencent Game DBA Team, and has been translated from the original. Tencent Game DBA Team (the DBA Team for short) has been serving the game business for a number of years. The mission of the DBA Team is to provide stable and efficient online storage services for Tencent Games. As […]

The post Embrace the Community, Fly the Open Source Dream appeared first on MariaDB.org.

by Ian Gilfillan at June 26, 2017 02:13 AM

June 25, 2017

Valeriy Kravchuk

My First Steps with MariaDB ColumnStore

This is a "Howto" kind of post and, some speculations and historical references aside, it will show you how to build MariaDB ColumnStore (current version 1.0.9) from GitHub source, how to install and configure it for basic usage, as well as how to resolve some minor problems you may get in the process. Keep reading and eventually you'll get to the real Howto below :)

* * *

I try not to care about any software besides good old MySQL, InnoDB storage engine internals, query optimization and some MyRocks, or a little bit of Galera clusters. But as I work in MariaDB Support it is not possible to ignore other things production DBAs use and care about these days. So, recently I noted an increased number of issues related to MariaDB ColumnStore and had to step in to resolve some of them. To do this I first had to make sure this software is "supportable by me", that is, that I can build it from source on my own hardware (so that I can apply patches if needed and am not dependent on any packaging problems), install it, start and stop it, do some basic things with it, and know what to do to troubleshoot non-trivial problems.

Actually, I tried to check whether it is "supportable by me" a long time ago. First, during the company meeting in Berlin back in April 2016, I was shown some basic installation and usage steps; then a year ago I tried to build one of the very first public releases of ColumnStore, and this ended up with MCOL-142 that was later fixed by the team. Starting from version 1.0.6 and my successful build of MariaDB ColumnStore based on the GitHub instructions, I was sure that I could support it if needed.

This week I found myself in a situation where I needed to install MariaDB ColumnStore to help a customer, but I had only my Ubuntu 14.04 netbook at hand. Based on my records, I had a few minor problems building the engine itself there, as of version 1.0.7:

...
-- Performing Test INLINE - Failed
-- Could NOT find LibXml2 (missing:  LIBXML2_LIBRARIES LIBXML2_INCLUDE_DIR)
CMake Error at CMakeLists.txt:91 (MESSAGE):
  Could not find a usable libxml2 development environment!


-- Configuring incomplete, errors occurred!
See also "/home/openxs/git/mariadb-columnstore-engine/CMakeFiles/CMakeOutput.log".
See also "/home/openxs/git/mariadb-columnstore-engine/CMakeFiles/CMakeError.log".

sudo apt-get install build-essential automake libboost-all-dev bison cmake libncurses5-dev libreadline-dev libperl-dev libssl-dev libxml2-dev libkrb5-dev flex
...
Setting up libboost-wave-dev:amd64 (1.54.0.1ubuntu1) ...
Setting up libboost-all-dev (1.54.0.1ubuntu1) ...
Setting up libperl-dev (5.18.2-2ubuntu1.1) ...
Setting up libhwloc-plugins (1.8-1ubuntu1.14.04.1) ...
Processing triggers for libc-bin (2.19-0ubuntu6.9) ...
openxs@ao756:~/git/mariadb-columnstore-engine$
openxs@ao756:~/git/mariadb-columnstore-engine$ cmake . -DSERVER_BUILD_INCLUDE_DIR=../mariadb-columnstore-server/include -DSERVER_SOURCE_ROOT_DIR=../mariadb-columnstore-server
-- Running cmake version 2.8.12.2
-- MariaDB-Columnstore 1.0.7
-- Found LibXml2: /usr/lib/x86_64-linux-gnu/libxml2.so (found version "2.9.1")
-- Boost version: 1.54.0
-- Found the following Boost libraries:
--   system
--   filesystem
--   thread
--   regex
--   date_time
SERVER_BUILD_INCLUDE_DIR = /home/openxs/git/mariadb-columnstore-engine/../mariadb-columnstore-server/include
SERVER_SOURCE_ROOT_DIR = /home/openxs/git/mariadb-columnstore-engine/../mariadb-columnstore-server
-- Configuring done
-- Generating done
-- Build files have been written to: /home/openxs/git/mariadb-columnstore-engine
openxs@ao756:~/git/mariadb-columnstore-engine$

...
[100%] Built target cpimport
[100%] Building CXX object writeengine/server/CMakeFiles/WriteEngineServer.dir/we_getfilesizes.cpp.o
/home/openxs/git/mariadb-columnstore-engine/writeengine/server/we_getfilesizes.cpp: In static member function ‘static int WriteEngine::WE_GetFileSizes::processFileName(messageqcpp::ByteStream&, std::string&, int)’:
/home/openxs/git/mariadb-columnstore-engine/writeengine/server/we_getfilesizes.cpp:275:19: warning: ‘fileSize’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     bs << fileSize;
                   ^
/home/openxs/git/mariadb-columnstore-engine/writeengine/server/we_getfilesizes.cpp:276:29: warning: ‘compressedFileSize’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     bs << compressedFileSize;
                             ^
Linking CXX executable WriteEngineServer
[100%] Built target WriteEngineServer
openxs@ao756:~/git/mariadb-columnstore-engine$ echo $?
0
openxs@ao756:~/git/mariadb-columnstore-engine$ make install
...
[100%] Built target cpimport
Install the project...
-- Install configuration: "RELWITHDEBINFO"
CMake Error at cmake_install.cmake:44 (FILE):
  file cannot create directory: /usr/local/mariadb/columnstore.  Maybe need
  administrative privileges.


make: *** [install] Error 1
So, first I had some missing build dependencies and had to read the manual and install them. Then both server and engine, as of version 1.0.7, were built successfully, but I had a problem during installation. The reason is clear: ColumnStore used to insist on installing into /usr/local/mariadb/columnstore only, while I insisted on getting it elsewhere (as I usually do for my source code builds), failed, and had to give up and use sudo. I got it working and moved on at that time. If you care, proper steps for installing into any directory as a non-root user are described in the KB now.

So, based on the previous attempts and my success on Fedora 23, I decided to get the recent 1.0.9 version from GitHub and build it on this Ubuntu 14.04 netbook. I followed the current GitHub instructions and a few of my older notes (hence some extra options and not just cmake ., but it seems these are truly optional and everything works as described):
...
openxs@ao756:~/git/mariadb-columnstore-server$ cd ../mariadb-columnstore-engine/
openxs@ao756:~/git/mariadb-columnstore-engine$ cmake . -DSERVER_BUILD_INCLUDE_DIR=../mariadb-columnstore-server/include -DSERVER_SOURCE_ROOT_DIR=../mariadb-columnstore-server
-- Running cmake version 2.8.12.2
-- MariaDB-Columnstore 1.0.9
-- Boost version: 1.54.0
-- Found the following Boost libraries:
--   system
--   filesystem
--   thread
--   regex
--   date_time
SERVER_BUILD_INCLUDE_DIR = /home/openxs/git/mariadb-columnstore-engine/../mariadb-columnstore-server/include
SERVER_SOURCE_ROOT_DIR = /home/openxs/git/mariadb-columnstore-engine/../mariadb-columnstore-server
-- Configuring done
-- Generating done
-- Build files have been written to: /home/openxs/git/mariadb-columnstore-engine
openxs@ao756:~/git/mariadb-columnstore-engine$ time make -j 2
...
[100%] Built target cpimport
[100%] Building CXX object writeengine/server/CMakeFiles/WriteEngineServer.dir/we_getfilesizes.cpp.o
/home/openxs/git/mariadb-columnstore-engine/writeengine/server/we_getfilesizes.cpp: In static member function ‘static int WriteEngine::WE_GetFileSizes::processFileName(messageqcpp::ByteStream&, std::string&, int)’:
/home/openxs/git/mariadb-columnstore-engine/writeengine/server/we_getfilesizes.cpp:275:19: warning: ‘fileSize’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     bs << fileSize;
                   ^
/home/openxs/git/mariadb-columnstore-engine/writeengine/server/we_getfilesizes.cpp:276:29: warning: ‘compressedFileSize’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     bs << compressedFileSize;
                             ^
Linking CXX executable WriteEngineServer
[100%] Built target WriteEngineServer

real    43m48.955s
user    80m37.430s
sys     4m17.819s

openxs@ao756:~/git/mariadb-columnstore-engine$ sudo make install
...
-- Installing: /usr/local/mariadb/columnstore/bin/cpimport
-- Set runtime path of "/usr/local/mariadb/columnstore/bin/cpimport" to "/usr/local/mariadb/columnstore/lib"

openxs@ao756:~/git/mariadb-columnstore-engine$ cd /usr/local/mariadb/columnstore/bin/
The next step is to run post-install (as root, or via sudo in my case):
openxs@ao756:/usr/local/mariadb/columnstore/bin$ sudo ./post-install
...
There is literally nothing interesting in the output and everything should just work. Now it's time to configure the settings for MariaDB ColumnStore and run the postConfigure script (again via sudo):
openxs@ao756:/usr/local/mariadb/columnstore/bin$ sudo ./postConfigure

This is the MariaDB ColumnStore System Configuration and Installation tool.
It will Configure the MariaDB ColumnStore System and will perform a Package
Installation of all of the Servers within the System that is being configured.

IMPORTANT: This tool should only be run on the Parent OAM Module
           which is a Performance Module, preferred Module #1

Prompting instructions:

        Press 'enter' to accept a value in (), if available or
        Enter one of the options within [], if available, or
        Enter a new value


===== Setup System Server Type Configuration =====

There are 2 options when configuring the System Server Type: single and multi

  'single'  - Single-Server install is used when there will only be 1 server configured
              on the system. It can also be used for production systems, if the plan is
              to stay single-server.

  'multi'   - Multi-Server install is used when you want to configure multiple servers now or
              in the future. With Multi-Server install, you can still configure just 1 server
              now and add on addition servers/modules in the future.

Select the type of System Server install [1=single, 2=multi] (2) > 1

Performing the Single Server Install.
Enter System Name (columnstore-1) >

===== Setup Storage Configuration =====


----- Setup Performance Module DBRoot Data Storage Mount Configuration -----

There are 2 options when configuring the storage: internal or external

  'internal' -    This is specified when a local disk is used for the DBRoot storage.
                  High Availability Server Failover is not Supported in this mode

  'external' -    This is specified when the DBRoot directories are mounted.
                  High Availability Server Failover is Supported in this mode.

Select the type of Data Storage [1=internal, 2=external] (1) >


Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) >
The MariaDB ColumnStore port of '3306' is already in-use on local server
Enter a different port number > 3307


===== Performing Configuration Setup and MariaDB ColumnStore Startup =====

NOTE: Setting 'NumBlocksPct' to 50%
      Setting 'TotalUmMemory' to 25% of total memory.

Running the MariaDB ColumnStore setup scripts

post-mysqld-install Successfully Completed
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/usr/local/mariadb/columnstore/mysql/lib/mysql/mysql.sock' (2 "No such file or directory")
Error running post-mysql-install, /tmp/post-mysql-install.log
Exiting...
openxs@ao756:/usr/local/mariadb/columnstore/bin$
The process is quite clear: I installed a simple test configuration on a single server, with internal storage (you may find interesting details on how data are stored in the KB articles and in my previous post), a single Performance Module (PM) and a single DBRoot. The process and details are well described here. It was noted that I have another running instance listening on port 3306 (it is the Percona Server that I have had on this netbook since my days working for Percona), so I picked port 3307 for the MariaDB instance. Now, it seems the post-mysqld-install part was executed successfully, and then after waiting for some time there was a failure while connecting to MySQL. I decided that it did not matter (I had some failures on Fedora 23 too, but eventually it worked), and as I was in a hurry I proceeded to the KB article on mcsadmin and basic usage. I noted that ColumnStore was NOT started and tried to start it:
openxs@ao756:/usr/local/mariadb/columnstore/bin$ sudo ./mcsadmin startsystem
startsystem   Sat Jun 24 20:24:54 2017
startSystem command, 'columnstore' service is down, sending command to
start the 'columnstore' service on all modules


   System being started, please wait...........
**** startSystem Failed : check log files
openxs@ao756:/usr/local/mariadb/columnstore/bin$
So, I've got my first real trouble with MariaDB ColumnStore and have no other option than to troubleshoot it. Let's check the content of the MariaDB server error log first:
openxs@ao756:/usr/local/mariadb/columnstore/bin$ tail ../mysql/db/ao756.err
tail: cannot open ‘../mysql/db/ao756.err’ for reading: Permission denied
openxs@ao756:/usr/local/mariadb/columnstore/bin$ sudo tail ../mysql/db/ao756.err
2017-06-24 20:25:14 139795698476992 [Note] InnoDB: Highest supported file format is Barracuda.
2017-06-24 20:25:14 139795698476992 [Note] InnoDB: starting tracking changed pages from LSN 1616918
2017-06-24 20:25:14 139795698476992 [Note] InnoDB: 128 rollback segment(s) are active.
2017-06-24 20:25:14 139795698476992 [Note] InnoDB: Waiting for purge to start
2017-06-24 20:25:14 139795698476992 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.35-80.0 started; log sequence number 1616918
2017-06-24 20:25:14 139795698476992 [Note] Plugin 'FEEDBACK' is disabled.
2017-06-24 20:25:14 139795698476992 [ERROR] /usr/local/mariadb/columnstore/mysql//bin/mysqld: unknown variable 'gtid_mode=ON'
2017-06-24 20:25:14 139795698476992 [ERROR] Aborting

170624 20:25:17 mysqld_safe mysqld from pid file /usr/local/mariadb/columnstore/mysql/db/ao756.pid ended
So, the problem is an incompatible setting in a my.cnf file that is read by default (the setting was there for the Percona Server I have installed from packages). After fixing the problem (I had to proceed, so I went for the dirty fix of commenting out the settings I found in /etc/my.cnf) it was easy to start ColumnStore:
openxs@ao756:/usr/local/mariadb/columnstore/bin$ sudo vi /etc/my.cnf
openxs@ao756:/usr/local/mariadb/columnstore/bin$ grep gtid /etc/my.cnf
#gtid_mode=ON
#enforce_gtid_consistency
openxs@ao756:/usr/local/mariadb/columnstore/bin$ sudo ./mcsadmin shutdownsystem

shutdownsystem   Sat Jun 24 20:42:21 2017

This command stops the processing of applications on all Modules within the MariaDB ColumnStore System

   Checking for active transactions
           Do you want to proceed: (y or n) [n]: y

   Stopping System...
   Successful stop of System

   Shutting Down System...
   Successful shutdown of System

openxs@ao756:/usr/local/mariadb/columnstore/bin$ sudo ./mcsadmin startsystem
startsystem   Sat Jun 24 20:43:15 2017
startSystem command, 'columnstore' service is down, sending command to
start the 'columnstore' service on all modules


   System being started, please wait..................
   Successful start of System

openxs@ao756:/usr/local/mariadb/columnstore/bin$ ps aux | grep column
root      6993  0.0  0.0  15336  1484 pts/2    S    20:43   0:00 /bin/bash /usr/local/mariadb/columnstore/bin/run.sh /usr/local/mariadb/columnstore/bin/ProcMon
root      6994  7.1  0.4 1214156 15808 pts/2   Sl   20:43   0:05 /usr/local/mariadb/columnstore/bin/ProcMon
root      7162  0.0  0.0   4456   788 pts/2    S    20:43   0:00 /bin/sh /usr/local/mariadb/columnstore/mysql//bin/mysqld_safe --datadir=/usr/local/mariadb/columnstore/mysql/db --pid-file=/usr/local/mariadb/columnstore/mysql/db/ao756.pid --ledir=/usr/local/mariadb/columnstore/mysql//bin
mysql     7380  0.6  4.1 1170124 160416 pts/2  Sl   20:43   0:00 /usr/local/mariadb/columnstore/mysql//bin/mysqld --basedir=/usr/local/mariadb/columnstore/mysql/ --datadir=/usr/local/mariadb/columnstore/mysql/db --plugin-dir=/usr/local/mariadb/columnstore/mysql/lib/plugin --user=mysql --log-error=/usr/local/mariadb/columnstore/mysql/db/ao756.err --pid-file=/usr/local/mariadb/columnstore/mysql/db/ao756.pid --socket=/usr/local/mariadb/columnstore/mysql/lib/mysql/mysql.sock --port=3307
root      7454  0.0  0.2 506564 10564 pts/2    Sl   20:43   0:00 /usr/local/mariadb/columnstore/bin/controllernode fg
root      7475  0.0  0.2 353284 10176 pts/2    Sl   20:43   0:00 /usr/local/mariadb/columnstore/bin/ServerMonitor
root      7502  0.0  0.4 177956 17984 pts/2    Sl   20:43   0:00 /usr/local/mariadb/columnstore/bin/workernode DBRM_Worker1 fg
openxs    7951  0.0  0.0  14652   960 pts/2    S+   20:44   0:00 grep --color=auto column
openxs@ao756:/usr/local/mariadb/columnstore/bin$
So, it seems the system started as expected and a lot of processes are running, but when I tried to create my first ColumnStore table I got a message that the storage engine is not supported! Same for InfiniDB.

I'll skip some troubleshooting steps and ideas I had, and just state that:

You should make sure that none of the MySQL configuration files that are read by default contain any incompatible settings at the moment you run the postConfigure script.
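A quick way to spot such leftovers, sketched here on the assumption that the bundled mysqld is in the installation path shown above, is to list the option files the server reads by default and then grep them for suspicious settings:

openxs@ao756:/usr/local/mariadb/columnstore/bin$ ../mysql/bin/mysqld --verbose --help 2>/dev/null | grep -A1 "Default options"
openxs@ao756:/usr/local/mariadb/columnstore/bin$ grep -n gtid /etc/my.cnf /etc/mysql/my.cnf ~/.my.cnf 2>/dev/null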

You can surely re-run postConfigure when everything is fixed, and this is the proper output for a completed, correct attempt:
openxs@ao756:/usr/local/mariadb/columnstore/bin$ sudo ./postConfigure

This is the MariaDB ColumnStore System Configuration and Installation tool.
It will Configure the MariaDB ColumnStore System and will perform a Package
Installation of all of the Servers within the System that is being configured.

IMPORTANT: This tool should only be run on the Parent OAM Module
           which is a Performance Module, preferred Module #1

Prompting instructions:

        Press 'enter' to accept a value in (), if available or
        Enter one of the options within [], if available, or
        Enter a new value


===== Setup System Server Type Configuration =====

There are 2 options when configuring the System Server Type: single and multi

  'single'  - Single-Server install is used when there will only be 1 server configured
              on the system. It can also be used for production systems, if the plan is
              to stay single-server.

  'multi'   - Multi-Server install is used when you want to configure multiple servers now or
              in the future. With Multi-Server install, you can still configure just 1 server
              now and add on addition servers/modules in the future.

Select the type of System Server install [1=single, 2=multi] (1) >

Performing the Single Server Install.
Enter System Name (columnstore-1) >

===== Setup Storage Configuration =====


----- Setup Performance Module DBRoot Data Storage Mount Configuration -----

There are 2 options when configuring the storage: internal or external

  'internal' -    This is specified when a local disk is used for the DBRoot storage.
                  High Availability Server Failover is not Supported in this mode

  'external' -    This is specified when the DBRoot directories are mounted.
                  High Availability Server Failover is Supported in this mode.

Select the type of Data Storage [1=internal, 2=external] (1) >

Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) >


===== Performing Configuration Setup and MariaDB ColumnStore Startup =====

NOTE: Setting 'NumBlocksPct' to 50%
      Setting 'TotalUmMemory' to 25% of total memory.

Running the MariaDB ColumnStore setup scripts

post-mysqld-install Successfully Completed
post-mysql-install Successfully Completed

Starting MariaDB Columnstore Database Platform

MariaDB ColumnStore Database Platform Starting, please wait ..... DONE

System Catalog Successfull Created

MariaDB ColumnStore Install Successfully Completed, System is Active

Enter the following command to define MariaDB ColumnStore Alias Commands

. /usr/local/mariadb/columnstore/bin/columnstoreAlias

Enter 'mcsmysql' to access the MariaDB ColumnStore SQL console
Enter 'mcsadmin' to access the MariaDB ColumnStore Admin console

openxs@ao756:/usr/local/mariadb/columnstore/bin$

Note the new messages at the end. If you do NOT see them, do not blindly assume that everything is working as expected. Also note that the tool remembers previous choices.

Now I've got access to the engine:
openxs@ao756:/usr/local/mariadb/columnstore/bin$ ../mysql/bin/mysql -uroot test --host=127.0.0.1 --port=3307
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.23-MariaDB Columnstore 1.0.9-1
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> show engines;
+--------------------+---------+--------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                                          | Transactions | XA   | Savepoints |
+--------------------+---------+--------------------------------------------------------------------------------------------------+--------------+------+------------+
| Columnstore        | YES     | Columnstore storage engine                                                                       | YES          | NO   | NO         |
| CSV                | YES     | CSV storage engine                                                                               | NO           | NO   | NO         |
| MyISAM             | YES     | MyISAM storage engine                                                                            | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                                        | NO           | NO   | NO         |
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                                            | NO           | NO   | NO         |
| InfiniDB           | YES     | Columnstore storage engine (deprecated: use columnstore)                                         | YES          | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                                   | YES          | NO   | YES        |
| Aria               | YES     | Crash-safe tables with MyISAM heritage                                                           | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Percona-XtraDB, Supports transactions, row-level locking, foreign keys and encryption for tables | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                                               | NO           | NO   | NO         |
+--------------------+---------+--------------------------------------------------------------------------------------------------+--------------+------+------------+
10 rows in set (0.00 sec)

MariaDB [test]> create table taa(id int, c1 int, c2 date, c3 int) engine=columnstore;
Query OK, 0 rows affected (3.81 sec)

MariaDB [test]> insert into taa values(1,1,now(),1), (2,2,NULL,2),(3,NULL,NULL,3);
Query OK, 3 rows affected, 1 warning (1.15 sec)
Records: 3  Duplicates: 0  Warnings: 1

MariaDB [test]> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1265
Message: Data truncated for column 'c2' at row 1
1 row in set (0.00 sec)

MariaDB [test]> select * from taa;
+------+------+------------+------+
| id   | c1   | c2         | c3   |
+------+------+------------+------+
|    1 |    1 | 2017-06-24 |    1 |
|    2 |    2 | NULL       |    2 |
|    3 | NULL | NULL       |    3 |
+------+------+------------+------+
3 rows in set (0.15 sec)

MariaDB [test]> create table tab like taa;
ERROR 1178 (42000): The storage engine for the table doesn't support The syntax or the data type(s) is not supported by Columnstore. Please check the Columnstore syntax guide for supported syntax or data types.
MariaDB [test]> create table tab engine=columnstore select * from taa;
ERROR 1178 (42000): The storage engine for the table doesn't support The syntax or the data type(s) is not supported by Columnstore. Please check the Columnstore syntax guide for supported syntax or data types.
MariaDB [test]> create table tab(id int, c1 int, c2 date, c3 int) engine=columnstore;
Query OK, 0 rows affected (3.40 sec)

MariaDB [test]> alter table taa engine=InnoDB;
ERROR 1030 (HY000): Got error 1815 "Unknown error 1815" from storage engine Columnstore
MariaDB [test]>
Note some of the limitations ColumnStore has compared to the "normal" MariaDB it is based on. Some things do not work, and some unique error messages and bugs are surely present. You can read about the current known limitations here.

My real goal was to check if NULL values are properly loaded by the cpimport tool, so I tried:
openxs@ao756:/usr/local/mariadb/columnstore/bin$ ./cpimport -e 1 -s '\t' -n1 test tab
Locale is : C
Column delimiter : \t

Using table OID 3005 as the default JOB ID
Input file(s) will be read from : STDIN
Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3005_D20170624_T211331_S295214_Job_3005.xml
I/O error : Permission denied
I/O error : Permission denied
2017-06-24 21:13:31 (16045) ERR  :  file /usr/local/mariadb/columnstore/data/bulk/tmpjob/3005_D20170624_T211331_S295214_Job_3005.xml does not exist [1055]
Error in loading job information;  The File does not exist. ; cpimport.bin is terminating.
openxs@ao756:/usr/local/mariadb/columnstore/bin$

openxs@ao756:/usr/local/mariadb/columnstore/bin$ cat /tmp/taa.txt | sudo ./cpimport -e 1 -s '\t' -n1 test tab
Locale is : C
Column delimiter : \t

Using table OID 3005 as the default JOB ID
Input file(s) will be read from : STDIN
Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3005_D20170624_T211615_S997504_Job_3005.xml
Log file for this job: /usr/local/mariadb/columnstore/data/bulk/log/Job_3005.log
2017-06-24 21:16:16 (16582) INFO : successfully loaded job file /usr/local/mariadb/columnstore/data/bulk/tmpjob/3005_D20170624_T211615_S997504_Job_3005.xml
2017-06-24 21:16:16 (16582) INFO : Job file loaded, run time for this step : 0.0409038 seconds
2017-06-24 21:16:16 (16582) INFO : PreProcessing check starts
2017-06-24 21:16:16 (16582) INFO : PreProcessing check completed
2017-06-24 21:16:16 (16582) INFO : preProcess completed, run time for this step : 0.209377 seconds
2017-06-24 21:16:16 (16582) INFO : No of Read Threads Spawned = 1
2017-06-24 21:16:16 (16582) INFO : No of Parse Threads Spawned = 3
2017-06-24 21:16:16 (16582) INFO : Reading input from STDIN to import into table test.tab...
2017-06-24 21:16:16 (16582) INFO : For table test.tab: 4 rows processed and 4 rows inserted.
2017-06-24 21:16:16 (16582) WARN : Column test.tab.c2; Number of invalid dates replaced with zero value : 1
2017-06-24 21:16:17 (16582) INFO : Bulk load completed, total run time : 1.28894 seconds

openxs@ao756:/usr/local/mariadb/columnstore/bin$
The problem is that I got the error for the date column, and the error is caused by the first line of the file containing column titles, as produced by the mysql command line client by default. One has to add the -N option to get proper data loaded:
openxs@ao756:/usr/local/mariadb/columnstore/bin$ ../mysql/bin/mysql -uroot test --host=127.0.0.1 --port=3307 -N -e"select * from taa" >/tmp/taa.txt
openxs@ao756:/usr/local/mariadb/columnstore/bin$ cat /tmp/taa.txt
1 1 2017-06-24 1

2 2 NULL 2
3 NULL NULL 3
openxs@ao756:/usr/local/mariadb/columnstore/bin$ cat /tmp/taa.txt | sudo ./cpimport -e 1 -s '\t' -n1 test tab
Locale is : C
Column delimiter : \t

Using table OID 3005 as the default JOB ID
Input file(s) will be read from : STDIN
Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3005_D20170624_T212114_S554015_Job_3005.xml
Log file for this job: /usr/local/mariadb/columnstore/data/bulk/log/Job_3005.log
2017-06-24 21:21:14 (18107) INFO : successfully loaded job file /usr/local/mariadb/columnstore/data/bulk/tmpjob/3005_D20170624_T212114_S554015_Job_3005.xml
2017-06-24 21:21:14 (18107) INFO : Job file loaded, run time for this step : 0.0412772 seconds
2017-06-24 21:21:14 (18107) INFO : PreProcessing check starts
2017-06-24 21:21:14 (18107) INFO : PreProcessing check completed
2017-06-24 21:21:14 (18107) INFO : preProcess completed, run time for this step : 0.205652 seconds
2017-06-24 21:21:14 (18107) INFO : No of Read Threads Spawned = 1
2017-06-24 21:21:14 (18107) INFO : No of Parse Threads Spawned = 3
2017-06-24 21:21:14 (18107) INFO : Reading input from STDIN to import into table test.tab...
2017-06-24 21:21:15 (18107) INFO : For table test.tab: 3 rows processed and 3 rows inserted.
2017-06-24 21:21:15 (18107) INFO : Bulk load completed, total run time : 1.28646 seconds
openxs@ao756:/usr/local/mariadb/columnstore/bin$ ../mysql/bin/mysql -uroot test --host=127.0.0.1 --port=3307 -N -e"select * from tab"
+------+------+------------+------+
|    1 |    1 | 2017-06-24 |    1 |
|    2 |    2 | NULL       |    2 |
|    3 | NULL | NULL       |    3 |
+------+------+------------+------+
To summarize:
  1. These days one can easily build, install and configure the current version of MariaDB ColumnStore from GitHub sources.
  2. As long as you are reading the KB articles and GitHub documentation carefully, there is no problem installing and configuring it even in a non-default environment where other MySQL versions co-exist.
  3. You should make sure that none of the configuration files that are read by default contain any incompatible settings (MySQL 5.7-specific ones may fall into that category, or anything about GTID-based replication in MySQL).
  4. cpimport is a cool tool for parallel/fast data loads into MariaDB ColumnStore, but it does assume that the input is a CSV or tab-separated file, and it tries to load all rows.
  5. Read error messages carefully.
  6. Make sure you read this article if you want to use ColumnStore.

by Valeriy Kravchuk (noreply@blogger.com) at June 25, 2017 11:07 AM

June 23, 2017

Peter Zaitsev

Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication High Availability Webinar: Q & A

Thank you for attending the Wednesday, June 21, 2017 high availability webinar titled Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication. In this blog, I will provide answers to the Q & A for that webinar.

You can find the slides and a recording of the webinar here.

Is there a minimum MySQL server version for Group Replication?

MySQL Group Replication is GA since MySQL Community 5.7.17. This is the lowest version that you should use for the Group Replication feature. Otherwise, you are using a beta version.

Since 5.7.17 was the GA release, it’s strongly recommended you use the latest 5.7 minor release. Bugs get fixed and features added in each of the minor releases (as can be seen in the Limitations section in the slide deck).

In MySQL 5.6 and earlier versions, Group Replication is not supported. Note that Percona Server for MySQL 5.7.17 and beyond also ships with Group Replication.
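If you want to verify that your 5.7 build actually ships the plugin, a minimal sketch (assuming the default plugin directory and a server that is not yet part of a group):

mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';
mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME = 'group_replication';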

Can I use Percona XtraDB Cluster with MariaDB v10.2? or must I use Percona Server for MySQL?

Percona XtraDB Cluster is Percona Server for MySQL and Percona XtraBackup with the modified Galera library. You cannot run Percona XtraDB Cluster on MariaDB.

However, as Percona XtraDB Cluster is open source, it is possible that MariaDB/Codership implements our modifications into their codebase.

If Percona XtraDB Cluster does not allow InnoDB tables, how do we typically deal with applications that need to use MyISAM tables?

You cannot use MyISAM with Percona XtraDB Cluster, Galera or Group Replication. However, there is experimental MyISAM support in Galera/Percona XtraDB Cluster. But we strongly recommend that you don’t use this in production. It effectively executes all statements in Total Order Isolation, which results in bad performance.
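For completeness, the experimental MyISAM replication in Galera/Percona XtraDB Cluster is controlled by a wsrep option; shown here only as an illustration, and again, not something we recommend enabling in production:

mysql> SET GLOBAL wsrep_replicate_myisam = ON;  -- experimental MyISAM replication, not recommended for production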

What is a typical business use case for the Group Replication? I specifically like the writes order feature.

Typical use cases are:

  • Environments with strict **durability** requirements
  • Write to multiple nodes simultaneously while keeping data **consistent**
  • Reducing failover time
  • Using other nodes for read-scaling, where reading stale data is more difficult for the application (as opposed to standard asynchronous replication)

The use cases for Galera and Percona XtraDB Cluster are similar.

Where do you run ProxySQL, on a separate server? We are using HAProxy.

You can deploy ProxySQL in many different ways. One common method of installation is to run ProxySQL on a separate layer of servers (ensuring there is failover on this layer). Another commonly used method is to run a ProxySQL daemon on every application server.

Do you support KVM?

Yes, there are no limitations on virtualization solutions.

Can you give some examples of an “arbitrator”?

Some useful links:

What does Percona XtraDB add to make it more performant than InnoDB?

The scalability and performance improvement of Percona XtraDB are listed on the Percona Server for MySQL documentation page: https://www.percona.com/doc/percona-server/LATEST/index.html

How scalable is Percona XtraDB Cluster storage wise? Do we have any limitations?

Storage happens through the storage engine (which is InnoDB). Percona XtraDB Cluster does not have any different limitations than Percona Server for MySQL or MySQL.

However, we need to also consider the practical side of things: the larger the cluster gets, the longer certain operations take. For example, when adding a new node to the cluster another node must be the donor and provide all the data. This will take substantially longer with larger datasets. Certain operational aspects might therefore become more complex.
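One practical knob when a large dataset makes the donor phase expensive is to pin a preferred donor, so that a node you can afford to take out of rotation serves the state transfer. A minimal my.cnf sketch on the joining node, where node3 is a placeholder for the wsrep_node_name of your preferred donor:

[mysqld]
# prefer node3 as the SST donor; the trailing comma allows any other node if node3 is unavailable
wsrep_sst_donor=node3,
wsrep_sst_method=xtrabackup-v2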

Is there any development to add multiple nodes simultaneously?

No, at the moment only one node can join the cluster at the same time. Other nodes automatically wait until it is finished before joining.

Why does Galera say we cannot use READ COMMITTED isolation for multimaster mode, even though we can start the cluster with READ-COMMITTED?

You can use READ-COMMITTED as transaction isolation level. The limitation is that you cannot use SERIALIZABLE: http://galeracluster.com/documentation-webpages/isolationlevels.html.

Galera Cluster and MariaDB currently do not prevent a user from using this transaction isolation level. Percona XtraDB Cluster implemented the strict mode to prevent these operations: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html#explicit-table-locking
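For reference, selecting the isolation level is done in the usual way; a minimal sketch, session scope shown (use GLOBAL or the configuration file for a server-wide default):

mysql> SET SESSION tx_isolation = 'READ-COMMITTED';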

MariaDB 10.2 fixed the check constraints issue, When will Percona fix this issue?

There are currently no plans to support CHECK constraints in Percona Server for MySQL (and therefore Percona XtraDB Cluster as well).

As Percona Server is effectively a fully backwards-compatible (but modified) MySQL Community Server, CHECK constraints is a feature that normally would be implemented in MySQL Community first.

Can you share your performance benchmark git repository (if you have one)?

We don’t have a performance benchmark in git repository. You can get detailed information about this benchmark in this blog: Performance improvements in Percona XtraDB Cluster 5.7.17-29.20.

On your slide pointing to scalability charts, how many nodes did you run your test against?

We used a three-node cluster for this performance benchmark.

The product is using Master-Master replication. As such what do you mean when you talk about failover in such configuration?

“Master-Master” replication in the MySQL world means that you configure replication from node1 to node2, and from node2 to node1. However, since replication is asynchronous there are no consistency guarantees when writing to both nodes at the same time. Data can diverge, and nodes can end up with different data.

Master-Master is often used in order to failover from one master to another without having to reconfigure replication. All that is needed is to:

  • Stop the traffic from the current master
  • Let the other master sync up (usually almost immediately)
  • Mark the old master read_only, set the new master as read_write (see the sketch after this list)
  • Start traffic to the new master
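A minimal sketch of the read_only flip from the list above, with old-master and new-master used as hypothetical host names:

# on the old master: stop accepting writes
mysql -h old-master -e "SET GLOBAL read_only = ON;"
# on the new master: start accepting writes
mysql -h new-master -e "SET GLOBAL read_only = OFF;"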

Where do you maintain the cluster state?

All technologies automatically maintain the cluster state as you add and remove nodes.

What are the network/IP requirements for Proxy SQL?

There are no specific requirements. More documentation about ProxySQL can be found here: https://github.com/sysown/proxysql/wiki.

by Kenny Gryp at June 23, 2017 06:45 PM

MariaDB AB

Schema Sharding with MariaDB MaxScale 2.1 - Part 1

Wagner Bianchi, Fri, 06/23/2017 - 08:45

Most of the time when you start a database design you don’t imagine how your applications need to scale. Sometimes, you need to shard your databases among some different hosts and then, on each shard you want to split reads and writes between master and slaves. This blog is about MariaDB MaxScale being able to handle different databases across shards, and splitting up reads and writes into each of the shards to achieve the maximum level of horizontal scalability with MariaDB Server.

After reading this blog you will be able to:

  • Know how MariaDB MaxScale handles shards with SchemaRouter router;

  • Know how MariaDB MaxScale handles the split of reads and writes with ReadWriteSplit router;

  • Know how we can combine both of the mentioned routers in order to have a scalable environment.

Introduction and Scenario

I have worked with customers and their environments where it was common to see them simply duplicating the whole database structure to make their systems work. Instead of having a single database with clients identified by an ID, some companies have systems where, for each new client, they just add a new database with empty tables and start the new client on the system - the tables are all the same.

This way, you can imagine that after some time the database servers will have thousands of database schemas with the same structure, serving different clients. Eventually a capacity problem can arise, and a known resolution for this problem is to start sharding the database schemas across more servers.

So, each database server now hosts the databases of a subset of clients, and so forth. This is very interesting because the main benefit of using shards is to mitigate the pressure on a single server and spread the load over more servers, with client A's database located on shard_a and client B's database located on shard_b (for the record, I'm considering an environment with more than 600 databases per server and, as said before, I have seen customers using up to 6 to 8 shards, sharding client databases this way). It is not just the number of databases or schemas, but also the number of tables each of these schemas has (I'm talking about the point where querying INFORMATION_SCHEMA becomes a complete nightmare and can really crash a database server).

On top of this, we know very well that one of the magic things related to scalability in the MySQL world is to split reads and writes; at least, it is a good start when it can be done. Writes go only to the master server, while all reads go to the slave, or slaves in case the capacity of a single slave is reached. With that, each of the shards we are talking about here should have at least one slave which will be used for reads, while writes go to the master, as said previously. With this in mind, I present the following scenario which I will be working with in this blog post:

maxscale - 192.168.50.100 (MaxScale Server)
 
#: SHARD A
box01 - 192.168.50.11 (Shard_A’s master)
    |
    +——— box02 - 192.168.50.12 (Shard_A’s slave)
 
#: SHARD B
box03 - 192.168.50.13 (Shard_B’s master)
    |
    +——— box04 -192.168.50.14 (Shard_B’s slave)

Above you can see that I set up one server to run MaxScale and four others to run two shards, each holding different databases or schemas. The nodes box01 through box04 are running MariaDB 10.1.21 (you can also run this on MariaDB 10.2.6, the most recent release of MariaDB Server), set up from the MariaDB YUM repository for CentOS 7. You can set up the CentOS repository, as well as download the MaxScale rpm package. I would like to highlight that only the newer version of MariaDB MaxScale is able to combine modules as we are going to present here: SchemaRouter and ReadWriteSplit.

The replication between the servers in each shard is something you need to set up yourself. Once the servers are all up (I'm talking about the database nodes box01 to box04), you can create a replication user on box01 and box03 and then run the CHANGE MASTER TO on the slaves, as shown below:

#: create replication user on shard_a
sudo mysql -e "grant replication client, replication slave on *.* to repl@'192.168.%' identified by 'XXXXXX';"
 
#: create replication user on shard_b
sudo mysql -e "grant replication client, replication slave on *.* to repl@'192.168.%' identified by 'XXXXXX';"
 
#: configure replication for shard_a on box02
sudo mysql -e "change master to master_host='192.168.50.11',master_user='repl',master_password='XXXXXX',master_use_gtid=slave_pos,master_connect_retry=5;"
sudo mysql -e "start slave;"
 
#: configure replication for shard_b on box04
sudo mysql -e "change master to master_host='192.168.50.13',master_user='repl',master_password='XXXXXX',master_use_gtid=slave_pos,master_connect_retry=5;"
 sudo mysql -e "start slave;"

MariaDB MaxScale 2.1.3 Setup

Talking about MaxScale, we are running version 2.1.3, which supports the combination of the SchemaRouter and ReadWriteSplit routers. To set up MaxScale on CentOS 7, you can download the rpm package from the MariaDB repository and install it as below:

#: setup maxscale
sudo rpm -Uvih https://downloads.mariadb.com/MaxScale/2.1.3/rhel/7server/x86_64/maxscale-2.1.3-1.rhel.7.x86_64.rpm

Create the encrypted password using maxkeys and maxpasswd if you want to encrypt the password of the user that accesses the database servers: => https://mariadb.com/kb/en/mariadb-enterprise/6360/

#: generate keys
[root@maxscale ~]# maxkeys
Generating .secrets file in /var/lib/maxscale.
 
#: generate the hexadecimal encrypted password to add to the maxscale config file
[root@maxscale ~]# maxpasswd /var/lib/maxscale/ XXXXXX
DF5822F1038A154FEB68E667740B1160
 
#: change .secrets file ownership to permit maxscale user to access it
[root@maxscale log]# chown maxscale:maxscale /var/lib/maxscale/.secrets

After this, add the following configuration file that will combine modules, considering the above servers arrangement, to make the SchemaRouter and ReadWriteSplit happen:

#: maxscale configuration file - /etc/maxscale.cnf
[maxscale]
threads=16
skip_permission_checks=true
 
# Shard-A
 
[Shard-A-Monitor]
type=monitor
module=mysqlmon
servers=server1,server2
user=maxmon
passwd=DF5822F1038A154FEB68E667740B1160
monitor_interval=10000
 
[Shard-A-Router]
type=service
router=readwritesplit
servers=server1,server2
user=maxuser
passwd=DF5822F1038A154FEB68E667740B1160
 
[Shard-A-Listener]
type=listener
service=Shard-A-Router
protocol=MySQLClient
port=4006
 
[server1]
type=server
address=192.168.50.11
port=3306
protocol=MySQLBackend
 
[server2]
type=server
address=192.168.50.12
port=3306
protocol=MySQLBackend
 
# Shard-B
 
[Shard-B-Monitor]
type=monitor
module=mysqlmon
servers=server3,server4
user=maxmon
passwd=DF5822F1038A154FEB68E667740B1160
monitor_interval=10000
 
[Shard-B-Router]
type=service
router=readwritesplit
servers=server3,server4
user=maxuser
passwd=DF5822F1038A154FEB68E667740B1160
 
[Shard-B-Listener]
type=listener
service=Shard-B-Router
protocol=MySQLClient
port=4007
 
[server3]
type=server
address=192.168.50.13
port=3306
protocol=MySQLBackend
 
[server4]
type=server
address=192.168.50.14
port=3306
protocol=MySQLBackend
 
# The two services abstracted as servers
 
[Shard-A]
type=server
address=127.0.0.1
port=4006
protocol=MySQLBackend
 
[Shard-B]
type=server
address=127.0.0.1
port=4007
protocol=MySQLBackend
 
# The sharding service
 
[Sharding-Service]
type=service
router=schemarouter
servers=Shard-A,Shard-B
user=maxuser
passwd=DF5822F1038A154FEB68E667740B1160
router_options=refresh_interval=10
 
[Sharding-Listener]
type=listener
service=Sharding-Service
protocol=MySQLClient
port=3306
 
# Command line interface used by Maxadmin
 
[CLI]
type=service
router=cli

[CLI-Listener]
type=listener
service=CLI
protocol=maxscaled
socket=default

Of course, there are other variables and settings that should be considered in the monitor sections, especially if you want to set up automatic failover when a backend server fails. But that is part of a future blog (for more info about how to mix together MariaDB MaxScale and Replication Manager for automatic failover, click here).
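
As an illustration only, and not part of the configuration file above, the mysqlmon monitor sections can additionally enable replication lag detection and stale-master handling; both options below are optional mysqlmon parameters:

#: hypothetical additions to a monitor section such as [Shard-A-Monitor]
detect_replication_lag=true
detect_stale_master=true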

Additionally, make sure you create the users MaxScale needs in order to access the database servers when it starts up. The users below are created on both masters, box01 and box03, so that they exist on both shards:

sudo mysql -e "grant all on *.* to maxmon@'%'  identified by 'XXXXXX’;"
sudo mysql -e "grant all on *.* to maxuser@'%' identified by 'XXXXXX’;"
 
sudo mysql -e "grant all on *.* to maxmon@'127.0.0.1'  identified by 'XXXXXX’;"
sudo mysql -e "grant all on *.* to maxuser@'127.0.0.1' identified by 'XXXXXX’;"
 
sudo mysql -e "grant all on *.* to maxmon@'192.168.50.100'  identified by 'XXXXXX’;"
sudo mysql -e "grant all on *.* to maxuser@'192.168.50.100' identified by 'XXXXXX’;"

With this, you can start MaxScale:

[root@maxscale log]# systemctl start maxscale
...
MariaDB MaxScale  /var/log/maxscale/maxscale.log  Mon Mar  6 14:04:04 2017
----------------------------------------------------------------------------
2017-03-06 14:04:04   notice : Working directory: /var/log/maxscale
2017-03-06 14:04:04   notice : MariaDB MaxScale 2.1.3 started
2017-03-06 14:04:04   notice : MaxScale is running in process 4203
2017-03-06 14:04:04   notice : Configuration file: /etc/maxscale.cnf
2017-03-06 14:04:04   notice : Log directory: /var/log/maxscale
2017-03-06 14:04:04   notice : Data directory: /var/lib/maxscale
2017-03-06 14:04:04   notice : Module directory: /usr/lib64/maxscale
2017-03-06 14:04:04   notice : Service cache: /var/cache/maxscale
2017-03-06 14:04:04   notice : Loading /etc/maxscale.cnf.
2017-03-06 14:04:04   notice : /etc/maxscale.cnf.d does not exist, not reading.
2017-03-06 14:04:04   notice : [cli] Initialise CLI router module
2017-03-06 14:04:04   notice : Loaded module cli: V1.0.0 from /usr/lib64/maxscale/libcli.so
2017-03-06 14:04:04   notice : [schemarouter] Initializing Schema Sharding Router.
2017-03-06 14:04:04   notice : Loaded module schemarouter: V1.0.0 from /usr/lib64/maxscale/libschemarouter.so
2017-03-06 14:04:04   notice : [readwritesplit] Initializing statement-based read/write split router module.
2017-03-06 14:04:04   notice : Loaded module readwritesplit: V1.1.0 from /usr/lib64/maxscale/libreadwritesplit.so
2017-03-06 14:04:04   notice : [mysqlmon] Initialise the MySQL Monitor module.
2017-03-06 14:04:04   notice : Loaded module mysqlmon: V1.5.0 from /usr/lib64/maxscale/libmysqlmon.so
2017-03-06 14:04:04   notice : Loaded module MySQLBackend: V2.0.0 from /usr/lib64/maxscale/libMySQLBackend.so
2017-03-06 14:04:04   notice : Loaded module MySQLBackendAuth: V1.0.0 from /usr/lib64/maxscale/libMySQLBackendAuth.so
2017-03-06 14:04:04   notice : Loaded module maxscaled: V2.0.0 from /usr/lib64/maxscale/libmaxscaled.so
2017-03-06 14:04:04   notice : Loaded module MaxAdminAuth: V2.1.3 from /usr/lib64/maxscale/libMaxAdminAuth.so
2017-03-06 14:04:04   notice : Loaded module MySQLClient: V1.1.0 from /usr/lib64/maxscale/libMySQLClient.so
2017-03-06 14:04:04   notice : Loaded module MySQLAuth: V1.1.0 from /usr/lib64/maxscale/libMySQLAuth.so
2017-03-06 14:04:04   notice : No query classifier specified, using default 'qc_sqlite'.
2017-03-06 14:04:04   notice : Loaded module qc_sqlite: V1.0.0 from /usr/lib64/maxscale/libqc_sqlite.so
2017-03-06 14:04:04   notice : Using encrypted passwords. Encryption key: '/var/lib/maxscale/.secrets'.
2017-03-06 14:04:04   notice : [MySQLAuth] [Shard-A-Router] Loaded 4 MySQL users for listener Shard-A-Listener.
2017-03-06 14:04:04   notice : Listening connections at 0.0.0.0:4006 with protocol MySQL
2017-03-06 14:04:04   notice : [MySQLAuth] [Shard-B-Router] Loaded 4 MySQL users for listener Shard-B-Listener.
2017-03-06 14:04:04   notice : Listening connections at 0.0.0.0:4007 with protocol MySQL
2017-03-06 14:04:04   notice : [schemarouter] Authentication data is fetched from all servers. To disable this add 'auth_all_servers=0' to the service.
2017-03-06 14:04:04   notice : Server changed state: server3[192.168.50.13:3306]: new_master. [Running] -> [Master, Running]
2017-03-06 14:04:04   notice : Server changed state: server4[192.168.50.14:3306]: new_slave. [Running] -> [Slave, Running]
2017-03-06 14:04:04   notice : [mysqlmon] A Master Server is now available: 192.168.50.13:3306
2017-03-06 14:04:04   notice : Server changed state: server1[192.168.50.11:3306]: new_master. [Running] -> [Master, Running]
2017-03-06 14:04:04   notice : Server changed state: server2[192.168.50.12:3306]: new_slave. [Running] -> [Slave, Running]
2017-03-06 14:04:04   notice : [mysqlmon] A Master Server is now available: 192.168.50.11:3306
2017-03-06 14:04:10   notice : Listening connections at 0.0.0.0:3306 with protocol MySQL
2017-03-06 14:04:10   notice : Listening connections at /tmp/maxadmin.sock with protocol MaxScale Admin
2017-03-06 14:04:10   notice : MaxScale started with 1 server threads.
2017-03-06 14:04:10   notice : Started MaxScale log flusher.

An additional user needs to be created for the application, which will connect to MaxScale, and MaxScale will do the routing based on the configured modules. I suggest you create a user like the one below (if your application needs more than these privileges, add the ones you need, or create a ROLE and grant that ROLE to the user):

#: create the user for the app to connect to db nodes through MaxScale
sudo mysql -e "grant all on *.* to appuser@'%' identified by 'XXXXXX’;"
sudo mysql -e "grant all on *.* to appuser@'127.0.0.1' identified by 'XXXXXX’;"
sudo mysql -e "grant all on *.* to appuser@'192.168.50.100' identified by 'XXXXXX’;"

At this point, we can use MaxAdmin on the MaxScale server to review the configuration we have so far:

#: configured services
[root@maxscale ~]# maxadmin list services
Services.
--------------------------+-------------------+--------+----------------+-------------------
Service Name              | Router Module     | #Users | Total Sessions | Backend databases
--------------------------+-------------------+--------+----------------+-------------------
Shard-A-Router            | readwritesplit    |      1 |              1 | server1, server2
Shard-B-Router            | readwritesplit    |      1 |              1 | server3, server4
Sharding-Service          | schemarouter      |      1 |              1 | Shard-A, Shard-B
CLI                       | cli               |      2 |              2 |
--------------------------+-------------------+--------+----------------+-------------------
 
#: database nodes and their status on shards
[root@maxscale ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | 192.168.50.11   |  3306 |           0 | Master, Running
server2            | 192.168.50.12   |  3306 |           0 | Slave, Running
server3            | 192.168.50.13   |  3306 |           0 | Master, Running
server4            | 192.168.50.14   |  3306 |           0 | Slave, Running
Shard-A            | 192.168.50.11   |  3306 |           0 | Running
Shard-B            | 192.168.50.13   |  3306 |           0 | Running
-------------------+-----------------+-------+-------------+--------------------

MaxAdmin can also show statistics about the modules and all the work they have done so far. This is just to record the baseline, since we haven’t tested the services yet; that happens in the next section. Below you can see the statistics of the configured services:

#: Shard-A-Router (readwritesplit)
[root@maxscale log]# maxadmin show service "Shard-A-Router"
     Service:                             Shard-A-Router
     Router:                              readwritesplit
     State:                               Started
 
     use_sql_variables_in:      all
     slave_selection_criteria:  LEAST_CURRENT_OPERATIONS
     master_failure_mode:       fail_instantly
     max_slave_replication_lag: -1
     retry_failed_reads:        true
     strict_multi_stmt:         true
     disable_sescmd_history:    true
     max_sescmd_history:        0
     master_accept_reads:       false
 
     Number of router sessions:                1
     Current no. of router sessions:           1
     Number of queries forwarded:               2
     Number of queries forwarded to master:     0 (0.00%)
     Number of queries forwarded to slave:      2 (100.00%)
     Number of queries forwarded to all:        1 (50.00%)
     Started:                             Mon Mar  6 01:37:27 2017
     Root user access:                    Disabled
     Backend databases:
          192.168.50.11:3306    Protocol: MySQLBackend    Name: server1
          192.168.50.12:3306    Protocol: MySQLBackend    Name: server2
     Total connections:                   2
     Currently connected:                 1
 
#: Shard-B-Router (readwritesplit)
[root@maxscale log]# maxadmin show service "Shard-B-Router"
     Service:                             Shard-B-Router
     Router:                              readwritesplit
     State:                               Started
 
     use_sql_variables_in:      all
     slave_selection_criteria:  LEAST_CURRENT_OPERATIONS
     master_failure_mode:       fail_instantly
     max_slave_replication_lag: -1
     retry_failed_reads:        true
     strict_multi_stmt:         true
     disable_sescmd_history:    true
     max_sescmd_history:        0
     master_accept_reads:       false
 
     Number of router sessions:                1
     Current no. of router sessions:           1
     Number of queries forwarded:               2
     Number of queries forwarded to master:     0 (0.00%)
     Number of queries forwarded to slave:      2 (100.00%)
     Number of queries forwarded to all:        1 (50.00%)
     Started:                             Mon Mar  6 01:37:27 2017
     Root user access:                    Disabled
     Backend databases:
          192.168.50.13:3306    Protocol: MySQLBackend    Name: server3
          192.168.50.14:3306    Protocol: MySQLBackend    Name: server4
     Total connections:                   2
     Currently connected:                 1
 
#: Sharding-Service
[root@maxscale log]# maxadmin show service "Sharding-Service"
     Service:                             Sharding-Service
     Router:                              schemarouter
     State:                               Started
 
Session Commands
Total number of queries: 0
Percentage of session commands: 0.00
Longest chain of stored session commands: 0
Session command history limit exceeded: 0 times
Session command history: enabled
Session command history limit: unlimited
Shard map cache hits: 0
Shard map cache misses: 0
 
     Started:                             Mon Mar  6 01:37:27 2017
     Root user access:                    Disabled
     Backend databases:
          192.168.50.11:3306    Protocol: MySQLBackend    Name: Shard-A
          192.168.50.13:3306    Protocol: MySQLBackend    Name: Shard-B
     Total connections:                   1
     Currently connected:                 1

Conclusion

The main goal of this first part is to show that we can combine the SchemaRouter and ReadWriteSplit routers to get a robust solution for sharding databases among a number of servers, initially without changing any code on the application side. The whole job can be done with MariaDB MaxScale configured the way I showed in this blog. The second part, coming up next, will focus on the mechanics of the routing and the details of how it works.

Most of the time, when you start a database design, you don’t imagine how your applications will need to scale. Sometimes you need to shard your databases among several different hosts, and then, within each shard, split reads and writes between the master and its slaves. This blog is about how MariaDB MaxScale can handle different databases across shards, and split reads and writes within each shard, to achieve the maximum level of horizontal scalability with MariaDB Server.

After reading this blog you will be able to:

  • Know how MariaDB MaxScale handles shards with the SchemaRouter router;

  • Know how MariaDB MaxScale handles the split of reads and writes with the ReadWriteSplit router;

  • Know how we can combine both of these routers in order to have a scalable environment.


by Wagner Bianchi at June 23, 2017 12:45 PM

June 22, 2017

Peter Zaitsev

ClickHouse in a General Analytical Workload (Based on a Star Schema Benchmark)

In this blog post, we’ll look at how ClickHouse performs in a general analytical workload using the star schema benchmark test.

We have mentioned ClickHouse in some recent posts (ClickHouse: New Open Source Columnar Database, Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark), where it showed excellent results. ClickHouse by itself seems to be an event-oriented RDBMS, as its name suggests (clicks). Its primary use case, Yandex Metrica (a system similar to Google Analytics), also points to an event-based nature. We can also see there is a requirement for date-stamped columns.

It is possible, however, to use ClickHouse in a general analytical workload. This blog post shares my findings. For these tests, I used a Star Schema Benchmark, slightly modified so that it can handle ClickHouse specifics.

First, let’s talk about schemas. We need to adjust to ClickHouse data types. For example, the biggest fact table in SSB is “lineorder”. Below is how it is defined for Amazon RedShift (as taken from https://docs.aws.amazon.com/redshift/latest/dg/tutorial-tuning-tables-create-test-data.html):

CREATE TABLE lineorder
(
  lo_orderkey          INTEGER NOT NULL,
  lo_linenumber        INTEGER NOT NULL,
  lo_custkey           INTEGER NOT NULL,
  lo_partkey           INTEGER NOT NULL,
  lo_suppkey           INTEGER NOT NULL,
  lo_orderdate         INTEGER NOT NULL,
  lo_orderpriority     VARCHAR(15) NOT NULL,
  lo_shippriority      VARCHAR(1) NOT NULL,
  lo_quantity          INTEGER NOT NULL,
  lo_extendedprice     INTEGER NOT NULL,
  lo_ordertotalprice   INTEGER NOT NULL,
  lo_discount          INTEGER NOT NULL,
  lo_revenue           INTEGER NOT NULL,
  lo_supplycost        INTEGER NOT NULL,
  lo_tax               INTEGER NOT NULL,
  lo_commitdate        INTEGER NOT NULL,
  lo_shipmode          VARCHAR(10) NOT NULL
);

For ClickHouse, the table definition looks like this:

CREATE TABLE lineorderfull (
        LO_ORDERKEY             UInt32,
        LO_LINENUMBER           UInt8,
        LO_CUSTKEY              UInt32,
        LO_PARTKEY              UInt32,
        LO_SUPPKEY              UInt32,
        LO_ORDERDATE            Date,
        LO_ORDERPRIORITY        String,
        LO_SHIPPRIORITY         UInt8,
        LO_QUANTITY             UInt8,
        LO_EXTENDEDPRICE        UInt32,
        LO_ORDTOTALPRICE        UInt32,
        LO_DISCOUNT             UInt8,
        LO_REVENUE              UInt32,
        LO_SUPPLYCOST           UInt32,
        LO_TAX                  UInt8,
        LO_COMMITDATE           Date,
        LO_SHIPMODE             String
)Engine=MergeTree(LO_ORDERDATE,(LO_ORDERKEY,LO_LINENUMBER),8192);

From this we can see that we need to use datatypes like UInt8 and UInt32, which are somewhat unusual compared to the standard datatypes of the database world.

The second table (RedShift definition):

CREATE TABLE customer
(
  c_custkey      INTEGER NOT NULL,
  c_name         VARCHAR(25) NOT NULL,
  c_address      VARCHAR(25) NOT NULL,
  c_city         VARCHAR(10) NOT NULL,
  c_nation       VARCHAR(15) NOT NULL,
  c_region       VARCHAR(12) NOT NULL,
  c_phone        VARCHAR(15) NOT NULL,
  c_mktsegment   VARCHAR(10) NOT NULL
);

For ClickHouse, I defined it as:

CREATE TABLE customerfull (
        C_CUSTKEY       UInt32,
        C_NAME          String,
        C_ADDRESS       String,
        C_CITY          String,
        C_NATION        String,
        C_REGION        String,
        C_PHONE         String,
        C_MKTSEGMENT    String,
        C_FAKEDATE      Date
)Engine=MergeTree(C_FAKEDATE,(C_CUSTKEY),8192);

For reference, the full schema for the benchmark is here: https://github.com/vadimtk/ssb-clickhouse/blob/master/create.sql.

For this table, we need to define an artificial column, C_FAKEDATE Date, in order to use ClickHouse’s most advanced engine (MergeTree). I was told by the ClickHouse team that they plan to remove this limitation in the future.

To generate data acceptable by ClickHouse, I made modifications to ssb-dbgen. You can find my version here: https://github.com/vadimtk/ssb-dbgen. The most notable change is that ClickHouse can’t accept dates in CSV files formatted as “19971125”. It has to be “1997-11-25”. This is something to keep in mind when loading data into ClickHouse.

It is possible to do some preformating on the load, but I don’t have experience with that. A common approach is to create the staging table with datatypes that match loaded data, and then convert them using SQL functions when inserting to the main table.
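As a minimal sketch of that approach (the staging table name, the main table name, the Log engine choice and the reduced column list are all illustrative, not part of the benchmark schema), assuming LO_ORDERDATE arrives as a 'YYYYMMDD' string:

-- staging table: the date is kept as a raw '19971125'-style string
CREATE TABLE lineorder_stage (
        LO_ORDERKEY     UInt32,
        LO_ORDERDATE    String,
        LO_REVENUE      UInt32
) ENGINE = Log;

-- convert the string to a proper Date while copying into the main table
INSERT INTO lineorder_main
SELECT
    LO_ORDERKEY,
    toDate(concat(substring(LO_ORDERDATE, 1, 4), '-',
                  substring(LO_ORDERDATE, 5, 2), '-',
                  substring(LO_ORDERDATE, 7, 2))) AS LO_ORDERDATE,
    LO_REVENUE
FROM lineorder_stage;
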

Hardware Setup

One of the goals of this benchmark to see how ClickHouse scales on multiple nodes. I used a setup of one node, and then compared to a setup of three nodes. Each node is 24 cores of “Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz” CPUs, and the data is located on a very fast PCIe Flash storage.

For the SSB benchmark I use a scale factor of 2500, which provides (in raw data):

Table lineorder – 15 bln rows, raw size 1.7TB, Table customer – 75 mln rows

When loaded into ClickHouse, the table lineorder takes 464GB, which corresponds to a 3.7x compression ratio.
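
If you want to check the on-disk size yourself, one way (my own approach, not part of the benchmark) is to sum the active parts of the table in system.parts:

SELECT formatReadableSize(sum(bytes)) AS size_on_disk
FROM system.parts
WHERE active AND table = 'lineorderfull'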

We compare a one-node (table names lineorderfull, customerfull) setup vs. a three-node (table names lineorderd, customerd) setup.

Single Table Operations

Query:

SELECT
    toYear(LO_ORDERDATE) AS yod,
    sum(LO_REVENUE)
FROM lineorderfull
GROUP BY yod

One node:

7 rows in set. Elapsed: 9.741 sec. Processed 15.00 billion rows, 90.00 GB (1.54 billion rows/s., 9.24 GB/s.)

Three nodes:

7 rows in set. Elapsed: 3.258 sec. Processed 15.00 billion rows, 90.00 GB (4.60 billion rows/s., 27.63 GB/s.)

We see a speed up of practically three times. Handling 4.6 billion rows/s is blazingly fast!

One Table with Filtering

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorderfull
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)

One node:

1 rows in set. Elapsed: 3.175 sec. Processed 2.28 billion rows, 18.20 GB (716.60 million rows/s., 5.73 GB/s.)

Three nodes:

1 rows in set. Elapsed: 1.295 sec. Processed 2.28 billion rows, 18.20 GB (1.76 billion rows/s., 14.06 GB/s.)

It’s worth mentioning that during the execution of this query, ClickHouse was able to use ALL 24 cores on each box. This confirms that ClickHouse is a massively parallel processing system.

Two Tables (Independent Subquery)

In this case, I want to show how ClickHouse handles independent subqueries:

SELECT sum(LO_REVENUE)
FROM lineorderfull
WHERE LO_CUSTKEY IN
(
    SELECT C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
    WHERE C_REGION = 'ASIA'
)

One node:

1 rows in set. Elapsed: 28.934 sec. Processed 15.00 billion rows, 120.00 GB (518.43 million rows/s., 4.15 GB/s.)

Three nodes:

1 rows in set. Elapsed: 14.189 sec. Processed 15.12 billion rows, 121.67 GB (1.07 billion rows/s., 8.57 GB/s.)

We do not see, however, a close-to-3x speedup on three nodes, because of the data transfer required to match LO_CUSTKEY with C_CUSTKEY.

Two Tables JOIN

With a subquery using columns to return results, or for GROUP BY, things get more complicated. In this case we want to GROUP BY the column from the second table.

First, ClickHouse doesn’t support traditional subquery syntax, so we need to use JOIN. For JOINs, ClickHouse also strictly prescribes how it must be written (a limitation that will also get changed in the future). Our JOIN should look like:

SELECT
    C_REGION,
    sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorderfull
ANY INNER JOIN
(
    SELECT
        C_REGION,
        C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
) USING (LO_CUSTKEY)
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
GROUP BY C_REGION

One node:

5 rows in set. Elapsed: 31.443 sec. Processed 2.35 billion rows, 28.79 GB (74.75 million rows/s., 915.65 MB/s.)

Three nodes:

5 rows in set. Elapsed: 25.160 sec. Processed 2.58 billion rows, 33.25 GB (102.36 million rows/s., 1.32 GB/s.)

In this case the speedup is not even two times. This is a consequence of the random data distribution for the tables lineorderd and customerd. Both tables were defined as:

CREATE TABLE lineorderd AS lineorder ENGINE = Distributed(3shards, default, lineorder, rand());
CREATE TABLE customerd AS customer ENGINE = Distributed(3shards, default, customer, rand());

Here rand() means that records are distributed randomly across the three nodes. When we perform a JOIN by LO_CUSTKEY=C_CUSTKEY, matching records might be located on different nodes. One way to deal with this is to distribute the data by the join key, so that matching records end up on the same node. For example:

CREATE TABLE lineorderLD AS lineorderL ENGINE = Distributed(3shards, default, lineorderL, LO_CUSTKEY);
CREATE TABLE customerLD AS customerL ENGINE = Distributed(3shards, default, customerL, C_CUSTKEY);

Three Tables JOIN

This is where it becomes very complicated. Let’s consider the query that you would normally write:

SELECT sum(LO_REVENUE),P_MFGR, toYear(LO_ORDERDATE) yod FROM lineorderfull ,customerfull,partfull WHERE C_REGION = 'ASIA' and
LO_CUSTKEY=C_CUSTKEY and P_PARTKEY=LO_PARTKEY GROUP BY P_MFGR,yod ORDER BY P_MFGR,yod;

With ClickHouse’s limitations on JOIN syntax, the query becomes:

SELECT
    sum(LO_REVENUE),
    P_MFGR,
    toYear(LO_ORDERDATE) AS yod
FROM
(
    SELECT
        LO_PARTKEY,
        LO_ORDERDATE,
        LO_REVENUE
    FROM lineorderfull
    ALL INNER JOIN
    (
        SELECT
            C_REGION,
            C_CUSTKEY AS LO_CUSTKEY
        FROM customerfull
    ) USING (LO_CUSTKEY)
    WHERE C_REGION = 'ASIA'
)
ALL INNER JOIN
(
    SELECT
        P_MFGR,
        P_PARTKEY AS LO_PARTKEY
    FROM partfull
) USING (LO_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC

By writing queries this way, we force ClickHouse to use the prescribed JOIN order — at this moment there is no optimizer in ClickHouse and it is totally unaware of data distribution.

There is also not much speedup when we compare one node vs. three nodes:

One node execution time:

35 rows in set. Elapsed: 697.806 sec. Processed 15.08 billion rows, 211.53 GB (21.61 million rows/s., 303.14 MB/s.)

Three nodes execution time:

35 rows in set. Elapsed: 622.536 sec. Processed 15.12 billion rows, 211.71 GB (24.29 million rows/s., 340.08 MB/s.)

There is a way to make the query faster for this 3-way JOIN, however. (Thanks to Alexander Zaytsev from https://www.altinity.com/ for help!)

Optimized query:

SELECT
    sum(revenue),
    P_MFGR,
    yod
FROM
(
    SELECT
        LO_PARTKEY AS P_PARTKEY,
        toYear(LO_ORDERDATE) AS yod,
        SUM(LO_REVENUE) AS revenue
    FROM lineorderfull
    WHERE LO_CUSTKEY IN
    (
        SELECT C_CUSTKEY
        FROM customerfull
        WHERE C_REGION = 'ASIA'
    )
    GROUP BY
        P_PARTKEY,
        yod
)
ANY INNER JOIN partfull USING (P_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC

Optimized query time:

One node:

35 rows in set. Elapsed: 106.732 sec. Processed 15.00 billion rows, 210.05 GB (140.56 million rows/s., 1.97 GB/s.)

Three nodes:

35 rows in set. Elapsed: 75.854 sec. Processed 15.12 billion rows, 211.71 GB (199.36 million rows/s., 2.79 GB/s.)

That’s an improvement of about 6.5 times compared to the original query. This shows the importance of understanding data distribution, and writing the optimal query to process the data.

Another option for dealing with JOIN complexity, and to improve performance, is to use ClickHouse’s dictionaries. These dictionaries are described here: https://www.altinity.com/blog/2017/4/12/dictionaries-explained.

I will review dictionary performance in future posts.

Another traditional way to deal with JOIN complexity in an analytics workload is to use denormalization. We can move some columns (for example, P_MFGR from the last query) to the facts table (lineorder).

Observations

  • ClickHouse can handle general analytical queries (it requires special schema design and considerations, however)
  • Linear speedup is possible, but it depends on query design and requires advanced planning — proper speedup depends on data locality
  • ClickHouse is blazingly fast (beyond what I’ve seen before) because it can use all available CPU cores for a query, as shown above: 24 cores for a single server and 72 cores for three nodes
  • Multi-table JOINs are cumbersome and require manual work to achieve better performance, so consider using dictionaries or denormalization

by Vadim Tkachenko at June 22, 2017 07:20 PM

June 21, 2017

MariaDB AB

Secure Binlog Server: Encrypted binary Logs and SSL Communication


The 2.1.3 GA release of MariaDB MaxScale introduces the following key features for the secure setup of the MaxScale Binlog Server:

  • The binlog cache files in the MaxScale host can now be encrypted.

  • MaxScale binlog server also uses SSL in communication with the master and the slave servers.

The MaxScale binlog server can optionally encrypt the events received from the master server: the setup requires a MariaDB master server (10.1.7 or later) with encryption active and the mariadb10-compatibility=On option set in maxscale.cnf. This way both the master and MaxScale will have encrypted events stored in their binlog files. How does binary log encryption work in MariaDB Server and in MaxScale?

Let’s look at MariaDB Server 10.1 or 10.2 implementation first.

  • The encryption is related to stored events and not to their transmission over the network: SSL must be enabled in order to secure the network layer.

  • Each binlog file holds a new special binlog event, type 0xa4 (164), called the START_ENCRYPTION event: it’s written to disk but never sent to connected slave servers.

  • Each binlog event is encrypted/decrypted using an AES key and an Initialization Vector (IV) built from the "nonce" data in the START_ENCRYPTION event and the current event position in the file.

  • The encrypted event has the same size as a non-encrypted event.

Let’s start with MariaDB Server 10.1.7 configuration: my.cnf.

[mysqld]
…
encrypt-binlog=1
plugin-load-add=file_key_management.so
file_key_management_encryption_algorithm=aes_cbc
file_key_management_filename = /some_path/keys.txt

Binlog files can be optionally encrypted by specifying 

encrypt-binlog=1

An encryption key management facility has to be added as well. The easiest one to set up is the file key management plugin:

plugin-load-add=file_key_management.so

With its key file:

file_key_management_filename = /some_path/keys.txt

The keys listed in the key file can be specified per table, but the system XtraDB/InnoDB tablespace and the binary log files always use key number 1, so it must always exist.

The last parameter is the AES encryption algorithm: AES_CBC (the default) or AES_CTR.

file_key_management_encryption_algorithm=aes_cbc | aes_ctr

There is another key management solution (Eperi Gateway for Databases), which allows storing keys in a key server, preventing an attacker with file system access from unauthorized database file reading.

For additional information, please read more about MariaDB data-at-rest encryption.

MariaDB MaxScale Implementation:

The implementation follows the same MariaDB Server implementation above.

  • MaxScale adds its own START_ENCRYPTION event (which has the same size as, but different content from, the one written by the master).

  • The encryption algorithm can be selected: AES_CBC or AES_CTR

  • The event size on disk is the same as the “clear” event.

  • Only key file management is currently supported: no key rotation.

A closer look at the new event might be interesting for most readers:

The START_ENCRYPTION event is 36 or 40 bytes in size, depending on whether CRC32 is used:

  • Replication header 19 bytes +

  • 1 byte encryption schema           // 1 is for system files

  • 4 bytes Encryption Key Version  // It allows key rotation

  • 12 bytes NONCE random bytes // first part of IV

As the event saved on disk has the same size as the clear event, the event size is stored in the clear in the replication header.

Moving some bytes in the replication event header is required in order to encrypt/decrypt the event header and content.

MariaDB MaxScale configuration

Encrypt_binlog = On|Off
Encryption_algorithm = aes_ctr OR aes_cbc
Encryption_key_file =/maxscale_path/enc_key.txt

The specified key file must have this format: a line with 1;HEX(KEY)

Id is the scheme identifier, which must have the value 1 for binlog encryption, ';' is a separator and HEX(KEY) contains the hex representation of the KEY. The KEY must be exactly 16, 24 or 32 bytes long, and the selected algorithm (aes_ctr or aes_cbc) will then be used with a 128-, 192- or 256-bit cipher accordingly.

Note: the key file has the same format as in MariaDB Server 10.1, so it’s possible to reuse an existing (unencrypted) key file that contains several id;key entries: only the key with id 1 will be parsed, and if it is not found, an error will be reported.
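
One possible way to produce such an entry (a sketch only; the path matches the configuration above, and the file should of course be readable by the maxscale user only):

#: generate a random 32-byte key in hex for key id 1
echo "1;$(openssl rand -hex 32)" > /maxscale_path/enc_key.txt
chmod 600 /maxscale_path/enc_key.txt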

Example:

#
# This is the Encryption Key File
# key id 1 is for binlog files encryption: it's mandatory
# The keys come from a 32bytes value, 64 bytes with HEX format
#
2;abcdef1234567890abcdef12345678901234567890abcdefabcdef1234567890
1;5132bbabcde33ffffff12345ffffaaabbbbbbaacccddeee11299000111992aaa
3;bbbbbbbbbaaaaaaabbbbbccccceeeddddd3333333ddddaaaaffffffeeeeecccd
 

See all the encryption options in “router options”

[BinlogServer]
type=service
router=binlogrouter
version_string=10.1.17-log
router_options=server-id=93,mariadb10-compatibility=On,encrypt_binlog=On,encryption_key_file=/home/maxscale/binlog/keys.txt,encryption_algorithm=aes_cbc

Enable SSL communication to Master and from connecting Slaves

We have seen how to secure data-at-rest and we would like to show how to secure all the components in the Binlog Server setup:

  • Master

  • MaxScale

  • Slaves

  • Network connections

SSL between the Master and MaxScale is required, along with SSL communications with the slave servers.

How to use SSL in the master connection?

We have to add some new options to the CHANGE MASTER TO command issued to the MySQL listener of the Binlog Server. The options are used in the following examples:

All options.

MySQL [(none)]> CHANGE MASTER TO MASTER_SSL = 1, MASTER_SSL_CERT='/home/maxscale/packages/certificates/client/client-cert.pem', MASTER_SSL_CA='/home/maxscale/packages/certificates/client/ca.pem', MASTER_SSL_KEY='/home/maxscale/packages/certificates/client/client-key.pem', MASTER_TLS_VERSION='TLSv12';

Or just use some of them:

MySQL [(none)]> CHANGE MASTER TO MASTER_TLS_VERSION='TLSv12';
MySQL [(none)]> CHANGE MASTER TO MASTER_SSL = 0;

Some constraints:

  • In order to enable/re-enable Master SSL communication the MASTER_SSL=1 option is required and all certificate options must be explicitly set in the same CHANGE MASTER TO command.

  • Changes to the certificate options take effect after a MaxScale restart, or after issuing MASTER_SSL=1 together with the new options.

MySQL> SHOW SLAVE STATUS\G
Master_SSL_Allowed: Yes
       Master_SSL_CA_File: /home/mpinto/packages/certificates/client/ca.pem
       Master_SSL_CA_Path:
          Master_SSL_Cert: /home/mpinto/packages/certificates/client/client-cert.pem
        Master_SSL_Cipher:
           Master_SSL_Key: /home/mpinto/packages/certificates/client/client-key.pem

Setting Up Binlog Server Listener for Slave Server

[Binlog_server_listener]
type=listener
service=BinlogServer
protocol=MySQLClient
address=192.168.100.10
port=8808
authenticator=MySQL
ssl=required
ssl_cert=/usr/local/mariadb/maxscale/ssl/crt.maxscale.pem
ssl_key=/usr/local/mariadb/maxscale/ssl/key.csr.maxscale.pem
ssl_ca_cert=/usr/local/mariadb/maxscale/ssl/crt.ca.maxscale.pem
ssl_version=TLSv12

The final step is for each slave that connects to Binlog Server:

The slave itself should connect to MaxScale using SSL options in the CHANGE MASTER TO … SQL command.
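
A minimal sketch of such a command, run on the slave and pointing at the listener defined above (the replication user, certificate paths and binlog file/position values are placeholders you must adapt):

CHANGE MASTER TO
    MASTER_HOST='192.168.100.10', MASTER_PORT=8808,
    MASTER_USER='repl', MASTER_PASSWORD='XXXXXX',
    MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4,
    MASTER_SSL=1,
    MASTER_SSL_CA='/path/to/ca.pem',
    MASTER_SSL_CERT='/path/to/client-cert.pem',
    MASTER_SSL_KEY='/path/to/client-key.pem';
START SLAVE;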

The setup is done! Encryption for data-at-rest and for data-in-transit is now complete!



by massimiliano_pinto_g at June 21, 2017 11:23 PM

Peter Zaitsev

Percona Monitoring and Management 1.1.5 is Now Available

Percona Monitoring and Management (PMM)

Percona announces the release of Percona Monitoring and Management 1.1.5 on June 21, 2017.

For installation instructions, see the Deployment Guide.


Changes in PMM Server

  • PMM-667: Fixed the Latency graph in the ProxySQL Overview dashboard to plot microsecond values instead of milliseconds.

  • PMM-800: Fixed the InnoDB Page Splits graph in the MySQL InnoDB Metrics Advanced dashboard to show correct page merge success ratio.

  • PMM-1007: Added links to Query Analytics from MySQL Overview and MongoDB Overview dashboards. The links also pass selected host and time period values.

    NOTE: These links currently open QAN2, which is still considered experimental.

Changes in PMM Client

  • PMM-931: Fixed pmm-admin script when adding MongoDB metrics monitoring for secondary in a replica set.

About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

by Alexey Zhebel at June 21, 2017 05:58 PM

Tracing MongoDB Queries to Code with Cursor Comments

In this short blog post, we will discuss a helpful feature for tracing MongoDB queries: Cursor Comments.

Cursor Comments

Much like other database systems, MongoDB supports the ability for application developers to set comment strings on their database queries using the Cursor Comment feature. This feature is very useful for both DBAs and developers for quickly and efficiently tying a MongoDB query found on the database server to a line of code in the application source.

Once Cursor Comments are set in application code, they can be seen in the following areas on the server:

  1. The db.currentOp() shell command. If Auth is enabled, this requires a role that has the ‘inprog’ privilege.
  2. Profiles in the system.profile collection (per-db) if profiling is enabled.
  3. The QUERY log component.

Note: the Cursor Comment string shows as the field “query.comment” in the Database Profiler output, and as the field originatingCommand.comment in the output of the db.currentOp() command.

This is fantastic because this makes comments visible in the areas commonly used to find performance issues!

Often it is very easy to find a slow query on the database server, but it is difficult to target the exact area of a large application that triggers the slow query. This can all be changed with Cursor Comments!
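
For instance, once comments are set, you can filter the live operations on the comment field straight from the shell (a sketch, using the field name mentioned in the note above):

// list only the in-progress operations that carry a cursor comment
db.currentOp({ "originatingCommand.comment": { $exists: true } })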

Python Example

Below is a snippet of Python code implementing a cursor comment on a simple query to the collection “test.test”. (Most other languages and MongoDB drivers should work similarly if you do not use Python.)

My goal in this example is to get the MongoDB Profiler to log a custom comment, and then we will read it back manually from the server afterward to confirm it worked.

In this example, I include the following pieces of data in my comment:

  1. The Python class
  2. The Python method that executed the query
  3. The file Python was executing
  4. The line of the file Python was executing

Unfortunately, three of the four useful details above are not built-in variables in Python, so the “inspect” module is required to fetch those details. Using the “inspect” module and setting a cursor comment for every query in an application is a bit clunky, so it is best to create a method to do this. I made a class-method named “find_with_comment” in this example code to do this. This method performs a MongoDB query and sets the cursor comment automagically, finally returning a regular pymongo cursor object.

Below is the simple Python example script. It connects to a Mongod on localhost:27017, and demonstrates everything for us. You can run this script yourself if you have the “pymongo” Python package installed.

Script:

from inspect import currentframe, getframeinfo
from pymongo import MongoClient
class TestClass:
    def find_with_comment(self, conn, query, db, coll):
        frame      = getframeinfo(currentframe().f_back)
        comment    = "%s:%s;%s:%i" % (self.__class__.__name__, frame.function, frame.filename, frame.lineno)
        collection = conn[db][coll]
        return collection.find(query).comment(comment)
    def run(self):
        uri   = "localhost:27017"
        conn  = MongoClient(uri)
        query = {'user.name': 'John Doe'}
        for doc in self.find_with_comment(conn, query, 'test', 'test'):
            print doc
        conn.close()
if __name__  == "__main__":
    t = TestClass()
    t.run()

There are a few things to explain in this code:

  1. Line #6-10: The “find_with_comment” method runs a pymongo query and handles adding our special cursor comment string. This method takes in the connection, query and db+collection names as variables.
  2. Line #7: uses the “inspect” module to read the last Python “frame”, so we can fetch the file and line number that called the query.
  3. Line #12-18: The “run” method makes a database connection, runs the “find_with_comment” method with a query, prints the results and closes the connection. This method is just boilerplate to run the example.
  4. Line #20-21: This code initiates the TestClass and calls the “run” method to run our test.

Trying It Out

Before running this script, enable database profiling mode “2” on the “test” database. This is the database the script will query. The profiling mode “2” causes MongoDB to profile all queries:

$ mongo --port=27017
> use test
switched to db test
> db.setProfilingLevel(2)
{ "was" : 1, "slowms" : 100, "ratelimit" : 1, "ok" : 1 }
> quit()

Now let’s run the script. There should be no output from the script, it is only going to do a find query to generate a Profile.

I saved the script as cursor-comment.py and ran it like this from my Linux terminal:

$ python cursor-comment.py
$

Now, let’s see if we can find any Profiles containing the “query.comment” field:

$ mongo --port=27017
> use test
> db.system.profile.find({ "query.comment": {$exists: true} }, { query: 1 }).pretty()
{
	"query" : {
		"find" : "test",
		"filter" : {
			"user.name" : "John Doe"
		},
		"comment" : "TestClass:run;cursor-comment.py:16"
	}
}

Now we know the exact class, method, file and line number that ran this profiled query! Great!

From this Profile we can conclude that the class-method “TestClass:run” initiated this MongoDB query from Line #16 of cursor-comment.py. Imagine this was a query that slowed down your production system and you need to know the source quickly. The usefulness of this feature/workflow becomes obvious, fast.

More on Python “inspect”

Instead of constructing a custom comment like the example above, you can also use Python “inspect” to collect the Python source code comment that precedes the code that is running. This might be useful for projects that have code comments that would be more useful than class/method/file/line number. As the comment is a string, the sky is the limit on what you can set!

Read about the .getcomments() method of “inspect” here: https://docs.python.org/2/library/inspect.html#inspect.getcomments
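
A tiny sketch of that idea (the function name is hypothetical; the comment right above the function definition is what getcomments() returns):

from inspect import getcomments

# Fetch all users who logged in today (this line is what getcomments() returns)
def find_todays_logins():
    pass

if __name__  == "__main__":
    print getcomments(find_todays_logins)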

Aggregation Comments

MongoDB 3.5/3.6 added support for comments in aggregations. This is a great new feature, as aggregations are often heavy operations that would be useful to tie to a line of code as well!

This can be used by adding a “comment” field to your “aggregate” server command, like so:

db.runCommand({
  aggregate: "myCollection",
  pipeline: [
    { $match: { _id: "foo" } }
  ],
  comment: "fooMatch"
})

See more about this new feature in the following MongoDB tickets: SERVER-28128 and DOCS-10020.

Conclusion

Hopefully this blog gives you some ideas on how this feature can be useful in your application. Start adding comments to your application today!

by Tim Vaillancourt at June 21, 2017 05:16 PM

Jean-Jerome Schmidt

Docker: All the Severalnines Resources

While the idea of containers has been around since the early days of Unix, Docker made waves in 2013 when it hit the market with its innovative solution. Docker, which began as an open source project, allows you to add your stacks and applications to containers where they share a common operating system kernel. This lets you have a lightweight virtualized system with almost zero overhead. Docker also lets you bring containers up or down in seconds, making for rapid deployment of your stack.

Severalnines, like many other companies, got excited early on about Docker and began experimenting and developing ways to deploy advanced open source database configurations using Docker containers. We also released, early on, a docker image of ClusterControl that lets you utilize the management and monitoring functionalities of ClusterControl with your existing database deployments.

Here are just some of the great resources we’ve developed for Docker over the last few years...

Severalnines on Docker Hub

In addition to the ClusterControl Docker Image, we have also provided a series of images to help you get started on Docker with other open source database technologies like Percona XtraDB Cluster and MariaDB.

Check Out the Docker Images

ClusterControl on Docker Documentation

For detailed instructions on how to install ClusterControl utilizing the Docker Image click on the link below.

Read More

Top Blogs

MySQL on Docker: Running Galera Cluster on Kubernetes

In our previous posts, we showed how one can run Galera Cluster on Docker Swarm, and discussed some of the limitations with regards to production environments. Kubernetes is widely used as an orchestration tool, and we’ll see whether we can leverage it to achieve a production-grade Galera Cluster on Docker.

Read More

MySQL on Docker: Swarm Mode Limitations for Galera Cluster in Production Setups

This blog post explains some of the Docker Swarm Mode limitations in handling Galera Cluster natively in production environments.

Read More

MySQL on Docker: Composing the Stack

Docker 1.13 introduces a long-awaited feature called Compose-file support. Compose-file defines everything about an application - services, databases, volumes, networks, and dependencies can all be defined in one place. In this blog, we’ll show you how to use Compose-file to simplify the Docker deployment of MySQL containers.

Read More

MySQL on Docker: Deploy a Homogeneous Galera Cluster with etcd

Our journey to make Galera Cluster run smoothly on Docker containers continues. Deploying Galera Cluster on Docker is tricky when using orchestration tools. With this blog, find out how to deploy a homogeneous Galera Cluster with etcd.

Read More

MySQL on Docker: Introduction to Docker Swarm Mode and Multi-Host Networking

This blog post covers the basics of managing MySQL containers on top of Docker swarm mode and overlay network.

Read More

MySQL on Docker: Single Host Networking for MySQL Containers

This blog covers the basics of how Docker handles single-host networking, and how MySQL containers can leverage that.

Read More

MySQL on Docker: Building the Container Image

In this post, we will show you two ways how to build a MySQL Docker image - changing a base image and committing, or using Dockerfile. We’ll show you how to extend the Docker team’s MySQL image, and add Percona XtraBackup to it.

Read More

MySQL Docker Containers: Understanding the basics

In this post, we will cover some basics around running MySQL in a Docker container. It walks you through how to properly fire up a MySQL container, change configuration parameters, how to connect to the container, and how the data is stored.

Read More


ClusterControl on Docker

ClusterControl provides advanced management and monitoring functionality to get your MySQL replication and clustered instances up-and-running using proven methodologies that you can depend on to work. Used in conjunction with other orchestration tools for deployment to the containers, ClusterControl makes managing your open source databases easy with point-and-click interfaces and no need to have specialized knowledge about the technology.

ClusterControl delivers on an array of features to help manage and monitor your open source database environments:

  • Management & Monitoring: ClusterControl provides management features to repair and recover broken nodes, as well as test and automate MySQL upgrades.
  • Advanced Monitoring: ClusterControl provides a unified view of all MySQL nodes and clusters across all your data centers and lets you drill down into individual nodes for more detailed statistics.
  • Automatic Failure Detection and Handling: ClusterControl takes care of your replication cluster’s health. If a master failure is detected, ClusterControl automatically promotes one of the available slaves to ensure your cluster is always up.

Learn more about how ClusterControl can enhance performance here or pull the Docker Image here.

We hope that these resources prove useful!

Happy Clustering!

by Severalnines at June 21, 2017 09:59 AM

June 20, 2017

Peter Zaitsev

Webinar Thursday June 22, 2017: Deploying MySQL in Production

Deploying MySQL

Join Percona’s Senior Operations Engineer, Daniel Kowalewski as he presents Deploying MySQL in Production on Thursday, June 22, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7).

MySQL is famous for being something you can install and get going with in less than five minutes for development purposes. But normally you want to run MySQL in production, and at scale. This requires some planning and knowledge. So why not learn the best practices around installation, configuration, deployment and backups?

This webinar is a soup-to-nuts talk that will have you going from zero to hero in no time. It includes discussion of the best practices for installation, configuration, taking backups, monitoring, etc.

Register for the webinar here.

Daniel Kowalewski, Senior Technical Operations Engineer

Daniel has been designing and deploying solutions around MySQL for over ten years. He lives for those magic moments where response time drops by 90%, and loves adding more “nines” to everything.

by Dave Avery at June 20, 2017 10:42 PM

The MySQL High Availability Landscape in 2017 (The Elders)

High Availability

In this blog, we’ll look at different MySQL high availability options.

The MySQL ecosystem is dynamic, and many technologies built around MySQL are evolving rapidly. This is especially true for the technologies involved with the high availability (HA) aspects of MySQL. When I joined Percona back in 2009, some of these HA technologies were very popular, but they have since been almost forgotten. During the same interval, new technologies have emerged. In order to give some perspective to the reader, and hopefully help to make better choices, I’ll review the MySQL HA landscape as it is in 2017. This review will be in three parts. The first part (this post) covers the technologies that have been around for a long time: the elders. The second part will focus on the technologies that are very popular today: the adults. Finally, the last part will try to extrapolate which technologies could become popular in the upcoming years: the babies.

Quick disclaimer, I am reporting on the technologies I see the most. There are likely many other solutions not covered here, but I can’t talk about technologies I have barely or never used. Apart from the RDS-related technologies, all the technologies covered are open-source. The target audience for this post are people relatively new to MySQL.

The Elders

Let’s define the technologies in the elders group. These are technologies that anyone involved with MySQL for the last ten years is sure to be aware of. I could have called this group the “classics”. I include the following technologies in this group:

  • Replication
  • Shared storage
  • NDB cluster

Let’s review these technologies in the following sections.

Replication

Simple replication topology

 

MySQL replication is very well known. It is one of the main features behind the wide adoption of MySQL. Replication gets used almost everywhere. The reasons for that are numerous:

  • Replication is simple to setup. There are tons of how-to guides and scripts available to add a slave to a MySQL server. With Amazon RDS, adding a slave is just a few clicks.
  • Slaves allow you to easily scale reads. The slaves are accessible and can be used for reads. This is the most common way of scaling up a MySQL database.
  • Slaves have little impact on the master. Apart from the added network traffic, the presence of slaves does not impact the master performance significantly.
  • It is well known. No surprises here.
  • Used for failover. Your master died, promote a slave and use it as your new master.
  • Used for backups. You don’t want to overload your master with the backups, run them off a slave.

Of course, replication also has some issues:

  • Replication can lag. Replication used to be single-threaded. That means a master with a concurrent load could easily outpace a slave. MySQL 5.6 and MariaDB 10.0 have introduced some parallelism to the slave. Newer versions have further improved to a point where today’s slaves are many times faster than they were.
  • Slaves can diverge. When you modify data on the master, the slave must perform the exact same update. That seems easy, but there are many ways an update can be non-deterministic with statement-based replication. They fixed many issues, and the introduction of row-based replication has been another big step forward. Still, if you write directly to a slave you are asking for trouble. There is a read_only setting, but if the MySQL user has the “SUPER” privilege it is just ignored. That’s why there is now the “super_read_only” setting. Tools like pt-table-checksum and pt-table-sync from the Percona toolkit exist to solve this problem.
  • Replication can impact the master. I wrote above that the presence of slaves does not affect the master, but logging changes is more problematic. The most common issue is the InnoDB table-level locking for auto_increment values with statement-based replication. Only one thread can insert new rows at a time. You can avoid this issue with row-based replication and by properly configuring a few settings (see the configuration sketch after this list).
  • Data gets lost. Replication is asynchronous. That means the master will reply “done” after a commit statement even though the slaves have not received updates yet. Some transactions can get lost if the master crashes.
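
A minimal my.cnf sketch of the settings mentioned in this list (the values are illustrative; super_read_only needs MySQL 5.7+ or Percona Server):

[mysqld]
# row-based replication avoids most non-deterministic statements and
# allows the interleaved auto_increment lock mode
binlog_format            = ROW
innodb_autoinc_lock_mode = 2
# on the slaves: refuse writes, even from accounts with SUPER
read_only                = 1
super_read_only          = 1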

Although an old technology, a lot of work has been done on replication. It is miles away from the replication implementation of 5.0.x. Here’s a list, likely incomplete, of the evolution of replication:

  • Row based replication (since 5.1). The binary internal representation of the rows is sent instead of the SQL statements. This makes replication more robust against slave divergence.
  • Global transaction ID (since 5.6). Transactions are uniquely identified. Replication can be setup without knowing the binlog file and offset.
  • Checksum (since 5.6). Binlog events have checksum values to validate their integrity.
  • Semi-sync replication (since 5.5). An addition to the replication protocol to make the master aware of the reception of events by the slaves. This helps to avoid losing data when a master crashes.
  • Multi-source replication (since 5.7). Allows a slave to have more than one master.
  • Multi-threaded replication (since 5.6). Allows a slave to use multiple threads. This helps to limit the slave lag.

Managing replication is a tedious job. The community has written many tools to manage replication:

  • MMM. An old Perl tool that used to be quite popular, but had many issues. Now rarely used.
  • MHA. The most popular tool to manage replication. It excels at reconfiguring replication without losing data, and does a decent job at handling failover. It is also simple. No wonder it is popular.
  • PRM. A Pacemaker-based solution developed to replace MMM. It’s quite good at failover, but not as good as MHA at reconfiguring replication. It’s also quite complex, thanks to Pacemaker. Not used much.
  • Orchestrator. The new cool tool. It can manage complex topologies and has a nice web-based interface to monitor and control the topology.

 

Shared Storage

Simple shared storage topology

 

Back when I was working for MySQL ten years ago, shared storage HA setups were very common. A shared storage HA cluster shares one copy of the database files between two servers. One server is active, the other one is passive. In order to be shared, the database files reside on a device that can be mounted by both servers. The device can be physical (like a SAN), or logical (like a Linux DRBD device). On top of that, you need a cluster manager (like Pacemaker) to handle the resources and failovers. This solution was very popular because it allows for failover without losing any transactions.

The main drawback of this setup is the need for an idle standby server. The standby server cannot have any other assigned duties since it must always be ready to take over the MySQL server. A shared storage solution is also obviously not resilient to file-level corruption (but that situation is exceptional). Finally, it doesn’t play well with a cloud-based environment.

Today, newly-deployed shared storage HA setups are rare. The only ones I encountered over the last year were either old implementations needing support, or new setups deployed because of existing corporate technology stacks. That should tell you something about the technology’s loss of popularity.

NDB Cluster

A simple NDB Cluster topology

 

An NDB Cluster is a distributed clustering solution that has been around for a long time. I personally started working with this technology back in 2008. An NDB Cluster has three types of nodes: SQL, management and data. A full HA cluster requires a minimum of four nodes.

An NDB Cluster is not a general purpose database due to its distributed nature. For suitable workloads, it is extraordinarily good. For unsuitable workloads, it is miserable. A suitable workload for an NDB Cluster has high concurrency and a high rate of small, primary-key-oriented transactions. Reaching one million trx/s on an NDB Cluster is nothing exceptional.

At the other end of the spectrum, a poor workload for an NDB Cluster is a single-threaded report query on a star-like schema. I have seen some extreme cases where just the network time of a reporting query amounted to more than 20 minutes.

Although NDB Clusters have improved, and are still improving, their usage has been pushed toward niche-type applications. Overall, the technology is losing ground and is now mostly used in telco and online gaming applications.

by Yves Trudeau at June 20, 2017 10:36 PM

Jean-Jerome Schmidt

ClusterControl on Docker

(This blog was updated on June 20, 2017)

We’re excited to announce our first step towards dockerizing our products. Please welcome the official ClusterControl Docker image, available on Docker Hub. This will allow you to evaluate ClusterControl with a couple of commands:

$ docker pull severalnines/clustercontrol

The Docker image comes with ClusterControl installed and configured with all of its components, so you can immediately use it to manage and monitor your existing databases. Supported database servers/clusters:

  • Galera Cluster for MySQL
  • Percona XtraDB Cluster
  • MariaDB Galera Cluster
  • MySQL/MariaDB Replication
  • MySQL/MariaDB single instance
  • MongoDB Replica Set
  • PostgreSQL single instance

As more and more people know, Docker is based on the concept of so-called application containers and is much faster and more lightweight than full-stack virtual machines such as VMware or VirtualBox. It is a very nice way to run applications and services in a completely isolated environment, which a user can launch and tear down within seconds.

Having a Docker image for ClusterControl is convenient because of how quickly you can get it up and running, and the setup is 100% reproducible. Docker users can now start testing ClusterControl, since we have an image that everyone can pull down and launch.

ClusterControl Docker Image

The image is available on Docker Hub and the code is hosted in our Github repository. Please refer to the Docker Hub page or our Github repository for the latest instructions.

The image consists of ClusterControl and all of its components:

  • ClusterControl controller, CLI, cmonapi, UI and NodeJS packages installed via Severalnines repository.
  • Percona Server 5.6 installed via Percona repository.
  • Apache 2.4 (mod_ssl and mod_rewrite configured)
  • PHP 5.4 (gd, mysqli, ldap, cli, common, curl)
  • SSH key for root user.
  • Deployment script: deploy-container.sh

The core of the automatic deployment is a shell script called “deploy-container.sh”. This script monitors a custom table inside the CMON database called “cmon.containers”. Each database container created by the run command or by an orchestration tool reports and registers itself in this table. The script looks for new entries and performs the necessary actions using s9s, the ClusterControl CLI. You can monitor the progress directly from the ClusterControl UI or with the “docker logs” command.

New Deployment - ClusterControl and Galera Cluster on Docker

There are several ways to deploy a new database cluster stack on containers, especially with respect to the container orchestration platform used. The currently supported methods are:

  • Docker (standalone)
  • Docker Swarm
  • Kubernetes

A new database cluster stack would usually consist of a ClusterControl server with a three-node Galera Cluster. From there, you can scale up or down according to your desired setup. We have built another Docker image to serve as database base image called “centos-ssh”, which can be used with ClusterControl for automatic deployment. This image comes with a simple bootstrapping logic to download the public key from ClusterControl container (automatic passwordless SSH setup), register itself to ClusterControl and pass the environment variables for the chosen cluster setup.

Docker (Standalone)

To run on standalone Docker host, one would do the following:

  1. Run the ClusterControl container:

    $ docker run -d --name clustercontrol -p 5000:80 severalnines/clustercontrol
  2. Then, run the DB containers. Define the CC_HOST (the ClusterControl container's IP) environment variable or simply use container linking. Assuming the ClusterControl container name is ‘clustercontrol’:

    $ docker run -d --name galera1 -p 6661:3306 --link clustercontrol:clustercontrol -e CLUSTER_TYPE=galera -e CLUSTER_NAME=mygalera -e INITIAL_CLUSTER_SIZE=3 severalnines/centos-ssh
    $ docker run -d --name galera2 -p 6662:3306 --link clustercontrol:clustercontrol -e CLUSTER_TYPE=galera -e CLUSTER_NAME=mygalera -e INITIAL_CLUSTER_SIZE=3 severalnines/centos-ssh
    $ docker run -d --name galera3 -p 6663:3306 --link clustercontrol:clustercontrol -e CLUSTER_TYPE=galera -e CLUSTER_NAME=mygalera -e INITIAL_CLUSTER_SIZE=3 severalnines/centos-ssh

ClusterControl will automatically pick up the new containers to deploy. If it finds that the number of containers is equal to or greater than INITIAL_CLUSTER_SIZE, cluster deployment begins. You can verify that with:

$ docker logs -f clustercontrol

Or, open the ClusterControl UI at http://{Docker_host}:5000/clustercontrol and look under Activity -> Jobs.

To scale up, just run new containers and ClusterControl will add them into the cluster automatically:

$ docker run -d --name galera4 -p 6664:3306 --link clustercontrol:clustercontrol -e CLUSTER_TYPE=galera -e CLUSTER_NAME=mygalera -e INITIAL_CLUSTER_SIZE=3 severalnines/centos-ssh

Docker Swarm

Docker Swarm allows container deployment on multiple hosts. It manages a cluster of Docker Engines called a swarm.

Take note of some prerequisites for running ClusterControl and Galera Cluster on Docker Swarm:

  • Docker Engine version 1.12 and later.
  • Docker Swarm Mode is initialized.
  • ClusterControl must be connected to the same overlay network as the database containers.

In Docker Swarm mode, centos-ssh defaults to looking for 'cc_clustercontrol' as the CC_HOST. If you create the ClusterControl container with 'cc_clustercontrol' as the service name, you can skip defining the CC_HOST variable.

We have explained the deployment steps in this blog post.

Kubernetes

Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, and provides a container-centric infrastructure. In Kubernetes, centos-ssh defaults to looking for the “clustercontrol” service as the CC_HOST. If you use another name, please specify this variable accordingly.

An example YAML definition of a ClusterControl deployment on Kubernetes is available in our Github repository under the Kubernetes directory. To deploy a ClusterControl pod using a ReplicaSet, the recommended way is:

$ kubectl create -f cc-pv-pvc.yml
$ kubectl create -f cc-rs.yml

Kubernetes will then create the necessary PersistentVolume (PV) and PersistentVolumeClaim (PVC) by connecting to the NFS server. The deployment definition (cc-rs.yml) will then run a pod and use the created PV resources to map the datadir and cmon configuration directory for persistency. This allows the ClusterControl pod to be relocated to other Kubernetes nodes if it is rescheduled by the scheduler.

Import Existing DB Containers into ClusterControl

If you already have a Galera Cluster running on Docker and you would like to have ClusterControl manage it, you can simply run the ClusterControl container in the same Docker network as the database containers. The only requirement is to ensure the target containers have SSH related packages installed (openssh-server, openssh-clients). Then allow passwordless SSH from ClusterControl to the database containers. Once done, use “Add Existing Server/Cluster” feature and the cluster should be imported into ClusterControl.

Example of importing the existing DB containers

We have a physical host, 192.168.50.130, with Docker installed, and we assume that you already have a three-node Galera Cluster running in containers under the standard Docker bridge network. We are going to import the cluster into ClusterControl, which is running in another container. The following is the high-level architecture diagram:

Adding into ClusterControl

  1. Install OpenSSH related packages on the database containers, allow the root login, start it up and set the root password:

    $ docker exec -ti [db-container] apt-get update
    $ docker exec -ti [db-container] apt-get install -y openssh-server openssh-client
    $ docker exec -it [db-container] sed -i 's|^PermitRootLogin.*|PermitRootLogin yes|g' /etc/ssh/sshd_config
    $ docker exec -ti [db-container] service ssh start
    $ docker exec -it [db-container] passwd
  2. Start the ClusterControl container as daemon and forward port 80 on the container to port 5000 on the host:

    $ docker run -d --name clustercontrol -p 5000:80 severalnines/clustercontrol
  3. Verify the ClusterControl container is up:

    $ docker ps | grep clustercontrol
    59134c17fe5a        severalnines/clustercontrol   "/entrypoint.sh"       2 minutes ago       Up 2 minutes        22/tcp, 3306/tcp, 9500/tcp, 9600/tcp, 9999/tcp, 0.0.0.0:5000->80/tcp   clustercontrol
  4. Open a browser, go to http://{docker_host}:5000/clustercontrol and create a default admin user and password. You should now see the ClusterControl landing page.

  5. The last step is setting up the passwordless SSH to all database containers. Attach to ClusterControl container interactive console:

    $ docker exec -it clustercontrol /bin/bash
  6. Copy the SSH key to all database containers:

    $ ssh-copy-id 172.17.0.2
    $ ssh-copy-id 172.17.0.3
    $ ssh-copy-id 172.17.0.4
  7. Start importing the cluster into ClusterControl. Open a web browser, go to Docker’s physical host IP address with the mapped port, e.g., http://192.168.50.130:5000/clustercontrol, click “Add Existing Cluster/Server” and specify the following information:

    Ensure you get a green tick when entering the hostname or IP address, indicating that ClusterControl is able to communicate with the node. Then click the Import button and wait until ClusterControl finishes its job. The database cluster will be listed under the ClusterControl dashboard once imported.

Happy containerizing!

by ashraf at June 20, 2017 11:09 AM

Shlomi Noach

Observations on the hashicorp/raft library, and notes on RDBMS

The hashicorp/raft library is a Go library to provide consensus via Raft protocol implementation. It is the underlying library behind Hashicorp's Consul.

I've had the opportunity to work with this library on a couple of projects, namely freno and orchestrator. Here are a few observations on working with this library:

  • TL;DR on Raft: a group communication protocol; multiple nodes communicate, elect a leader. A leader leads a consensus (any subgroup of more than half the nodes of the original group, or hopefully all of them). Nodes may leave and rejoin, and will remain consistent with consensus.
  • The hashicorp/raft library is an implementation of the Raft protocol. There are other implementations, and different implementations support different features.
  • The most basic premise is leader election. This is pretty straightforward to implement; you set up nodes to communicate to each other, and they elect a leader. You may query for the leader identity via Leader(), VerifyLeader(), or observing LeaderCh.
  • You have no control over the identity of the leader. You cannot "prefer" one node to be the leader. You cannot grab leadership from an elected leader, and you cannot demote a leader unless by killing it.
  • The next premise is gossip, sending messages between the raft nodes. With hashicorp/raft, only the leader may send messages to the group. This is done via the Apply() function.
  • Messages are nothing but blobs. Your app encodes the messages into []byte and ships it via raft. Receiving ends need to decode the bytes into a meaningful message.
  • You will check the result of Apply(), an ApplyFuture. The call to Error() will wait for consensus.
  • Just what is a message consensus? It's a guarantee that the consensus of nodes has received and registered the message.
  • Messages form the raft log.
  • Messages are guaranteed to be handled in-order across all nodes.
  • The leader is satisfied when the followers receive the messages/log, but it cares not for their interpretation of the log.
  • The leader does not collect the output, or return value, of the followers applying of the log.
  • Consequently, your followers may not abort the message. They may not cast an opinion. They must adhere to the instruction received from the leader.
  • hashicorp/raft uses either an LMDB-based store or BoltDB for persisting your messages. Both are transactional stores.
  • Messages are expected to be idempotent: a node that, say, happens to restart, will request to join back the consensus (or to form a consensus with some other node). To do that, it will have to reapply historical messages that it may have applied in the past.
  • Number of messages (log entries) will grow infinitely. Snapshots are taken so as to truncate the log history. You will implement the snapshot dump & load.
  • A snapshot includes the log index up to which it covers.
  • Upon startup, your node will look for the most recent snapshot. It will read it, then resume replication from the aforementioned log index.
  • hashicorp/raft provides a file-system based snapshot implementation.

One of my use cases is completely satisfied with the existing implementations of BoltDB and of the filesystem snapshot.

However in another (orchestrator), my app stores its state in a relational backend. To that effect, I've modified the logstore and snapshot store. I'm using either MySQL or sqlite as backend stores for my app. How does that affect my raft use?

  • My backend RDBMS is the de-facto state of my orchestrator app. Anything written to this DB is persisted and durable.
  • When orchestrator applies a raft log/message, it runs some app logic which ends with a write to the backend DB. At that time, the raft log is effectively not required anymore to persist. I care not for the history of logs.
  • Moreover, I care not for snapshotting. To elaborate, I care not for snapshot data. My backend RDBMS is the snapshot data.
  • Since I'm running an RDBMS, I find BoltDB to be wasteful, an additional transaction store on top of a transaction store I already have.
  • Likewise, the filesystem snapshots are yet another form of store.
  • Log Store (including Stable Store) is easily re-implemented on top of the RDBMS. The log is a classic relational entity (see the sketch after this list).
  • Snapshot is also implemented on top of the RDBMS, however I only care for the snapshot metadata (what log entry is covered by a snapshot) and completely discard storing/loading snapshot state or content.
  • With all these in place, I have a single entity that defines:
    • What my data looks like
    • Where my node fares in the group gossip
  • A single RDBMS restore returns a dataset that will catch up with raft log correctly. However my restore window is limited by the number of snapshots I store and their frequency.
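As a rough idea of what such a relational log store could look like, here is a minimal SQL sketch; the table and column names are made up for illustration and are not orchestrator's actual schema:

-- Raft log entries: one row per log entry, keyed by the log index.
CREATE TABLE raft_log (
    log_index  BIGINT NOT NULL,
    term       BIGINT NOT NULL,
    log_type   INT    NOT NULL,
    data       BLOB   NOT NULL,      -- the opaque message blob shipped via Apply()
    PRIMARY KEY (log_index)
);

-- Snapshot metadata only: the RDBMS itself holds the actual state,
-- so there is no snapshot content to store.
CREATE TABLE raft_snapshot (
    snapshot_id     VARCHAR(64) NOT NULL,
    last_log_index  BIGINT      NOT NULL,
    last_log_term   BIGINT      NOT NULL,
    created_at      TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (snapshot_id)
);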

by shlomi at June 20, 2017 04:05 AM

June 19, 2017

Peter Zaitsev

Upcoming HA Webinar Wed 6/21: Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication

High Availability

Join Percona’s MySQL Practice Manager Kenny Gryp and QA Engineer Ramesh Sivaraman as they present a high availability webinar around Percona XtraDB Cluster, Galera Cluster and MySQL Group Replication on Wednesday, June 21, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

What are the implementation differences between Percona XtraDB Cluster 5.7, Galera Cluster 5.7 and MySQL Group Replication?

  • How do they work?
  • How do they behave differently?
  • Do these methods have any major issues?

This webinar will describe the differences and shed some light on how QA is done for each of the different technologies.

Register for the webinar here.

Ramesh Sivaraman, QA Engineer

Ramesh joined the Percona QA Team in March 2014. He has almost six years of experience in database administration and, before joining Percona, provided MySQL database support to various service- and product-based internet companies. Ramesh’s professional interests include writing shell/Perl scripts to automate routine tasks and exploring new technology. Ramesh lives in Kerala, the southern part of India, close to his family.

Kenny Gryp, MySQL Practice Manager

Kenny is currently MySQL Practice Manager at Percona.

by Dave Avery at June 19, 2017 06:53 PM

MariaDB Server 10.2 GA Release Overview

MariaDB Server 10.2

This blog post looks at the recent MariaDB Server 10.2 GA release.

Congratulations to the MariaDB Foundation for releasing a generally available (GA) stable version of MariaDB Server 10.2! We’ll definitely spend the next few weeks talking about MariaDB Server 10.2, but here’s a quick overview in the meantime. Keep in mind that when thinking about compatibility, this is meant to be the equivalent of MySQL 5.7 (GA: October 21, 2015, with Percona Server for MySQL 5.7 GA available February 23, 2016).

Some of the highlights include:

  • Window functions – this is the first release in the MySQL ecosystem that includes Window functions and Recursive Common Table Expressions. At the time of this writing, MariaDB hasn’t completed the documentation. It is worth noting that the implementation of Window functions in MariaDB Server 10.2 differs from what you see in MariaDB ColumnStore.
  • JSON functions – Many JSON functions to query, update, index and validate JSON. It’s worth noting that MariaDB Server 10.2 does not include a JSON data type (as compared to MySQL 5.7). This means you can’t do CREATE TABLE t1 (jdoc JSON) – instead you need to use a VARCHAR or TEXT column (see the sketch after this list). There are also other differences that produce different result sets, and seemingly no column path operator.
  • There is also support for GeoJSON functionality, but when we tried ST_AsGeoJSON (yes, documentation needs work), we noticed that the output could vary from MySQL 5.7.
  • MyRocks – MariaDB added the hot new storage engine MyRocks as an alpha. You will have to install the MyRocks engine package separately. It isn’t fully merged yet. Watch the umbrella task MDEV-9658.
  • SHOW CREATE USER – A new SHOW CREATE USER statement allows you to look at user limitations, as you can now limit users to a maximum number of queries, updates and connections per hour, as well as a maximum number of simultaneous connections (see setting account resource limits for the MySQL 5.7 equivalent; a sketch follows this list). You’ll want to read the updated documentation around CREATE USER. Don’t be surprised when you see something like “ERROR 1226 (42000): User ‘foo’ has exceeded the ‘max_user_connections’ resource (current value: 1)”. This is also an extension to ALTER USER.
  • Flashback – binary log based rollback, aka flashback, can roll back tables and databases to an older snapshot. This should help when the DBA or a user makes an error. This tool works well as long as it’s a DML statement. This feature came from Alibaba’s AliSQL tree.
  • Time delayed replication – new in MariaDB Server 10.2.
  • OpenSSL 1.1 – there is now support for building with OpenSSL 1.1 as well as LibreSSL.
  • MariaDB Connector/C – most importantly, MariaDB Connector/C replaces libmysql (see: MDEV-9055). This should be API and ABI compatible, but naturally there are some teething problems (see: MDEV-12950).
  • Amazon Key Management plugin – from a key management standpoint, the Amazon Key Management plugin is now available to use for encryption. It’s compiled and available as a package. Previously, you had to compile it yourself.
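To make two of the points above concrete, here is a minimal sketch against MariaDB Server 10.2 (the names and values are made up for illustration): a JSON document kept in a TEXT column with a JSON_VALID check standing in for the missing JSON data type, and a user created with resource limits that SHOW CREATE USER can then display.

-- No JSON data type, so store the document in TEXT/VARCHAR and validate it.
CREATE TABLE t1 (
    id   INT PRIMARY KEY,
    jdoc TEXT,
    CHECK (JSON_VALID(jdoc))
);

INSERT INTO t1 VALUES (1, '{"name": "Aliens", "format": "Blu-ray"}');
SELECT JSON_VALUE(jdoc, '$.name') FROM t1;

-- Resource limits on an account, visible via SHOW CREATE USER.
CREATE USER 'foo'@'%' IDENTIFIED BY 'secret'
    WITH MAX_QUERIES_PER_HOUR 100
         MAX_USER_CONNECTIONS 1;
SHOW CREATE USER 'foo'@'%';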

Some of the important things to take note of are:

  • As of this release, MariaDB now ships with InnoDB as the default storage engine (as opposed to Percona XtraDB). This means that from here on out, the improvements and fixes to Percona XtraDB won’t necessarily be available in MariaDB. This also means that Percona XtraDB parameters might get ignored (as reported in MDEV-12472).
  • In MySQL 5.6+, you can use SHA-256 pluggable authentication. However, this feature is still not implemented in MariaDB Server 10.2 (see: MDEV-9804). You can, however, use the ed25519 authentication plugin as a replacement.
  • When it comes to replication, MySQL 5.7 defaults to row-based replication. MariaDB Server 10.2 defaults to mixed-mode replication (see the discussion around this at MDEV-7635).
  • It is worth noting that in order to make MariaDB Server more “Oracle compatible,” DECIMAL now goes up to 38 decimals instead of 30 decimals. MDEV-10138 tells you what happens when you migrate from a long decimal to a default decimal type install (i.e., if you’re moving to another variant in the MySQL ecosystem).
  • If you’re familiar with how MySQL 5.7 manages passwords and a new install, the MariaDB Server 10.2 method hasn’t changed.

All in all, this release took a little over a year to make (Alpha was 18 April 2016, GA was 23 May 2017). It is extremely important to read the release notes and the changelogs of each and every release, as MariaDB Server diverges from MySQL quite a bit. At Percona, we will monitor Jira closely to ensure that you always stay informed of the latest changes.

by Colin Charles at June 19, 2017 06:36 PM

MariaDB AB

MariaDB Server Default in Debian 9

MariaDB Server Default in Debian 9 rasmusjohansson Mon, 06/19/2017 - 12:15

One way of distributing Linux based software is to make the software so popular and easily consumable that distributions want to include it. By consumable, I mean that the software should be packaged correctly, processes around the software need to be transparent and any issues found with the software should be addressed swiftly.

MariaDB Server has become a software package distributed widely by the Linux distributions. There are a lot of distros including MariaDB Server nowadays. We try to keep an updated list of distributions that include MariaDB but let us know if there is something missing.

Since MariaDB started as a fork of MySQL, users choosing the right open source database evaluate MariaDB against MySQL. The same question is relevant for the distros as well: which variant should be installed by default? Red Hat, SUSE, CentOS, Fedora and many others have already chosen to have MariaDB Server as the default MySQL variant.

Now Debian joins by making MariaDB Server 10.1 the default in Debian 9.

With MariaDB Server as the default in Debian it’s included in the Stable repository for Debian 9 and available to all users of Debian 9. To read Debian’s update on this, please refer to the what’s new in Debian 9 article.  

From a MariaDB Server perspective, to be the default MySQL variant in Debian provides many benefits:

  • There will be more users running MariaDB Server. Whenever there is a dependency on MySQL or MariaDB, MariaDB Server will be installed by default.

  • Popular software will make sure that they are compatible with MariaDB since MariaDB Server will be installed by default.

  • Being the default means that MariaDB Server is part of Debian’s Stable repository and is being built by the Debian build system on all possible platforms from IBM System z to ARM-based platforms.

There are of course benefits for the users as well:

  • MariaDB follows good open source practices such as being open about security issues and providing immediate information about them. Also, the open development of MariaDB Server is hugely important. Users can see what’s going on, can participate in development and will know when they can expect new features and versions.

  • Debian users can benefit from all features provided in MariaDB Server 10.1 such as multi-master support through the inclusion of Galera, Data at Rest Encryption and improved replication.

Let’s look at what all this means in practice. Let’s say you have installed Debian 9 as your web server on which you want to install Wordpress for your blog. Looking at the details of the Wordpress package you’ll see a couple of dependencies and suggested installations together with installing Wordpress itself:

  • The first dependency of interest in this case is on the default-mysql-client. This is a metapackage that points to the latest version available in the repository for the chosen MySQL compatible client library that should be used with Wordpress. The metapackage points to mariadb-client-10.1.

  • The other dependency, which is of type suggestion, is the package default-mysql-server. This is again a metapackage that points to the default MySQL variant in Debian, which is the package mariadb-server-10.1.

When installing Wordpress you’ll end up with the MariaDB 10.1 client library installed and you’ll also install MariaDB Server 10.1 (if you accept the suggestion).

A lot of work went into the packaging of MariaDB to support the concept of metapackages, to make the installation and upgrade process as smooth as possible, and to coordinate any package compatibility issues when making MariaDB the default MySQL variant. A special thanks goes to the package maintainers Otto Kekäläinen, Arnaud Fontaine and Ondřej Surý for making this happen! Thanks also to the many other developers addressing issues found by the package maintainers and the Debian build system.

Especially if you’re upgrading from an earlier version of Debian and you already have MySQL installed on your system, we recommend that you read through our guidance in the article Moving from MySQL to MariaDB in Debian 9.

Testing has been done on both the Debian side as well as the MariaDB side. If you run into any problems when installing, upgrading or migrating to MariaDB Server, we’re of course here to help.



by rasmusjohansson at June 19, 2017 04:15 PM

MariaDB Foundation

Duel: gdb vs. linked lists, trees, and hash tables

My first encounter with the gdb command duel was on some old IRIX about 15 years ago. I immediately loved how convenient it was for displaying various data structures during MySQL debugging, and I wished Linux had something similar. Later I found out that Duel was not something IRIX specific, but a public domain patch […]

The post Duel: gdb vs. linked lists, trees, and hash tables appeared first on MariaDB.org.

by Sergei at June 19, 2017 09:01 AM

June 18, 2017

MariaDB Foundation

Debian 9 released with MariaDB as the only MySQL variant

The Debian project has today announced their 9th release, code named “Stretch”. This is a big milestone for MariaDB, because the release team decided to ship and support only one MySQL variant in this release, and MariaDB was chosen over MySQL. This is prominently mentioned in the press release about Debian 9 and more information […]

The post Debian 9 released with MariaDB as the only MySQL variant appeared first on MariaDB.org.

by Otto Kekäläinen at June 18, 2017 04:24 PM

Valeriy Kravchuk

MySQL Support Engineer's Chronicles, Issue #6

The previous post in this series was published almost 4 months ago, but I do not plan to end it. So, let me quickly discuss some of the problems I worked on or was interested in so far in June, and provide some useful links.

Back on June 2 I had to find out what exact files are created by MariaDB's ColumnStore when I create a table in this storage engine. Actually, in recent versions one can check the tables in the INFORMATION_SCHEMA, but if you still wonder why there are all these directories with numbers in the names (/usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/193.dir/000.dir/FILE000.cdf), please check also this storage architecture review and take into account the following hint by the famous LinuxJedi:
partA: oid>>24
partB: (oid&0x00ff0000)>>16
partC: (oid&0x0000ff00) >> 8
partD: oid&0x000000ff
partE: partition
fName: segment
Those numbers in the directory names are decimal integers, 3 digits, zero-padded. oid is object_id.

If you plan to try to work with MariaDB ColumnStore, please, make sure to check also this KB article, MariaDB ColumnStore System Usage.

Later I had to work on some out-of-memory issues, and found this article on how to get the details on transparent huge pages usage on RedHat systems very useful.

No matter how much I hate clusters, I have to deal at least with Galera-based ones. One of the problems I dealt with recently was many threads "hanging" in the "wsrep in pre-commit stage" status. In the process of investigation I found this bug report, lp:1197771, and I am still wondering what the proper status for it should be ("Fix Committed" to nothing does not impress me). Check also this discussion.

The most important discussions for the MySQL Community, IMHO, happened around June 9 in relation to Bug #86607, "InnoDB crashed when master thread evict dict_table_t object". Great finding and detailed analysis by Alibaba's engineer, Jianwei Zhao, as well as his detailed MTR test case (that worked on slightly modified source code, with a couple of lines like os_thread_sleep(2000000); added here and there), were all rejected by my dear old friend Sinisa... It caused a lot of reactions and comments on Facebook. Further references to magic "Oracle policies" had not helped. What really helped was a real verification by Umesh Shastry later. As many colleagues noted, one had just to try, and the test fails even on non-modified code, quite often. It's quite possible that MariaDB will fix this bug faster, so you may want to monitor the related bug report, MDEV-13051. To summarize, it's not about any policies, but all about proper care and attitude, and it has always been like that...

Another great bug to follow, Bug #86215, was reported back in May by Mark Callaghan. He spent a lot of time checking how well MySQL performs, historically, on different versions, from 4.0 to 8.0. Recent findings are summarized in this blog post. Based on the results, it seems that for most tests a huge drop in performance happened in MySQL 5.7, and so far version 8.0.x has made few improvements on top of that. Check also this summary of tests and related blog posts. I wonder what the results of the analysis for this bug will be, and what related posts by the great Dimitri Kravtchuk, whom the bug is currently assigned to, will bring.

Those who may have to work with MariaDB or MySQL on Windows and deal with crashes or hangs may find this Microsoft article useful. It explains how to get call stacks for threads with Visual Studio 2017. Thanks to Vladislav Vaintroub for the link. I badly need a new box with Windows 10, it seems, to check all this and other cool stuff, like shadow backups...

MariaDB's MaxScale 2.1 GA version appeared recently, with a new cache filter feature. You can find the documentation here and there, and some comparison of its initial implementation with ProxySQL's cache in this great blog post.

Yet another MySQL bug, reported by my colleague Hartmut Holzgraefe a long time ago, attracted my attention while working on issues. This was Bug #76793, "Different row size limitations in Antelope and Barracuda file formats". It was verified more than 2 years ago, but still nothing happens, neither in the code, nor in the manual to describe and explain the differences. Sad, but true.

Last but not least, the Support team of MariaDB is hiring now. We need one more engineer for our APAC team. If you want to work on complex and interesting issues from reasonable customers with properly set expectations, in a wisely managed team, and get a chance to work with me for 3-4 hours on a daily basis (yes, I am around/working during some APAC hours often enough, at least this summer) - please get in touch and send your CV!

by Valeriy Kravchuk (noreply@blogger.com) at June 18, 2017 02:19 PM

June 16, 2017

MariaDB AB

This is What it’s Like When Data Collides (Relational + JSON)

This is What it’s Like When Data Collides (Relational + JSON) Shane Johnson Fri, 06/16/2017 - 13:46

In the past, a benefit of using non-relational databases (i.e., NoSQL) was their simple and flexible structure. If the data was structured, a relational database was deployed. If it was semi-structured (e.g., JSON), a NoSQL database was deployed.

Today, a relational database like MariaDB Server (part of MariaDB TX) can read, write and query both structured and semi-structured data, together. MariaDB Server supports semi-structured data via dynamic columns and JSON functions. This blog post will focus on JSON functions with MariaDB Server, using examples to highlight one of the key benefits: data integrity.

MariaDB JSON example #1 – Table constraints with JSON

In this example, there is a single table for products, and every product has an id, type, name, format (e.g., Blu-ray) and price. This data will be stored in the id, type, name, format and price columns. However, every movie also has audio and video properties, and this data will be stored in the attr column, as a JSON document.

+----+------+--------+---------+-------+
| id | type | name   | format  | price |
+----+------+--------+---------+-------+
| 1  | M    | Aliens | Blu-ray | 13.99 |
+----+------+--------+---------+-------+

attr (the JSON document stored for this row):

{
  "video": {
    "resolution": "1080p",
    "aspectRatio": "1.85:1"
  },
  "cuts": [
    { "name": "Theatrical", "runtime": 138 },
    { "name": "Special Edition", "runtime": 155 }
  ],
  "audio": ["DTS HD", "Dolby Surround"]
}

We can’t have anyone inserting a movie record without audio and video properties.

To ensure data integrity, the JSON document for audio and video attributes must have:

  • a video object
    • with a resolution
    • with an aspect ratio
  • an audio array
    • with at least one element
  • a cuts array
    • with at least one element
  • a disks integer

And while you may not be able to enforce the data integrity of JSON documents in a NoSQL database, you can with MariaDB Server – create a table constraint with JSON functions.

ALTER TABLE products ADD CONSTRAINT check_movies
  CHECK (
    type = 'M' and
    JSON_TYPE(JSON_QUERY(attr, '$.video')) = 'OBJECT' and
    JSON_TYPE(JSON_QUERY(attr, '$.cuts')) = 'ARRAY' and
    JSON_TYPE(JSON_QUERY(attr, '$.audio')) = 'ARRAY' and
    JSON_TYPE(JSON_VALUE(attr, '$.disks')) = 'INTEGER' and
    JSON_EXISTS(attr, '$.video.resolution') = 1 and
    JSON_EXISTS(attr, '$.video.aspectRatio') = 1 and
    JSON_LENGTH(JSON_QUERY(attr, '$.cuts')) > 0 and
    JSON_LENGTH(JSON_QUERY(attr, '$.audio')) > 0);
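For illustration, a quick sketch of the constraint at work: a movie row whose document is missing the required fields is rejected (the exact error text may differ by version).

INSERT INTO products (id, type, name, format, price, attr)
VALUES (2, 'M', 'Gattaca', 'DVD', 9.99,
        '{"video": {"resolution": "480p"}}');
-- Fails the check_movies constraint: JSON_EXISTS(attr, '$.video.aspectRatio')
-- returns 0, so the CHECK expression evaluates to FALSE and the row is rejected.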

With MariaDB you get the best of both worlds, the data integrity of a relational database and the flexibility of semi-structured data.

MariaDB JSON example #2 – Multi-statement transactions

In this example, there is one table for inventory and one table for shopping carts. In the inventory table, there are columns for the SKU, the in-stock quantity and the in-cart quantity. In the shopping cart table, there are columns for the session id and the cart.

Inventory

+--------+----------+---------+
| sku    | in_stock | in_cart |
+--------+----------+---------+
| 123456 | 1        | 99      |
+--------+----------+---------+

Shopping carts

+------------------+
| session_id       |
+------------------+
| aldkjdi8i8jdlkjd |
+------------------+

cart (the JSON document stored for this row):

{
  "items": {
    "123456": {
      "name": "Aliens",
      "quantity": 1
    }
  }
}
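For reference, a minimal sketch of what the two tables described above could look like; the table names, column types and the JSON_VALID check are assumptions for illustration, not taken from the original post:

CREATE TABLE inventory (
    sku      VARCHAR(20) NOT NULL,
    in_stock INT         NOT NULL,
    in_cart  INT         NOT NULL,
    PRIMARY KEY (sku)
);

CREATE TABLE shopping_carts (
    session_id VARCHAR(32) NOT NULL,
    cart       TEXT,                  -- JSON document (MariaDB 10.2 has no JSON type)
    PRIMARY KEY (session_id),
    CHECK (JSON_VALID(cart))
);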

Let’s assume this is the SKU of Aliens on Blu-ray (one of the best movies ever), and let’s assume we have to update the inventory when a customer adds a copy of it to their shopping cart.

It’s easy to update the inventory table.

UPDATE inventory SET
    in_stock = in_stock - 1,
    in_cart = in_cart + 1
WHERE sku = '123456';

What about the JSON document for the shopping cart?

UPDATE shopping_carts SET   -- shopping cart table name assumed, as sketched above
    cart = JSON_SET(cart, '$.items.123456.quantity', 2)
WHERE session_id = 'aldkjdi8i8jdlkjd';

This is a problem for NoSQL databases, which do not support multi-statement transactions the way relational databases do. If something goes wrong partway, you don’t know what actually got updated: both the inventory and the shopping cart, either the inventory or the shopping cart, or neither.

MariaDB Server will ensure both the inventory and the shopping cart are updated using multi-statement transactions. On top of that, you get to use both structured and semi-structured data in the same database, together – relational and JSON.
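A minimal sketch of the two updates wrapped in a single transaction, using the table names assumed above, so that either both changes are applied or neither is:

START TRANSACTION;

UPDATE inventory SET
    in_stock = in_stock - 1,
    in_cart = in_cart + 1
WHERE sku = '123456';

UPDATE shopping_carts SET
    cart = JSON_SET(cart, '$.items.123456.quantity', 2)
WHERE session_id = 'aldkjdi8i8jdlkjd';

COMMIT;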

Learn more

To learn more about using JSON functions and semi-structured data with MariaDB Server, join us for a webinar on June 29, 2017. Also, read two new white papers: dynamic columns and JSON functions.


Manjot Singh, Fri, 06/16/2017 - 15:50

semi structured

I can see the usefulness of this, but in the case above, why wouldn't you just create multiple JSON columns with their own objects, ie video, cuts, audio


by Shane Johnson at June 16, 2017 05:46 PM

Peter Zaitsev

Peter Zaitsev’s Speaking Schedule: Percona University Belgium / PG Day / Meetups

Peter Zaitsev Speaking Schedule

This blog shows Peter Zaitsev’s speaking schedule for this summer.

Summer 2017 Speaking Engagements

This week I spoke at the DB Tech Showcase OSS conference in Japan and am now heading to Europe. I have a busy schedule in June and early July, but there are events and places where we can cross paths and have a quick conversation. Let’s meet at these events if you need anything from Percona (or me personally). 

Below is a full list of places I’ll be at this summer:

Amsterdam, Netherlands

On June 20 I am speaking at the In-Memory Computing Summit 2017 with Denis Magda (Product Manager, Gridgain Systems). Our talk “Accelerate MySQL® for Demanding OLAP and OLTP Use Cases with Apache® Ignite™” starts at 2:35 pm.

On the same day in Amsterdam, Denis and I will speak at the local MySQL User Group meetup. I will share some how-tos for MySQL monitoring with Percona Monitoring and Management (PMM), along with a PMM demo.

Ghent, Belgium

On June 22 we are organizing a Percona University event in Ghent, Belgium, which is a widely known tech hub in the region. I will give several talks there on MySQL, MongoDB and PMM monitoring. Dimitri Vanoverbeke from Percona will discuss MySQL in the Cloud. We have also invited guest speakers: Frederic Descamps from Oracle, and Julien Pivotto from Inuits.

Percona University technical events are 100% free to attend, and so far we are getting very positive attendee feedback on them. To check the full agenda for the Belgium edition, and to register, please use this link.

St. Petersburg, Russia

Percona is sponsoring PG Day’17 Russia, the PostgreSQL conference. This year they added a track on open source databases (and I was happy to be their Committee member for the OSDB track). The conference starts on July 5, and on that day I will give a tutorial on InnoDB Architecture and Performance Optimization. Sveta Smirnova will also present a tutorial on MySQL Performance Troubleshooting.

On July 6-7, you can expect four more talks given by Perconians at PG Day. We invite you to stop by our booth (“Percona”) and ask us any tough questions you might have.

Moscow, Russia

On July 11 I will speak at a Moscow MySQL User Group meetup at the Mail.Ru Group office. While we’re still locking down the agenda, we always have a great selection of speakers at the MMUG meetups. Make sure you don’t miss this gathering!

Thank you, and I hope to see many of you at these events.

by Peter Zaitsev at June 16, 2017 03:05 PM

June 15, 2017

Peter Zaitsev

Three Methods of Installing Percona Monitoring and Management

Installing Percona Monitoring and Management

In this blog post, we’ll look at three different methods for installing Percona Monitoring and Management (PMM).

Percona offers multiple methods of installing Percona Monitoring and Management, depending on your environment and scale. I’ll also share comments on which installation methods we’ve decided to forego for now. Let’s begin by reviewing the three supported methods:

  1. Virtual Appliance
  2. Amazon Machine Image
  3. Docker

Virtual Appliance

We ship an OVF/OVA method to make installation as simple as possible, with the least amount of effort required and at the lowest cost to you. You can leverage the investment in your virtualization deployment platform. OVF is an open standard for packaging and distributing virtual appliances, designed to be run in virtual machines.

Using OVA with VirtualBox as a first step is common in order to quickly play with a working PMM system, and get right to adding clients and observing activity within your own environment against your MySQL and MongoDB instances. But you can also use the OVA file for enterprise deployments. It is a flexible file format that can be imported into other popular hypervisor systems such as VMware, Red Hat Virtualization, XenServer, Microsoft System Centre Virtual Machine Manager and others.

We’d love to hear your feedback on this installation method!

AWS AMI

We also have an AWS AMI in order to provide easy scaling of PMM Server in AWS, so that you can deploy onto any instance size required for your monitoring instance. Depending on the AWS region you’re in, you’ll need to choose from the appropriate AMI Instance ID. Soon we’ll be moving to the AWS Marketplace for even easier deployment. When this is implemented, you will no longer need to clone an existing AMI ID.

Docker

Docker is our most common production deployment method. It is easy (three commands) and scalable (tuning passed on the command line to Docker run). While we recognize that Docker is still a relatively new deployment system for many users, it is dramatically gaining adoption. It is also where Percona is investing the bulk of our development efforts. We deploy PMM Server as two Docker containers: one for storing the data that persists across restarts/upgrades, and the other for running the actual PMM Server binaries (Grafana, Prometheus, consul, Orchestrator, QAN, etc.).

Where are the RPM/DEB/tar.gz packages?!

A common question I hear is why doesn’t Percona support binary-based installation?

We hear you: RPM/DEB/tar.gz methods are commonly used today for many of your own applications. Percona is striving for simplicity in our deployment of PMM Server, and we spend considerable development and QA effort validating the specific versions of Grafana/Prometheus/QAN/consul/Orchestrator all work seamlessly together.

Percona wants to ensure OS compatibility and long-term support of PMM, and to do binary distribution “right” means it can quickly get expensive to build and QA across all the popular Linux distributions available today. We’re in no way against binary distributions. For example, see our list of the nine supported platforms for which we provide bug fix support.

Percona decided to focus our development efforts on stability and features, and less on the number of supported platforms. Hence the hyper-focus on Docker. We don’t have any current plans to move to a binary deployment method for PMM, but we are always open to hearing your feedback. If there is considerable interest, then please let me know via the comments below. We’ll take these thoughts into consideration for PMM planning in the second half of 2017.

Which other methods of installing Percona Monitoring and Management would you like to see?

by Michael Coburn at June 15, 2017 06:54 PM

MariaDB AB

Introduction to Window Functions in MariaDB Server 10.2

Introduction to Window Functions in MariaDB Server 10.2 guest Thu, 06/15/2017 - 14:52

This is a guest post by Vicențiu Ciorbaru, software engineer for the MariaDB Foundation. 

Practical Examples Part 1

MariaDB Server 10.2 has now reached GA status with its latest 10.2.6 release. One of the main focuses of Server 10.2 was analytical use cases. With the surge of Big Data and Business Intelligence, the need for sifting and aggregating through lots of data creates new challenges for database engines. For really large datasets, tools like Hadoop are the go-to solution. However that is not always the most practical or easy-to-use solution, especially when you’re in a position where you need to generate on-the-fly reports. Window functions are meant to bridge that gap and provide a nice “middle of the road” solution for when you need to go through your production or data warehouse database.

The MariaDB Foundation has worked in collaboration with the MariaDB Corporation to get this feature out the door, as it was an area where the MariaDB Server was lagging behind other modern DBMS. Being an advanced querying feature, it is not trivial to explain. The best way to provide an introduction to window functions is through examples. Let us see what window functions can do with a few common use cases.

Generating an increasing row number

One of the simplest use cases (and one of the most frequent) is to generate a monotonically increasing sequence of numbers for a set of rows. Let us see how we can do that.

Our initial data looks like this:

SELECT
email, first_name,
last_name, account_type
FROM users
ORDER BY email;
 
+------------------------+------------+-----------+--------------+
| email                  | first_name | last_name | account_type |
+------------------------+------------+-----------+--------------+
| admin@boss.org         | Admin      | Boss      | admin        |
| bob.carlsen@foo.bar    | Bob        | Carlsen   | regular      |
| eddie.stevens@data.org | Eddie      | Stevens   | regular      |
| john.smith@xyz.org     | John       | Smith     | regular      |
| root@boss.org          | Root       | Chief     | admin        |
+------------------------+------------+-----------+--------------+

Now let’s add our window function.

SELECT
row_number() OVER (ORDER BY email) AS rnum,
email, first_name,
last_name, account_type
FROM users
ORDER BY email;
 
+------+------------------------+------------+-----------+--------------+
| rnum | email                  | first_name | last_name | account_type |
+------+------------------------+------------+-----------+--------------+
|    1 | admin@boss.org         | Admin      | Boss      | admin        |
|    2 | bob.carlsen@foo.bar    | Bob        | Carlsen   | regular      |
|    3 | eddie.stevens@data.org | Eddie      | Stevens   | regular      |
|    4 | john.smith@xyz.org     | John       | Smith     | regular      |
|    5 | root@boss.org          | Root       | Chief     | admin        |
+------+------------------------+------------+-----------+--------------+

So, a new column has been generated, with values always increasing. Time to dissect the syntax.

Notice the OVER keyword, as well as the additional ORDER BY statement after it. The OVER keyword is required to make use of window functions. row_number is a special function, only available as a “window function”; it cannot be used without the OVER clause. Inside the OVER clause, the ORDER BY statement is required to ensure that the values are assigned to each row deterministically. With this ordering, the first email alphabetically will have the rnum value of 1, the second email will have the rnum value of 2, and so on.

Can we do more with this?

Yes! Notice the account_type column. We can generate separate sequences based on account type. The OVER clause supports another keyword called PARTITION BY, which works very similarly to how GROUP BY works in a regular select. With PARTITION BY we split our data into “partitions” and compute the row number separately for each partition.

SELECT
row_number() OVER (PARTITION BY account_type ORDER BY email) AS rnum,
email, first_name,
last_name, account_type
FROM users
ORDER BY account_type,email;
 
+------+------------------------+------------+-----------+--------------+
| rnum | email                  | first_name | last_name | account_type |
+------+------------------------+------------+-----------+--------------+
|    1 | admin@boss.org         | Admin      | Boss      | admin        |
|    2 | root@boss.org          | Root       | Chief     | admin        |
|    1 | bob.carlsen@foo.bar    | Bob        | Carlsen   | regular      |
|    2 | eddie.stevens@data.org | Eddie      | Stevens   | regular      |
|    3 | john.smith@xyz.org     | John       | Smith     | regular      |
+------+------------------------+------------+-----------+--------------+

Looks good! Now that we have an understanding of the basic syntax supported by window functions (there’s more, but I won’t go into all the details at once) let’s look at another similar example.

Top-N Queries

The request is: “Find the top 5 salaries from each department.” Our data looks like this:

describe employee_salaries;

+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| Pk     | int(11)     | YES  |     |    NULL |       |
| name   | varchar(20) | YES  |     |    NULL |       |
| dept   | varchar(20) | YES  |     |    NULL |       |
| salary | int(11)     | YES  |     |    NULL |       |
+--------+-------------+------+-----+---------+-------+

If we don’t have access to window functions, we can do this query using regular SQL like so:

select dept, name, salary
from employee_salaries as t1
where (select count(t2.salary)
       from employee_salaries as t2
       where t1.name != t2.name and
             t1.dept = t2.dept and
             t2.salary > t1.salary) < 5
order by dept, salary desc;

The idea here is to use a subquery to count the number of employees (via count()) from the same department as the current row (where t1.dept = t2.dept) that have a salary greater than the current row (where t2.salary > t1.salary). Afterwards, we accept only the employees which have at most 4 other salaries greater than theirs.

The problem with using this subquery is that, if there is no index on the table, the query takes a very long time to execute as the employee_salaries table grows. Of course, you can add an index to speed it up (see the sketch below), but multiple problems come up when you do. First, creating the index is costly, and maintaining it afterwards adds overhead to each transaction. This gets worse if the table is frequently updated. And even with an index on the dept and salary columns, each subquery execution needs to perform a lookup through the index, which adds overhead as well. There must be a better way!
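For reference, the index hinted at above would be a composite one on the department and salary columns; a minimal sketch (the index name is arbitrary):

ALTER TABLE employee_salaries ADD INDEX dept_salary_idx (dept, salary);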

Window functions to the rescue. For such queries there is a dedicated window function called rank. It is basically a counter that increments whenever we encounter a different salary, while also keeping track of identical values. Let’s see how we would get the same result using the rank function:

Let’s start with generating our ranks for all our employees.

select
rank() over (partition by dept order by salary desc) as ranking,
dept, name, salary
from employee_salaries
order by dept, ranking;

+--------+-------+----------+--------+
|ranking | dept  | name     | salary |
+--------+-------+----------+--------+
|      1 | Eng   | Kristian |   3500 |
|      2 | Eng   | Sergei   |   3000 |
|      3 | Eng   | Sami     |   2800 |
|      4 | Eng   | Arnold   |   2500 |
|      5 | Eng   | Scarlett |   2200 |
|      6 | Eng   | Michael  |   2000 |
|      7 | Eng   | Andrew   |   1500 |
|      8 | Eng   | Tim      |   1000 |
|      1 | Sales | Bob      |    500 |
|      2 | Sales | Jill     |    400 |
|      3 | Sales | Tom      |    300 |
|      3 | Sales | Lucy     |    300 |
|      5 | Sales | Axel     |    250 |
|      6 | Sales | John     |    200 |
|      7 | Sales | Bill     |    150 |
+--------+-------+----------+--------+

We can see that each department has a separate sequence of ranks; this is due to our PARTITION BY clause. This particular sequence of values for rank() is given by our ORDER BY clause inside the window function’s OVER clause. Finally, to get our results in a readable format, we order our data by dept and the newly generated ranking column.

Given that we have the result set as is, we might be tempted to put a where clause in this statement and filter only the first 5 values per department. Let’s try it!

select
rank() over (partition by dept order by salary desc) as ranking,
dept, name, salary
from employee_salaries
where ranking <= 5
order by dept, ranking;

Unfortunately, we get the error message:

Unknown column 'ranking' in 'where clause'

There is a reason for this! This has to do with how window functions are computed. The computation of window functions happens after all WHERE, GROUP BY and HAVING clauses have been completed, right before ORDER BY. Basically, the WHERE clause has no idea that the ranking column exists. It is only present after we have filtered and grouped all the rows. To counteract this problem, we need to wrap our query into a derived table. We can then attach a where clause to it. We end up with:

select *
from (select 
        rank() over (partition by dept order by salary desc)
               as ranking,
        dept, name, salary
      from employee_salaries) as salary_ranks
where (salary_ranks.ranking <= 5)
order by dept, ranking;

This query does not return an error. Here’s what the result set looks like with a few sample data points.

+--------+-------+----------+--------+
|ranking | dept  | name     | salary |
+--------+-------+----------+--------+
|      1 | Eng   | Kristian |   3500 |
|      2 | Eng   | Sergei   |   3000 |
|      3 | Eng   | Sami     |   2800 |
|      4 | Eng   | Arnold   |   2500 |
|      5 | Eng   | Scarlett |   2200 |
|      1 | Sales | Bob      |    500 |
|      2 | Sales | Jill     |    400 |
|      3 | Sales | Tom      |    300 |
|      3 | Sales | Lucy     |    300 |
|      5 | Sales | Axel     |    250 |
+--------+-------+----------+--------+

We have our result with window functions, but how do the two queries compare performance-wise? Disclaimer: this was done on a development machine to give a rough idea of the performance differences. It is not a “go-to” benchmark result.
 

+------------+-----------------------+-------------------------------+----------------------------+
| #Rows      | Regular SQL (seconds) | Regular SQL + Index (seconds) | Window Functions (seconds) |
+------------+-----------------------+-------------------------------+----------------------------+
| 2 000      | 1.31                  | 0.14                          | 0.00                       |
| 20 000     | 123.6                 | 12.6                          | 0.02                       |
| 200 000    | 14326.34              | 1539.79                       | 0.21                       |
| 2 000 000  | ...                   | ...                           | 5.61                       |
| 20 000 000 | ...                   | ...                           | 76.04                      |
+------------+-----------------------+-------------------------------+----------------------------+

The #Rows column represents the number of entries in the table; each department has 100 employees. For our queries, we can see the computation time grows in a polynomial fashion for regular SQL: for each 10x increase in the number of rows (with or without an index), the computation time increases by a factor of 100x. On the other hand, for each 10x increase in the number of rows, window functions take only 10x the amount of time. This is due to eliminating an entire subquery evaluation for each row in our final result set.

I’ll dive in more detail about the window function implementation in further articles, but for now this should give you a taste of what window functions can do.




by guest at June 15, 2017 06:52 PM

Jean-Jerome Schmidt

Watch the tutorial: backup best practices for MySQL, MariaDB and Galera Cluster

Many thanks to everyone who registered and/or participated in Tuesday’s webinar on backup strategies and best practices for MySQL, MariaDB and Galera clusters led by Krzysztof Książek, Senior Support Engineer at Severalnines. If you missed the session, would like to watch it again or browse through the slides, they’re now online for viewing. Also check out the transcript of the Q&A session below.

Watch the webinar replay

Whether you’re a SysAdmin, DBA or DevOps professional operating MySQL, MariaDB or Galera clusters in production, you should make sure that your backups are scheduled, executed and regularly tested. Krzysztof shared some of his key best practice tips & tricks yesterday on how to do just that; including a live demo with ClusterControl. In short, this webinar replay shows you the pros and cons of different backup options and helps you pick the one that goes best with your environment.

Happy backuping!

Questions & Answers

Q. Can we control I/O while taking backups with mysqldump and mydumper? (I’ve used nice before, but it wasn’t helpful.)

A. Theoretically it might be possible, although we haven’t really tested that. If you really want to apply some throttling then you may want to look into cgroups - it should help you to throttle I/O activity on a per-process basis.

Q. Can we use mydumper with ClusterControl, and is ClusterControl free software?

A. We don't currently support it, but you can always use it manually; ClusterControl doesn't prevent you from using this tool. There is a free community version of ClusterControl, yes, though its backup features are part of the commercial version. With the free community version you can deploy and monitor your database (clusters) as well as develop your own custom database advisors. You also have a one-month trial period that gives you access to all of ClusterControl’s features. You can find all the feature details here: https://severalnines.com/pricing

Q. Can xtrabackup work with data-at-rest encryption?

A. It can work with encrypted data in MySQL or Percona Server, because they encrypt only tablespaces, which xtrabackup simply copies - it doesn’t have to access the contents of the tablespaces. MariaDB encrypts not only tablespaces but also, for example, the InnoDB redo logs, which xtrabackup does have to access - therefore xtrabackup cannot work with data-at-rest encryption as implemented in MariaDB. Because of this, MariaDB Corporation forked xtrabackup into MariaDB Backup, which supports the encryption done by MariaDB.
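
As a side note, if you want to see which tablespaces are actually encrypted on a MariaDB server (and hence whether a stock xtrabackup would run into the problem described above), you can query the information_schema, assuming the encryption plugin is loaded:

-- Lists InnoDB tablespaces together with their encryption status on MariaDB.
SELECT NAME, ENCRYPTION_SCHEME, CURRENT_KEY_VERSION
FROM information_schema.INNODB_TABLESPACES_ENCRYPTION;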

Q. Can you use mydumper for point-in-time recovery?

A. Yes, it is possible. mydumper can store GTID data, so you can identify the last applied transaction and use it as the starting position for processing the binary logs.

Q. Is it a problem if we use binary logs with xtrabackup with start-datetime and end-datetime instead of start-position and end-position? We make a full backup on Fridays and every other day an incremental backup. When we need to recover we take the last full and all incremental backups and the binary logs from this day starting from 00:00 to NOW ... could there be a problem with apply-log?

A. In general, you should not use --start-datetime or --stop-datetime when you want to replay binary logs on the database. They are not granular enough - the resolution is one second and there could be many transactions that happened during that second. You can use them to narrow down the timeframe you need to search manually, but that’s all. If you want to replay binary logs, you should use --start-position and --stop-position. Only these precisely define the event at which the replay starts and the event at which it ends.
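
If you need to find the exact positions to use, you can inspect the events of a given binary log from the SQL prompt; the log file name below is just an example:

-- Show the first events of a binary log, starting from position 4,
-- to locate suitable --start-position / --stop-position values.
SHOW BINLOG EVENTS IN 'mysql-bin.000123' FROM 4 LIMIT 20;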

Q. Should I run the dump software on load balancer or one of the MySQL nodes?

A. Typically you’ll run it on the MySQL nodes. Some of the tools can only do just that. For example, xtrabackup has to be run locally, on the database host. You can stream its output to another location, but it has to be started locally.

Q. Can we take partial backups with ClusterControl? And if yes, how can we restore a backup on a running instance?

A. Yes, you can take a partial backup using ClusterControl (you can back up a separate schema using xtrabackup) but, as of now, you cannot restore a partial backup on a running instance. This is because the schema you’d recover would not be consistent with the rest of the cluster. To make it consistent, the cluster has to be bootstrapped from the node on which you restore the backup. So, technically, the node keeps running, but it’s a fairly heavy and invasive operation. This will change in the next version of ClusterControl, in which you’ll be able to restore backups on a separate host. From that host you can then dump the contents of the restored schema using mysqldump (or mydumper) and restore it on the production cluster.

Q. Can you please share the mydumper command?

A. It’s rather hard to answer this question without copying and pasting from the documentation, so we think it’s best to point you to the documentation: https://github.com/maxbube/mydumper/tree/master/docs

Watch the webinar replay

by jj at June 15, 2017 11:33 AM

MariaDB Foundation

Tencent Cloud Becomes a Platinum Sponsor of the MariaDB Foundation

The MariaDB Foundation today announced that Tencent Cloud, the cloud computing service provided by Tencent, has become its platinum sponsor. The sponsorship will help the Foundation ensure continuous collaboration amongst stakeholders within the MariaDB ecosystem, to drive adoption and serve the ever-growing community of users and developers. Tencent Cloud has already sponsored the MariaDB Foundation […]

The post Tencent Cloud Becomes a Platinum Sponsor of the MariaDB Foundation appeared first on MariaDB.org.

by Otto Kekäläinen at June 15, 2017 02:00 AM

June 14, 2017

Peter Zaitsev

MySQL Triggers and Updatable Views

In this post we’ll review how MySQL triggers can affect queries.

Contrary to what the documentation states, we can activate triggers even while operating on views:

https://dev.mysql.com/doc/refman/5.7/en/triggers.html

Important: MySQL triggers activate only for changes made to tables by SQL statements. They do not activate for changes in views, nor by changes to tables made by APIs that do not transmit SQL statements to the MySQL server.

Be on the lookout if you use and depend on triggers, since this is not the case for updatable views! We have reported a documentation bug for this, but figured it wouldn’t hurt to mention it in a short blog post, too. 🙂 The link to the bug in question is here:

https://bugs.mysql.com/bug.php?id=86575

Now, we’ll go through the steps we took to test this, and their outputs. These are for the latest MySQL version (5.7.18), but the same results were seen in 5.5.54, 5.6.35, and MariaDB 10.2.5.

First, we create the schema, tables and view needed:

mysql> CREATE SCHEMA view_test;
Query OK, 1 row affected (0.00 sec)
mysql> USE view_test;
Database changed
mysql> CREATE TABLE `main_table` (
   ->   `id` int(11) NOT NULL AUTO_INCREMENT,
   ->   `letters` varchar(64) DEFAULT NULL,
   ->   `numbers` int(11) NOT NULL,
   ->   `time` time NOT NULL,
   ->   PRIMARY KEY (`id`),
   ->   INDEX col_b (`letters`),
   ->   INDEX cols_c_d (`numbers`,`letters`)
   -> ) ENGINE=InnoDB  DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.31 sec)
mysql> CREATE TABLE `table_trigger_control` (
   ->   `id` int(11),
   ->   `description` varchar(255)
   -> ) ENGINE=InnoDB  DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.25 sec)
mysql> CREATE VIEW view_main_table AS SELECT * FROM main_table;
Query OK, 0 rows affected (0.02 sec)

Indexes are not really needed to prove the point, but were initially added to the tests for completeness. They make no difference in the results.

Then, we create the triggers for all possible combinations of [BEFORE|AFTER] and [INSERT|UPDATE|DELETE]. We will use the control table to have the triggers insert rows, so we can check if they were actually called by our queries.

mysql> CREATE TRIGGER trigger_before_insert BEFORE INSERT ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "BEFORE INSERT");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_after_insert AFTER INSERT ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "AFTER INSERT");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_before_update BEFORE UPDATE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "BEFORE UPDATE");
Query OK, 0 rows affected (0.19 sec)
mysql> CREATE TRIGGER trigger_after_update AFTER UPDATE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "AFTER UPDATE");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_before_delete BEFORE DELETE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (OLD.id, "BEFORE DELETE");
Query OK, 0 rows affected (0.18 sec)
mysql> CREATE TRIGGER trigger_after_delete AFTER DELETE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (OLD.id, "AFTER DELETE");
Query OK, 0 rows affected (0.05 sec)

As you can see, they will insert the ID of the row in question, and the combination of time/action appropriate for each one. Next, we will proceed in the following manner:

  1. INSERT three rows in the main table
  2. UPDATE the second
  3. DELETE the third

The reasoning behind doing it against the base table is to check that the triggers are working correctly, and doing what we expect them to do.

mysql> INSERT INTO main_table VALUES (1, 'A', 10, time(NOW()));
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO main_table VALUES (2, 'B', 20, time(NOW()));
Query OK, 1 row affected (0.14 sec)
mysql> INSERT INTO main_table VALUES (3, 'C', 30, time(NOW()));
Query OK, 1 row affected (0.17 sec)
mysql> UPDATE main_table SET letters = 'MOD' WHERE id = 2;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> DELETE FROM main_table WHERE id = 3;
Query OK, 1 row affected (0.10 sec)

And we check our results:

mysql> SELECT * FROM main_table;
+----+---------+---------+----------+
| id | letters | numbers | time     |
+----+---------+---------+----------+
|  1 | A       |      10 | 15:19:14 |
|  2 | MOD     |      20 | 15:19:14 |
+----+---------+---------+----------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM table_trigger_control;
+------+---------------+
| id   | description   |
+------+---------------+
|    1 | BEFORE INSERT |
|    1 | AFTER INSERT  |
|    2 | BEFORE INSERT |
|    2 | AFTER INSERT  |
|    3 | BEFORE INSERT |
|    3 | AFTER INSERT  |
|    2 | BEFORE UPDATE |
|    2 | AFTER UPDATE  |
|    3 | BEFORE DELETE |
|    3 | AFTER DELETE  |
+------+---------------+
10 rows in set (0.00 sec)

Everything is working as it should, so let’s move on with the tests that we really care about. We will again take the three steps mentioned above, but this time directly on the view.

mysql> INSERT INTO view_main_table VALUES (4, 'VIEW_D', 40, time(NOW()));
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO view_main_table VALUES (5, 'VIEW_E', 50, time(NOW()));
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO view_main_table VALUES (6, 'VIEW_F', 60, time(NOW()));
Query OK, 1 row affected (0.11 sec)
mysql> UPDATE view_main_table SET letters = 'VIEW_MOD' WHERE id = 5;
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> DELETE FROM view_main_table WHERE id = 6;
Query OK, 1 row affected (0.01 sec)

And we check our tables:

mysql> SELECT * FROM main_table;
+----+----------+---------+----------+
| id | letters  | numbers | time     |
+----+----------+---------+----------+
|  1 | A        |      10 | 15:19:14 |
|  2 | MOD      |      20 | 15:19:14 |
|  4 | VIEW_D   |      40 | 15:19:34 |
|  5 | VIEW_MOD |      50 | 15:19:34 |
+----+----------+---------+----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM view_main_table;
+----+----------+---------+----------+
| id | letters  | numbers | time     |
+----+----------+---------+----------+
|  1 | A        |      10 | 15:19:14 |
|  2 | MOD      |      20 | 15:19:14 |
|  4 | VIEW_D   |      40 | 15:19:34 |
|  5 | VIEW_MOD |      50 | 15:19:34 |
+----+----------+---------+----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM table_trigger_control;
+------+---------------+
| id   | description   |
+------+---------------+
|    1 | BEFORE INSERT |
|    1 | AFTER INSERT  |
|    2 | BEFORE INSERT |
|    2 | AFTER INSERT  |
|    3 | BEFORE INSERT |
|    3 | AFTER INSERT  |
|    2 | BEFORE UPDATE |
|    2 | AFTER UPDATE  |
|    3 | BEFORE DELETE |
|    3 | AFTER DELETE  |
|    4 | BEFORE INSERT |
|    4 | AFTER INSERT  |
|    5 | BEFORE INSERT |
|    5 | AFTER INSERT  |
|    6 | BEFORE INSERT |
|    6 | AFTER INSERT  |
|    5 | BEFORE UPDATE |
|    5 | AFTER UPDATE  |
|    6 | BEFORE DELETE |
|    6 | AFTER DELETE  |
+------+---------------+
20 rows in set (0.00 sec)

As seen in the results, all triggers were executed, even though the queries were run against the view. Since this was an updatable view, it worked. Conversely, trying the same against a non-updatable view fails (we can force ALGORITHM = TEMPTABLE to test this).

mysql> CREATE ALGORITHM=TEMPTABLE VIEW view_main_table_temp AS SELECT * FROM main_table;
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO view_main_table_temp VALUES (7, 'VIEW_H', 70, time(NOW()));
ERROR 1471 (HY000): The target table view_main_table_temp of the INSERT is not insertable-into
mysql> UPDATE view_main_table_temp SET letters = 'VIEW_MOD' WHERE id = 5;
ERROR 1288 (HY000): The target table view_main_table_temp of the UPDATE is not updatable
mysql> DELETE FROM view_main_table_temp WHERE id = 5;
ERROR 1288 (HY000): The target table view_main_table_temp of the DELETE is not updatable
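
As a quick sanity check (not part of the original test run), you can also ask the server directly whether it considers a view updatable; view_main_table should report YES here, while the TEMPTABLE view should report NO:

-- Check view updatability via information_schema (output omitted).
SELECT TABLE_NAME, IS_UPDATABLE
FROM information_schema.VIEWS
WHERE TABLE_SCHEMA = 'view_test';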

As mentioned before, MariaDB shows the same behavior. The difference, however, is that MariaDB’s documentation does not make the same incorrect claim: its list of trigger limitations only mentions the following:

https://mariadb.com/kb/en/mariadb/trigger-limitations/

Triggers cannot operate on any tables in the mysql, information_schema or performance_schema database.

Corollary to the Discussion

It’s always good to thoroughly check the documentation, but it’s also necessary to test things and verify that the documentation reflects the real behavior (bugs can be found everywhere, not just in the code :)).

by Agustín at June 14, 2017 06:54 PM

June 13, 2017

Peter Zaitsev

Webinar Thursday, June 15, 2017: Demystifying Postgres Logical Replication

Join Percona’s Senior Technical Services Engineer Emanuel Calvo as he presents Demystifying Postgres Logical Replication on Thursday, June 15, 2017 at 7 am PDT / 10 am EDT (UTC-7).

The Postgres logical decoding feature was added in version 9.4, and thankfully it is continuously improving due to the vibrant open source community. In this webinar, we are going to walk through its concepts, usage and some of the new things coming up in future releases.

Logical decoding is one of the building blocks underlying the BDR implementation, allowing bidirectional streams of data between Postgres instances. It also allows you to stream data out of Postgres into many other data systems.
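
If you would like to get a feel for logical decoding before the webinar, here is a minimal sketch using the built-in test_decoding example plugin; it assumes wal_level = logical and enough free replication slots on the server:

-- Create a logical replication slot using the test_decoding example plugin.
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- ... run some INSERT/UPDATE/DELETE statements ...

-- Peek at the decoded change stream without consuming it from the slot.
SELECT * FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

-- Drop the slot when you are done experimenting.
SELECT pg_drop_replication_slot('demo_slot');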

Register for the webinar here.

Emanuel Calvo, Percona Sr. Technical Services Engineer

Emanuel has worked with MySQL for more than eight years. He is originally from Argentina, but has also lived and worked in Spain and in other Latin American countries. He lectures and presents at universities and local events. Emanuel currently works as a Sr. Technical Services Engineer at Percona, focusing primarily on MySQL. His professional background includes experience at telecommunication companies, educational institutions and data warehousing solutions. In his career, he has worked as a developer, SysAdmin and DBA at companies like Pythian, Blackbird.io/PalominoDB, Siemens IT Solutions, Correo Argentino (Argentinian Postal Services), Globant-EA, SIU – Government Educational Institution and Aedgency, among others. As a community member he has lectured and given talks in Argentina, Brazil, the United States, Paraguay, Spain and Belgium, as well as written several technical papers.

by Dave Avery at June 13, 2017 07:21 PM

MariaDB AB

Performance Improvements in MariaDB MaxScale 2.1 GA


Performance is important, and perhaps doubly so for a database proxy such as MariaDB MaxScale. From the start, the goal of MaxScale’s architecture has been that performance should scale with the number of CPU cores, and the means for achieving that is the use of asynchronous I/O and Linux’s epoll, combined with a fixed number of worker threads.

However, benchmarking showed that there were some bottlenecks: the throughput did not keep increasing as more threads were added, but rather quickly started to decrease. The throughput also hit a relatively low ceiling, above which it would not grow, irrespective of the configuration of MaxScale.

When we started working with MariaDB MaxScale 2.1 we decided to make some significant changes to the internal architecture, with the aim of improving the overall performance. In this blog post, I will describe what we have done and show how the performance has improved.

The benchmark results have all been obtained using the same setup: two computers (connected with a GbE LAN), both of which have 16 cores / 32 hyperthreads, 128GB RAM and an SSD drive. One of the computers was used for running 4 MariaDB-10.1.13 servers and the other for running MariaDB MaxScale and sysbench. The sysbench used is a version slightly modified by MariaDB and it is available here. The configurations used and the raw data can be found here.

In all test runs, the MaxScale configuration was basically the same and 16 worker threads were used. As can be seen here, 16 threads is not optimal for MaxScale 2.0, but that does not distort the general picture too much. The workload is the same in all cases: an OLTP read-only benchmark with 100 simple SELECTs per iteration, no transaction boundaries, and autocommit enabled.

In the figures

  • direct means that sysbench itself distributes the load to the servers in a round-robin fashion (i.e. not through MaxScale),

  • rcr means that MaxScale with the readconnroute router is used, and

  • rws means that MaxScale with the readwritesplit router is used.

The rcr and rws results are meaningful to compare and the difference reflects the overhead caused by the additional complexity and versatility of the readwritesplit router.

As a baseline, the following figure shows the result for MaxScale 2.0.5 GA.

MaxScale-2.0.5.png

As can be seen, a saturation point is reached at 64 client connections, at which point the throughput is less than half of that of a direct connection.

Threading Architecture

Before MaxScale 2.1, all threads were used in a completely symmetric fashion; every thread listened on the same epoll instance, which in practice meant that every thread could accept an incoming connection, read from and write to any client, and read from and write to any server related to that client. Without going into the specifics, that not only implied a lot of locking inside MaxScale but also made it quite hard to ensure that there could be no data races.

For 2.1 we changed the architecture so that once a client has connected, a particular thread is used for handling all communication with, and on behalf of, that client. That is, the thread that reads a request from a client also writes that request to one or more servers, reads the responses from those servers and finally writes a response to the client. With this change quite a lot of locking could be removed, while at the same time the possibilities for data races were reduced.

As can be seen in the following figure, that had quite an impact on the throughput, as far as readconnroute is concerned. The behaviour of readwritesplit did not really change.

MaxScale-2.1.0.png

In 2.1.0 Beta we also introduced a caching filter and as long as the number of client connections is low, the impact is quite significant. However, as the number of client connections grows, the benefit shrinks dramatically.

The cache can be configured to be shared between all threads or to be specific to each thread; in the benchmark a thread-specific cache was used. Otherwise the cache was not configured in any way, which means that all SELECT statements, except those that are considered non-cacheable, are cached and that the cache uses as much memory as it needs (no LRU-based cache eviction).

It is not shown in the figure, but readconnroute + cache performed roughly on par with readwritesplit + cache. That is, from 32 client connections onwards, readconnroute + cache performed much worse than readconnroute alone.

Increasing Query Classification Concurrency

The change in the threading architecture improved the raw throughput of MaxScale but it was still evident that the throughput did not increase in all contexts in the expected way. Investigations showed that there was some unnecessary serialization going on when statements were classified, something that readwritesplit needs to do, but readconnroute does not.

Statements are classified inside MaxScale using a custom sqlite-based parser. An in-memory database is created for each thread so that there can be no concurrency issues between classifications performed by different threads. However, it turned out that sqlite had not been built in a way that turned off all of its internal locking, so some locking was still taking place. For 2.1.1 Beta this issue was fixed and now no locking takes place when statements are classified.

More Aggressive Caching

Initially the cache worked so that caching was taking place only if no transaction was in progress or if an explicitly read-only transaction was in progress. In 2.1.1 Beta we changed that so that caching takes place also in transactions that are not explicitly read-only, but only until the first updating statement.

Reduce Data Collection

Up until 2.1.1 Beta, the query classifier always collected, for each statement, all the information that is accessible via the query classifier API. Profiling showed that the cost of allocating memory for storing the accessed fields and the used functions was significant. In most cases paying that cost is completely unnecessary, as it is primarily only the database firewall filter that is interested in that information.

For 2.1.2 Beta the internal API was changed so that filters and routers can express what information they need and only that information is subsequently collected during query classification.

Custom Parser

Many routers and filters need to know whether a transaction is in progress or not. Up until 2.1.1 Beta that information was obtained using the parser of the query classifier, which imposed a significant cost when classification was not otherwise needed. For 2.1.2 Beta a custom parser was written that only detects statements affecting the transaction state and the autocommit mode. Now the detection carries almost no cost at all.

Optional Cache Safety Checking

By default, the cache filter verifies that a SELECT statement is cacheable. For instance, a SELECT that refers to user variables or uses functions such as CURRENT_DATE() is not cacheable. In order to do that verification, the cache uses the query classifier, which means that the statement will be parsed. In 2.1.2 Beta we introduced the possibility to specify in the configuration file that all SELECT statements can be assumed to be cacheable, which means that no parsing needs to be performed. In practice this behaviour is enabled by adding the following entry to the cache section of the MaxScale configuration file.

    [Cache]
    ...
    selects=assume_cacheable

2.1.3 GA

The impact of the changes above is shown in the figure below.

MaxScale-2.1.3.png

Compared to MaxScale 2.1.0, the performance of readconnroute is roughly the same. However, the performance of readwritesplit has increased by more than 100% and is no longer significantly different from readconnroute. Also the cache performance has improved quite dramatically, as can be seen in the following figure.

MaxScale-2.1.3-rwsplit.png

In the figure, cacheable means that the cache has been configured so that it can assume that all SELECT statements can be cached.

From the figure it can be seen that if there are fewer than 64 concurrent client connections, the cache improves the throughput even when each SELECT statement is verified to be cacheable. With more concurrent connections than that, the verification constitutes a significant cost that diminishes the benefit of the cache. The best performance is always obtained if the verification can be turned off.

Compared to 2.0.5 the overall improvement is quite evident, and although the shown results reflect a particular benchmark, it is to be expected that 2.1 consistently performs better than 2.0, irrespective of the workload.

Looking Forward

Performance-wise, 2.1.3 GA is the best MaxScale version there has ever been, but there are still improvements that can be made.

In 2.1 all threads listen for new clients, but once a connection has been accepted, it is assigned to a particular thread. That causes cross-thread activity that requires locking. The next version, MaxScale 2.2, will change this so that the thread that accepts the connection is also the one that handles it. That way there will be no cross-thread activity when a client connects.

A significant amount of the remaining locking inside MaxScale is actually related to the collection of statistics. Namely, as the MaxScale component for handling maxadmin requests is a regular MaxScale router, a maxadmin connection will run on some thread, yet it needs to access data belonging to the other threads. In 2.2 we introduce an internal epoll-based cross-thread messaging mechanism and have a dedicated admin thread, which communicates with the worker threads using that messaging mechanism. The result is that all statistics related locking can be removed.

Try it out today. Download MariaDB MaxScale.



by Johan at June 13, 2017 07:19 AM