Planet MariaDB

April 20, 2018

Peter Zaitsev

The Final Countdown: Are You Ready for Percona Live 2018?

It’s hard to believe Percona Live 2018 starts on Monday! We’re looking forward to seeing everyone in Santa Clara next week! Here are some quick highlights to remember:

  • In addition to all the amazing sessions and keynotes we’ve announced, we’ll be hosting the MySQL Community Awards and the Lightning Talks on Monday during the Opening Reception.
  • We’ve also got a great lineup of demos in the exhibit hall all day Tuesday and Wednesday – be sure to stop by and learn more about open source database products and tools.
  • On Monday, we have a special China Track now available from Alibaba Cloud, PingCAP and Shannon Systems. We’ve just put a $20.00 ticket on sale for that track, and if you have already purchased any of our other tickets, you are also welcome to attend those four sessions.
  • Don’t forget to make your reservation at the Community Dinner. It’s a great opportunity to socialize with everyone and Pythian is always a wonderful host!

Thanks to everyone who is sponsoring, presenting and attending! The community is what makes this event successful and so much fun to be a part of!

The post The Final Countdown: Are You Ready for Percona Live 2018? appeared first on Percona Database Performance Blog.

by Laurie Coffin at April 20, 2018 09:07 PM

Percona Toolkit 3.0.9 Is Now Available

Percona announces the release of Percona Toolkit 3.0.9 on April 20, 2018.

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL, MariaDB®, Percona Server for MongoDB and MongoDB.

Percona Toolkit, like all Percona software, is free and open source. You can download packages from the website or install from official repositories.
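
For instance, on a system where the Percona package repository is already configured (a minimal sketch; the package name matches the official repositories):

# Debian/Ubuntu
sudo apt-get install percona-toolkit
# RHEL/CentOS
sudo yum install percona-toolkit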

This release includes the following changes:

New Tools:

  • PT-1501: pt-secure-collect – new tool to collect and sanitize pt-tools outputs

New Features:

  • PT-1530: Add support for encryption status to pt-mysql-summary
  • PT-1526: Add ndb status to pt-mysql-summary (Thanks Fernando Ipar)
  • PT-1525: Add support for MySQL 8 roles into pt-mysql-summary
  • PT-1509: Make pt-table-sync only set binlog_format when necessary (Thanks Moritz Lenz)
  • PT-1508: Add --read-only-interval and --fail-successive-errors flags to pt-heartbeat (Thanks Shlomi Noach)
  • PT-243: Add --max-hostname-length and --max-line-length flags to pt-query-digest
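
As a quick usage sketch of the new pt-query-digest options (the values here are illustrative; check pt-query-digest --help for the defaults and exact semantics):

# Digest a slow log, capping hostname length and report line width in the output
pt-query-digest --max-hostname-length=20 --max-line-length=120 /var/log/mysql/mysql-slow.log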

Bug Fixes:

  • PT-1527: Fixed an issue where pt-table-checksum ignored --nocheck-binlog-format

Improvements:

  • PT-1507: pt-summary does not reliably read in the transparent huge pages setting (Thanks Nick Veenhof)

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

The post Percona Toolkit 3.0.9 Is Now Available appeared first on Percona Database Performance Blog.

by Borys Belinsky at April 20, 2018 05:51 PM

Percona Monitoring and Management (PMM) 1.10.0 Is Now Available

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.

We focused mainly on two features in 1.10.0, but there are also several notable improvements worth highlighting:

  • Annotations – Record and display Application Events as Annotations using pmm-admin annotate
  • Grafana 5.0 – Improved visualization effects
  • Switching between Dashboards – Restored functionality to preserve host when switching dashboards
  • New Percona XtraDB Cluster Overview graphs – Added Galera Replication Latency graphs on Percona XtraDB Cluster Overview dashboard with consistent colors

The issues in the release include four new features & improvements, and eight bugs fixed.

Annotations

Application events are one of the contributors to changes in database performance characteristics, and in this release PMM now supports receiving events and displaying them as Annotations using the new command pmm-admin annotate. A recent Percona survey reveals that Database and DevOps Engineers highly value visibility into the Application layer.  By displaying Application Events on top of your PMM graphs, Engineers can now correlate Application Events (common cases: Application Deploys, Outages, and Upgrades) against Database and System level metric changes.

Usage

For example, you have just completed an Application deployment to version 1.2, which is relevant to UI only, so you want to set tags for the version and interface impacted:

pmm-admin annotate "Application deploy v1.2" --tags "UI, v1.2"

Using the optional --tags allows you to filter which Annotations are displayed on the dashboard via a toggle option.  Read more about Annotations utilization in the Documentation.

Grafana 5.0

We’re extremely pleased to see Grafana ship 5.0 and we were fortunate enough to be at Grafanacon, including Percona’s very own Dimitri Vanoverbeke (Dim0) who presented What we Learned Integrating Grafana and Prometheus!

Grafana 5.0 includes a number of dramatic improvements, and we plan to extend our usage of them in future Percona Monitoring and Management releases. The feature we like best is the virtually unlimited way you can size and shape graphs: no longer are you bound by panel constraints to keep all objects at the same fixed height! This improvement indirectly addresses a visualization error in PMM Server where some graphs would appear to be on two lines and ended up wasting screen space.

Switching between Dashboards

PMM now allows you to navigate between dashboards while maintaining the same host under observation, so that for example you can start on MySQL Overview looking at host serverA, switch to MySQL InnoDB Advanced dashboard and continue looking at serverA, thus saving you a few clicks in the interface.

New Percona XtraDB Cluster Galera Replication Latency Graphs

We have added new Percona XtraDB Cluster Replication Latency graphs on our Percona XtraDB Cluster Galera Cluster Overview dashboard so that you can compare latency across all members in a cluster in one view.

Issues in this release

New Features & Improvements

  • PMM-2330: Application Annotations documentation update
  • PMM-2332: Grafana 5 documentation update
  • PMM-2293: Add Galera Replication Latency graph to the PXC/Galera Cluster Overview dashboard
  • PMM-2295: Improve color selection on the PXC/Galera Cluster Overview dashboard

Bugs fixed

  • PMM-2311: Fix misalignment in the Query Analytics Metrics table
  • PMM-2341: Fix a typo in the text on the password page of the OVF image
  • PMM-2359: Trim leading and trailing whitespace for all fields on the AWS/OVF installation wizard
  • PMM-2360: Include a “What’s new?” link in the Update widget
  • PMM-2346: Fix invalid arithmetic on the InnoDB AHI graphs
  • PMM-2364: Fix wrong QPS values in QAN
  • PMM-2388: Fix Query Analytics not rendering the fingerprint section in some cases
  • PMM-2371: Pass the host when switching between dashboards

How to get PMM

PMM is available for installation using three methods:

  • Docker images of PMM Server, plus pmm-client packages for the hosts you monitor
  • A VirtualBox virtual appliance (OVA/OVF)
  • An Amazon Machine Image (AMI) on the AWS Marketplace

Help us improve our software quality by reporting any Percona Monitoring and Management bugs you encounter using our bug tracking system.

The post Percona Monitoring and Management (PMM) 1.10.0 Is Now Available appeared first on Percona Database Performance Blog.

by Borys Belinsky at April 20, 2018 05:36 PM

MariaDB Foundation

Live Q&A for beginner contributors on IRC every Monday! Join #maria @ 8:00 to 10:00 UTC

MariaDB is pleased to announce that we now have a dedicated time each week when we answer new contributor questions live on IRC. Starting from April 23rd, on every Monday, anybody is guaranteed to have a live person on IRC to ask any question they’d like between 8:00 and 10:00 UTC. […]

The post Live Q&A for beginner contributors on IRC every Monday! Join #maria @ 8:00 to 10:00 UTC appeared first on MariaDB.org.

by Rutuja Surve at April 20, 2018 08:41 AM

Valeriy Kravchuk

Fun with Bugs #67 - On Some Public Bugs Fixed in MySQL 8.0.11 GA

I stopped reviewing MySQL Release Notes quite some time ago, but major GA releases of MySQL do not happen often, so I decided to make an exception and write about some bugs from Community users fixed in MySQL 8.0.11 GA.

I'll start with good news about MySQL 8.0.11 GA! You can get the sources at GitHub, and I had no problems building them on Fedora 27 on my good old QuadCore box, using the following cmake command line:
[openxs@fc23 mysql-server]$ cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_CONFIG=mysql_release -DFEATURE_SET=community -DWITH_EMBEDDED_SERVER=OFF -DDOWNLOAD_BOOST=1 -DWITH_BOOST=/home/openxs/boost -DENABLE_DOWNLOADS=1 -DWITH_UNIT_TESTS=OFF -DCMAKE_INSTALL_PREFIX=/home/openxs/dbs/8.0
...
[openxs@fc23 mysql-server]$ time make -j 4
...

[100%] Built target mysqld

real    33m52.791s
user    105m47.475s
sys     8m19.018s
Compared to my previous experience, I had a minor problem with unit tests, so I just skipped them with the -DWITH_UNIT_TESTS=OFF option. There is no problem running the resulting binaries, unless you try to use a data directory from an older 8.0.x. Then you'll end up with:
2018-04-19T15:36:35.165841Z 1 [ERROR] [MY-011092] [Server] Upgrading the data dictionary from dictionary version '80004' is not supported.
2018-04-19T15:36:35.166239Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2018-04-19T15:36:35.166310Z 0 [ERROR] [MY-010119] [Server] Aborting
I had to remove the data directory and initialize it from scratch (it was a testing instance anyway, last used for real while I worked on this presentation):
[openxs@fc23 8.0]$ rm -rf data/*
[openxs@fc23 8.0]$ bin/mysqld --no-defaults --initialize-insecure --port=3308 --socket=/tmp/mysql.sock --basedir=/home/openxs/dbs/8.0 --datadir=/home/openxs/dbs/8.0/data --skip-log-bin
2018-04-19T15:43:55.324606Z 0 [Warning] [MY-010139] [Server] Changed limits: max_open_files: 1024 (requested 8161)
2018-04-19T15:43:55.324726Z 0 [Warning] [MY-010142] [Server] Changed limits: table_open_cache: 431 (requested 4000)
2018-04-19T15:43:55.325147Z 0 [System] [MY-013169] [Server] /home/openxs/dbs/8.0/bin/mysqld (mysqld 8.0.11) initializing of server in progress as process 20034
2018-04-19T15:44:14.438776Z 4 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
2018-04-19T15:44:29.625227Z 0 [System] [MY-013170] [Server] /home/openxs/dbs/8.0/bin/mysqld (mysqld 8.0.11) initializing of server has completed
[openxs@fc23 8.0]$ bin/mysqld_safe --no-defaults --port=3308 --socket=/tmp/mysql.sock --basedir=/home/openxs/dbs/8.0 --datadir=/home/openxs/dbs/8.0/data --skip-log-bin &
[1] 20080
[openxs@fc23 8.0]$ 2018-04-19T15:44:58.224816Z mysqld_safe Logging to '/home/openxs/dbs/8.0/data/fc23.err'.
2018-04-19T15:44:58.271255Z mysqld_safe Starting mysqld daemon with databases from /home/openxs/dbs/8.0/data

[openxs@fc23 8.0]$ bin/mysql -uroot --socket=/tmp/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.11 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 8.0.11                       |
| protocol_version        | 10                           |
| slave_type_conversions  |                              |
| tls_version             | TLSv1,TLSv1.1,TLSv1.2        |
| version                 | 8.0.11                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Linux                        |
| version_compile_zlib    | 1.2.11                       |
+-------------------------+------------------------------+
9 rows in set (0.00 sec)
So, you can build MySQL 8.0.11 right now and start using it to draw your own conclusions about this release.

I still do not care about NoSQL, JSON, new cool features etc. You'll see megabytes of text about these by the end of 2018. I am going to concentrate mostly on InnoDB and replication bugs, plus a few others:
  •  I am happy to start with Bug #89509 - "Valgrind error on innodb.blob_page_reserve, bundled zlib", reported by Laurynas Biveinis. See also his Bug #89597 - "Valgrind reporting memory leak on MTR test main.validate_password_component" and Bug #89433 - "NULL dereference in dd::tables::DD_properties:unchecked_get". Percona engineers have spent a lot of effort recently testing MySQL 8.0.x and reporting the bugs they found. I think Oracle should explicitly acknowledge the impact of Percona's QA efforts on the quality of this GA release.
  • Bug #89127 - "Optimize trx_rw_is_active() by tracking the lowest active transaction id". This bug was reported by Zhai Weixiang, who also suggested a patch.
  • Bug #89129 - "create table+DML on innodb_ddl_log table=crash in lock0lock.cc:7414:release_lock". This bug was reported by Ramana Yeruva. The tables were made protected, and DDL and DML operations on them are no longer permitted.
  • Bug #89087 - "Assertion `key->flags & 1' failed". This debug assertion (related to the way a PRIMARY key was created based on a UNIQUE one) was reported by Roel Van de Paar for 5.7.21, but we see the fix documented only for 8.0.x.
  • Bug #87827 - "Performance regression in "create table" speed and scalability in 8.0.3". It was reported by Alexander Rubin from Percona.
  • Bug #87812 - "Concurrent DDL operation in progress even after acquiring backup lock". Nice bug report from Debarun Banerjee.
  • Bug #87532 - "Replay log record cause mysqld crash during online DDL". I am happy to see improvements in the "online ALTER" implementation that cover all GA versions, not just 8.0. I am also happy to see Oracle engineers (Ohm Hong in this case) reporting bugs in public!
  • Bug #88272 - "Assertion `new_value >= 0' failed.". Yet another debug assertion found by Roel Van de Paar, this time related to GTIDs and XA transactions. Check also his Bug #88262 - "ERROR 1598 (HY000): Binary logging not possible + abort".
  • Bug #84415 - "slave don't report Seconds_Behind_Master when running slave_parallel_workers > 0". Yet another contribution from Percona engineers. This bug was reported by Marcelo Altmann and patches were provided by Robert Golebiowski. This bug is also fixed in MySQL 5.7.22.
  • Bug #89793 - "INFORMATION_SCHEMA.STATISTICS field type change". Unexpected change in early 8.0.x versions was noted and reported by Mark Guinness.
  • Bug #89584 - "5.7->8.0 upgrade crash with default-time-zone set". Nice to see this bug (reported by Shane Bester) fixed in GA release.
  • Bug #89487 - "ALTER TABLE hangs in "Waiting for tablespace metadata lock" state". This regression bug was reported by Sveta Smirnova.
  • Bug #89324 - "main.comment_column2 fails with compression". This regression was noted and reported by Manuel Ung.
  • Bug #89122 - "Severe performance regression in server bootstrap". I am really happy to see this bug reported by Georgi Kodinov fixed. I noted it as soon as I started testing 8.0.x (see a duplicate by Roel Van de Paar, Bug #89444) and it was very annoying. I've already checked (see above) that the problem is gone!
  • Bug #89038 - "Add new column to 'mysql.routines' to accommodate the Polygot project". So, Oracle is planning to support stored programs in different languages! Thank you, Sivert Sørumgård, for reporting this in public! See also his Bug #89035 - "Reject LCTN changing after --initialize".
  • Bug #87836 - "XA COMMIT/ROLLBACK rejected by non-autocommit session with no active transaction". It would be sad if this bug is not fixed in MySQL 5.7.x, where it was originally found by Wei Zhao.
  • Bug #87708 - "MDL for column statistics is not properly reflected in P_S.METADATA_LOCKS". It was reported by Erik Frøseth.
  • Bug #85997 - "inplace alter table with foreign keys causes table definition mismatch". This bug was reported by Magnus Blåudd.
  • Bug #85561 - "Users can be assigned non-existing roles as default". Nice to see this bug reported by Giuseppe Maxia fixed in GA release.
  • Bug #33004 - "integer constants casted to bigints by unions". This bug was reported by Domas Mituzas more than 10 years ago!
Now I have to stop, as I found a private bug in the release notes, Bug #89512. Based on the description:
"Window function row-buffer handling has been refactored to reduce the number of handler reads by 25%. (Bug #89512, Bug #27484133)"
I truly do not get why it remains private (or, if the matter is so "sensitive", why it is mentioned in the public release notes at all), so I'd better stop.

MySQL 8 is GA, finally! There are a lot more fixes there that I have not mentioned above. I am sure there are even more bugs to find. So, happy hunting!



by Valeriy Kravchuk (noreply@blogger.com) at April 20, 2018 07:02 AM

April 19, 2018

Peter Zaitsev

Sysbench-tpcc Supports PostgreSQL (No, Really This Time)

This time, we really mean it when we say sysbench-tpcc supports PostgreSQL.

When I initially announced sysbench-tpcc, I mentioned it potentially could run against PostgreSQL, but it was more like wishful thinking than reality. The reality was that even though both databases speak SQL, the difference in dialects was too big and the queries written for MySQL could not run without modification on PostgreSQL.

Well, we introduced needed changes, and now you can use sysbench-tpcc with PostgreSQL. Just try the latest commit to https://github.com/Percona-Lab/sysbench-tpcc.
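
As a rough sketch of how a run might look (the connection parameters, dataset size and duration here are all illustrative; the --pgsql-* options come from sysbench's pgsql driver):

# Load the TPCC-like dataset into PostgreSQL
./tpcc.lua --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-user=sbtest --pgsql-password=sbtest --pgsql-db=sbtest --tables=10 --scale=100 --threads=16 prepare
# Run the workload for 5 minutes, reporting throughput every 10 seconds
./tpcc.lua --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-user=sbtest --pgsql-password=sbtest --pgsql-db=sbtest --tables=10 --scale=100 --threads=16 --time=300 --report-interval=10 run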

If you’re interested, here is a quick overview of what changes we had to make:

  1. It appears that PostgreSQL does not support the tinyint and datetime data types. We had to use smallint and timestamp fields instead, even though using smallint makes the database size bigger.
  2. PostgreSQL does not have a simple equivalent for MySQL’s SHOW TABLES. The best replacement we found is select * from pg_catalog.pg_tables where schemaname != 'information_schema' and schemaname != 'pg_catalog'.
  3. PostgreSQL does not have a way to disable Foreign Key checks like MySQL’s SET FOREIGN_KEY_CHECKS=0. With PostgreSQL, we needed to create and load tables in a very specific order to avoid Foreign Key violations.
  4. PostgreSQL requires index names to be unique across the whole database, while MySQL requires them to be unique only per table. So instead of using:
     CREATE INDEX idx_customer ON customer1 (c_w_id,c_d_id,c_last,c_first)
     CREATE INDEX idx_customer ON customer2 (c_w_id,c_d_id,c_last,c_first)
     we need to use:
     CREATE INDEX idx_customer1 ON customer1 (c_w_id,c_d_id,c_last,c_first)
     CREATE INDEX idx_customer2 ON customer2 (c_w_id,c_d_id,c_last,c_first)
  5. PostgreSQL does not have a STRAIGHT_JOIN hint, so we had to remove it from the queries. It is worth mentioning that we use STRAIGHT_JOIN mostly as a hack to force MySQL to use a correct execution plan for one of the queries.
  6. PostgreSQL is very strict about GROUP BY queries: all fields that are not in the GROUP BY clause must use an aggregate function. So PostgreSQL complained about queries like SELECT d_w_id,sum(d_ytd)-w_ytd diff FROM district,warehouse WHERE d_w_id=w_id AND w_id=1 GROUP BY d_w_id, even though we know that only a single value of w_ytd is possible. We had to rewrite this query as SELECT d_w_id,SUM(d_ytd)-MAX(w_ytd) diff FROM district,warehouse WHERE d_w_id=w_id AND w_id=1 GROUP BY d_w_id.

So you can see there was some work involved when we tried to migrate even a simple application from MySQL to PostgreSQL.

Hopefully, now that sysbench-tpcc supports PostgreSQL, it will be a useful tool for evaluating PostgreSQL performance. If you find that we did not execute some transaction optimally, please let us know!

The post Sysbench-tpcc Supports PostgreSQL (No, Really This Time) appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at April 19, 2018 10:38 PM

Congratulations to Our Friends at Oracle with the MySQL 8.0 GA Release!

It is a great day for the whole MySQL community: MySQL 8.0 was just released as GA!

Geir Høydalsvik has a great summary in his “What’s New in MySQL 8.0” blog post. You can find additional information about MySQL 8.0 Replication and MySQL 8.0 Document Store that is also worth reading.

If you can’t wait to upgrade to MySQL 8.0, please make sure to read the Upgrading to MySQL 8.0 section in the manual, and pay particular attention to changes to Connection Authentication. It requires special handling for most applications.

Also keep in mind that while MySQL 8.0 passed through an extensive QA process, this is the first GA release. It is not yet as mature and polished as MySQL 5.7. If you’re just now starting application development, however, you should definitely start with MySQL 8.0 — by the time you launch your application, 8.0 will be good. 

All of us at Percona – and me personally – are very excited about this release. You can learn more details about what we expect from it in our Why We’re Excited about MySQL 8.0 webinar recording.    

We also wrote extensively about MySQL 8.0 on our blog. Below are some posts on various features, as well as thoughts on the various RCs, that you might want to review:

The best way to learn about MySQL 8.0, though, is to attend the Percona Live Open Source Database Conference 2018, taking place in Santa Clara, CA next week. We have an outstanding selection of MySQL 8.0 focused talks both from the MySQL Engineering team and the community at large (myself included):

You can still get tickets to the conference. Come by and learn about MySQL 8.0. If you can’t make it, please check back later for slides.

Done reading? Go ahead and download MySQL 8.0 and check it out!

The post Congratulations to Our Friends at Oracle with the MySQL 8.0 GA Release! appeared first on Percona Database Performance Blog.

by Peter Zaitsev at April 19, 2018 07:45 PM

April 18, 2018

Peter Zaitsev

Why Analyze Raw MySQL Query Logs?

In this blog post, I’ll examine when looking at raw MySQL query logs can be more useful than working with tools that only have summary data.

In my previous blog post, I wrote about analyzing MySQL Slow Query Logs with ClickHouse and ClickTail. One of the follow-up questions I got is when you would want to do that compared to just using tools like Percona Monitoring and Management or VividCortex, which provide a beautiful interface for detailed analysis (rather than a spartan SQL interface).

MySQL Logs

A lot of folks are confused about what query logs MySQL has, and what you can use them for. First, MySQL has a “General Query Log”. As the name implies, this is a general-purpose query log. You would think this is the first log you should use, but it is, in fact, pretty useless:

2018-03-31T15:38:44.521650Z      2356 Query SELECT c FROM sbtest1 WHERE id=164802
2018-03-31T15:38:44.521790Z      2356 Query SELECT c FROM sbtest1 WHERE id BETWEEN 95241 AND 95340
2018-03-31T15:38:44.522168Z      2356 Query SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 1 AND 100
2018-03-31T15:38:44.522500Z      2356 Query SELECT c FROM sbtest1 WHERE id BETWEEN 304556 AND 304655 ORDER BY c
2018-03-31T15:38:44.522941Z      2356 Query SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 924 AND 1023 ORDER BY c
2018-03-31T15:38:44.523525Z      2356 Query UPDATE sbtest1 SET k=k+1 WHERE id=514

As you can see, it only has very limited information about queries: no query execution time or which user is running the query. This type of log is helpful if you want to see very clean, basic information on what queries your application is really running. It can also help debug MySQL crashes because, unlike other log formats, the query is written to this log file before MySQL attempts to execute the query.
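
If you do want to capture it for a short period, the general log can be toggled at runtime without a restart (a minimal sketch; the file path is illustrative, and you need the SUPER privilege):

-- Send the general log to a file and enable it
SET GLOBAL log_output = 'FILE';
SET GLOBAL general_log_file = '/var/log/mysql/general.log';
SET GLOBAL general_log = ON;
-- ... capture some traffic, then switch it back off
SET GLOBAL general_log = OFF;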

The MySQL Slow Log is, in my opinion, much more useful (especially with Percona Server Slow Query Log Extensions). Again as the name implies, you would think it is only used for slow queries (and by default, it is). However, you can set long_query_time to 0 (with a few other options) to get all queries here with lots of rich information about query execution:

# Time: 2018-03-31T15:48:55.795145Z
# User@Host: sbtest[sbtest] @ localhost []  Id: 2332
# Schema: sbtest  Last_errno: 0 Killed: 0
# Query_time: 0.000143  Lock_time: 0.000047 Rows_sent: 1  Rows_examined: 1 Rows_affected: 0
# Bytes_sent: 188  Tmp_tables: 0 Tmp_disk_tables: 0  Tmp_table_sizes: 0
# QC_Hit: No  Full_scan: No Full_join: No  Tmp_table: No Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 0
# Log_slow_rate_type: query  Log_slow_rate_limit: 10
SET timestamp=1522511335;
SELECT c FROM sbtest1 WHERE id=2428336;

Finally, there is the MySQL Audit Log, which is part of the MySQL Enterprise offering, with a format-compatible Percona Server for MySQL Audit Log Plugin. This is designed for auditing access to the server, and as such each entry records details such as the user, host, IP and database. Unlike the first two log formats, it is designed first and foremost to be machine-readable and supports JSON, XML and CSV output formats:

{"audit_record":{"name":"Query","record":"743017006_2018-03-31T01:03:12","timestamp":"2018-03-31T15:53:42 UTC","command_class":"select","connection_id":"2394","status":0,"sqltext":"SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 3 AND 102","user":"sbtest[sbtest] @ localhost []","host":"localhost","os_user":"","ip":"","db":"sbtest"}}
{"audit_record":{"name":"Query","record":"743017007_2018-03-31T01:03:12","timestamp":"2018-03-31T15:53:42 UTC","command_class":"select","connection_id":"2394","status":0,"sqltext":"SELECT c FROM sbtest1 WHERE id BETWEEN 2812021 AND 2812120 ORDER BY c","user":"sbtest[sbtest] @ localhost []","host":"localhost","os_user":"","ip":"","db":"sbtest"}}
{"audit_record":{"name":"Query","record":"743017008_2018-03-31T01:03:12","timestamp":"2018-03-31T15:53:42 UTC","command_class":"select","connection_id":"2394","status":0,"sqltext":"SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 100 ORDER BY c","user":"sbtest[sbtest] @ localhost []","host":"localhost","os_user":"","ip":"","db":"sbtest"}}

As you can see, there are substantial differences in the purposes of the different MySQL log formats, along with the information they provide.

Why analyze raw MySQL query logs

In my opinion, there are two main reasons to look directly at raw log files without aggregation (you might find others):

  • Auditing, where the Audit Log is useful (Vadim recently blogged about it)
  • Advanced MySQL/application debugging, where an aggregated summary might not allow you to drill down to the fullest level of detail

When you’re debugging using MySQL logs, the Slow Query Log, set to log all queries with no sampling, is the most useful. Of course, this can cause significant additional overhead in many workloads, so it is best to do it in a development environment (if you can repeat the situation you’re looking to analyze). At the very least, don’t do it during peak time.

For Percona Server for MySQL, these options ensure it logs all queries to the query log with no sampling:

log_output=file
slow_query_log=ON
long_query_time=0
log_slow_rate_limit=1
log_slow_verbosity=full
log_slow_admin_statements=ON
log_slow_slave_statements=ON
slow_query_log_always_write_time=1

Now that we have full queries, we can easily use Linux command line tools like grep and others to look into what is going on. However, this isn’t always convenient. This is where loading logs into storage that you can conveniently query is a good solution.

Let’s look into some specific and interesting cases.

Were any queries killed?

SELECT
   _time,
   query,
   query_time
FROM mysql_slow_log
WHERE killed > 0
┌───────────────_time─┬─query───────────────────────────────┬─query_time─┐
│ 2018-04-02 19:02:56 │ select benchmark(10000000000,"1+1") │  10.640794 │
└─────────────────────┴─────────────────────────────────────┴────────────┘
1 rows in set. Elapsed: 0.242 sec. Processed 929.14 million rows, 1.86 GB (3.84 billion rows/s., 7.67 GB/s.)

Yes. A query got killed after running for 10 seconds.

Did any query fail? With what error codes?

SELECT
   error_num,
   min(_time),
   max(_time),
   count(*)
FROM mysql_slow_log
GROUP BY error_num
┌─error_num─┬──────────min(_time)─┬──────────max(_time)─┬───count()─┐
│         0 │ 2018-04-02 18:59:49 │ 2018-04-07 19:39:27 │ 925428375 │
│      1160 │ 2018-04-02 19:02:56 │ 2018-04-02 19:02:56 │         1 │
│      1213 │ 2018-04-02 19:00:00 │ 2018-04-07 19:18:14 │   3709520 │
│      1054 │ 2018-04-07 19:38:14 │ 2018-04-07 19:38:14 │         1 │
└───────────┴─────────────────────┴─────────────────────┴───────────┘
4 rows in set. Elapsed: 2.391 sec. Processed 929.14 million rows, 7.43 GB (388.64 million rows/s., 3.11 GB/s.)

You can resolve error codes with the perror command:

root@rocky:~# perror 1054
MySQL error code 1054 (ER_BAD_FIELD_ERROR): Unknown column '%-.192s' in '%-.192s'

This command has many uses. You can use it to hunt down application issues (like in this example of a missing column — likely due to bad or old code). It can also help you to spot SQL injection attempts that often cause queries with bad syntax, and troubleshoot deadlocks or foreign key violations.
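
Once you have an error code of interest, the same table lets you pull the offending statements themselves. A sketch against the mysql_slow_log schema used throughout this post:

SELECT
   _time,
   query
FROM mysql_slow_log
WHERE error_num = 1054
LIMIT 5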

Are there any nasty, long transactions?

SELECT
   transaction_id,
   max(_time) - min(_time) AS run_time,
   count(*) AS num_queries,
   sum(rows_affected) AS rows_changed
FROM mysql_slow_log
WHERE transaction_id != ''
GROUP BY transaction_id
ORDER BY rows_changed DESC
LIMIT 10
┌─transaction_id─┬─run_time─┬─num_queries─┬─rows_changed─┐
│ 17E070082      │ 0        │      1      │ 9999         │
│ 17934C73C      │ 2        │      6      │ 4            │
│ 178B6D346      │ 0        │      6      │ 4            │
│ 17C909086      │ 2        │      6      │ 4            │
│ 17B45EFAD      │ 5        │      6      │ 4            │
│ 17ABAB840      │ 0        │      6      │ 4            │
│ 17A36AD3F      │ 3        │      6      │ 4            │
│ 178E037A5      │ 1        │      6      │ 4            │
│ 17D1549C9      │ 0        │      6      │ 4            │
│ 1799639F2      │ 1        │      6      │ 4            │
└────────────────┴──────────┴─────────────┴──────────────┘
10 rows in set. Elapsed: 15.574 sec. Processed 930.58 million rows, 18.23 GB (59.75 million rows/s., 1.17 GB/s.)

Finding transactions that modify a lot of rows, like transaction 17E070082 above, can be very helpful to ensure you control MySQL replication slave lag. It is also critical if you’re looking to migrate to MySQL Group Replication or Percona XtraDB Cluster.

What statements were executed in a long transaction?

SELECT
   _time,
   _ms,
   query
FROM mysql_slow_log
WHERE transaction_id = '17E070082'
ORDER BY
   _time ASC,
   _ms ASC
LIMIT 10
┌───────────────_time─┬────_ms─┬─query─────────────────────────────────┐
│ 2018-04-07 20:08:43 │ 890693 │ update sbtest1 set k=0 where id<10000 │
└─────────────────────┴────────┴───────────────────────────────────────┘
1 rows in set. Elapsed: 2.361 sec. Processed 931.04 million rows, 10.79 GB (394.27 million rows/s., 4.57 GB/s.)

I used transaction 17E070082 from the previous query above (which modified 9999 rows). Note that this schema improves compression by storing the seconds and microseconds parts of the timestamp in different columns.
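
For reference, here is a hypothetical fragment of such a table definition in ClickHouse (the actual ClickTail schema may differ):

CREATE TABLE mysql_slow_log_sketch (
    _time DateTime,     -- second-resolution timestamp
    _ms UInt32,         -- microseconds part, stored separately for better compression
    query String,
    query_time Float64
) ENGINE = MergeTree ORDER BY _time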

Were any queries dumping large numbers of rows from the database?

SELECT
   _time,
   query,
   rows_sent,
   bytes_sent
FROM mysql_slow_log
WHERE rows_sent > 10000
┌───────────────_time─┬─query────────────────────────────────────────────┬─rows_sent─┬─bytes_sent─┐
│ 2018-04-07 20:21:08 │ SELECT /*!40001 SQL_NO_CACHE */ * FROM `sbtest1` │  10000000 │ 1976260712 │
└─────────────────────┴──────────────────────────────────────────────────┴───────────┴────────────┘
1 rows in set. Elapsed: 0.294 sec. Processed 932.19 million rows, 3.73 GB (3.18 billion rows/s., 12.71 GB/s.)

Did someone update a record?

SELECT
   _time,
   query
FROM mysql_slow_log
WHERE (rows_affected > 0) AND (query LIKE '%id=3301689%')
LIMIT 1
┌───────────────_time─┬─query─────────────────────────────────────┐
│ 2018-04-02 19:04:48 │ UPDATE sbtest1 SET k=k+1 WHERE id=3301689 │
└─────────────────────┴───────────────────────────────────────────┘
1 rows in set. Elapsed: 0.046 sec. Processed 2.29 million rows, 161.60 MB (49.57 million rows/s., 3.49 GB/s.)

Note that I’m cheating here by assuming we know the update used a primary key, but it is practically helpful in a lot of cases.

These are just some of the examples of what you can find out by querying raw slow query logs. They contain a ton of information about query execution (especially in Percona Server for MySQL) that allows you to use them both for performance analysis and some security and auditing purposes.

The post Why Analyze Raw MySQL Query Logs? appeared first on Percona Database Performance Blog.

by Peter Zaitsev at April 18, 2018 11:32 PM

Restore a MongoDB Logical Backup

In this article, we will explain how to restore a MongoDB logical backup performed via ‘mongodump’ to a mongod instance.

MongoDB logical backups require the use of the ‘mongorestore‘ tool to perform the restore. This article focuses on this tool and process.

Note: Percona develops a backup tool named Percona-Lab/mongodb-consistent-backup, which is a wrapper for ‘mongodump‘, adding cluster-wide backup consistency. The backups created by mongodb_consistent_backup (in Dump/Mongodump mode) can be restored using the exact same steps as a regular ‘mongodump’ backup – no special steps!

Mongorestore Command Flags

--host/--port (and --user/--password)

Required, even if you’re using the default host/port (localhost:27017). If authorization is enabled, add the --user/--password flags also.

--drop

This is almost always required. It causes ‘mongorestore‘ to drop each collection before restoring it. Without this flag, the documents from the backup are inserted one at a time, and if they already exist the restore fails.

--oplogReplay

This is almost always required. Replays the oplog that was dumped by mongodump. It is best to include this flag on replset-based backups unless there is a specific reason not to. You can tell if the backup was from a replset by looking for the file ‘oplog.bson‘ at the base of the dump directory.

--dir

Required. The path to the mongodump data.

--gzip

Optional. For mongodump >= 3.2, enables inline decompression on the restore. This is required if ‘mongodump‘ used the --gzip flag (look for *.bson.gz files if you’re not sure; if the collection files have no .gz suffix, don’t use --gzip).

--numParallelCollections=<number>

Optional. For mongodump >= 3.2 only, sets the number of collections to insert in parallel. By default four threads are used; if you have a large server and want to restore faster (at the cost of more resource usage), you can increase this number. Note that each thread decompresses bson if the ‘--gzip‘ flag is used, so consider this when raising this number.

Steps

  1. (Optional) If the backup is archived (mongodb_consistent_backup defaults to creating tar archives), un-archive the backup so that ‘mongorestore‘ can access the .bson/.bson.gz files:
    $ tar -C /opt/mongodb/backup/testbackup/20160809_1306 -xvf /opt/mongodb/backup/testbackup/20160809_1306/test1.tar
    test1/
    test1/dump/
    test1/dump/wikipedia/
    test1/dump/wikipedia/pages.metadata.json.gz
    test1/dump/wikipedia/pages.bson.gz
    test1/dump/oplog.bson

    ** This command un-tars the backup to ‘/opt/mongodb/backup/testbackup/20160809_1306/test1/dump’ **

  2. Check (and then check again!) that you’re restoring the right backup to the right host. When in doubt, it is safer to ask the customer or others.
    1. The Percona ‘mongodb_consistent_backup‘ tool names backup subdirectories by replica set name, so you can ensure you’re restoring the right backup by checking the replica set name of the node you’re restoring to, if it exists.
    2. If you’re restoring to a replica set you will need to restore to the PRIMARY member and there needs to be a majority (so writes are accepted – some exceptions if you override write-concern, but not advised).
  3. Use ‘mongorestore‘ to restore the data by dropping/restoring each collection (--drop flag) and replaying the oplog changes (--oplogReplay flag), specifying the restore directory explicitly (--dir flag) to the ‘mongorestore‘ command. In this example I also used authorization (--user/--password flags) and decompression (--gzip flag):
    $ mongorestore --drop --host localhost --port 27017 --user secret --password secret --oplogReplay --gzip --dir /opt/mongodb/backup/testbackup/20160809_1306/test1/dump
    2016-08-09T14:23:04.057+0200    building a list of dbs and collections to restore from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump dir
    2016-08-09T14:23:04.065+0200    reading metadata for wikipedia.pages from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump/wikipedia/pages.metadata.json.gz
    2016-08-09T14:23:04.067+0200    restoring wikipedia.pages from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump/wikipedia/pages.bson.gz
    2016-08-09T14:23:07.058+0200    [#######.................]  wikipedia.pages  63.9 MB/199.0 MB  (32.1%)
    2016-08-09T14:23:10.058+0200    [###############.........]  wikipedia.pages  127.7 MB/199.0 MB  (64.1%)
    2016-08-09T14:23:13.060+0200    [###################.....]  wikipedia.pages  160.4 MB/199.0 MB  (80.6%)
    2016-08-09T14:23:16.059+0200    [#######################.]  wikipedia.pages  191.5 MB/199.0 MB  (96.2%)
    2016-08-09T14:23:19.071+0200    [########################]  wikipedia.pages  223.5 MB/199.0 MB  (112.3%)
    2016-08-09T14:23:22.062+0200    [########################]  wikipedia.pages  255.6 MB/199.0 MB  (128.4%)
    2016-08-09T14:23:25.067+0200    [########################]  wikipedia.pages  271.4 MB/199.0 MB  (136.4%)
    ...
    ...
    2016-08-09T14:24:19.058+0200    [########################]  wikipedia.pages  526.9 MB/199.0 MB  (264.7%)
    2016-08-09T14:24:22.058+0200    [########################]  wikipedia.pages  558.9 MB/199.0 MB  (280.8%)
    2016-08-09T14:24:23.521+0200    [########################]  wikipedia.pages  560.6 MB/199.0 MB  (281.6%)
    2016-08-09T14:24:23.522+0200    restoring indexes for collection wikipedia.pages from metadata
    2016-08-09T14:24:23.528+0200    finished restoring wikipedia.pages (32725 documents)
    2016-08-09T14:24:23.528+0200    replaying oplog
    2016-08-09T14:24:23.597+0200    done
    1. If you encounter problems with ‘mongorestore‘, carefully read the error message or rerun with several ‘-v‘ flags, e.g.: ‘-vvv‘. Once you have an error, attempt to troubleshoot the cause.
  4. Check to see that you saw “replaying oplog” and “done” after the restore (last two lines in the example). If you don’t see this, there is a problem.

As you can see, using this tool to restore a MongoDB logical backup is very simple. However, when using sharding, please note that --oplog is not available and mongodump hits the primary of each shard. As this is typically not advised in production, you might consider Percona-Lab/mongodb-consistent-backup, which ensures the backup is consistent and can hit secondary nodes, as mongodump does with replica sets.

If MongoDB and topics like this interest you, please see the document below: we are hiring!

{
  hiring: true,
  role: "Consultant",
  tech: "MongoDB",
  location: "USA",
  moreInfo: "https://www.percona.com/about-percona/careers/mongodb-consultant-usa-based"
}

The post Restore a MongoDB Logical Backup appeared first on Percona Database Performance Blog.

by Tim Vaillancourt at April 18, 2018 05:22 PM

Webinar Thursday, April 19, 2018: Running MongoDB in Production, Part 1

Please join Percona’s Senior Technical Operations Architect, Tim Vaillancourt, as he presents Running MongoDB in Production, Part 1 on Thursday, April 19, 2018, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

Are you a seasoned MySQL DBA that needs to add MongoDB to your skills? Are you used to managing a small environment that runs well, but want to know what you might not know yet? This webinar helps you with running MongoDB in production environments.

MongoDB works well, but when it has issues, the number one question is “where should I go to solve a problem?”

This tutorial will cover:

Backups
– Logical vs Binary-level backups
– Sharding and Replica-Set Backup strategies
Security
– Filesystem and Network Security
– Operational Security
– External Authentication features of Percona Server for MongoDB
– Securing connections with SSL and MongoDB Authorization
– Encryption at Rest
– New Security features in 3.6
Monitoring
– Monitoring Strategy
– Important metrics to monitor in MongoDB and Linux
– Percona Monitoring and Management

Register for the webinar now.

Part 2 of this series will take place on Thursday, April 26, 2018, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4). Register for the second part of this series here.

Timothy Vaillancourt, Senior Technical Operations Architect

Tim joined Percona in 2016 as Sr. Technical Operations Architect for MongoDB, with the goal to make the operations of MongoDB as smooth as possible. With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned. Tim is based in Amsterdam, NL and enjoys traveling, coding and music.

Prior to Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS. Before moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

The post Webinar Thursday, April 19, 2018: Running MongoDB in Production, Part 1 appeared first on Percona Database Performance Blog.

by Tim Vaillancourt at April 18, 2018 02:07 PM

MariaDB Foundation

Testing Rocket.Chat and Zulip

The MariaDB Foundation is publicly testing two new communications tools, Rocket.Chat and Zulip, and we’re seeing if they can meet a few of our requirements. Roughly in order of importance: 1) Open Source 2) A tool similar in functionality to the proprietary Slack. 3) A tool for internal staff chat, replacing some of the proprietary […]

The post Testing Rocket.Chat and Zulip appeared first on MariaDB.org.

by Ian Gilfillan at April 18, 2018 08:17 AM

April 17, 2018

Peter Zaitsev

Using Hints to Analyze Queries

In this blog post, we’ll look at using hints to analyze queries.

There are a lot of things that you can do wrong when writing a query, which means that there are a lot of things that you can do to make it better. From my personal experience, there are two things you should review first:

  1. The table join order
  2. Which index is being used

Why only those two? Because many other alternatives are more expensive, and in the end query optimization is a cost-effectiveness analysis. This is why we must start with the simplest fixes. We can control these two choices with the hints STRAIGHT_JOIN and FORCE INDEX, which allow us to execute the query with the plan that we would like to test.
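
For instance, both hints can be combined in a single statement. Here is a sketch reusing the tables that appear later in this post (table_a has a secondary index on value1, as the explain plans below show):

select straight_join
  table_a.id, table_b.value1
from
  table_a force index (value1) join
  table_b on table_a.id = table_b.id
where
  table_a.value1=10;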

Join Order

In a query where we use multiple tables or subqueries, there are particular fields that we are going to use to join the tables. Those fields could be the primary key of the table, the first part of a secondary index, neither, or both. But before we analyze possible scenarios, table structures or indexes, we need to establish the best order in which the query should join the tables.

When we talk about join order with several tables to join, one possible scenario is that one table joins to another using its primary key, and to the remaining tables using a different field. For instance:

select
  table_a.id, table_b.value1, table_c.value1
from
  table_a join
  table_b on table_a.id = table_b.id join
  table_c on table_b.id_c = table_c.id
where
  table_a.value1=10;

We get this explain:

+----+-------------+---------+--------+----------------+---------+---------+------------------------------------+------+-------------+
| id | select_type | table   | type   | possible_keys  | key     | key_len | ref                                | rows | Extra       |
+----+-------------+---------+--------+----------------+---------+---------+------------------------------------+------+-------------+
|  1 | SIMPLE      | table_a | ref    | PRIMARY,value1 | value1  | 5       | const                              |    1 | Using index |
|  1 | SIMPLE      | table_b | eq_ref | PRIMARY        | PRIMARY | 4       | bp_query_optimization.table_a.id   |    1 | Using where |
|  1 | SIMPLE      | table_c | eq_ref | PRIMARY        | PRIMARY | 4       | bp_query_optimization.table_b.id_c |    1 | NULL        |
+----+-------------+---------+--------+----------------+---------+---------+------------------------------------+------+-------------+

It is filtering by value1 on table_a, which joins with table_b with the primary key, and table_c uses the value of id_c which it gets from table_b.

But we can change the table order and use straight_join:

select straight_join
  table_a.id, table_b.value1, table_c.value1
from
  table_c join
  table_b on table_b.id_c = table_c.id join
  table_a on table_a.id = table_b.id
where
  table_a.value1=10;

The query is semantically the same, but now we get this explain:

+----+-------------+---------+--------+----------------+---------+---------+----------------------------------+------+-------------+
| id | select_type | table   | type   | possible_keys  | key     | key_len | ref                              | rows | Extra       |
+----+-------------+---------+--------+----------------+---------+---------+----------------------------------+------+-------------+
|  1 | SIMPLE      | table_c | ALL    | PRIMARY        | NULL    | NULL    | NULL                             |    1 | NULL        |
|  1 | SIMPLE      | table_b | ref    | PRIMARY,id_c   | id_c    | 5       | bp_query_optimization.table_c.id |    1 | NULL        |
|  1 | SIMPLE      | table_a | eq_ref | PRIMARY,value1 | PRIMARY | 4       | bp_query_optimization.table_b.id |    1 | Using where |
+----+-------------+---------+--------+----------------+---------+---------+----------------------------------+------+-------------+

In this case, we are performing a full table scan over table_c, which then joins with table_b using index over id_c to finally join table_a using the primary key.

Sometimes the optimizer chooses an incorrect join order because of bad statistics. I have found myself reviewing the first query with the second explain plan, where the only thing I did to find the query problem was to add STRAIGHT_JOIN to the query.

Taking into account that the optimizer could fail on this task, we found a practical way to force it to do what we want (change the join order).

It is also useful to find out when an index is missing. For example:

SELECT costs.id as cost_id, spac_types.id as spac_type_id
FROM
spac_types INNER JOIN
costs_spac_types ON costs_spac_types.spac_type_id = spac_types.id INNER JOIN
costs ON costs.id = costs_spac_types.cost_id
WHERE spac_types.place_id = 131;

The explain plan shows:

+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+-----------------------------------+-------+-------------+
| id | select_type | table            | type  | possible_keys                                       | key                                                | key_len | ref                               | rows  | Extra       |
+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+-----------------------------------+-------+-------------+
|  1 | SIMPLE      | costs_spac_types | index  | index_costs_spac_types_on_cost_id_and_spac_type_id | index_costs_spac_types_on_cost_id_and_spac_type_id | 8       | NULL                              | 86408 | Using index |
|  1 | SIMPLE      | spac_types       | eq_ref | PRIMARY,index_spac_types_on_place_id_and_spac_type | PRIMARY                                            | 4       | pms.costs_spac_types.spac_type_id |     1 | Using where |
|  1 | SIMPLE      | costs            | eq_ref | PRIMARY                                            | PRIMARY                                            | 4       | pms.costs_spac_types.cost_id      |     1 | Using index |
+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+-----------------------------------+-------+-------------+

It is starting with costs_spac_types and then using the clustered index for the next two tables. The explain doesn’t look bad!

However, it was taking longer than this:

SELECT STRAIGHT_JOIN costs.id as cost_id, spac_types.id as spac_type_id
FROM
spac_types INNER JOIN
costs_spac_types ON costs_spac_types.spac_type_id = spac_types.id INNER JOIN
costs ON costs.id = costs_spac_types.cost_id
WHERE spac_types.place_id = 131;

0.17 sec versus 0.09 sec. This is the explain plan:

+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+------------------------------+-------+-----------------------------------------------------------------+
| id | select_type | table            | type   | possible_keys                                      | key                                                | key_len | ref                          | rows  | Extra                                                           |
+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+------------------------------+-------+-----------------------------------------------------------------+
|  1 | SIMPLE      | spac_types       | ref    | PRIMARY,index_spac_types_on_place_id_and_spac_type | index_spac_types_on_place_id_and_spac_type         | 4      | const                         |    13 | Using index                                                     |
|  1 | SIMPLE      | costs_spac_types | index  | index_costs_spac_types_on_cost_id_and_spac_type_id | index_costs_spac_types_on_cost_id_and_spac_type_id | 8      | NULL                          | 86408 | Using where; Using index; Using join buffer (Block Nested Loop) |
|  1 | SIMPLE      | costs            | eq_ref | PRIMARY                                            | PRIMARY                                            | 4      | pms.costs_spac_types.cost_id  |     1 | Using index                                                     |
+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+------------------------------+-------+-----------------------------------------------------------------+

Reviewing the table structure:

CREATE TABLE costs_spac_types (
  id int(11) NOT NULL AUTO_INCREMENT,
  cost_id int(11) NOT NULL,
  spac_type_id int(11) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY index_costs_spac_types_on_cost_id_and_spac_type_id (cost_id,spac_type_id)
) ENGINE=InnoDB AUTO_INCREMENT=172742 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

I saw that the unique index was over cost_id and then spac_type_id. After adding this index:

ALTER TABLE costs_spac_types ADD UNIQUE KEY (spac_type_id,cost_id);

Now, the explain plan without STRAIGHT_JOIN is:

+----+-------------+------------------+--------+-----------------------------------------------------------------+--------------------------------------------+---------+------------------------------+------+-------------+
| id | select_type | table            | type   | possible_keys                                                   | key                                        | key_len | ref                          | rows | Extra       |
+----+-------------+------------------+--------+-----------------------------------------------------------------+--------------------------------------------+---------+------------------------------+------+-------------+
|  1 | SIMPLE      | spac_types       | ref    | PRIMARY,index_spac_types_on_place_id_and_spac_type              | index_spac_types_on_place_id_and_spac_type | 4      | const                         |   13 | Using index |
|  1 | SIMPLE      | costs_spac_types | ref    | index_costs_spac_types_on_cost_id_and_spac_type_id,spac_type_id | spac_type_id                               | 4      | pms.spac_types.id             |   38 | Using index |
|  1 | SIMPLE      | costs            | eq_ref | PRIMARY                                                         | PRIMARY                                    | 4      | pms.costs_spac_types.cost_id  |    1 | Using index |
+----+-------------+------------------+--------+-----------------------------------------------------------------+--------------------------------------------+---------+------------------------------+------+-------------+

Which is much better, as it is scanning fewer rows and the query time is just 0.01 seconds.

Indexes

The optimizer has the choice of using a clustered index, a secondary index, a partial secondary index or no index at all, which means that it uses the clustered index.

Sometimes the optimizer ignores the use of an index because it thinks reading the rows directly is faster than an index lookup:

mysql> explain select * from table_c where id=1;
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | table_c | const | PRIMARY       | PRIMARY | 4       | const |    1 | NULL  |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
mysql> explain select * from table_c where value1=1;
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | table_c | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+

In both cases, we are reading directly from the clustered index.

Then, we have secondary indexes that are partially used and/or only partially useful for the query. This means that we are going to scan the index and then look up the rows in the clustered index. YES! TWO STRUCTURES WILL BE USED! We usually don’t realize any of this, but it is like an extra join between the secondary index and the clustered index.
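
For contrast with the covering case shown next, here is a hedged illustration of a non-covering lookup through the same secondary index on table_a:

-- Scans the value1 index, then fetches each matching row from the
-- clustered index: the "extra join" described above
explain select * from table_a where value1=1;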

Finally, there is the covering index, which is simple to identify by the “Using index” note in the extra column:

mysql> explain select value1 from table_a where value1=1;
+----+-------------+---------+------+---------------+--------+---------+-------+------+-------------+
| id | select_type | table   | type | possible_keys | key    | key_len | ref   | rows | Extra       |
+----+-------------+---------+------+---------------+--------+---------+-------+------+-------------+
|  1 | SIMPLE      | table_a | ref  | value1        | value1 | 5       | const |    1 | Using index |
+----+-------------+---------+------+---------------+--------+---------+-------+------+-------------+

Index Analysis

As I told you before, this is a cost-effectiveness analysis from the point of view of query performance. Most of the time it is fastest to use a covering index, then a plain secondary index, and finally the clustered index. However, covering indexes are usually more expensive for writes, as you need more fields in the index to cover the query’s needs, so a common compromise is a secondary index combined with a clustered index lookup. If the number of rows is not large and the query selects most of them, however, a full table scan could be even faster. Another thing to take into account is that the number of indexes affects the write rate.

Let’s do an analysis. This is a common query:

mysql> explain select * from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+

It is using all the fields of each table.

This is more restrictive:

mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+

But it is performing a full table scan over t2, and then using t2.value1 to look up rows in t1 through the clustered index.

Let’s add an index on table_index_analisis_2 over value1:

mysql> alter table table_index_analisis_2 add key (value1);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

The explain shows that it is not being used, not even when we force it:

mysql> explain select * from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | value1        | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
mysql> explain select * from table_index_analisis_1 t1, table_index_analisis_2 t2 force key (value1) where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | value1        | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+

This is because the optimizer considers performing a full table scan better than using a part of the index.
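If you prefer to see the optimizer’s reasoning instead of guessing at it, the optimizer trace (available since MySQL 5.6) exposes the cost comparison. A minimal sketch:

SET optimizer_trace = 'enabled=on';
SELECT * FROM table_index_analisis_1 t1, table_index_analisis_2 t2 WHERE t1.id = t2.value1;
-- The trace contains the estimated cost of the rejected index access
-- versus the chosen full table scan:
SELECT trace FROM information_schema.OPTIMIZER_TRACE\G
SET optimizer_trace = 'enabled=off';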

Now we are going to add an index over value1 and value2:

mysql> alter table table_index_analisis_2 add key (value1,value2);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys   | key      | key_len | ref                             | rows | Extra                    |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
|  1 | SIMPLE      | t2    | index  | value1,value1_2 | value1_2 | 10      | NULL                            |   64 | Using where; Using index |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY         | PRIMARY  | 4       | bp_query_optimization.t2.value1 |    1 | NULL                     |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+

We can see that now it is using the index, and the extra column says “Using index”, which means that it is not touching the clustered index.

Finally, we are going to add an index over table_index_analisis_1 that is tailored to serve this query:

mysql> alter table table_index_analisis_1 add key (id,value1,value2);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys   | key      | key_len | ref                             | rows | Extra                    |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
|  1 | SIMPLE      | t2    | index  | value1,value1_2 | value1_2 | 10      | NULL                            |   64 | Using where; Using index |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY,id      | PRIMARY  | 4       | bp_query_optimization.t2.value1 |    1 | NULL                     |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
2 rows in set (0.00 sec)

However, it is not selected by the optimizer. That is why we need to force it:

mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1 force index(id), table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+-------+-----------------+----------+---------+---------------------------------+------+--------------------------+
| id | select_type | table | type  | possible_keys   | key      | key_len | ref                             | rows | Extra                    |
+----+-------------+-------+-------+-----------------+----------+---------+---------------------------------+------+--------------------------+
|  1 | SIMPLE      | t2    | index | value1,value1_2 | value1_2 | 10      | NULL                            |   64 | Using where; Using index |
|  1 | SIMPLE      | t1    | ref   | id              | id       | 4       | bp_query_optimization.t2.value1 |    1 | Using index              |
+----+-------------+-------+-------+-----------------+----------+---------+---------------------------------+------+--------------------------+
2 rows in set (0.00 sec)

Now we are using only secondary indexes for both tables.

Conclusions

There are many more aspects we could review when analyzing queries, like the handlers used, table design, etc. However, in my opinion, it is useful to focus on these at the beginning of the analysis.

I would also like to point out that using hints is not a long-term solution! Hints should be used only during the analysis phase.
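In other words, once the analysis confirms which plan is best, encode that knowledge in the schema instead of keeping the hint. A schematic example, based on the tables above:

-- Analysis phase: force the candidate plan to measure it.
SELECT t1.id, t1.value1, t1.value2, t2.value2
FROM table_index_analisis_1 t1 FORCE INDEX (id), table_index_analisis_2 t2
WHERE t1.id = t2.value1;
-- Long-term: keep (or adjust) the covering index so the optimizer chooses
-- the good plan on its own, and remove FORCE INDEX from the application.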

The post Using Hints to Analyze Queries appeared first on Percona Database Performance Blog.

by David Ducos at April 17, 2018 10:34 PM

MariaDB AB

Hands-on: MariaDB ColumnStore Spark Connector

In February with the release of MariaDB ColumnStore 1.1.3, we introduced a new Apache Spark connector (Beta) that exports data from Spark into MariaDB ColumnStore. The Spark connector is available as part of our MariaDB AX analytics solution and complements our suite of rapid-paced data ingestion tools such as a Kafka data adapter and MaxScale CDC data adapter. The connector empowers users to directly export machine learning results stored in Spark DataFrames to ColumnStore for high performance analytics. Internally, it utilizes ColumnStore’s Bulk Data Adapters to inject data directly into MariaDB ColumnStore’s WriteEngine.

In this blog, we’ll explain how to export the results of a simple machine learning pipeline on the classification example of the well known mnist handwritten digits dataset. Feel free to start your own copy of our lab environment by typing:

git clone https://github.com/mariadb-corporation/mariadb-columnstore-docker.git
cd mariadb-columnstore-docker/columnstore_jupyter
docker-compose up -d

This will spin up two docker containers, one with the latest version of MariaDB ColumnStore and the other with a pre-configured version of Jupyter, a web application to interactively execute Python and Scala code snippets.
Now you can access the demo notebook on port 8888 with the password “mariadb”.

The machine learning itself takes place in the first five blocks. For simplicity, a pre-trained random forest model is used to predict the labels of 10,000 handwritten digits from mnist’s test dataset.

The resulting DataFrame contains five columns, of which we are interested in exporting three into ColumnStore. These are:

  • The original label to validate the prediction
  • The normalized probability vector of doubles representing the likelihood of being a certain digit
  • The final prediction made by the random forest model

The probability vector holds, at each index position, the likelihood that the input image matches that digit. It is converted into 10 individual values so that they can be mapped to 10 columns, one for each digit. The final DataFrame consists of twelve columns: one for the label, one for the final prediction, and ten for the individual digit probabilities.
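A minimal PySpark sketch of that conversion (the DataFrame name predictions and the column names label, probability and prediction are assumptions based on the usual classifier output):

from pyspark.sql.functions import col, udf
from pyspark.sql.types import DoubleType

# Extract element i of the ML probability vector as a plain double column.
def prob_at(i):
    return udf(lambda v: float(v[i]), DoubleType())

export_df = predictions.select(
    col("label"),
    col("prediction"),
    *[prob_at(i)(col("probability")).alias("prob_%d" % i) for i in range(10)]
)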

This DataFrame can now be exported into ColumnStore by either Spark’s native (JDBC) write function, MariaDB’s Bulk Data Adapters, or MariaDB’s Spark connector. All options are outlined in the demo for easy verification. The MariaDB Spark connector is the easiest to use and provides the best performance.
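For comparison, here is a sketch of the plain JDBC route (host, credentials and the presence of a MariaDB JDBC driver on the classpath are assumptions); it writes through the SQL interface rather than through the bulk write API, which is why it is slower:

dataFrame.write \
    .format("jdbc") \
    .option("url", "jdbc:mariadb://columnstore-host:3306/database") \
    .option("dbtable", "bulk_api_1") \
    .option("user", "user") \
    .option("password", "password") \
    .mode("append") \
    .save()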

You’ll get the schema of the DataFrame to export by calling its printSchema() function and can use standard SQL to create the table accordingly. Don’t forget to specify to use the ColumnStore engine, for example:

CREATE TABLE IF NOT EXISTS bulk_api_1
(label double, prediction double, prob_0 double, prob_1 double, prob_2 double, prob_3 double, prob_4 double, prob_5 double, prob_6 double, prob_7 double, prob_8 double, prob_9 double)
ENGINE=columnstore;

The table schema needs to match the structure of the DataFrame otherwise the export will fail.

After the table is created it’s just a two-liner:

import columnStoreExporter
columnStoreExporter.export("database","table",dataFrame)

Now you could use your favourite SQL tool to analyse and visualize your results. There is an excellent blog entry on how to connect MariaDB with Tableau that I highly recommend for further reading.

Feel free to check out the additional notebooks in the lab environment. Next to Python, they also show how to use the Spark connector with Scala.

Last but not least, there is further information on how to set up the Spark connector in a production environment in our knowledge base. You can also catch our recent webinar that explains how our Spark and Kafka data connectors streamline and simplify the process of getting near real-time data for analysis. Download MariaDB AX, our high performance analytics solution, to get started.

As always, we are thrilled to hear your feedback and suggestions through the usual channels.

by Jens Röwekamp at April 17, 2018 05:34 PM

Peter Zaitsev

Webinar Wednesday, April 18, 2018: Percona XtraDB Cluster 5.7 Tutorial

Please join Percona’s Architect, Tibi Köröcz, as he presents Percona XtraDB Cluster 5.7 Tutorial on Wednesday, April 18, 2018, at 7:00 am PDT (UTC-7) / 10:00 am EDT (UTC-4).

Never used Percona XtraDB Cluster before? Come join this 45-minute tutorial where we will introduce you to the concepts of a fully functional Percona XtraDB Cluster.

In this tutorial, we will show you how you can install Percona XtraDB Cluster with ProxySQL, and monitor it with Percona Monitoring and Management (PMM).

We will also cover topics like bootstrap, IST, SST, Certification, common-failure situations and online schema changes.

Register for the webinar now.

Tibor Köröcz, Senior Consultant

Tibi joined Percona in 2015 as a Consultant. Before joining Percona, among many other things, he worked at the world’s largest car hire booking service as a Senior Database Engineer. He enjoys trying and working with the latest technologies and applications that can help or work with MySQL. In his spare time, he likes to spend time with his friends, travel around the world and play ultimate frisbee.

The post Webinar Wednesday, April 18, 2018: Percona XtraDB Cluster 5.7 Tutorial appeared first on Percona Database Performance Blog.

by Tibor Korocz at April 17, 2018 01:30 PM

Binlog and Replication Improvements in Percona Server for MySQL

Due to continuous development and improvement, Percona Server for MySQL incorporates a number of improvements related to binary log handling and replication. This results in replication specifics that distinguish it from MySQL Server.

Temporary tables and mixed logging format

Summary of the fix:

As soon as a statement involving temporary tables was encountered under the mixed binlog format, MySQL switched to row-based logging for all statements until the end of the session (or until all temporary tables used in the session were dropped). This is inconvenient when you have long-lasting connections, including replication-related ones. Percona Server for MySQL fixes the situation by switching between statement-based and row-based logging only when necessary.

Details:

The mixed binary logging format supported by Percona Server for MySQL means that the server runs with statement-based logging by default, but switches to row-based logging when replication would be unpredictable, for example in the case of a nondeterministic SQL statement that could cause data divergence if reproduced on a slave server. The switch is done when any condition from a long list is matched, and one of these conditions is the use of temporary tables.

Temporary tables are never logged using row-based format, but any statement that touches a temporary table is logged in row mode. This way, we intercept all the side effects that temporary tables can produce on non-temporary ones.

There is no need to use the row logging format for any other statements solely because a temporary table is present. However, MySQL took exactly such an excessive precaution: once a statement with a temporary table had appeared and row-based logging was used, MySQL unconditionally logged all subsequent statements in row format.

Percona Server for MySQL has implemented more accurate behavior. Instead of switching to row-based logging until the last temporary table is closed, the usual rules of row vs. statement format apply, and we don’t consider the presence of currently opened temporary tables. This change was introduced with the fix of bug #151 (upstream #72475).
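A hypothetical session illustrates the difference (binlog_format=MIXED assumed, table names made up):

CREATE TEMPORARY TABLE tmp_orders (id INT PRIMARY KEY);
INSERT INTO tmp_orders VALUES (1);      -- touches a temporary table
INSERT INTO orders VALUES (1, 'safe');  -- deterministic statement
-- Unpatched MySQL: after the first temporary table statement, all subsequent
-- statements are logged in row format until the session ends or tmp_orders
-- is dropped. Percona Server: the insert into orders is logged in statement
-- format again, because the usual statement-vs-row rules are re-evaluated.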

Temporary table drops and binloging on GTID-enabled server

Summary of the fix:

MySQL logs DROP statements for all temporary tables regardless of the logging mode under which these tables were created. This produces binlog writes and errant GTIDs on slaves with row and mixed logging. Percona Server for MySQL fixes this by tracking the binlog format at temporary table creation time and using it to decide whether a DROP should be logged or not.

Details:

Even with read_only mode enabled, the server permits some operations, including ones with temporary tables. With the previous fix, temporary table operations are not binlogged in row or mixed mode. But MySQL server doesn’t track what the logging mode was when a temporary table was created, and therefore unconditionally logs DROP statements for all temporary tables. These DROP statements receive an IF EXISTS clause, which is intended to make them harmless.

Percona Server for MySQL has fixed this with the bug fixes #964, upstream #83003, and upstream #85258. Moreover, with all the binlogging fixes discussed so far nothing involving temporary tables is logged to the binary log in row or mixed format. There is no need to consider CREATE/DROP TEMPORARY TABLE unsafe for use in stored functions, triggers and multi-statement transactions in row/mixed format. Therefore, we introduced an additional fix to mark the creation and drop of temporary tables as unsafe inside transactions in statement-based replication only (the fixed bug is #1816, while the correspondent upstream one is #89467 and it is still open).

Safety of statements with a LIMIT clause

Summary of the fix:

MySQL Server considers all UPDATE/DELETE/INSERT ... SELECT statements with the LIMIT clause unsafe, no matter whether they really produce non-deterministic results or not. Percona Server for MySQL is more accurate, because it acknowledges such statements as safe when they include an ORDER BY primary key or a WHERE condition.

Details:

MySQL Server treats UPDATE/DELETE/INSERT ... SELECT statements with the LIMIT clause as unsafe, considering that they produce an unpredictable number of rows. But some such statements can still produce an absolutely predictable result. One such deterministic case takes place when a statement with the LIMIT clause has an ORDER BY PK or WHERE condition.
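For example, a statement like the following (on a hypothetical orders table) is deterministic despite the LIMIT, because the primary key ordering fixes exactly which rows are affected:

DELETE FROM orders
WHERE created_at < '2018-01-01'
ORDER BY id        -- id is the primary key
LIMIT 1000;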

The patch, which makes updates and deletes with a LIMIT clause be considered safe if they have an ORDER BY pk_column clause, was initially provided on the upstream bug report and later incorporated into Percona Server for MySQL with additional improvements. Bug fixed #44 (upstream #42415).

Performance improvements

There are also two modifications in Percona Server related to multi-source replication that improve performance on slaves.

The first improvement concerns the relay log position, which was always updated in multi-source replication setups regardless of whether the committed transaction had already been executed or not. Percona Server omits relay log position updates for already executed GTIDs.

These unconditional relay log position updates caused additional fsync operations in the case of relay-log-info-repository=TABLE. With the higher number of channels transmitting such duplicate (already executed) transactions, the situation became proportionally worse. The problem was solved in Percona Server 5.7.18-14.  Bug fixed  #1786 (upstream #85141).

The second improvement decreases the load on slave nodes configured to update the master status and connection information only on log file rotation. MySQL additionally updated this information in the case of multi-source replication when a slave had to skip the already executed GTID event. This behavior was the cause of substantially higher write loads on slaves and lower replication throughput.

The configuration with master_info_repository=TABLE and sync_master_info=0 makes the slave update the master status and connection information in this table on log file rotation and not after each sync_master_info event, but it didn’t work in multi-source replication setups. Heartbeats sent to the slave to skip GTID events that it had already executed previously were evaluated as relay log rotation events and triggered a sync of the mysql.slave_master_info table. This inaccuracy could produce a huge (up to five times on some setups) increase in write load on the slave, before the problem was fixed in Percona Server for MySQL 5.7.20-19. Bug fixed #1812 (upstream #85158).

Current status of fixes

The three issues related to temporary tables were fixed in Percona Server 5.5 and contributed upstream, and the final fixes of bugs #72475, #83003, and #85258 have landed in MySQL Server 8.0.4.

The post Binlog and Replication Improvements in Percona Server for MySQL appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at April 17, 2018 12:13 AM

April 16, 2018

Peter Zaitsev

ProxySQL 1.4.7 and Updated proxysql-admin Tool Now in the Percona Repository

ProxySQL 1.4.7, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.7 source and binary packages available at https://percona.com/downloads/proxysql include ProxySQL Admin – a tool, developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.7 are available as well: https://hub.docker.com/r/percona/proxysql/. You can download the original ProxySQL from https://github.com/sysown/proxysql/releases.

This release fixes the following bugs in ProxySQL Admin:

Usability improvements:

  • Added proxysql-status  tool to dump ProxySQL configuration and statistics.

Bug fixes:

  • PSQLADM-2: ProxySQL galera checker script didn’t check if another instance of itself was already running. While running more than one copy of proxysql_galera_checker in the same runtime environment at the same time is still not supported, the introduced fix is able to prevent duplicate script execution in most cases.
  • PSQLADM-40: ProxySQL scheduler generated a lot of proxysql_galera_checker and proxysql_node_monitor processes in case of wrong ProxySQL credentials in the proxysql-admin.cnf file.
  • PSQLADM-41: Timeout error handling was improved with clear messages.
  • PSQLADM-42: An inconsistency of the date format in ProxySQL and scripts was fixed.
  • PSQLADM-43: proxysql_galera_checker didn’t take into account the possibility of special characters presence in mysql-monitor_password.
  • PSQLADM-44: proxysql_galera_checker generated unclear errors in the proxysql.log file if wrong credentials were passed.
  • PSQLADM-46: proxysql_node_monitor script incorrectly split the hostname and the port number in URLs containing a hyphen character.

ProxySQL is available under OpenSource license GPLv3.

The post ProxySQL 1.4.7 and Updated proxysql-admin Tool Now in the Percona Repository appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at April 16, 2018 07:47 PM

MariaDB AB

MariaDB Server 10.3.6 Release Candidate now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.3.6. See the release notes and changelog for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.3.6

Release Notes | Changelog | What is MariaDB 10.3?

by dbart at April 16, 2018 05:26 PM

MariaDB Foundation

MariaDB 10.3.6 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.3.6, the second release candidate in the MariaDB 10.3 series. See the release notes and changelogs for details. Download MariaDB 10.3.6 Release Notes Changelog What is MariaDB 10.3? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.3.6 Aleksey Midenkov (Tempesta) Alexander Barkov […]

The post MariaDB 10.3.6 now available appeared first on MariaDB.org.

by Ian Gilfillan at April 16, 2018 05:19 PM

Peter Zaitsev

Webinar Tuesday April 17, 2018: Which Amazon Cloud Technology Should You Choose? RDS? Aurora? Roll Your Own?

Please join Percona’s Senior Technical Operations Engineer, Daniel Kowalewski, as he presents Which Amazon Cloud Technology Should You Choose? RDS? Aurora? Roll Your Own? on Tuesday, April 17, 2018, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

Are you running on Amazon, or planning to migrate there? In this talk, we are going to cover the different technologies for running databases on Amazon Cloud environments.

We will focus on the operational aspects, benefits and limitations for each of them.

Register for the webinar now.

Daniel Kowalewski, Senior Technical Operations Engineer

Daniel joined Percona in August of 2015. Previously, he earned a B.S. in Computer Science from the University of Colorado in 2006 and was a DBA there until he joined Percona. In addition to MySQL, Daniel also has experience with Oracle and Microsoft SQL Server, but he much prefers to stay in the MySQL world. Daniel lives near Denver, CO with his wife, two-year-old son, and dog. If you can’t reach him, he’s probably in the mountains hiking, camping, or trying to get lost.

The post Webinar Tuesday April 17, 2018: Which Amazon Cloud Technology Should You Choose? RDS? Aurora? Roll Your Own? appeared first on Percona Database Performance Blog.

by Daniel Kowalewski at April 16, 2018 04:36 PM

April 15, 2018

Valeriy Kravchuk

Fun with Bugs #66 - On MySQL Bug Reports I am Subscribed to, Part VI

I have some free time today, but I am still too lazy to work on the numerous planned and pending "ToDo" kind of posts, so why not continue the review of older MySQL bugs I am subscribed to. Today I am going to list 15 more bugs, reported more than a year ago and still not fixed:
  • Bug #85805 - "Incorrect ER_BAD_NULL_ERROR after LOAD DATA LOCAL INFILE". This detailed bug report by Tsubasa Tanaka stays "Verified" for more than a year already. It's a great example of gdb use for MySQL troubleshooting. Setting a couple of breakpoints may really help to understand how MySQL works and why some weird errors happen.
  • Bug #85536 - "Build error on 5.5.54". It's clear that almost nobody besides Roel Van de Paar cares about build problem of MySQL 5.5.x(!) on Ubuntu 16.10(!). Anyway, it's strange that the bug remains "Verified" and not closed in any way if Oracle really does not intend to support MySQL 5.5 any longer. For now it seems MySQL 5.5 is still under extended support, so I hope to see this build problem fixed with some final 5.5.x release.
  • Bug #85501 - "Make all options settable as variables in configuration files". We usually see Umesh Shastry processing bugs reported by others, but this is a rare case when he reports something himself. It's a great feature request.
  • Bug #85447 - "Slave SQL thread locking issue on a certain XA workload on master". There are good reasons to think that this bug reported by Laurynas Biveinis may be fixed since MySQL 5.7.18, but no one cares to close it properly.
  • Bug #85382 - "Getting semi-sync reply magic number errors when slave_compressed_protocol is 1". This bug was reported by Jaime Sicam. Read also comments from other community members and make your own conclusions. It seems setting slave_compressed_protocol to 1 is a bad idea in general...
  • Bug #85191 - "performance regression with HANDLER READ syntax". Zhai Weixiang found clear performance regression in the way MySQL 5.7 uses metadata locking for HANDLER commands.
  • Bug #85016 - "better description for: OS error: 71". Clear and simple request from Shane Bester still stays "Verified". I am not that Oracle customer affected anyway, but this seems strange to me.
  • Bug #84958 - "InnoDB's MVCC has O(N^2) behaviors". This one bug report from Domas Mituzas could be a topic for a series of blog posts... It clearly states that:
    "if there're multiple row versions in InnoDB, reading one row from PK may have O(N) complexity and reading from secondary keys may have O(N^2) complexity"
    There is a patch that partially fixes the problem, submitted by Laurynas Biveinis and created by Alexey Midenkov. While this bug is still "Verified", take care when using secondary indexes in concurrent environments where the same data is often changed.
  • Bug #84868 - "Please make it possible to query replication information consistently". Great feature request (or bug report, if you consider inconsistency as a bug) from Simon Mudd.
  • Bug #84615 - "More steps in connection processlist state/ events_stages". Sveta Smirnova cared to ask to split some well known statement execution stages like "cleaning up" into more detailed ones. I think this is really important to simplify troubleshooting with performance_schema. Wrong, misleading or too generic stages force us to use other tools and may lead to wrong conclusions. I hit this with "statistics" also, see Bug #84858. It is a rare case when Sveta's request just stays "Open", for more than a year already.
  • Bug #84467 - "ALTERing KEY_BLOCK_SIZE keeps the old kbs in KEYs.". Jean-François Gagné and other well known bug reporters found several problems related to KEY_BLOCK_SIZE. It seems Oracle engineers decided NOT to fix them (see Bug #88220). But then why this bug still stays "Verified"? Consistency in bugs processing is one of my dreams...
  • Bug #84439 - "Table of row size of ~800 bytes does not compress with KEY_BLOCK_SIZE=1." Yet another bug report from Jean-François Gagné. Based on lack of activity, those looking for smaller data size, compression etc should look elsewhere and do not expect much from Oracle's InnoDB. Question is, what other engines with data compression will be supported by Oracle's MySQL 8 (or 9) GA? When you get tired wondering, consider MariaDB or Percona Server instead - they do support storage engines that are both transactional and were designed with write efficiency and space efficiency in mind. Hint: they rock...
  • Bug #84274 - "READ COMMITTED does not scale after 36 threads (in 5.6 after 16 threads)". Sveta Smirnova had a chance to run benchmarks on 144 cores (the largest box I ever had a chance to use for benchmarking had 12 cores, so what do I know...) and the result is clear - READ COMMITTED transaction isolation level does not scale well (comparing to default REPEATABLE READ). It's counter intuitive for many, but that's what we have. I doubt MySQL 8 is going to change this (unfortunate) situation.
  • Bug #84241 - "Potential Race Condition". This was found in MySQL 5.7 by Rui Gu with a little help from Helgrind.
  • Bug #84024 - "Optimizer thinks clustered primary key is not covering". This bug was reported by Manuel Ung. Let me quote a comment by Øystein Grøvlen:
    "I can agree that the cost model for join buffering is not perfect. If so, I think we should improve this model, not rely on heuristics about covering indexes versus table scan."
    I can not agree more! Let's hope this really happens in MySQL 9 at least.
You probably noted that we see mostly already famous bug reporters mentioned in this list. But names of reporters, their customer or partner status, known achievements, even clear regressions found or patches provided do not force Oracle to fix problems faster these days... They have their own agenda and great plans for MySQL, obviously.

I also have my own agenda, so I'll proceed with this glass of wine...

by Valeriy Kravchuk (noreply@blogger.com) at April 15, 2018 05:56 PM

April 13, 2018

Peter Zaitsev

MongoDB Replica Set Tag Sets

In this blog post, we will look at MongoDB replica set tag sets, which enable you to use customized write concern and read preferences for replica set members.

This blog post will cover most of the questions that come to mind before using tag sets in a production environment.

  • What scenarios are these helpful for?
  • Do these tag sets work with all read preferences modes?
  • What if we’re already using maxStalenessSeconds along with the read preferences, can we still use a tag set?
  • How can one configure tag sets in a replica set?
  • Do these tags work identically for custom read preferences and write concerns?

Now let’s answer all these questions one by one.

What scenarios are these helpful for?

You can use tags:

  • If replica set members have different configurations and queries need to be redirected to the specific secondaries as per their purpose. For example, production queries can be redirected to the higher configuration member for faster execution and queries used for internal reporting purpose can be redirected to the low configurations secondaries. This will help improve per node resource utilization.
  • When you use custom read preferences, but the reads are routed to a secondary that resides in another data center to make reads more optimized and cost-effective. You can use tag sets to make sure that specific reads are routed to the specific secondary node within the DC.
  • If you want to use custom write concerns with the tag set for acknowledging writes are propagated to the secondary nodes per the requirements.

Do these tag sets work with all read preferences modes?

Yes, these tag sets work with all the read preference modes except “primary”. The “primary” read preference mode doesn’t allow you to add any tag sets while querying:

replicaTest:PRIMARY> db.tagTest.find().readPref('primary', [{"specs" : "low","purpose" : "general"}])
Error: error: {
	"ok" : 0,
	"errmsg" : "Only empty tags are allowed with primary read preference",
	"code" : 2,
	"codeName" : "BadValue"
}

What if we’re already using maxStalenessSeconds along with the read preferences, can tag set still be used?

Yes, you can use tag sets with a maxStalenessSeconds value. In that case, priority is given to staleness first, then tags, to get the most recent data from the secondary member.
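For instance, when connecting through a driver, both options can be combined in the connection string (hosts and values here are placeholders):

mongodb://host1:27017,host2:27017/?readPreference=secondary&maxStalenessSeconds=120&readPreferenceTags=specs:low,purpose:general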

How can one configure tag sets in a replica set?

You can configure tags by adding a parameter to the replica set configuration. Consider this test case with a five-member replica set:

"members" : [
		{
			"_id" : 0,
			"name" : "host1:27017",
			"stateStr" : "PRIMARY",
		},
		{
			"_id" : 1,
			"name" : "host2:27017",
			"stateStr" : "SECONDARY",
		},
		{
			"_id" : 2,
			"name" : "host3:27017",
			"stateStr" : "SECONDARY",
		},
		{
			"_id" : 3,
			"name" : "host4:27017",
			"stateStr" : "SECONDARY",
		},
		{
			"_id" : 4,
			"name" : "host5:27017",
			"stateStr" : "SECONDARY",
         }
		]

For our test case, the “specs” tag describes each member’s hardware specification and the “purpose” tag describes what the application should use the member for, so that queries can be routed to specific members in an optimized manner.

You must associate tags to each member by adding it to the replica set configuration:

cfg=rs.conf()
cfg.members[0].tags={"specs":"high","purpose":"analytics"}
cfg.members[1].tags={"specs":"high"}
cfg.members[2].tags={"specs":"low","purpose":"general"}
cfg.members[3].tags={"specs":"high","purpose":"analytics"}
cfg.members[4].tags={"specs":"low"}
rs.reconfig(cfg)

After adding tags, you can validate these changes by checking replica set configurations like:

rs.conf()
	"members" : [
		{
			"_id" : 0,
			"host" : "host1:27017",
			"tags" : {
				"specs" : "high",
				"purpose" : "analytics"
			},
		},
		{
			"_id" : 1,
			"host" : "host2:27017",
			"tags" : {
				"specs" : "high"
			},
		},
		{
			"_id" : 2,
			"host" : "host3:27017",
			"tags" : {
				"specs" : "low",
				"purpose" : "general"
			},
		},
		{
			"_id" : 3,
			"host" : "host4:27017",
			"tags" : {
				"specs" : "high",
				"purpose" : "analytics"
			},
		},
		{
			"_id" : 4,
			"host" : "host5:27017",
			"tags" : {
				"specs" : "low"
			},
		}
	]

Now, we are done with the tag-set configuration.

Do these tags work identically for custom read preferences and write concerns?

No, custom read preferences and write concerns consider tag sets in different ways.

Read preferences route read operations to the required members by matching the tag values assigned to them, but write concerns use tags only to count acknowledgments from members with distinct tag values; they do not route operations to specific members based on the values themselves.

Let us see how to use tag sets with write concerns. As per our test case, we have two unique tag values (i.e., “analytics” and “general”) defined as:

cfg=rs.conf()
cfg.settings={ getLastErrorModes: {writeNode:{"purpose": 2}}}
rs.reconfig(cfg)

You can validate these changes by checking the replica set configuration:

rs.conf()
	"settings" : {
			"getLastErrorModes" : {
			"writeNode" : {
				"purpose" : 2
			}
		},
	}

Now let’s try to insert a sample document in the collection named “tagTest” with this write concern:

db.tagTest.insert({name:"tom",tech:"nosql",status:"active"},{writeConcern:{w:"writeNode"}})
WriteResult({ "nInserted" : 1 })

Here, the write concern “writeNode” means the client gets a write acknowledgment from two nodes with distinct tag set values. If the value set in the configuration exceeds the count of available distinct values, it leads to an error at the time of the write:

cfg.settings={ getLastErrorModes: {writeNode:{"purpose": 4}}}
rs.reconfig(cfg)
db.tagTest.insert({name:"tom",tech:"nosql",status:"active"},{writeConcern:{w:"writeNode"}})
WriteResult({
	"nInserted" : 1,
	"writeConcernError" : {
		"code" : 100,
		"codeName" : "CannotSatisfyWriteConcern",
		"errmsg" : "Not enough nodes match write concern mode "writeNode""
	}
}

You can perform read and write operations with tag sets like this:

db.tagTest.find({name:"tom"}).readPref("secondary",[{"specs":"low","purpose":"general"}])
db.tagTest.insert({name:"john",tech:"rdbms",status:"active"},{writeConcern:{w:"writeNode"}})

I hope this helps you understand how to configure MongoDB replica set tag sets, how read preferences and write concerns handle them, and where you can use them.

The post MongoDB Replica Set Tag Sets appeared first on Percona Database Performance Blog.

by Aayushi Mangal at April 13, 2018 06:19 PM

This Week in Data with Colin Charles 35: Percona Live 18 final countdown and a roundup of recent news

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live is just over a week away — there’s an awesome keynote lineup, and you really should register. Also don’t forget to save the date as Percona Live goes to Frankfurt, Germany November 5-7 2018! Prost!

In acquisitions, we have seen MariaDB acquire MammothDB and Idera acquire Webyog.

Some interesting Amazon notes: Amazon Aurora Continues its Torrid Growth, More than Doubling the Number of Active Customers in the Last Year (not sure I’d describe it as torrid, but this is great for MySQL and PostgreSQL); the piece comes with a handful of customer mentions. In addition, there have already been 65,000 database migrations on AWS. For context, in late November 2017, it was 40,000 database migrations.

Releases

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 

The post This Week in Data with Colin Charles 35: Percona Live 18 final countdown and a roundup of recent news appeared first on Percona Database Performance Blog.

by Colin Charles at April 13, 2018 04:32 PM

April 12, 2018

Peter Zaitsev

Percona Server for MongoDB 3.4.14-2.12 Is Now Available

Percona announces the release of Percona Server for MongoDB 3.4.14-2.12 on April 12, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition. It supports MongoDB 3.4 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.14 and does not include any additional changes.

The Percona Server for MongoDB 3.4.14-2.12 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.4.14-2.12 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at April 12, 2018 09:45 PM

Flashback: Another Take on Point-In-Time Recovery (PITR) in MySQL/MariaDB/Percona Server

In this blog post, I’ll look at point-in-time recovery (PITR) options for MySQL, MariaDB and Percona Server for MySQL.

It is a common good practice to extend data safety by having additional measures apart from regular data backups, such as delayed slaves and binary log backups. These two options provide the ability to restore the data to any given point in time, or to just revert some bad accidents. These methods have their limitations, of course: delayed slaves only help if a deadly mistake is noticed fast enough, while full point-in-time recovery (PITR) requires the last full backup and binary logs (and therefore usually takes a lot of time).
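Setting up a delayed slave, for reference, is a single replication setting (the delay value below is arbitrary):

STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;  -- apply events one hour late
START SLAVE;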

How to reverse from disaster faster

Alibaba engineers and the MariaDB team implemented an interesting feature in their version of the mysqlbinlog tool: the --flashback option. Based on ROW-based DML events, it can transform the binary log so that it reverses the recorded changes, which means it can help undo given row changes extremely fast. For instance, it changes DELETE events into INSERTs and vice versa, and it swaps the WHERE and SET parts of UPDATE events. This simple idea can dramatically speed up recovery from certain types of mistakes or disasters.
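Schematically, the transformation of ROW events looks like this (pseudo-SQL for illustration only; the tool works on binary log events, not SQL text):

-- original event                        reversed event
-- DELETE FROM t WHERE id=1, a=10   =>   INSERT INTO t SET id=1, a=10
-- INSERT INTO t SET id=2, a=20     =>   DELETE FROM t WHERE id=2, a=20
-- UPDATE t SET a=30 WHERE a=20     =>   UPDATE t SET a=20 WHERE a=30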

The question is whether it works with non-MariaDB variants. To verify that, I tested this feature with the latest available Percona Server for MySQL 5.7 (which is fully compatible with upstream MySQL).

master [localhost] {msandbox} ((none)) > select @@version,@@version_comment;
+---------------+--------------------------------------------------------+
| @@version     | @@version_comment                                      |
+---------------+--------------------------------------------------------+
| 5.7.21-20-log | Percona Server (GPL), Release 20, Revision ed217b06ca3 |
+---------------+--------------------------------------------------------+
1 row in set (0.00 sec)

First, let’s simulate one possible deadly scenario: a forgotten WHERE in DELETE statement:

master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 200      |
+----------+
1 row in set (0.00 sec)
master [localhost] {msandbox} ((none)) > delete from test.sbtest1;
Query OK, 200 rows affected (0.04 sec)
slave1 [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 0        |
+----------+
1 row in set (0.00 sec)

So, our data is lost on both the master and slave!

Let’s start by downloading the latest MariaDB server 10.2.x package, which I’m hoping has a mysqlbinlog tool that works with MySQL 5.7, and unpack it to some custom location:

$ dpkg -x mariadb-server-10.2_10.2.13+maria~wheezy_amd64.deb /opt/maria/
$ /opt/maria/usr/bin/mysqlbinlog --help|grep flash
-B, --flashback Flashback feature can rollback you committed data to a

It has the function we are looking for. Now, we have to find the culprit transaction or set of transactions we want to revert. A simplified example may look like this:

$ mysqlbinlog -v --base64-output=DECODE-ROWS mysql-bin.000002 > mysql-bin.000002.sql
$ less mysql-bin.000002.sql

By searching through the decoded binary log, we are looking for the transaction that wiped out the table test.sbtest1. It looks like this (as the table had 200 rows, the transaction is pretty long, so I’m pasting only the beginning and the end):

BEGIN
/*!*/;
# at 291
#180314 15:30:34 server id 1  end_log_pos 348 CRC32 0x06cd193e  Table_map: `test`.`sbtest1` mapped to number 111
# at 348
#180314 15:30:34 server id 1  end_log_pos 8510 CRC32 0x064634c5         Delete_rows: table id 111
...
### DELETE FROM `test`.`sbtest1`
### WHERE
###   @1=200
###   @2=101
###   @3='26157116088-21551255803-13077038767-89418462090-07321921109-99464656338-95996554805-68102077806-88247356874-53904987561'
###   @4='51157774706-69740598871-18633441857-39587481216-98251863874'
# at 38323
#180314 15:30:34 server id 1  end_log_pos 38354 CRC32 0x6dbb7127        Xid = 97
COMMIT/*!*/;

It is very important to take the proper start and stop positions. We need the ones exactly after BEGIN and before the final COMMIT. Then, let’s test if the tool produces the reverse statements as expected. First, decode the rows to the .sql file:

$ /opt/maria/usr/bin/mysqlbinlog --flashback -v --base64-output=DECODE-ROWS --start-position=291 --stop-position=38323 mysql-bin.000002 > mysql-bin.000002_flash.sql

Inside, we find 200 of those. Looks good:

### INSERT INTO `test`.`sbtest1`
### SET
### @1=200
...

Since we verified the positions are correct, we can prepare a binary log file:

$ /opt/maria/usr/bin/mysqlbinlog --flashback --start-position=291 --stop-position=38323 mysql-bin.000002 > mysql-bin.000002_flash.bin

and load it back to our master:

master [localhost] {msandbox} (test) > source mysql-bin.000002_flash.bin
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
master [localhost] {msandbox} (test) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 200      |
+----------+
1 row in set (0.00 sec)

and double check they restored on slaves:

slave1 [localhost] {msandbox} (test) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 200      |
+----------+
1 row in set (0.00 sec)

GTID problem

MariaDB has a completely different GTID implementation from MySQL and Percona Server, so you can expect problems when decoding incompatible GTID-enabled binary logs with MariaDB. As MariaDB’s mysqlbinlog does not support --start/stop-gtid options (even for its own implementation), we have to take the usual positions anyway. In a GTID-enabled binary log, for example, a delete can look like this:

# at 2300
#180315 9:37:31 server id 1 end_log_pos 2365 CRC32 0x09e4d815 GTID last_committed=1 sequence_number=2 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '00020996-1111-1111-1111-111111111111:2'/*!*/;
# at 2365
#180315 9:37:31 server id 1 end_log_pos 2433 CRC32 0xac62a20d Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1521103051/*!*/;
BEGIN
/*!*/;
# at 2433
#180315 9:37:31 server id 1 end_log_pos 2490 CRC32 0x275601d6 Table_map: `test`.`sbtest1` mapped to number 108
# at 2490
#180315 9:37:31 server id 1 end_log_pos 10652 CRC32 0xe369e169 Delete_rows: table id 108
...
# at 42355
#180315 9:37:31 server id 1 end_log_pos 42386 CRC32 0xe01ff558 Xid = 31
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;

The tool seems to work, and transforms the delete transaction to a sequence of INSERTs. However, the server rejects it when we try to load it on a GTID-enabled master:

master [localhost] {msandbox} ((none)) > source mysql-bin.000003.flash
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)

Unfortunately, the solution here is either to disable GTID mode for the recovery time (which is surely tricky in replicated clusters), or to manually add the GTID-related information to the binary log produced with the --flashback option. In my case, adding these lines worked (I used the next free available GTID sequence):

$ diff -u mysql-bin.000003.flash mysql-bin.000003.flash.gtid
--- mysql-bin.000003.flash 2018-03-15 10:20:20.080487998 +0100
+++ mysql-bin.000003.flash.gtid 2018-03-15 10:25:02.909953620 +0100
@@ -4,6 +4,10 @@
DELIMITER /*!*/;
#180315 9:32:51 server id 1 end_log_pos 123 CRC32 0x941b189a Start: binlog v 4, server v 5.7.21-20-log created 180315 9:32:51 at startup
ROLLBACK/*!*/;
+# at 154
+#180315 9:37:05 server id 1 end_log_pos 219 CRC32 0x69e4ce26 GTID last_committed=0 sequence_number=1 rbr_only=yes
+/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
+SET @@SESSION.GTID_NEXT= '00020996-1111-1111-1111-111111111111:5'/*!*/;
BINLOG '
sy+qWg8BAAAAdwAAAHsAAAAAAAQANS43LjIxLTIwLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAACzL6paEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
@@ -724,6 +728,7 @@
'/*!*/;
COMMIT
/*!*/;
+SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;

master [localhost] {msandbox} ((none)) > source mysql-bin.000003.flash.gtid
(...)
master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 200      |
+----------+
1 row in set (0.00 sec)

Limitations

Obviously, flashback cannot help after DROP/TRUNCATE or other DDL commands. These are not transactional, and affected rows are never recorded in the binary log. It doesn’t work with encrypted or compressed binary logs either. But most importantly, to produce complete events that can reverse bad transactions, the binary format must be ROW. The row image also must be FULL:

master [localhost] {msandbox} ((none)) > select @@binlog_format,@@binlog_row_image;
+-----------------+--------------------+
| @@binlog_format | @@binlog_row_image |
+-----------------+--------------------+
| ROW             | FULL               |
+-----------------+--------------------+
1 row in set (0.00 sec)

If these conditions are not met (or if you’re dealing with a too-complicated GTID issue), you will have to follow the standard point-in-time recovery procedure.

The post Flashback: Another Take on Point-In-Time Recovery (PITR) in MySQL/MariaDB/Percona Server appeared first on Percona Database Performance Blog.

by Przemysław Malkowski at April 12, 2018 05:46 PM

Percona Monitoring and Management 1.9.1 Is Now Available

Percona announces the release of Percona Monitoring and Management 1.9.1. PMM (Percona Monitoring and Management) is a free and open-source platform for managing and monitoring MySQL and MongoDB performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

This release contains bug fixes only and supersedes Percona Monitoring and Management 1.9.0. It effectively solves a problem in QAN where the Count column actually displayed the number of queries per minute, not per second, as the user would expect. The following screenshot demonstrates the problem: the value of the Count column for the TOTAL row is 649.38 QPS (queries per second), yet the total number 38.96k (38,960) is only sixty times greater than the reported QPS value. Thus, queries were counted for each minute within the selected time range of Last 1 hour.

Query Analytics in PMM version 1.9.0.

The corrected version of QAN in PMM 1.9.1 counts queries per second. The total number of queries is now 3,600 (60 × 60) times greater than the QPS value, as should be expected for the chosen one-hour time range.

Query Analytics in PMM version 1.9.1.

Bug fixes

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

The post Percona Monitoring and Management 1.9.1 Is Now Available appeared first on Percona Database Performance Blog.

by Borys Belinsky at April 12, 2018 05:00 PM

Percona Live 2018 Featured Talk: Containerizing Databases at New Relic (What We Learned) with Joshua Galbraith and Bryant Vinisky

Welcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Joshua Galbraith, Senior Software Engineer, and Bryant Vinisky, Site Reliability Engineer, at New Relic. Their talk is titled Containerizing Databases at New Relic: What We Learned. There are many trade-offs when containerizing databases, and many open source options for orchestrating database containers, such as Kubernetes and Apache Mesos. In our conversation, we discussed what containers can bring to a database environment:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Joshua: My name is Joshua Galbraith, and I’m a Senior Software Engineer and Technical Product Manager on the Database Engineering team at New Relic. I’ve been working with open-source databases for the past ten years. I started my software engineering career by writing scripts to load several years of high-resolution geophysical sensor data from flat files into a Postgres database. I’ve been writing software that interacts with databases and tools to make operating them easier ever since. I’ve run MySQL, MongoDB, ElasticSearch, Cassandra and Redis in a variety of production environments. A little over three years ago, I co-founded a DBaaS startup and built a container-based platform for Apache Cassandra. I’ve been containerizing databases ever since — most recently by working on a project called Megabase at New Relic.

Bryant: Hello, my name is Bryant Vinisky. I currently work on the Database Engineering team at New Relic as a Senior Site Reliability Engineer. My professional experience with databases and the open source ecosystem started over eight years ago when I joined the engineering team at NWEA and helped roll out a SaaS platform for delivering adaptive assessments to students over the web. That platform was backed by a number of different database and related technologies — namely PostgreSQL, MongoDB and Redis.

Though in the beginning, my role didn’t formally involve databases, they were clearly a very important part of the greater system. Armed with an unquenchable curiosity, I was never shy about crossing boundaries into DB land. On the development side, much of the tooling and side projects I worked on over the years frequently touched databases for a storage backend. As the assessment platform scaled out to support hundreds of thousands of concurrent users, my role shifted to a reliability focus, often involving pre-release load testing of the major system components as well as doing follow up for problems that occurred on the production system. Much of the work involved dealing with the databases as a scaling bottleneck.

Containers came into the picture for me over two years ago when I moved to New Relic, where they had been using containers for stateless services for years and had largely skipped over VMs. It wasn't long before we started exploring containers as a solution for stateful database services with our Megabase project, an area where I've spent a lot of my time since.

Your tutorial is titled “Containerizing Databases at New Relic: What We Learned”. How did you decide on containers as your database solution?

Joshua/Bryant: At New Relic, we had already built a Container Fabric for our stateless services. We knew we needed to provide databases to our internal teams in a way that was fast, cost-efficient and repeatable. Using containers would get us most of the way towards those goals, but we didn’t have a proven pattern that we could follow to reach them. We heard about other companies succeeding in building similar solutions, and we knew that we were moving away from virtual machines. Containers seemed to fit the bill.

What are the benefits and drawbacks of containerizing databases?

Joshua/Bryant: Containers are great for packaging and deployment. They allow us to deploy a known version of configuration, code and environment in a deterministic way. Unfortunately, containers are not perfect mechanisms for resource isolation, especially when large amounts of persistent storage are required. The challenge of containerizing databases is to make the right trade-offs between portability and performance, complexity and ease-of-operation given the specific context and goals of your team and organization.

Were specific database software (MySQL, Redis, PostgreSQL) easier to containerize? Why?

Joshua/Bryant: In general, the databases that are easiest to containerize are the databases that manage their own state and make themselves effectively stateless from the point of the scheduler and container orchestration framework. Databases that provide cluster membership, fault-tolerance and data replication are easier to run with a traditional orchestration framework. Databases that do not do these things on their own require additional “sidecar” services and/or application-specific frameworks to operate reliably.

Why should people attend your talk? What do you hope people will take away from it?

Joshua/Bryant: As a team, we’ve learned a lot of lessons about what to do, and not to do, when running stateful applications in containers. We’d like to pass that knowledge on to our audience. We also want to send people away with enough information to make the right decisions about whether or not to containerize the databases they are responsible for, and how to do it in a way that is successful given their own goals and context. At the very least, we hope it will be entertaining — and maybe we’ll learn some things too.

What are you looking forward to at Percona Live (besides your talk)?

Joshua/Bryant: We’re very excited to hear about open-source tools like Vitess and ProxySQL. I’m also looking forward to hearing about the latest monitoring and performance analysis tools. I always love deep-dives into specific database problems faced by a company, and I love talking to people in the expo hall. I always come away from Percona Live events a little more excited about my day-to-day work and the future of open-source databases.

Want to find out more about this Percona Live 2018 featured talk, and containerizing stateful databases? Register for Percona Live 2018, and see Joshua and Bryant’s session talk Containerizing Databases at New Relic: What We Learned. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

The post Percona Live 2018 Featured Talk: Containerizing Databases at New Relic (What We Learned) with Joshua Galbraith and Bryant Vinisky appeared first on Percona Database Performance Blog.

by Dave Avery at April 12, 2018 03:25 PM

April 11, 2018

Peter Zaitsev

ProxySQL Admin Support for Multiple Clusters

ProxySQL Admin

In this blog post, we demonstrate a new feature in ProxySQL Admin: support for multiple clusters.

In a previous blog post, Ramesh and Roel introduced a new tool that helps configure Percona XtraDB Cluster nodes in ProxySQL. However, at that time it only worked for a single cluster per ProxySQL Admin configuration. Starting from ProxySQL 1.4.6, which ships with an improved ProxySQL Admin tool (proxysql-admin), the tool supports configuring multiple Percona XtraDB Cluster clusters with ease (PSQLADM-32).

Pre-requisites

  • The cluster name (wsrep_cluster_name) should be unique.
  • proxysql-admin.cnf configuration differences:
    • The ProxySQL READ/WRITE hostgroups should be different for each cluster.
    • The application user should be different for each cluster.
  • The host priority feature supports only one cluster at a time.

Configuring /etc/proxysql-admin.cnf

As mentioned above, the CLUSTER_APP_USERNAME and the WRITE/READ_HOSTGROUP_ID values should be different for each cluster, and wsrep_cluster_name should be unique for each cluster.
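
You can verify the name on a node of each cluster with the statement below; the two outputs that follow are from cluster1 and cluster2, respectively:

mysql> SHOW VARIABLES LIKE 'wsrep_cluster_name';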

+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| wsrep_cluster_name | cluster1 |
+--------------------+----------+
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| wsrep_cluster_name | cluster2 |
+--------------------+----------+

Sample configuration of /etc/proxysql-admin.cnf for cluster1:

# proxysql admin interface credentials.
export PROXYSQL_DATADIR='/var/lib/proxysql'
export PROXYSQL_USERNAME='admin'
export PROXYSQL_PASSWORD='admin'
export PROXYSQL_HOSTNAME='localhost'
export PROXYSQL_PORT='6032'
# PXC admin credentials for connecting to pxc-cluster-node.
export CLUSTER_USERNAME='root'
export CLUSTER_PASSWORD='sekret'
export CLUSTER_HOSTNAME='10.0.3.41'
export CLUSTER_PORT='3306'
# proxysql monitoring user. proxysql admin script will create this user in pxc to monitor pxc-nodes.
export MONITOR_USERNAME='monitor'
export MONITOR_PASSWORD='monit0r'
# Application user to connect to pxc-node through proxysql
export CLUSTER_APP_USERNAME='cluster1_user'
export CLUSTER_APP_PASSWORD='c1_passw0rd'
# ProxySQL read/write hostgroup
export WRITE_HOSTGROUP_ID='10'
export READ_HOSTGROUP_ID='11'
# ProxySQL read/write configuration mode.
export MODE="singlewrite"
# ProxySQL Cluster Node Priority File
export HOST_PRIORITY_FILE=$PROXYSQL_DATADIR/host_priority.conf

Sample configuration of /etc/proxysql-admin.cnf for cluster2

# proxysql admin interface credentials.
export PROXYSQL_DATADIR='/var/lib/proxysql'
export PROXYSQL_USERNAME='admin'
export PROXYSQL_PASSWORD='admin'
export PROXYSQL_HOSTNAME='localhost'
export PROXYSQL_PORT='6032'
# PXC admin credentials for connecting to pxc-cluster-node.
export CLUSTER_USERNAME='root'
export CLUSTER_PASSWORD='sekret'
export CLUSTER_HOSTNAME='10.0.3.173'
export CLUSTER_PORT='3306'
# proxysql monitoring user. proxysql admin script will create this user in pxc to monitor pxc-nodes.
export MONITOR_USERNAME='monitor'
export MONITOR_PASSWORD='monit0r'
# Application user to connect to pxc-node through proxysql
export CLUSTER_APP_USERNAME='cluster2_user'
export CLUSTER_APP_PASSWORD='c2_passw0rd'
# ProxySQL read/write hostgroup
export WRITE_HOSTGROUP_ID='20'
export READ_HOSTGROUP_ID='21'
# ProxySQL read/write configuration mode.
export MODE="loadbal"
# ProxySQL Cluster Node Priority File
export HOST_PRIORITY_FILE=$PROXYSQL_DATADIR/host_priority.conf

Setting up Percona XtraDB Cluster nodes in ProxySQL

Note that you have the option to use a single proxysql-admin.cnf file and simply edit it between runs where changes are appropriate, or to use two different files to configure ProxySQL. In my example, I used two files with the contents seen above:

[root@proxysql_multi-pxc ~]# proxysql-admin --config=/etc/proxysql-admin_cluster1.cnf --enable
This script will assist with configuring ProxySQL (currently only Percona XtraDB cluster in combination with ProxySQL is supported)
ProxySQL read/write configuration mode is singlewrite
Configuring ProxySQL monitoring user..
ProxySQL monitor username as per command line/config-file is monitor
User 'monitor'@'10.%' has been added with USAGE privilege
Configuring the Percona XtraDB Cluster application user to connect through ProxySQL
Percona XtraDB Cluster application username as per command line/config-file is cluster1_user
Percona XtraDB Cluster application user 'cluster1_user'@'10.%' has been added with the USAGE privilege, please make sure to the grant appropriate privileges
Adding the Percona XtraDB Cluster server nodes to ProxySQL
Configuring singlewrite mode with the following nodes designated as priority order:
Write node info
+-----------+--------------+------+---------+---------+
| hostname  | hostgroup_id | port | weight  | comment |
+-----------+--------------+------+---------+---------+
| 10.0.3.41 | 10           | 3306 | 1000000 | WRITE   |
+-----------+--------------+------+---------+---------+
ProxySQL configuration completed!
ProxySQL has been successfully configured to use with Percona XtraDB Cluster
You can use the following login credentials to connect your application through ProxySQL
mysql --user=cluster1_user -p  --host=localhost --port=6033 --protocol=tcp

[root@proxysql_multi-pxc ~]# proxysql-admin --config=/etc/proxysql-admin_cluster2.cnf --enable
This script will assist with configuring ProxySQL (currently only Percona XtraDB cluster in combination with ProxySQL is supported)
ProxySQL read/write configuration mode is loadbal
Host priority file (/var/lib/proxysql/host_priority.conf) is already present. Would you like to replace with the new file [y/n] ? n
Host priority file is not deleted. Please make sure you have properly configured /var/lib/proxysql/host_priority.conf
Configuring ProxySQL monitoring user..
ProxySQL monitor username as per command line/config-file is monitor
User 'monitor'@'10.%' has been added with USAGE privilege
Configuring the Percona XtraDB Cluster application user to connect through ProxySQL
Percona XtraDB Cluster application username as per command line/config-file is cluster2_user
Percona XtraDB Cluster application user 'cluster2_user'@'10.%' has been added with the USAGE privilege, please make sure to the grant appropriate privileges
Adding the Percona XtraDB Cluster server nodes to ProxySQL
ProxySQL configuration completed!
ProxySQL has been successfully configured to use with Percona XtraDB Cluster
You can use the following login credentials to connect your application through ProxySQL
mysql --user=cluster2_user -p  --host=localhost --port=6033 --protocol=tcp
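
Note that proxysql-admin creates the application users with only the USAGE privilege; which privileges to grant beyond that depends entirely on your application. A hypothetical example (app_db is a placeholder schema name), run against a node of each cluster:

GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'cluster1_user'@'10.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'cluster2_user'@'10.%';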

Inspect ProxySQL tables

Login to ProxySQL to confirm that the setup is correct:

[root@proxysql_multi-pxc ~]# mysql -uadmin -p -P6032 -h127.0.0.1
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 33893
Server version: 5.5.30 (ProxySQL Admin Module)
Copyright (c) 2009-2018 Percona LLC and/or its affiliates
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select * from mysql_users;
+---------------+-------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+
| username      | password                                  | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections |
+---------------+-------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+
| cluster1_user | *448C417D62616B779E789F3BD72AA3DE9C319EA3 | 1      | 0       | 10                |                | 0             | 1                      | 0            | 1       | 1        | 10000           |
| cluster2_user | *AB1E96267D16A9F26A201282F9ED80B50244B770 | 1      | 0       | 20                |                | 0             | 1                      | 0            | 1       | 1        | 10000           |
+---------------+-------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+
2 rows in set (0.00 sec)
mysql> select * from mysql_servers;
+--------------+------------+------+--------+---------+-------------+-----------------+---------------------+---------+----------------+-----------+
| hostgroup_id | hostname   | port | status | weight  | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment   |
+--------------+------------+------+--------+---------+-------------+-----------------+---------------------+---------+----------------+-----------+
| 11           | 10.0.3.81  | 3306 | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              | READ      |
| 10           | 10.0.3.41  | 3306 | ONLINE | 1000000 | 0           | 1000            | 0                   | 0       | 0              | WRITE     |
| 11           | 10.0.3.232 | 3306 | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              | READ      |
| 20           | 10.0.3.173 | 3306 | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              | READWRITE |
| 20           | 10.0.3.78  | 3306 | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              | READWRITE |
| 20           | 10.0.3.141 | 3306 | ONLINE | 1000    | 0           | 1000            | 0                   | 0       | 0              | READWRITE |
+--------------+------------+------+--------+---------+-------------+-----------------+---------------------+---------+----------------+-----------+
6 rows in set (0.00 sec)
mysql> select * from scheduler;
+----+--------+-------------+----------------------------------+------+------+------+------+------------------------------------------------------+----------+
| id | active | interval_ms | filename                         | arg1 | arg2 | arg3 | arg4 | arg5                                                 | comment  |
+----+--------+-------------+----------------------------------+------+------+------+------+------------------------------------------------------+----------+
| 6  | 1      | 3000        | /usr/bin/proxysql_galera_checker | 10   | 11   | 1    | 1    | /var/lib/proxysql/cluster1_proxysql_galera_check.log | cluster1 |
| 7  | 1      | 3000        | /usr/bin/proxysql_galera_checker | 20   | 20   | 0    | 1    | /var/lib/proxysql/cluster2_proxysql_galera_check.log | cluster2 |
+----+--------+-------------+----------------------------------+------+------+------+------+------------------------------------------------------+----------+
2 rows in set (0.00 sec)
mysql> select * from mysql_query_rules;
+---------+--------+---------------+------------+--------+-------------+------------+------------+--------+---------------------+---------------+----------------------+--------------+---------+-----------------+-----------------------+-----------+-----------+---------+---------+-------+-------------------+----------------+------------------+-----------+--------+-------------+-----------+-----+-------+---------+
| rule_id | active | username      | schemaname | flagIN | client_addr | proxy_addr | proxy_port | digest | match_digest        | match_pattern | negate_match_pattern | re_modifiers | flagOUT | replace_pattern | destination_hostgroup | cache_ttl | reconnect | timeout | retries | delay | next_query_flagIN | mirror_flagOUT | mirror_hostgroup | error_msg | OK_msg | sticky_conn | multiplex | log | apply | comment |
+---------+--------+---------------+------------+--------+-------------+------------+------------+--------+---------------------+---------------+----------------------+--------------+---------+-----------------+-----------------------+-----------+-----------+---------+---------+-------+-------------------+----------------+------------------+-----------+--------+-------------+-----------+-----+-------+---------+
| 7       | 1      | cluster1_user | NULL       | 0      | NULL        | NULL       | NULL       | NULL   | ^SELECT.*FOR UPDATE | NULL          | 0                    | CASELESS     | NULL    | NULL            | 10                    | NULL      | NULL      | NULL    | NULL    | NULL  | NULL              | NULL           | NULL             | NULL      | NULL   | NULL        | NULL      | NULL | 1     | NULL    |
| 8       | 1      | cluster1_user | NULL       | 0      | NULL        | NULL       | NULL       | NULL   | ^SELECT             | NULL          | 0                    | CASELESS     | NULL    | NULL            | 11                    | NULL      | NULL      | NULL    | NULL    | NULL  | NULL              | NULL           | NULL             | NULL      | NULL   | NULL        | NULL      | NULL | 1     | NULL    |
+---------+--------+---------------+------------+--------+-------------+------------+------------+--------+---------------------+---------------+----------------------+--------------+---------+-----------------+-----------------------+-----------+-----------+---------+---------+-------+-------------------+----------------+------------------+-----------+--------+-------------+-----------+-----+-------+---------+
2 rows in set (0.00 sec)
mysql> exit
Bye
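
As a final sanity check, you can connect through ProxySQL as each application user and confirm the query lands on the intended cluster. SHOW VARIABLES is not matched by the ^SELECT query rules, so it is routed to the user's default hostgroup:

[root@proxysql_multi-pxc ~]# mysql --user=cluster1_user -p --host=localhost --port=6033 --protocol=tcp -e "SHOW VARIABLES LIKE 'wsrep_cluster_name';"
[root@proxysql_multi-pxc ~]# mysql --user=cluster2_user -p --host=localhost --port=6033 --protocol=tcp -e "SHOW VARIABLES LIKE 'wsrep_cluster_name';"

The first command should report cluster1 and the second cluster2.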

It’s as easy as that! We hope you continue to enjoy using ProxySQL Admin!

The post ProxySQL Admin Support for Multiple Clusters appeared first on Percona Database Performance Blog.

by Jericho Rivera at April 11, 2018 06:06 PM

Calling All Polyglots: Percona Live 2018 Keynote Schedule Now Available!

Percona Live 2018 Keynotes

Percona Live 2018 KeynotesWe’ve posted the Percona Live 2018 keynote addresses for the seventh annual Percona Live Open Source Database Conference 2018, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, CA. 

This year’s keynotes explore topics ranging from how cloud and open source database adoption accelerates business growth, to leading-edge emerging technologies, to the importance of MySQL 8.0, to the growing popularity of PostgreSQL.

We’re excited by the great lineup of speakers, including our friends at Alibaba Cloud, Grafana, Microsoft, Oracle, Upwork and VividCortex, the innovative leaders on the Cool Technologies panel, and Brendan Gregg from Netflix, who will discuss how to get the most out of your database on a Linux OS, using his experiences at Netflix to highlight examples.  

With the theme of “Championing Open Source Databases,” the conference will feature multiple tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Once again, Percona will be offering a low-cost database 101 track for beginning users who want to learn how to use and operate open source databases.

The Percona Live 2018 keynotes include:

Tuesday, April 24, 2018

  • Open Source for the Modern Business – Peter Zaitsev of Percona will discuss how, as open source database adoption continues to grow in enterprise organizations, the expectations and definitions of what constitutes success continue to change. A single technology for everything is no longer an option; welcome to the polyglot world. The talk will include several compelling open source projects and trends of interest to the open source database community and will be followed by a round of lightning talks taking a closer look at some of those projects.
  • Cool Technologies Showcase – Four industry leaders will introduce key emerging industry developments. Andy Pavlo of Carnegie Mellon University will discuss the requirements for enabling autonomous database optimizations. Nikolay Samokhvalov of PostgreSQL.org will discuss new PostgreSQL tools. Sugu Sougoumarane of PlanetScale Data will explore how Vitess became a high-performance, scalable and available MySQL clustering cloud solution in line with today’s NewSQL storage systems. Shuhao Wu of Shopify explains how to use Ghostferry as a data migration tool for incompatible cloud platforms.
  • State of the Dolphin 8.0 – Tomas Ulin of Oracle will discuss the focus, strategy, investments and innovations that are evolving MySQL to power next-generation web, mobile, cloud and embedded applications – and why MySQL 8.0 is the most significant MySQL release in its history.
  • Linux Performance 2018 – Brendan Gregg of Netflix will summarize recent performance features to help users get the most out of their Linux systems, whether they are databases or application servers. Topics include the KPTI patches for Meltdown, eBPF for performance observability, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more.

Wednesday, April 25, 2018

  • Panel Discussion: Database Evolution in the Cloud – An expert panel of industry leaders, including Lixun Peng of Alibaba, Sunil Kamath of Microsoft, and Baron Schwartz of VividCortex, will discuss the rapid changes occurring with databases deployed in the cloud and what that means for the future of databases, management and monitoring and the role of the DBA and developer.
  • Future Perfect: The New Shape of the Data Tier – Baron Schwartz of VividCortex will discuss the impact of macro trends such as cloud computing, microservices, containerization, and serverless applications. He will explore where these trends are headed, touching on topics such as whether we are about to see basic administrative tasks become more automated, the role of open source and free software, and whether databases as we know them today are headed for extinction.
  • MongoDB at Upwork – Scott Simpson of Upwork, the largest freelancing website for connecting clients and freelancers, will discuss how MongoDB is used at Upwork, how the company chose the database, and how Percona helps make the company successful.

We will also present the Percona Live 2018 Community Awards and Lightning Talks on Monday, April 23, 2018, during the Opening Night Reception. Don’t miss the first day of tutorials and Opening Night Reception!

Register for the conference on the Percona Live Open Source Database Conference 2018 website.

Sponsorships

Limited Sponsorship opportunities for Percona Live 2018 Open Source Database Conference are still available, and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Percona, VividCortex
  • Platinum – Alibaba Cloud, Microsoft
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – Altinity, BlazingDB, Box, Dynimize, ObjectRocket, Pingcap, Shannon Systems, SolarWinds, TimescaleDB, TwinDB, Yelp
  • Contributing Sponsors – cPanel, Github, Google Cloud, NaviCat
  • Media Sponsors – Database Trends & Applications, Datanami, EnterpriseTech, HPCWire, ODBMS.org, Packt

The post Calling All Polyglots: Percona Live 2018 Keynote Schedule Now Available! appeared first on Percona Database Performance Blog.

by Laurie Coffin at April 11, 2018 11:54 AM

Open Query Pty Ltd

Amazon AWS Billing

During our recent AWS research I observed again how their billing seems to lack clarity and granularity.  That is, it’s very difficult to figure out in detail what’s going on and where particular charges actually come from.  If you have a large server farm at AWS, that’s going to be a problem.  Unless of course you just pay whatever and don’t mind what’s on the bill?

Of particular interest was a charge for inter-AZ (DC) traffic.  Not because it was large, it was just a few cents.  But it was odd.  So this would be outbound traffic from our short-lived test server(s) to another AWS availability zone (datacenter) in the US.  This gets charged at a couple of cents per GB.

I had a yarn with billing support, and they noted that the AWS side of the firewall had some things open, and because they can't see what goes on inside a server, they regarded it as valid traffic.  The only service active on the server was SSH, which keeps its own logs.  While doing the Aurora testing we weren't specifically looking at this, so by the time the billing info showed up (over a day later), the server had been decommissioned already, and along with it those logs.

As a sidenote, presuming this was “valid” traffic, someone was using AWS servers to scan other AWS servers and try to gain access.  I figured such activity, clearly in breach of AWS policies, would be of interest to AWS, but it wasn't.  Seems a bit neglectful to me.  And with all that tech, shouldn't their systems be able to spot such activities automatically?

Some days later I fired up another server specifically to investigate the potential for rogue outbound traffic.  I again left SSH accessible to the world, to emulate the potential of being accessed from elsewhere, while keeping an eye on the log.  This test server only existed for a number of hours, and was fully monitored internally so we know exactly what went on.  Obviously, we had to leave the AWS-side firewall open to be able to perform the test.  Over hours there were a few login attempts, but nothing major.  There would have to be many thousands of login attempts to create a GB of outbound traffic – consider that there’s no SSH connection actually getting established, the attempts don’t get beyond the authentication stage so it’s just some handshake and the rejection going out.  So no such traffic was seen in this instance.

Of course, the presumed earlier SSH attempts may have just been the result of a scanning server getting lucky, whereas my later test server didn’t “get lucky” being scanned. It’s possible. To increase the possible attack surface, we put an nc (netcat) listener on ports 80, 443 and some others, just logging any connection attempt without returning outbound traffic.  This again saw one or two attempts, but no major flood.
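
For reference, a minimal sketch of such a listener (assuming the OpenBSD variant of nc, where -k keeps listening across connections; the log path is arbitrary):

# Log every connection attempt to port 80 without ever sending a byte back.
nc -lkv 80 < /dev/null > /dev/null 2>> /var/log/nc-port80.log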

I figured that was the end of it, and shut down the server.  But come billing time, we once again see a cent-charge for similar traffic.

And this time we know the traffic definitely didn't exist, because we were specifically monitoring for it.  So we know for a fact that we were getting billed for traffic that didn't happen. “How quaint”.  Naturally, because the AWS-side firewall was specifically left open, AWS billing doesn't want to hear about it.  I suppose we could re-run the test again, this time with a fully set up AWS-side firewall, but it's starting to chew up too much time.

My issues are these:

  1. For a high tech company, AWS billing is remarkably obtuse.  That’s a business choice, and not to the clients’ benefit.  Some years ago I asked my accountant whether my telco bill (Telstra at the time) was so convoluted for any particular accounting reason I wasn’t aware of, and her answer was simply “no”.  This is the same.  The company chooses to make their bills difficult to read and make sense of.
  2. Internal AWS network monitoring must be inadequate if AWS-hosted servers can do sweep scans and mass SSH login attempts. Those are patterns that can be caught.  That is, presuming that those scans and attempts actually happened.  If they didn't, the earlier traffic didn't exist either, in which case we're getting billed for stuff that didn't happen. It's one or the other, right?  (Based on the above observations, my bet is actually on the billing rather than the internal network monitoring – AWS employs very smart techs.)
  3. Us getting billed an extra cent on a bill doesn’t break the bank, but since it’s for something that we know didn’t happen, it’s annoying. It makes us wonder what else is wrong in the billing system, and whether other people too might get charged a cent extra here or there.  Does this matter?  Yes it does, because on the AWS end it adds up to many millions.  And for bigger AWS clients, it will add up over time also.

Does anybody care?  I don’t know.  Do you care?


Oh, and while Google (AdWords, etc.), Microsoft and others have over the last few years adjusted their invoicing in Australia to produce an appropriate tax invoice with GST (sales tax) even though the billing is done from elsewhere (Singapore in Google's case), AWS doesn't do this.  Australian server instances are all sold from AWS Inc. in Seattle, WA.

by Arjen Lentz at April 11, 2018 03:51 AM

April 10, 2018

MariaDB AB

How to Restore a Single Database from MariaDB Backup

Lately, I’ve been asked how to restore a single database or even a single table out of a complete backup of MariaDB Server that was created with MariaDB Backup. This blog provides step-by-step guidance on how to achieve a restore of a database. Another blog post will pick up the question on how to restore a single table which has a separate set of challenges.

We will use the world sample database and a backup directory /opt/backup/ as an example to explain the process.

Step 1 – Creating the Backup and Preparing the Database for Export

As root or user with write permission to /opt/backup issue the following commands:

# TS=`date +"%Y-%m-%d_%H-%M-%S"`
# mkdir /opt/backup/${TS}
# mariabackup --backup --user backup1 --password MariaDB \
    --target-dir "/opt/backup/${TS}"

This created a directory /opt/backup/2018-03-28_19-02-56 with the complete backup.

To be able to restore a database - or, to be more precise, all or some tables of a database - you first need to have the tables prepared for export. This is the easiest step in the process. To prepare all tables of the database world for export, issue the following command:

# mariabackup  --prepare --export --databases world \
--user backup1 --password MariaDB \
--target-dir "/opt/backup/${TS}"

After this step if you go to the backup directory you will find .cfg files for all tables in world.

# cd /opt/backup/2018-03-28_19-02-56
# ls -l world
total 1132
-rw-rw---- 1 root root    686 Mar 28 19:05 city.cfg
-rw-r----- 1 root root   1578 Mar 28 19:03 city.frm
-rw-r----- 1 root root 606208 Mar 28 19:03 city.ibd
-rw-r----- 1 root root    856 Mar 28 19:03 country_capital.frm
-rw-rw---- 1 root root   1228 Mar 28 19:05 country.cfg
-rw-r----- 1 root root   1618 Mar 28 19:03 country.frm
-rw-r----- 1 root root 163840 Mar 28 19:03 country.ibd
-rw-rw---- 1 root root    665 Mar 28 19:05 countrylanguage.cfg
-rw-r----- 1 root root   1542 Mar 28 19:03 countrylanguage.frm
-rw-r----- 1 root root 229376 Mar 28 19:03 countrylanguage.ibd
-rw-r----- 1 root root     61 Mar 28 19:03 db.opt

country_capital.frm is a view on the country and city tables; therefore it has no .cfg file, since a view has no tablespace.

Step 2 – Creating empty tables for the restore

Next, you’ll need to create a database you want to restore the tables to.  The database does not necessarily need to be named the same as the database in the backup. For demonstration purposes, we use a database named world2.

What you need are the CREATE DATABASE and CREATE TABLE SQL statements that were used to create the original database and tables. You can obtain these from your server by running SHOW CREATE DATABASE and SHOW CREATE TABLE for each table and taking the full statements from the output:

MariaDB [world]> SHOW CREATE DATABASE world\G
************************** 1. row ***************************
Database: world
Create Database: CREATE DATABASE `world` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci */
1 row in set (0.00 sec)

MariaDB [world]> SHOW CREATE TABLE country\G
*************************** 1. row ***************************
Table: country
Create Table: CREATE TABLE `country` (
`Code` char(3) NOT NULL DEFAULT '',
`Name` char(52) NOT NULL DEFAULT '',
`Continent` enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') NOT NULL
DEFAULT 'Asia',
`Region` char(26) NOT NULL DEFAULT '',
`SurfaceArea` float(10,2) NOT NULL DEFAULT 0.00,
`IndepYear` smallint(6) DEFAULT NULL,
`Population` int(11) NOT NULL DEFAULT 0,
`LifeExpectancy` float(3,1) DEFAULT NULL,
`GNP` float(10,2) DEFAULT NULL,
`GNPOld` float(10,2) DEFAULT NULL,
`LocalName` char(45) NOT NULL DEFAULT '',
`GovernmentForm` char(45) NOT NULL DEFAULT '',
`HeadOfState` char(60) DEFAULT NULL,
`Capital` int(11) DEFAULT NULL,
`Code2` char(2) NOT NULL DEFAULT '',
PRIMARY KEY (`Code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MariaDB [world]> SHOW CREATE TABLE city\G
*************************** 1. row ***************************
      Table: city
Create Table: CREATE TABLE `city` (
 `ID` int(11) NOT NULL AUTO_INCREMENT,
 `Name` char(35) NOT NULL DEFAULT '',
 `CountryCode` char(3) NOT NULL DEFAULT '',
 `District` char(20) NOT NULL DEFAULT '',
 `Population` int(11) NOT NULL DEFAULT 0,
 PRIMARY KEY (`ID`),
 KEY `CountryCode` (`CountryCode`),
 CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `country` (`Code`)
) ENGINE=InnoDB AUTO_INCREMENT=4100 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MariaDB [world]> SHOW CREATE TABLE countrylanguage\G
*************************** 1. row ***************************
      Table: countrylanguage
Create Table: CREATE TABLE `countrylanguage` (
 `CountryCode` char(3) NOT NULL DEFAULT '',
 `Language` char(30) NOT NULL DEFAULT '',
 `IsOfficial` enum('T','F') NOT NULL DEFAULT 'F',
 `Percentage` float(4,1) NOT NULL DEFAULT 0.0,
 PRIMARY KEY (`CountryCode`,`Language`),
 KEY `CountryCode` (`CountryCode`),
 CONSTRAINT `countryLanguage_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `country` (`Code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

You need to remove any referential integrity constraints (the CONSTRAINT ... FOREIGN KEY lines above) from the CREATE TABLE statements and recreate them after successfully importing the tablespaces, because otherwise they cause problems when you try to discard the tablespaces in the next step:

MariaDB [world2]> ALTER TABLE country DISCARD TABLESPACE;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails

In this case drop the FOREIGN KEY CONSTRAINT by issuing:

MariaDB [world2]> ALTER TABLE city DROP FOREIGN KEY city_ibfk_1;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

In the CREATE DATABASE statement replace world with world2.
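
Applied to the statement captured above, that gives:

CREATE DATABASE `world2` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci */;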

If you do not have your original database or tables anymore, you need to get the latest CREATE statements from your application. It is therefore always a good idea to capture the CREATE statements for every database on your servers whenever the schema changes, and to store them in a safe place.
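
One way to automate such a snapshot (a sketch; adjust credentials and the target path to your environment):

# Dump the schema only (no data) for all databases, including routines and triggers.
mysqldump --no-data --routines --triggers --all-databases > /safe/place/schema-$(date +%F).sql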

If your original database schema still exists you can also use the following statements to prepare the database for restore:

MariaDB [(none)]> CREATE DATABASE world2;
Query OK, 1 row affected (0.01 sec)
MariaDB [(none)]> use world2
Database changed

MariaDB [world2]> CREATE TABLE country LIKE world.country;
Query OK, 0 rows affected (0.05 sec)

MariaDB [world2]> CREATE TABLE city LIKE world.city;
Query OK, 0 rows affected (0.04 sec)

MariaDB [world2]> CREATE TABLE countrylanguage LIKE world.countrylanguage;
Query OK, 0 rows affected (0.04 sec)

Referential integrity constraints are not copied into the new schema.

Step 3 – Discard the tablespaces

MariaDB [world2]> ALTER TABLE country DISCARD TABLESPACE;
Query OK, 0 rows affected (0.02 sec)
MariaDB [world2]> ALTER TABLE city DISCARD TABLESPACE;
Query OK, 0 rows affected (0.01 sec)
MariaDB [world2]> ALTER TABLE countrylanguage DISCARD TABLESPACE;
Query OK, 0 rows affected (0.01 sec)

After this step the database directory for world2 only contains the .frm files and the db.opt file.

Step 4 – Copy the tables to restore to the new database directory

# cp /opt/backup/2018-03-28_19-02-56/world/*.* /var/lib/mysql/world2

If you look into the database directory world2 now you will see the following:

# ls -l
total 1008
-rw-r----- 1 root  root 686 Mar 28 19:25 city.cfg
-rw-rw---- 1 mysql mysql   1578 Mar 28 19:25 city.frm
-rw-r----- 1 root  root 606208 Mar 28 19:25 city.ibd
-rw-r----- 1 root  root 856 Mar 28 19:25 country_capital.frm
-rw-r----- 1 root  root 1228 Mar 28 19:25 country.cfg
-rw-rw---- 1 mysql mysql   1618 Mar 28 19:25 country.frm
-rw-r----- 1 root  root 163840 Mar 28 19:25 country.ibd
-rw-r----- 1 root  root 665 Mar 28 19:25 countrylanguage.cfg
-rw-rw---- 1 mysql mysql   1542 Mar 28 19:25 countrylanguage.frm
-rw-r----- 1 root  root 229376 Mar 28 19:25 countrylanguage.ibd
-rw-rw---- 1 mysql mysql     61 Mar 28 19:25 db.opt

The .frm files are owned by user and group mysql, but the tablespace files (.ibd) and the export files (.cfg) are not. To be able to import the tablespaces, you need to change the ownership:

chown -R mysql:mysql /var/lib/mysql/world2

Step 5 – Import the tablespaces

To complete the process you now need to import the restored tablespaces.

MariaDB [world2]> ALTER TABLE country IMPORT TABLESPACE;
Query OK, 0 rows affected (0.09 sec)
MariaDB [world2]> ALTER TABLE city IMPORT TABLESPACE;
Query OK, 0 rows affected (0.10 sec)
MariaDB [world2]> ALTER TABLE countrylanguage IMPORT TABLESPACE;
Query OK, 0 rows affected (0.06 sec)

After importing the tablespaces, the database is fully restored. A SELECT against the imported tables shows that they have all the expected data:

MariaDB [world]> select count(id) from world.city;
+-----------+
| count(id) |
+-----------+
|      4081 |
+-----------+
1 row in set (0.01 sec)

MariaDB [world2]> select count(id) from world2.city;
+-----------+
| count(id) |
+-----------+
|      4081 |
+-----------+
1 row in set (0.01 sec)

If all went well, we only need to add the FOREIGN KEY constraints again, which is done in Step 6.

During the import of the tablespaces, you might get an error saying that the flags of the tablespaces to be imported do not match the flags of the newly created tables in the new database.

MariaDB [world3]> ALTER TABLE country IMPORT TABLESPACE;
ERROR 1808 (HY000): Schema mismatch (Table flags don't match, server table has 0x21 and the meta-data file has 0x1)

See Step 7 on how to get around this error.

Step 6 – Recreate FOREIGN KEY Constraint

Recreate FOREIGN KEY constraint on table city:

MariaDB [world2]> ALTER TABLE city ADD CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `country` (`Code`);
Query OK, 4081 rows affected (0.15 sec)
Records: 4081  Duplicates: 0 Warnings: 0

Recreate FOREIGN KEY constraint on table countrylanguage:

MariaDB [world2]> ALTER TABLE countrylanguage ADD CONSTRAINT `countryLanguage_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `country` (`Code`);
Query OK, 984 rows affected (0.12 sec)             
Records: 984  Duplicates: 0 Warnings: 0

Step 7 – Identifying the InnoDB File Formats

If the world database was created with a version that used the Antelope file format, its tables have ROW_FORMAT=COMPACT, which corresponds to FLAG: 1 (0x1). If the version you are restoring to uses the Barracuda file format, the ROW_FORMAT will be Dynamic, which corresponds to FLAG: 33 (0x21).

So check the row format of the original tables if you still have access to them.

MariaDB [(none)]> select * from information_schema.innodb_sys_tables where name like 'world/%'\G
*************************** 1. row **************************
    TABLE_ID: 125
        NAME: world/city
        FLAG: 1
      N_COLS: 12
       SPACE: 131
 FILE_FORMAT: Antelope
  ROW_FORMAT: Compact
ZIP_PAGE_SIZE: 0
  SPACE_TYPE: Single

This will also be the format of the tablespace files in /opt/backup/2018-03-28_19-02-56/world.

Do the same check on the new tables. If the output looks like the example below, you must proceed with Step 8.

MariaDB [world]> select * from information_schema.innodb_sys_tables where name like 'world2/%'\G
*************************** 1. row ***************************
    TABLE_ID: 159
        NAME: world2/city
        FLAG: 33
      N_COLS: 8
       SPACE: 108
 FILE_FORMAT: Barracuda
  ROW_FORMAT: Dynamic
ZIP_PAGE_SIZE: 0
  SPACE_TYPE: Single

If you still have access to a working copy of the original database and tables like in the example above you can do the FILE_FORMAT check before trying to import the tablespaces and do Step 8 before.

Step 8 – Adjust FILE_FORMAT and ROW_FORMAT

In this case you need to change the ROW_FORMAT of the new empty tables to COMPACT by issuing:

ALTER TABLE country ROW_FORMAT=COMPACT;
Query OK, 0 rows affected (0.01 sec)
ALTER TABLE city ROW_FORMAT=COMPACT;
Query OK, 0 rows affected (0.01 sec)
ALTER TABLE countrylanguage ROW_FORMAT=COMPACT;
Query OK, 0 rows affected (0.01 sec)

Now retry from Step 5.

Restoring a database to a MariaDB Galera Cluster

Generally speaking, this same procedure can be used to restore a single database to a MariaDB Galera Cluster. The imported tablespaces will only be available on the node where the restore was executed, since tablespace imports are not replicated to the other nodes. We will cover the whole process of restoring a single database to a MariaDB Galera Cluster, including samples, in a later blog. Stay tuned!

David Choy

Fri, 04/20/2018 - 11:32

Thank you

Thank you for providing an example of this. I was hoping someone could write a blog on setting up avro-router with the master node so I can stream to Kafka.

Dashamir Hoxha

Sun, 04/22/2018 - 05:12

Restoring with rsync

How about using these instructions:
https://mariadb.com/kb/en/library/full-backup-and-restore-with-mariadb-backup/#restoring-with-other-tools

It seems like it should work. I have tested it a bit (not very extensively) and it seems to work. Is there anything wrong with this approach?

by Ulrich Moser at April 10, 2018 09:01 PM

Peter Zaitsev

Migrating Database Charsets to utf8mb4: A Story from the Trenches

utf8mb4

In this blog post, we'll look at options for migrating database charsets to utf8mb4.

Migrating charsets is, in my opinion, one of the most tedious tasks in a DBA's life. There are so many things involved that can screw up our data that making the conversion work is always hard. Sometimes what seems like a trivial task can easily become a nightmare and keep us working for longer than expected.

I’ve recently worked on a case that challenged me with lots of tests due to some existing schema designs that made InnoDB suffer. I’ve decided to write this post to put together some definitive guide to enact charset conversion with minimal downtime and pain.

  • First disclosure: I can't emphasize enough that you need to always back up your data. If something goes wrong, you can always roll things back by keeping a healthy set of backups.
  • Second disclosure: A backup can't be considered a good backup until you test it, so running regular backups and also performing regular restore tests is a must-do task to stay on the safe side.
  • Third and last disclosure: I'm not pretending to present the best or only way to do this exercise. This is the way I consider easiest and least painful to perform a charset conversion with minimal downtime.

My approach involves at least one slave for failover and logical/physical backup operations to make sure that data is loaded properly using the right charset.

In this case, we are moving from latin1 (default until MySQL 8.0.0) to utf8mb4 (new default from 8.0.1). In this post, Lefred refers to this change and some safety checks for upgrading. For our change, an important thing to consider: Latin1 charset stores one byte per character, while utf8mb4 can store up to four bytes per character. This change definitely impacts the disk usage, but also makes us hit some limits that I describe later in the plan.

So let's put our hands into action. First, let's create a slave using a fresh (non-locking) backup. Remember that these operations are designed to minimize downtime and reduce any potential impact on our production server.

If you already have a slave that can act as a master replacement then you can skip this section. In our source server, configure binlog_format and flush logs to start with fresh binary logs:

set global binlog_format=MIXED;
flush logs;

Start a streaming backup using Percona Xtrabackup through netcat in the destination server:

nc -l 9999 | cat - > /dest/folder/backup.tar

and in our source server:

innobackupex --stream=tar ./ | nc dest_server_ip 9999

Once the backup is done, untar and restore the backup. Then set up the slave:

tar -xif /dest/folder/backup.tar
innobackupex --apply-log /dest/folder/
/etc/init.d/mysql stop
rm -rf /var/lib/mysql/
mv /dest/folder/* /var/lib/mysql/
chown -R mysql:mysql /var/lib/mysql
/etc/init.d/mysql start
cat /var/lib/mysql/xtrabackup_binlog_info
change master to master_host='master_host', master_user='master_user, master_password='master_password', master_log_file='file_printed_in_xtrabackup_binlog_info', master_log_pos=pos_printed_in_xtrabackup_binlog_info;
start slave;

Now that we have the slave ready, we prepare our dataset by running two mysqldump processes so we have data and schemas in separate files. You can also run this operation using MyDumper or mysqlpump, but I will keep it easy:

STOP SLAVE;
SHOW SLAVE STATUS;

Write down this output, as it may be needed later:

mysqldump --skip-set-charset --no-data --databases `mysql --skip-column-names -e "SELECT GROUP_CONCAT(schema_name SEPARATOR ' ') FROM information_schema.schemata WHERE schema_name NOT IN ('mysql','performance_schema','information_schema');"` > schema.sql
mysqldump --skip-set-charset -n -t --databases `mysql --skip-column-names -e "SELECT GROUP_CONCAT(schema_name SEPARATOR ' ') FROM information_schema.schemata WHERE schema_name NOT IN ('mysql','performance_schema','information_schema');"` > data.sql

Notice that I'm passing a command as an argument to --databases to dump all databases but mysql, performance_schema and information_schema (hack stolen from this post, with credit to Ronald Bradford). It is very important to keep replication stopped, as we will resume it only after fully converting our charset.

Now we have to convert our data to utf8mb4. This is easy, as we just need to edit the schema.sql file in place by running a few commands (note the -i flag: without it, sed only prints to stdout and leaves the file untouched):

sed -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8mb4/g" schema.sql
sed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci /" schema.sql
sed -e "s/SET character_set_client = utf8/SET character_set_client = utf8mb4/" schema.sql

Can this be a one-liner? Yes, but I’m not a good basher. 🙂
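
For what it's worth, the same three substitutions can be chained into a single sed invocation (identical patterns, one pass over the file):

sed -i -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8mb4/g" \
    -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci/" \
    -e "s/SET character_set_client = utf8/SET character_set_client = utf8mb4/" schema.sql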

Now we are ready to restore our data using the new encoding:

mysql -e "set global innodb_large_prefix=1;"
mysql < schema.sql
mysql < data.sql

Notice I've enabled the variable innodb_large_prefix. This is important because InnoDB limits index prefixes to 767 bytes by default. If you have an index on a varchar(255) column, you will get an error, because with the new charset the index prefix can require up to 255 * 4 = 1020 bytes, which exceeds that limit. To avoid issues during the data load, we enable this variable to extend the limit to 3072 bytes.

Finally, let's configure our server and restart it to make sure the new defaults are set properly. In the my.cnf file, add:

[client]
default-character-set=utf8mb4
[mysqld]
skip-slave-start
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
innodb_large_prefix=1

Let’s resume replication after the restart, and make sure everything is ok:

START SLAVE;
SHOW SLAVE STATUS;

Ok, at this point we should be fine and our data should already be converted to utf8mb4. So far so good. The next step is to fail applications over to the new server, and to rebuild the old server from a fresh backup taken with xtrabackup, as described above.

There are few things we need to consider now before converting this slave into master:

  1. Make sure you have properly configured your applications. Charset and collation values can be set at the session level, so if your connection driver sets another charset you may end up mixing things in your data.
  2. Make sure the new slave is powerful enough to handle traffic from the master.
  3. Test everything before failing over production applications. Going from Latin1 to utf8mb4 should be straightforward, as utf8mb4 includes all the characters in Latin1. But let’s face it, things can go wrong and we are trying to avoid surprises.
  4. Last but not least, all procedures here were done on a relatively small/medium-sized dataset (around 600G). But this conversion (done via logical backups) is more difficult when talking about big databases (i.e., in the order of TBs). In these cases, the procedure helps but might not be good enough due to time restrictions (imagine loading a 1TB table from a logical dump — it takes ages). If you happen to face such a conversion, here is a short, high-level plan:
    • Convert only the smaller tables in the slave (i.e., those smaller than 500MB) following the same procedure. Make sure to exclude the big tables from the dump using the --ignore-table parameter in mysqldump.
    • Convert bigger tables via alter table, as follows:
      ALTER TABLE big_table MODIFY latin1_column varbinary(250);
      ALTER TABLE big_table MODIFY latin1_column varchar(250) CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
    • Once everything is finished, you can resume replication. Notice you can do dump/conversion/restore in parallel with the altering of bigger tables, which should reduce the time required for conversion.

It’s important to understand why we need the double conversion from latin1 to varbinary to utf8mb4. This post from Marco Tusa largely explains this.

Conclusion

I wrote this guide from my experience working with these types of projects. If you Google a bit, you'll find a lot of resources and different solutions for this kind of work. What I've tried to present here is a guide to help you deal with these projects. Normally, we have to perform these changes on existing datasets that are sometimes big enough to prevent any work getting done via ALTER TABLE commands. Hopefully, you find this useful!

The post Migrating Database Charsets to utf8mb4: A Story from the Trenches appeared first on Percona Database Performance Blog.

by Francisco Bordenave at April 10, 2018 07:12 PM

Webinar Thursday, April 12, 2018: MySQL Test Framework for Troubleshooting

MySQL Testing Framework

Percona's Principal Support Engineer, Sveta Smirnova, presents the webinar MySQL Test Framework for Troubleshooting on April 12, 2018, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

MySQL Test Framework (MTR) provides a unit test suite for MySQL. MySQL Server developers and contributors write the tests in the framework, and use them to ensure the build is working correctly.

I found that this isn’t the only thing that makes MTR useful. I regularly use it in my support job to help customers and verify bug reports.

With MySQL Test Framework I can:

  • Create a complicated environment in a single step, and re-use it later
  • Test the same scenario on dozens of MySQL/Percona/MariaDB server versions with a single command
  • Test concurrent scenarios
  • Test errors and return codes
  • Work with results, external commands and stored routines

Everything can be done with a single script that can be reused on any machine, any time, with any MySQL/Percona/MariaDB Server version.
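
To give a flavor of what such a script looks like, here is a hypothetical minimal test file (say, t/demo.test); the --error directive asserts that the next statement fails with the named error:

CREATE TABLE t1 (id INT PRIMARY KEY);
INSERT INTO t1 VALUES (1);
# The duplicate key must fail; MTR treats any other outcome as a test failure.
--error ER_DUP_ENTRY
INSERT INTO t1 VALUES (1);
DROP TABLE t1;

You would run it with something like ./mysql-test-run.pl demo from the mysql-test directory, after recording the expected output into r/demo.result (for example, with the --record option).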

In this webinar, I will show my way of working with MySQL Test Framework. I hope you will love it as I do!

Register for the webinar now.

Sveta Smirnova, Principal Technical Services Engineer

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues and bugs, finding patterns that can quickly solve typical issues, and teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona, Sveta worked as a Support Engineer in the MySQL Bugs Analysis Support Group at MySQL AB-Sun-Oracle. She is the author of the book “MySQL Troubleshooting” and of the JSON UDF functions for MySQL.

The post Webinar Thursday, April 12, 2018: MySQL Test Framework for Troubleshooting appeared first on Percona Database Performance Blog.

by Sveta Smirnova at April 10, 2018 01:36 PM

Jean-Jerome Schmidt

How to Make Your MySQL or MariaDB Database Highly Available on AWS and Google Cloud

Running databases on cloud infrastructure is getting increasingly popular these days. Although a cloud VM may not be as reliable as an enterprise-grade server, the main cloud providers offer a variety of tools to increase service availability. In this blog post, we'll show you how to architect your MySQL or MariaDB database for high availability in the cloud. We will be looking specifically at Amazon Web Services and Google Cloud Platform, but most of the tips can be used with other cloud providers too.

Both AWS and Google offer database services on their clouds, and these services can be configured for high availability. It is possible to have copies in different availability zones (or zones in GCP), in order to increase your chances of surviving a partial failure of services within a region. Although a hosted service is a very convenient way of running a database, note that the service is designed to behave in a specific way, and that may or may not fit your requirements. So, for instance, AWS RDS for MySQL has a pretty limited list of options when it comes to failover handling. Multi-AZ deployments come with a 60-120 second failover time, as per the documentation. In fact, given that the “shadow” MySQL instance has to start from a “corrupted” dataset, this may take even longer, as more work could be required to apply or roll back transactions from the InnoDB redo logs. There is an option to promote a slave to become a master, but it is not feasible, as you cannot reslave existing slaves off the new master. In the case of a managed service, it is also intrinsically more complex and harder to trace performance problems. More insights on RDS for MySQL and its limitations can be found in this blog post.

On the other hand, if you decide to manage the databases, you are in a different world of possibilities. A number of things that you can do on bare metal are also possible on EC2 or Compute Engine instances. You do not have the overhead of managing the underlying hardware, and yet retain control on how to architect the system. There are two main options when designing for MySQL availability - MySQL replication and Galera Cluster. Let’s discuss them.

MySQL Replication

MySQL replication is a common way of scaling MySQL with multiple copies of the data. Asynchronous or semi-synchronous, it propagates changes executed on a single writer, the master, to replicas/slaves - each of which contains the full data set and can be promoted to become the new master. Replication can also be used for scaling reads, by directing read traffic to replicas and offloading the master in this way. The main advantage of replication is its ease of use - it is so widely known and popular (and easy to configure) that there are numerous resources and tools to help you manage and configure it. Our own ClusterControl is one of them - you can use it to easily deploy a MySQL replication setup with integrated load balancers, manage topology changes, failover/recovery, and so on.

One major issue with MySQL replication is that it is not designed to handle network splits or a master’s failure. If a master goes down, you have to promote one of the replicas. This is a manual process, although it can be automated with external tools (e.g., ClusterControl). There is also no quorum mechanism, and there is no support for fencing failed master instances in MySQL replication. Unfortunately, this may lead to serious issues in distributed environments - if you promote a new master and your old one then comes back online, you may end up writing to two nodes, creating data drift and causing serious data consistency issues.

Later in this post, we’ll look at some examples that show how to detect network splits and implement STONITH or some other fencing mechanism for your MySQL replication setup.

Galera Cluster

We saw in the previous section that MySQL replication lacks fencing and quorum support - this is where Galera Cluster shines. It has quorum support built in, and it also has a fencing mechanism that prevents partitioned nodes from accepting writes. This makes Galera Cluster more suitable than replication in multi-datacenter setups. Galera Cluster also supports multiple writers, and is able to resolve write conflicts. You are therefore not limited to a single writer in a multi-datacenter setup: it is possible to have a writer in every datacenter, which reduces the latency between your application and database tier. It does not speed up writes, as every write still has to be sent to every Galera node for certification, but it’s still easier than sending writes from all application servers across the WAN to a single remote master.

As good as Galera is, it is not always the best choice for all workloads. Galera is not a drop-in replacement for MySQL/InnoDB. It shares common features with “normal” MySQL - it uses InnoDB as its storage engine, and it contains the entire dataset on every node, which makes JOINs feasible. Still, some of the performance characteristics of Galera (like the performance of writes, which is affected by network latency) differ from what you’d expect from replication setups. Maintenance looks different too: schema change handling works slightly differently. Some schema designs are not optimal: if you have hotspots in your tables, like frequently updated counters, this may lead to performance issues. There is also a difference in best practices related to batch processing - instead of executing queries in large transactions, you want your transactions to be small.

Proxy tier

It is very hard and cumbersome to build a highly available setup without proxies. Sure, you can write code in your application to keep track of database instances, blacklist unhealthy ones, keep track of the writeable master(s), and so on. But this is much more complex than just sending traffic to a single endpoint - which is where a proxy comes in. ClusterControl allows you to deploy ProxySQL, HAProxy and MaxScale. We will give some examples using ProxySQL, as it gives us good flexibility in controlling database traffic.

ProxySQL can be deployed in a couple of ways. For starters, it can be deployed on separate hosts, and Keepalived can be used to provide a Virtual IP. The Virtual IP will be moved around should one of the ProxySQL instances fail. In the cloud, this setup can be problematic, as adding an IP to the interface usually is not enough. You would have to modify the Keepalived configuration and scripts to work with an elastic IP (or static IP, or however it might be called by your cloud provider). Then one would use the cloud API or CLI to relocate this IP address to another host. For this reason, we’d suggest collocating ProxySQL with the application. Each application server would be configured to connect to the local ProxySQL, using Unix sockets. As ProxySQL uses an angel process, ProxySQL crashes can be detected and the process restarted within a second. In case of a hardware crash, that particular application server will go down along with ProxySQL. The remaining application servers can still access their respective local ProxySQL instances. This particular setup has additional benefits. Security: ProxySQL, as of version 1.4.8, does not have support for client-side SSL. It can only set up SSL connections between ProxySQL and the backend. Collocating ProxySQL on the application host and using Unix sockets is a good workaround. ProxySQL also has the ability to cache queries, and if you are going to use this feature, it makes sense to keep it as close to the application as possible to reduce latency. We would suggest using this pattern to deploy ProxySQL.
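As a minimal sketch (the socket path and user below are assumptions, and the actual socket path depends on ProxySQL’s mysql-interfaces setting), an application host would then talk to its local ProxySQL like this:

# Connect to the collocated ProxySQL over a Unix socket instead of TCP:
mysql -u app_user -p -S /tmp/proxysql.sock -e "SELECT @@hostname;"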

Typical setups

Let’s take a look at examples of highly available setups.

Single datacenter, MySQL replication

The assumption here is that there are two separate zones within the datacenter. Each zone has redundant and separate power, networking and connectivity to reduce the likelihood of two zones failing simultaneously. It is possible to set up a replication topology spanning both zones.

Here we use ClusterControl to manage the failover. To solve the split-brain scenario between availability zones, we collocate the active ClusterControl with the master. We also blacklist slaves in the other availability zone to make sure that automated failover won’t result in two masters being available.

Multiple datacenters, MySQL replication

In this example we use three datacenters and Orchestrator/Raft for quorum calculation. You might have to write your own scripts to implement STONITH if the master is in a partitioned segment of the infrastructure. ClusterControl is used for node recovery and management functions.

Multiple datacenters, Galera Cluster

In this case we use three datacenters with a Galera arbitrator in the third one - this makes it possible to handle a whole datacenter failure, and reduces the risk of network partitioning, as the third datacenter can be used as a relay.

For further reading, take a look at the “How to Design Highly Available Open Source Database Environments” whitepaper and watch the webinar replay “Designing Open Source Databases for High Availability”.

by krzysztof at April 10, 2018 09:36 AM

Valeriy Kravchuk

Fun with Bugs #65 - On MySQL Bug Reports I am Subscribed to, Part V

I think it's time to review some bugs I subscribed to several months ago, ones older than those covered in the first post of this series. There are several really serious bugs in the list of 15 below:
  • Bug #87560 - "XA PREPARE log order error in replication and binlog recovery". This bug was reported by Wei Zhao, who also provided patches.
  • Bug #87526 - "The output of 'XA recover convert xid' is not useful". This bug reported by Sveta Smirnova is well known and is a real pain for DBAs who have to deal with incomplete XA transactions after some crash or unexpected restart. Check PS-1818 and MariaDB task MDEV-14593. The problem is resolved in MariaDB 10.3.3+ by a new XA RECOVER FORMAT='SQL' option.
  • Bug #87164 - "Queries running much slower in version 5.7 versus 5.6". It was reported by Alok Pathak from Percona and has stayed "Verified" since August 2017.
  • Bug #87130 - "XA COMMIT not taken as transaction boundary". Yet another XA bug report with a patch contributed by Wei Zhao.
  • Bug #87084 - "FK DELETE CASCADE does not honor innodb_lock_wait_timeout". Nice report by Elena Stepanova from MariaDB. As you can find out from MDEV-15219, it's properly fixed in MariaDB 10.2.13+.
  • Bug #87065 - "Release lock on table statistics after query plan created". Great feature request by Sveta Smirnova. The actual problem behind this feature request was resolved by Percona in versions 5.7.20-18+. This fix is one of few really good reasons to use recent Percona Server 5.7, so I opened MDEV-15101 for MariaDB also.
  • Bug #86926 - "The field table_name (varchar(64)) from mysql.innodb_table_stats can overflow.". I really wonder why this bug report by Jean-François Gagné still remains just "Verified".
  • Bug #86865 - "InnoDB does unnecessary work when extending a tablespace". Bug report and patch by Alexey Kopytov.
  • Bug #86705 - "Memory leak of Innodb". A great example of a leak (in MySQL 5.5.x only) found with the help of Valgrind/Massif. Qinglin Zhang also suggested a simple patch.
  • Bug #86475 - "Error with functions and group by with ONLY_FULL_GROUP_BY". Nice feature request from Arnaud Adant.
  • Bug #86462 - "mysql_upgrade: improve handling of upgrade errors". Simon Mudd asked for some better error messages at least, so that running under strace would not be needed.
  • Bug #86215 - "MySQL is much slower in 5.7 vs 5.6". This report from Mark Callaghan includes a lot of results and details on performance regressions at low concurrency starting from 5.0 and up to 8.0. For some cases studied the biggest drop in QPS is from 5.6 to 5.7.
  • Bug #86163 - "can't update temporary table when joined with table with triggers on read-only". I'd call this bug found by Bret Westenskow funny. The rest of them in this list are serious.
  • Bug #85970 - "Memory leak with transactions greater than 10% of the total redo log size". A nice corner case studied by Joffrey MICHAÏE. Whenever you see
    The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size...
    error messages, take care, as memory allocated may not be released.
  • Bug #85910 - "Increased Performance_Schema overhead on Sending data". Seems to be a known problem fixed in MySQL 5.7. Still a nice report by Jervin R on the overhead one may expect from MySQL 5.6 on some systems.
So, we traveled back in time for a year. I am really sorry to see XA-related bugs, and reports with properly contributed patches, still active.


by Valeriy Kravchuk (noreply@blogger.com) at April 10, 2018 04:12 AM

Peter Zaitsev

Starting MongoDB Database Software

MongoDB Database Software

In this blog post, we will cover how to start MongoDB database software on the three most-used platforms: Windows, Linux and macOS.

If you have just started with NoSQL databases, you might wonder how to evaluate if MongoDB is a good fit for your application.

Percona provides a signed version of MongoDB called Percona Server for MongoDB, with a couple of enterprise-grade features included free of charge, that runs on all Linux flavors. We also support MongoDB; please check out our support page. But what if you just want to run a test on your laptop or PC? How do you easily start a mongod process for testing? Below I demonstrate how to start MongoDB database software on the three most popular operating systems.

Microsoft Windows

First of all, be aware of this hotfix: https://support.microsoft.com/en-ca/help/2731284/33-dos-error-code-when-memory-memory-mapped-files-are-cleaned-by-using.

You might need to restart the computer after applying the fix. Then download the .zip file. The website only offers an MSI, but we don’t want to install the binaries, we just want to run them.

Click here to download the 3.4.10 version:
http://downloads.mongodb.org/win32/mongodb-win32-x86_64-2008plus-ssl-3.4.10.zip

After the download, use your favorite decompression tool to extract the MongoDB executables. Then move the extracted folder to your Documents folder, to C:\, or even to a memory stick (but don’t expect high performance).

Inside the bin folder, create a data folder. We are going to use this folder to save our databases.

Now we have everything we need to start the database. Open CMD and run the following command to start the database:

C:\mongodb\bin\mongod --dbpath c:\mongodb\bin\data

You will see startup output in the console window. This means the process is running.

In a different CMD window, connect to the database using:

C:\mongodb\bin\mongo --quiet

I’ve passed --quiet to omit the warnings.

And here we go, MongoDB is running on a Windows machine!

macOS and Linux configuration:

For macOS, the process is very similar to Windows. The difference is that we can take advantage of the extensive bash commands that the UNIX-like system offers.

Open the terminal and go to the home Downloads folder:

cd ~/Downloads

# Download MongoDB for macOS:
wget https://fastdl.mongodb.org/osx/mongodb-osx-ssl-x86_64-3.6.3.tgz
# or download MongoDB for Linux:
wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.6.3.tgz
# Untar the file:
tar -xvzf mongodb-osx-ssl-x86_64-3.6.3.tgz
# Rename the folder to mongodb, just to make it easier:
mv mongodb-osx-x86_64-3.6.3/ ~/Downloads/mongodb
# Right now all the binaries are in ~/Downloads/mongodb/bin/.
# Create a folder to save the databases in:
mkdir ~/Downloads/mongodb/bin/data
# Start the mongod process:
cd ~/Downloads/mongodb/bin
./mongod --dbpath data

The output should be similar to the Windows example above.

On a different tab, run:

~/Downloads/mongodb/bin/mongo

At this point, you should be able to use MongoDB with the default options on MacOS or Linux.

Note that we aren’t enabling authentication or configuring a replica set.

If we don’t pass the --quiet parameter, we will receive a few warnings like:

2018-03-16T14:26:20.868-0300 I CONTROL  [initandlisten]
I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
I CONTROL  [initandlisten] **  Read and write access to data and configuration is unrestricted.
I CONTROL  [initandlisten]
I CONTROL  [initandlisten] ** WARNING: This server is bound to localhost.
I CONTROL  [initandlisten] **          Remote systems will be unable to connect to this server.
I CONTROL  [initandlisten] **       Start the server with --bind_ip <address> to specify which IP
I CONTROL  [initandlisten] **  addresses it should serve responses from, or with --bind_ip_all to
I CONTROL  [initandlisten] **   bind to all interfaces. If this behavior is desired, start the
I CONTROL  [initandlisten] **          server with --bind_ip 127.0.0.1 to disable this warning.
I CONTROL  [initandlisten]

For more information about how to configure those parameters, please refer to the following blog posts and documentation:

https://www.percona.com/blog/2017/12/15/mongodb-3-6-security-improvements/

https://www.percona.com/blog/2017/05/17/mongodb-authentication-and-roles-creating-your-first-personalized-role/

https://www.percona.com/blog/2016/08/12/tuning-linux-for-mongodb/

To stop the mongod process, use ctrl+c (on any operating system) in the server window.

The post Starting MongoDB Database Software appeared first on Percona Database Performance Blog.

by Adamo Tonete at April 10, 2018 12:13 AM

April 09, 2018

Peter Zaitsev

MongoDB Sharding: Are Chunks Balanced (Part 1)?

MongoDB Sharding

In this blog post, we will look at how chunks split and migrate in MongoDB sharding.

Sometimes even good shard keys can create imbalanced chunks. Below, we’ll look at how jumbo chunks affect MongoDB sharding, how they are created, and why they need to be split into smaller chunks. We will also introduce some tools you can use to find and split those chunks, and to help the balancer migrate the split chunks across shards.

Before we start, please check that the balancer is running:

mongos> sh.getBalancerState()
true

The most important consideration for sharding collections and distributing data efficiently across shards is the selection of a shard key. Or, to be more specific, the selection of a good shard key, because MongoDB uses ranges of shard key values to partition the data in a collection, and each such range is associated with a chunk.

What is a good shard key?

A good shard key enables MongoDB to distribute documents evenly throughout the shards. A key that has high cardinality (for better horizontal scaling) and low frequency (to prevent uneven document distribution), and that does not increase or decrease monotonically, is considered a good shard key.
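For illustration, here is a minimal sketch of declaring a hashed shard key from the command line (the host, database, collection and field names are assumptions, not taken from this post):

# Enable sharding on a hypothetical database, then shard a collection
# on a hashed key for an even write distribution:
mongo --host mongos.example.net --eval '
    sh.enableSharding("appdb");
    sh.shardCollection("appdb.users", { "userId": "hashed" });
'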

Ok, I have a good shard key, but my chunks are still not balanced

If your chunks are not balanced despite using a good shard key, check for jumbo chunks. They are one possible reason preventing the balancer from migrating chunks, which leads to an uneven distribution of chunks across the shards — and ultimately to performance issues.

What are jumbo chunks?

MongoDB splits chunks when they grow beyond the configured chunk size (64 MB by default) or exceed 250,000 documents. Chunk sizes are configurable, and you can change them as your requirements dictate. The balancer migrates these split chunks between shards to achieve an equal distribution per shard.
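For example, here is a sketch of lowering the global chunk size to 32 MB (run it against a mongos of your cluster; the host name is an assumption):

# The chunk size lives in the config database, and the value is in megabytes:
mongo --host mongos.example.net --eval '
    db.getSiblingDB("config").settings.save({ "_id": "chunksize", "value": 32 });
'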

Sometimes chunks cannot be broken up and continue to grow beyond the configured size. The balancer cannot move them. These chunks remain on their particular shard and are called jumbo chunks.

How can I figure out if my shard has jumbo chunks or not?

If MongoDB was not able to split the chunks that exceed its max chunk size or max number of documents, then those chunks are marked as “jumbo”.

These can be found by checking the shard status:

sh.status(true)
----
  unique: false
  balancing: true
  chunks:
     shard0000	2
     shard0001	2
             { "username" : { "$minKey" : 1 } } -->> { "username" : NumberLong("-4611686018427387902") } on : shard0000 Timestamp(2, 2)
             { "username" : NumberLong("-4611686018427387902") } -->> { "username" : NumberLong(0) } on : shard0000 Timestamp(2, 3)

A jumbo chunk should carry a flag, but we can’t see one here because mongos has not tried to move the chunk yet, so it is not yet aware that it’s a jumbo chunk.

Let’s try to remove a shard in this same example. In this process, mongos has to move the chunk. Let’s see what happens:

mongos> db.adminCommand( { removeShard: "shard0000" } )
{
	"msg" : "draining started successfully",
	"state" : "started",
	"shard" : "shard0000",
	"note" : "you need to drop or movePrimary these databases",
	"dbsToMove" : [
		"jumbo_t"
	],
	"ok" : 1
}

It started moving the chunks, as it is the primary shard. I have already moved the database “jumbo_t” to another shard, and the draining of chunks is taking a long time. We have to figure out what is wrong:

mongos> db.chunks.find({"shard" : "shard0000"},{"shard":1,"jumbo":1}).pretty()
{
	"_id" : "jumbo_t.col-username_-4611686018427387902",
	"shard" : "shard0000",
	"jumbo" : true
}

Jumbo found! Now at this time, mongos is aware that it has a jumbo chunk.

Let’s see what the output of same sh.status() is now:

mongos> sh.status(true)
chunks:
      shard0000	1
      shard0001	3
      { "username" : { "$minKey" : 1 } } -->> { "username" : NumberLong("-4611686018427387902") } on : shard0001 Timestamp(3, 0)
      { "username" : NumberLong("-4611686018427387902") } -->> { "username" : NumberLong(0) } on : shard0000 Timestamp(3, 1) jumbo

Please note that only one chunk moved to the other shard, while the other did not. The balancer can’t move it, and it is flagged as “jumbo”. So sometimes, when a chunk move is triggered, mongos will mark a large chunk as “jumbo”.

Ok, I have Jumbo chunks in my shard, but why should I bother as all my chunks are distributed fairly?

These jumbo chunks are inconvenient to deal with, and are a possible cause of performance degradation in a sharded environment.

Let’s look at an example. There are two shards: shard1 and shard2. shard1 has jumbo chunks, and all the writes are routed to shard1. mongos will try to balance the number of chunks evenly between the shards. But the only chunks that can be migrated by the balancer are non-jumbo chunks.

A common misconception about the balancer is that it balances the chunks by data size. This isn’t true. It just balances the number of chunks when a particular shard reaches the maximum threshold count (that’s why you see chunks equally distributed). Hence, a chunk with 0 documents in it counts just the same as one with 500k documents.

Now, let’s get back to the example. If jumbo chunks are created on shard1, that means those chunks are larger than 64MB. shard1 fills up faster than shard2, even though the number of chunks is balanced.

More data on shard1 leads to more queries routed to shard1 compared to shard2. This leaves one shard with a higher load than the other, and leads to performance issues. Load balancing concepts aren’t correctly applied in the case of jumbo chunks.

Let’s consider one more example. Chunks are distributed among the shards equally, but we can check the chunk size and document details specific to the collections:

mongos> db.col.getShardDistribution()
Shard shard0000 at 127.0.0.1:27003
 data : 257.98MiB docs : 5698065 chunks : 2
 estimated data per chunk : 128.99MiB
 estimated docs per chunk : 2849032
Shard shard0001 at 127.0.0.1:27004
 data : 0B docs : 0 chunks : 2
 estimated data per chunk : 0B
 estimated docs per chunk : 0
Totals
 data : 257.98MiB docs : 5698065 chunks : 4
 Shard shard0000 contains 100% data, 100% docs in cluster, avg obj size on shard : 47B
 Shard shard0001 contains 0% data, 0% docs in cluster, avg obj size on shard : NaNGiB

In the above example, you can see the shard distribution specific to collection “col”. It shows two chunks on each shard (shard0000 and shard0001); shard0001 has no data, while shard0000 holds all the data, about 129MB per chunk. There are two chunks on each shard because range-based sharding is being used: MongoDB allocated two chunks for each shard initially, and documents are then allocated to these chunks.

How are jumbos created?

Let’s figure out how jumbos are created, and what makes them turn into non-splitting chunks that grow beyond a reasonable size.

  • The main reason for jumbos is multiple mongos instances, or restarting mongos regularly. This causes splitIfShould not to be called often enough, which prevents chunks from splitting. The balancer won’t be able to move them.
  • Each mongos measures how much data it has seen inserted or updated for each chunk. With each write, a call to ShouldSplit is made by sending an internal command, “splitVector”, to the primary that owns the chunks. If mongos is restarted, it loses this memory. If mongos service restarts are frequent, this leads to many chunks that could be split, but aren’t. This causes chunk imbalance across the shards.

Are these jumbos curable? How do I prevent them?

Yes, these can be fixed by performing a manual split using the “split” command. These chunks can be split into smaller pieces and easily moved by the balancer. For more specific information on how to manually use the splitAt() and splitFind() commands, please refer to this blog post written by Miguel Angel Nieto.
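As a quick sketch using the namespace from this post (the host name and the second split point are made-up values for illustration):

# Split the chunk containing the given document at its median point:
mongo --host mongos.example.net --eval '
    sh.splitFind("jumbo_t.col", { "username": NumberLong("-4611686018427387902") });
'
# Or split at an explicit shard key value:
mongo --host mongos.example.net --eval '
    sh.splitAt("jumbo_t.col", { "username": NumberLong("-2305843009213693951") });
'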

There are some very useful tools for chunk management that will save your day, written by David Murphy (our MongoDB Practice Manager at Percona). ChunkManager.py iteratively calls ChunkHunter.py to find splittable chunks, and ChunkSplitter.py uses ChunkHunter.py’s output to split them for you. This allows the balancer to then work correctly.

For more tools and commands, please refer here: ChunkManagement tools.

If no jumbo or splittable chunks are found after using these tools, then it’s time to optimize the shard key, taking all the relevant document fields into account.

I cannot see any jumbo chunks, and chunks are distributed evenly in each shard but of different sizes

Chunk distribution can be fairly equal among the shards while the chunks themselves hold different numbers of documents, and that does not necessarily cause performance issues. The data isn’t equal for two possible reasons: differences in document sizes, and the shard key ranges that determine which chunk a document resides in.

As we discussed, the balancer just balances the number of chunks (not based on the size).

Let’s consider a test case with two shards, a compound shard key (custId and prodId), and range-based sharding. Chunks are split based on the ranges used, so the number of documents per chunk varies with the insertion pattern. This can lead to differences in the chunks’ sizes:

shard1: The number of documents against the same custId can be different, as well as the size of the documents:

{ "custId" : 10, "prodId" : "40412702" } -->> { "custId" : 10, "prodId" : "40423398" } on : rs0 Timestamp(5203, 16)
 { "custId" : 10, "prodId" : "40423398" } -->> { "custId" : 10, "prodId" : "40439934" } on : rs0 Timestamp(7331, 16)
 { "custId" : 10, "prodId" : "40439934" } -->> { "custId" : 10, "prodId" : "42823107" } on : rs0 Timestamp(8447, 6)
 { "custId" : 10, "prodId" : "42823107" } -->> { "custId" : 10, "prodId" : "42835784" } on : rs0 Timestamp(8447, 8)

shard2: Here there are more documents against the same custId, and the size of the documents also might vary:

{ "custId" : 14, "prodId" : "2250759" } -->> { "custId" : 14, "prodId" : "2375613" } on : rs2 Timestamp(1366, 32)
 { "custId" : 14, "prodId" : "2375613" } -->> { "custId" : 14, "prodId" : "2499723" } on : rs2 Timestamp(1376, 8)
 { "custId" : 14, "prodId" : "2499723" } -->> { "custId" : 14, "prodId" : "2760169" } on : rs2 Timestamp(1402, 2)
 { "custId" : 14, "prodId" : "2760169" } -->> { "custId" : 14, "prodId" : "3143381" } on : rs2 Timestamp(1657, 14)
 { "custId" : 14, "prodId" : "3143381" } -->> { "custId" : 14, "prodId" : "3715177" } on : rs2 Timestamp(1696, 2)
 { "custId" : 14, "prodId" : "3715177" } -->> { "custId" : 14, "prodId" : "3793003" } on : rs2 Timestamp(1696, 4)
 { "custId" : 14, "prodId" : "3793003" } -->> { "custId" : 14, "prodId" : "3807169" } on : rs2 Timestamp(1778, 2)

Hashed sharding is considered good for shard keys on fields that change monotonically. If you need the data to be split exactly evenly among shards, then a hashed index must be used. For details on range-based and hash-based sharding, please check here under the heading “Hashed vs. Ranged Sharding”.

I hope this blog helps you understand the possible causes of an uneven distribution of chunks in MongoDB sharding, and how and when chunks are eligible for splitting and migration. Jumbo chunks need to be split and migrated, and can be sorted out with some chunk management tools. You also need to understand how range-based sharding makes chunk sizes differ, when to use hashed versus range-based sharding for your requirements, and how to optimize the shard key correctly.

The post MongoDB Sharding: Are Chunks Balanced (Part 1)? appeared first on Percona Database Performance Blog.

by Aayushi Mangal at April 09, 2018 11:40 PM

April 07, 2018

Peter Zaitsev

Free, Fast MongoDB Hot Backup with Percona Server for MongoDB

MongoDB Hot Backups

In this blog post, we will discuss the MongoDB Hot Backup feature in Percona Server for MongoDB and how it can help you get a safe backup of your data with minimal impact.

Percona Server for MongoDB

Percona Server for MongoDB is Percona’s open-source fork of MongoDB, aimed at having 100% feature compatibility (much like our MySQL fork). We have added a few extra features to our fork for free that are only available with MongoDB Enterprise binaries for an additional fee.

The feature pertinent to this article is our free, open-source Hot Backup feature for WiredTiger and RocksDB, only available in Percona Server for MongoDB.

Essentially, this Hot Backup feature adds a MongoDB server command that creates a full binary backup of your data set to a new directory with no locking or impact to the database, aside from some increased resource usage due to copying the data.

It’s important to note these backups are binary-level backups, not logical backups (such as mongodump would produce).

Logical vs. Binary Backups

Before the concept of a MongoDB Hot Backup, the only ways to back up a MongoDB instance, cluster or replica set were using the logical backup tool ‘mongodump’, or using block-device (binary) snapshots.

A “binary-level” backup means the backup data contains the data files (WiredTiger or RocksDB) that MongoDB stores on disk. This is different from the BSON representation of the data that ‘mongodump’ (a “logical” backup tool) outputs.

Binary-level backups are generally faster than logical because a logical backup (mongodump) requires the server to read all data and return it to the MongoDB Client API. Once received, ‘mongodump’ serializes the payload into .bson files on disk. Important areas like indices are not backed up in full, merely the metadata describing the index is backed up. On restore, the entire process is reversed: ‘mongorestore’ must deserialize the data created by ‘mongodump’, send it over the MongoDB Client API to the server, then the server’s storage engine must translate this into data files on disk. Due to only metadata of indices being backed up in this approach, all indices must be recreated at restore time. This is a serial operation for each collection! I have personally restored several databases where the majority of the restore time was the index rebuilds and NOT the actual restore of the raw data!

In contrast, binary-level backups are much simpler: the storage-engine representation of the data is what is backed up. This style of backup includes the real index data, meaning a restore is as simple as copying the backup directory to be in the location of the MongoDB dbPath and restarting MongoDB. No recreation of indices is necessary! For very large datasets, this can be a massive win. Hours or even days of restore time can be saved due to this efficiency.

Of course, there are always some tradeoffs. Binary-level backups can take a bit more space on disk, and care must be taken to ensure the files are restored to the right version of MongoDB on a matching CPU architecture. Generally, backing up the MongoDB configuration file and version number with your backups addresses this concern.

‘createBackup’ Command

The Hot Backup feature is triggered by a simple admin command via the ‘mongo’ shell named ‘createBackup’. This command requires only one input: the path to output the backup to, named ‘backupDir’. This backup directory must not exist or an error is returned.

If you have MongoDB Authorization enabled (I hope you do!), this command requires the built-in role: ‘backup’ or a role that inherits the “backup” role.

An example in the ‘mongo’ shell:

> db.adminCommand({
    createBackup: 1,
    backupDir: "/data/backup27017"
  })
{ "ok" : 1 }

When this command returns an “ok”, a full backup is available to be read at the location specified in ‘backupDir’. This end-result is similar to using block-device snapshots such as LVM snapshots, however with less overhead vs. the 10-30% write degradation many users report at LVM snapshot time.

This backup directory can be deleted with a regular UNIX/Linux “rm -rf ...” command once it is no longer required. A typical deployment archives this directory and/or uploads the backup to a remote location (Rsync, NFS, AWS S3, Google Cloud Storage, etc.) before removing the backup directory.
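A minimal sketch of that archive-and-upload step (the bucket name is an assumption; the paths reuse the earlier example):

# Archive the backup directory, ship it to S3, then clean up locally:
tar -czf /data/backup27017.tar.gz -C /data backup27017
aws s3 cp /data/backup27017.tar.gz s3://my-mongodb-backups/
rm -rf /data/backup27017 /data/backup27017.tar.gz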

WiredTiger vs. RocksDB Hot Backups

Although the ‘createBackup’ command is syntactically the same for WiredTiger and RocksDB, there are some big differences in the implementation of the backup behind the scenes.

RocksDB is a storage engine available in Percona Server for MongoDB that uses a level-tiered compaction strategy that is highly write-optimized. As RocksDB uses immutable (write-once) files on disk, it can provide much more efficient backups of the database by using filesystem “hardlinks” to a single inode on disk. This is important to know for large data sets as this requires exponentially less overhead to create a backup.

If your RocksDB-based server ‘createBackup’ command uses an output path that is on the same disk volume as the MongoDB dbPath (a very important requirement), hardlinks are used instead of copying most of the database data! If only 5% of the data changes during backup, only 5% of the data is duplicated/copied. This makes backups potentially much faster than WiredTiger, which needs to make a full copy of the data and uses twice as much disk space as a result.

Here is an example of a ‘createBackup’ command on a RocksDB-based mongod instance that uses ‘/data/mongodb27017’ as a dbPath:

$ mongo --port=27017
test1:PRIMARY> db.adminCommand({
    createBackup: 1,
    backupDir: "/data/backup27017.rocksdb"
})
{ "ok" : 1 }
test1:PRIMARY> quit()

Seeing we received { “ok”: 1 }, the backup is ready at our output path. Let’s see:

$ cd /data/backup27017.rocksdb
$ ls -alh
total 4.0K
drwxrwxr-x. 3 tim tim  36 Mar  6 15:25 .
drwxr-xr-x. 9 tim tim 147 Mar  6 15:25 ..
drwxr-xr-x. 2 tim tim 138 Mar  6 15:25 db
-rw-rw-r--. 1 tim tim  77 Mar  6 15:25 storage.bson
$ cd db
$ ls -alh
total 92K
drwxr-xr-x. 2 tim tim  138 Mar  6 15:25 .
drwxrwxr-x. 3 tim tim   36 Mar  6 15:25 ..
-rw-r--r--. 2 tim tim 6.4K Mar  6 15:21 000013.sst
-rw-r--r--. 2 tim tim  18K Mar  6 15:22 000015.sst
-rw-r--r--. 2 tim tim  36K Mar  6 15:25 000017.sst
-rw-r--r--. 2 tim tim  12K Mar  6 15:25 000019.sst
-rw-r--r--. 1 tim tim   16 Mar  6 15:25 CURRENT
-rw-r--r--. 1 tim tim  742 Mar  6 15:25 MANIFEST-000008
-rw-r--r--. 1 tim tim 4.1K Mar  6 15:25 OPTIONS-000005

Inside the RocksDB ‘db’ subdirectory we can see .sst files containing the data are there! As this MongoDB instance stores data on the same disk at ‘/data/mongod27017’, let’s prove that RocksDB created a “hardlink” instead of a full copy of the data.

First, we get the Inode number of an example .sst file using the ‘stat’ command. I chose the RocksDB data file: ‘000013.sst’:

$ stat 000013.sst
  File: ‘000013.sst’
  Size: 6501      	Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 33556899    Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1000/     tim)   Gid: ( 1000/     tim)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2018-03-06 15:21:10.735310581 +0200
Modify: 2018-03-06 15:21:10.738310479 +0200
Change: 2018-03-06 15:25:56.778556981 +0200
 Birth: -

Notice the Inode number for this file is 33556899. Next, the ‘find’ command can be used to locate all files pointing to inode 33556899 under /data:

$ find /data -inum 33556899
/data/mongod27017/db/000013.sst
/data/backup27017.rocksdb/db/000013.sst

Using the ‘-inum’ (Inode Number) flag of find, here we can see that the .sst files in both the live MongoDB instance (/data/mongod27017) and the backup (/data/backup27017.rocksdb) are pointing to the same inode on disk for their ‘000013.sst’ file, meaning this file was NOT duplicated or copied during the hot backup process. Only metadata was written to point to the same inode! Now, imagine this file was 1TB+ and this becomes very impressive!

Restore Time

It bears repeating that restoring a logical, mongodump-based backup is very slow: indices are rebuilt serially for each collection, and both mongorestore and the server need to spend time translating data from logical to binary representations.

Sadly, it is extremely rare that backup restore times are tested, and I’ve seen large users of MongoDB disappointed to find that logical-backup restores take several hours, an entire day or longer while their production is on fire.

Thankfully, binary-level backups are very easy to restore: the backup directory needs to be copied to the location of the (stopped) MongoDB instance dbPath and then the instance just needs to be started with the same configuration file and version of MongoDB. No indices are rebuilt and there is no time spent rebuilding the data files from logical representations!
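A minimal restore sketch, assuming a systemd-managed mongod, the dbPath and backup paths from the earlier examples, and a ‘mongod’ system user:

# Stop mongod, swap the backup into place, and start it again with the
# same configuration file and MongoDB version:
systemctl stop mongod
mv /data/mongod27017 /data/mongod27017.old
cp -a /data/backup27017 /data/mongod27017
chown -R mongod:mongod /data/mongod27017    # ownership assumed to be mongod:mongod
systemctl start mongod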

Percona-Labs/mongodb_consistent_backup Support

We have plans for our Percona-Labs/mongodb_consistent_backup tool to support the ‘createBackup’/binary-level method of backups in the future. See more about this tool in this Percona Blog post: https://www.percona.com/blog/2017/05/10/percona-lab-mongodb_consistent_backup-1-0-release-explained.

Currently, this project supports cluster-consistent backups using ‘mongodump’ (logical backups only), which are very time consuming for large systems.

Support for ‘createBackup’ in this tool would greatly reduce the overhead and time required for backups of clusters, but it requires some added complexity to support the remote filesystems it would use.

Conclusion

As this article outlines, there are a lot of exciting developments that have reduced the impact of taking backups of your systems. More important than the backup efficiency is the time it takes to recover your important data from backup, an area where “hot” binary backups are the clear winner by leaps and bounds.

If MongoDB hot backup and topics like this interest you, please see the document below, we are hiring!

{
  hiring: true,
  role: "Consultant",
  tech: "MongoDB",
  location: "USA",
  moreInfo: "https://www.percona.com/about-percona/careers/mongodb-consultant-usa-based"
}

The post Free, Fast MongoDB Hot Backup with Percona Server for MongoDB appeared first on Percona Database Performance Blog.

by Tim Vaillancourt at April 07, 2018 12:59 AM

April 06, 2018

Peter Zaitsev

How to Handle pt-table-checksum Errors

pt-table-checksum Errors

In this blog post, we’ll look at how to approach pt-table-checksum errors.

pt-table-checksum is one of the most popular tools in Percona Toolkit, and it is widely used to identify data differences between masters and slaves. Therefore, as Percona Support Engineers, we often get questions related to the errors and warnings pt-table-checksum produces. Below we address the most common issues raised with pt-table-checksum, and how to mitigate the related warnings or errors.

Unable to detect slaves

Cannot connect to h=127.0.0.1,p=...,u=percona
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.

It’s possible that the tool cannot connect to the slaves because the information found on the master is not specific enough. By default, it looks for slaves based on the replica threads visible in the master’s processlist. This could be the problem if, for example, the slave’s MySQL runs on a different TCP port, the hostname is not resolved correctly, both the master and slave are on the same host, or this is Galera-based replication. In these cases, there is the --recursion-method option to try different discovery methods: ‘hosts’ or ‘cluster’. And if all of them fail, you can specify each slave’s details manually using the ‘dsn’ method.

An example using this option for the cluster looks like this:

# pt-table-checksum --user=root --password=*** --databases="db1" --recursion-method=cluster 192.168.88.82
Checking if all tables can be checksummed ...
Starting checksum ...
Not checking replica lag on pxc02 because it is a cluster node.
Not checking replica lag on pxc03 because it is a cluster node.
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
03-03T00:24:13 0 0 12 1 0 0.033 db1.t1
03-03T00:24:13 0 0 4 1 0 0.031 db1.t2

and when a DSN is needed (like for mysqlsandbox instances), we have to add the slave’s details to a table, similar to below:

master [localhost] {msandbox} ((none)) > create table percona.dsns (id int(11) NOT NULL AUTO_INCREMENT,parent_id int(11) DEFAULT NULL,dsn varchar(255) NOT NULL,PRIMARY KEY (id));
Query OK, 0 rows affected (0.08 sec)
master [localhost] {msandbox} ((none)) > insert into percona.dsns values (null,null,"h=localhost,S=/tmp/mysql_sandbox20997.sock");
Query OK, 1 row affected (0.03 sec)

$ pt-table-checksum --databases="test" --tables="s1"  --recursion-method=dsn=localhost,D=percona,t=dsns u=root,p=msandbox,h=localhost,S=/tmp/mysql_sandbox20996.sock
Checking if all tables can be checksummed ...
Starting checksum ...
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
03-19T14:16:05 0 1 0 1 0 0.344 test.s1

ROW format on slave

Replica slave1.myorg.com has binlog_format ROW which could cause pt-table-checksum to break replication. Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation. If you understand the risks, specify --no-check-binlog-format to disable this check.

The problem is that second- and deeper-level replicas (in a chained replication topology) will not calculate the diffs as expected. So this message warns that the slave is using binlog_format=ROW, as the tool needs STATEMENT format to calculate the diffs separately on the slave and master. This is done by replicating the command (e.g., INSERT INTO percona.checksum SELECT CRC32 …. WHERE … ) as the original statement, not as a row copy of CRC values already computed on the master. And that is possible because the tool sets binlog_format=STATEMENT in its own session. This session setting does not propagate further into the slave’s own binary log, though. This is not a problem when all the slaves replicate directly from the master, and in such cases we can ignore the message and use the --no-check-binlog-format option.
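To see that the switch is session-scoped, here is a sketch (the host and credentials are assumptions; SET SESSION binlog_format requires the SUPER privilege):

# The session binlog_format changes while the global one stays ROW:
mysql -h master.example.net -u percona -p -e "
    SET SESSION binlog_format = 'STATEMENT';
    SELECT @@session.binlog_format, @@global.binlog_format;
"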

By the way, the warning message is misleading in its claim about breaking replication, hence the bug report.

Unable to switch session binlog_format to STATEMENT

# pt-table-checksum --user=root --password=cmon --databases="db1" --recursion-method=cluster 192.168.88.82
03-02T23:54:50 Failed to /*!50108 SET @@binlog_format := 'STATEMENT'*/: DBD::mysql::db do failed: Percona-XtraDB-Cluster prohibits setting binlog_format to STATEMENT or MIXED with pxc_strict_mode = ENFORCING or MASTER [for Statement "/*!50108 SET @@binlog_format := 'STATEMENT'*/"] at /bin/pt-table-checksum line 10064.
This tool requires binlog_format=STATEMENT, but the current binlog_format is set to ROW and an error occurred while attempting to change it. If running MySQL 5.1.29 or newer, setting binlog_format requires the SUPER privilege. You will need to manually set binlog_format to 'STATEMENT' before running this tool.

or:

$ pt-table-checksum -h przemek-aurora57.xxx.rds.amazonaws.com -u przemek -p xxx --databases="test"
02-19T12:51:01 Failed to /*!50108 SET @@binlog_format := 'STATEMENT'*/: DBD::mysql::db do failed: Access denied; you need (at least one of) the SUPER privilege(s) for this operation for Statement "/*!50108 SET @@binlog_format := 'STATEMENT'*/" at /usr/bin/pt-table-checksum line 10023.
This tool requires binlog_format=STATEMENT, but the current binlog_format is set to ROW and an error occurred while attempting to change it. If running MySQL 5.1.29 or newer, setting binlog_format requires the SUPER privilege. You will need to manually set binlog_format to 'STATEMENT' before running this tool.

This can be an issue if STATEMENT mode is unsupported in the MySQL variant or special edition of it (Amazon RDS, for example), or when switching is prohibited, either by lack of the SUPER privilege (a limitation of Amazon Aurora) or by the Percona XtraDB Cluster Strict Mode safety precaution, as seen in the example above. To work around it in Percona XtraDB Cluster, temporarily relaxing the strict mode (be careful, as this may be dangerous) will work:

pxc01 > set global pxc_strict_mode="permissive";
Query OK, 0 rows affected (0.00 sec)

For Aurora though (only in case asynchronous replication is used between Aurora clusters or from Aurora to non-Aurora MySQL), you will have to change the binlog_format globally to STATEMENT using the option groups.

Too large chunk size or no good index

Cannot checksum table db_name.table_name: There is no good index and the table is oversized. at /usr/bin/pt-table-checksum line 6662.

or

Skipping table because on the master it would be checksummed in one chunk but on these replicas it has too many rows:
xxxxx rows on db_name.table_name
The current chunk size limit is xxxxx rows (chunk size=xxxx * chunk size limit=5).

Instead of examining each table with a single big query, pt-table-checksum splits tables into chunks to ensure that the checksum is non-intrusive and doesn’t cause too much replication lag or load on the server. To create these chunks, it needs an index of some sort (preferably a primary key or unique index). If there is no index, and the table contains a suitably small number of rows, the tool tries to checksum the table in a single chunk.

Skipping the table, as in the second message example, is a common issue with pt-table-checksum and can be caused by different/outdated table statistics on the master or slave side. To alleviate this issue, make sure all your tables contain a primary or unique key. pt-table-checksum requires that to divide a table into chunks effectively. We also suggest that you make sure these messages are not related to real differences in this table (maybe a row count is significantly different). Also, executing pt-table-checksum with PTDEBUG is a good idea as it captures a lot of debugging info and it provides better insight into what is causing the issue.

There can be some random skipping of tables across many tool runs, and it’s probably because of a mix of two variables. One of them is innodb_stats_on_metadata. Turn it off, at least while the checksum is running, so that InnoDB index stats won’t change so often. We remind you it’s a dynamic variable, which means you can change it without a MySQL server restart. On the other hand, if constantly changing statistics for a table are a problem (even with innodb_stats_on_metadata=0, statistics change with each significant amount of writes), you may want to disable automatic statistics updates for the duration of the checksum. Check the innodb_stats_auto_update option in Percona Server for MySQL for details.
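Both variables are dynamic, so disabling them for the duration of the checksum is a one-liner (a sketch; innodb_stats_auto_update assumes Percona Server for MySQL):

# No server restart needed for either variable:
mysql -u root -p -e "
    SET GLOBAL innodb_stats_on_metadata = 0;
    SET GLOBAL innodb_stats_auto_update = 0;
"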

pt-table-checksum uses an EXPLAIN query to determine the number of rows in the chunk, so ever-changing table statistics are the most likely reason for skipped tables. This is how pt-table-checksum decides whether to skip a chunk or not. It avoids the scenario where a table has fewer rows on the master but many on a replica, and would be checksummed in a single large query, causing a very long delay in replication. This is also affected by --chunk-size-limit, which defaults to 2. Try setting a higher --chunk-size-limit or --chunk-time so that pt-table-checksum allows larger chunks, but do it during off-peak periods. Of course, allowing too big a chunk makes the server suffer from heavy selects, and slave lag may also become a problem, whereas --chunk-time adjusts the chunk size dynamically so that the checksum query executes in a defined amount of time.

For tables that can’t be chunked and must be checksummed in a single run, the chunk size should be sufficiently large, and sometimes that is not enough. That’s where the chunk size limit comes into play. The --chunk-size-limit modifier is a multiplier for --chunk-size and allows larger chunks. To make sure your server is not heavily loaded, you can set a threshold at which pt-table-checksum pauses itself. This can be done with the --max-load parameter of pt-table-checksum; in this way, --chunk-time and --chunk-size-limit won’t noticeably impact your server. We would suggest starting with the default --chunk-size-limit value and increasing it gradually until it succeeds. High values of --chunk-size-limit guarantee higher rates of successful runs, but there’s no way to tell if a run will always be successful, because the number of rows processed is only an estimate. It’s worth mentioning that you can also try running ANALYZE TABLE on “skipped tables” before running pt-table-checksum to make sure statistics are up to date. This may or may not help, as statistics are estimates and might still be inaccurate.
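Putting those knobs together, a hedged starting point could look like this (the host and credentials are assumptions; raise the limits gradually, during off-peak hours):

pt-table-checksum \
    --chunk-time=0.5 \
    --chunk-size-limit=4 \
    --max-load=Threads_running=25 \
    h=master.example.net,u=percona,p=...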

Also, scripting retries of skipped chunks can be a good approach, as sketched below. You can redirect the pt-table-checksum output to a log file and parse that log to find out which tables need to be re-tried separately. You can do many re-tries for a single table if necessary, and the checksum result for a particular table in the checksums table gets overwritten without affecting other results.
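A simple sketch of that approach (the host, credentials and table name are assumptions):

# Capture the run, list what was skipped, then re-try a single table;
# its rows in the checksums table are simply overwritten:
pt-table-checksum h=master.example.net,u=percona,p=... 2>&1 | tee /tmp/ptc.log
grep -i 'skipping' /tmp/ptc.log
pt-table-checksum --tables=db1.t1 h=master.example.net,u=percona,p=...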

All the problems described above will not take place when a table has a primary key on an auto_increment int column.

Suboptimal query plan

Skipping chunk 1 of db_name.table_name because MySQL used only 3 bytes of the PRIMARY index instead of 9. See the --[no]check-plan documentation for more information.

The tool uses several heuristics to determine whether an execution plan is good or bad. The first is whether EXPLAIN reports that MySQL intends to use the desired index to access the rows. If MySQL chooses a different index, the tool considers the query unsafe. The tool also checks how much of the index MySQL reports that it uses for the query. The EXPLAIN output shows this in the key_len column. The tool remembers the largest key_len seen, and skips chunks where MySQL reports that it uses a smaller prefix of the index. However, this stretches the overall checksum runtime, as the tool runs these heuristics for each chunk to decide whether the execution path is good or bad. By default, --check-plan is on. It can bring a little bit of additional load to the server, but if that’s the case, you can always monitor the checksum progress during execution and cancel pt-table-checksum at any moment if necessary. In general, it’s good to keep it enabled. Further, it’s best to run pt-table-checksum during low database traffic times.

To deal with the above error, disable the feature by using --no-check-plan when you get a “Skipping chunk” error. The only drawback of using it is leaving the door open for possible (costly) table scans.
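For example (connection details are assumptions):

# Disable the query-plan heuristics when chunks are skipped for plan reasons:
pt-table-checksum --no-check-plan h=master.example.net,u=percona,p=...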

Missing or filtered tables on the slave

Error checksumming table test.dummy: Error getting row count estimate of table test.dummy on replica centos1.bm.int.percona.com: DBD::mysql::db  selectrow_hashref failed: Table 'test.dummy' doesn't exist [for Statement "EXPLAIN SELECT * FROM `test`.`dummy` WHERE 1=1"] at pt-table-checksum line 6607.

The above error makes it clear that the table test.dummy exists on the master but is missing on the slave server. This usually occurs with replication filters. pt-table-checksum failed because test.dummy was checksummed on the master but could not be checksummed on the replica. This can easily be reproduced, as per the below example:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.3.164
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 704
               Relay_Log_File: centos1-relay-bin.000002
                Relay_Log_Pos: 684
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB: test
[root@slave1]# perl pt-table-checksum --empty-replicate-table --no-check-replication-filters --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
02-04T03:14:07 Skipping table test.dummy because it has problems on these replicas:
Table test.dummy does not exist on replica slave1
This can break replication.  If you understand the risks, specify --no-check-slave-tables to disable this check.
02-04T03:14:07 Error checksumming table test.dummy: Error getting row count estimate of table test.dummy on replica slave1: DBD::mysql::db selectrow_hashref failed:
Table 'test.dummy' doesn't exist [for Statement "EXPLAIN SELECT * FROM `test`.`dummy` WHERE 1=1"] at pt-table-checksum line 6607.

As per the above example, the ‘test’ database is excluded from replication via the replication filter Replicate_Ignore_DB, which means any updates on that database will not reach the slave.

Waiting to check replicas for differences:   0% 00:00 remain
Waiting to check replicas for differences:   0% 00:00 remain
.
Waiting to check replicas for differences:   0% 00:00 remain
Waiting to check replicas for differences:   0% 00:00 remain
.

That is actually not an error; it means that pt-table-checksum is waiting on replicas to run the checksum queries. We have had customers report that the tool runs forever and never comes out of “Waiting to check replicas for differences”. We noticed this problem occurs when database tables exist on replicas but are ignored by replication filters. Because pt-table-checksum checksums each chunk with an INSERT/REPLACE…SELECT query, those queries from the master never reach the replicas via replication, as the tables in question are blocked by replication filters. So the tool waits forever to check the checksum results on the replicas, which will never happen. To remedy this issue, use the --ignore-databases or --ignore-tables option to exclude filtered tables from the checksum process, as shown below.
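Using the filtered objects from the example above, a sketch:

# Skip the whole filtered database, or just the filtered table:
pt-table-checksum --ignore-databases=test h=localhost,u=checksum_user,p=...
pt-table-checksum --ignore-tables=test.dummy h=localhost,u=checksum_user,p=...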

Replication filters can bring unexpected issues as the last two warnings/errors demonstrated.

Conclusion

pt-table-checksum is a robust tool that validates data between masters and slaves in a replication environment. However, in some scenarios the task can be quite challenging. Fortunately, there are options to deal with these obstacles. Some, however, involve not only using specific options for the tool, but also properly (re-)designing your schema. A proper primary key may not only allow the tool to work much faster and more cheaply, but sometimes allow it to work at all.

The post How to Handle pt-table-checksum Errors appeared first on Percona Database Performance Blog.

by Przemysław Malkowski at April 06, 2018 05:22 PM

Percona Live Europe 2018 – Save the Date!

Percona Live Europe 2018

We’ve been searching for a great venue for Percona Live Europe 2018, and I am thrilled to announce we’ll be hosting it in Frankfurt, Germany! Please block November 5-7, 2018 on your calendar now and plan to join us at the Radisson Blu Frankfurt for the premier open source database conference.

We’re in the final days of organizing for the Percona Live 2018 in Santa Clara. You can still purchase tickets for an amazing lineup of keynote speakers, tutorials and sessions. We have ten tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Major areas of focus at the conference will include:

  • Database operations and automation at scale, featuring speakers from Facebook, Slack, Github and more
  • Databases in the cloud – how database-as-a-service (DBaaS) is changing the DB landscape, featuring speakers from AWS, Microsoft, Alibaba and more
  • Security and compliance – how GDPR and other government regulations are changing the way we manage databases, featuring speakers from Fastly, Facebook, Pythian, Percona and more
  • Bridging the gap between developers and DBAs – finding common ground, featuring speakers from Square, Oracle, Percona and more

The Call for Papers for Percona Live Europe will open soon. We look forward to seeing you in Santa Clara!

The post Percona Live Europe 2018 – Save the Date! appeared first on Percona Database Performance Blog.

by Laurie Coffin at April 06, 2018 02:53 AM

April 05, 2018

Peter Zaitsev

Managing MongoDB Bulk Deletes and Inserts with Minimal Impact to Production Environments

MongoDB bulk deletes and inserts

In this blog post, we’ll look at how to manage MongoDB bulk deletes and inserts with little impact on production traffic.

If you are like me, there is no end to the demands placed on you as a DBA. One of the biggest is when we want to load X% more data into the database, during peak traffic no less. I refer to this as MongoDB bulk deletes and inserts. As a DBA, my first reaction is “no, do this during off-peak hours.” However, the business person in me says: what if this is due to clients loading a customer, product, or email list into the system for work during business hours? That puts it into another light, does it not?

This raises the question of how can we change data in the database as fast as possible while also trying to give the production system some breathing room. In this blog, I wanted to give you some nice scripts that you can load into your MongoDB shell to really simplify the process.

First, we will cover an iterative delete function that can be stopped and restarted at any time. Next, I will talk about smart updating with similarly planned overhead. Lastly, I want to talk about more advanced forms of health checking when you want to do something a bit smarter than where this basic series of scripts stop.

Bulk Deleting with a Plan

In this code, you can see there are a couple of ways to manage these deletes. Specifically, you can see how to call this from anywhere (deleteFromCollection). I’ve also shown how to extend the shell so you can call (db.collection.deleteBulk). This avoids the need to provide the namespace, as it can discover that from the context of the function.

The idea behind this function is pretty straightforward: you provide it with a find pattern for what you want to delete. This could be { } if you don’t want to restrict it, but you should use .drop() in that case. After that, it expects a batch size, which is the number of document IDs to delete in a single go. There is a trade-off between smaller batches with more iterations and larger batches with fewer iterations. Keep in mind that each batch produces that many oplog entries (1000 per batch in my examples). You should consider this carefully and watch your oplog range as a result. You could improve this to allow someone to check that size, but it requires more permissions (we’ll leave that discussion for another time). Finally, between batches, the function sleeps for pauseMS milliseconds.

If you find that the overhead is too much for you, simply kill the shell running this and it will stop. You can then reduce the batch size, increase the pause, or both, to make the system handle the change better. Sadly, this is not an exact science, as different people consider different levels of write impact "acceptable". We will talk about this more in a bit:

function parseNS(ns){
    //Expects a simple "db.collection" namespace. Names like "foodb.foocollection.month.day.year" must be passed as an array instead.
    if (ns instanceof Array){
        database =  ns[0];
        collection = ns[1];
    }
    else{
        tNS =  ns.split(".");
        if (tNS.length > 2){
            print('ERROR: NS had more than 1 period in it, please pass as an [ "dbname","coll.name.with.dots"] !');
            return false;
        }
        database = tNS[0];
        collection = tNS[1];
    }
    return {database: database,collection: collection};
}
DBCollection.prototype.deleteBulk = function( query, batchSize, pauseMS){
    //Parse and check namespaces
    ns = this.getFullName();
    srcNS={
        database:   ns.split(".")[0],
        collection: ns.split(".").slice(1,ns.length).join("."),
    };
    var db = this._db;
    var batchBucket = new Array();
    var totalToProcess = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query,{_id:1}).count();
    if (totalToProcess < batchSize){ batchSize = totalToProcess; }
    currentCount = 0;
    print("Processed "+currentCount+"/"+totalToProcess+"...");
    db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query).addOption(DBQuery.Option.noTimeout).forEach(function(doc){
        batchBucket.push(doc._id);
        if ( batchBucket.length >= batchSize){
            printjson(db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).remove({_id : { "$in" : batchBucket}}));
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep (pauseMS);
            print("Processed "+currentCount+"/"+totalToProcess+"...");
        }
    })
    print("Completed");
}
function deleteFromCollection( sourceNS, query, batchSize, pauseMS){
    //Parse and check namespaces
    srcNS = parseNS(sourceNS);
    if (srcNS == false) { return false; }
    batchBucket = new Array();
    totalToProcess = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query,{_id:1}).count();
    if (totalToProcess < batchSize){ batchSize = totalToProcess; }
    currentCount = 0;
    print("Processed "+currentCount+"/"+totalToProcess+"...");
    db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query).addOption(DBQuery.Option.noTimeout).forEach(function(doc){
        batchBucket.push(doc._id);
        if ( batchBucket.length >= batchSize){
            db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).remove({_id : { "$in" : batchBucket}});
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep (pauseMS);
            print("Processed "+currentCount+"/"+totalToProcess+"...");
        }
    })
    print("Completed");
}
/** Example Usage:
    deleteFromCollection("foo.bar",{"type":"archive"},1000,20);
  or
    db.bar.deleteBulk({type:"archive"},1000,20);
**/

Inserting & Updating with a Plan

Not to be outdone by the deletes, MongoDB updates and inserts are equally suited to the same logic. In those cases, only small changes are needed to build batches of documents and pass .insert(batchBucket) in the shell. Using sleep gives breathing room to other reads and actions in the system. I find we don’t need this with modern MongoDB using WiredTiger, but your mileage may vary based on workload. Also, you might want to figure out a way to tell the script how to handle a document that already exists. In the case of data loading, you could wrap the script with a check for errors other than a duplicate key. Please note it’s very easy to duplicate data if you do not have a unique index and MongoDB is auto-assigning its own _id field.
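
For data loading, a minimal sketch could look like this (the insertBulk name and the docs array parameter are mine, following the deleteBulk pattern above):

DBCollection.prototype.insertBulk = function( docs, batchSize, pauseMS){
    var coll = this;
    var batchBucket = [];
    var currentCount = 0;
    docs.forEach(function(doc){
        batchBucket.push(doc);
        if (batchBucket.length >= batchSize){
            printjson(coll.insert(batchBucket)); // the legacy shell accepts an array of documents
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep(pauseMS);
            print("Inserted "+currentCount+"/"+docs.length+"...");
        }
    });
    if (batchBucket.length > 0){ // flush the final partial batch
        printjson(coll.insert(batchBucket));
    }
    print("Completed");
}
/** Example Usage:
    db.bar.insertBulk(myDocsArray, 1000, 20);
**/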

Updates are a tad trickier, as they can be expensive if the query portion of the code is not indexed. I’ve provided you with an example, however. You should consider the query time when planning batches and pauses: the more the update relies on a table scan, the smaller the batch you should consider. The reasoning here is that we want to avoid restarting and causing a new table scan as much as possible. A future improvement might be to also support reads from a secondary, doing the update itself on the primary by the _id field, to ensure a pinpointed update query.

DBCollection.prototype.updateBulk = function( query, changeObject, batchSize, pauseMS){
    //Parse and check namespaces
    ns = this.getFullName();
    srcNS={
        database:   ns.split(".")[0],
        collection: ns.split(".").slice(1,ns.length).join("."),
    };
    var db = this._db;
    var batchBucket = new Array();
    var totalToProcess = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query,{_id:1}).count();
    if (totalToProcess < batchSize){ batchSize = totalToProcess; }
    currentCount = 0;
    print("Processed "+currentCount+"/"+totalToProcess+"...");
    db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query).addOption(DBQuery.Option.noTimeout).forEach(function(doc){
        batchBucket.push(doc._id);
        if ( batchBucket.length >= batchSize){
            var bulk = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).initializeUnorderedBulkOp();
            batchBucket.forEach(function(doc){
                bulk.find({_id:doc._id}).update(changeObject);
            })
            printjson(bulk.execute());
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep (pauseMS);
            print("Processed "+currentCount+"/"+totalToProcess+"...");
        }
    })
    print("Completed");
}
/** Example Usage:
    db.bar.updateBulk({type:"archive"},{$set:{archiveDate: ISODate()}},1000,20);
**/

In each iteration, the update prints out the result of the bulk execution, including any failures. You can extend this example code to either write the failures to a file or try to fix issues automatically as appropriate. My goal here is to provide you with a starter function to build on. As with the earlier example, this assumes the JS shell, but you can follow the same logic in Python, Golang, Java or the programming language of your choice.

If you got nothing else from this blog on MongoDB bulk deletes and inserts, I hope you learned a good deal more about writing functions in the shell, and how to use programming to add pauses to the bulk operations you need to do. Taking this forward, you could get inventive by using a query that measures latency to trigger pauses (a canary query), or even by watching the oplog to ensure you’re not adversely impacting HA and replication. There is no single right answer, but this is a great start towards more operationally safe ways to do the bigger actions DBAs are asked to perform from time to time.
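
As one hedged sketch of such a health check, the shell can read the replica set’s oplog window directly and scale the pause accordingly (the 24-hour threshold here is an arbitrary example, not a recommendation):

function safePauseMS(basePauseMS){
    // db.getReplicationInfo() reports the oplog window in hours as timeDiffHours
    var windowHours = db.getReplicationInfo().timeDiffHours;
    if (windowHours < 24){
        return basePauseMS * 2; // back off when the oplog window shrinks
    }
    return basePauseMS;
}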

The post Managing MongoDB Bulk Deletes and Inserts with Minimal Impact to Production Environments appeared first on Percona Database Performance Blog.

by David Murphy at April 05, 2018 09:23 PM

Percona Live 2018 Featured Talk: The Accidental DBA with Jenni Snyder

Jenni Snyder Yelp Percona Live 2018 (2)

Welcome to another interview blog for the rapidly approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Jenni Snyder, Engineering Manager – Operations at Yelp. Her tutorial talk is titled The Accidental DBA. Open source relational databases like MySQL and PostgreSQL power some of the world’s largest websites. They can be used out of the box with few adjustments, and rarely require a dedicated Database Administrator (DBA) right away. This means that System Administrators, Site Reliability Engineers or Developers are usually the first to respond to some of the more interesting issues that can arise as you scale your databases. In our conversation, we discussed how people become “accidental” DBAs:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

I’m Jenni Snyder and currently work as Manager of the Database Reliability Engineering (DRE) Team at Yelp. And, I got into databases by accident. After graduating from college, my first job was as a software engineer. As I remember it, I got a desk in the corner, and under that desk was another computer. I hit a power strip switch as I went to plug in my workstation on my first day. This cut power to the other machine, which turned out to be the database host for their development environment (running Sybase). My new boss came over and walked me through starting it back up.

After that, it became pretty clear to me that I preferred systems administration over development and was put in charge of our database migration scripts. I figured out query and server tuning and later configured and deployed a more complicated environment using Oracle. I got my first official MySQL Database Administrator (DBA) job about four years later.

My degrees are in Sociology as well as Computer Science, so I have always been interested in social networks and media. I have worked for Tribe.net and the Cisco Media Solutions Group, and have been with Yelp now for almost seven years.

Percona: Your tutorial is titled The Accidental DBA. What do you mean by this term, and how did you arrive at it?

I explained this a bit above: few people seek out database administration early in their careers. One of the benefits of MySQL and other open source relational databases is that they’re relatively easy to get started with. You don’t need a DBA, DRE, or another owner right away. However, with success comes growth, and any open source database with a default configuration can quickly become slow if not tuned for your workload.

As a result, I think that many DBAs and DREs end up in their position by being the right person at the right time, being curious, and loving it.

Percona: What are some of the most important things an “Accidental DBA” needs to know right away?

I’d say: how to make sure that MySQL is running, how to interpret any errors found in the client or server logs, and how to monitor MySQL.

This is going to be a very interactive talk and I’m hoping for lots of questions and a discussion. Everyone’s experience is different, and if I don’t include something in my slides, that doesn’t mean we can’t cover it!

Percona: What are three important database management lessons a good DBA needs to master?

    1. Put information in the hands of your developers so that they can learn about the database themselves

    2. Use automation early and often

    3. Use open source tools and contribute back to the community

Percona: Why should people attend your tutorial? What do you hope people will take away from it?

People should come if they’re interested in a broad overview of running MySQL. They should want to learn where they can make the most impact while tuning MySQL, how to avoid common problems, and discover some great open source tools that will make their jobs easier.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Unfortunately, Percona chose to schedule Shlomi Noach’s Orchestrator High Availability Tutorial at the time of my talk, so I’m going to miss out on the number one tutorial I wanted to see!

Want to find out more about this Percona Live 2018 featured talk, and becoming an accidental DBA? Register for Percona Live 2018, and see Jenni’s tutorial talk The Accidental DBA. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

The post Percona Live 2018 Featured Talk: The Accidental DBA with Jenni Snyder appeared first on Percona Database Performance Blog.

by Dave Avery at April 05, 2018 06:38 PM

Jean-Jerome Schmidt

Capacity Planning for MySQL and MariaDB - Dimensioning Storage Size

Server manufacturers and cloud providers offer different kinds of storage solutions to cater for your database needs. When buying a new server or choosing a cloud instance to run our database, we often ask ourselves - how much disk space should we allocate? As we will find out, the answer is not trivial as there are a number of aspects to consider. Disk space is something that has to be thought of upfront, because shrinking and expanding disk space can be a risky operation for a disk-based database.

In this blog post, we are going to look into how to initially size your storage space, and then plan for capacity to support the growth of your MySQL or MariaDB database.

How MySQL Utilizes Disk Space

MySQL stores data in files on the hard disk under a specific directory defined by the system variable datadir. The contents of the datadir depend on the MySQL server version and the loaded configuration parameters and server variables (e.g., general_log, slow_query_log, binary log).

The actual storage and retrieval of information depends on the storage engine. For the MyISAM engine, a table's indexes are stored in the .MYI file in the data directory, along with the table's .MYD and .frm files. For the InnoDB engine, the indexes are stored in the tablespace, along with the table data. If the innodb_file_per_table option is set, the indexes will be in the table's .ibd file along with the .frm file. For the MEMORY engine, the data is stored in memory (heap), while the structure is stored in the .frm file on disk. In the upcoming MySQL 8.0, the metadata files (.frm, .par, db.opt) are removed with the introduction of the new data dictionary schema.

It's important to note that if you are using the InnoDB shared tablespace for storing table data (innodb_file_per_table=OFF), your MySQL physical data size is expected to grow continuously, even after you truncate or delete a huge number of rows. The only way to reclaim the free space in this configuration is to export the data, drop the current databases and re-import them via mysqldump. Thus, it's important to set innodb_file_per_table=ON if you are concerned about disk space, so that when truncating a table the space can be reclaimed. Also, with this configuration, a huge DELETE operation won't free up the disk space unless OPTIMIZE TABLE is executed afterward.
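
As a quick illustration (the table name mydb.mytable is hypothetical), enabling file-per-table and rebuilding a table to reclaim space after a big DELETE looks like this:

mysql> SET GLOBAL innodb_file_per_table = ON; -- affects tables created or rebuilt from now on
mysql> OPTIMIZE TABLE mydb.mytable;           -- rebuilds the table, returning free space to the OS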

MySQL stores each database in its own directory under the datadir path. In addition, log files and other related MySQL files, like the socket and PID files, will by default be created under the datadir as well. For performance and reliability reasons, it is recommended to store the MySQL log files on a separate disk or partition, especially the MySQL error log and binary logs. A sketch of such a configuration follows.
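
The relevant my.cnf settings could look like this (the paths are illustrative):

[mysqld]
datadir   = /var/lib/mysql
log_error = /var/log/mysql/error.log    # separate partition for logs
log_bin   = /mnt/binlog/mysql-bin       # separate disk for binary logs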

Database Size Estimation

The basic way of estimating size is to find the growth ratio between two different points in time, and then multiply that with the current database size. Measuring your peak-hours database traffic for this purpose is not the best practice, and does not represent your database usage as a whole. Think about a batch operation or a stored procedure that runs at midnight, or once a week. Your database could potentially grow significantly in the morning, before possibly being shrunk by a housekeeping operation at midnight.

One possible way is to use our backups as the base element for this measurement. Physical backup like Percona Xtrabackup, MariaDB Backup and filesystem snapshot would produce a more accurate representation of your database size as compared to logical backup, since it contains the binary copy of the database and indexes. Logical backup like mysqldump only stores SQL statements that can be executed to reproduce the original database object definitions and table data. Nevertheless, you can still come out with a good growth ratio by comparing mysqldump backups.

We can use the following formula to estimate the database size growth over a number of years:

Estimated growth = ((Bn - Bn-1) / Bn-1) x (Dbdata + Dbindex) x 52 x Y

Where,

  • Bn - Current week full backup size,
  • Bn-1 - Previous week full backup size,
  • Dbdata - Total database data size,
  • Dbindex - Total database index size,
  • 52 - Number of weeks in a year,
  • Y - Number of years.

The total database size (data and indexes) in MB can be calculated by using the following statements:

mysql> SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) "DB Size in MB" FROM information_schema.tables;
+---------------+
| DB Size in MB |
+---------------+
|       2013.41 |
+---------------+

The above equation can be modified if you would like to use the monthly backups instead. Change the constant value of 52 to 12 (12 months in a year) and you are good to go.

Also, don't forget to account for innodb_log_file_size x 2 and innodb_data_file_path, and for Galera Cluster, add the gcache.size value.
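
You can check those values with statements like the following (the wsrep line applies only to Galera Cluster):

mysql> SELECT @@innodb_log_file_size * @@innodb_log_files_in_group / 1024 / 1024 AS "Redo log MB";
mysql> SELECT @@innodb_data_file_path;
mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'; -- contains gcache.size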

Binary Logs Size Estimation

Binary logs are generated by the MySQL master for replication and point-in-time recovery purposes. They are a set of log files that contain information about data modifications made on the MySQL server. The size of the binary logs depends on the number of write operations and the binary log format: STATEMENT, ROW or MIXED. Statement-based binary logs are usually much smaller than row-based binary logs, because they consist only of the write statements, while row-based logs contain the modified row data.

The best way to estimate the maximum disk usage of binary logs is to measure the binary log size for a day and multiply it by the expire_logs_days value (the default is 0 - no automatic removal). It's important to set expire_logs_days so you can estimate the size correctly. By default, each binary log is capped at around 1GB before MySQL rotates to a new binary log file. We can use a MySQL event to flush the binary log for the purpose of this estimation.

Firstly, make sure event_scheduler variable is enabled:

mysql> SET GLOBAL event_scheduler = ON;

Then, as a privileged user (with EVENT and RELOAD privileges), create the following event:

mysql> USE mysql;
mysql> CREATE EVENT flush_binlog
ON SCHEDULE EVERY 1 HOUR STARTS CURRENT_TIMESTAMP ENDS CURRENT_TIMESTAMP + INTERVAL 2 HOUR
COMMENT 'Flush binlogs per hour for the next 2 hours'
DO FLUSH BINARY LOGS;

For a write-intensive workload, you probably need to shorten the interval to 30 minutes or 10 minutes so the binary log is flushed before it reaches the 1GB maximum size, then scale the output up to an hour. Then verify the status of the event by using the following statement and look at the LAST_EXECUTED column:

mysql> SELECT * FROM information_schema.events WHERE event_name='flush_binlog'\G
       ...
       LAST_EXECUTED: 2018-04-05 13:44:25
       ...

Then, take a look at the binary logs we have now:

mysql> SHOW BINARY LOGS;
+---------------+------------+
| Log_name      | File_size  |
+---------------+------------+
| binlog.000001 |        146 |
| binlog.000002 | 1073742058 |
| binlog.000003 | 1073742302 |
| binlog.000004 | 1070551371 |
| binlog.000005 | 1070254293 |
| binlog.000006 |  562350055 | <- hour #1
| binlog.000007 |  561754360 | <- hour #2
| binlog.000008 |  434015678 |
+---------------+------------+

We can then calculate the average binary log growth, which is around ~562 MB per hour during peak hours. Multiply this value by 24 hours and by the expire_logs_days value:

mysql> SELECT (562 * 24 * @@expire_logs_days);
+---------------------------------+
| (562 * 24 * @@expire_logs_days) |
+---------------------------------+
|                           94416 |
+---------------------------------+

We get 94416 MB, which is around ~95 GB of disk space for our binary logs. A slave's relay logs are basically the same as the master's binary logs, except that they are stored on the slave side. Therefore, this calculation also applies to the slave relay logs.

Spindle Disk or Solid State?

There are two types of I/O operations on MySQL files:

  • Sequential I/O-oriented files:
    • InnoDB system tablespace (ibdata)
    • MySQL log files:
      • Binary logs (binlog.xxxx)
      • REDO logs (ib_logfile*)
      • General logs
      • Slow query logs
      • Error log
  • Random I/O-oriented files:
    • InnoDB file-per-table data file (*.ibd) with innodb_file_per_table=ON (default).

Consider placing random I/O-oriented files on a high-throughput disk subsystem for best performance. This could be a flash drive (either SSDs or an NVRAM card) or high-RPM spindle disks like SAS 15K or 10K, with a hardware RAID controller and a battery-backed unit. For sequential I/O-oriented files, storing them on an HDD with a battery-backed write cache should be good enough for MySQL. Take note that performance degradation is likely if the battery is dead.

We will cover this area (estimating disk throughput and file allocation) in a separate post.

Capacity Planning and Dimensioning

Capacity planning can help us build a production database server with enough resources to survive daily operations. We must also provision for unexpected needs, and account for future storage and disk throughput needs. Thus, capacity planning is important to ensure the database has enough room to breathe until the next hardware refresh cycle.

It's best to illustrate this with an example. Considering the following scenario:

  • Next hardware cycle: 3 years
  • Current database size: 2013 MB
  • Current full backup size (week N): 1177 MB
  • Previous full backup size (week N-1): 936 MB
  • Delta size: 241MB per week
  • Delta ratio: 25.7% increment per week
  • Total weeks in 3 years: 156 weeks
  • Total database size estimation: ((1177 - 936) x 2013 x 156)/936 = 80856 MB ~ 81 GB after 3 years

If you are using binary logs, sum it up from the value we got in the previous section:

  • 81 + 95 = 176 GB of storage for database and binary logs.

Add at least 100% more room for operational and maintenance tasks (local backup, data staging, error log, operating system files, etc):

  • 176 + 176 = 352 GB of total disk space.

Based on this estimation, we can conclude that we would need at least 352 GB of disk space for our database over 3 years. You can use this value to justify your new hardware purchase. For example, if you want to buy a new dedicated server, you could opt for 6 x 128 GB SSDs in RAID 10 with a battery-backed RAID controller, which will give you around 384 GB of total disk space. Or, if you prefer the cloud, you could get 100GB of block storage with provisioned IOPS for our 81GB database usage, and use standard persistent block storage for our 95GB of binary logs and other operational usage.

Happy dimensioning!

by ashraf at April 05, 2018 09:12 AM

April 04, 2018

Peter Zaitsev

Performance Schema for MySQL Troubleshooting Webinar: Q & A

MySQL Troubleshooting

In this blog, I will provide answers to the Q & A for the Performance Schema for MySQL Troubleshooting webinar.

First, I want to thank everybody for attending my March 1, 2018, webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I was unable to answer fully during the webinar.

Q: Is Workbench able to take advantage of the enhancements to Perf schema?

A: MySQL Workbench is a graphical tool for database architects, administrators and developers. It uses Performance Schema for its Performance Schema Reports and Query Statistics dashboards for MySQL Servers of version 5.6 or greater. So the answer is: yes, it is able to take advantage of the enhancements to Performance Schema.

Q: Can we check the history data ?

A: Yes. To do it you need to enable history consumers. You will find instructions here. For all kinds of consumers history table names follow the same pattern:

  • *_history contains the last N events per thread. N is defined by the performance_schema_*_history_size variables. The default is -1 (autosized) in version 5.7 and 10 in version 5.6.
  • *_history_long contains the most recent M events. M is defined by the performance_schema_*_history_long_size variables. The default is -1 (autosized) in version 5.7 and 10000 in version 5.6.

For example, if you want historical data for statements, you need to enable the consumers events_statements_history and events_statements_history_long. If you want to limit the number of queries stored, you need to set the variables performance_schema_events_statements_history_size and performance_schema_events_statements_history_long_size.
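
For example, enabling the statement history consumers at runtime is a single update against the setup_consumers table:

mysql> UPDATE performance_schema.setup_consumers SET ENABLED = 'YES'
       WHERE NAME IN ('events_statements_history', 'events_statements_history_long');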

Q: Are there any guidelines regarding how much memory we should set aside for every X counters/statistics being enabled?

A: No, there is no such guideline I am aware of. But you can use definitions of tables in Performance Schema to calculate the approximate value of how much memory one row could occupy, and make predictions from it. You can also use memory instrumentation in Performance Schema and watch for changes of memory usage under load, adjusting as needed.
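
For example, assuming MySQL 5.7, you can watch Performance Schema's own memory consumption with queries like these:

mysql> SHOW ENGINE PERFORMANCE_SCHEMA STATUS;
mysql> SELECT EVENT_NAME, CURRENT_NUMBER_OF_BYTES_USED
       FROM performance_schema.memory_summary_global_by_event_name
       WHERE EVENT_NAME LIKE 'memory/performance_schema/%'
       ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC LIMIT 5;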

Q: How has the performance cost of performance schema changed from 5.6 to 5.7?

A: The worst situation for the performance cost of Performance Schema was in version 5.5. It was discussed in numerous places, but I recommend you read this 2010 post from Dimitri Kravtchuk, a MySQL Performance Architect at Oracle. The result of his post was a huge performance improvement in Performance Schema, reported in this post from 2011. There were more improvements, discussed in 2012. Since then, Performance Schema does not add significant performance overhead unless you enable waits instrumentation.

Version 5.7 added new instrumentation, and as my tests showed this instrumentation did not add any noticeable performance impact.

To summarize: version 5.6 made huge improvements to the performance of Performance Schema, and the new features in version 5.7 did not add any performance regressions.

Q: Will performance schema eat up my disk space? How long will it store all these logs and cause any issues?

A: Performance Schema does not store anything on disk; it uses memory. It stores data until the consumer tables reach their configured maximum size, at which point it removes the oldest data and replaces it with the newest statistics. Read the Performance Schema startup configuration guide on how to limit the maximum size of consumer tables.
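
As a sketch, limiting the statement history tables in my.cnf could look like this (the sizes are illustrative):

[mysqld]
performance_schema_events_statements_history_size      = 10
performance_schema_events_statements_history_long_size = 10000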

Thanks for attending this webinar on Performance Schema for MySQL Troubleshooting. You can find the slides and a recording here.

The post Performance Schema for MySQL Troubleshooting Webinar: Q & A appeared first on Percona Database Performance Blog.

by Sveta Smirnova at April 04, 2018 05:48 PM

Percona Monitoring and Management 1.9.0 Is Now Available

Percona Monitoring and Management

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.

There are a number of significant updates in Percona Monitoring and Management 1.9.0 that we hope you will like. Some of the key highlights include:

  • Faster loading of the index page: We have enabled performance optimizations using gzip and HTTP2.
  • AWS improvements: We have added metrics from CloudWatch RDS to 6 dashboards, as well as changed our AWS add instance workflow, and made some changes to credentials handling.
  • Percona Snapshot Server: If you are a Percona customer you can now securely share your dashboards with Percona Engineers.
  • Exporting Percona Monitoring and Management Server logs: Retrieve logs from PMM Server for troubleshooting using a single button click, avoiding the need to log in manually to the docker container.
  • Low RAM support: We have reduced the memory requirement so PMM Server will run on systems with 512MB of RAM.
  • Dashboard improvements: We have changed MongoDB instance identification for the MongoDB graphs, and set a maximum graph Y-axis on the Prometheus Exporter Status dashboard.

AWS Improvements

CloudWatch RDS metrics

Since we are already consuming Amazon Cloudwatch metrics and persisting them in Prometheus, we have improved six node-specific dashboards to now display Amazon RDS node-level metrics:

  • Cross_Server (Network Traffic)
  • Disk Performance (Disk Latency)
  • Home Dashboard (Network IO)
  • MySQL Overview (Disk Latency, Network traffic)
  • Summary Dashboard (Network Traffic)
  • System Overview (Network Traffic)

AWS Add Instance changes

We have changed our AWS add instance interface and workflow to be more clear on information needed to add an Amazon Aurora MySQL or Amazon RDS MySQL instance. We have provided some clarity on how to locate your AWS credentials.

AWS Settings

We have improved our documentation to highlight connectivity best practices, and authentication options – IAM Role or IAM User Access Key.

Enabling Enhanced Monitoring

Credentials Screen

Low RAM Support

You can now run Percona Monitoring and Management Server on instances with memory as low as 512MB RAM, which means you can deploy to the free tier of many cloud providers if you want to experiment with PMM. Our memory calculation is now:

METRICS_MEMORY_MULTIPLIED=$(( (${MEMORY_AVAILABLE} - 256*1024*1024) / 100 * 40 ))
if [[ ${METRICS_MEMORY_MULTIPLIED} -lt $((128*1024*1024)) ]]; then
   METRICS_MEMORY_MULTIPLIED=$((128*1024*1024))
fi

Percona Snapshot Server

Snapshots are a way of sharing PMM dashboards via a link with individuals who do not normally have access to your PMM Server. If you are a Percona customer, you can now securely share your dashboards with Percona Engineers. We have replaced the button that shared dashboards to the publicly hosted Grafana platform with one that shares to a platform administered by Percona. Your dashboard will be written to Percona snapshots, and only Percona Engineers will be able to retrieve the data. We will expire old snapshots automatically after 90 days, but when sharing you will have the option to configure a shorter retention period.

Export of PMM Server Logs

In this release, the logs from PMM Server can be exported with a single button click, avoiding the need to log in manually to the docker container. This simplifies the troubleshooting process for a PMM Server and, especially for Percona customers, provides a more consistent way to gather the data that Percona Engineers request.

Faster Loading of the Index Page

In Percona Monitoring and Management version 1.8.0, the index page was redesigned to reveal more useful information about the performance of your hosts, as well as immediate access to essential components of PMM. However, the index page had to load a lot of data dynamically, resulting in a noticeably longer load time. In this release we enabled gzip and HTTP2 to improve the load time of the index page. The following screenshots demonstrate the results of our tests on webpagetest.org, where we reduced page load time by half. We will continue to look for opportunities to improve the performance of the index page, and expect that when we upgrade to Prometheus 2 we will see another improvement.

The load time of the index page of PMM version 1.8.0

The load time of the index page of PMM version 1.9.0

Issues in this release

New Features

  • PMM-781: Plot new PXC 5.7.17, 5.7.18 status variables on new graphs for PXC Galera, PXC Overview dashboards
  • PMM-1274: Export PMM Server logs as zip file to the browser
  • PMM-2058: Percona Snapshot Server

Improvements

  • PMM-1587: Use mongodb_up variable for the MongoDB Overview dashboard to identify if a host is MongoDB.
  • PMM-1788: AWS Credentials form changes
  • PMM-1823: AWS Install wizard improvements
  • PMM-2010: System dashboards update to be compatible with RDS nodes
  • PMM-2118: Update grafana config for metric series that will not go above 1.0
  • PMM-2215: PMM Web speed improvements
  • PMM-2216: PMM can now be started on systems without memory limit capabilities in the kernel
  • PMM-2217: PMM Server can now run in Docker with 512 Mb memory
  • PMM-2252: Better handling of variables in the navigation menu

Bug fixes

  • PMM-605: pt-mysql-summary requires additional configuration
  • PMM-941: ParseSocketFromNetstat finds an incorrect socket
  • PMM-948: Wrong load reported by QAN due to mis-alignment of time intervals
  • PMM-1486: MySQL passwords containing the dollar sign ($) were not processed properly.
  • PMM-1905: In QAN, the Explain command could fail in some cases.
  • PMM-2090: Minor formatting issues in QAN
  • PMM-2214: Setting Send real query examples for Query Analytic OFF still shows the real query in example.
  • PMM-2221: no Rate of Scrapes for MySQL & MySQL Errors
  • PMM-2224: Exporter CPU Usage glitches
  • PMM-2227: Auto Refresh for dashboards
  • PMM-2243: Long host names in Grafana dashboards are not displayed correctly
  • PMM-2257: PXC/galera cluster overview Flow control paused time has a percentage glitch
  • PMM-2282: No data is displayed on dashboards for OVA images
  • PMM-2296: The mysql:metrics service will not start on Ubuntu LTS 16.04

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

The post Percona Monitoring and Management 1.9.0 Is Now Available appeared first on Percona Database Performance Blog.

by Borys Belinsky at April 04, 2018 04:39 PM

Pattern Matching Queries vs. Full-Text Indexes

Pattern Matching Queries vs. Full-Text Indexes

In this blog post, we’ll compare the performance of pattern matching queries vs. full-text indexes.

In my previous blog post, I looked for a solution for searching only part of an email address, and for making queries with a condition like email LIKE '%n.pierre%' faster. I showed two possible ways that could work. Of course, they had some pros and cons as well, but both were more efficient and faster than a plain LIKE '%n.pierre%'.

But you could also ask why I would bother with this? Let’s add a FULLTEXT index, and everybody is happy! Are you sure about that? I’m not. Let’s investigate and test a bit. (We have some nice blog posts that explain how FULLTEXT indexes work: Post 1, Post 2, Post 3.)

Let’s see if it works in our case where we were looking for email addresses. Here is the table:

CREATE TABLE `email` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `email` varchar(120) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `idx_email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=318465 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Add the default full-text index:

ALTER TABLE email ADD FULLTEXT KEY (email);

It took only five seconds for 320K email addresses.

Let’s run a search:

SELECT id, email FROM email where MATCH(email) AGAINST ('n.pierre' IN NATURAL LANGUAGE MODE);
+--------+--------------------------------+
| id     | email                          |
+--------+--------------------------------+
|   2940 | pierre.west@example.org        |
|  10775 | pierre.beier@example.org       |
|  24267 | schroeder.pierre@example.org   |
|  26285 | bode.pierre@example.org        |
|  27104 | pierre.franecki@example.org    |
|  31792 | pierre.jaskolski@example.com   |
|  39369 | kuphal.pierre@example.org      |
|  58625 | olson.pierre@example.org       |
|  59526 | larkin.pierre@example.net      |
|  64718 | boyle.pierre@example.com       |
|  72033 | pierre.wolf@example.net        |
|  90587 | anderson.pierre@example.org    |
| 108806 | fadel.pierre@example.org       |
| 113897 | jacobs.pierre@example.com      |
| 118579 | hudson.pierre@example.com      |
| 118798 | pierre.wuckert@example.org     |
| 118937 | green.pierre@example.net       |
| 125451 | hauck.pierre@example.net       |
| 133352 | friesen.pierre@example.net     |
| 134594 | windler.pierre@example.com     |
| 135406 | dietrich.pierre@example.org    |
| 190451 | daugherty.pierre@example.org   |
...

Immediately, we have issues with the results. It returns 43 rows, but there are only 11 rows containing the string n.pierre. Why? It is because of the . (period) character. The manual says:

The built-in FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example,   (space), , (comma), and . (period).

The parser believes that a . starts a new word, so it is going to search for pierre instead of n.pierre. That’s not good news, as many email addresses contain periods. What can we do? The manual says:

It is possible to write a plugin that replaces the built-in full-text parser. For details, see Section 28.2, “The MySQL Plugin API”. For example parser plugin source code, see the plugin/fulltext directory of a MySQL source distribution.

If you are willing to write your own plugin in C/C++, you can try that route. Until then, it is going to give us back a lot of irrelevant matches.

We can order the results by relevancy:

SELECT id,email,MATCH(email) AGAINST ('n.pierre' IN NATURAL LANGUAGE MODE)
 AS score FROM email where MATCH(email) AGAINST
('n.pierre' IN NATURAL LANGUAGE MODE) ORDER BY 3 desc limit 10;
+-------+------------------------------+-------------------+
| id    | email                        | score             |
+-------+------------------------------+-------------------+
|  2940 | pierre.west@example.org      | 14.96491813659668 |
| 10775 | pierre.beier@example.org     | 14.96491813659668 |
| 24267 | schroeder.pierre@example.org | 14.96491813659668 |
| 26285 | bode.pierre@example.org      | 14.96491813659668 |
| 27104 | pierre.franecki@example.org  | 14.96491813659668 |
| 31792 | pierre.jaskolski@example.com | 14.96491813659668 |
| 39369 | kuphal.pierre@example.org    | 14.96491813659668 |
| 58625 | olson.pierre@example.org     | 14.96491813659668 |
| 59526 | larkin.pierre@example.net    | 14.96491813659668 |
| 64718 | boyle.pierre@example.com     | 14.96491813659668 |
+-------+------------------------------+-------------------+

This does not guarantee we get back the rows that we are looking for, however. I tried changing innodb_ft_min_token_size as well, but it did not affect the results.

Let’s see what happens when I search for williamson pierre. Two separate words. I know there is only one email address with these names.

SELECT id,email,MATCH(email) AGAINST
('williamson.pierre' IN NATURAL LANGUAGE MODE) AS score
FROM email where MATCH(email) AGAINST
('williamson.pierre' IN NATURAL LANGUAGE MODE) ORDER BY 3 desc limit 50;
+--------+---------------------------------+-------------------+
| id     | email                           | score             |
+--------+---------------------------------+-------------------+
| 238396 | williamson.pierre@example.net   | 24.08820343017578 |
|   2940 | pierre.west@example.org         | 14.96491813659668 |
|  10775 | pierre.beier@example.org        | 14.96491813659668 |
|  24267 | schroeder.pierre@example.org    | 14.96491813659668 |
|  26285 | bode.pierre@example.org         | 14.96491813659668 |
|  27104 | pierre.franecki@example.org     | 14.96491813659668 |
|  31792 | pierre.jaskolski@example.com    | 14.96491813659668 |
|  39369 | kuphal.pierre@example.org       | 14.96491813659668 |
|  58625 | olson.pierre@example.org        | 14.96491813659668 |
...

The first result is the address we wanted, but we still got back another 49 addresses. How can the application decide which email address is relevant and which is not? I am still not happy.

Are there any other options without writing our own plugin?

Can I somehow tell the parser to use n.pierre as one word? The manual says:

A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase".

I can use double quotes, but the parser will still split at . and the results are the same. I did not find a solution short of writing your own plugin. If someone knows one, please write a comment.

With Parser Ngram

The built-in MySQL full-text parser uses delimiters between words, but we can create an Ngram-based full-text index.

mysql> alter table  email ADD FULLTEXT KEY (email) WITH PARSER ngram;
Query OK, 0 rows affected (20.10 sec)
Records: 0  Duplicates: 0  Warnings: 0

Before that, I changed the ngram_token_size to 3.
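
Note that ngram_token_size is not a dynamic variable, so it has to be set at server startup (for example in my.cnf) before creating the index:

[mysqld]
ngram_token_size = 3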

mysql> SELECT id,email,MATCH(email) AGAINST ('n.pierre' IN NATURAL LANGUAGE MODE) AS score FROM email where MATCH(email) AGAINST ('n.pierre' IN NATURAL LANGUAGE MODE) ORDER BY 3 desc;
+--------+----------------------------------+--------------------+
| id     | email                            | score              |
+--------+----------------------------------+--------------------+
|  58625 | olson.pierre@example.org         |  16.56794548034668 |
|  59526 | larkin.pierre@example.net        |  16.56794548034668 |
|  90587 | anderson.pierre@example.org      |  16.56794548034668 |
| 118579 | hudson.pierre@example.com        |  16.56794548034668 |
| 118937 | green.pierre@example.net         |  16.56794548034668 |
| 133352 | friesen.pierre@example.net       |  16.56794548034668 |
| 200608 | wilkinson.pierre@example.org     |  16.56794548034668 |
| 237928 | johnson.pierre@example.org       |  16.56794548034668 |
| 238396 | williamson.pierre@example.net    |  16.56794548034668 |
| 278384 | monahan.pierre@example.net       |  16.56794548034668 |
| 306718 | rohan.pierre@example.com         |  16.56794548034668 |
| 226737 | warren.pfeffer@example.net       | 12.156486511230469 |
|  74278 | stiedemann.perry@example.net     |  11.52701187133789 |
|  75234 | bogan.perry@example.org          |  11.52701187133789 |
...
4697 rows in set (0.03 sec)

Finally, we are getting somewhere. But it gives back 4697 rows. How can the application decide which results are relevant? Should we just use the score?

Subselect?

I dropped the Ngram FULLTEXT index and created a default one, because that gives back only 43 results instead of 4697. I thought a full-text search might be good for narrowing the results from a million rows down to a few thousand, and then running a select based on that. Example:

mysql> Select e2.id,e2.email from
(SELECT id,email FROM email where MATCH(email)
AGAINST ('n.pierre' IN NATURAL LANGUAGE MODE))
as e2 where e2.email like '%n.pierre%';
+--------+-------------------------------+
| id     | email                         |
+--------+-------------------------------+
|  58625 | olson.pierre@example.org      |
|  59526 | larkin.pierre@example.net     |
|  90587 | anderson.pierre@example.org   |
| 118579 | hudson.pierre@example.com     |
| 118937 | green.pierre@example.net      |
| 133352 | friesen.pierre@example.net    |
| 200608 | wilkinson.pierre@example.org  |
| 237928 | johnson.pierre@example.org    |
| 238396 | williamson.pierre@example.net |
| 278384 | monahan.pierre@example.net    |
| 306718 | rohan.pierre@example.com      |
+--------+-------------------------------+
11 rows in set (0.00 sec)

Wow, this can work and it looks quite fast as well. BUT (there is always a but), if I run the following query (searching for ierre):

mysql> Select e2.id,e2.email from
(SELECT id,email FROM email where MATCH(email)
AGAINST ('ierre' IN NATURAL LANGUAGE MODE))
as e2 where e2.email like '%ierre%';
Empty set (0.00 sec)

It gives back nothing because the default full-text parser uses only full words! In our case, that is not very helpful. Let’s switch back to Ngram and re-run the query:

mysql> Select e2.id,e2.email from
(SELECT id,email FROM email where MATCH(email)
AGAINST ('ierre' IN NATURAL LANGUAGE MODE))
as e2 where e2.email like '%ierre%';
+--------+--------------------------------+
| id     | email                          |
+--------+--------------------------------+
|   2940 | pierre.west@example.org        |
|  10775 | pierre.beier@example.org       |
|  16958 | pierre68@example.com           |
|  24267 | schroeder.pierre@example.org   |
...
65 rows in set (0.05 sec)
mysql> show profile;
+-------------------------+----------+
| Status                  | Duration |
+-------------------------+----------+
| starting                | 0.000072 |
| checking permissions    | 0.000006 |
| Opening tables          | 0.000014 |
| init                    | 0.000027 |
| System lock             | 0.000007 |
| optimizing              | 0.000006 |
| statistics              | 0.000013 |
| preparing               | 0.000006 |
| FULLTEXT initialization | 0.006384 |
| executing               | 0.000012 |
| Sending data            | 0.020735 |
| end                     | 0.000014 |
| query end               | 0.000014 |
| closing tables          | 0.000013 |
| freeing items           | 0.001383 |
| cleaning up             | 0.000024 |
+-------------------------+----------+

It gives us back 65 rows, and it takes between 0.02 and 0.05s because the subquery returns many rows.

With my “shorting method”:

select e.email from email as e right join email_tib as t
on t.email_id=e.id where t.email_parts like "ierre%";
+--------------------------------+
| email                          |
+--------------------------------+
| anderson.pierre@example.org    |
| bode.pierre@example.org        |
| bode.pierre@example.org        |
| boyle.pierre@example.com       |
| bradtke.pierre@example.org     |
| bradtke.pierre@example.org     |
...
65 rows in set (0.00 sec)
mysql> show profile;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000069 |
| checking permissions | 0.000011 |
| checking permissions | 0.000003 |
| Opening tables       | 0.000020 |
| init                 | 0.000021 |
| System lock          | 0.000008 |
| optimizing           | 0.000009 |
| statistics           | 0.000070 |
| preparing            | 0.000011 |
| executing            | 0.000001 |
| Sending data         | 0.000330 |
| end                  | 0.000002 |
| query end            | 0.000007 |
| closing tables       | 0.000005 |
| freeing items        | 0.000014 |
| cleaning up          | 0.000010 |
+----------------------+----------+

It reads and gives back exactly 65 rows, and it takes 0.000s (too fast to measure).

Conclusion

When it comes to pattern matching queries vs. full-text indexes, it looks like a full-text index can be helpful, and it is built in. Unfortunately, we do not have many metrics regarding full-text indexes: we cannot see how many rows were read, etc. I don’t want to draw any conclusions on which one is faster. I still have to run some tests with our favorite benchmark tool, sysbench, on a much bigger dataset.

I should mention that full-text indexes and my previous solutions won’t solve every problem. In this and my other blog post, I was trying to find an answer to a specific problem, but there are cases where my solutions would not work that well.

The post Pattern Matching Queries vs. Full-Text Indexes appeared first on Percona Database Performance Blog.

by Tibor Korocz at April 04, 2018 01:19 PM

April 03, 2018

Peter Zaitsev

Leveraging ProxySQL with AWS Aurora to Improve Performance, Or How ProxySQL Out-performs Native Aurora Cluster Endpoints

ProxySQL with AWS Aurora

In this blog post, I’ll look at how you can use ProxySQL with AWS Aurora to further leverage database performance.

My previous article described how easy it is to replace the native Aurora connector with ProxySQL. In this article, you will see WHY you should do that.

It is important to understand that aside from the basic optimization in the connectivity and connection management, ProxySQL also provides you with a new set of features that currently are not available in Aurora.

Just think:

  • Better caching
  • Query filtering
  • Sharding
  • Query substitution
  • Firewalling
  • … and more

We will cover areas like scalability, security and performance. In short, I think it is more than worth it to spend some time and give ProxySQL with AWS Aurora a try.

The tests

I will show you the results from two different kinds of tests. One is sysbench-oriented; the other simulates a more complex application using Java, data objects and a Hikari connection pool in the middle as well.

For the EC2 and Aurora platform I used:

  • Application/ProxySQL T2.xlarge eu-central-1a
  • 2 Aurora MySQL 5.7.12 db.t2.medium eu-central-1a
  • 1 Aurora MySQL 5.7.12 db.t2.medium eu-central-1b for AZ redundancy

The code for the application is available here, and for sysbench tests here. All the data and configurations for the application are available here.

I ran three tests using both bench apps, obviously with Aurora as it comes and with ProxySQL. For the ProxySQL configuration see my previous article.
The tests were read_only / Write_only / read_write.

For Aurora, I only increased the number of connections and otherwise kept the out-of-the-box configuration. Note that each test was run at least three times, at different moments of the day and on different days. The data reported as final is the BEST performing result for each one.

The Results

For the impatient among us, here is a summary table of the tests:

Sysbench:

Java App:


Now if this is enough for you, you can go to the conclusion and start to use ProxySQL with AWS Aurora. If you would like to know a bit more, continue reading.

Aside from any discussion on the benchmark tool and settings, I really focused on identifying the differences between the two “connectors”. Given the layer below was exactly the same, any difference is due to the simple substitution of the endpoint.

Sysbench

Read Only

The first image reports the number of events achieved at the time of the test. It is quite clear that when using ProxySQL, sysbench ran more events.

In this graph, higher is better:

In this graph, lower is better:

As we can see, the latency when using an Aurora cluster entry point is higher. True, we are talking about milliseconds, but it is not just the value that matters, but also the distribution:

Aurora cluster endpoint vs. ProxySQL:

An image is worth a thousand words!

We can see that the behavior stays constant when analyzing the READS executed, with ProxySQL performing better.

In this graph, higher is better:


In this graph, higher is better:

Closing with the total number of queries performed, ProxySQL surpassed the cluster endpoint by ~4K queries.

Write Only

For writes, things go a bit differently. We see that all the lines intersect, and the values are very close to one another. I will let the images speak for themselves:

In this graph, higher is better:

In this graph, lower is better:

Latency spiked in each ProxySQL test, and it may require additional investigation and tuning.  

In this graph, higher is better:

While the rates of writes/sec intersect with each other frequently, in the end ProxySQL resulted in more writes than the native endpoint.

In this graph, higher is better:

In the end, a difference exists and is consistent across the different test iterations, but it is minimal. We are talking about a range of 25 to 50 entries in total. This result is not surprising, and it will become clear why later in the article.

Read and Write

As expected in the read and write test, we see a different situation. ProxySQL is still performing better than the default entry point, but not by such a big margin as in read-only tests.

In this graph, higher is better:

In this graph, lower is better

Latency and events follow the expected trend, where read operations are executed more efficiently with ProxySQL, and writes are close to, but NOT the same as, those in the write-only test.

As a result, the number of queries with ProxySQL is approximately 13% higher than with the default entry point.

Java Application Tests

What about the Java application? First of all, we need to remember that the application used a connection pool mechanism (HikariCP), and the connection pool was present in all cases (for both the Aurora cluster endpoint and ProxySQL). Given that a small delay in establishing the first connection was expected, you can easily see this in the MAX value of the connection latency.

In this graph, lower is better.

The connection latency reported here is expressed in nanoseconds and measures the time taken by the connection provider to return an active connection to the application from the moment the application requested it. In other words, it is how long HikariCP takes to choose/check/return an open connection. As you can see, the MAX value is drastically higher, and this was expected since it is the connection initialization. While not really interesting in terms of performance, this value is interesting because it gives us the dimension of the cost in the connection pool to open a new connection, which in the worst case is 25 milliseconds.
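
For reference, a minimal HikariCP setup pointed at ProxySQL could look roughly like this (the host, credentials and pool size are illustrative; 6033 is ProxySQL's default MySQL listener port):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.SQLException;

public class PoolExample {
    public static void main(String[] args) throws SQLException {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:mysql://proxysql-host:6033/test"); // ProxySQL listener instead of the Aurora endpoint
        cfg.setUsername("app");
        cfg.setPassword("secret");
        cfg.setMaximumPoolSize(20);
        try (HikariDataSource ds = new HikariDataSource(cfg);
             Connection conn = ds.getConnection()) {            // the measured latency is the time spent in getConnection()
            // run the CRUD statements here
        }
    }
}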

As the graphs show, ProxySQL manages both cases (first call, reassignment) more efficiently.

In this graph, higher is better.

In the CRUD summary table, we can see the number of SQL commands executed per second for each CRUD action and for each test. Once more we can see that when using ProxySQL, the application performed much better and executed significantly more operations (especially in the R/W test).

In this graph, higher is better.

This graph represents the total number of events run at the time of the test. An event is a full application cycle, which sees the application generate the data needed to fill the SQL (no matter if it is for read/write), create the SQL, request the connection, push the SQL, get and read the resultset returned and give back the connection.

Once more, ProxySQL shows better performance.

In this graph, lower is better.

The execution time reported in this graph is the time taken by the application to run a whole event.

That is, execution time is the time of a full cycle: the faster the cycle is executed, the better the application is performing. The time is expressed in milliseconds, and it ranges from a very fast read, which probably hits the cache in Aurora, to almost two seconds for inserting a batch of rows.

Needless to say, the tests using ProxySQL performed better.

But Why?

Why do the tests using ProxySQL perform better? After all, it is just an additional step in the middle, which also has a cost in intercepting the queries and managing the connections. So why the better performance?

The answer is simple and can be found in the Aurora manual: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.Overview.html#Aurora.Overview.Endpoints.

The Cluster endpoint is an endpoint for an Aurora DB cluster that connects to the current primary instance for that DB cluster. Each Aurora DB cluster has a cluster endpoint and one primary instance.

That endpoint receives the read and write requests and sends them to the same instance. The main use for it is to perform failover if needed.

At the same time, the Reader endpoint is an endpoint for an Aurora DB cluster that connects to one of the available Aurora Replicas for that DB cluster. Each Aurora DB cluster has a reader endpoint. If there is more than one Aurora Replica, the reader endpoint directs each connection request to one of the Aurora Replicas.

The reader endpoint only load balances connections to available Aurora Replicas in an Aurora DB cluster. It does not load balance specific queries. If you want to load balance queries to distribute the read workload for a DB cluster, you need to manage that in your application and use instance endpoints to connect directly to Aurora Replicas to balance the load.

This means that to perform a Read/Write split, your application must manage two entry points and you will NOT have much control over how the queries are handled or to which replica instance they are directed. This could lead to unexpected results and delays.

Needless to say, ProxySQL does all that by default (as described in my previous article).

Now that we’ve clarified how Aurora entry points behave, let’s see about the performance difference.

How do we read this graph? From left to right:

  • read_only test with an Aurora cluster endpoint
  • read_only test with ProxySQL
  • write_only with an Aurora cluster endpoint
  • write_only with ProxySQL
  • read and write with an Aurora cluster endpoint
  • read and write with ProxySQL

Here we go! As we can see, the tests with ProxySQL used the two configured instances, splitting R/W without the need to do anything on the application side. I purposely avoided the replica in the remote AZ because I previously identified it as having higher latency, so I can exclude it and use it ONLY in the case of an emergency.

The effects are clear in the next graph.

When using the cluster endpoint, given all the load was on a single instance, the CPU utilization is higher and that became a bottleneck. When using ProxySQL, the load is spread across the different instances, allowing real read scalability. This has immediate benefits in read and read/write operations, allowing better load distribution that results in better performance.

Conclusions

Aurora is a very interesting technology and can be a very good solution for read scaling. But at the moment, the way AWS offers data connectivity with the Cluster endpoints and Reader endpoints can negatively affect performance.

The lack of configuration options and the limitation of having to use different endpoints lead to confusion and less-than-optimal utilization.

The introduction of ProxySQL, which now supports (from version 2) Aurora, allows an architect, SA or DBA to properly configure the environment. You can very granularly choose how to use each instance, without the need to have the application modify how it works. This helps keep the data layer solution separate from the application layer.

Even better, this additional flexibility does not come at a cost. On the contrary, it improves resource utilization and brings higher performance using less powerful instances. Given the cost of Aurora, this is not a secondary benefit.

I suggest you try installing ProxySQL v2 (or higher) in front of your Aurora cluster. If you don’t feel confident and prefer to have us help you, contact us and we will be more than happy to support you!

The post Leveraging ProxySQL with AWS Aurora to Improve Performance, Or How ProxySQL Out-performs Native Aurora Cluster Endpoints appeared first on Percona Database Performance Blog.

by Marco Tusa at April 03, 2018 06:51 PM

How to Implement ProxySQL with AWS Aurora

In this post, we’ll look at how to implement ProxySQL with AWS Aurora.

Recently, there have been a few discussions and customer requests that focused on AWS Aurora and how to make the various architectures and solutions more flexible.

Flexible how, you may ask? Well, there are the usual expectations:

  • How do you improve resource utilization?
  • How can I filter (or block) things?
  • Can I shard with Aurora?
  • What is the best way to implement query caching?
  • … and more.

The inclusion of ProxySQL solves many of the points above. We in Consulting design solutions for our customers by applying the different functionalities to better match customer needs. But whenever we dealt with Aurora, we had to exclude ProxySQL because of some limitations in the software.

Now, however, ProxySQL 2.0 supports Aurora, and it does it amazingly well.

This article shows you how to implement ProxySQL with AWS Aurora. The next article, Leveraging ProxySQL with AWS Aurora to Improve Performance, will show you WHY.

The Problem

ProxySQL has two different ways to deal with backend servers. One is using replication mechanisms, as with standard asynchronous replication and Group Replication. The other is to use the scheduler, as in the case of Percona XtraDB Cluster, MariaDB Cluster, etc.

While we can use the scheduler as a solution for Aurora, it is not as immediate and well-integrated as the embedded support for replication, which is why we normally opted not to use it in this specific case (Aurora).

But what WAS the problem with Aurora? An Aurora cluster bases its definition of Writer vs. Readers on the innodb_read_only variable. So, where is the problem? Actually, there was no problem at all, except that ProxySQL, up to version 2, only supported the generic READ_ONLY variable for replication. As such, it was not able to correctly identify the Writer/Readers set.
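
You can check this variable yourself on each instance; a minimal sketch, assuming you connect directly to the instance endpoints rather than the cluster endpoint:

-- On the Aurora writer this returns 0 (OFF); on the readers it returns 1 (ON)
SELECT @@global.innodb_read_only;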

The Solution

In October 2017, this issue was opened (https://github.com/sysown/proxysql/issues/1195), and the result was, as usual, a quite simple and flexible solution.

Brainstorming, a possible solution could be to add another column in mysql_replication_hostgroups to specify what needs to be checked: either read_only, innodb_read_only, or even super_read_only.

This led to the ProxySQL team delivering (commit fe2f16d6df15252f0107a6a224dad7b1efdb13f6):

Added support for innodb_read_only and super_read_only  

MYHGM_MYSQL_REPLICATION_HOSTGROUPS "CREATE TABLE mysql_replication_hostgroups
(writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY ,
reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>=0) ,
check_type VARCHAR CHECK (LOWER(check_type) IN ('read_only','innodb_read_only','super_read_only')) NOT NULL DEFAULT 'read_only' ,
comment VARCHAR NOT NULL DEFAULT '' , UNIQUE (reader_hostgroup))"

Which in short means they added a new column to the mysql_replication_hostgroups table. ProxySQL continues to behave exactly the same and manages the servers and the replication groups as usual. No need for scripts or other crazy stuff.
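
If you want to confirm that your build has the new column, you can inspect the table from the ProxySQL admin interface; a quick check, assuming the default admin port 6032:

-- Run against the ProxySQL admin interface
SHOW CREATE TABLE mysql_replication_hostgroups;
SELECT writer_hostgroup, reader_hostgroup, check_type, comment FROM mysql_replication_hostgroups;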

Implementation

Here we are, the HOW TO part. The first thing to keep in mind is that when you implement a new Aurora cluster, you should always consider having at least two instances in the same AZ and another instance in a remote AZ.

To implement ProxySQL, you should refer directly to the instances, NOT to the cluster entry-point. To be clear, you must take the endpoint of each instance.

The information is available in the web-admin interface, under the instance, or by using the command:

aws rds describe-db-instances

And filter the result for:

"Endpoint": {
                "Port": 3306,
                "Address": "proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com"
            },

To run ProxySQL with RDS in general, you need to install it on an intermediate server or on the application box.

Once you decide which one fits your setup better, you must download or git clone ProxySQL v2.0+.

DO NOT use v1.4.x, as it does not contain these new features and will not work as expected.

Once you have all the Aurora instances up, it is time to configure ProxySQL. Below is an example of all the commands used during the installation:

grant usage, replication client on *.* to monitor@'%' identified by 'monitor';
delete from mysql_servers where hostgroup_id in (70,71);
delete from mysql_replication_hostgroups where writer_hostgroup=70;
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com',70,3306,1000,2000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com',71,3306,1000,2000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('proxysqltestdb2.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com',71,3306,1000,2000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('proxysqltestdb-eu-central-1b.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com',71,3306,1,2000);
INSERT INTO mysql_replication_hostgroups(writer_hostgroup,reader_hostgroup,comment,check_type) VALUES (70,71,'aws-aurora','innodb_read_only');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
delete from mysql_query_rules where rule_id in (50,51,52);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(50,6033,'m8_test',70,0,3,'.',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(51,6033,'m8_test',70,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(52,6033,'m8_test',71,1,3,'^SELECT.*$',1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
delete from mysql_users where username='m8_test';
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('m8_test','test',1,70,'mysql',1);
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
update global_variables set variable_value="67108864" where variable_name='mysql-max_allowed_packet';
update global_variables set Variable_Value=0  where Variable_name='mysql-hostgroup_manager_verbose';
load mysql variables to run;save mysql variables to disk;

The above will give you a ready-to-go ProxySQL setup that supports Aurora cluster, performing all the usual operations ProxySQL does, including proper W/R split and more for a user named ‘m8_test’.

The key is in passing the value ‘innodb_read_only’ for the column check_type in the table mysql_replication_hostgroups.  
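
If you want to see the monitor module at work, ProxySQL records the result of each check it runs against the backends. A quick way to peek at the most recent checks (assuming the standard monitor schema) is:

-- Run against the ProxySQL admin interface
SELECT hostname, port, success_time_us, read_only, error
FROM monitor.mysql_server_read_only_log
ORDER BY time_start_us DESC LIMIT 5;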

To check the status of your ProxySQL, you can use this command (which gives you a snapshot of what is going to happen):

watch -n 1 'mysql --defaults-file=~/.my.cnf -h 127.0.0.1 -P 6032 -t -e "select b.weight, c.* from stats_mysql_connection_pool c left JOIN runtime_mysql_servers b ON  c.hostgroup=b.hostgroup_id and c.srv_host=b.hostname and c.srv_port = b.port where hostgroup in( 50,52,70,71) order by hostgroup,srv_host desc;" -e " select srv_host,command,avg(time_ms), count(ThreadID) from stats_mysql_processlist group by srv_host,command;" -e "select * from stats_mysql_users;";mysql  --defaults-file=~/.my.cnf -h 127.0.0.1 -P 6032  -t -e "select * from stats_mysql_global "|egrep -i  "(mirror|memory|stmt|processor)"'
+--------+-----------+--------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host                                                                 | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+--------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 1000   | 70        | proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com               | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 5491       |
| 1000   | 71        | proxysqltestdb2.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com              | 3306     | ONLINE | 0        | 5        | 5      | 0       | 5           | 73      | 0                 | 5483            | 28442           | 881        |
| 1000   | 71        | proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com               | 3306     | ONLINE | 0        | 5        | 5      | 0       | 5           | 82      | 0                 | 6203            | 32217           | 5491       |
| 1      | 71        | proxysqltestdb-eu-central-1b.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 1593       |
+--------+-----------+--------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
+----------+----------------------+--------------------------+
| username | frontend_connections | frontend_max_connections |
+----------+----------------------+--------------------------+
| m8_test  | 0                    | 10000                    |
+----------+----------------------+--------------------------+
| Query_Processor_time_nsec    | 0              |
| Com_backend_stmt_prepare     | 0              |
| Com_backend_stmt_execute     | 0              |
| Com_backend_stmt_close       | 0              |
| Com_frontend_stmt_prepare    | 0              |
| Com_frontend_stmt_execute    | 0              |
| Com_frontend_stmt_close      | 0              |
| Mirror_concurrency           | 0              |
| Mirror_queue_length          | 0              |
| SQLite3_memory_bytes         | 2652288        |
| ConnPool_memory_bytes        | 712720         |
| Stmt_Client_Active_Total     | 0              |
| Stmt_Client_Active_Unique    | 0              |
| Stmt_Server_Active_Total     | 0              |
| Stmt_Server_Active_Unique    | 0              |
| Stmt_Max_Stmt_id             | 1              |
| Stmt_Cached                  | 0              |
| Query_Cache_Memory_bytes     | 0              |

At this point, you can connect your application and see how ProxySQL allows you to perform much better than the native cluster entry point.
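
A simple sanity check of the R/W split is to compare which backend serves a plain SELECT versus one inside a transaction. This sketch assumes Aurora’s aurora_server_id variable, which identifies the instance serving the connection:

-- Connect to ProxySQL on the traffic port (6033) as m8_test, then:
SELECT @@aurora_server_id; -- routed by rule 52, should land on a reader
BEGIN;
SELECT @@aurora_server_id; -- transaction_persistent=1 keeps this on the writer
COMMIT;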

This will be expanded in the next article: Leveraging ProxySQL with AWS Aurora to Improve Performance.

Conclusions

I had my first issue with the native Aurora connector a long time ago, but I had nothing to replace it. ProxySQL is a very good alternative to standard cluster access, with more options/controls and it also allows us to perform close-to-application caching, which is much more efficient than the remote MySQL one (http://www.proxysql.com/blog/scaling-with-proxysql-query-cache).

In the next article I will illustrate how, in a simple setup, ProxySQL can help in achieving better results than using the default Aurora cluster endpoint.

The post How to Implement ProxySQL with AWS Aurora appeared first on Percona Database Performance Blog.

by Marco Tusa at April 03, 2018 06:50 PM

Jean-Jerome Schmidt

New Webinar: How to Measure Database Availability

Join us on April 24th for Part 2 of our database high availability webinar special!

In this session we will focus on how to measure database availability. It is notoriously hard to measure and report on, although it is an important KPI in any SLA between you and your customer. With that in mind, we will discuss the different factors that affect database availability and see how you can measure your database availability in a realistic way.

It is common enough to define availability in terms of 9s (e.g. 99.9% or 99.999%) - especially here at Severalnines - although there are often different opinions as to what these numbers actually mean, or how they are measured.

Is the database available if an instance is up and running, but it is unable to serve any requests? Or if response times are excessively long, so that users consider the service unusable? Is the impact of one longer outage the same as multiple shorter outages? How do partial outages affect database availability, where some users are unable to use the service while others are completely unaffected?

Not agreeing on precise definitions with your customers might lead to dissatisfaction. The database team might be reporting that they have met their availability goals, while the customer is dissatisfied with the service.

Join us for this webinar during which we will discuss the different factors that affect database availability and see how to measure database availability in a realistic way.

Register for the webinar

Date, Time & Registration

Europe/MEA/APAC

Tuesday, April 24th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, April 24th at 09:00 PDT (US) / 12:00 EDT (US)

Register Now

Agenda

  • Defining availability targets
    • Critical business functions
    • Customer needs
    • Duration and frequency of downtime
    • Planned vs unplanned downtime
    • SLA
  • Measuring the database availability
    • Failover/Switchover time
    • Recovery time
    • Upgrade time
    • Queries latency
    • Restoration time from backup
    • Service outage time
  • Instrumentation and tools to measure database availability:
    • Free & open-source tools
    • CC's Operational Report
    • Paid tools

Register for the webinar

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA, with over 15 years experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

by jj at April 03, 2018 10:33 AM

April 02, 2018

Peter Zaitsev

Plot MySQL Data in Real Time Using Percona Monitoring and Management (PMM)

In this blog post, we’ll show that you can plot MySQL data in real time using Percona Monitoring and Management (PMM).

In my previous blog post, I showed how we could load any metrics or benchmarks into MySQL and visualize them with PMM. But that’s not all! We can visualize almost any kind of data from MySQL in real time. I am falling in love with the MySQL plugin for Grafana — it just makes things so easy and smooth.

This graph shows us the number of visitors to a website in real time (refreshing every 5 seconds).

We have the following table in MySQL:

CREATE TABLE `page_stats` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `visitors` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `time` (`time`)
) ENGINE=InnoDB AUTO_INCREMENT=9232 DEFAULT CHARSET=latin1

We store the number of visitors every second. I am not saying you have to update this table hundreds or thousands of times per second; it depends on how many visitors you have. You could, for example, use Redis to store and increment this counter, and save it into MySQL every second (see the sketch after the sample data below). Here are my metrics:

mysql> select * from page_stats order by id desc limit 10;
+------+---------------------+----------+
| id   | time                | visitors |
+------+---------------------+----------+
| 9446 | 2018-02-27 21:44:12 |      744 |
| 9445 | 2018-02-27 21:44:11 |      703 |
| 9444 | 2018-02-27 21:44:10 |      791 |
| 9443 | 2018-02-27 21:44:09 |      734 |
| 9442 | 2018-02-27 21:44:08 |      632 |
| 9441 | 2018-02-27 21:44:07 |      646 |
| 9440 | 2018-02-27 21:44:06 |      656 |
| 9439 | 2018-02-27 21:44:05 |      678 |
| 9438 | 2018-02-27 21:44:04 |      673 |
| 9437 | 2018-02-27 21:44:03 |      660 |
+------+---------------------+----------+
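
As a hypothetical sketch of the write side, the per-second flush from the counter into MySQL could be as simple as a single INSERT (the value 744 is just an illustrative counter reading):

INSERT INTO page_stats (`time`, visitors) VALUES (NOW(), 744);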

We can easily add my MySQL query to Grafana, and it will visualize it for us:

You might ask “what is $__timeFilter?” I discussed that in the previous post, but let me copy the manual here as well:

Time series:
- return column named time_sec (UTC in seconds), use UNIX_TIMESTAMP(column)
- return column named value for the time point value
- return column named metric to represent the series name
Table:
- return any set of columns
Macros:
- $__time(column) -> UNIX_TIMESTAMP(column) as time_sec
- $__timeFilter(column) ->  UNIX_TIMESTAMP(time_date_time) >= 1492750877 AND UNIX_TIMESTAMP(time_date_time) <= 1492750877
- $__unixEpochFilter(column) ->  time_unix_epoch > 1492750877 AND time_unix_epoch < 1492750877
- $__timeGroup(column,'5m') -> (extract(epoch from "dateColumn")/extract(epoch from '5m'::interval))::int
Or build your own conditionals using these macros which just return the values:
- $__timeFrom() ->  FROM_UNIXTIME(1492750877)
- $__timeTo() ->  FROM_UNIXTIME(1492750877)
- $__unixEpochFrom() ->  1492750877
- $__unixEpochTo() ->  1492750877

What can I visualize?

It’s true! Basically, if you can write a query, you can graph it. For example, let’s count all the visitors in every minute. Here is the query:

select
      UNIX_TIMESTAMP(ps.time) as time_sec,
      sum(visitors) as value,
      'visitors' as metric
   from
   page_stats as ps
   WHERE $__timeFilter(time)
   GROUP BY DATE_FORMAT(`time`, '%Y-%m-%d %H:%i')
    ORDER BY ps.time ASC;

And it gives us the following graph:

See, it’s easy! 🙂

Conclusion

There is no longer any excuse for not visualizing your data! Percona Monitoring and Management lets you plot MySQL data in real time. You do not have to move it anywhere or change anything! Just grant read access from PMM, and you can start to create your own graphs!

The post Plot MySQL Data in Real Time Using Percona Monitoring and Management (PMM) appeared first on Percona Database Performance Blog.

by Tibor Korocz at April 02, 2018 11:37 PM

Migrate to Amazon RDS Using Percona Xtrabackup

In this blog post, we’ll look at how to migrate to Amazon RDS using Percona XtraBackup.

Until recently, there was only one way to migrate your data from an existing MySQL instance into a new RDS MySQL instance: take and restore a logical backup with mysqldump or mydumper. This can be slow and error-prone. When Amazon introduced Amazon Aurora MySQL, you could use Percona XtraBackup to take an online physical backup of your database and restore that into a new Aurora instance. This feature is now available for RDS MySQL as well. Using Percona XtraBackup instead of a logical backup can save a lot of time, especially with a large dataset.

There are many caveats and limitations listed in Amazon’s documentation, but the most important ones are:

  • Source and destination databases must be MySQL 5.6. Earlier and later major versions are not supported at this time.
  • You can’t restore into an existing RDS instance using this method.
  • The total data size is limited to 6 TB.
  • User accounts, functions, and stored procedures are not imported automatically.
  • You can’t choose which databases and tables to migrate this way — migrate the whole instance. (You can’t use Percona Xtrabackup’s partial backup feature when migrating to RDS.)

If those limitations don’t apply to your use case, read on to learn how to migrate to Amazon RDS by taking a backup with Percona XtraBackup and restoring it into RDS.

Demonstration

For this demonstration, I created a Percona Server for MySQL 5.6 instance on EC2 with the sakila sample database and an extra InnoDB table. I filled the table with junk data to make the total data size about 13.5 GB. Then I installed the latest percona-xtrabackup-24  (2.3 would also have worked) and the AWS CLI tools. I took a backup from the EC2 instance with this command, using gzip to create a compressed backup:

sudo xtrabackup --backup --stream=tar | gzip -c > /data/backups/xtrabackup.tar.gz

Note that Amazon prepares the backup, so there’s no need to run xtrabackup --prepare yourself.

For comparison, I took a mysqldump backup as well:

mysqldump --all-databases --triggers --events --routines --master-data=1 --single-transaction | gzip -c > /data/backups/mysqldump.sql.gz

I could have used mydumper to make this process multi-threaded, but to reduce complexity I did not. I then uploaded the backup to an S3 bucket (setting up credentials beforehand):

sudo aws s3 cp /data/backups/xtrabackup.tar.gz s3://dankow/

After that, I navigated to Relational Database Service in the AWS Console, and instead of clicking Launch DB Instance, I clicked Restore from S3. After that, the process is almost identical to creating a normal RDS MySQL or Amazon Aurora MySQL instance, with the addition of this box on Step 2:

I chose a db.m4.xlarge instance with 1000 Provisioned IOPS for this test. After I configured all the other options, I clicked “Launch DB Instance” and waited for my backup to be decompressed, prepared, and restored into a new RDS instance.

For time comparison, I imported the backup I took with mysqldump, ignoring all the expected errors about privileges because they don’t affect the tables that we’re really interested in:

gunzip -c /data/backups/mysqldump.sql.gz | mysql --defaults-file=rds.cnf --force

Replication

If you’re planning on migrating a non-RDS instance to RDS, you might want to make your new RDS instance an async replica of the source instance. If there is a network path between the two instances, this is simple. Use the binary log coordinates from the xtrabackup_binlog_info file (RDS does not support master_auto_position with GTID replication), and use them as arguments to the RDS external replication stored procedures, like this:

CALL mysql.rds_set_external_master (
  '<host_name>',
  <host_port>,
  '<replication_user_name>',
  '<replication_password>',
  '<mysql_binary_log_file_name>',
  <mysql_binary_log_file_position>,
  0
);
CALL mysql.rds_start_replication;
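
Once replication is configured, you can watch the RDS instance catch up with the usual command; nothing RDS-specific is needed here:

SHOW SLAVE STATUS\G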

Currently, there is no way to make this connection use SSL. If the source instance is not in the same VPC as the RDS instance, set up a VPN connection between the two networks in order to protect the replication traffic.

Time Comparison

The time to back up was close: 8 minutes for Percona XtraBackup, and 7.5 minutes for mysqldump. Add the time to copy the backup to S3 (37 seconds), and the two methods are almost identical. The difference comes with restore time. The mysqldump backup took 22.5 minutes to restore, and Amazon took 10 minutes and 50 seconds to create the RDS instance from the backup. Some part of that is the normal overhead of creating an RDS instance, which always takes a few minutes.

Although my test dataset was small (13.5 GB) compared to most production databases, it was large enough to show a significant difference between physical (Percona XtraBackup) and logical (mysqldump) backups. The XtraBackup method was about 60% faster than mysqldump. If your dataset is larger, you will see even more of a difference.

Conclusion

When you migrate to Amazon RDS using a physical backup, it can be much faster than using a logical backup — but it’s not the right option for every use case. If your InnoDB tablespaces have significant fragmentation, or if you’re not currently using innodb_file_per_table, you may want to perform a logical migration to fix those issues. If you normally create RDS instances programmatically, the AWS CLI does not currently support creating an RDS instance from a physical backup. Any corruption in the InnoDB files transfers over to the RDS instance if you use a physical backup, but a logical backup will fail and allow you to fix the corruption before it gets to RDS.

For many use cases, however, building an RDS instance from Percona XtraBackup is a convenient way to get your data into RDS MySQL or Aurora relatively quickly. In this one small-scale test, migrating using XtraBackup was 60% faster than using mysqldump.

The post Migrate to Amazon RDS Using Percona Xtrabackup appeared first on Percona Database Performance Blog.

by Daniel Kowalewski at April 02, 2018 11:00 PM

MongoDB Data at Rest Encryption Using eCryptFS

In this post, we’ll look at MongoDB data at rest encryption using eCryptFS, and how to deploy a MongoDB server using encrypted data files.

When dealing with data, a good security policy should enforce the use of non-trivial passwords, the use of encrypted connections and, hopefully, encrypted files on the disks.

Only the MongoDB Enterprise edition has an “engine encryption” feature. The Community edition and Percona Server for MongoDB don’t (yet). This is why I’m going to introduce a useful way to achieve data encryption at rest for MongoDB, using a simple but effective tool: eCryptFS.

eCryptFS is an enterprise-class stacked cryptographic filesystem for Linux. You can use it to encrypt partitions, or even any folder that doesn’t use a partition of its own, no matter the underlying filesystem or partition type. For more information about this tool, visit the official website: http://ecryptfs.org/.

I’m using Ubuntu 16.04 and I have Percona Server for MongoDB already installed on the system. The data directory (dbpath) is in /var/lib/mongodb.

Preparation of the encrypted directory

First, let’s stop mongod if it’s running:

sudo service mongod stop

Install eCryptFS:

sudo apt-get install ecryptfs-utils

Create two new directories:

sudo mkdir /datastore
sudo mkdir /var/lib/mongo-encrypted

We’ll use the /datastore directory as the folder where we copy all of mongod’s files, and have them automatically encrypted. It’s also useful for testing later that everything is working correctly. The folder /var/lib/mongo-encrypted is the mount point we’ll use as the new data directory for mongod.

Mount the encrypted directory

Now it’s time to use eCryptFS to mount the /datastore folder and define it as encrypted. Launch the command as follows, choose a passphrase and respond to all the questions with the default proposed value. In a real case, choose the answers that best fit you, and a complex passphrase:

root@psmdb1:~# sudo mount -t ecryptfs /datastore /var/lib/mongo-encrypted
Passphrase:
Select cipher:
1) aes: blocksize = 16; min keysize = 16; max keysize = 32
2) blowfish: blocksize = 8; min keysize = 16; max keysize = 56
3) des3_ede: blocksize = 8; min keysize = 24; max keysize = 24
4) twofish: blocksize = 16; min keysize = 16; max keysize = 32
5) cast6: blocksize = 16; min keysize = 16; max keysize = 32
6) cast5: blocksize = 8; min keysize = 5; max keysize = 16
Selection [aes]:
Select key bytes:
1) 16
2) 32
3) 24
Selection [16]:
Enable plaintext passthrough (y/n) [n]:
Enable filename encryption (y/n) [n]:
Attempting to mount with the following options:
 ecryptfs_unlink_sigs
 ecryptfs_key_bytes=16
 ecryptfs_cipher=aes
 ecryptfs_sig=f946e4b85fd84010
Mounted eCryptfs

If you see Mounted eCryptfs as the last line, everything went well. Now you have the folder /datastore encrypted. Any file you create or copy into this folder is automatically encrypted by eCryptFS. Also, you have mounted the encrypted folder into the path /var/lib/mongo-encrypted.

For the sake of security, you can verify with the mount command that the directory is correctly mounted. You should see something similar to the following:

root@psmdb1:~# sudo mount | grep crypt
/datastore on /var/lib/mongo-encrypted type ecryptfs (rw,relatime,ecryptfs_sig=f946e4b85fd84010,ecryptfs_cipher=aes,ecryptfs_key_bytes=16,ecryptfs_unlink_sigs)

Copy mongo files

sudo cp -r /var/lib/mongodb/* /var/lib/mongo-encrypted

We copy all the files from the existing mongo data directory into the new path.

Since we are working as root (or we used sudo -s at the beginning), we need to change the ownership of the files to the mongod user, the default user for the database server. Otherwise, mongod won’t start:

sudo chown -R mongod:mongod /var/lib/mongo-encrypted/

Modify mongo configuration

Before starting mongod, we have to change the configuration in /etc/mongod.conf to instruct the server to use the new folder. So, change the line with dbpath as follows and save the file:

dbpath=/var/lib/mongo-encrypted

Launch mongod and verify

So, it’s time to start mongod, connect with the mongo shell and verify that it’s working as usual:

root@psmdb1:~# sudo service mongod start

The server works correctly and is unaware of the encrypted files because eCryptFS itself takes care of encryption and decryption activities at a lower level. There’s a little price to pay in terms of performance, as in every system that uses encryption, but we won’t worry about that since our first goal is security. In any case, eCryptFS has a small footprint.

Now, let’s verify the files directly.

Since the encrypted folder is mounted and automatically managed by eCryptFS, we can see the content of the files. Let’s have a look:

root@psmdb1:~# cat /var/lib/mongo-encrypted/mongod.lock
6965

But if we look at the same file into /datastore, we see weird characters:

root@psmdb1:~# cat /datastore/mongod.lock
�0���k�"3DUfw`�Pp�Ku�����b�_CONSOLE�F�_�@��[�'�b��^�җfZ�7

As expected.

Make encrypted dbpath persistent

Finally, if you want to automatically mount the encrypted directory at startup, add the following line into /etc/fstab:

/datastore /var/lib/mongo-encrypted ecryptfs defaults 0 0

Create the file .ecryptfsrc in the /root directory with the following lines:

key=passphrase:passphrase_passwd_file=/root/passphrase.txt
ecryptfs_sig=f946e4b85fd84010
ecryptfs_cipher=aes
ecryptfs_key_bytes=16
ecryptfs_passthrough=n
ecryptfs_enable_filename_crypto=n

You can find the value of the variable ecryptfs_sig in the file /root/.ecryptfs/sig-cache.txt.

Create the file /root/passphrase.txt containing your secret passphrase. The format is as follows:

passphrase_passwd=mypassphrase

Now you can reboot the system and have the encrypted directory mounted at startup.

Tip: it is not a good idea to keep a plain text file with your passphrase on the server. For a better security level, you can place this file on a USB key (for example) that you mount at startup, or you can use some sort of wallet tool to protect your passphrase.

Conclusion

Security is more and more a “must have” that customers are requesting of anyone managing their data. This how-to guide shows that achieving data at rest encryption for MongoDB is not so complicated.

The post MongoDB Data at Rest Encryption Using eCryptFS appeared first on Percona Database Performance Blog.

by Corrado Pandiani at April 02, 2018 04:46 PM

April 01, 2018

Valeriy Kravchuk

Fun with Bugs #64 - On MySQL Bug Reports I am Subscribed to, Part IV

I've subscribed to more than 15 new MySQL bug reports since the previous post in this series, so it's time for a new one. I am trying to follow important, funny or hard to process bug reports every day. Here is the list of the most interesting recent ones starting from the latest (with several still not processed properly):
  • Bug #90211 - "Various warnings and errors when compiling MySQL 8 with Clang".  Roel Van de Paar and Percona in general continue their QA efforts in a hope to make MySQL 8 better. Current opinion of Oracle engineers on this bug is the following:
    "First of all, these issues are in protobuf, not MySQL per se. There are some warnings with Clang 6, but since they're in third-party code, we have simply disabled them when compiling protobuf (will be part of 8.0.11). Optionally, -DUSE_SYSTEM_LIBS=1 will use system protobuf and thus not compile the files in question.
    As for the crash, we don't support prerelease compilers (more generally, we support platforms, not compilers). Given the stack trace, it is highly likely that the issue either is in the prerelease Clang, or in protobuf.
    "
    Let's see how it may end up. Roel rarely gives up easily...
  • Bug #90209 - "Performance regression with > 15K tables in MySQL 8.0 (with general tablespaces)". Nice regression bug report from Alexander Rubin. It is still "Open".
  • Bug #90190 - "Sig=6 assertion in MYSQL_BIN_LOG::new_file_impl | binlog.cc:6862". Yet another bug report from Percona employee, Ramesh Sivaraman.
  • Bug #89994 - "INDEX DIRECTORY shown as valid option for InnoDB table creation". Everybody knows how much I like the fine MySQL manual. I like it even more when missing or wrong details are found there, as in this case reported by my colleague from MariaDB, Claudio Nanni.
  • Bug #89963  - "Slowdown in creating new SSL connection". Maybe it's comparing apples to oranges, as stated in one of the comments, but I am surprised that this (performance regression) bug report by Rene' Cannao' is still "Open". It requires more attention, IMHO. Speed of connections matters a lot for MySQL.
  • Bug #89904 - "Can't change innodb_max_dirty_pages_pct to 0 to flush all pages". Good intentions to set better default value (applied a bit later than needed) led to the problem. As Simon Mudd put it:
    "innodb_max_dirty_pages_pct_lwm setting has existed since 5.6. This issue only comes up as by changing the default value to 10 those of us who have ignored it until now never noticed it existed. That is a shame as setting this value to a value other than 0 (e.g. 10 which is the new default) should be better and trigger some background flushing of dirty pages avoiding us hitting innodb_max_dirty_pages_pct which would trigger much more aggressive behaviour which is not really desirable."
  • Bug #89876 - "mysqladmin flush-hosts is not safe in GTID mode". Yet another bug report from Simon Mudd. See also Bug #88720 that highlights even more problems with various FLUSH statements and GTIDs.
  • Bug #89870 - "Group by optimization not used with partitioned tables". For some reason this report from Arnaud Adant is still "Open". As my colleague Richard Stracke stated:
    "The only solution would be, that the optimizer is able to check, if the condition in the where clause include the whole table (or partition) and in this case use group by optimization."
  • Bug #89860 - "XA may lost prepared transaction and cause different between master and slave." As this (and other, like Bug #88534) bug report from Michael Yang shows, there is still a long way to go until it would be safe to use XA transactions with MySQL.
  • Bug #89834 - "Replication will not connect on IPv6 - does not function in an IPv6 only environ". This bug report from Tim St. Pierre is still "Open".
  • Bug #89822 - "InnoDB retries open on EINTR error only if innodb_use_native_aio is enabled". We have patch contributed by Laurynas Biveinis from Percona.
  • Bug #89758 - "Conversion from ENUM to VARCHAR fails because mysql adds prefix index". This funny bug was found and reported by Monty Solomon.
  • Bug #89741 - "Events log Note level messages even for log_warnings=0". Nikolai Ikhalainen found that this problem happens only in versions 5.5.x and 5.6.x, so chances to see it fixed are low. But I still want to know if this ever happens.
  • Bug #89696 - "Cyclic dependencies are not resolved properly with cascade removal". Make sure to check the nice discussion that my dear friend Sinisa Milivojevic had with the bug reporter, Andrei Anishchenko, before marking the bug as "Verified". This regression was most likely caused by a change in MySQL 5.7.21:
    "InnoDB: An iterative approach to processing foreign cascade operations resulted in excessive memory use. (Bug #26191879, Bug #86573)"
  • Bug #89625 - "please package the debug symbols *.pdb files!". Shane Bester always cared about having a way to debug on Windows. Recently I also started to care about this...
---
It's April Fools' Day today, so why not make a fool of myself by assuming that anyone cares about this series of blog posts.

by Valeriy Kravchuk (noreply@blogger.com) at April 01, 2018 02:42 PM

March 30, 2018

Peter Zaitsev

Multi-Source Replication Performance with GTID

In this blog post, we’ll look at the performance of multi-source replication with GTID.

Multi-Source Replication is a topology I’ve seen discussed recently, so I decided to look into how it performs with the different replication concepts. Multi-source replication uses replication channels, which allow a slave to replicate from multiple masters. This is a great way to consolidate data that has been sharded for production, or to simplify the analytics process by using the same server. Since multiple masters are taking writes, care is needed to not overload the slave. The traditional replication concept uses the binary log file name and the position inside that file.

This was the standard until the release of global transaction identifiers (GTID). I have set up a test environment to validate which concept would perform better, and be a better choice for use in this topology.

SETUP

My test suite is rather simple, consisting of only three virtual machines, two masters and one slave. The slaves’ replication channels are set up using the same concept for each run, and no run had any replication filters. To prevent any replication errors, each master took writes against a different schema and user grants are identical on all three servers. The setup below ran with both replication channels using binary log file and position. Then the tables were dropped and the servers changed to use GTID for the next run.
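
For reference, each channel on the slave is created with the standard CHANGE MASTER ... FOR CHANNEL syntax. A hypothetical sketch for the GTID run (host names and credentials are placeholders) looks like this:

CHANGE MASTER TO
  MASTER_HOST='master-db1', MASTER_USER='repl', MASTER_PASSWORD='replpass',
  MASTER_AUTO_POSITION=1
FOR CHANNEL 'db1';
-- For the binary log file/position runs, replace MASTER_AUTO_POSITION=1 with
-- MASTER_LOG_FILE='...' and MASTER_LOG_POS=... taken from the master.
START SLAVE FOR CHANNEL 'db1';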

Prepare the sysbench tables:

sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db1 --range_size=100 --table_size=1000000 --tables=5 --threads=5 --events=0 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua prepare
sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db3 --range_size=100 --table_size=1000000 --tables=5 --threads=5 --events=0 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua prepare

I used a read-only sysbench to warm up the InnoDB buffer pool. Both commands ran on the slave to ensure both schemas were loaded into the buffer pool:

sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db1 --range_size=100 --table_size=1000000 --tables=5 --threads=5 --events=0 --time=3600 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua run
sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db3 --range_size=100 --table_size=1000000 --tables=5 --threads=5 --events=0 --time=3600 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua run

After warming up the buffer pool, the slave should be fully caught up with both masters. To remove IO contention as a possible influencer, I stopped the SQL thread while I generated load on the master. Leaving the IO thread running allowed the slave to write the relay logs during this process, and help ensure that the test only measures the difference in the slave SQL thread.

stop slave sql thread for channel 'db1'; stop slave sql thread for channel 'db3';

Each master had a sysbench run against it for the schema that was designated to it in order to generate the writes:

sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db1 --range_size=100 --table_size=1000000 --tables=5 --threads=1 --events=0 --time=3600 --rand-type=uniform /usr/share/sysbench/oltp_write_only.lua run
sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db3 --range_size=100 --table_size=1000000 --tables=5 --threads=1 --events=0 --time=3600 --rand-type=uniform /usr/share/sysbench/oltp_write_only.lua run

Once the writes completed, I monitored the IO activity on the slave to ensure it was 100% idle and that all relay logs were fully captured. Once everything was fully written, I enabled a capture of the replication lag once per minute for each replication channel, and started the slave's SQL threads:

/usr/bin/pt-heartbeat -D db1 -h localhost --master-server-id=101 --check
/usr/bin/pt-heartbeat -D db3 -h localhost --master-server-id=103 --check
start slave sql thread for channel 'db1'; start slave sql thread for channel 'db3';

The above chart depicts the cumulative lag seen on the slave by pt-heartbeat since starting the SQL thread. The first item to notice is that the replication delay was higher overall with the binary log. This could be because the SQL thread stopped for a different amount of time. This may appear to give GTID an advantage in this test, but remember that with this test the amount of delay is less important than the processed rate. Focusing on when replication began to make a significant move towards catching up, you can see that there are two distinct drops in delay. This is caused by the fact that the slave has two replication threads that individually monitor their delay. One of the replication threads caught up fully and the other was delayed for a bit longer.

In every test run, GTID took slightly longer to fully catch up than the traditional method. There are a couple of reasons to expect GTIDs to be slightly slower. One possibility is that there are additional writes on the slave, in order to keep track of all the GTIDs that the slave ran. I removed the initial write to the relay log, but we must retain the committed GTID, and this causes additional writes. I used the default settings for MySQL, and as such log_slave_updates was disabled. This causes the replicated GTID to be stored in a table, which is periodically compressed. You can find more details on how log_slave_updates impacts GTID replication here.
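
If you want to see this bookkeeping yourself, the replicated GTIDs end up in the mysql.gtid_executed table in MySQL 5.7, which is the table that gets periodically compressed:

SELECT * FROM mysql.gtid_executed LIMIT 5;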

So the question still exists: why should we use GTID, especially with multi-source replication? I’ve found that the answer lies in the composition of a GTID. From MySQL’s GTID Concepts, a GTID is composed of two parts, the source_id and the transaction_id. The source_id is a unique identifier targeting the server which originally wrote the transaction. This allows you to identify in the binary log which master took the initial write, so you can pinpoint problems much more easily.

The below excerpt from DB1’s binary log (a master from this test) shows that, before the transaction is written, the “SET @@SESSION.GTID_NEXT” statement runs. This is the GTID that you can follow through the rest of the topology to identify the same transaction.

“d1ab72e9-0220-11e8-aee7-00155dab6104” is the server_uuid for DB1, and 270035 is the transaction id.

SET @@SESSION.GTID_NEXT= 'd1ab72e9-0220-11e8-aee7-00155dab6104:270035'/*!*/;
# at 212345
#180221 15:37:56 server id 101 end_log_pos 212416 CRC32 0x758a2d77 Query thread_id=15 exec_time=0 error_code=0
SET TIMESTAMP=1519245476/*!*/;
BEGIN
/*!*/;
# at 212416
#180221 15:37:56 server id 101 end_log_pos 212472 CRC32 0x4363b430 Table_map: `db1`.`sbtest1` mapped to number 109
# at 212472
#180221 15:37:56 server id 101 end_log_pos 212886 CRC32 0xebc7dd07 Update_rows: table id 109 flags: STMT_END_F
### UPDATE `db1`.`sbtest1`
### WHERE
### @1=654656 /* INT meta=0 nullable=0 is_null=0 */
### @2=575055 /* INT meta=0 nullable=0 is_null=0 */
### @3='20363719684-91714942007-16275727909-59392501704-12548243890-89454336635-33888955251-58527675655-80724884750-84323571901' /* STRING(120) meta=65144 nullable=0 is_null=0 */
### @4='97609582672-87128964037-28290786562-40461379888-28354441688' /* STRING(60) meta=65084 nullable=0 is_null=0 */
### SET
### @1=654656 /* INT meta=0 nullable=0 is_null=0 */
### @2=575055 /* INT meta=0 nullable=0 is_null=0 */
### @3='17385221703-35116499567-51878229032-71273693554-15554057523-51236572310-30075972872-00319230964-15844913650-16027840700' /* STRING(120) meta=65144 nullable=0 is_null=0 */
### @4='97609582672-87128964037-28290786562-40461379888-28354441688' /* STRING(60) meta=65084 nullable=0 is_null=0 */
# at 212886
#180221 15:37:56 server id 101 end_log_pos 212942 CRC32 0xa6261395 Table_map: `db1`.`sbtest3` mapped to number 111
# at 212942
#180221 15:37:56 server id 101 end_log_pos 213166 CRC32 0x2782f0ba Write_rows: table id 111 flags: STMT_END_F
### INSERT INTO `db1`.`sbtest3`
### SET
### @1=817058 /* INT meta=0 nullable=0 is_null=0 */
### @2=390619 /* INT meta=0 nullable=0 is_null=0 */
### @3='01297933619-49903746173-24451604496-63437351643-68022151381-53341425828-64598253099-03878171884-20272994102-36742295812' /* STRING(120) meta=65144 nullable=0 is_null=0 */
### @4='29893726257-50434258879-09435473253-27022021485-07601619471' /* STRING(60) meta=65084 nullable=0 is_null=0 */
# at 213166
#180221 15:37:56 server id 101 end_log_pos 213197 CRC32 0x5814a60c Xid = 2313
COMMIT/*!*/;
# at 213197

Conclusion

Based on the sysbench tests I ran, GTID replication has a slightly lower throughput. It took about two to three minutes longer to process an hour worth of writes on two masters, compared to binary log replication. GTID’s strengths lie more in how it eases the management and troubleshooting of complex replication topologies.

The GTID concept allows a slave to know exactly which server initially wrote the transaction, even in a tiered environment. This means that if you need to promote a slave from the bottom tier, to the middle tier, simply changing the master is all that is needed. The slave can pick up from the last transaction it ran on that server and continue replicating without a problem. Stephane Combaudon explains this in detail in a pair of blogs. You can find part 1 here and part 2 here. Facebook also has a great post about their experience deploying GTID-based replication and the troubles they faced.

The post Multi-Source Replication Performance with GTID appeared first on Percona Database Performance Blog.

by Bradley Mickel at March 30, 2018 07:49 PM

MongoDB 3.6 Retryable Writes . . . Retryable Writes

In this blog post, we will discuss MongoDB 3.6 Retryable Writes, a new application-level feature.

Background

From the beginning, MongoDB replica sets were designed to recover gracefully from many internal problems or events such as node crashes, network partitions/errors/interruptions, replica set member fail-overs, etc.

While these events eventually recover transparently to the overall replica set, in many instances these events return errors to the application. The most common example is a failover of the Primary during a write: this returns network errors to most MongoDB drivers. Another possible situation is a Primary receiving a write, but the acknowledgment response never makes it back to the driver. Here it is unclear to the application if the write really succeeded or not.

If an application is designed for writes to be idempotent, generally all the application needs to do in a problem scenario is send the same write operation again and again until it succeeds. This approach is extremely dangerous to data integrity, however, if the application was not designed for idempotent writes! Retrying writes relying on state can lead to incorrect results.

MongoDB 3.6 Retryable Writes

MongoDB 3.6 introduces the concept of Retryable Writes to address situations where simple retrying of idempotent operations is not possible or desired (often more code is required to perform retries). This feature is implemented transparently via the use of unique IDs for each write operation that both the MongoDB driver and server can consider when handling failures.

This feature allows the application driver to automatically retry a failed write behind-the-scenes, without throwing an exception/error to the application. Retryable Writes mitigates problems caused by short interruptions, not long-term problems. Therefore, the mechanism only retries a write operation exactly once. If the retry is unsuccessful, then the application receives an error/exception as normal.

If a healthy Primary cannot be found to retry the write, the MongoDB driver waits for a time period equal to the serverSelectionTimeoutMS server parameter before retrying the write, so that it can allow for a failover to occur.

MongoDB implemented this feature in both the MongoDB driver and server, and it has some requirements:

  • MongoDB Version – every node in the cluster or replica set must run version 3.6 or greater. All nodes must also have featureCompatibilityVersion set to ‘3.6’.
  • MongoDB Driver – this feature requires that your application use a MongoDB driver that supports it.
  • Replication – The Retryable Writes feature requires that MongoDB Replication is enabled. You can use a single-node Replica Set to achieve this if you do not wish to deploy many nodes.
  • Write Concern – A Write Concern of ‘1’ or greater is required for this feature to operate.
  • Storage Engine – The use of MMAPv1 is not possible with this feature. WiredTiger or inMemory storage engines only!

With the exception of insert operations, this feature is limited to operations that change only a single document, meaning the following operations cannot use Retryable Writes:

  1. Multi-document Update (multi: true)
  2. Multi-document Delete
  3. Bulk Operations with Multi-document changes

The full list of operations available for use with this feature is here: https://docs.mongodb.com/manual/core/retryable-writes/#retryable-write-operations.

Using Retryable Writes

Enabling Retryable Writes doesn’t require major code changes!

Generally, you enable the use of Retryable Writes by adding the ‘retryWrites=’ flag to your MongoDB connection string that is passed to your MongoDB driver:

mongodb://localhost/?retryWrites=true

You enable the feature on the ‘mongo’ shell with the command-line flag ‘--retryWrites’:

mongo --retryWrites

That’s it! The rest is transparent to you!

Conclusion

The MongoDB 3.6 Retryable Writes feature continues a theme I’ve noticed in the last few major releases: improved data integrity and improved development experience.

The use of this great new feature should lead to simplified code and improved data integrity in applications using non-idempotent changes!

The post MongoDB 3.6 Retryable Writes . . . Retryable Writes appeared first on Percona Database Performance Blog.

by Tim Vaillancourt at March 30, 2018 07:21 PM

Percona XtraBackup 2.4.10 Is Now Available

Percona announces the GA release of Percona XtraBackup 2.4.10 on March 30, 2018. This release is based on MySQL 5.7.19. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

Starting from now, the Percona XtraBackup issue tracking system has moved from Launchpad to JIRA.

Bugs Fixed:

  • xbcrypt with the --encrypt-key-file option was failing due to a regression in Percona XtraBackup 2.4.9. Bug fixed PXB-518.
  • Simultaneous usage of both the --lock-ddl and --lock-ddl-per-table options caused Percona XtraBackup to lock up, with the backup process never completing. Bug fixed PXB-792.
  • Compilation under Mac OS X was broken. Bug fixed PXB-796.
  • A regression in the maximum number of pending reads, along with a previously unnoticed possibility of a pending-reads-related deadlock, caused Percona XtraBackup to get stuck in the prepare stage. Bug fixed PXB-1467.
  • Percona XtraBackup skipped tablespaces with a corrupted first page instead of aborting the backup. Bug fixed PXB-1497.

Other bugs fixed: PXB-513.

Release notes with all the bugfixes for version 2.4.10 are available in our online documentation. Please report any bugs to the issue tracker.

The post Percona XtraBackup 2.4.10 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at March 30, 2018 04:43 PM

Analyze MySQL Audit Logs with ClickHouse and ClickTail

In this blog post, I’ll look at how you can analyze MySQL audit logs (Percona Server for MySQL) with ClickHouse and ClickTail.

Audit logs are available with a free plugin for Percona Server for MySQL (https://www.percona.com/doc/percona-server/LATEST/management/audit_log_plugin.html). Besides providing insights about activity on your server, you might need the logs for compliance purposes.
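
If the plugin is not loaded yet, enabling it in Percona Server for MySQL is a one-liner; note that the JSON output format (which is what ClickTail parses) cannot be changed at runtime:

INSTALL PLUGIN audit_log SONAME 'audit_log.so';
-- audit_log_format is read-only; set it in my.cnf before restarting, e.g.:
-- [mysqld]
-- audit_log_format = JSON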

However, on an active server, the logs can get very large. Under a sysbench-tpcc workload, for example, I was able to generate 24GB worth of logs just within one hour.

So we are going to use the ClickTail tool, which Peter Zaitsev mentioned in Analyze Your Raw MySQL Query Logs with ClickHouse and the Altinity team describes in the ClickTail Introduction.

Clicktail extracts all fields available in Percona Server for MySQL’s audit log in JSON format, as you can see in Schema. I used the command:

clicktail --dataset='clicktail.mysql_audit_log' --parser=mysqlaudit --file=/mnt/nvmi/mysql/audit.log --backfill

In my setup, ClickTail imported records at a rate of 1.5 to 2 million records/minute. Once we have ClickTail set up, we can do some work on the audit logs. Below are some examples of queries.

Check if some queries were run with errors:

SELECT
    status AS c1,
    count(*)
FROM mysql_audit_log
GROUP BY c1
┌───c1─┬──count()─┐
│    0 │ 46197504 │
│ 1160 │        1 │
│ 1193 │     1274 │
│ 1064 │     5096 │
└──────┴──────────┘
4 rows in set. Elapsed: 0.018 sec. Processed 46.20 million rows, 184.82 MB (2.51 billion rows/s., 10.03 GB/s.)

First, it is very impressive to see a rate of 2.5 billion rows/s analyzed. And second, there really are some queries with non-zero (error) statuses.

We can dig in and check what exactly caused a 1193 error (MySQL Error Code: 1193. Unknown system variable):

SELECT *
FROM mysql_audit_log
WHERE status = 1193
LIMIT 1
┌───────────────_time─┬──────_date─┬─_ms─┬─command_class─┬─connection_id─┬─db─┬─host──────┬─ip─┬─name──┬─os_user─┬─os_login─┬─os_version─┬─mysql_version─┬─priv_user─┬─proxy_user─┬─record───────────────────────┬─sqltext────────────────────────────┬─status─┬─user──────────────────────┬─startup_optionsi─┐
│ 2018-03-12 20:34:49 │ 2018-03-12 │   0 │ select        │          1097 │    │ localhost │    │ Query │         │          │            │               │           │            │ 39782055_2018-03-12T20:21:21 │ SELECT @@query_response_time_stats │   1193 │ root[root] @ localhost [] │                  │
└─────────────────────┴────────────┴─────┴───────────────┴───────────────┴────┴───────────┴────┴───────┴─────────┴──────────┴────────────┴───────────────┴───────────┴────────────┴──────────────────────────────┴────────────────────────────────────┴────────┴───────────────────────────┴──────────────────┘

So this was SELECT @@query_response_time_stats, which I believe comes from the Percona Monitoring and Management (PMM) MySQL Metrics exporter.

Similarly, we can check what query types were run on MySQL:

SELECT
    command_class,
    count(*)
FROM mysql_audit_log
GROUP BY command_class
┌─command_class────────┬──count()─┐
│                      │    15882 │
│ show_storage_engines │     1274 │
│ select               │ 26944474 │
│ error                │     5096 │
│ show_slave_status    │     1274 │
│ begin                │  1242555 │
│ update               │  9163866 │
│ show_tables          │      204 │
│ show_status          │     6366 │
│ insert_select        │      170 │
│ delete               │   539058 │
│ commit               │  1237074 │
│ create_db            │        2 │
│ show_engine_status   │     1274 │
│ show_variables       │      450 │
│ set_option           │     8102 │
│ create_table         │      180 │
│ rollback             │     5394 │
│ create_index         │      120 │
│ insert               │  7031060 │
└──────────────────────┴──────────┘
20 rows in set. Elapsed: 0.120 sec. Processed 46.20 million rows, 691.84 MB (385.17 million rows/s., 5.77 GB/s.)

There are more fields available, like:

db String,
host String,
ip String,

to understand who accessed a MySQL instance, and from where.
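
For example, a quick breakdown of activity by account and origin could look like this (a sketch against the same table, using the column names from the row shown earlier):

SELECT
    user,
    host,
    ip,
    count(*) AS queries
FROM mysql_audit_log
GROUP BY user, host, ip
ORDER BY queries DESC
LIMIT 10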

If you ever need to do some advanced work with MySQL audit logs, consider doing it with ClickHouse and ClickTail!

The post Analyze MySQL Audit Logs with ClickHouse and ClickTail appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at March 30, 2018 01:07 AM

March 29, 2018

Peter Zaitsev

Using ProxySQL and VIRTUAL Columns to Solve ORM Issues

In this blog post, we’ll look at using ProxySQL and VIRTUAL columns to solve ORM issues.

There are a lot of web frameworks all around. Programmers and web designers are using them to develop and deploy any website and web application. Just to cite some of the most famous names: Drupal, Ruby on Rails, Symfony, etc.

Web frameworks are very useful tools. But sometimes, as with many human artifacts, they have issues. Any framework has its own queries to manage its internal tables. While there is nothing wrong with that, it often means these queries are not optimized.

Here is my case with Symfony 2 on MySQL 5.7, and how I solved it.

The sessions table issue

Symfony has a table to manage session data for users of the application. The table is defined as follows:

CREATE TABLE `sessions` (
 `sess_id` varchar(126) COLLATE utf8_bin NOT NULL,
 `sess_data` blob NOT NULL,
 `sess_time` int(10) unsigned NOT NULL,
 `sess_lifetime` mediumint(9) NOT NULL,
 PRIMARY KEY (`sess_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

The expiration time of the user session is configurable. The developers decided to configure it to be one month.

Symfony was serving a high-traffic website, and very soon that table became very big. After one month, I saw it had more than 14 million rows and was more than 3GB in size.

mysql> SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH
    -> FROM information_schema.tables WHERE table_schema='symfony' AND table_name='sessions'\G
*************************** 1. row ***************************
  TABLE_SCHEMA: symfony
    TABLE_NAME: sessions
        ENGINE: InnoDB
    TABLE_ROWS: 14272158
   DATA_LENGTH: 3306140672

Developers noticed the web application sometimes stalling for a few seconds. First, I analyzed the slow queries on MySQL and discovered that Symfony periodically deletes inactive sessions by issuing the following query, which took several seconds to complete. This query was the cause of the stalls in the application:

DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847

The query is not optimized. Let’s have a look at the EXPLAIN:

mysql> EXPLAIN DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847\G
*************************** 1. row ***************************
           id: 1
  select_type: DELETE
        table: sessions
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 14272312
     filtered: 100.00
        Extra: Using where

Every DELETE query was a full table scan of more than 14 million rows. So, let’s try to improve it.

First workaround

Looking around on the web and discussing it with colleagues, we found some workarounds. But none of them was a definitive solution:

  1. Reduce the expiration time in the Symfony configuration. Good idea: one month is probably too long for a high-traffic website. But we kept the expiration time at one month because of an internal business policy, and even one week wouldn’t have eliminated the full table scan.
  2. Use a different database solution. Redis was proposed as an alternative to MySQL for managing session data. This might be a good solution, but it would involve a long deployment time. We planned a test, but the sysadmins felt it was not a good idea to run another database system for such a simple task.
  3. Patch the Symfony code. It was proposed to rewrite the query directly in the Symfony code. Discarded.
  4. Create indexes. It was proposed to create indexes on the sess_time and sess_lifetime columns. But the indexes wouldn’t get used because of the arithmetic addition in the WHERE clause, which is the only condition in the query.

So, what do we do if everything must remain the same? Same configuration, same environment, same query issued and no indexes added?

Query optimization using a virtual column

I focused on how to optimize the query. Since I was using MySQL 5.7, I thought about a generated virtual column. I decided to add a virtual column to the sessions table, defined as sess_time + sess_lifetime (the same expression as the condition in the query):

mysql> ALTER TABLE sessions
ADD COLUMN `sess_delete` INT UNSIGNED GENERATED ALWAYS AS ((`sess_time` + `sess_lifetime`)) VIRTUAL;

Any virtual column can have an index on it. So, I created the index:

mysql> ALTER TABLE sessions ADD INDEX(sess_delete);
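
To sanity-check the generated values before going further, a quick sketch:

mysql> SELECT sess_time, sess_lifetime, sess_delete FROM sessions LIMIT 3;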

Note: I first checked that the INSERT queries were well written in Symfony (with an explicit list of the fields to insert), to make sure this modification wouldn’t cause more issues. Making a schema change on a table that is in use by a framework, where the queries against the table are generally outside of your control, can be a daunting task.

So, let’s EXPLAIN the query rewritten as follows, with the condition directly on the generated indexed column:

mysql> EXPLAIN DELETE FROM sessions WHERE sess_delete < 1521025847\G
*************************** 1. row ***************************
           id: 1
  select_type: DELETE
        table: sessions
         type: range
possible_keys: sess_delete
          key: sess_delete
      key_len: 5
          ref: const
         rows: 6435
     filtered: 100.00
        Extra: Using where

The query can now use the index, and the number of rows examined is exactly the number of sessions we have to delete.

So far, so good. But will Symfony execute that query if we don’t want to modify the source code?

Using ProxySQL to rewrite the query

Fortunately, we already had ProxySQL up and running in our environment. We were using it just to manage the master MySQL failover.

One of the very useful features of ProxySQL is its ability to rewrite any query it receives into another one, based on rules you can define. You can go from very simple rewrites, like changing the name of a field, to very complex ones that use a chain of rules. It depends on the complexity of the translation you have to do. In our case, we just needed to translate sess_time + sess_lifetime into sess_delete; the rest of the query stays the same, so a very simple rule was enough.

Let’s see how to create the rewrite rules.

Connect to the proxy:

mysql -u admin -psecretpwd -h 127.0.0.1 -P6032 --prompt='Admin> '

Define the rewrite rule by inserting a record into the mysql_query_rules table:

Admin> INSERT INTO mysql_query_rules(rule_id,active,flagIN,match_pattern,negate_match_pattern,re_modifiers,replace_pattern,destination_hostgroup,apply)
 -> VALUES(
 -> 1,
 -> 1,
 -> 0,
 -> '^DELETE FROM sessions WHERE sess_lifetime \+ sess_time < (.*)',
 -> 0,
 -> 'CASELESS',
 -> 'DELETE FROM sessions WHERE sess_delete < \1',
 -> 0,
 -> 1);

The two fields I want to focus on are:

  • match_pattern: it defines the query to be matched, using regular expression notation. The + symbol must be escaped using \ because it’s a special character in regular expressions
  • replace_pattern: it defines how to rewrite the matched query. \1 is the value of the parameter matched by match_pattern in (.*)

For the meaning of the other fields, have a look at https://github.com/sysown/proxysql/wiki/ProxySQL-Configuration.

Once created, we have to save the rule to disk and load it to runtime for it to take effect.

Admin> SAVE MYSQL QUERY RULES TO DISK;
Admin> LOAD MYSQL QUERY RULES TO RUNTIME;

After that, the proxy began to filter the query and rewrite it to have a better execution plan using the index on the virtual column.
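
To verify that the rule is actually matching traffic, you can check its hit counter in the ProxySQL admin interface (a quick sketch; stats_mysql_query_rules is ProxySQL’s standard statistics table for query rules):

Admin> SELECT rule_id, hits FROM stats_mysql_query_rules;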

Note: pay attention when you need to upgrade the framework. If the upgrade rebuilds the database tables, you will lose the virtual column you’ve created. Just remember to recreate it and check it after the upgrade.

Conclusion

Developers love using web frameworks because they are very powerful in simplifying the development and deployment of complex web applications. But for DBAs, the internal queries can sometimes cause a bit of a headache, because they are not well optimized or because they were not supposed to run against your “huge” database. I solved my case using ProxySQL and VIRTUAL columns, with minimal impact on the architecture of the system and without any source code patching.

Take this post as a tip in case you face similar issues with your application framework.

The post Using ProxySQL and VIRTUAL Columns to Solve ORM Issues appeared first on Percona Database Performance Blog.

by Corrado Pandiani at March 29, 2018 08:38 PM

March 28, 2018

Peter Zaitsev

Percona Live 2018 Featured Talk: Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade-Off with Mat Arye

Welcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Mat Arye, Core Database Engineer at Timescale. His talk is titled Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade-Off. Distributed systems were built to scale out for ballooning user bases and operations. As more and more companies vied to be the next Google, Amazon or Facebook, they too “required” horizontal scalability. But in a real way, NoSQL and even NewSQL have forgotten single node performance, where scaling out isn’t an option. And single node performance is important because it allows you to do more with much less.  In our conversation, we discussed why you shouldn’t forget to focus on single-node performance:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Mat: My name is Mat Arye. I started working on database infrastructure as part of my graduate studies in distributed systems at Princeton University, with Timescale’s CTO Mike Freedman. My first project was developing the data streaming infrastructure for a cross-continental data analysis system called Jetstream. I was first introduced to working with PostgreSQL as an intern at CloudFlare, where I worked on their request-analysis system. I started working on the precursor to what would become TimescaleDB while working on a data analysis system for an IoT device cloud platform.

Note: Mike Freedman will also be speaking on Wednesday at 12:50 pm in Room M2, giving the talk TimescaleDB: Re-engineering PostgreSQL as a Time-Series Database.

Percona: Your talk is titled Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade-Off. How have people gotten away from single-node performance?

Mat: Well, when the Internet became a thing, people saw the deluge of data that was coming. They realized that a single-node data system would no longer suffice for many data applications. Thus, the focus of a lot of data infrastructure work shifted to creating scale-out systems. For multi-node systems, performance often comes from making the system “scale linearly” (i.e., increase performance by adding nodes). Thus, a “scalable” system meant it could scale-out across multiple servers. The performance of any single node became less important and less optimized. I do think that, as a community, we have now learned a lot about building scale-out systems and that we need to switch back to concentrating on single-node performance for reasons having to do with cost and operational efficiency.

Percona: How does single-node performance fit in with time-series data?

Mat: You can think of time-series data as “live” data. This data is often analyzed on dashboards and near-real-time analysis systems that have very different analysis latency requirements from the BI analytical use cases that data lakes were designed for. Single-node efficiency is important for creating systems that can provide the low-latency results necessary for these live applications. Also, many time-series data settings, especially for IoT related use cases, are remote or at the “edge” (e.g., mining sites, factory floors, satellites, gateways). Single-node performance is important for getting the most out of these smaller footprint or resource-constrained environments.

Percona: Why should people worry about single-node architecture in cloud deployments?

Mat: There are many applications in cloud deployments where the single-node data architecture that systems like TimescaleDB provides is sufficient for their data needs. In such applications, using a single-node cloud deployment can save costs (i.e., easier to use, easier to maintain, especially compared to smaller multi-node instances). It can also decrease the latency for getting query results compared to alternate multi-node systems.

Percona:  Why should people attend your talk? What do you hope people will take away from it?

Mat: I hope that people learn two things: (1) that it is often possible (and desirable) to use efficient single-node data analysis systems for many important real-life applications, and (2) as a community, we should start concentrating on single-node efficiency even in multi-node systems. It sort of goes along with the whole “use the right tool for the job” approach that most people tend to aspire to.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Mat: I always like learning about data analysis systems that take new approaches. The diversity of talks and topics at Percona always gives me the opportunity to learn something new. And of course, meeting new people is fun and educational, and Percona Live gives you a great opportunity for that!

Want to find out more about this Percona Live 2018 featured talk, and single-node database performance? Register for Percona Live 2018, and see Mat’s talk Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade-Off. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

The post Percona Live 2018 Featured Talk: Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade-Off with Mat Arye appeared first on Percona Database Performance Blog.

by Dave Avery at March 28, 2018 06:47 PM

Percona XtraDB Cluster on Amazon GP2 Volumes

In this blog post, we look at the performance of Percona XtraDB Cluster on Amazon GP2 volumes.

In our overview blog post on Best Practices for Percona XtraDB Cluster on AWS, GP2 volumes did not show good results. However, we allocated only the size needed to fit the database (200GB volumes). Percona XtraDB Cluster did not show good performance on these volumes, as they provided only limited IOPS.

After publishing our material, Amazon engineers pointed out that we should try GP2 volumes sized to provide 10000 IOPS: if we allocated volumes of 3.3 TiB or more, we should achieve 10000 IOPS.
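
To put numbers on it: at the time, GP2 delivered a baseline of 3 IOPS per GiB of allocated size, capped at 10000 IOPS, so 3334 GiB × 3 IOPS/GiB ≈ 10000 IOPS, which is where the 3.3 TiB figure comes from.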

It might not be immediately clear what the benefit is of allocating 3.3 TiB volumes for a database that is only 100 GiB in size, but in reality GP2 volumes of this size are cheaper than IO provisioned volumes that deliver 10000 IOPS. Below, we will show Percona XtraDB Cluster results on GP2 volumes 3.3TB in size.

In the previous post, we used four different instance sizes: r4.large, r4.xlarge, r4.2xlarge and r4.4xlarge. In this case, with GP2 volumes of 3.3TB, we only tested the r4.2xlarge and r4.4xlarge instances.

The dataset and workload are the same as in the previous post.

First, let’s review throughput and latency:

Percona XtraDB Cluster on Amazon GP2 Volumes

The legend:

  • r4/gp2 – the previous results (on GP2 volumes 200GB)
  • r4/gp2.3T – the results on GP2 volumes with 3.3 TB in size
  • r4/io1.10k – IO provisioned volumes 10000 IOPS
  • i3/nvme – I3 instances with NVMe storage.

The takeaway from these results is that 3.3 TB GP2 volumes greatly improve performance, and the results are comparable with IO provisioned volumes.

To compare the stability of latency on GP2 vs. IO1, we check the latency distribution (99% latency with 5-second interval resolution):

Percona XtraDB Cluster on Amazon GP2 Volumes 2

There is no major difference between these volume types.

With cloud resources, you should always consider cost. Let’s review the cost of the volumes themselves:

Percona XtraDB Cluster on Amazon GP2 Volumes 3

We can see that 3.3TB GP2 volumes are much more expensive than 200GB ones, but still only about half the cost of IO provisioned volumes (when we add the cost of provisioned IOPS).

And to compare the full cost of resources, let’s review the cost of an instance (we will use 1-year reserved prices):

Percona XtraDB Cluster on Amazon GP2 Volumes 4

The points of interest:

  • The cost of an r4.2xlarge instance:
    • With 3.3TB GP2 volume: 78.5 cents/hour
    • With IO1 volume: 125.03 cents/hour
  • The cost of an r4.4xlarge instance
    • With 3.3TB GP2 volume: 109.25 cents/hour
    • With IO1 volume: 156.32 cents/hour

And given the identical throughput, it may be more economically feasible to use 3.3TB GP2 volumes instead of IO provisioned volumes.

Now we can compare the transactions per second cost of 3.3 TB GP2 volumes with other instances:

Percona XtraDB Cluster on Amazon GP2 Volumes 5

While i3 instances are still a clear winner, if you need the capabilities that EBS volumes provide, you might want to consider large GP2 volumes instead of IO provisioned volumes.

In general, large GP2 volumes provide a way to increase IO performance. They seem to be a viable alternative to IO provisioned volumes.

The post Percona XtraDB Cluster on Amazon GP2 Volumes appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at March 28, 2018 05:13 PM

Safely Purging Binary Logs From Master

In this blog post, we’ll discuss some of the options available when purging binary logs. We’ll look at how to safely purge them when you have slaves in your topology and want to avoid deleting any binary log that still needs to be applied.

We generally want to ensure that, before purging the binary logs from the master, all logs have been applied on the slaves, to avoid halting replication. The example error below is a classic case of a binary log purged before being applied on the slave:

Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not open log file'

MySQL offers some options for purging binary logs. One of them is executing the PURGE BINARY LOGS command. The documentation describes this command and its options. Here is an example:

mysql> PURGE BINARY LOGS TO 'mysql-bin.000010';
mysql> PURGE BINARY LOGS BEFORE '2008-04-02 22:46:26';

This will remove the binary logs and update the index file. Another option for purging binary logs is the expire_logs_days variable, which defines the number of days after which binary log files are automatically removed. You can edit your my.cnf to make this persistent, and also change it dynamically (a restart is not necessary for it to take effect):

mysql> set global expire_logs_days=3;
Query OK, 0 rows affected (0.00 sec)

And on my.cnf:

expire-logs-days = 3

One alternative for controlling the number of binary log files, introduced in Percona Server for MySQL 5.6.11-60.3, is the max_binlog_files parameter. When you set max_binlog_files to a non-zero value, the server removes the oldest binlog file(s) whenever their number exceeds the value of the variable. This is useful to limit the disk usage of the binlog files. Using this parameter caps the disk usage at this theoretical value:

Binlogs disk usage = max_binlog_size * max_binlog_files

For example, with the default max_binlog_size of 1GB and max_binlog_files set to 20, binary logs will use at most about 20GB of disk. The actual size can be smaller, because a server restart or FLUSH LOGS makes the server start a new log file, resulting in log files that are not fully written. The max_binlog_files variable has dynamic scope, and you can change it online using this command:

mysql> set global max_binlog_files = 10;
Query OK, 0 rows affected (0.00 sec)

And on my.cnf under [mysqld] section:

[mysqld]
max_binlog_files = 20

However, using these options does not ensure that the slaves have already applied the binary log transactions, so the files may not be safe to remove. This is where the mysqlbinlogpurge tool comes in handy. The mysqlbinlogpurge tool is part of MySQL Utilities, which you can download here. This tool ensures that any files that are in use or required by any of the slaves in a replication topology are not deleted.

But how does mysqlbinlogpurge determine when it is safe to purge the binlogs? A slave in MySQL has two parts that make replication happen: the Slave_IO thread is responsible for gathering events from the master, while the Slave_SQL thread(s) are responsible for executing the events locally. You can see whether the slave IO and SQL threads are running, and where they are in their processes, by looking at the slave’s status:

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 45007
Connect_Retry: 60
Master_Log_File: mysql-bin.000024
Read_Master_Log_Pos: 194
Relay_Log_File: mysql-relay.000013
Relay_Log_Pos: 24996028
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
Exec_Master_Log_Pos: 24995815

And to briefly summarize other important parameters and their meanings:

  • Master_Log_File/Read_Master_Log_Pos – This is what the Slave_IO is currently fetching from the master
  • Relay_Master_Log_File/Exec_Master_Log_Pos – This is what the Slave_SQL thread is actively executing in terms of the Master’s coordinates (master’s log file)
  • Relay_Log_File/Relay_Log_Pos – This is what the SQL thread is actively executing in terms of the Slave’s coordinates (relay log file)

The Master_Log_File is the latest binlog file on the master server that the Slave_IO thread knows about and reads from. Therefore it is this file, and any files after it on the master server, that we must preserve for replication to continue. The Relay_Master_Log_File is the point in the master’s binlog up to which the Slave_SQL thread has executed.
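
In other words, a manual version of this check would be: list the binary logs on the master, find the oldest Relay_Master_Log_File across all slaves, and purge only up to that file. A sketch of what the tool automates (the file name comes from the example output above):

mysql> SHOW BINARY LOGS;
mysql> PURGE BINARY LOGS TO 'mysql-bin.000006';

Since PURGE BINARY LOGS TO removes files up to, but not including, the named file, this keeps mysql-bin.000006 available for the SQL thread.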

Below are a few examples of how to use the tool and how it behaves in different scenarios. If a slave is stopped or unreachable, the tool throws an error and will not purge the binary logs:

$ mysqlbinlogpurge --master=root:msandbox@localhost:45007
>           --slaves=root:msandbox@localhost:45008,root:msandbox@localhost:45009
>           --dry-run
ERROR: Can not verify the status for slave localhost:45008. Make sure the slave are active and accessible.

If you want to check what the tool is going to do before executing it, use the --dry-run option:

$ mysqlbinlogpurge --master=root:msandbox@localhost:45007 --slaves=root:msandbox@localhost:45008,root:msandbox@localhost:45009 --dry-run
# Latest binlog file replicated by all slaves: mysql-bin.000011
# To manually purge purge the binary logs Execute the following query:
PURGE BINARY LOGS TO 'mysql-bin.000012'

If you don’t want to purge all binary logs, let’s say you want to keep the binlogs until the SQL thread is executing (Relay_Master_Log_File):

$ mysqlbinlogpurge --master=root:msandbox@localhost:45007
>          --slaves=root:msandbox@localhost:45008,root:msandbox@localhost:45009
>          --dry-run
>          --binlog=mysql-bin.000002 -v
# Checking user permission to purge binary logs...
#
# Master active binlog file: mysql-bin.000012
# Checking slave: localhost@45008
# I/O thread is currently reading: mysql-bin.000012
# Checking slave: localhost@45009
# I/O thread is currently reading: mysql-bin.000012
# Range of binlog files available: from mysql-bin.000001 to mysql-bin.000012
# Latest binlog file replicated by all slaves: mysql-bin.000011
# To manually purge purge the binary logs Execute the following query:
PURGE BINARY LOGS TO 'mysql-bin.000002'
# Range of binlog files available: from mysql-bin.000001 to mysql-bin.000012

To actually remove the binary logs, just drop the --dry-run option:

$ ls -larth data/ | grep -i mysql-bin
-rw-r----- 1 vinicius.grippa vinicius.grippa  99M Feb  2 10:20 mysql-bin.000001
-rw-r----- 1 vinicius.grippa vinicius.grippa  201 Feb  2 10:20 mysql-bin.000002
-rw-r----- 1 vinicius.grippa vinicius.grippa  201 Feb  2 10:20 mysql-bin.000003
-rw-r----- 1 vinicius.grippa vinicius.grippa  50M Feb  2 10:28 mysql-bin.000004
-rw-r----- 1 vinicius.grippa vinicius.grippa  201 Feb  2 10:28 mysql-bin.000005
-rw-r----- 1 vinicius.grippa vinicius.grippa  201 Feb  2 10:28 mysql-bin.000006
-rw-r----- 1 vinicius.grippa vinicius.grippa  201 Feb  2 10:28 mysql-bin.000007
-rw-r----- 1 vinicius.grippa vinicius.grippa  201 Feb  2 10:28 mysql-bin.000008
-rw-r----- 1 vinicius.grippa vinicius.grippa  201 Feb  2 10:28 mysql-bin.000009
-rw-r----- 1 vinicius.grippa vinicius.grippa  201 Feb  2 10:28 mysql-bin.000010
-rw-r----- 1 vinicius.grippa vinicius.grippa  228 Feb  2 10:28 mysql-bin.index
-rw-r----- 1 vinicius.grippa vinicius.grippa  201 Feb  2 10:28 mysql-bin.000011
-rw-r----- 1 vinicius.grippa vinicius.grippa 739M Feb  2 10:32 mysql-bin.000012
$ mysqlbinlogpurge --master=root:msandbox@localhost:45007
>           --slaves=root:msandbox@localhost:45008,root:msandbox@localhost:45009
>           -v
# Checking user permission to purge binary logs...
#
# Master active binlog file: mysql-bin.000012
# Checking slave: localhost@45008
# I/O thread is currently reading: mysql-bin.000012
# Checking slave: localhost@45009
# I/O thread is currently reading: mysql-bin.000012
# Range of binlog files available: from mysql-bin.000001 to mysql-bin.000012
# Latest binlog file replicated by all slaves: mysql-bin.000011
# Latest not active binlog file: mysql-bin.000011
# Purging binary logs prior to 'mysql-bin.000012'
# Binlog file available: mysql-bin.000012
# Range of binlog files purged: from mysql-bin.000001 to mysql-bin.000011
$ ls -larth data/ | grep -i mysql-bin
-rw-r----- 1 vinicius.grippa vinicius.grippa 739M Feb  2 10:32 mysql-bin.000012
-rw-r----- 1 vinicius.grippa vinicius.grippa   19 Feb  2 10:44 mysql-bin.index

The tool has proven safe to run under the most common scenarios, like:

  • Master x Slave (1:1, 1:N) and with GTID on/off
  • Master x Master and with GTID on/off

Caveats

There are a few caveats using the mysqlbinlogpurge tool:

Multi-source replication. The tool does not work properly when the topology has multi-source replication enabled. The tool will run, but it will not identify the binlogs properly. Here is an example:

$ mysqlbinlogpurge --master=root:msandbox@localhost:45008 --slaves=root:msandbox@localhost:45009 --dry-run
# Latest binlog file replicated by all slaves: mysql-bin.000000
# No binlog files can be purged.

Relay log corrupted. If for some reason a slave corrupts its local relay log, you need to restart replication from the Relay_Master_Log_File, and the tool might already have purged the required binlog.

Summary

If you have space constraints and are reasonably certain that relay logs won’t be corrupted, the mysqlbinlogpurge tool provides a good way to purge binary logs. You might consider it as an option to keep the binary logs under control. It handles the most common topologies (except multi-source topologies).

The post Safely Purging Binary Logs From Master appeared first on Percona Database Performance Blog.

by Vinicius Grippa at March 28, 2018 04:36 PM

Jean-Jerome Schmidt

Webinar Replay: How to Design Open Source Databases for High Availability

Thanks for joining this week’s webinar on how to design open source databases for high availability with Ashraf Sharif, Senior Support Engineer at Severalnines. From discussing high availability concepts through to failover or switch over mechanisms, Ashraf covered all the need-to-know information when it comes to building highly available database infrastructures.

It’s been said that not designing for failure leads to failure; but what is the best way to design a database system from the ground up to withstand failure?

Designing open source databases for high availability can be a challenge as failures happen in many different ways, which sometimes go beyond imagination. This is one of the consequences of the complexity of today’s open source database environments.

At Severalnines we’re big fans of high availability databases and have seen our fair share of failure scenarios across the thousands of database deployment attempts that we come across every year.

In this webinar replay, we look at the different types of failures you might encounter, and what mechanisms can be used to address them. We also look at some of the popular high availability solutions used today, and how they can help you achieve different levels of availability.

Watch the replay

Agenda

  • Why design for High Availability?
  • High availability concepts
    • CAP theorem
    • PACELC theorem
  • Trade offs
    • Deployment and operational cost
    • System complexity
    • Performance issues
    • Lock management
  • Architecting databases for failures
    • Capacity planning
    • Redundancy
    • Load balancing
    • Failover and switchover
    • Quorum and split brain
    • Fencing
    • Multi datacenter and multi-cloud setups
    • Recovery policy
  • High availability solutions
    • Database architecture determines Availability
    • Active-Standby failover solution with shared storage or DRBD
    • Master-slave replication
    • Master-master cluster
  • Failover and switchover mechanisms
    • Reverse proxy
    • Caching
    • Virtual IP address
    • Application connector

Watch the replay

Speaker

Ashraf Sharif is a System Support Engineer at Severalnines. He was previously involved in the hosting world and the LAMP stack, where he worked as a principal consultant and head of a support team, delivering clustering solutions for large websites in the South East Asia region. His professional interests are system scalability and high availability.

by jj at March 28, 2018 12:59 PM

March 27, 2018

Peter Zaitsev

Webinar Thursday March 29, 2018: Effective Testing for Live Applications

Please join Percona’s Principal Support Engineer, Sveta Smirnova, as she presents Effective Testing for Live Applications on March 29, 2018, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

When an application is hit with trouble in a live production environment, it is often difficult to:

  • Repeat a problematic scenario without the risk of making things worse
  • Find which query or sequence of actions caused the issue
  • Prepare a dataset to share with the support team that they can use to investigate the problem

At the same time, it is not possible to solve troubles without understanding what caused them. This is why it is necessary to clearly understand what steps caused the issue. You need to have a repeatable test case or, at the very least, a clear understanding of when your application encountered the error.

In this webinar, I will:

  • Guide you through general steps that will help you to identify the issue
  • Cover testing methods that work best at each step
  • Discuss minimizing test cases so you are better prepared to provide information to your support team (or just to have it handy on your test server).

Register for the webinar now.

Sveta Smirnova, Principal Technical Services Engineer

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues, bugs, finding patterns that can solve typical issues quicker, and teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona, Sveta worked as Support Engineer in the MySQL Bugs Analysis Support Group in MySQL AB-Sun-Oracle. She is the author of the book “MySQL Troubleshooting” and JSON UDF functions for MySQL.

The post Webinar Thursday March 29, 2018: Effective Testing for Live Applications appeared first on Percona Database Performance Blog.

by Sveta Smirnova at March 27, 2018 09:43 PM

ANALYZE TABLE Is No Longer a Blocking Operation

In this post, I’ll discuss the fix for lp:1704195 (migrated to PS-2503), which prevents ANALYZE TABLE from blocking all subsequent queries on the same table.

In November 2017, Percona released a fix for lp:1704195 (migrated to PS-2503), created by Laurynas Biveinis. The fix, included with Percona Server for MySQL since versions 5.6.38-83.0 and 5.7.20-18, stops ANALYZE TABLE from invalidating query and table definition cache content for supported storage engines (InnoDB, TokuDB and MyRocks).

Why is this important?

In short, it is now safe to run ANALYZE TABLE in production environments, because it won’t trigger a situation where all queries on the same table stack up in the "Waiting for table flush" state. Check this blog post for details on how this situation can happen.

Why do we need to run ANALYZE TABLE?

When the Optimizer decides which index to use to resolve a query, it relies on statistics stored for the table by the storage engine. If the statistics are not up to date, the Optimizer might choose the wrong index when it creates the query execution plan, and performance will suffer.

To prevent this, storage engines support automatic and manual statistics updates. While automatic statistics updates usually work fine, there are cases when they do not do their job properly.

For example, InnoDB uses a sample of 20 16K pages when it updates persistent statistics, and eight 16K pages when it updates transient statistics. If your data distribution is even, it does not matter how big your table is: even for 1T tables, a 320K sample is enough. But if your data distribution is skewed, the statistics might be created incorrectly. The solution for this issue is to increase either the innodb_stats_transient_sample_pages or the innodb_stats_persistent_sample_pages variable. But increasing the number of pages examined while collecting statistics leads to longer update runs, and thus higher IO activity, which is probably not something you want happening often.

To control this, you can disable automatic statistics updates for such tables, and schedule a job that periodically runs ANALYZE TABLE.
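
For InnoDB, such a setup could look like this (a sketch with a hypothetical table name; STATS_AUTO_RECALC and STATS_SAMPLE_PAGES are per-table InnoDB options):

mysql> ALTER TABLE orders STATS_AUTO_RECALC=0, STATS_SAMPLE_PAGES=200;
mysql> ANALYZE TABLE orders;

The first statement disables automatic recalculation of persistent statistics for the table and raises the number of sampled pages; the second is what the scheduled job would run to refresh the statistics.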

Will it be safe before the fix for lp:1704195 (migrated to PS-2503)?

Theoretically yes, but we could easily hit a situation like the one described in this blog post by Miguel Angel Nieto. The article describes what happens if a long-running query starts and doesn’t finish before ANALYZE TABLE does: at some point, all the queries on the analyzed table get stuck in the "Waiting for table flush" state.

This happens because, before the fix, ANALYZE TABLE worked as follows:

  1. Opens table statistics: concurrent DML operations (INSERT/UPDATE/DELETE/SELECT) are allowed
  2. Updates table statistics: concurrent DML operations are allowed
  3. Update finished
  4. Invalidates the table entry in the table definition cache: concurrent DML operations are forbidden
    1. What happens here is that ANALYZE TABLE marks the currently open table share instances as invalid. This does not affect running queries: they will complete as usual. But incoming queries will not start until they can re-open a table share instance, and this will not happen until all currently running queries complete.
  5. Invalidates the query cache: concurrent DML operations are forbidden

The last two operations are usually fast, but they cannot finish if another query has touched either the table share instance or acquired the query cache mutex. And while blocked, they in turn prevent incoming queries from starting.

However, ANALYZE TABLE modifies table statistics, not the table definition!

Practically, it cannot affect already-running queries in any way. If a query started before ANALYZE TABLE finished updating statistics, it uses the old statistics. ANALYZE TABLE does not affect the data in the table, so old entries in the query cache are still correct. And it hasn’t changed the definition of the table, so there is no need to remove it from the table definition cache. As a result, we can avoid operations 4 and 5 above.

The fix for lp:1704195 (migrated to PS-2503) removes these additional updates and locks required for them, and makes ANALYZE TABLE always safe to run in busy production environments.

The post ANALYZE TABLE Is No Longer a Blocking Operation appeared first on Percona Database Performance Blog.

by Sveta Smirnova at March 27, 2018 07:35 PM

MariaDB AB

MariaDB Server 10.2.14 and 10.1.32 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.2.14 and MariaDB Server 10.1.32. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.2.14

Release Notes Changelog What is MariaDB 10.2?


Download MariaDB Server 10.1.32

Release Notes Changelog What is MariaDB 10.1?

by dbart at March 27, 2018 07:14 PM

MariaDB Foundation

MariaDB 10.2.14, MariaDB 10.1.32 and MariaDB Connector/J 2.2.3 and 1.7.3 now available

The MariaDB project is pleased to announce the availability of MariaDB 10.2.14 and MariaDB 10.1.32, both stable releases, as well as MariaDB Connector/J 2.2.3, the latest stable release in the MariaDB Connector/J 2.2 series, and MariaDB Connector/J 1.7.3, the latest stable release in the MariaDB Connector/J 1.7 series. See the release notes and changelogs for […]

The post MariaDB 10.2.14, MariaDB 10.1.32 and MariaDB Connector/J 2.2.3 and 1.7.3 now available appeared first on MariaDB.org.

by Ian Gilfillan at March 27, 2018 06:51 PM

Peter Zaitsev

Webinar Wednesday, March 28, 2018: ZFS with MySQL

Please join Percona’s Principal Architect in Architecture & Projects, Yves Trudeau, as he presents ZFS with MySQL on Wednesday, March 28, 2018, at 7:00 am PDT (UTC -7) / 10:00 am EDT (UTC -4).

Are you curious about ZFS? Would you like to learn how to set up and configure ZFS? What about ZFS with MySQL?

ZFS on Linux has matured a lot. It offers unique features that are extremely compelling for use with a database server like MySQL.

During this webinar, we’ll review the main characteristics of ZFS, and walk through the configuration of ZFS and MySQL in order to provide good performance levels and superior ease-of-management. We will also cover aspects like backups using snapshots, cloning snapshots to create local slaves, the use of an SLOG device for low latency transactions and the use of the L2ARC as a level 2 caching layer over fast SSDs.

Register for the webinar now.

Yves Trudeau, Principal Architect

Yves is a Principal Consultant at Percona, specializing in MySQL High-Availability and scaling solutions. Prior to joining Percona in 2009, he worked as a senior consultant for MySQL AB and Sun Microsystems, assisting customers across North America with NDB Cluster and Heartbeat/DRBD technologies. Yves holds a Ph.D. in Experimental Physics from Université de Sherbrooke. He lives in Québec, Canada with his wife and three daughters.

The post Webinar Wednesday, March 28, 2018: ZFS with MySQL appeared first on Percona Database Performance Blog.

by Yves Trudeau at March 27, 2018 04:39 PM

Jean-Jerome Schmidt

Comparing Database Proxy Failover Times - ProxySQL, MaxScale and HAProxy

ClusterControl can be used to deploy highly available replication setups. It supports switchover and failover for GTID-based MySQL or MariaDB replication setups. ClusterControl can deploy different types of proxies for traffic routing: ProxySQL, HAProxy and MaxScale. These are integrated to handle topology changes related to failovers or switchovers. In this blog post, we’ll take a look at how this works and what you can expect from each of the proxies.

First, let’s go through some definitions and terminology. ClusterControl can be configured to perform a recovery of a failed replication master - it can promote a slave to become the new master, make any required topology changes and restore entire setup’s ability to accept writes. This is what we will call a “failover”. ClusterControl can also perform a master switch - sometimes it’s required to change a master. Typical scenario would be a heavy schema change, which has to be executed in a rolling fashion. Towards the end of the procedure, you’ll have to promote one of the slaves, which already has the change applied, before performing the change on the old master.

The main difference between “failover” and “switchover” is that a failover, by definition, is an emergency situation where the master is already unavailable. On the other hand, a switchover is a more controlled process over which ClusterControl has full control. If we are talking about failover, there is no way to handle it gracefully, as the application has already lost connections due to the master crash. As such, no matter which proxy you use, the application will always have to reconnect.

So, applications need to be able to handle transaction failures and retry them. The other important thing when speaking about failover is the proxy’s ability to check the health of the database servers. Without health checks, the proxy cannot know the status of a server, and therefore cannot decide to fail over traffic. ClusterControl automatically configures these health checks when deploying the proxy.

Failover

ProxySQL

Let’s take a look at what the failover looks like from the application’s point of view. We will first connect to the database through ProxySQL, version 1.4.6.

root@vagrant:~# while true  ;do time sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --max-requests=0 --time=3600 --mysql-host=10.0.0.105 --mysql-user=sbtest --mysql-password=pass --mysql-port=6033 --tables=32 --report-interval=1 --skip-trx=on --table-size=10000 --db-ps-mode=disable run ; done
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 1s ] thds: 4 tps: 29.51 qps: 585.28 (r/w/o: 465.27/120.01/0.00) lat (ms,95%): 196.89 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 4 tps: 44.61 qps: 784.77 (r/w/o: 603.28/181.49/0.00) lat (ms,95%): 116.80 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 4 tps: 46.98 qps: 829.66 (r/w/o: 646.74/182.93/0.00) lat (ms,95%): 121.08 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 4 tps: 49.04 qps: 886.64 (r/w/o: 690.50/195.14/1.00) lat (ms,95%): 112.67 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 4 tps: 47.98 qps: 887.64 (r/w/o: 689.72/197.92/0.00) lat (ms,95%): 106.75 err/s: 0.00 reconn/s: 0.00
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'UPDATE sbtest8 SET k=k+1 WHERE id=5019'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:461: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'DELETE FROM sbtest6 WHERE id=4957'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:490: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'SELECT SUM(k) FROM sbtest23 WHERE id BETWEEN 4986 AND 5085'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:435: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'DELETE FROM sbtest21 WHERE id=5218'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:490: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query

real    0m5.903s
user    0m0.092s
sys    0m1.252s
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

FATAL: unable to connect to MySQL server on host '10.0.0.105', port 6033, aborting...
FATAL: error 2003: Can't connect to MySQL server on '10.0.0.105' (111)
FATAL: `thread_init' function failed: /usr/local/share/sysbench/oltp_common.lua:352: connection creation failed
FATAL: unable to connect to MySQL server on host '10.0.0.105', port 6033, aborting...
FATAL: error 2003: Can't connect to MySQL server on '10.0.0.105' (111)
FATAL: `thread_init' function failed: /usr/local/share/sysbench/oltp_common.lua:352: connection creation failed
FATAL: unable to connect to MySQL server on host '10.0.0.105', port 6033, aborting...
FATAL: error 2003: Can't connect to MySQL server on '10.0.0.105' (111)
FATAL: `thread_init' function failed: /usr/local/share/sysbench/oltp_common.lua:352: connection creation failed
FATAL: unable to connect to MySQL server on host '10.0.0.105', port 6033, aborting...
FATAL: error 2003: Can't connect to MySQL server on '10.0.0.105' (111)
FATAL: `thread_init' function failed: /usr/local/share/sysbench/oltp_common.lua:352: connection creation failed
FATAL: Threads initialization failed!

real    0m0.021s
user    0m0.012s
sys    0m0.000s
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 1s ] thds: 4 tps: 0.00 qps: 55.81 (r/w/o: 55.81/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 9s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 4 tps: 0.00 qps: 3.00 (r/w/o: 0.00/3.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 11s ] thds: 4 tps: 58.99 qps: 1026.91 (r/w/o: 792.93/233.98/0.00) lat (ms,95%): 9977.52 err/s: 0.00 reconn/s: 0.00

As we can see from the above, the new master became available within ~11 seconds of the crash. During this time, ClusterControl promoted one of the slaves to become a new master and it became available for writes.

HAProxy

Below is an excerpt from the output of our sysbench application when the failover happened while we were connected via HAProxy. HAProxy version 1.5.14 was deployed.

root@vagrant:~# while true  ;do date ; time sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --max-requests=0 --time=3600 --mysql-host=10.0.0.105 --mysql-user=sbtest --mysql-password=pass --mysql-port=3307 --tables=32 --report-interval=1 --skip-trx=on --table-size=10000 --db-ps-mode=disable run ; done
Mon Mar 26 13:24:36 UTC 2018
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 1s ] thds: 4 tps: 38.62 qps: 748.66 (r/w/o: 591.21/157.46/0.00) lat (ms,95%): 204.11 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 4 tps: 45.25 qps: 797.34 (r/w/o: 619.37/177.97/0.00) lat (ms,95%): 142.39 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 4 tps: 46.04 qps: 833.66 (r/w/o: 647.51/186.15/0.00) lat (ms,95%): 155.80 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 4 tps: 38.03 qps: 698.50 (r/w/o: 548.39/150.11/0.00) lat (ms,95%): 161.51 err/s: 0.00 reconn/s: 0.00
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'INSERT INTO sbtest26 (id, k, c, pad) VALUES (5019, 4641, '59053342586-08172779908-92479743240-43242105725-10632773383-95161136797-93281862044-04686210438-11173993922-29424780352', '31974441818-04649488782-29232641118-20479872868-43849012112')'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:491: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'INSERT INTO sbtest5 (id, k, c, pad) VALUES (4990, 5016, '24532768797-67997552950-32933774735-28931955363-94029987812-56997738696-36504817596-46223378508-29593036153-06914757723', '96663311222-58437606902-85941187037-63300736065-65139798452')'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:491: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'DELETE FROM sbtest25 WHERE id=4996'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:490: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'UPDATE sbtest16 SET k=k+1 WHERE id=5269'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:461: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query

real    0m4.270s
user    0m0.068s
sys    0m0.928s

...

Mon Mar 26 13:24:47 UTC 2018
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

FATAL: unable to connect to MySQL server on host '10.0.0.105', port 3307, aborting...
FATAL: error 2013: Lost connection to MySQL server at 'reading initial communication packet', system error: 0
FATAL: `thread_init' function failed: /usr/local/share/sysbench/oltp_common.lua:352: connection creation failed
FATAL: unable to connect to MySQL server on host '10.0.0.105', port 3307, aborting...
FATAL: error 2013: Lost connection to MySQL server at 'reading initial communication packet', system error: 2
FATAL: `thread_init' function failed: /usr/local/share/sysbench/oltp_common.lua:352: connection creation failed
FATAL: unable to connect to MySQL server on host '10.0.0.105', port 3307, aborting...
FATAL: error 2013: Lost connection to MySQL server at 'reading initial communication packet', system error: 2
FATAL: `thread_init' function failed: /usr/local/share/sysbench/oltp_common.lua:352: connection creation failed
FATAL: unable to connect to MySQL server on host '10.0.0.105', port 3307, aborting...
FATAL: error 2013: Lost connection to MySQL server at 'reading initial communication packet', system error: 2
FATAL: `thread_init' function failed: /usr/local/share/sysbench/oltp_common.lua:352: connection creation failed
FATAL: Threads initialization failed!

real    0m0.036s
user    0m0.004s
sys    0m0.008s

...

Mon Mar 26 13:25:03 UTC 2018
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 1s ] thds: 4 tps: 50.58 qps: 917.42 (r/w/o: 715.10/202.33/0.00) lat (ms,95%): 153.02 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 4 tps: 50.17 qps: 956.33 (r/w/o: 749.61/205.72/1.00) lat (ms,95%): 121.08 err/s: 0.00 reconn/s: 0.00

In total, the process took 12 seconds.

MaxScale

Let’s take a look at how MaxScale handles failover. We used MaxScale version 2.1.9.

root@vagrant:~# while true ; do date ; time sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --max-requests=0 --time=3600 --mysql-host=10.0.0.106 --mysql-user=myuser --mysql-password=pass --mysql-port=4008 --tables=32 --report-interval=1 --skip-trx=on --table-size=100000 --db-ps-mode=disable run ; done
Mon Mar 26 15:16:34 UTC 2018
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 1s ] thds: 4 tps: 34.82 qps: 658.54 (r/w/o: 519.27/125.34/13.93) lat (ms,95%): 137.35 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 4 tps: 35.01 qps: 655.23 (r/w/o: 513.18/142.05/0.00) lat (ms,95%): 207.82 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 4 tps: 39.01 qps: 696.16 (r/w/o: 542.13/154.04/0.00) lat (ms,95%): 139.85 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 4 tps: 40.91 qps: 724.41 (r/w/o: 557.77/166.63/0.00) lat (ms,95%): 125.52 err/s: 0.00 reconn/s: 0.00
FATAL: mysql_drv_query() returned error 1053 (Server shutdown in progress) for query 'UPDATE sbtest28 SET k=k+1 WHERE id=49992'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:461: SQL error, errno = 1053, state = '08S01': Server shutdown in progress
FATAL: mysql_drv_query() returned error 1053 (Server shutdown in progress) for query 'UPDATE sbtest14 SET k=k+1 WHERE id=59650'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:461: SQL error, errno = 1053, state = '08S01': Server shutdown in progress
FATAL: mysql_drv_query() returned error 1053 (Server shutdown in progress) for query 'UPDATE sbtest12 SET k=k+1 WHERE id=50288'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:461: SQL error, errno = 1053, state = '08S01': Server shutdown in progress
FATAL: mysql_drv_query() returned error 1053 (Server shutdown in progress) for query 'UPDATE sbtest25 SET k=k+1 WHERE id=50105'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:461: SQL error, errno = 1053, state = '08S01': Server shutdown in progress

real    0m5.043s
user    0m0.080s
sys    0m1.044s


Mon Mar 26 15:16:53 UTC 2018
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 1s ] thds: 4 tps: 46.82 qps: 905.61 (r/w/o: 710.34/195.27/0.00) lat (ms,95%): 101.13 err/s: 0.00 reconn/s: 0.00

Failover summary

It is important to clarify that this is not a scientific benchmark: most of the time is spent by ClusterControl performing the failover, while proxies typically need at most a couple of seconds to detect the topology change. We used sysbench as our application, configured to run auto-committed transactions, so neither explicit transactions nor prepared statements were used. Sysbench’s read/write workload is pretty fast; if you have long-running transactions or queries, the failover performance will differ. You can see our scenario as a best case.


Switchover

As we mentioned earlier, when executing a switchover, ClusterControl has more control over the master. Under some circumstances (no open transactions, no long-running writes and so on), it may be able to perform a graceful master switch, as long as the proxy supports this. Unfortunately, as of now, none of the proxies deployable by ClusterControl can handle a graceful switchover. ProxySQL had this capability in the past, so we decided to investigate further and got in touch with ProxySQL’s creator, René Cannaò. During the investigation we identified a regression, which should be fixed in the next release of ProxySQL. In the meantime, to showcase how ProxySQL should behave, we used a ProxySQL build patched with a small workaround, compiled from source.

[ 16s ] thds: 4 tps: 39.01 qps: 711.11 (r/w/o: 555.09/156.02/0.00) lat (ms,95%): 173.58 err/s: 0.00 reconn/s: 0.00
[ 17s ] thds: 4 tps: 49.00 qps: 879.06 (r/w/o: 678.05/201.01/0.00) lat (ms,95%): 102.97 err/s: 0.00 reconn/s: 0.00
[ 18s ] thds: 4 tps: 42.86 qps: 768.57 (r/w/o: 603.09/165.48/0.00) lat (ms,95%): 176.73 err/s: 0.00 reconn/s: 0.00
[ 19s ] thds: 4 tps: 28.07 qps: 521.26 (r/w/o: 406.98/114.28/0.00) lat (ms,95%): 235.74 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 21s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 22s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 23s ] thds: 4 tps: 13.98 qps: 249.59 (r/w/o: 193.68/55.91/0.00) lat (ms,95%): 4055.23 err/s: 0.00 reconn/s: 0.00
[ 24s ] thds: 4 tps: 81.06 qps: 1449.01 (r/w/o: 1123.79/325.23/0.00) lat (ms,95%): 62.19 err/s: 0.00 reconn/s: 0.00
[ 25s ] thds: 4 tps: 52.02 qps: 923.42 (r/w/o: 715.32/208.09/0.00) lat (ms,95%): 390.30 err/s: 0.00 reconn/s: 0.00
[ 26s ] thds: 4 tps: 59.00 qps: 1082.94 (r/w/o: 844.96/237.99/0.00) lat (ms,95%): 164.45 err/s: 0.00 reconn/s: 0.00
[ 27s ] thds: 4 tps: 50.99 qps: 900.75 (r/w/o: 700.81/199.95/0.00) lat (ms,95%): 130.13 err/s: 0.00 reconn/s: 0.00

As you can see, no queries are executed for four seconds, but no error is returned to the application, and after this pause the traffic starts to flow once more.

To summarize, we have shown that ClusterControl, when used with ProxySQL, MaxScale or HAProxy, can perform a failover with a downtime of 10-15 seconds. As for a planned master switch, at the time of writing none of the proxies can handle the procedure without errors. However, the next ProxySQL version is expected to allow a switchover taking just a few seconds, without any errors showing up in the application.

by krzysztof at March 27, 2018 02:45 PM

MariaDB Foundation

2018-2 Developers Unconference in Finland

We are happy to announce that the 2nd and final MariaDB Developers Unconference of 2018 will take place in Tampere, Finland during the last week of June: 26 June – New Contributor Day 27–28 June –Developers Unconference 29 June – Patch review day Seravo are kindly hosting the event. If you want to attend, please […]

The post 2018-2 Developers Unconference in Finland appeared first on MariaDB.org.

by Otto Kekäläinen at March 27, 2018 08:30 AM

March 26, 2018

Peter Zaitsev

New MySQL 8.0 innodb_dedicated_server Variable Optimizes InnoDB from the Get-Go

MySQL 8.0 innodb_dedicated_server

In this post, we’ll look at the MySQL 8.0 innodb_dedicated_server variable.

MySQL 8.0 introduces a new variable called innodb_dedicated_server. When enabled, it auto-tunes innodb_buffer_pool_size, innodb_log_file_size and innodb_flush_method at startup (if these variables are not explicitly defined in my.cnf).

The new MySQL 8.0 variable automatically sizes the following variables based on the RAM size of the system:

innodb_buffer_pool_size:

    • <1G: 128M (default value if innodb_dedicated_server is OFF)
    • <=4G: Detected Physical RAM * 0.5
    • >4G: Detected Physical RAM * 0.75

innodb_log_file_size:

    • <1G: 48M (default value if innodb_dedicated_server is OFF)
    • <=4G: 128M
    • <=8G: 512M
    • <=16G: 1024M
    • >16G: 2G

The variable also sets the following:

innodb_flush_method: 

    • Set to O_DIRECT_NO_FSYNC if the setting is available on the system. If not, set it to the default InnoDB flush method.

These new default values are very reasonable, and the changes to these three variables show considerable performance improvements from the get-go compared to the old default values. As stated in the worklog of this feature, the current MySQL version (5.7) only uses around 512M of RAM with the default settings. With the new feature, these variables can easily adapt to the amount of RAM allocated to the server, for the convenience of the system/database administrator.
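
As a minimal sketch of how you would opt in and then verify what the server derived (the SELECT is just one convenient way to check; actual values depend on your RAM):

[mysqld]
# Let the server size the buffer pool, redo log and flush method itself.
# Any of the three set explicitly here would be excluded from auto-tuning.
innodb_dedicated_server = ON

mysql> SELECT @@innodb_buffer_pool_size, @@innodb_log_file_size, @@innodb_flush_method;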

With that said, you can achieve the best settings for these three variables by tuning them to your workload and hardware.

For InnoDB buffer pool size (based on this article), consider allocating 80% of physical RAM for starters. You can increase it to as large as needed and possible, as long as the system doesn’t swap on the production workload.

For InnoDB log file size, it should be able to handle one hour of writes to allow InnoDB to optimize writing the redo log to disk. You can calculate an estimate by following the steps here, which samples one minute worth of writes to the redo log. You could also get a better estimate from hourly log file usage with Percona Monitoring and Management (PMM) graphs.
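
For illustration, here is one way to take that one-minute sample yourself from the mysql client (a sketch of the sampling approach; the pager trick assumes a UNIX-like client host):

mysql> pager grep -i "log sequence number"
mysql> SHOW ENGINE INNODB STATUS\G SELECT SLEEP(60); SHOW ENGINE INNODB STATUS\G
mysql> nopager

The difference between the two "Log sequence number" values is the number of bytes written to the redo log in that minute; multiply by 60 for an hourly estimate and size the log files to hold at least that much.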

Finally, for innodb_flush_method, O_DIRECT_NO_FSYNC prevents double buffering between the OS cache and disk, and works well with low-latency IO devices such as RAID subsystem with write cache. On the other hand, in high-latency IO devices, commonly found on deployments where MySQL is stored in SAN drives, having an OS cache with the default flush method fsync is more beneficial.

All in all, the MySQL 8.0 innodb_dedicated_server variable provides a fairly well-tuned InnoDB configuration at startup. But if it’s not enough, you can still tune these variables based on your workload and hardware. While MySQL 8.0 isn’t released yet, you can take a look at this article that helps you tune the current version (MySQL 5.7) right after installation.

The post New MySQL 8.0 innodb_dedicated_server Variable Optimizes InnoDB from the Get-Go appeared first on Percona Database Performance Blog.

by Jaime Sicam at March 26, 2018 08:26 PM

March 25, 2018

Valeriy Kravchuk

Windows Tools for MySQL DBAs: Basic Minidump Analysis

"To a man with a hammer, everything looks like a nail."

Even though I had written many posts explaining the use of gdb for various MySQL-related tasks, I have to use other OS-level troubleshooting tools from time to time. Moreover, as MySQL and MariaDB are still supported and used in production under Microsoft Windows, I have to serve customers there and sometimes use Windows-specific tools. So, I decided to start a series of posts (which I promised to my great colleague Vladislav Vaintroub (a.k.a. Wlad), who helped me a lot over the years and actually switched my attention from Performance Schema towards debuggers) about different Windows tools for MySQL DBAs (and support engineers).

Developers (and maybe even power users) on Windows probably know all I plan to describe, and way more, by heart, but for me many things were not obvious and took some time to search, try or even ask for advice... So, this series of posts is going to be useful at least for me (and mostly for UNIX users, like me), as a source of hints and links that may save me some time and effort in the future.

In this first post I plan to describe the basic installation of "Debugging Tools for Windows" and the use of the cdb command line debugger to analyze minidumps (which you get on Windows upon crashes when the core-file option is added to my.ini, and which you can also get for a hanging mysqld.exe process with minimal effort using different tools), and to get backtraces and a few other details from them. I also plan to show simple command lines to share with the DBAs and users whom you help, that allow them to get useful details (more or less full backtraces, crash analysis, OS details etc.) for further troubleshooting when/if dumps cannot or should not be shared.
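
For example, one low-effort way to dump a hanging mysqld.exe (an assumption on my side: Sysinternals ProcDump is downloaded separately, as it is not part of the Debugging Tools) would be:

procdump.exe -ma <mysqld PID> h:\mysqld.dmp

Here -ma requests a full memory dump; omit it to get a smaller minidump.
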
---
I have to confess: I use Microsoft Windows on desktops and laptops. I started with Windows 3.0 back in 1992 and ended up with Windows 10 on my wife's laptop. I use Windows even for work. Today 2 of my 4 machines used for work-related tasks run Windows (64-bit XP on an old Dell box I got from MySQL AB back in 2005, and 64-bit Windows 7 on this Acer netbook). At the same time, most of the work I have had to do since 1992 is related to UNIX of all kinds (from Xenix and SCO OpenDesktop, which I connected to from a VT220 terminal at my first job after university, to recent Linux versions used by customers in production, my Fedora 27 box and an Ubuntu 14.04 netbook used as build, Docker, VirtualBox, testing, benchmarking etc. servers). I never became a real power user of Windows (no really complex .bat files, no PowerShell programming or even Basic macros in Word, no domains, no shadow copy services for backups, nothing fancy). But on UNIX I had to master the shell, vi :), some Perl and a lot of command line tools.

I had to do some software development on Windows till 2005, and built MySQL on Windows sometimes up to 2012, when I joined Percona (which had nothing to do with Windows at all), so I have an old version of Visual Studio, some older WinDbg and other debugging tools here and there, but I had not used them more than once a year until recently... The last time I attached WinDbg to anything MySQL-related was MariaDB 10.1.13, during some troubleshooting related to MDEV-10191.

Suddenly in March I got issues from customers related to hanging upon startup/InnoDB recovery and under load, and to crashes while using some (somewhat exotic) storage engine, all of these on modern versions of Microsoft Windows, in production. I had no other option but to get and study backtraces (of all threads, or of the crashing thread) and check the source code. It would be so easy to get them on Linux (just ask the customer to install gdb, attach it to the hanging mysqld process or point it to the mysqld binary and core, and get the output of thread apply all backtrace, minor details aside). But how to do this on Windows, in the command line if possible (as I hate to share screenshots and write long explanations on where to click and what to copy/paste)? I had to check in WinDbg, got some failures because of my outdated and incomplete environment (while a customer with a proper environment provided useful outputs anyway), and then eventually asked Wlad for some help. In the end I was able to make some progress.

To be ready to do this again next time with confidence, in a proper test environment and without wasting anybody else's time, I decided to repeat some of these efforts in a clean environment and make notes that I am going to share in this series of blog posts. Today I'll concentrate on installing the current "Debugging Tools for Windows" and using cdb from them to process minidumps.

1. Installing "Debugging Tools for Windows"

There is a nice, easy-to-find document from Microsoft on how to get cdb and other debugging tools for Windows. For recent versions you just have to download the Windows 10 SDK and then install these tools (and everything else you may need) from it. Proceed to this page, read the details, click on "Download .EXE" to get winsdksetup.exe, start it and select "Debugging Tools for Windows" when requested to select the features. Eventually you'll get some 416+ MB downloaded and installed, by default in C:\Program Files (x86)\Windows Kits\10\Debuggers\ (on a default 64-bit Windows installation with C: as the system disk). A quick check shows I have everything I need:
C:\Program Files (x86)\Windows Kits\10\Debuggers\x64>dir
...
11/10/2017  11:55 PM           154,936 cdb.exe
...
11/10/2017  11:55 PM           576,312 windbg.exe
...
Here is the list of most useful cdb options for the next step:
C:\Program Files (x86)\Windows Kits\10\Debuggers\x64>cdb /?
cdb version 10.0.16299.91
usage: cdb [options]

Options:

  <command-line> command to run under the debugger
  -? displays command line help text
...
  -i <ImagePath> specifies the location of the executables that generated the
                 fault (see _NT_EXECUTABLE_IMAGE_PATH)
...
  -lines requests that line number information be used if present
...
  -logo <logfile> opens a new log file
...
  -p <pid> specifies the decimal process ID to attach to
...
  -pv specifies that any attach should be noninvasive
...
  -y <SymbolsPath> specifies the symbol search path (see _NT_SYMBOL_PATH)
  -z <CrashDmpFile> specifies the name of a crash dump file to debug
...
Environment Variables:

    _NT_SYMBOL_PATH=[Drive:][Path]
        Specify symbol image path.
...
Control Keys:

     <Ctrl-B><Enter> Quit debugger
...
Remember the Ctrl-B key combination as a way to quit cdb. A few times I looked as funny as a beginner vi user, pressing everything to get out of the tool...

2. Basic Use of cdb to Process Minidump

Let's assume you've got a mysqld.dmp minidump file (a kind of "core" file on UNIX, but better, and usually smaller) created during some crash. Depending on the binaries used, you may need to make sure you have .PDB files, in some directory, for the mysqld.exe binary and for all .dll files of the plugins/extra storage engines used. The default path to .PDB files is defined by the _NT_SYMBOL_PATH environment variable and may include multiple directories and URLs.

Initially I got the advice to set this environment variable as follows:
set _NT_SYMBOL_PATH=srv*c:\symbols*http://msdl.microsoft.com/download/symbols
This assumes that I have a collection of .PDB files in c:\symbols on some locally available server and rely on Microsoft's symbol server for the rest. For anything missing we can always add the -y option to point to some directory with additional .PDB files. Note that MariaDB provides .pdb files along with .exe in the .msi installer, not only in the .zip file with binaries.

So, if your mysqld.dmp file is located in h:\, the mysqld.exe of the same version that generated that minidump is located in p:\software, and all related .dll files and the .pdb files for them all are also there, the command to get basic details about the crash into the file h:\out.txt would be the following:
cdb -z h:\mysqld.dmp -i p:\software -y p:\software -logo h:\out.txt -c "!sym prompts;.reload;.ecxr;q"
You can click on every option underlined above to get details. It produces output like this:
C:\Program Files (x86)\Windows Kits\10\Debuggers\x64>cdb -z h:\mysqld.dmp -i p:\
software -y p:\software -logo h:\out.txt -c "!sym prompts;.reload;.ecxr;q"


Microsoft (R) Windows Debugger Version 10.0.16299.91 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [h:\mysqld.dmp]
User Mini Dump File: Only registers, stack and portions of memory are available


************* Path validation summary **************
Response                         Time (ms)     Location
OK                                             p:\software

************* Path validation summary **************
Response                         Time (ms)     Location
OK                                             p:\software
Deferred                                       srv*c:\symbols*http://msdl.micros
oft.com/download/symbols
Symbol search path is: p:\software;srv*c:\symbols*http://msdl.microsoft.com/down
load/symbols
Executable search path is: p:\software
Windows 10 Version 14393 MP (4 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
10.0.14393.206 (rs1_release.160915-0644)
Machine Name:
Debug session time: ...
System Uptime: not available
Process Uptime: 0 days X:YY:ZZ.000
............................................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(1658.fd0): Access violation - code c0000005 (first/second chance not available)

ntdll!NtGetContextThread+0x14:
00007fff`804a7d84 c3              ret
0:053> cdb: Reading initial command '!sym prompts;.reload;.ecxr;q'
quiet mode - symbol prompts on
............................................
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000006
rdx=000001cf0ac2e118 rsi=000001cf0abeeef8 rdi=000001cf0ac2e118
rip=00007fff5f313b0d rsp=000000653804e2b0 rbp=000001cf165c9cc8
 r8=0000000000000000  r9=00007fff5f384448 r10=000000653804ef70
r11=000000653804eb28 r12=0000000000000000 r13=000001cf0ab49d48
r14=0000000000000000 r15=000001cf0b083028
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
ha_spider!spider_db_connect+0xdd:
00007fff`5f313b0d 8b1498          mov     edx,dword ptr [rax+rbx*4] ds:00000000`
00000000=????????
quit:

C:\Program Files (x86)\Windows Kits\10\Debuggers\x64>
which also goes to the file pointed out by the -logo option. Here we have some weird crash in the Spider engine of MariaDB, which is not the topic of the current post.

If you think the crash is related to some activity of other threads, you can get all unique stack dumps with the following options:
cdb -lines -z h:\mysqld.dmp -i p:\software -y p:\software -logo h:\out.txt -c "!sym prompts;.reload;!uniqstack -p;q"
This is how the backtrace of the slave SQL thread may look; note the file names with line numbers for each frame (the -lines option):
. 44  Id: 1658.1584 Suspend: 0 Teb: 00000065`32185000 Unfrozen
      Priority: 0  Priority class: 32
Child-SP          RetAddr           Call Site
00000065`353fed08 00007fff`8046d119 ntdll!NtWaitForAlertByThreadId+0x14
00000065`353fed10 00007fff`7cbd8d78 ntdll!RtlSleepConditionVariableCS+0xc9
00000065`353fed80 00007ff6`2d7d62e7 KERNELBASE!SleepConditionVariableCS+0x28
00000065`353fedb0 00007ff6`2d446c8e mysqld!pthread_cond_timedwait(struct _RTL_CO
NDITION_VARIABLE * cond = 0x000001ce`66805688, struct _RTL_CRITICAL_SECTION * mu
tex = 0x000001ce`668051b8, struct timespec * abstime = <Value unavailable error>
)+0x27 [d:\winx64-packages\build\src\mysys\my_wincond.c @ 85]
(Inline Function) --------`-------- mysqld!inline_mysql_cond_wait+0x61 [d:\winx6
4-packages\build\src\include\mysql\psi\mysql_thread.h @ 1149]
00000065`353fede0 00007ff6`2d4b1718 mysqld!MYSQL_BIN_LOG::wait_for_update_relay_
log(class THD * thd = <Value unavailable error>)+0xce [d:\winx64-packages\build\
src\sql\log.cc @ 8055]
00000065`353fee90 00007ff6`2d4af03f mysqld!next_event(struct rpl_group_info * rg
i = 0x000001ce`667fe560, unsigned int64 * event_size = 0x00000065`353ff008)+0x2b
8 [d:\winx64-packages\build\src\sql\slave.cc @ 7148]
00000065`353fef60 00007ff6`2d4bb038 mysqld!exec_relay_log_event(class THD * thd
= 0x000001ce`6682ece8, class Relay_log_info * rli = 0x000001ce`66804d58, struct
rpl_group_info * serial_rgi = 0x000001ce`667fe560)+0x8f [d:\winx64-packages\buil
d\src\sql\slave.cc @ 3866
]
00000065`353ff000 00007ff6`2d7d35cb mysqld!handle_slave_sql(void * arg = 0x00000
1ce`66803430)+0xa28 [d:\winx64-packages\build\src\sql\slave.cc @ 5145]
00000065`353ff780 00007ff6`2d852d51 mysqld!pthread_start(void * p = <Value unava
ilable error>)+0x1b [d:\winx64-packages\build\src\mysys\my_winthread.c @ 62]
(Inline Function) --------`-------- mysqld!invoke_thread_procedure+0xe [d:\th\mi
nkernel\crts\ucrt\src\appcrt\startup\thread.cpp @ 91]
00000065`353ff7b0 00007fff`80338364 mysqld!thread_start<unsigned int (void * par
ameter = 0x00000000`00000000)+0x5d [d:\th\minkernel\crts\ucrt\src\appcrt\startup
\thread.cpp @ 115]
00000065`353ff7e0 00007fff`804670d1 kernel32!BaseThreadInitThunk+0x14
00000065`353ff810 00000000`00000000 ntdll!RtlUserThreadStart+0x21
For crash analysis, the !analyze command is usually used as well:
cdb -lines -z h:\mysqld.dmp -i p:\software -y p:\software -logo h:\out.txt -c "!sym prompts;.reload;!analyze -v;q"
It may give some details about the exception happened:
...
FAULTING_IP:
ha_spider!spider_db_connect+dd00007fff`5f313b0d 8b1498          mov     edx,dword ptr [rax+rbx*4]

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007fff5f313b0d (ha_spider!spider_db_connect+0x00000000000000
dd)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000000
Attempt to read from address 0000000000000000
DEFAULT_BUCKET_ID:  NULL_POINTER_READ

PROCESS_NAME:  mysqld.exe

ERROR_CODE: (NTSTATUS) 0xc0000005 - <Unable to get error code text>
...
STACK_TEXT:
00000065`3804e2b0 00007fff`5f3132ad : 00000000`00000000 000001cf`1741ab68 000001
cf`0b083028 000001cf`0ac2e118 : ha_spider!spider_db_connect+0xdd
00000065`3804e330 00007fff`5f3117f8 : 000001ce`669107c8 000001cf`0ac2e118 000001
cf`1741ab68 00000000`00000001 : ha_spider!spider_db_conn_queue_action+0xad
00000065`3804ea20 00007fff`5f31a1ee : 00000000`00000000 000001cf`0abeeef8 000000
00`00000000 000001cd`c0b30000 : ha_spider!spider_db_before_query+0x108
00000065`3804eaa0 00007fff`5f31a0bf : 00000000`00000000 00000000`00000000 000000
65`3804ec70 00000000`00000038 : ha_spider!spider_db_set_names_internal+0x11e
00000065`3804eb30 00007fff`5f369b4e : 00000000`00000000 00000065`3804ec70 00007f
ff`5f387f08 00000000`00000000 : ha_spider!spider_db_set_names+0x3f
00000065`3804eb70 00007fff`5f32f5f1 : 00000000`00000001 00000065`00000000 41cfff
ff`00000001 00000000`00000001 : ha_spider!spider_mysql_handler::show_table_statu
s+0x15e
00000065`3804ece0 00007fff`5f3222e8 : 00000000`00000001 00000065`00000000 000000
00`5ab175cd 000001cf`0b0886f8 : ha_spider!spider_get_sts+0x201
00000065`3804edb0 00007ff6`2d7d35cb : 00000000`00000057 000001cf`0ab49d48 000000
00`00000000 00007fff`5f321c10 : ha_spider!spider_bg_sts_action+0x6d8
00000065`3804fa30 00007ff6`2d852d51 : 000001cf`17008fe0 000001cf`0aa3fef0 000000
00`00000000 00000000`00000000 : mysqld!pthread_start+0x1b
00000065`3804fa60 00007fff`80338364 : 00000000`00000000 00000000`00000000 000000
00`00000000 00000000`00000000 : mysqld!thread_start<unsigned int (__cdecl*)(void
 * __ptr64)>+0x5d
00000065`3804fa90 00007fff`804670d1 : 00000000`00000000 00000000`00000000 000000
00`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
00000065`3804fac0 00000000`00000000 : 00000000`00000000 00000000`00000000 000000
00`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21
...
Finally (for this post), this is how we can get information about the crashing thread, including details about local variables (like a full backtrace in gdb). We apply the !for_each_frame extension and use dv to "display variables":
cdb -z h:\mysqld.dmp -i p:\software -y p:\software -logo h:\out.txt -c "!sym prompts;.reload;.ecxr;!for_each_frame dv /t;q"
The result will include details about each frame, parameters and local variables, like this:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
00 00000065`3804e2b0 00007fff`5f3132ad ha_spider!spider_db_connect+0xdd
struct st_spider_share * share = 0x000001cf`165c9cc8
struct st_spider_conn * conn = 0x000001cf`0ac2e118
int link_idx = 0n0
int error_num = <value unavailable>
class THD * thd = 0x000001cf`0abeeef8
int64 connect_retry_interval = <value unavailable>
int connect_retry_count = <value unavailable>
int64 tmp_time = <value unavailable>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
01 00000065`3804e330 00007fff`5f3117f8 ha_spider!spider_db_conn_queue_action+0xa
d
struct st_spider_conn * conn = 0x000001cf`0ac2e118
int error_num = 0n0
char [1532] sql_buf = char [1532] ""
class spider_string sql_str = class spider_string
class spider_db_result * result = <value unavailable>
struct st_spider_db_request_key request_key = struct st_spider_db_request_key
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
02 00000065`3804ea20 00007fff`5f31a1ee ha_spider!spider_db_before_query+0x108
struct st_spider_conn * conn = 0x000001cf`0ac2e118
int * need_mon = 0x000001cf`1741ab68
int error_num = 0n0
bool tmp_mta_conn_mutex_lock_already = true
class ha_spider * spider = <value unavailable>
bool tmp_mta_conn_mutex_unlock_later = <value unavailable>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
03 00000065`3804eaa0 00007fff`5f31a0bf ha_spider!spider_db_set_names_internal+0x
11e
struct st_spider_transaction * trx = 0x000001cf`0b083028
struct st_spider_share * share = 0x000001cf`0ab49d48
struct st_spider_conn * conn = 0x000001cf`0ac2e118
int all_link_idx = 0n0
int * need_mon = 0x000001cf`1741ab68
bool tmp_mta_conn_mutex_lock_already = true
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
04 00000065`3804eb30 00007fff`5f369b4e ha_spider!spider_db_set_names+0x3f
class ha_spider * spider = <value unavailable>
struct st_spider_conn * conn = <value unavailable>
int link_idx = <value unavailable>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
05 00000065`3804eb70 00007fff`5f32f5f1 ha_spider!spider_mysql_handler::show_tabl
e_status+0x15e
class spider_mysql_handler * this = 0x000001cf`0a15dd00
int link_idx = 0n0
int sts_mode = 0n1
unsigned int flag = 1
int error_num = 0n1
struct st_spider_share * share = 0x000001cf`0ab49d48
struct st_spider_conn * conn = 0x000001cf`0ac2e118
class spider_db_result * res = <value unavailable>
unsigned int64 auto_increment_value = 0
unsigned int pos = 0
struct st_spider_db_request_key request_key = struct st_spider_db_request_key
struct st_spider_db_request_key request_key = struct st_spider_db_request_key
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
06 00000065`3804ece0 00007fff`5f3222e8 ha_spider!spider_get_sts+0x201
struct st_spider_share * share = 0x000001cf`0ab49d48
int link_idx = 0n0
int64 tmp_time = 0n1521579469
class ha_spider * spider = 0x00000065`3804ef70
double sts_interval = 10
int sts_mode = 0n1
int sts_sync = 0n0
int sts_sync_level = 0n2
unsigned int flag = 0x18
int error_num = <value unavailable>
int get_type = 0n1
struct st_spider_patition_handler_share * partition_handler_share = <value unava
ilable>
double tmp_sts_interval = <value unavailable>
struct st_spider_share * tmp_share = <value unavailable>
int tmp_sts_sync = <value unavailable>
class ha_spider * tmp_spider = <value unavailable>
int roop_count = <value unavailable>
int tmp_sts_mode = <value unavailable>
class THD * thd = <value unavailable>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
07 00000065`3804edb0 00007ff6`2d7d35cb ha_spider!spider_bg_sts_action+0x6d8
void * arg = 0x000001cf`0ab49d48
int error_num = 0n0
class ha_spider spider = class ha_spider
unsigned int * conn_link_idx = 0x000001cf`1741ab78
unsigned char * conn_can_fo = 0x000001cf`1741ab80 "--- memory read error at addr
ess 0x000001cf`1741ab80 ---"
struct st_spider_conn ** conns = 0x000001cf`1741ab70
int * need_mons = 0x000001cf`1741ab68
int roop_count = 0n0
char ** conn_keys = 0x000001cf`1741ab88
class THD * thd = 0x000001cf`0abeeef8
class spider_db_handler ** dbton_hdl = 0x000001cf`1741ab90
struct st_spider_transaction * trx = 0x000001cf`0b083028
struct st_mysql_mutex spider_global_trx_mutex = <value unavailable>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
08 00000065`3804fa30 00007ff6`2d852d51 mysqld!pthread_start+0x1b
void * p = <value unavailable>
void * arg = 0x000001cf`0ab49d48
<function> * func = 0x00007fff`5f321c10
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
09 (Inline Function) --------`-------- mysqld!invoke_thread_procedure+0xe
void * context = 0x000001cf`17008fe0
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
0a 00000065`3804fa60 00007fff`80338364 mysqld!thread_start<unsigned int (__cdecl
*)(void * __ptr64)>+0x5d
void * parameter = 0x00000000`00000000
<function> * procedure = 0x00007ff6`2d7d35b0
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
0b 00000065`3804fa90 00007fff`804670d1 kernel32!BaseThreadInitThunk+0x14
Unable to enumerate locals, Win32 error 0n87
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
0c 00000065`3804fac0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Unable to enumerate locals, Win32 error 0n87
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
--- 

Stay tuned. I keep working on complex MySQL/MariaDB problems under Windows, so I will soon have a few more findings and links to share.

by Valeriy Kravchuk (noreply@blogger.com) at March 25, 2018 04:17 PM

March 24, 2018

Valeriy Kravchuk

Fun with Bugs #63 - On Bugs Detected by ASan

Among other things Geir Hoydalsvik stated in his nice post yesterday:
 "We’ve fixed a number of bugs detected by UBsan and Asan."
This is indeed true; I already noted many related bugs fixed in the recent MySQL 8.0.4. But I think that a couple of details are missing in the blog post. First of all, there is still a notable number of bugs detected by ASan, or noted in builds with ASan, that remain "Verified". Second, who actually found and reported these bugs?

I decided to do a quick search and present my summary to clarify these details. Let me start with the list of "Verified" or "Open" bugs in public MySQL bugs database, starting from the oldest one:
  • Bug #69715 - "UBSAN: Item_func_mul::int_op() mishandles 9223372036854775809*-1". The oldest related "Verified" bug I found was reported back in 2013 by Arthur O'Dwyer. Shane Bester from Oracle kindly keeps checking it with recent and upcoming releases, so we know that even '9.0.0-dmr-ubsan' (built on 20 October 2017) was still affected.
  • Bug #80309 - "some innodb tests fail with address sanitizer (WITH_ASAN)". It was reported by Richard Prohaska and has remained "Verified" for more than two years already.
  • Bug #80581 - "rpl_semi_sync_[non_]group_commit_deadlock crash on ASan, debug". This bug reported by Laurynas Biveinis from Percona two years ago is still "Verified".
  • Bug #81674 - "LeakSanitizer-enabled build fails to bootstrap server for MTR". This bug reported by Laurynas Biveinis affects only MySQL 5.6, but still, why not backport the fix from 5.7?
  • Bug #82026 - "Stack buffer overflow with --ssl-cipher=<more than 4K characters>". Bug detected by ASan was noted by Yura Sorokin from Percona and reported by Laurynas Biveinis.
  • Bug #82915 - "SIGKILL myself when using innodb_limit_optimistic_insert_debug=2 and drop table". ASan debug builds are affected. This bug was reported by Roel Van de Paar from Percona.
  • Bug #85995 - "Server error exit due to empty datadir causes LeakSanitizer errors". This bug in MySQL 8.0.1 (which had to affect anyone who runs tests on ASan debug builds regularly) was reported by Laurynas Biveinis and has stayed "Verified" for almost a year.
  • Bug #87129 - "Unstable test main.basedir". This test problem reported by Laurynas Biveinis affects ASan builds, among others. See also his Bug #87190 - "Test main.group_by is unstable".
  • Bug #87201 - "XCode 8.3.3+ -DWITH_UBSAN=ON bundled protobuf build error". Yet another (this time macOS-specific) bug found by Laurynas Biveinis.
  • Bug #87295 - "Test group_replication.gr_single_primary_majority_loss_1 produces warnings". Potential bug in group replication noted by Laurynas Biveinis in ASan builds.
  • Bug #87923 - "ASan reporting a memory leak on merge_large_tests-t". This bug by Laurynas Biveinis is still "Verified", while Tor Didriksen's comment states that it is resolved with the fix for Bug #87922 (which is closed as fixed in MySQL 8.0.4). Why not close this one as well?
  • Bug #89438 - "LeakSanitizer errors on xplugin unit tests". As Laurynas Biveinis found, X Plugin unit tests report errors with LeakSanitizer.
  • Bug #89439 - "LeakSanitizer errors on GCS unit tests". Yet another bug report for MySQL 8.0.4 by Laurynas Biveinis.
  • Bug #89961 - "add support for clang ubsan". This request was made by Tor Didriksen from Oracle. It is marked as "fixed in 8.0.12". It means we may get MySQL 8.0.11 released soon. That's why I decided to mention the bug here.
There were also a few other test failures noted on ASan debug builds. I skipped them to make this post shorter.

Personally, I do not run builds or tests with ASan on a regular basis. I appreciate Oracle's efforts to make the code warning-free, UBSan- and ASan-clean, and to fix bugs found with ASan. But I'd also want them to process all or most of the related bugs in the public database properly before announcing new related achievements, and to clearly admit and appreciate the substantial help and contributions from specific community members (mostly Laurynas Biveinis in this case).

Percona engineers seem to test ASan builds of MySQL 5.7 and 8.0 (or Percona's closely related versions) regularly, and have done so for years, contributing public bug reports back. I suspect they found way more related bugs than Oracle's internal QA. I think we should explicitly thank them for this contribution that made MySQL better!

by Valeriy Kravchuk (noreply@blogger.com) at March 24, 2018 12:04 PM

March 23, 2018

Peter Zaitsev

This Week in Data with Colin Charles 33: Reporting from FOSSASIA 2018 and Azure Announces Database Services for MySQL and PostgreSQL

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Writing to you on the ground from FOSSASIA 2018, where I gave a track introduction yesterday, since we have a pretty awesome database track for most of Saturday and a generally all-MySQL-focused Sunday. There’s even a list of talks by Oracle MySQL’ers (yes, there’s more than just Oracle folk, but for that, you have to check the schedule).

The Percona Live Community Dinner happens again this year during Percona Live 2018, at Pedro’s on 24 April 2018. It starts at 7 pm, and I highly recommend you purchase the $30 ticket. It usually sells out, so don’t wait till it’s too late.

Some big news from a MySQL in the cloud perspective: Announcing general availability of Azure database services for MySQL and PostgreSQL. There’s also a quick guide: Create an Azure Database for MySQL server by using the Azure portal. Note that next comes MariaDB Server too.

Releases

Link List

Upcoming appearances

The post This Week in Data with Colin Charles 33: Reporting from FOSSASIA 2018 and Azure Announces Database Services for MySQL and PostgreSQL appeared first on Percona Database Performance Blog.

by Colin Charles at March 23, 2018 09:16 AM

March 22, 2018

Peter Zaitsev

The Anatomy of a MongoDB Replica Set

MongoDB Replica Set

In this blog post, we’re going to break down what constitutes a MongoDB replica set.

While replica sets are not a specific environment or solution, understanding what you can do with their sophisticated features helps you fit them to a multitude of situations. With many classical databases, you would need services, third-party software and scripts to facilitate many of a replica set’s abilities. We are going to explore how you might use each feature, and end with a complicated diagram showing them all off.

If you have ever asked why we can’t have an elastic, highly available, fast-to-recover and resilient database layer out of the box, then this blog will help you pick the right pieces to get the features you need. The design given here, while good, sophisticated and expansive, is only an example. Please make sure you only use what you need to get the result you desire.

MongoDB Replica Set Components

In all projects, we need some way to store data persistently. In many cases, this needs to be a shared state across all of your applications, one that allows load balancing of incoming traffic to help you scale. To facilitate this layer of scaling up and out as needed, we typically have a few requirements: data loading, replication, recovery, failover and data stability. In a different database, these are all separate tasks your operations group would handle, using precious staff hours and thus costing you money. Using replica sets lets you defer much of this work while supporting higher uptime (more “9s”) thanks to their automated behaviors.

MongoDB replica sets include a few basic node types: Primary, Secondary and Arbiter. These, in turn, support a few options: Priority, Votes, Delay, Tags and Hidden. Using these together allows you some very advanced or very simple configurations.
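
Under the hood these are all just fields in the replica set configuration document. For example, a basic three-node set with a preferred primary could be initiated from the mongo shell like this (the hostnames and set name are hypothetical):

> rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "node1.example.com:27017", priority: 2 },
      { _id: 1, host: "node2.example.com:27017" },
      { _id: 2, host: "node3.example.com:27017" }
    ]
  })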

Key terms to know:

  • Primary: a node that accepts writes and is the leader for voting (there can be only one!).
  • Secondary: a node that replicates from the primary or another secondary, and can be used for reads if you tell the query to allow this. A replica set can have up to 50 members in total.
  • Arbiter: in the event your physical node count is an even number, add one of these to break the tie. Never add one where it would make the voting count even.
  • Priority: any non-arbiter node can have a priority set. It allows you to prefer that specific nodes become primary, such as any node in your primary data center or one with more resources.
  • Votes: a replica set can have at most seven voting members, so in larger sets the additional nodes must not vote. This option lets you set that.
  • Delay: you can make a node not vote, be hidden, and delay its replication. This is useful if you want to quickly revert all nodes back by a set amount of time to reverse a change, as resyncing the rest of the cluster from this node is faster than a full recovery (see the sketch after this list).
  • Tags: grants the special ability to direct queries to specific node(s). Useful for BI, geo-locality and other advanced functions.
  • Hidden: makes a node unable to take queries from clients without a tag, which is useful for a dedicated DR or backup node.
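
To sketch how the Delay, Hidden and Votes options above are applied (assuming a set named rs0 where member index 3 is the node you want to repurpose), a hidden, non-voting, two-hour delayed member is just a reconfiguration:

rs0:PRIMARY> cfg = rs.conf()
rs0:PRIMARY> cfg.members[3].priority = 0       // a delayed member must not be electable
rs0:PRIMARY> cfg.members[3].hidden = true
rs0:PRIMARY> cfg.members[3].votes = 0
rs0:PRIMARY> cfg.members[3].slaveDelay = 7200  // two hours, in seconds
rs0:PRIMARY> rs.reconfig(cfg)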

 

The list above tells you what each main setting is, but there is much more you could use to control writing, reading and even chaining replication. However, those advanced functions are outside the scope of this document; you will know when you need them. What we are going to talk about next is the RAID level you need in MongoDB, a subject that needs more consideration.

Hardware Considerations

Cloud Considerations

Below I will talk about RAID a good deal; this is because it’s important to understand where you can save costs to justify more nodes. However, in the cloud your choices are different. Typically you would not use several ephemeral-type drives to make a RAID that would go away. Similarly, you might think to use EBS or other cloud storage systems because they already have durability and scaling in their design. Interestingly, I would rather have local SSDs with MongoDB, as discussed below: it provides duplication and stripes data across many nodes, solving that issue. It also reduces the impact of network issues affecting storage. This sounds great, but it also means I can’t just clone a snapshot to build a new node, and in a traditional RDBMS you might opt for EBS because of the work involved in creating a new replica. With MongoDB, however, our story is different. We instead merely spin up a new node and tell it what replica set it is part of. When MongoDB starts up, it automatically copies the data for use. The ability to copy data when needed removes the need for shared storage or snapshots.

This is not to say you MUST use ephemeral SSD storage. You can, of course, use provisioned IOPS or basic EBS, for example. However, in both cases there are logical network limitations to consider on your account, let alone per node. EBS provides you an extra layer of protection at the storage level, but I would instead take those same savings and allow myself an additional data-bearing node, allowing even more availability, resiliency and read scaling.

RAID Considerations

When talking about their own data center, many people would suggest we use RAID5 everywhere. However, is this the best option? To answer this, here’s a quick primer on RAID levels.

RAID level 0 – Striping

Data is split up into blocks that get written across all the drives in the array. By using multiple disks (a minimum of two) at the same time, this offers superior I/O performance. RAID 0 provides excellent performance, both in read and write operations. There is no overhead caused by parity controls. We use all storage capacity, and there is no overhead.

BUT…

RAID 0 is not fault-tolerant. If one drive fails, we lose all data in the RAID 0 array. It should not be used for mission-critical systems. RAID 0 also provides no redundancy whatsoever.

RAID level 1 – Mirrored

Data is stored twice: when anything is written to drive 0, the same write also occurs on drive 1 (or drive N in larger RAID 1 setups). When you have a drive failure, you can replace the failed drive and copy the data block by block to the new drive from your existing good drive. As this is purely sequential reads, recovery is considered very fast and stable. RAID 1 allows you to read from N copy drives, so read performance increases with available drives. Write performance, however, is still limited to the speed of a single drive. The rebuild is easy, as it’s a block copy, not a logical rebuild.

But…

Where RAID 0 allowed you to use all storage capacity, RAID 1 cuts it by 50% due to duplication. Software RAID 1 is not the best option due to rebuild and hot-swap issues; you need hardware controllers for anything above level 0.

RAID level 10 or 1+0 – Mirrored & Striped

RAID 10 can also be considered RAID 1+0. It is the marriage of both designs, taking on the ability to scale to more sets of duplication. In RAID 0, you could have four drives, but they would not duplicate any data. In RAID 1, on the other hand, you would have the capacity of a single drive with three additional replica drives; obviously getting 25% of your storage is only useful for extreme redundancy needs, for which you would usually use a SAN instead. With RAID 10, you could say d1/d2 and d3/d4 are paired, so you get the read performance of four drives and the write performance of two drives while tolerating a disk failure.

BUT…

RAID 5 allows more than 50% usage of capacity, unlike RAID 1+0. The trade-off is capacity vs. Mean Time To Recovery (MTTR). The minimum drive count is four drives. You should carefully weigh this against additional shards in MongoDB.

RAID level 5 – Striping with Parity

RAID 5 is the most common RAID in a typical system. It provides good data capacity, durability and read performance. However, it is strongly impacted by write cost, which includes rebuilding times. This RAID level has the slowest recovery time, which keeps the system in a degraded (but still working) state. Adding more drives allows you to create what is also called RAID 6, with two or more parity drives, allowing multiple disk failures before a full outage. Like RAID 1, hardware controllers are preferred for both performance and stability reasons. Many database systems use this level, as rebuilding or recovery is manual in the database, and we want to avoid that work. This is where MongoDB differs: replica sets already provide striping + parity, but over separate systems, meaning it is even less likely for all components to fail at the same time. For this reason, running any replica set on RAID 5/6 is considered overkill. I typically recommend either RAID 1 or RAID 10; it varies by the storage available. It is common for larger drives to cost more, so you should consider the ROI of doing RAID 10 with four drives of smaller size (this is especially true with SSDs).

As mentioned, MongoDB already provides duplication and striping across nodes. For this reason, some people use RAID 0, which is as wrong as using RAID 5. This is the Goldilocks paradox in action: one does not provide enough single-node recoverability, and the other provides too much, costing you valuable response time. RAID 1 or 1+0 is the best choice here, as long as you follow the best advice and have three full data nodes (not two for data and one arbiter). It is important to note that RAID 5 or even RAID 6 are still acceptable, but you need to consider that their recovery times might exceed the time an initial sync in MongoDB would have taken, with less than five minutes of your DBA/SRE’s time. MongoDB takes this even further when you consider sharding: it adds an extra layer, as any single shard owns only N% of the data, and the replica set + RAID 1 provides the durability of that data.
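
If you do go with Linux software RAID despite the hardware-controller advice above, a four-drive RAID 10 array is a one-liner (device names are an assumption):

# stripe over two mirrored pairs; sdb-sde are placeholders for your four drives
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde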

Now that you are an active contender to be an expert in the features and considerations of a replica set, we should see these features in action through an “Every Feature Configuration Replica Set” architecture example. While it is very complicated, it shows how you could run two main data centers and one DR data center, with east-coast reads and local west-coast reads for performance, local backup nodes to improve recovery, a delayed secondary in the DR, and even a preferred pair of nodes to become primary.

The most common thing people get wrong with replica sets is one thing with two outcomes: the arbiter. You should always run three full data nodes, even if you don’t need them for work such as building indexes, syncing a new node, or taking backups; such tasks can have a measurable impact on your system or, worse yet, risk your HA coverage. On the other side, do not randomly add arbiters. Only use them to break ties, and only where you need to (one way to add one is shown below). Adding a full data node where you can afford it will always improve your uptime SLA capabilities.
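
When you genuinely do need a tie-breaker, adding one is a single mongo shell command (the hostname is hypothetical):

rs0:PRIMARY> rs.addArb("arbiter1.example.com:27017")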

MongoDB Replica Set

The nodes in the diagram explained:

P (Primary)

All writes are sent to this node for this replica set. S1-S5 could also become primary; however, as S3-S5 have priority:1 and P, S1 and S2 have priority:2, the system will keep the primary on the west coast, only moving it to the east coast in dire need.

B (Backup)

This non-voting node is dedicated to backups; as shown by the gray heartbeat, it does not vote or answer queries, and it is not in the “West” tag-set.

S1/S2 (Secondaries 1 & 2)

Both of these nodes are normal secondaries that do vote and are also members of the West tag-set. This means any applications in West that use the tag will query them (not S3-S5). In the event of an election, as this is the primary data center, they have a priority of 2, so P, S1 and S2 are preferred, keeping the primary in West.
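
As a sketch of what “using the tag” means for a west-coast application (the dc tag key and the collection name are assumptions on my part):

rs0:PRIMARY> db.getMongo().setReadPref("secondaryPreferred", [ { "dc": "West" } ])
rs0:PRIMARY> db.sensors.find()   // now served by the tagged West secondaries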

S3/4/5 (Secondaries 3,4,& 5)

Like S1/S2, these nodes are secondaries, but they only become primary if the West DC is down in some way. Additionally, as these are green, they are in the “East” tag-set. Reads for east-coast applications go here rather than to the west coast, but writes still go to the primary.

DS (Delayed Secondary)

This special node is purple because it is hidden (so it won’t take reads) and delayed (so it applies changes a full two hours behind the rest of the cluster). This allows the final data center to be a proper Disaster Recovery site. This node can never become primary, and it is unable to vote.

A (Arbiter)

Finally, we get to the arbiter. As we have three voting members in each of East and West, we have an even six votes. To break any ties, we have placed an arbiter that holds no data in the DR site, ensuring either West or East will always have a primary if the opposing DC goes offline.

I hope this has helped you feel more confident in knowing how a MongoDB replica set works, how to configure one, and how to plan hardware properly when it comes to storage for a distributed database like MongoDB.

The post The Anatomy of a MongoDB Replica Set appeared first on Percona Database Performance Blog.

by David Murphy at March 22, 2018 11:28 PM

Oli Sennhauser

MySQL sys Schema in MariaDB 10.2

MySQL has introduced the PERFORMANCE_SCHEMA (P_S) in MySQL 5.5 and made it really usable in MySQL 5.6 and added some enhancements in MySQL 5.7 and 8.0.

Unfortunately the PERFORMANCE_SCHEMA was not really intuitive for the broader audience. Thus Mark Leith created the sys Schema for easier access for the normal DBA and DevOps, and Daniel Fischer has enhanced it further. Fortunately the sys Schema up to version 1.5.1 is available on GitHub, so we can adapt and use it for MariaDB as well. The version of the sys Schema in MySQL 8.0 is 1.6.0 and seems not to be on GitHub yet, but you can extract it from the MySQL 8.0 directory structure: mysql-8.0/share/mysql_sys_schema.sql. According to a well-informed source, the project on GitHub is not dead; the developers have just been working on other priorities, and the same source announced another release soon (they are working on it at the moment).

MariaDB has integrated the PERFORMANCE_SCHEMA based on MySQL 5.6 into its own MariaDB 10.2 server but unfortunately did not integrate the sys Schema. Which PERFORMANCE_SCHEMA version is integrated in MariaDB can be found here.

To install the sys Schema into MariaDB we first have to check if the PERFORMANCE_SCHEMA is activated in the MariaDB server:

mariadb> SHOW GLOBAL VARIABLES LIKE 'performance_schema';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| performance_schema | OFF   |
+--------------------+-------+

To enable the PERFORMANCE_SCHEMA just add the following line to your my.cnf:

[mysqld]

performance_schema = 1

and restart the instance.

In MariaDB 10.2 the MySQL 5.6 PERFORMANCE_SCHEMA is integrated so we have to run the sys_56.sql installation script. If you try to run the sys_57.sql script you will get a lot of errors...

But the sys_56.sql installation script will also cause you a few little troubles, which are easy to fix:

unzip mysql-sys-1.5.1.zip 
mysql -uroot < sys_56.sql

ERROR 1193 (HY000) at line 20 in file: './procedures/diagnostics.sql': Unknown system variable 'server_uuid'
ERROR 1193 (HY000) at line 20 in file: './procedures/diagnostics.sql': Unknown system variable 'master_info_repository'
ERROR 1193 (HY000) at line 20 in file: './procedures/diagnostics.sql': Unknown system variable 'relay_log_info_repository'

For a quick hack to make the sys Schema work, I changed the following (scripted below):

  • server_uuid to server_id
  • @@master_info_repository to NULL (3 times).
  • @@relay_log_info_repository to NULL (3 times).
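
If you prefer not to edit by hand, the same three hacks can be scripted before loading (a sketch: it assumes the variables appear as @@-prefixed references in ./procedures/diagnostics.sql, so double-check the result):

sed -i 's/@@server_uuid/@@server_id/g' ./procedures/diagnostics.sql
sed -i 's/@@master_info_repository/NULL/g' ./procedures/diagnostics.sql
sed -i 's/@@relay_log_info_repository/NULL/g' ./procedures/diagnostics.sql
mysql -uroot < sys_56.sql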

For the future, the community has to think about whether the sys Schema should be aware of the two branches, MariaDB and MySQL, and act accordingly, or whether the sys Schema has to be forked to work properly for MariaDB and implement MariaDB-specific functionality.

When the sys Schema is finally installed, you have the following tables to get your performance metrics:

mariadb> use sys
mariadb> SHOW TABLES;
+-----------------------------------------------+
| Tables_in_sys                                 |
+-----------------------------------------------+
| host_summary                                  |
| host_summary_by_file_io                       |
| host_summary_by_file_io_type                  |
| host_summary_by_stages                        |
| host_summary_by_statement_latency             |
| host_summary_by_statement_type                |
| innodb_buffer_stats_by_schema                 |
| innodb_buffer_stats_by_table                  |
| innodb_lock_waits                             |
| io_by_thread_by_latency                       |
| io_global_by_file_by_bytes                    |
| io_global_by_file_by_latency                  |
| io_global_by_wait_by_bytes                    |
| io_global_by_wait_by_latency                  |
| latest_file_io                                |
| metrics                                       |
| processlist                                   |
| ps_check_lost_instrumentation                 |
| schema_auto_increment_columns                 |
| schema_index_statistics                       |
| schema_object_overview                        |
| schema_redundant_indexes                      |
| schema_table_statistics                       |
| schema_table_statistics_with_buffer           |
| schema_tables_with_full_table_scans           |
| schema_unused_indexes                         |
| session                                       |
| statement_analysis                            |
| statements_with_errors_or_warnings            |
| statements_with_full_table_scans              |
| statements_with_runtimes_in_95th_percentile   |
| statements_with_sorting                       |
| statements_with_temp_tables                   |
| sys_config                                    |
| user_summary                                  |
| user_summary_by_file_io                       |
| user_summary_by_file_io_type                  |
| user_summary_by_stages                        |
| user_summary_by_statement_latency             |
| user_summary_by_statement_type                |
| version                                       |
| wait_classes_global_by_avg_latency            |
| wait_classes_global_by_latency                |
| waits_by_host_by_latency                      |
| waits_by_user_by_latency                      |
| waits_global_by_latency                       |
+-----------------------------------------------+

One query as an example: Top 10 MariaDB global I/O latency files on my system:

mariadb> SELECT * FROM sys.waits_global_by_latency LIMIT 10;
+--------------------------------------+-------+---------------+-------------+-------------+
| events                               | total | total_latency | avg_latency | max_latency |
+--------------------------------------+-------+---------------+-------------+-------------+
| wait/io/file/innodb/innodb_log_file  |   112 | 674.18 ms     | 6.02 ms     | 23.75 ms    |
| wait/io/file/innodb/innodb_data_file |   892 | 394.60 ms     | 442.38 us   | 29.74 ms    |
| wait/io/file/sql/FRM                 |   668 | 72.85 ms      | 109.05 us   | 20.17 ms    |
| wait/io/file/sql/binlog_index        |    10 | 21.25 ms      | 2.13 ms     | 15.74 ms    |
| wait/io/file/sql/binlog              |    19 | 11.18 ms      | 588.56 us   | 10.38 ms    |
| wait/io/file/myisam/dfile            |    79 | 10.48 ms      | 132.66 us   | 3.78 ms     |
| wait/io/file/myisam/kfile            |    86 | 7.23 ms       | 84.01 us    | 789.44 us   |
| wait/io/file/sql/dbopt               |    35 | 1.95 ms       | 55.61 us    | 821.68 us   |
| wait/io/file/aria/MAI                |   269 | 1.18 ms       | 4.40 us     | 91.20 us    |
| wait/io/table/sql/handler            |    36 | 710.89 us     | 19.75 us    | 125.37 us   |
+--------------------------------------+-------+---------------+-------------+-------------+

by Shinguz at March 22, 2018 09:54 PM

Peter Zaitsev

Five Tips to Optimize MongoDB

Optimize MongoDB

In this blog, we’ll look at five ways to quickly optimize MongoDB.

Have you ever had performance issues with your MongoDB database? A common situation is a sudden performance issue when running a query. The obvious first solution is “let’s create an index!” While this works in some cases, there are other options we need to consider when trying to optimize MongoDB.
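
For reference, creating an index takes a single command in the mongo shell. Here is a minimal sketch, where the orders collection, its fields and the query are hypothetical:

mongo> db.orders.createIndex({ customerId: 1, createdAt: -1 })
mongo> db.orders.find({ customerId: 42 }).sort({ createdAt: -1 }).explain("executionStats")

The explain("executionStats") call confirms whether the query actually uses the new index (look for an IXSCAN stage) before you rely on it in production.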

Performance is not a matter of having big machines with very expensive disks and gigabit networks. In fact, these are not necessarily the keys to good performance.

MongoDB performance comes from good concepts, organization and data distribution. We are going to list some best practices for good MongoDB optimization. This is not an exhaustive or complete guide, as there are many variables. But this is a good start.

Keep documents simple

MongoDB is a schema-free database. This means there is no predefined schema by default (newer versions let you add schema validation, but it is not mandatory). Be aware of the difficulties involved in working with embedded documents and arrays, as it can become really complicated to parse your data on the application side or in the ETL process. Arrays can also hurt replication performance: for every change in an array, all the array values are replicated!

In MMAPv1, choosing short field names is really important because the database stores the field name inside each document; it is not like storing the schema once, as a relational database does. Imagine how much space a field called “lastmessagereceivedfromsensor” costs you across a million documents: around 28 MB just to store this one field name! Ten such fields would demand almost 280 MB, just to store the field names of otherwise empty documents.
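
To make the arithmetic concrete (both field names here are hypothetical):

  "lastmessagereceivedfromsensor" = 29 bytes × 1,000,000 documents ≈ 28 MB
  "lmrfs"                         =  5 bytes × 1,000,000 documents ≈  5 MB

Short but still meaningful field names save both storage and cache under MMAPv1.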

Documents approaching the maximum document size (16 MB in BSON) aren’t desirable either, as the database needs many pages to work on a single document. This demands more CPU cycles to finish any operation.

Hardware is important but…

Using good hardware with several processors and a considerable amount of memory definitely helps achieve good performance.

WiredTiger takes advantage of multiple processors to deliver good performance. This storage engine features a per-document locking algorithm, so many operations can run on many processors at the same time (there is a ticket limitation, but that is out of this blog’s scope). The MMAPv1 storage engine, however, has to lock per collection and sometimes cannot take advantage of multiple processors for writes.

But what happens in an environment with three big machines (32 CPUs, 128 GB of RAM and 2 TB of disk each) when one instance dies? The answer is that the cluster fails over, and the drivers are smart enough to read from the healthy instances and write to the new primary. However, your performance will not be the same.

This is not always true, but having multiple small or medium machines in a distributed environment can ensure that outages affect only a few parts of the shard, with little or no impact perceived by the application. At the same time, more machines imply a higher probability that one of them will fail. Consider this trade-off when designing your environment: the right choices affect performance.

Read preference and WriteConcern

The read preference and write concern vary according to a company’s requirements. But please keep in mind that new MongoDB versions (3.6) use writeConcern: “majority” and readPreference: “primary”.

This means a write must be acknowledged by at least floor(N/2)+1 members, where N is the number of instances in the replica set; for a three-node replica set, that is two acknowledgements. This can be slow, but it is a fair trade-off: you give up some speed in exchange for stronger consistency.

Please make sure you’re using the most appropriate read preference and write concern for your company. Drivers read from the primary by default, but if that is not a requirement for your environment, consider distributing reads among the other instances. If you don’t, the secondaries serve only as failover targets and are never used in regular operation.
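
As a sketch of how both settings are exercised from the mongo shell (the collection name is hypothetical):

mongo> db.getMongo().setReadPref("secondaryPreferred")
mongo> db.orders.insert({ sku: "abc", qty: 1 }, { writeConcern: { w: "majority", wtimeout: 5000 } })

The first command routes this session’s reads to secondaries when possible; the second waits until a majority of the replica set acknowledges the write (or times out after five seconds).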

Working set

How big is the working set? Usually, an application doesn’t use all the data. Some data is updated often, while other data isn’t.

Does your working data set fit in RAM? Optimal performance occurs when all of the working data set is in RAM. Some slowness, such as page faults, can hurt performance depending on what you’re using.

Reads from the primary for backups, ETL or reporting can really hurt performance, as they compete for pages in the cache. The same is true for large reports or aggregations.
Having multiple collections for multiple purposes, and using specific machines for specific purposes (such as using zones to store documents that are no longer actively used), helps keep the working set small and predictable.
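
One rough way to compare the working set against the available cache under WiredTiger is to look at the cache counters in serverStatus (a sketch; the field names come from the WiredTiger statistics output):

mongo> var cache = db.serverStatus().wiredTiger.cache
mongo> cache["bytes currently in the cache"]
mongo> cache["maximum bytes configured"]

If the first number constantly presses against the second while eviction stays busy, the working set likely does not fit in RAM.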

Monitoring

Are you monitoring your system? Can you tell the difference in performance from last week to this week?

If you are not using any monitoring system and want a free tool, we highly recommend Percona Monitoring and Management (PMM) to monitor MongoDB, MySQL and PostgreSQL. With a GUI monitoring system, it is easy to see activity patterns and isolate instances at a specific point in time. Reviewing the MongoDB log files also helps you understand what an instance is doing (all queries slower than 100 ms are logged by default).
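
If you want more detail than the default slow-query log, you can lower the threshold or enable the profiler per database (a sketch; the 50 ms threshold is an arbitrary example):

mongo> db.setProfilingLevel(0, 50)    // keep profiling off, but log queries slower than 50 ms
mongo> db.setProfilingLevel(1, 100)   // write queries slower than 100 ms to system.profile
mongo> db.system.profile.find().sort({ ts: -1 }).limit(5)

Remember that the profiler itself adds overhead, so use level 1 selectively rather than leaving it on everywhere.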

 

I hope you found this article on how to optimize MongoDB helpful. If you have any questions or concerns, please ping me at @AdamoTonete or @percona on Twitter.

The post Five Tips to Optimize MongoDB appeared first on Percona Database Performance Blog.

by Adamo Tonete at March 22, 2018 09:46 PM

March 21, 2018

Peter Zaitsev

Percona Live 2018 Featured Talk: Deep Dive into the RDS PostgreSQL Universe with Jignesh Shah

Percona Live 2018 Jignesh AWS

Welcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Jignesh Shah, Senior Product Manager at Amazon Web Services. His talk is titled Deep Dive into the RDS PostgreSQL Universe. PostgreSQL is a very popular relational database gaining traction in Amazon’s RDS cloud environment. In our conversation, we discussed the features, uses, and benchmarks for PostgreSQL in AWS RDS:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

I am a Senior Product Manager for Amazon Relational Database Service. I first started learning about databases with dBase III in 1994, followed by FoxPro, Progress 4GL and IBM DB2. I started learning about open source databases in the 2000s, including PostgreSQL and MySQL, with a focus on database performance tuning on Sun Solaris systems, and got closely involved in PostgreSQL-related benchmarks. One thing led to another, and I ended up building databases, virtual machines, and application lifecycle management products.

Percona: Your talk is titled “Deep Dive into the RDS PostgreSQL Universe”. How popular is PostgreSQL in Amazon RDS? 

PostgreSQL engines – including Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL compatibility – are very popular and fast growing. Customers love the flexibility of PostgreSQL and ease of operations provided by Amazon RDS. Customers are excited by the innovations happening here, and love to give feedback on features and capabilities we can add to PostgreSQL in Amazon RDS. Most of our features are driven by customer requests, and customers are excited when they see their requested features available in the service.

Percona: Why would you use PostgreSQL in Amazon RDS as opposed to other databases?

PostgreSQL offers good performance out of the box, with transactional semantics very similar to those of Oracle and SQL Server. PostgreSQL is object-oriented and ANSI SQL:2008 compatible, which makes it easy for customers to migrate applications from other relational database platforms. PostgreSQL also has very strong support for geospatial capabilities with the PostGIS extension and supports stored procedures in many languages, including PL/pgSQL (which is very similar to Oracle’s PL/SQL).

Percona: What PostgreSQL features are especially useful?

Every major release of PostgreSQL comes with interesting new features. Features like JSONB for handling JSON data types, spatial features with PostGIS for developing location-based services, foreign data wrappers for federated queries, and replication features are very useful, especially for modern application development, where speed and operational readiness are required by startups and enterprises alike.

Percona: Why should people attend your talk? What do you hope people will take away from it?

PostgreSQL and Amazon RDS together solve many developer needs and make operational life easier for administrators, saving them time, resources and cost. Come and learn what is new in Amazon RDS for PostgreSQL, and look under the hood to see how some of its capabilities work behind the scenes!

Percona: What are you looking forward to at Percona Live (besides your talk)?

I look forward to hearing from customers about their experiences with PostgreSQL, and learning more about the latest developments in open source databases at Percona Live.

Want to find out more about this Percona Live 2018 featured talk, and PostgreSQL in AWS RDS? Register for Percona Live 2018, and see Jignesh’s talk Deep Dive into the RDS PostgreSQL Universe. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

The post Percona Live 2018 Featured Talk: Deep Dive into the RDS PostgreSQL Universe with Jignesh Shah appeared first on Percona Database Performance Blog.

by Dave Avery at March 21, 2018 07:35 PM

FLUSH and LOCK Handling in Percona XtraDB Cluster

FLUSH and LOCK Handling

In this blog post, we’ll look at how Percona XtraDB Cluster (PXC) handles FLUSH and LOCK statements.

Introduction

Percona XtraDB Cluster is a multi-master solution that allows parallel execution of transactions on multiple nodes at the same point in time. Given these semantics, it is important to understand how Percona XtraDB Cluster executes FLUSH and LOCK statements, which operate at the node level.

The sections below list the different flavors of these statements and their PXC semantics.

FLUSH TABLE WITH READ LOCK
  • FTWRL is normally used for backup purposes.
  • Execution of this command establishes a global-level read lock.
  • This read lock is non-preemptable by the background applier thread.
  • PXC moves the node to the DESYNC state (thereby blocking emission of flow-control) and also pauses the node.

2018-03-08T05:09:54.293991Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 1777)
2018-03-08T05:09:58.040809Z 5 [Note] WSREP: Provider paused at c7daf065-2285-11e8-a848-af3e3329ab8f:2002 (2047)
2018-03-08T05:14:20.508317Z 5 [Note] WSREP: resuming provider at 2047
2018-03-08T05:14:20.508350Z 5 [Note] WSREP: Provider resumed.
2018-03-08T05:14:20.508887Z 0 [Note] WSREP: Member 1.0 (n2) resyncs itself to group
2018-03-08T05:14:20.508900Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 29145)
2018-03-08T05:15:16.932759Z 0 [Note] WSREP: Member 1.0 (n2) synced with group.
2018-03-08T05:15:16.932782Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 29145)
2018-03-08T05:15:16.988029Z 2 [Note] WSREP: Synchronized with group, ready for connections
2018-03-08T05:15:16.988054Z 2 [Note] WSREP: Setting wsrep_ready to true

  • Other nodes of the cluster continue to process the workload.
  • The DESYNCed and paused node continues to receive replication traffic. Though it doesn’t process the write-sets, they are appended to the Galera cache for future processing.
  • Fallback: when FTWRL is released (through UNLOCK TABLES), and if the workload is active on other nodes of the cluster, the node that executed FTWRL may start emitting flow-control to cover the backlog; check details here. A minimal example session follows below.
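
A sketch of this cycle, run from a single client connection (the state comment is what PXC reports while the lock is held):

mysql> FLUSH TABLES WITH READ LOCK;                  -- node shifts to DONOR/DESYNCED and pauses
mysql> SHOW STATUS LIKE 'wsrep_local_state_comment'; -- reports 'Donor/Desynced' while locked
mysql> UNLOCK TABLES;                                -- node resumes, resyncs, may emit flow-control
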
FLUSH TABLE <tablename> (WITH READ LOCK|FOR EXPORT)
  • It takes a global-level read lock on the given table only. This lock command is not replicated, so pxc_strict_mode = ENFORCING blocks its execution.
  • This read lock is non-preemptable by the background applier thread.
  • Execution of this command causes the node to pause.
  • If the node executing the flush command is also the node processing the workload, the node pauses immediately.
  • If the node executing the flush command is different from the workload-processing node, write-sets queue up in the incoming queue and flow-control causes the pause.
  • The end result is that the cluster stalls in both cases.

2018-03-07T06:40:00.143783Z 5 [Note] WSREP: Provider paused at 40de14ba-21be-11e8-8e3d-0ee226700bda:147682 (149032)
2018-03-07T06:40:00.144347Z 5 [Note] InnoDB: Sync to disk of `test`.`t` started.
2018-03-07T06:40:00.144365Z 5 [Note] InnoDB: Stopping purge
2018-03-07T06:40:00.144468Z 5 [Note] InnoDB: Writing table metadata to './test/t.cfg'
2018-03-07T06:40:00.144537Z 5 [Note] InnoDB: Table `test`.`t` flushed to disk
2018-03-07T06:40:01.855847Z 5 [Note] InnoDB: Deleting the meta-data file './test/t.cfg'
2018-03-07T06:40:01.855874Z 5 [Note] InnoDB: Resuming purge
2018-03-07T06:40:01.855955Z 5 [Note] WSREP: resuming provider at 149032
2018-03-07T06:40:01.855970Z 5 [Note] WSREP: Provider resumed.

  • Once the lock is released (through UNLOCK TABLES), the node resumes applying write-sets.
LOCK TABLE <tablename> READ/WRITE
  • The LOCK TABLE command locks the given table in the given mode.
  • Again, the lock established by this command is non-preemptable.
  • The lock is taken at the node level (the command is not replicated), so pxc_strict_mode = ENFORCING blocks this command.
  • There is no state change in PXC on execution of this command.
  • If the lock is taken on a table that the active workload does not touch, the workload can continue to progress. If the lock is taken on a table that is part of the workload, the transaction touching it waits for the lock to be released, which in turn halts the complete workload (see the illustration below).
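
A minimal illustration (the table test.t is hypothetical):

mysql> LOCK TABLE test.t READ;   -- local, non-replicated lock; applier write-sets touching test.t queue behind it
mysql> UNLOCK TABLES;            -- queued write-sets resume applying
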
GET_LOCK
  • GET_LOCK() takes a named lock and follows the same semantics as LOCK TABLE in PXC (the base MySQL semantics are slightly different; you can check them here). An example follows below.
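
For example (the lock name is arbitrary):

mysql> SELECT GET_LOCK('maintenance_window', 10);   -- wait up to 10 seconds for the named lock
mysql> SELECT RELEASE_LOCK('maintenance_window');
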
LOCK TABLES FOR BACKUP
  • As the name suggests, this lock is meant specifically for backups, and it blocks non-transactional changes (such as updates to non-transactional engines like MyISAM, and DDL changes).
  • PXC doesn’t add any special semantics for this command.
LOCK BINLOG FOR BACKUP
  • This statement blocks writes to the binlog. PXC always generates binlog events (persisting them to disk is controlled by the log-bin setting); if you disable log-bin, PXC enables emulation-based binlogging.
  • This effectively means this command can cause the cluster to stall (a typical sequence is sketched below).
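
Both backup locks pair with their own unlock statements. A sketch of a typical backup-time sequence:

mysql> LOCK TABLES FOR BACKUP;   -- blocks non-transactional writes and DDL
mysql> LOCK BINLOG FOR BACKUP;   -- blocks binlog writes; can stall the whole cluster
mysql> UNLOCK BINLOG;
mysql> UNLOCK TABLES;

These are the same statements tracked by the Com_% counters shown in the next section.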

Tracking active lock/flush

  • If you have executed a flush or lock command and want to confirm it, you can use the Com_% status counters. These counters are connection-specific, so execute these commands from the same client connection. They are also aggregate counters that only ever increment.

mysql> show status like 'Com%lock%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Com_lock_tables            | 2     |
| Com_lock_tables_for_backup | 1     |
| Com_lock_binlog_for_backup | 1     |
| Com_unlock_binlog          | 1     |
| Com_unlock_tables          | 5     |
+----------------------------+-------+
5 rows in set (0.01 sec)
mysql> show status like '%flush%';
+--------------------------------------+---------+
| Variable_name                        | Value   |
+--------------------------------------+---------+
| Com_flush                            | 4       |
| Flush_commands                       | 3       |
+--------------------------------------+---------+

* Flush_commands is a global counter. Check the MySQL documentation for more details.

Conclusion

By now, we can conclude that you should be careful when executing local lock commands, and should understand their semantics and effects. Careful execution of these commands will help them serve your purpose.

The post FLUSH and LOCK Handling in Percona XtraDB Cluster appeared first on Percona Database Performance Blog.

by Krunal Bauskar at March 21, 2018 07:30 PM

March 20, 2018

Peter Zaitsev

Using Different Mount Points on PMM Docker Deployments

Mount Points on PMM Docker

In this blog post, we’ll see how to use different mount points on PMM Docker deployments (Percona Monitoring and Management). This is useful if you want to use other mount points for the different directories, or even if you want to use a custom path that is not bound to Docker’s volumes directory (which is /var/lib/docker/volumes/ by default) within the same mount point.

There are two ways in which you can achieve this:

  • using symlinks after the pmm-data container is created
  • modifying the docker create command to use different directories

In the following examples, /pmm/ is used as the new base directory. One can, of course, choose different directories for each if needed. Also, remember to be aware of any SELinux or AppArmor policies you may have in place.

Using symlinks

For this, we need to follow these steps:

  1. Create the needed directories
  2. Create the pmm-data container
  3. Move contents from default Docker paths to the desired paths
  4. Create symlinks that point to the moved directories

Let’s see this with some commands and outputs. In this example, we will use /pmm/ as if it were the new mount point:

shell> mkdir /pmm/opt/
shell> mkdir /pmm/opt/prometheus
shell> mkdir /pmm/var/lib/

shell> docker create \
  -v /opt/prometheus/data \
  -v /opt/consul-data \
  -v /var/lib/mysql \
  -v /var/lib/grafana \
  --name pmm-data \
  percona/pmm-server:1.7.0 /bin/true
4589cd1bf8ce365f8f62eab9f415eb14f1ce3a76b0123b7aad42e93385455303

shell> docker inspect pmm-data | egrep "Source|Destination"
"Source": "/var/lib/docker/volumes/a191331f6be1a177003ef2fdeee53f92fc190dc67b0c402ee7b47b4461ffa522/_data",
"Destination": "/opt/prometheus/data",
"Source": "/var/lib/docker/volumes/7208317edff4565f649df294cfb05fc1888e6ab817c18abc5f036c419e364d4b/_data",
"Destination": "/var/lib/grafana",
"Source": "/var/lib/docker/volumes/547b3f083a0a33b6cd75eb72e2cc25c383f5d4db2d8a493b25eb43499e2f5807/_data",
"Destination": "/var/lib/mysql",
"Source": "/var/lib/docker/volumes/7473ac5d2dac4440ac94fae2faf4a63af95baaabed4b14d9414f499ae9b5761d/_data",
"Destination": "/opt/consul-data",
shell> DOCKER_CONSUL_DATA="/var/lib/docker/volumes/7473ac5d2dac4440ac94fae2faf4a63af95baaabed4b14d9414f499ae9b5761d/_data"
shell> DOCKER_PROMETHEUS_DATA="/var/lib/docker/volumes/a191331f6be1a177003ef2fdeee53f92fc190dc67b0c402ee7b47b4461ffa522/_data"
shell> DOCKER_GRAFANA_DATA="/var/lib/docker/volumes/7208317edff4565f649df294cfb05fc1888e6ab817c18abc5f036c419e364d4b/_data"
shell> DOCKER_MYSQL_DATA="/var/lib/docker/volumes/547b3f083a0a33b6cd75eb72e2cc25c383f5d4db2d8a493b25eb43499e2f5807/_data"
shell> mv $DOCKER_CONSUL_DATA /pmm/opt/consul-data
shell> mv $DOCKER_PROMETHEUS_DATA /pmm/opt/prometheus/data
shell> mv $DOCKER_GRAFANA_DATA /pmm/var/lib/grafana
shell> mv $DOCKER_MYSQL_DATA /pmm/var/lib/mysql

shell> ln -s /pmm/opt/consul-data $DOCKER_CONSUL_DATA
shell> ln -s /pmm/opt/prometheus/data $DOCKER_PROMETHEUS_DATA
shell> ln -s /pmm/var/lib/grafana $DOCKER_GRAFANA_DATA
shell> ln -s /pmm/var/lib/mysql $DOCKER_MYSQL_DATA
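
Optionally, verify that the symlinks point at the new locations before starting the server (reusing the path variables defined above):

shell> ls -l $DOCKER_CONSUL_DATA $DOCKER_PROMETHEUS_DATA $DOCKER_GRAFANA_DATA $DOCKER_MYSQL_DATA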

After this, we can start the pmm-server container (see below).

Modifying the docker create command

For this, we need to follow these other steps:

  1. Create the needed directories
  2. Create a temporary pmm-data container
  3. Copy its contents to the new locations, and delete it (the temporary container)
  4. Create the permanent pmm-data container with the modified paths (-v arguments)
  5. Fix ownership of files in the copied directories (to avoid errors when starting the pmm-server container later on)

Let’s see this in practical terms again, assuming we want to use the /pmm/ mount point.

shell> mkdir /pmm/opt/
shell> mkdir /pmm/opt/prometheus
shell> mkdir /pmm/var/lib/

shell> docker create \
  -v /opt/prometheus/data \
  -v /opt/consul-data \
  -v /var/lib/mysql \
  -v /var/lib/grafana \
  --name pmm-data-temporary \
  percona/pmm-server:1.7.0 /bin/true
76249e1830c2a9c320466e41a454e9e80bf513e9b046e795ec41a33d75df5830

shell> docker cp pmm-data-temporary:/opt/prometheus/data /pmm/opt/prometheus/data
shell> docker cp pmm-data-temporary:/opt/consul-data /pmm/opt/consul-data
shell> docker cp pmm-data-temporary:/var/lib/mysql /pmm/var/lib/mysql
shell> docker cp pmm-data-temporary:/var/lib/grafana /pmm/var/lib/grafana
shell> docker rm -v pmm-data-temporary

shell> docker create \
  -v /pmm/opt/prometheus/data:/opt/prometheus/data \
  -v /pmm/opt/consul-data:/opt/consul-data \
  -v /pmm/var/lib/mysql:/var/lib/mysql \
  -v /pmm/var/lib/grafana:/var/lib/grafana \
  --name pmm-data \
  percona/pmm-server:1.7.0 /bin/true
d4c10ae9fb2e38758df999268573f4a8cddb5b47389b349f55733d2e54815bf0

shell> docker run --rm --volumes-from pmm-data -it percona/pmm-server:1.7.0 chown -R pmm:pmm /opt/prometheus/data /opt/consul-data
shell> docker run --rm --volumes-from pmm-data -it percona/pmm-server:1.7.0 chown -R grafana:grafana /var/lib/grafana
shell> docker run --rm --volumes-from pmm-data -it percona/pmm-server:1.7.0 chown -R mysql:mysql /var/lib/mysql

After this, we can start the pmm-server container (see below).

Running pmm-server container

After following either of the steps mentioned above, we can run the pmm-server container with the exact same commands as shown in the online documentation:

shell> docker run -d \
   -p 80:80 \
   --volumes-from pmm-data \
   --name pmm-server \
   --restart always \
   percona/pmm-server:1.7.0
0caa14f6fa22c419876de0dfb635535dbba41a2bd82b51b3d8a5be0b763fa6d2

And that’s it! You now have custom mount points for your PMM Docker deployment.
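
To confirm everything came up correctly, you can check the container state and hit the web interface (adjust the host and port if you mapped them differently):

shell> docker ps --filter name=pmm-server
shell> curl -sI http://localhost/ | head -n1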

The post Using Different Mount Points on PMM Docker Deployments appeared first on Percona Database Performance Blog.

by Agustín at March 20, 2018 09:02 PM

Percona Blog Poll: What Percona Software Are You Using?

Percona Software

This blog post contains a poll that helps us find out what Percona software the open source database community is using.

Nearly 20 years ago, Netscape released the source code for its Netscape Communicator web browser. This marked one of the biggest moments in “open source” history. The formation of The Open Source Initiative happened shortly after that. Bruce Perens, one of the working group’s founders, adapted his Free Software Guidelines as the official Open Source Definition.

Since then, open source software has gone from being the exception in large projects and enterprises, to being a normal part of huge deployments and daily business activities. Open source software is used by some of the biggest online companies: Facebook, YouTube, Twitter, etc. Many of these companies depend on open source software as part of their business model.

Percona’s mission is to champion unbiased open source database solutions. As part of this mission, we provide open source software, completely free of charge and for reuse. We developed our Percona Server for MySQL and Percona Server for MongoDB solutions not only to be drop-in replacements for existing open source software, but also to often incorporate “enterprise” features from upstream.

We’ve also recognized a need for database clustering and backup solutions, and created Percona XtraDB Cluster and Percona XtraBackup to address those concerns.

Beyond database software, Percona has created management and monitoring tools like Percona Monitoring and Management that not only help DBAs with day-to-day tasks, but also use metrics to find out how best to configure, optimize and architect a database environment to best meet the needs of applications and websites.

What we’d like to know is: which of our software products are you currently using in your database environment? Are you using just database software, just management and monitoring tools, or a combination of both? As Percona makes plans for the year, we’d like to know what the community is using, what it finds helpful, and how we can best allocate our resources to address those needs. We are always looking for the best ways to invest in and grow the Percona software and tools people use.

Complete the survey below by selecting all the options that apply.

Note: there is a poll embedded within this post; please visit the site to participate.

Thanks in advance for your responses – this helps us see which of our software is being deployed in the community.

The post Percona Blog Poll: What Percona Software Are You Using? appeared first on Percona Database Performance Blog.

by Matt Yonkovit at March 20, 2018 07:33 PM

Webinar Thursday, March 22, 2018: Percona XtraDB Cluster 5.7 with ProxySQL for Your MySQL High Availability and Clustering Needs

MySQL high availability

Please join Percona’s Ramesh Sivaraman (QA Engineer) and Krunal Bauskar (Software Engineer, Percona XtraDB Cluster Lead) as they present Percona XtraDB Cluster 5.7 with ProxySQL for Your MySQL High Availability and Clustering Needs on Thursday, March 22, 2018 at 8:30 am PDT (UTC-7) / 11:30 am EDT (UTC-4).

Percona has developed Percona XtraDB Cluster (based on Galera Cluster) and integrated it with ProxySQL to address MySQL high availability and clustering. These two products working together provide a great out-of-the-box synchronous replication setup.

In this webinar, we’ll look at why this is a great solution, and what types of deployments you should consider using it in.

Register for the webinar now.

Krunal is the Percona XtraDB Cluster lead at Percona. He is responsible for day-to-day Percona XtraDB Cluster development: what goes into Percona XtraDB Cluster, bug fixes, releases, etc. Before joining Percona, he worked on the InnoDB team at MySQL/Oracle, where he authored most of the temporary table revamp work, undo log truncate, atomic truncate and many other features. In the past, he was associated with Yahoo! Labs, researching big data problems, and with a database startup that is now part of Teradata. His main interest is data management at any scale, and he has been practicing it for more than a decade.

Ramesh joined the Percona QA Team in March 2014. Prior to joining Percona, he provided MySQL database support to various service- and product-based Internet companies. Ramesh’s professional interests include writing shell/Perl scripts to automate routine tasks, and new technology. Ramesh lives in Kerala, in the southern part of India, close to his family.

The post Webinar Thursday, March 22, 2018: Percona XtraDB Cluster 5.7 with ProxySQL for Your MySQL High Availability and Clustering Needs appeared first on Percona Database Performance Blog.

by Krunal Bauskar at March 20, 2018 06:16 PM