Planet MariaDB

September 21, 2018

Peter Zaitsev

Securing PostgreSQL as an Enterprise-Grade Environment

In this post, we review how you can build an enhanced and secure PostgreSQL database environment using community software. We look at the features available in PostgreSQL that, when implemented, provide improved security.

As discussed in the introductory blog post of this series, in our webinar on October 10, 2018 we will highlight important aspects an enterprise should consider for its PostgreSQL environments. This series of blog posts, each addressing a particular aspect of an enterprise-grade PostgreSQL environment, complements the webinar. This post addresses security.

Authentication Layer

Client connections to PostgreSQL Server using host-based authentication

PostgreSQL uses a host-based authentication file (pg_hba.conf) to authorize incoming connections. Each entry in this file combines five fields: type, database, user, address, and method. A client is allowed to connect to a database only when the combination of username, database and the hostname of the client matches an entry in the pg_hba.conf file.

Consider the following entry in the pg_hba.conf file:

# TYPE DATABASE USER ADDRESS METHOD
host percona pguser 192.168.0.14/32 md5

This entry says that connections from host 192.168.0.14 are allowed only for user pguser and only to the database percona. The md5 method enforces password authentication.

The order of the entries in the pg_hba.conf file matters: the first entry that matches is the one considered. So if an entry that rejects connections from a given server precedes another that allows them, the connection is rejected.
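
For instance, in the following sketch (reusing the example address from above), the reject entry matches first, so pguser can never connect from 192.168.0.14 even though a later entry would allow it:

# TYPE DATABASE USER ADDRESS METHOD
host percona pguser 192.168.0.14/32 reject
host percona pguser 192.168.0.14/32 md5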

This is the first layer of protection in authentication. If these criteria are not satisfied by any entry in this Access Control List (ACL), PostgreSQL discards the request without even attempting server-level authentication.

Server Authentication

Historically, PostgreSQL used an MD5 digest as its default password hash. The problem with pure MD5 hashing is that this function always returns the same hash for a given password, which renders an MD5 digest more susceptible to password cracking. Newer versions of PostgreSQL implement SCRAM authentication (Salted Challenge Response Authentication Mechanism), which stores passwords in salted and iterated hash formats to strengthen PostgreSQL against offline attacks. SCRAM-SHA-256 support was introduced in PostgreSQL 10. What matters most in terms of “enterprise-grade” security is that PostgreSQL supports industry-standard authentication methods out of the box, like SSL certificates, PAM/LDAP, Kerberos, etc.
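
A minimal sketch of enabling SCRAM on PostgreSQL 10 or later, reusing the example database and user from above (note that existing users must reset their passwords, e.g. with \password in psql, before they can authenticate this way):

# postgresql.conf
password_encryption = 'scram-sha-256'

# pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
host percona pguser 192.168.0.14/32 scram-sha-256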

Authorization Layer

User management through roles and privileges

It is always recommended to implement segregation of users through roles and privileges. There may be several user accounts in your PostgreSQL server. Only a few of them may be application accounts, while the rest are developer or admin accounts. In such cases, PostgreSQL allows you to create multiple roles, each of which can be granted a set of privileges. Thus, instead of managing user privileges individually, standard roles can be maintained and the appropriate role from the list assigned to a user. Through roles, database access can be standardized, which helps with user management and avoids granting too much or too little privilege to a given user.

For example, we might have six roles:

app_read_write
app_read_only
dev_read_write
dev_read_only
admin_read_write
admin_read_only

Now, if you need to create a new dev user who should only have read access, grant the appropriate role, such as dev_read_only:

GRANT dev_read_only TO avi_im_developer;
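
The roles themselves might be defined along these lines (a sketch; the app schema name and the exact GRANT list are illustrative assumptions, not part of the original example):

CREATE ROLE dev_read_only NOLOGIN;
GRANT USAGE ON SCHEMA app TO dev_read_only;
GRANT SELECT ON ALL TABLES IN SCHEMA app TO dev_read_only;
ALTER DEFAULT PRIVILEGES IN SCHEMA app GRANT SELECT ON TABLES TO dev_read_only;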

Row level Security

Starting with version 9.5, PostgreSQL implements row level security, which can limit access to only a subset of records/rows in a table. Usually a user is granted a mix of SELECT, INSERT, DELETE and UPDATE privileges on a given table, which allows access to all records in the table. Through row level security, however, such privileges can be restricted to a subset of the records by means of a policy, which in turn can be assigned to a role.

In the next example, we create an employee table and two manager accounts. We then enable row level security on the table and create a policy that allows the managers to only view/modify their own subordinates’ records:

CREATE TABLE scott.employee (id INT, first_name VARCHAR(20), last_name VARCHAR(20), manager VARCHAR(20));
INSERT INTO scott.employee VALUES (1,'avinash','vallarapu','carina');
INSERT INTO scott.employee VALUES (2,'jobin','augustine','stuart');
INSERT INTO scott.employee VALUES (3,'fernando','laudares','carina');
CREATE USER carina WITH ENCRYPTED PASSWORD 'carina';
CREATE USER stuart WITH ENCRYPTED PASSWORD 'stuart';
CREATE ROLE managers;
GRANT managers TO carina, stuart;
GRANT SELECT, INSERT, UPDATE, DELETE ON scott.employee TO managers;
GRANT USAGE ON SCHEMA scott TO managers;
ALTER TABLE scott.employee ENABLE ROW LEVEL SECURITY;
CREATE POLICY employee_managers ON scott.employee TO managers USING (manager = current_user);

In the log we can see that only certain records are visible to each manager:

$ psql -d percona -U carina
psql (10.5)
Type "help" for help.
percona=> select * from scott.employee ;
id | first_name | last_name | manager
----+------------+-----------+---------
 1 | avinash    | vallarapu | carina
 3 | fernando   | laudares  | carina
(2 rows)
$ psql -d percona -U stuart
psql (10.5)
Type "help" for help.
percona=> select * from scott.employee ;
id | first_name | last_name | manager
----+------------+-----------+---------
 2 | jobin      | augustine | stuart
(1 row)

You can read more about row level security in the manual page.

Data Security

1. Encryption of data over the wire using SSL

PostgreSQL allows you to use SSL to enable encryption of data in motion. In addition, you may enable certificate-based authentication to ensure that the communication is happening between trusted parties. SSL is implemented using OpenSSL, and thus it requires the OpenSSL package to be installed on your PostgreSQL server and PostgreSQL to be built with --with-openssl support.
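
On the server side, SSL is then switched on in postgresql.conf by pointing it at a certificate and key; a minimal sketch (the file names are illustrative):

ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
ssl_ca_file = 'root.crt'    # CA used to validate client certificates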

The following entry in a pg_hba.conf file says that connections to any database and from any user are allowed from server 192.168.0.13, as long as the communication is encrypted over SSL; the md5 method still requires password authentication:

# TYPE DATABASE USER ADDRESS METHOD
hostssl all all 192.168.0.13/32 md5

Optionally, you may require Client Certificate Authentication instead, using the following method:

# TYPE DATABASE USER ADDRESS METHOD
hostssl all all 192.168.0.13/32 cert clientcert=1

2. Encryption at Rest – pgcrypto

The pgcrypto module provides cryptographic functions for PostgreSQL, allowing certain fields to be stored encrypted. pgcrypto implements PGP encryption, which is part of the OpenPGP (RFC 4880) standard. It supports both symmetric-key and public-key encryption. Besides the advanced features offered by PGP for encryption, pgcrypto also offers functions for running simple encryption based on ciphers. These functions only run a cipher over data.
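
As a quick illustration of symmetric-key usage (a sketch; the table, passphrase and data are made up for this example):

CREATE EXTENSION pgcrypto;
CREATE TABLE scott.vault (id INT, secret BYTEA);
INSERT INTO scott.vault VALUES (1, pgp_sym_encrypt('s3cr3t data', 'longpassphrase'));
SELECT pgp_sym_decrypt(secret, 'longpassphrase') FROM scott.vault WHERE id = 1;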

Accounting and Auditing

Logging in PostgreSQL

PostgreSQL allows you to log either all statements or a subset of them, based on parameter settings. You can log all DDLs or DMLs, or any statement running for more than a certain duration, to the log file when logging_collector is enabled. To avoid write overload on the data directory, you may also move your log_directory to a different location. Here are a few important parameters you should review when logging activities in your PostgreSQL server (a sample configuration sketch follows the list):

log_connections
log_disconnections
log_lock_waits
log_statement
log_min_duration_statement
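
A hedged example of how these might be set in postgresql.conf (the values are illustrative, not recommendations):

logging_collector = on
log_directory = '/var/log/postgresql'   # moved off the data directory
log_connections = on
log_disconnections = on
log_lock_waits = on
log_statement = 'ddl'                   # none | ddl | mod | all
log_min_duration_statement = 1000       # log statements running longer than 1s (in ms)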

Please note that detailed logging takes additional disk space and may impose significant write IO overhead, depending on the activity of your PostgreSQL server. You should be careful when enabling logging, and should only do so after understanding the overhead and performance degradation it may cause for your workload.

Auditing – pgaudit and set_user

Some essential auditing features in PostgreSQL are implemented as extensions, which can be enabled at will in highly secured environments with regulatory requirements.

pgaudit helps to audit the activities happening in the database. Even if an unauthorized user intentionally obfuscates a DDL or DML statement, both the statement the user submitted and the sub-statement actually executed in the database are logged in the PostgreSQL log file.

set_user provides a method of controlled privilege escalation. If properly implemented, it provides the highest level of auditing, which allows the monitoring of even SUPERUSER actions.
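
A minimal sketch of how a session might look, assuming the set_user extension is installed and the calling role has been granted EXECUTE on set_user(text):

CREATE EXTENSION set_user;
-- connected as a regular (non-superuser) role:
SELECT set_user('postgres');   -- escalate; the transition is written to the log
-- ... privileged work happens here, attributable to the original login ...
SELECT reset_user();           -- return to the original role; also logged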

You can read more about pgaudit here.

Security Bug Fixes

The PostgreSQL Global Development Group (PGDG) takes security bugs seriously. Any security vulnerability can be reported directly to security@postgresql.org. The list of security issues fixed for all supported PostgreSQL versions can be found here. Security fixes to PostgreSQL are made available through minor version upgrades, which is the main reason why it is advised to always keep PostgreSQL servers upgraded to the latest minor version.

If you liked this post…

Please join Percona’s PostgreSQL Support Technical Lead, Avinash Vallarapu; Senior Support Engineer, Fernando Laudares; and Senior Support Engineer, Jobin Augustine, on Wednesday, October 10, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4), as they demonstrate an enterprise-grade PostgreSQL® environment built using a combination of open source tools and extensions.

Register Now

The post Securing PostgreSQL as an Enterprise-Grade Environment appeared first on Percona Database Performance Blog.

by Avinash Vallarapu at September 21, 2018 01:50 PM

End-of-Life Reminder for Percona Server for MongoDB 3.2

As noted on our Software and Platform Lifecycle page, Percona Server for MongoDB 3.2 will reach end of life on September 30, 2018. According to the Percona Services Lifecycle Policy, customers with an active support contract with Percona will continue receiving support; however, end-of-life software receives no new builds, bug fixes, security fixes, or features.

Each software package Percona produces has a defined migration pathway for upgrading to a newer version. If you are currently running Percona Server for MongoDB 3.2, we encourage you to upgrade to version 3.4 or newer. Please reach out to us if you need assistance migrating to a newer release; Percona is happy to help.

The post End-of-Life Reminder for Percona Server for MongoDB 3.2 appeared first on Percona Database Performance Blog.

by Jeff Sandstrom at September 21, 2018 12:45 PM

This Week in Data with Colin Charles 53: It’s MariaDB Week PLUS Percona Live Europe Update

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

This week is clearly what I’d call a “MariaDB week” — plenty of announcements coming from MariaDB Corporation and MariaDB Foundation.

It started with Alibaba Cloud and MariaDB Announce the Launch of ApsaraDB RDS for MariaDB TX, which makes Alibaba Cloud the first public cloud to offer the enterprise offering of MariaDB, MariaDB TX 3.0. As of the announcement it cannot yet be rolled out from the interface, but I expect it will be available soon. Exciting, as you can already get MariaDB Server on Amazon RDS for MariaDB, and you can join the waitlist preview for Azure.

MariaDB Corporation has received more funding from ServiceNow Ventures in the Series C round, and has gained a new board member in Pat Casey. ServiceNow is a user of MariaDB, and “ServiceNow’s platform runs on up to 85,000 MariaDB databases that serve more than 25 billion queries per hour.” There was an excellent keynote session at M|18 about how ServiceNow uses MariaDB. The Register refers to this as “protecting ServiceNow’s toolchain”.

For good measure, MariaDB acquired Clustrix as well. This is the second acquisition after MammothDB earlier in the year. It is worth reading the TechCrunch take on this. Clustrix, a Y Combinator company, has been around since 2006 and raised $72 million. The price of the acquisition was not announced. For a bit of behind-the-scenes chatter from ex-employee shareholders, Hacker News delivers.

From a MariaDB Foundation standpoint, we see Otto Kekäläinen, the MariaDB Foundation CEO, stepping down. Thanks for all your hard work, Otto! And maybe you missed it, but not long ago, Percona Became a Bronze Sponsor of MariaDB Foundation.

Speaking of conferences, the tutorial schedule and a sneak peek of sessions for Percona Live Europe Frankfurt have been announced. In addition, the Call for Papers – 2019 Annual MariaDB User Conference closes October 31, 2018.

Releases

Link List

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 53: It’s MariaDB Week PLUS Percona Live Europe Update appeared first on Percona Database Performance Blog.

by Colin Charles at September 21, 2018 10:44 AM

Jean-Jerome Schmidt

Become a ClusterControl DBA: Operational Reports for MySQL, MariaDB, PostgreSQL & MongoDB

Most DBAs perform health checks on their databases every now and then. Usually, this happens on a daily or weekly basis. We previously discussed why such checks are important and what they should include.

To make sure your systems are in a good shape, you’d need to go through quite a lot of information - host statistics, MySQL statistics, workload statistics, state of backups, database packages, logs and so forth. Such data should be available in every properly monitored environment, although sometimes it is scattered across multiple locations - you may have one tool to monitor MySQL state, another tool to collect system statistics, maybe a set of scripts, e.g., to check the state of your backups. This makes health checks much more time-consuming than they should be - the DBA has to put together the different pieces to understand the state of the system.

Integrated tools like ClusterControl have the advantage that all of the bits are located in the same place (or in the same application). It still does not mean they are located next to each other - they may be located in different sections of the UI, and a DBA may have to spend some time clicking through the UI to reach all the interesting data.

The whole idea behind creating Operational Reports is to put all of the most important data into a single document, which can be quickly reviewed to get an understanding of the state of the databases.

Operational Reports are available under Side Menu -> Operational Reports:

Once you go there, you’ll be presented with a list of reports created manually or automatically, based on a predefined schedule:

If you want to create a new report manually, you’ll use the 'Create' option. Pick the type of report, cluster name (for per-cluster report), email recipients (optional - if you want the report to be delivered to you), and you’re pretty much done:

The reports can also be scheduled to be created on a regular basis:

At this time, 5 types of reports are available:

  • Availability report - All clusters.
  • Backup report - All clusters.
  • Schema change report - MySQL/MariaDB-based cluster only.
  • Daily system report - Per cluster.
  • Package upgrade report - Per cluster.

Availability Report

The availability report focuses on, well, availability. It includes three sections. First, the availability summary:

You can see information about availability statistics of your databases, the cluster type, total uptime and downtime, current state of the cluster and when that state last changed.

Another section gives more details on availability for every cluster. The screenshot below shows only one of the database clusters:

We can see when a node switched state and what the transition was. It’s a nice place to check if there were any recent problems with the cluster. Similar data is shown in the third section of this report, where you can go through the history of changes in cluster state.

Backup Report

The second type of report covers backups for all clusters. It contains two sections - backup summary and backup details. The former gives you a short summary of when the last backup was created, whether it completed successfully or failed, the backup verification status, the success rate and the retention period:

ClusterControl also provides examples of backup policies if it finds any monitored database cluster running without a scheduled backup or a delayed slave configured. Next are the backup details:

You can also check the list of backups executed on the cluster, with their state, type and size, within the specified interval. This is as close as you can get to being certain that backups work correctly without running a full recovery test. We definitely recommend that such tests are performed every now and then. The good news is that ClusterControl supports MySQL-based restoration and verification on a standalone host under Backup -> Restore Backup.

Daily System Report

This type of report contains detailed information about a particular cluster. It starts with a summary of different alerts which are related to the cluster:

Next section is about the state of the nodes that are part of the cluster:

You have a list of the nodes in the cluster, their type, role (master or slave), status of the node, uptime and the OS.

Another section of the report is the backup summary, same as we discussed above. Next one presents a summary of top queries in the cluster:

Finally, we see a “Node status overview” in which you’ll be presented with graphs related to OS and MySQL metrics for each node.

As you can see, we have here graphs covering all of the aspects of the load on the host - CPU, memory, network, disk, CPU load and free disk space. This is enough to get an idea of whether anything weird happened recently or not. You can also see some details about the MySQL workload - how many queries were executed, of which types, and how the data was accessed (via which handler). This, on the other hand, should be enough to pick up most of the issues on the MySQL side. What you want to look at are any spikes and dips that you haven’t seen in the past. Maybe a new query has been added to the mix and, as a result, handler_read_rnd_next skyrocketed? Maybe there was an increase in CPU load, and a high number of connections might point to increased load on MySQL, but also to some kind of contention. An unexpected pattern is worth investigating, so you know what is going on.

Package Upgrade Report

This report gives a summary of packages available for upgrade by the repository manager on the monitored hosts. For accurate reporting, ensure you always use stable and trusted repositories on every host. In some undesirable cases, the monitored hosts could be configured with an outdated repository after an upgrade (e.g., every MariaDB major version uses a different repository), an incomplete internal repository (e.g., partially mirrored from the upstream) or a bleeding-edge repository (commonly for unstable nightly-build packages).

The first section is the upgrade summary:

It summarizes the total number of packages available for upgrade, as well as the related managed services for the cluster, like the load balancer, virtual IP address and arbitrator. Next, ClusterControl provides a detailed package list, grouped by package type for every host:

This report lists the available versions and can greatly help us plan our maintenance windows efficiently. Critical upgrades like security and database packages can be prioritized over non-critical upgrades, which can be consolidated into other, lower-priority maintenance windows.

Schema Change Report

This report compares changes in table structure of the selected MySQL/MariaDB databases which happened between two different generated reports. In older MySQL/MariaDB versions, DDL is a non-atomic operation (pre-8.0) and requires a full table copy (pre-5.6 for most operations) - blocking other transactions until it completes. Schema changes can become a huge pain once your tables get a significant amount of data, and must be carefully planned, especially in a clustered setup. In a multi-tiered development environment, we have seen many cases where developers silently modify the table structure, resulting in significant impact on query performance.

In order for ClusterControl to produce an accurate report, special options must be configured inside CMON configuration file for the respective cluster:

  • schema_change_detection_address - Checks will be executed using SHOW TABLES/SHOW CREATE TABLE to determine if the schema has changed. The checks are executed against the address specified, which is of the format HOSTNAME:PORT. The schema_change_detection_databases option must also be set. A differential of a changed table is created (using diff).
  • schema_change_detection_databases - Comma separated list of databases to monitor for schema changes. If empty, no checks are made.

In this example, we would like to monitor schema changes for database "myapp" and "sbtest" on our MariaDB Cluster with cluster ID 27. Pick one of the database nodes as the value of schema_change_detection_address. For MySQL replication, this should be the master host, or any slave host that holds the databases (in case partial replication is active). Then, inside /etc/cmon.d/cmon_27.cnf, add the two following lines:

schema_change_detection_address=10.0.0.30:3306
schema_change_detection_databases=myapp,sbtest

Restart CMON service to load the change:

$ systemctl restart cmon

For the very first report, ClusterControl only returns the result of metadata collection, similar to below:

With the first report as the baseline, subsequent reports return the output we are expecting:

Take note that only new or changed tables are printed in the report. The first report only collects metadata for comparison in the subsequent rounds, thus we have to run it at least twice to see the difference.

With this report, you can now gather the database structure footprints and understand how your database has evolved over time.

Final Thoughts

Operational reports are a comprehensive way to understand the state of your database infrastructure. They are built for both operational and managerial staff, and can be very useful in analysing your database operations. The reports can be generated in place or delivered to you via email, which makes things convenient if you have a reporting silo.

We’d love to hear your feedback on anything else you’d like to have included in the report, what’s missing and what is not needed.

by ashraf at September 21, 2018 08:18 AM

September 20, 2018

Peter Zaitsev

Prometheus 2 Time Series Storage Performance Analysis

The Prometheus 2 time series database (TSDB) is an amazing piece of engineering, offering a dramatic improvement compared to “v2” storage in Prometheus 1 in terms of ingest performance, query performance and resource use efficiency. As we’ve been adopting Prometheus 2 in Percona Monitoring and Management (PMM), I had a chance to look into the performance of the Prometheus 2 TSDB. This blog post details my observations.

Understanding the typical Prometheus workload

For someone who has spent their career working with general purpose databases, the typical workload of Prometheus is quite interesting. The ingest rate tends to remain very stable: typically, devices you monitor will send approximately the same amount of metrics all the time, and infrastructure tends to change relatively slowly.

Queries to the data can come from multiple sources. Some of them, such as alerting, tend to be very stable and predictable too. Others, such as users exploring data, can be spiky, though it is not common for this to be the largest part of the load.

The Benchmark

In my assessment, I focused on handling an ingest workload. I deployed Prometheus 2.3.2, compiled with Go 1.10.1 (as part of PMM 1.14), on Linode using this StackScript. For maximally realistic load generation, I spun up multiple MySQL nodes running some real workloads (the Sysbench TPC-C test), with each emulating 10 nodes running MySQL and Linux, using this StackScript.

The observations below are based on a Linode instance with eight virtual cores and 32GB of memory, running 20 load-driving instances simulating the monitoring of 200 MySQL instances. Or, in Prometheus terms: some 800 targets, 440 scrapes/sec, 380K samples ingested per second, and 1.7M active time series.

Design Observations

The conventional approach of traditional databases, and the approach that Prometheus 1.x used, is to limit the amount of memory. If this amount of memory is not enough to handle the load, you will have high latency and some queries (or scrapes) will fail. Prometheus 2 memory usage is instead governed by storage.tsdb.min-block-duration, which determines how long samples are stored in memory before they are flushed (the default being 2h). How much memory it requires will depend on the number of time series and labels you have, and on your scrape frequency, in addition to the raw ingest rate. On disk, Prometheus tends to use about three bytes per sample. Memory requirements, though, will be significantly higher.

While the configuration knob exists to change the head block size, tuning it by users is discouraged. So you’re limited to providing Prometheus 2 with as much memory as it needs for your workload.
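
For reference, a sketch of how the knob would be set on the command line (the exact flag spelling around Prometheus 2.3 is an assumption on my part; prefer the default, as noted above):

./prometheus --config.file=prometheus.yml \
             --storage.tsdb.path=/data \
             --storage.tsdb.min-block-duration=2h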

If there is not enough memory for Prometheus to handle your ingest rate, then it will crash with an out-of-memory error message or will be killed by the OOM killer.

Adding more swap space as a “backup” in case Prometheus runs out of RAM does not seem to work as using swap space causes a dramatic memory usage explosion. I suspect swapping does not play well with Go garbage collection.

Another interesting design choice is aligning block flushes to specific times, rather than to time since start:

head block Prometheus 2

As you can see from this graph, flushes happen every two hours, on the clock. If you change min-block-duration to 1h, these flushes will happen every hour, at 30 minutes past the hour.

(If you want to see this and other graphs for your Prometheus installation, you can use this Dashboard. It has been designed for PMM but can work for any Prometheus installation with minor adjustments.)

While the active block—called the head block—is kept in memory, blocks containing older data are accessed through mmap(). This eliminates the need to configure a cache separately, but it also means you need to allocate plenty of memory for the OS cache if you want to query data older than what fits in the head block.

It also means the virtual memory you will see Prometheus 2 using will get very high: do not let it worry you.

Prometheus process memory usage

Another interesting design choice is the WAL configuration. As you can see in the storage documentation, Prometheus protects from data loss during a crash by keeping a WAL. The exact durability guarantees, though, are not clearly described. As of Prometheus 2.3.2, Prometheus flushes the WAL every 10 seconds, and this value is not user-configurable.

Compactions

Prometheus TSDB is designed somewhat similarly to LSM storage engines – the head block is flushed to disk periodically while, at the same time, compactions that merge a few blocks together are performed to avoid the need to scan too many blocks for queries.

Here is the number of data blocks I observed on my system after a 24h workload:

active data blocks

If you want more details about storage, you can check out the meta.json file which has additional information about the blocks you have, and how they came about.

{
       "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
       "minTime": 1536472800000,
       "maxTime": 1536494400000,
       "stats": {
               "numSamples": 8292128378,
               "numSeries": 1673622,
               "numChunks": 69528220
       },
       "compaction": {
               "level": 2,
               "sources": [
                       "01CPYRY9MS465Y5ETM3SXFBV7X",
                       "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                       "01CPZ6NR4Q3PDP3E57HEH760XS"
               ],
               "parents": [
                       {
                               "ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
                               "minTime": 1536472800000,
                               "maxTime": 1536480000000
                       },
                       {
                               "ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                               "minTime": 1536480000000,
                               "maxTime": 1536487200000
                       },
                       {
                               "ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
                               "minTime": 1536487200000,
                               "maxTime": 1536494400000
                       }
               ]
       },
       "version": 1
}

Compactions in Prometheus are triggered at the time the head block is flushed, and several compactions may be performed at these intervals:

Prometheus 2 compactions

Compactions do not seem to be throttled in any way, causing huge spikes of disk IO usage when they run:

spike in io activity for compactions

And a spike in CPU usage:

spike in CPU usage during compactions

This, of course, can negatively impact system performance. It is also one of the big questions for LSM engines: how to run compactions so as to maintain great query performance without causing too much overhead.

Memory utilization as it relates to the compaction process is also interesting:

Memory utilization during compaction process

We can see that after a compaction a lot of memory changes from “Cached” to “Free”, meaning potentially valuable data is washed out from memory. I wonder if fadvise() or other techniques to minimize data washout from cache are in use, or if this is caused by the fact that the blocks which were cached are destroyed by the compaction process.

Crash Recovery

Crash recovery from the log file takes time, though it is reasonable. For an ingest rate of about 1M samples/sec, I observed some 25 minutes of recovery time on SSD storage:

level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13OURCE)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."

The problem I observed with recovery is that it is very memory intensive. While the server may be capable of handling the normal load with memory to spare, if it crashes it may never be able to recover due to running out of memory. The only solution I found for this is to disable scraping, let Prometheus perform crash recovery, and then restart the server with scraping enabled.

Warmup

Another behavior to keep in mind is the need for warmup – lower performance and higher resource usage immediately after start. In some—but not all—starts I could observe significantly higher initial CPU and memory usage.

cpu usage during warmup

memory usage during warmup

The gaps in the memory utilization graph show that Prometheus is not initially able to perform all the scrapes configured, and as such some data is lost.

I have not profiled what exactly causes this extensive CPU and memory consumption. I suspect it might happen when new time series entries are created in the head block at a high rate.

CPU Usage Spikes

Besides compaction—which is quite heavy on disk IO—I can also observe significant CPU spikes about every 2 minutes. These are longer with a higher ingest rate. They seem to be caused by Go garbage collection: during these spikes, at least some CPU cores are completely saturated.

cpu usage spikes maybe during Go Garbage collection

cpu saturation and max core usage

These spikes are not just cosmetic. It looks like when these spikes happen, the Prometheus internal /metrics endpoint becomes unresponsive, thus producing data gaps during the exact time that the spikes occur:

Prometheus 2 process memory usage

We can also see the Prometheus exporter hitting a one-second timeout:

scrape time by job

We can observe this correlates with garbage collection:

garbage collection in Prometheus processing

Conclusion

Prometheus 2 TSDB offers impressive performance, being able to handle a cardinality of millions of time series, and hundreds of thousands of samples ingested per second, on rather modest hardware. CPU and disk IO usage are both very impressive. I got up to 200K metrics/sec per CPU core used!

For capacity planning purposes, you need to ensure that you have plenty of memory available, and it needs to be real RAM. The actual amount of memory I observed was about 5GB per 100K samples/sec of ingest rate, which, with additional space for the OS cache, makes it 8GB or so.
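
As a worked example using the benchmark above: at 380K samples/sec ingested, that rule of thumb gives roughly 3.8 × 5GB ≈ 19GB of RAM for Prometheus itself, plus extra headroom for the OS cache on top.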

There is work that remains to be done to avoid CPU and IO usage spikes, though this is not unexpected considering how young Prometheus 2 TSDB is – if we look at InnoDB, TokuDB, RocksDB and WiredTiger, all of them had similar problems in their initial releases.

The post Prometheus 2 Time Series Storage Performance Analysis appeared first on Percona Database Performance Blog.

by Peter Zaitsev at September 20, 2018 04:07 PM

ProxySQL 1.4.10 and Updated proxysql-admin Tool Now in the Percona Repository

ProxySQL 1.4.10, released by ProxySQL, is now available for download in the Percona Repository, along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.10 source and binary packages available at https://percona.com/downloads/proxysql include ProxySQL Admin – a tool, developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.10 are available as well: https://hub.docker.com/r/percona/proxysql/. You can download the original ProxySQL from https://github.com/sysown/proxysql/releases.

Improvements

  • PSQLADM-12: Implemented the writer-is-reader option in proxysql-admin. This is now a text option: ‘always’, ‘never’, and ‘ondemand’
  • PSQLADM-64: Added the --sync-multi-cluster-users option, which uses the same function as --sync-users but will not delete users on ProxySQL that don’t exist on MySQL
  • PSQLADM-90: Added testsuites for host priority/slave/loadbal/writer-is-reader features
  • Additional debugging support
    An additional --debug flag on scripts prints more output. All SQL calls are now logged if debugging is enabled.

Tool Enhancements

  • proxysql-status
    proxysql-status now reads the credentials from the proxysql-admin.cnf file. It is possible to look only at certain tables (--files, --main, --monitor, --runtime, --stats). Also added the ability to filter based on the table name (--table).
  • tests directory
    The proxysql-admin-testsuite.sh script can now be used to create test clusters (proxysql-admin-testsuite.sh <workdir> --no-test --cluster-one-only; this option will create a 3-node PXC cluster with 1 async slave and will also start proxysql). Also added regression test suites.
  • tools directory
    Added extra tools that can be used for debugging (mysql_exec, proxysql_exec, enable_scheduler, and run_galera_checker).

Bug Fixes

  • PSQLADM-73: proxysql-admin did not check that the monitor user had been configured on the PXC nodes.
  • PSQLADM-82: the without-check-monitor-user option did check the monitor user (even if it was enabled). This option has been replaced with use-existing-monitor-password.
  • PSQLADM-83: proxysql_galera-checker could hang if there was no scheduler entry.
  • PSQLADM-87: In some cases, proxysql_galera_checker was not moving a node to OFFLINE_SOFT if pxc_maint_mode was set to “maintenance”.
  • PSQLADM-88: proxysql_node_monitor was searching among all nodes, not just the read hostgroup.
  • PSQLADM-91: Nodes in the priority list were not being picked.
  • PSQLADM-93: If mode=’loadbal’, then the read_hostgroup setting was used from the config file, rather than being set to -1.
  • PSQLADM-96: CentOS used /usr/share/proxysql rather than /var/lib/proxysql.
  • PSQLADM-98: In some cases, checking the PXC node status could stall (this call now uses a timeout).

ProxySQL is available under the open source GPLv3 license.

The post ProxySQL 1.4.10 and Updated proxysql-admin Tool Now in the Percona Repository appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at September 20, 2018 11:06 AM

MyRocks Disk Full Edge Case

The RocksDB engine—and its MySQL implementation, MyRocks—is a very good alternative engine for MySQL. It has proven to be very efficient and stable for many workloads, including those of large scale. However, it is still a relative newborn in the MySQL ecosystem, and has only a small fraction of the adoption rate of InnoDB. That means it is not so well tested in all possible edge cases, and may have many unreported bugs. One known bug is discussed here: if you are a MyRocks user, it’s important that you are aware of the possibility of lost data in the specific circumstances described below.

In writing this article, I want to broadcast a wider warning, as the problem I found is pretty serious and could lead to a very unpleasant situation.

The problem relates to what is not such an edge case after all – a full disk. The result can be extremely bad, though. After printing some errors, the RocksDB engine continues telling clients that subsequent writes are accepted, while they are, in fact, lost! In addition, depending on the workload and on the moment at which the disk ran out of space, in a worst case scenario we may completely lose access to tables that use the RocksDB engine.

Let me show how bad the situation could be, using a basic sandbox and sysbench test example. I have a sandbox with MyRocks installed:

mysql [localhost] {msandbox} ((none)) > show engines\G
*************************** 1. row ***************************
Engine: ROCKSDB
Support: YES
Comment: RocksDB storage engine
Transactions: YES
XA: YES
Savepoints: YES
mysql [localhost] {msandbox} ((none)) > select @@version,@@version_comment;
+-----------+----------------------------------------------------+
| @@version | @@version_comment |
+-----------+----------------------------------------------------+
| 5.7.22-22 | Percona Server (GPL), Release 22, Revision f62d93c |
+-----------+----------------------------------------------------+
1 row in set (0.00 sec)

Once my test system did not have much free disk space remaining, I ran this simple sysbench prepare command:

$ sysbench /usr/share/sysbench/oltp_insert.lua --mysql_storage_engine=rocksdb --table-size=1000000 --tables=4 --mysql-db=db2 --mysql-user=root --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox5723.sock --threads=4 --time=200 --report-interval=1 --events=0 --db-driver=mysql prepare
sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
Initializing worker threads...
Creating table 'sbtest2'...
Creating table 'sbtest3'...
Creating table 'sbtest1'...
Creating table 'sbtest4'...
Inserting 1000000 records into 'sbtest2'
Inserting 1000000 records into 'sbtest1'
Inserting 1000000 records into 'sbtest3'
Inserting 1000000 records into 'sbtest4'
Creating a secondary index on 'sbtest4'...
FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest4_k_4_4_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device) for query 'CREATE INDEX k_4 ON sbtest4(k)'
FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest4_k_4_4_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest1'...
Creating a secondary index on 'sbtest3'...
FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest2_k_2_5_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device) for query 'CREATE INDEX k_2 ON sbtest2(k)'
FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest2_k_2_5_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device
FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest3_k_3_6_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device) for query 'CREATE INDEX k_3 ON sbtest3(k)'
FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest3_k_3_6_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device
FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest1_k_1_7_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device) for query 'CREATE INDEX k_1 ON sbtest1(k)'
FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest1_k_1_7_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device

This resulted in the following errors being printed to the error log:

2018-08-10T11:01:35.379611Z 5 [Note] RocksDB: Begin index creation (0,260)
2018-08-10T11:01:40.105596Z 3 [Note] RocksDB: Begin index creation (0,261)
2018-08-10T11:01:40.838902Z 4 [Note] RocksDB: Begin index creation (0,262)
2018-08-10T11:01:40.924093Z 6 [Note] RocksDB: Begin index creation (0,263)
2018-08-10T11:03:14.868958Z 0 [ERROR] RocksDB: Error detected in background, Status Code: 5, Status: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device
2018-08-10T11:03:14.868986Z 0 [ERROR] RocksDB: BackgroundErrorReason: 0
2018-08-10T11:03:14.869003Z 0 [ERROR] LibRocksDB:[/mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/min-stretch-x64/percona-server-5.7.22-22/storage/rocksdb/rocksdb/db/db_impl_compaction_flush.cc:1293] Waiting after background flush error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on deviceAccumulated background error counts: 1
2018-08-10T11:03:15.430064Z 0 [ERROR] LibRocksDB:[/mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/min-stretch-x64/percona-server-5.7.22-22/storage/rocksdb/rocksdb/db/db_impl_compaction_flush.cc:1373] Waiting after background compaction error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000102.sst: No space left on device, Accumulated background error counts: 2
2018-08-10T11:03:18.538866Z 9 [Note] RocksDB: Begin index creation (0,268)
2018-08-10T11:03:19.897797Z 9 [ERROR] Error finishing bulk load.
2018-08-10T11:03:23.306957Z 10 [Note] RocksDB: Begin index creation (0,269)
2018-08-10T11:03:23.509448Z 8 [Note] RocksDB: Begin index creation (0,270)
2018-08-10T11:03:23.539705Z 7 [Note] RocksDB: Begin index creation (0,271)
2018-08-10T11:03:24.536092Z 10 [ERROR] Error finishing bulk load.
2018-08-10T11:03:24.659233Z 7 [ERROR] Error finishing bulk load.
2018-08-10T11:03:24.736380Z 8 [ERROR] Error finishing bulk load.

OK, so I resized my data partition online (no MySQL service restart) and gave it more free disk space. And here is what happened next. I tried to see if MySQL/MyRocks works now:

mysql [localhost] {msandbox} (db1) > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.33 sec)
mysql [localhost] {msandbox} (db1) > insert into sbtest1 set k=1;
Query OK, 1 row affected (0.01 sec)
mysql [localhost] {msandbox} (db1) > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.32 sec)
mysql [localhost] {msandbox} (db1) > delete from db1.sbtest2 where id<100;
Query OK, 99 rows affected (0.01 sec)
mysql [localhost] {msandbox} (db1) > select count(*) from db1.sbtest2;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.32 sec)

That’s the worst part – there’s no error returned to the client, and yet my new writes are basically gone!

The good thing is we can still read from the tables, so mysqldump or mydumper is the way to go now, BEFORE you attempt to restart the service:

mysql [localhost] {msandbox} (db1) > select id from db1.sbtest2 where id<10;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+----+
9 rows in set (0.00 sec)

When I later try to restart the service, it crashes on shutdown. In some cases, the database is able to recover and offer access to the old RocksDB data (though the new data, written above, are still lost).

In other cases, though, your RocksDB won’t load after restart:

mysql [localhost] {msandbox} (db1) > show create table sbtest1\G
ERROR 1286 (42000): Unknown storage engine 'ROCKSDB'

and using this option:

rocksdb_wal_recovery_mode = 3

…helps to recover the engine to a usable state.

Ultimately, the worst case scenario is that you end up with the MyRocks engine dead and unable to recover, no matter what options you try. We can see this in the error log when I try to start the service:

2018-08-10T09:33:18.756868Z 0 [Note] RocksDB: 2 column families found
2018-08-10T09:33:18.756993Z 0 [Note] RocksDB: Column Families at start:
2018-08-10T09:33:18.757016Z 0 [Note] cf=default
2018-08-10T09:33:18.757022Z 0 [Note] write_buffer_size=67108864
2018-08-10T09:33:18.757027Z 0 [Note] target_file_size_base=67108864
2018-08-10T09:33:18.757043Z 0 [Note] cf=__system__
2018-08-10T09:33:18.757048Z 0 [Note] write_buffer_size=67108864
2018-08-10T09:33:18.757052Z 0 [Note] target_file_size_base=67108864
2018-08-10T09:33:18.811671Z 0 [ERROR] RocksDB: Could not get index information for Index Number (0,282), table db3.sbtest4
2018-08-10T09:33:18.811741Z 0 [ERROR] RocksDB: Failed to initialize DDL manager.
2018-08-10T09:33:18.811795Z 0 [ERROR] Plugin 'ROCKSDB' init function returned error.
2018-08-10T09:33:18.811812Z 0 [ERROR] Plugin 'ROCKSDB' registration as a STORAGE ENGINE failed.

Unfortunately, we were not able to find any workaround for this ultimate worst case situation, and so recovering data—including all other RocksDB tables not touched during the incident—may be very hard.

The problem, which is obviously a bug, is reported here: https://jira.percona.com/browse/PS-4706.

Fortunately, the engineers at Facebook—where RocksDB originated—are aware of the issue and have already released a simple patch. This makes the engine simply crash instead of accepting writes that would be lost. While this is not a perfect solution, it at least protects you from losing data. Related commit details can be found here: https://github.com/facebook/mysql-5.6/commit/4a7d3fb8f96c96c2be10d61a7796ccac7610a5d6.

At Percona, we accepted that fix, and the latest Percona Server for MySQL 5.7.23 already has it incorporated: https://github.com/percona/percona-server/commit/d0d5635573ebc4f157e36a989d14f7ee6f87c410

So, with the upgraded version, we instead will see this kind of error log event:

2018-09-19T08:12:14.496412Z 0 [ERROR] RocksDB: Error detected in background, Status Code: 5, Status: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000104.sst: No space left on device
2018-09-19T08:12:14.496435Z 0 [ERROR] RocksDB: BackgroundErrorReason: 1
2018-09-19T08:12:14.496451Z 0 [ERROR] LibRocksDB:[/mnt/workspace/percona-server-5.7-binaries-release-rocks-new/label_exp/min-stretch-x64/test/percona-server-5.7.23-23/storage/rocksdb/rocksdb/db/db_impl_compactio
n_flush.cc:1472] Waiting after background compaction error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000104.sst: No space left on device, Accumulated background error counts: 1
2018-09-19T08:12:14.511910Z 12 [ERROR] RocksDB: failed to write to WAL, Status Code: 5, Status: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000104.sst: No space left on device
2018-09-19T08:12:14.511935Z 12 [ERROR] MyRocks: aborting on WAL write error.
08:12:14 UTC - mysqld got signal 6 ;

Therefore, all MyRocks users are advised to upgrade ASAP, and if that’s not possible, you should at least double-check your disk space monitoring and alerting.

The post MyRocks Disk Full Edge Case appeared first on Percona Database Performance Blog.

by Przemysław Malkowski at September 20, 2018 10:00 AM

September 19, 2018

MariaDB Foundation

MariaDB Foundation CEO steps down

The time has come for me, Otto Kekäläinen, after serving as the CEO of the MariaDB Foundation since January 2015, to step down. The board has already quietly been searching for a new CEO for some months, but no contract has been signed as of yet, so if you are interested in the position, […]

The post MariaDB Foundation CEO steps down appeared first on MariaDB.org.

by Otto Kekäläinen at September 19, 2018 12:23 PM

Jean-Jerome Schmidt

How to Monitor Multiple MySQL Instances Running on the Same Machine - ClusterControl Tips & Tricks

Requires ClusterControl 1.6 or later. Applies to MySQL based instances/clusters.

On some occasions, you might want to run multiple instances of MySQL on a single machine. You might want to give different users access to their own MySQL servers that they manage themselves, or you might want to test a new MySQL release while keeping an existing production setup undisturbed.

It is possible to use a different MySQL server binary per instance, or use the same binary for multiple instances (or a combination of the two approaches). For example, you might run a server from MySQL 5.6 and one from MySQL 5.7, to see how the different versions handle a certain workload. Or you might run multiple instances of the latest MySQL version, each managing a different set of databases.

Whether or not you use distinct server binaries, each instance that you run must be configured with unique values for several operating parameters. This eliminates the potential for conflict between instances. You can use MySQL Sandbox to create multiple MySQL instances. Or you can use mysqld_multi available in MySQL to start or stop any number of separate mysqld processes running on different TCP/IP ports and UNIX sockets.

In this blog post, we’ll show you how to configure ClusterControl to monitor multiple MySQL instances running on one host.

ClusterControl Limitation

At the time of writing, ClusterControl does not support monitoring of multiple instances on one host per cluster/server group. It assumes the following best practices:

  • Only one MySQL instance per host (physical server or virtual machine).
  • MySQL data redundancy should be configured on N+1 servers.
  • All MySQL instances are running with uniform configuration across the cluster/server group, e.g., listening port, error log, datadir, basedir, socket are identical.

With regards to the points mentioned above, ClusterControl assumes that in a cluster/server group:

  • MySQL instances are configured uniformly across a cluster; same port, the same location of logs, base/data directory and other critical configurations.
  • It monitors, manages and deploys only one MySQL instance per host.
  • MySQL client must be installed on the host and available on the executable path for the corresponding OS user.
  • MySQL is bound to an IP address reachable by the ClusterControl node.
  • It keeps monitoring the host statistics, e.g., CPU/RAM/disk/network, for each MySQL instance individually. In an environment with multiple instances per host, you should expect redundant host statistics, since it monitors the same host multiple times.

With the above assumptions, the following ClusterControl features do not work for a host with multiple instances:

Backup - Percona XtraBackup does not support multiple instances per host, and mysqldump executed by ClusterControl only connects to the default socket.

Process management - ClusterControl uses the standard ‘pgrep -f mysqld_safe’ to check if MySQL is running on a host. With multiple MySQL instances, this approach produces false positives. As such, automatic recovery for node/cluster won’t work.

Configuration management - ClusterControl provisions the standard MySQL configuration directory. It usually resides under /etc/ and /etc/mysql.

Workaround

Monitoring multiple MySQL instances on a machine is still possible with ClusterControl with a simple workaround. Each MySQL instance must be treated as a single entity per server group.

In this example, we have 3 MySQL instances on a single host created with MySQL Sandbox:

ClusterControl monitoring multiple instances on same host

We created our MySQL instances using the following commands:

$ su - sandbox
$ make_multiple_sandbox mysql-5.7.23-linux-glibc2.12-x86_64.tar.gz

By default, MySQL Sandbox creates mysql instances that listen to 127.0.0.1. It is necessary to configure each node appropriately to make them listen to all available IP addresses. Here is the summary of our MySQL instances in the host:

[sandbox@master multi_msb_mysql-5_7_23]$ cat default_connection.json 
{
"node1":  
    {
        "host":     "master",
        "port":     "15024",
        "socket":   "/tmp/mysql_sandbox15024.sock",
        "username": "msandbox@127.%",
        "password": "msandbox"
    }
,
"node2":  
    {
        "host":     "master",
        "port":     "15025",
        "socket":   "/tmp/mysql_sandbox15025.sock",
        "username": "msandbox@127.%",
        "password": "msandbox"
    }
,
"node3":  
    {
        "host":     "master",
        "port":     "15026",
        "socket":   "/tmp/mysql_sandbox15026.sock",
        "username": "msandbox@127.%",
        "password": "msandbox"
    }
}

The next step is to modify the configuration of the newly created instances. Open my.cnf for each of them and comment out (hash) the bind_address variable:

[sandbox@master multi_msb_mysql-5_7_23]$ ps -ef | grep mysqld_safe
sandbox  13086     1  0 08:58 pts/0    00:00:00 /bin/sh bin/mysqld_safe --defaults-file=/home/sandbox/sandboxes/multi_msb_mysql-5_7_23/node1/my.sandbox.cnf
sandbox  13805     1  0 08:58 pts/0    00:00:00 /bin/sh bin/mysqld_safe --defaults-file=/home/sandbox/sandboxes/multi_msb_mysql-5_7_23/node2/my.sandbox.cnf
sandbox  14065     1  0 08:58 pts/0    00:00:00 /bin/sh bin/mysqld_safe --defaults-file=/home/sandbox/sandboxes/multi_msb_mysql-5_7_23/node3/my.sandbox.cnf
[sandbox@master multi_msb_mysql-5_7_23]$ vi my.cnf
#bind_address = 127.0.0.1

Then install the MySQL client on your master node and restart all instances using the restart_all script.

[sandbox@master multi_msb_mysql-5_7_23]$ yum install mysql
[sandbox@master multi_msb_mysql-5_7_23]$ ./restart_all  
# executing "stop" on /home/sandbox/sandboxes/multi_msb_mysql-5_7_23
executing "stop" on node 1
executing "stop" on node 2
executing "stop" on node 3
# executing "start" on /home/sandbox/sandboxes/multi_msb_mysql-5_7_23
executing "start" on node 1
. sandbox server started
executing "start" on node 2
. sandbox server started
executing "start" on node 3
. sandbox server started

From ClusterControl, we need to perform an ‘Import’ for each instance, isolating each one in its own server group to make this work.

ClusterControl import existing server

For node1, enter the following information in ClusterControl > Import:

ClusterControl import existing server

Make sure to enter the proper port (different for each instance) and host (the same for all instances).

You can monitor the progress by clicking on the Activity/Jobs icon in the top menu.

ClusterControl import existing server details

You will see node1 in the UI once ClusterControl finishes the job. Repeat the same steps to add another two nodes with port 15025 and 15026. You should see something like the below once they are added:

ClusterControl Dashboard

There you go. We just added our existing MySQL instances into ClusterControl for monitoring. Happy monitoring!

PS: To get started with ClusterControl, click here!

by Bart Oles at September 19, 2018 11:17 AM

Peter Zaitsev

Tutorial Schedule for Percona Live Europe 2018 Is Live

Percona Live Europe tutorials and sneak peek

Percona has revealed the line-up of in-depth tutorials for the Percona Live Europe 2018 Open Source Database Conference, taking place November 5–7, 2018 at the Radisson Blu Hotel in Frankfurt, Germany. Secure your spot now with Advanced Registration prices. Be sure to buy your tickets soon, as ticket prices will only head up, not down! Sponsorship opportunities for the conference are still available.

Percona Live Europe 2018 Open Source Database Conference is the premier open source database event. Our theme this year is “Connect. Accelerate. Innovate.”  Percona Live is the place to learn about how open source database technology can power your applications, improve your websites and solve your critical database issues.

Monday, November 5: Tutorial Day

Tutorials take place throughout the day on Monday, November 5. Tutorials are three hours long and provide practical, in-depth knowledge exchange on critical open source database issues. The line-up includes:

  • Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics with Jaime Crespo – Wikimedia Foundation
  • Hands on ProxySQL with René Cannaò – ProxySQL
  • ElasticSearch 101 with Antonios Giannopoulos – ObjectRocket
  • MySQL Performance Schema in Action: the Complete Tutorial with Sveta Smirnova and Alexander Rubin – Percona
  • MySQL InnoDB Cluster in a Nutshell : The Saga Continues with 8.0 with Frédéric Descamps – Oracle
  • Introduction to PostgreSQL for MySQL and Oracle DBAs with Avinash Vallarapu – Percona
  • InnoDB Architecture and Optimization with Peter Zaitsev – Percona
  • MongoDB: Replica Sets and Sharded Clusters with Adamo Tonete – Percona
  • High Availability PostgreSQL and Kubernetes with Google Cloud presented by Alexis Guajardo – Google
  • Open Source Database Performance Optimization and Monitoring with PMM with Michael Coburn, Vinicius Grippa, and Avinash Vallarapu – Percona
  • Percona XtraDB Cluster Tutorial presented by Tibor Köröcz – Percona
  • Mastering PostgreSQL Administration with Bruce Momjian – EnterpriseDB

and a sneak peek at some of the sessions

Of course, we have a stellar line-up of talks, too! Here’s a tantalising glimpse of just some of the talks you could MISS if you don’t head to Frankfurt in November.

Tuesday 6th November

  • Percona Server 8.0 – Laurynas Biveinis – Percona
  • MySQL 8.0 Performance: Scalability & Benchmarks – Dimitri Kravtchuk – Oracle
  • MySQL Group Replication : the magic explained – Frédéric Descamps – Oracle
  • Explaining the Postgres Query Optimizer – Bruce Momjian – EnterpriseDB
  • TLS for MySQL at large scale – Jaime Crespo – Wikimedia Foundation
  • BlaBlaCar – 100% Containers Powered Carpooling – Maxime Fouilleul – BlaBlaCar
  • A Year in Google Cloud – Carmen Mason, Alan Mason – VitalSource Technologies
  • Demystifying MySQL Replication Crash Safety – Jean-François Gagné
  • MongoDB Shard 101 – Adamo Tonete, Vinodh Krishnaswamy – Percona
  • Highway to Hell or Stairway to Cloud? – Alexander Kukushkin – Zalando
  • MongoDB administration cool tips – Gabriel Ciciliani – Pythian

Wednesday 7th November

  • PostgreSQL Enterprise Features – Michael Banck – credativ GmbH
  • Open Source Databases and Non-Volatile Memory – Frank Ober – Intel Memory Group
  • MariaDB 10.3 Optimizer and beyond – Vicentiu Ciorbaru – MariaDB Foundation
  • MariaDB system-versioned tables – Federico Razzoli – PayProp

A shout out to our fantastic Conference Committee who have been working hard to review the tutorial and talk submissions: we had over 200! Thank you!

The Radisson Blu Hotel, Frankfurt

Percona Live Europe 2018 Open Source Database Conference will be held at the Radisson Blu Hotel, Frankfurt, Franklinstraße 65, 60486 Frankfurt am Main, Germany.

The Radisson Blu enjoys an enviable location in the Bockenheim District, just off the A66 motorway – only one kilometer from Messe Frankfurt, one of the world’s largest exhibition complexes. They’re also just 10 minutes from the city center, and Frankfurt International Airport (FRA) is a quick 15-minute drive away.

Book your hotel using Percona’s special room block rate, available only until September 20.

Sponsorships

Sponsorship opportunities for Percona Live Europe 2018 Open Source Database Conference are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact events@percona.com for sponsorship details.

The post Tutorial Schedule for Percona Live Europe 2018 Is Live appeared first on Percona Database Performance Blog.

by Bronwyn Campbell at September 19, 2018 07:53 AM

September 18, 2018

Peter Zaitsev

Percona Server for MongoDB 3.2.21-3.12 Is Now Available

Percona Server for MongoDB

Percona announces the release of Percona Server for MongoDB 3.2.21-3.12 on September 18, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.2 Community Edition. It supports MongoDB 3.2 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

This release updates the MongoDB portion of the Percona Server for MongoDB code base to 3.2.21.

The Percona Server for MongoDB 3.2.21-3.12 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.2.21-3.12 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at September 18, 2018 06:28 PM

Percona XtraDB Cluster 5.6.41-28.28 Is Now Available

Percona XtraDB Cluster 5.6

Percona announces the release of Percona XtraDB Cluster 5.6.41-28.28 (PXC) on September 18, 2018. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.41-28.28 is now the current release.

Fixed Bugs

  • PXC-1017: The Memcached API is now disabled if a node is acting as a cluster node, because InnoDB Memcached access is not replicated by Galera.
  • PXC-2164: SST script compatibility with SELinux was improved by forcing it to look only for the port associated with the relevant process.
  • PXC-2155: Temporary folders created during SST execution are now deleted on cleanup.
  • PXC-2199: The TOI replication protocol was fixed to prevent unexpected GTID generation caused by the DROP TRIGGER IF EXISTS statement, which MySQL logs as successful (due to its IF EXISTS clause) even when the trigger does not exist.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

The post Percona XtraDB Cluster 5.6.41-28.28 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at September 18, 2018 06:11 PM

Jean-Jerome Schmidt

Custom Graphs to Monitor your MySQL, MariaDB, MongoDB and PostgreSQL Systems - ClusterControl Tips & Tricks

Graphs are important, as they are your window onto your monitored systems. ClusterControl comes with a predefined set of graphs for you to analyze, built on top of the metric sampling done by the controller. These are designed to give you, at first glance, as much information as possible about the state of your database cluster. You might have your own set of metrics you’d like to monitor though. Therefore ClusterControl allows you to customize the graphs available in the cluster overview section and in the Nodes -> DB Performance tab. Multiple metrics can be overlaid on the same graph.

Cluster Overview tab

Let’s take a look at the cluster overview - it shows the most important information aggregated under different tabs.

Cluster Overview Graphs

You can see graphs like “Cluster Load” and “Galera - Flow Ctrl” along with a couple of others. If these are not enough for you, you can click on “Dash Settings” and then pick the “Create Board” option. From there, you can also manage existing graphs: you can edit a graph by double-clicking on it, or delete it from the tab list.

Dashboard Settings

When you decide to create a new graph, you’ll be presented with an option to pick the metrics that you’d like to monitor. Let’s assume we are interested in monitoring temporary objects: tables, files and tables on disk. We just need to pick all three metrics we want to follow and add them to our new graph.
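
Under the hood, these three metrics presumably correspond to the standard MySQL status counters Created_tmp_tables, Created_tmp_files and Created_tmp_disk_tables (our assumption, based on standard MySQL naming), which you can also inspect directly on any node:

mysql> SHOW GLOBAL STATUS LIKE 'Created_tmp%';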

New Board 1

Next, pick a name for the new graph and choose a scale. Most of the time you want the scale to be linear, but in some rare cases, like when you mix metrics containing large and small values, you may want to use a logarithmic scale instead.

New Board 2

Finally, you can choose whether your template should be presented as the default graph. If you tick this option, this is the graph you will see by default when you enter the “Overview” tab.

Once we save the new graph, you can enjoy the result:

New Board 3

Node Overview tab

In addition to the graphs on our cluster, we can also use this functionality on each of our nodes independently. If we go to the “Nodes” section of the cluster and select one of them, we can see an overview of that node, with operating system metrics:

Node Overview Graphs

As we can see, we have eight graphs with information about CPU usage, Network usage, Disk space, RAM usage, Disk utilization, Disk IOPS, Swap space and Network errors, which we can use as a starting point for troubleshooting on our nodes.


DB Performance tab

When you take a look at a node and then follow through to the DB Performance tab, you’ll be presented with a default set of eight different metrics. You can change them or add new ones using the “Choose Graph” button:

DB Performance Graphs

You’ll be presented with a new window that allows you to configure the layout and the metrics graphed.

DB Performance Graphs Settings

Here you can pick the layout (two or three columns of graphs) and the number of graphs, up to 20. Then you may want to modify which metrics are plotted: use the drop-down dialog boxes to pick whatever metric you’d like to add. Once you are ready, save the graphs and enjoy your new metrics.

We can also use the Operational Reports feature of ClusterControl to obtain the graphs and the report of our cluster and nodes in an HTML report, which can be accessed through the ClusterControl UI or scheduled to be sent by email periodically.

These graphs help us to have a complete picture of the state and behavior of our databases.

by Sebastian Insausti at September 18, 2018 01:26 PM

Peter Zaitsev

Monitoring Processes with Percona Monitoring and Management

Memory utilization during compaction process

A few months ago I wrote a blog post on How to Capture Per Process Metrics in PMM. Since that time, Nick Cabatoff has made a lot of improvements to Process Exporter and I’ve improved the Grafana Dashboard to match.

I will not go through the installation instructions; they are well covered in the original blog post. This post covers features available in release 0.4.0. Here are a few new features you might find of interest:

Used Memory

Memory usage in Linux is complicated. You can look at resident memory, which shows how much space is used in RAM. However, if you have a substantial part of a process swapped out because of memory pressure, you would not see it. You can also look at virtual memory, but it will include a lot of address space that was allocated and never mapped either to RAM or to swap space. Especially for processes written in Go, the difference can be extreme. Let’s look at the process exporter itself: it uses 20MB of resident memory but over 2GB of virtual memory.
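
You can reproduce this comparison for any process with plain ps; for example, assuming the exporter binary is named process-exporter (RSS and VSZ are reported in kilobytes):

$ ps -o pid,rss,vsz,comm -C process-exporter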

top processes by resident memory

top processes by virtual memory

Meet the Used Memory dashboard, which shows the sum of resident memory used by the process and swap space used:

used memory dashboard

There is a dashboard to see processes by swap space used as well, so you can see if some processes that you expect to be resident are swapped out.

Processes by Disk IO

processes by disk io

Processes by Disk IO is another graph which I often find very helpful. It is most useful for catching the unusual suspects, when the process causing the IO is not immediately obvious.

Context Switches

Context switches, as shown by VMSTAT, are often seen as an indication of contention. With contention stats per process you can see which processes are experiencing those context switches.

top processes by voluntary context switches

Note: while a large number of context switches can be a cause of high contention, some applications and workloads are simply designed that way. You are better off looking at the change in the number of context switches, rather than at the raw number.
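
If you want to check the raw per-process counters these graphs are built from, the kernel exposes them in /proc; for example, for a mysqld process (this prints the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters; sample them twice and take the difference to get a rate, in line with the advice above):

$ grep ctxt /proc/$(pidof mysqld)/status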

CPU and Disk IO Saturation

As Brendan Gregg tells us, utilization and saturation are not the same. While CPU usage and Disk IO usage graphs show us resource utilization by different processes, they do not show saturation.

top running processes graph

For example, if you have four CPU cores then you can’t get more than four CPU cores used by any process, whether there are four or four hundred concurrent threads trying to run.

While rather volatile, as gauge metrics tend to be, top running processes and top processes waiting on IO are good metrics for understanding which processes are prone to saturation.

These graphs roughly provide a per-process breakdown of the “r” and “b” VMSTAT columns.
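
As a system-wide sanity check against these per-process graphs, you can run vmstat itself: its “r” column counts runnable processes (CPU saturation), while “b” counts processes in uninterruptible sleep, typically blocked on IO:

$ vmstat 1 5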

Kernel Waits

Finally, you can see which kernel function (WCHAN) the process is sleeping on, which can be very helpful for assessing processes which are not using a lot of CPU, but are not making much progress either.
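
Outside of the dashboard, a one-shot view of the same information is available from ps; for example, for the sysbench processes discussed below:

$ ps -o pid,state,wchan:32,comm -C sysbench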

I find this graph most useful if you pick the single process in the dashboard picker:

kernel waits for sysbench

In this graph we can see that sysbench has most threads sleeping in unix_stream_read_generic, which corresponds to reading the response from MySQL from a UNIX socket – exactly what you would expect!

Summary

If you ever need to understand what different processes are doing in your system, then Nick’s Process Exporter is a fantastic tool to have. It takes just a few minutes to get it added into your PMM installation.

If you enjoyed this post…

You might also like my pre-recorded webinar MySQL troubleshooting and performance optimization with PMM.

The post Monitoring Processes with Percona Monitoring and Management appeared first on Percona Database Performance Blog.

by Peter Zaitsev at September 18, 2018 11:39 AM

Upcoming Webinar Wed 9/19: Percona Server for MongoDB vs MongoDB Enterprise

MongoDB Enterprise Advanced

Please join Percona’s Senior Support Engineer, Adamo Tonete, as he presents Percona Server for MongoDB vs MongoDB Enterprise on Wednesday, September 19th, 2018, at 12:30 PM PDT (UTC-7) / 3:30 PM EDT (UTC-4).

 

In this webinar we will evaluate MongoDB Community, MongoDB Enterprise Advanced, and Percona Server for MongoDB side-by-side to better inform the decision-making process. Percona Server for MongoDB features enterprise-grade functionality and works with tools like Percona Monitoring and Management to monitor MongoDB. What’s more, it is 100% free and open source. It is also worth mentioning that MongoDB Enterprise Advanced comes with encryption and LDAP authorization. Accordingly, we will walk through those features and much more.

Register for this webinar to learn more about Percona Server for MongoDB vs MongoDB Enterprise.

The post Upcoming Webinar Wed 9/19: Percona Server for MongoDB vs MongoDB Enterprise appeared first on Percona Database Performance Blog.

by Adamo Tonete at September 18, 2018 12:08 AM

September 17, 2018

Peter Zaitsev

Percona Server for MongoDB 3.6.7-1.5 Is Now Available

Percona Server for MongoDB

Percona announces the release of Percona Server for MongoDB 3.6.7-1.5 on September 17, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.6 Community Edition. It supports MongoDB 3.6 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.6.7 and does not include any additional changes.

The Percona Server for MongoDB 3.6.7-1.5 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.6.7-1.5 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at September 17, 2018 06:28 PM

Jean-Jerome Schmidt

Become a ClusterControl DBA: Performance and Health Monitoring

In the previous two blog posts we covered both deploying the four types of clustering/replication (MySQL/Galera, MySQL Replication, MongoDB & PostgreSQL) and managing/monitoring your existing databases and clusters. So, after reading those first two blog posts, you were able to add your 20 existing replication setups to ClusterControl, expand them, and additionally deploy two new Galera clusters while doing a ton of other things. Or maybe you deployed MongoDB and/or PostgreSQL systems. So now, how do you keep them healthy?

That’s exactly what this blog post is about: how to leverage ClusterControl performance monitoring and advisors functionality to keep your MySQL, MongoDB and/or PostgreSQL databases and clusters healthy. So how is this done in ClusterControl?

Database Cluster List

The most important information can already be found in the cluster list: as long as there are no alarms and no hosts are shown to be down, everything is functioning fine. An alarm is raised if a certain condition is met, e.g. a host is swapping, and brings the issue you should investigate to your attention. That means alarms are raised not only during an outage, but also to allow you to proactively manage your databases.

Suppose you log into ClusterControl and see a cluster listing like this; you will definitely have something to investigate: one node is down in the Galera cluster, for example, and every cluster has various alarms:

Once you click on one of the alarms, you will go to a detailed page on all alarms of the cluster. The alarm details will explain the issue and in most cases also advise the action to resolve the issue.

You can set up your own alarms by creating custom expressions, but that has been deprecated in favor of our new Developer Studio, which allows you to write custom JavaScript and execute it as Advisors. We will get back to this topic later in this post.

Cluster Overview - Dashboards

When opening up the cluster overview, we can immediately see the most important performance metrics for the cluster in the tabs. This overview may differ per cluster type as, for instance, Galera has different performance metrics to watch than traditional MySQL, PostgreSQL or MongoDB.

Both the default overview and the pre-selected tabs are customizable. By clicking on Overview -> Dash Settings you are given a dialogue that allows you to define the dashboard:

By pressing the plus sign you can add and define your own metrics to graph on the dashboard. In our case we will define a new dashboard featuring the Galera-specific send and receive queue averages:

This new dashboard should give us good insight into the average queue length of our Galera cluster.

Once you have pressed save, the new dashboard will become available for this cluster:

Similarly, you can do this for PostgreSQL as well; for example, we can monitor the shared blocks hit versus blocks read:

So as you can see, it is relatively easy to customize your own (default) dashboard.

Cluster Overview - Query Monitor

The Query Monitor tab is available for both MySQL and PostgreSQL based setups and consists of three dashboards: Top Queries, Running Queries and Query Outliers.

In the Running Queries dashboard, you will find all queries that are currently running. This is basically the equivalent of the SHOW FULL PROCESSLIST statement in MySQL.

Top Queries and Query Outliers both rely on the input of the slow query log or Performance Schema. Using Performance Schema is always recommended, and it will be used automatically if enabled. Otherwise, ClusterControl will use the MySQL slow query log to capture the running queries. To prevent ClusterControl from being too intrusive and the slow query log from growing too large, ClusterControl samples the slow query log by turning it on and off. By default, this loop captures for 1 second and the long_query_time is set to 0.5 seconds. If you wish to change these settings for your cluster, you can do so via Settings -> Query Monitor.

Top Queries will, as the name says, show the top queries that were sampled. You can sort them on various columns: for instance the frequency, average execution time, total execution time or standard deviation time:

You can get more details about a query by selecting it; this will present the query execution plan (if available) and optimization hints/advisories. Query Outliers is similar to Top Queries, but additionally allows you to filter the queries per host and compare them over time.


Cluster Overview - Operations

Similar to the PostgreSQL and MySQL systems, the MongoDB clusters have an Operations overview, the counterpart of MySQL’s Running Queries. This overview is similar to issuing the db.currentOp() command within MongoDB.

Cluster Overview - Performance

MySQL/Galera

The performance tab is probably the best place to find the overall performance and health of your clusters. For MySQL and Galera it consists of an Overview page, the Advisors, status/variables overviews, the Schema Analyzer and the Transaction log.

The Overview page will give you a graph overview of the most important metrics in your cluster. This is, obviously, different per cluster type. Eight metrics have been set by default, but you can easily set your own - up to 20 graphs if needed:

The Advisors are one of the key features of ClusterControl: scripted checks that can be run on demand. The advisors can evaluate almost any fact known about the host and/or cluster, give their opinion on the health of the host and/or cluster, and can even give advice on how to resolve issues or improve your hosts!

The best part is yet to come: you can create your own checks in the Developer Studio (ClusterControl -> Manage -> Developer Studio), run them on a regular interval and use them again in the Advisors section. We blogged about this new feature earlier this year.

We will skip the status/variables overview of MySQL and Galera, as it is useful for reference but not for this blog post: it is enough to know that it is there.

Now suppose your database is growing and you want to know how fast it grew in the past week. You can keep track of the growth of both data and index sizes right from within ClusterControl:

And next to the total growth on disk it can also report back the top 25 largest schemas.

Another important feature is the Schema Analyzer within ClusterControl:

ClusterControl will analyze your schemas and look for redundant indexes, MyISAM tables and tables without a primary key. Of course, it is entirely up to you whether to keep a table without a primary key (some application might have created it this way), but at least it is great to get the advice here for free. The Schema Analyzer even recommends the necessary ALTER statement to fix the problem, for example as sketched below.
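
For instance, for a table lacking a primary key or carrying a redundant index, the suggested fixes are typically along these lines (the table, column and index names here are hypothetical):

ALTER TABLE orders ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
ALTER TABLE orders DROP INDEX idx_customer_id_redundant;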

PostgreSQL

For PostgreSQL the Advisors, DB Status and DB Variables can be found here:

MongoDB

For MongoDB the Mongo Stats and performance overview can be found under the Performance tab. The Mongo Stats is an overview of the output of mongostat and the Performance overview gives a good graphical overview of the MongoDB opcounters:

Final Thoughts

We showed you how to keep your eyeballs on the most important monitoring and health checking features of ClusterControl. Obviously, this is only the beginning of the journey, as we will soon start another blog series about the Developer Studio capabilities and how you can make the most of your own checks. Also keep in mind that our support for MongoDB and PostgreSQL is not as extensive as our MySQL toolset, but we are continuously improving on this.

You may ask yourself why we have skipped over the performance monitoring and health checks of HAProxy, ProxySQL and MaxScale. We did that deliberately, as the blog series has covered only deployments of clusters up till now, and not the deployment of HA components. So that’s the subject we'll cover next time.

by ashraf at September 17, 2018 12:14 PM

Peter Zaitsev

Using the keyring_vault Plugin with Percona Server for MySQL 5.7

keyring_vault store database encryption keys

This is the first of a two-part series on using the keyring_vault plugin with Percona Server for MySQL 5.7. The second part will walk you through how to use Percona XtraBackup to back up from this instance and restore it to another server, setting it up as a slave with the keyring_vault plugin.

What is the keyring_vault plugin?

keyring_vault is a plugin that allows the database to interface with a HashiCorp Vault server to store and secure encryption keys. The Vault server then acts as a centralized encryption key management solution, which is critical for security and for compliance with various security standards.

Configuring Vault

Create SSL certificates to be used by Vault. You can use the sample ssl.conf template below to generate the necessary files.

[root@vault1 ~]# cat /etc/sslkeys/ssl.conf
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no
[req_distinguished_name]
C = US
ST = NC
L =  R
O = Percona
CN = *
[v3_req]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer
basicConstraints = CA:TRUE
subjectAltName = @alt_names
[alt_names]
IP = 192.168.0.114

Then run the two commands below to generate the cert and key files and the certificate chain:

$ openssl req -config ssl.conf -x509 -days 365 -batch -nodes -newkey rsa:2048 -keyout vault.key -out vault.crt
$ cat vault.key vault.crt > vault.pem

Once the SSL certificates are created, start Vault with the sample configuration below. Take note that you should follow the suggested best practices when deploying Vault in production; this example is just enough to get us by with a simple working setup.

[root@vault1 ~]# cat /etc/vault.hcl
listener "tcp" {
address = "192.168.0.114:8200"
tls_cert_file="/etc/sslkeys/vault.crt"
tls_key_file="/etc/sslkeys/vault.key"
}
storage "file" {
path = "/var/lib/vault"
}

Assuming Vault started up fine and you are able to unseal it, the next step is to create the policy file. For more details on initializing and unsealing Vault, please read the manual here.
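
For completeness, here is a minimal sketch of that initialize-and-unseal step (the exact output, number of key shares and unseal threshold depend on your Vault version and init options):

[root@vault1 ~]# export VAULT_ADDR=https://192.168.0.114:8200
[root@vault1 ~]# export VAULT_CACERT=/etc/sslkeys/vault.pem
[root@vault1 ~]# vault operator init     # prints the unseal keys and the initial root token
[root@vault1 ~]# vault operator unseal   # repeat until the unseal threshold is reached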

[root@vault1 ~]# cat /etc/vault/policy/dc1.hcl
path "secret/*" {
capabilities = ["list"]
}
path "secret/dc1/*" {
capabilities = ["create", "read", "delete", "update", "list"]
}

Create a Vault policy named dc1-secrets using the dc1.hcl file like this:

[root@vault1 ~]# vault policy write dc1-secrets /etc/vault/policy/dc1.hcl -ca-cert=/etc/sslkeys/vault.pem
Success! Uploaded policy: dc1-secrets

Next, create a token associated with the newly created policy:

[root@vault1 ~]# vault token create -policy=dc1-secrets -ca-cert=/etc/sslkeys/vault.pem > dc1-token
[root@vault1 ~]# cat dc1-token
Key                  Value
---                  -----
token                be515093-b1a8-c799-b237-8e04ea90ad7a
token_accessor       4c1ba5c5-3fed-e9bb-d230-5bf1392e2d7e
token_duration       8760h
token_renewable      true
token_policies       ["dc1-secrets" "default"]
identity_policies    []
policies             ["dc1-secrets" "default"]

Setting up MySQL

The following instructions should work for Percona Server for MySQL 5.7.20-18 and later versions.

Configure my.cnf with the following variables:

early-plugin-load="keyring_vault=keyring_vault.so"
loose-keyring_vault_config="/var/lib/mysql-keyring/keyring_vault.conf"
encrypt_binlog=ON
innodb_encrypt_online_alter_logs=ON
innodb_encrypt_tables=ON
innodb_temp_tablespace_encrypt=ON
master_verify_checksum=ON
binlog_checksum=CRC32
log_bin=mysqld-bin
binlog_format=ROW
server-id=1
log-slave-updates

Create the keyring_vault.conf file in the path above with the following contents:

[root@mysql1 ~]# cat /var/lib/mysql-keyring/keyring_vault.conf
vault_url = https://192.168.0.114:8200
secret_mount_point = secret/dc1/master
token = be515093-b1a8-c799-b237-8e04ea90ad7a
vault_ca = /etc/vault_ca/vault.pem

Here we are using the vault.pem file generated by combining the vault.crt and vault.key files. Observe that our secret_mount_point is secret/dc1/master. We want to make sure that this mount point is unique across all servers; this is in fact advised in the manual here.

Ensure that the CA certificate is owned by mysql user:

[root@mysql1 ~]# ls -la /etc/vault_ca/
total 24
drwxr-xr-x  2 mysql mysql   41 Jul 14 11:39 .
drwxr-xr-x 63 root  root  4096 Jul 14 13:17 ..
-rw-------  1 mysql mysql 1139 Jul 14 11:39 vault.pem

Initialize the MySQL data directory on the Master:

[root@mysql1 ~]# mysqld --initialize-insecure --datadir=/var/lib/mysql --user=mysql

For production systems we do not recommend using the --initialize-insecure option; it is used here just to skip additional steps in this tutorial.

Finally, start the mysqld instance and then test the setup by creating an encrypted table.

[root@mysql1 ~]# systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; disabled; vendor preset: disabled)
Active: active (running) since Sat 2018-07-14 23:53:16 UTC; 2s ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Process: 1401 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
Process: 1383 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
Main PID: 1403 (mysqld)
CGroup: /system.slice/mysqld.service
└─1403 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
Jul 14 23:53:16 mysql1 systemd[1]: Starting MySQL Server...
Jul 14 23:53:16 mysql1 systemd[1]: Started MySQL Server.

At this point you should have a Percona Server for MySQL instance with tablespace encryption using Vault.
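
As a quick verification, a minimal sketch like the following should work (the database and table names are our own; encrypted InnoDB tables in 5.7 expose ENCRYPTION="Y" in their create options):

mysql> CREATE DATABASE enctest;
mysql> CREATE TABLE enctest.t1 (id INT PRIMARY KEY) ENCRYPTION='Y';
mysql> SELECT TABLE_SCHEMA, TABLE_NAME, CREATE_OPTIONS FROM INFORMATION_SCHEMA.TABLES WHERE CREATE_OPTIONS LIKE '%ENCRYPTION%';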

Researching database security?

You might also enjoy the pre-recorded webinar on securing your database servers from external attacks, presented by my colleague Colin Charles.

The post Using the keyring_vault Plugin with Percona Server for MySQL 5.7 appeared first on Percona Database Performance Blog.

by Jericho Rivera at September 17, 2018 12:02 PM

September 14, 2018

Peter Zaitsev

This Week in Data With Colin Charles 52: London MySQL Meetup

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

This week wraps up with the London MySQL meetup, where there were four presentations for an intimate yet diverse crowd. We saw representation from Oracle MySQL, MariaDB Corporation, Pythian, and Percona. Long-time organizer Ivan Zoratti has also handed off the baton to Maria Luisa Raviol, and going forward she will ensure meetups happen at least once per quarter. It was a real pleasure to see MySQL Community Manager Dave Stokes at the event, too.

A new book to read: AWS System Administration: Best Practices for Sysadmins in the Amazon Cloud. There is coverage of RDS, from the standpoint of an example application stack as well as backups.

A most interesting tweet from the Chief Marketing Officer of MongoDB, Meagen Eisenberg, of an ad on a billboard: friends don’t let friends use relational databases.

Releases

Link List

Industry Updates

  • Catalyst IT Australia acquires Open Query – Arjen Lentz is a long-time MySQL community member, and he ran his company for the last 11 years pre-acquisition. Congratulations!
  • Elastic files for IPO – the financials are solid: ”Our revenue was $159.9 million and $88.2 million in fiscal 2018 and 2017, respectively, representing year-over-year growth of 81% for fiscal 2018.” The filing is worth reading.
  • New Cloud Unicorn: PagerDuty Scores $1.3 Billion Valuation In $90 Million Round – total raised now $173 million, valuing the company at $1.3 billion. Some of their competitors have been purchased recently: VictorOps by Splunk for $120 million and OpsGenie by Atlassian for $295 million. There are not many independents left in this space beyond PagerDuty and xMatters (who recently picked up Series D financing, for a total raised of $96.5 million).
  • PingCap raises $50m in Series C funding. They are behind TiDB and TiKV. Raised a total of $72 million, and this is a significant increase from the $15m Series B raise in June 2017!
  • Tague Griffith has departed Redis Labs where he was Head of Developer Advocacy and is now at Google.
  • Manyi Lu, who has been in the MySQL world for a very long time, leading many of the changes in the MySQL optimizer, and who was most recently Director of Software Development at Oracle, has departed to be a Senior Director at Alicloud.

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data With Colin Charles 52: London MySQL Meetup appeared first on Percona Database Performance Blog.

by Colin Charles at September 14, 2018 10:17 PM

PostgreSQL Webinar Wed Oct 10th – Enterprise-Grade PostgreSQL: Built on Open Source Tools

Enterprise PostgreSQL built with open source tools

Please join Percona’s PostgreSQL Support Technical Lead, Avinash Vallarapu; Senior Support Engineer, Fernando Laudares; and Senior Support Engineer, Jobin Augustine, on Wednesday, October 10th, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4), as they demonstrate an enterprise-grade PostgreSQL® environment built using a combination of open source tools and extensions.

“We built our application on top of PostgreSQL. It works great but only now that we took it to the market and it became a hit we realize how much it relies on the database. How can we “harden” PostgreSQL? How can we make the solution we built around PostgreSQL enterprise-grade?”

“I migrated from a proprietary database software to PostgreSQL. I am curious to know whether I can get the same features I used to have in the proprietary database software.”

You’ll find the answer to these questions and more in a series of blog posts we will be publishing on this topic, which will be followed by a live demo we planned for our webinar on October 10th, 2018.

The market coined the term “enterprise grade” or “enterprise ready” to differentiate products and service offerings for licensed database software. For example, there may be standard database software or an entry-level package that delivers the core functionality and basic features. Likewise, there may be an enterprise version, a more advanced package which goes beyond the essentials to include features and tools indispensable for running critical solutions in production. With such a differentiation found in commercial software, we may wonder whether a solution built on top of an open source database like PostgreSQL can satisfy all the enterprise requirements.

It starts with building a secured PostgreSQL environment, tuning the database for the production workload, building a high availability strategy that avoids single-point-of-failures, scaling PostgreSQL using connection poolers to avoid excessive usage of server resources, and finally load balancing the reads between master and all the available standby servers aka replicas to effectively use the computing power of all the database servers.

The operational aspect of maintaining an enterprise grade PostgreSQL database also includes the methods to configure a backup strategy that helps us achieve point-in-time-recovery as needed, detailed logging and monitoring PostgreSQL along with a real-time analysis of the database performance and finally maintaining the database health with optimal performance, such as making sure vacuuming is working as it should and at the right times.

“Can we build such an enterprise grade solution that satisfies all the above requirements around PostgreSQL with open source softwares only?”

Yes, we can. During the 20+ years PostgreSQL has been around, the open source community has created all sorts of complementary extensions and tools that can be used to build an enterprise-grade solution with Postgres.

We’ll be following this post with a series of posts covering each piece of such a solution, culminating with a webinar that will take place on October 10th. During the webinar we’ll showcase the full project. Here’s the list of topics we are going to consider while building our enterprise-grade PostgreSQL server.

  1. Securing your PostgreSQL database cluster
  2. High Availability
  3. Preparing a Backup strategy and the tools available to achieve it
  4. Scaling PostgreSQL using connection poolers and load balancers
  5. Tools/extensions available for your daily DBA life and detailed logging in PostgreSQL.
  6. Monitoring your PostgreSQL and real-time analysis.

Join us to see it in action!

The post PostgreSQL Webinar Wed Oct 10th – Enterprise-Grade PostgreSQL: Built on Open Source Tools appeared first on Percona Database Performance Blog.

by Avinash Vallarapu at September 14, 2018 05:23 PM

Encryption of the InnoDB System Tablespace and Parallel Doublewrite Buffer

encryption of InnoDB tablespace parallel doublewrite buffer

In my last post I compared the data at-rest encryption features available for MySQL and MariaDB. As noted at the time, some of the features available for Percona Server for MySQL were in development, and the latest version (5.7.23) sees two of them released as ALPHA quality.

Encrypting the InnoDB system tablespace

The first of the new features is InnoDB system tablespace encryption via innodb_sys_tablespace_encrypt, which provides encryption of the following data:

  • the change buffer, which caches changes to secondary index pages resulting from DML operations on pages that are not in the InnoDB buffer pool
  • the undo logs, if they have not been configured to be stored in separate undo tablespaces
  • data from any tables that exist in the main tablespace, which occurs when innodb_file_per_table is disabled

There are some related changes on the horizon that would allow this to be applied to an existing instance. However, for now this is only available for new instances, as it can only be applied during bootstrap. This means that using it with an existing cluster would require a logical restore of your data. I should restate that this is an ALPHA feature and not production-ready.
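
As a sketch, bootstrapping a fresh instance with an encrypted system tablespace would look roughly like this, assuming a keyring plugin is already configured (keyring_file.so is used here for brevity; remember this is ALPHA and for experimentation only):

$ mysqld --initialize-insecure --user=mysql --datadir=/var/lib/mysql \
    --early-plugin-load=keyring_file.so \
    --innodb-sys-tablespace-encrypt=ON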

There are some extra points to note about this new variable:

  • an instance with an encrypted tablespace cannot be downgraded to use a version prior to 5.7.23, due to the inability to read the tablespace
  • as noted, it is not currently possible to convert the tablespace between encrypted and unencrypted states, or vice versa
  • the key for the system tablespace can be manually rotated using ALTER INSTANCE ROTATE INNODB MASTER KEY as per any other tablespace

Encrypting the parallel doublewrite buffer

To complement the encryption of the system tablespace, it is also possible to encrypt the parallel doublewrite buffer using innodb_parallel_dblwr_encrypt, a feature unique to Percona Server for MySQL. This means that any data for an encrypted tablespace is also written only in encrypted form in the parallel doublewrite buffer; unencrypted tablespace data remains in plaintext. Unlike innodb_sys_tablespace_encrypt, innodb_parallel_dblwr_encrypt can be set dynamically on an existing instance.
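
Since the variable is dynamic, enabling it on a running server is a one-liner (add it to my.cnf as well so that it persists across restarts):

mysql> SET GLOBAL innodb_parallel_dblwr_encrypt = ON;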

There are more encryption features planned–or already in development–for Percona Server for MySQL so watch this space!

The post Encryption of the InnoDB System Tablespace and Parallel Doublewrite Buffer appeared first on Percona Database Performance Blog.

by Ceri Williams at September 14, 2018 11:29 AM

September 13, 2018

MariaDB AB

Call for Papers – 2019 Annual MariaDB User Conference


The annual MariaDB user conference will be taking place February 25-27, 2019, at New York City's Conrad Hotel. For the first time, the event will span three days, which means more room for new topics. We want to hear your ideas! 

Submitting a proposal is easy. Come up with your presentation title and abstract, then submit it here by October 31, 2018. 

Topics and Tracks

Are you an innovator? Have you been in the trenches with MariaDB? If so, we invite you to share your knowledge and experience with the global MariaDB community. You’re just who we’re looking for!

Submit your best ideas involving: 

  • How you're using MariaDB in the real world

  • Challenges you've faced and lessons you've learned

  • Best practices and expert strategies

And let us know which speaking track you want to contribute to:

Real World: Adopting and embracing MariaDB in the enterprise to solve business challenges and drive innovation

Operations: Gaining operational agility with cloud/container infrastructure, automated provisioning and transparent routing, sharding and failover

Development: Building modern applications with advanced SQL (e.g., temporal queries) and flexible schemas (e.g., hybrid relational/JSON data models)

Analytics: Driving faster time-to-insight with ad-hoc, interactive analytics on near real-time data, and scaling to support 100s of billions of rows

Bonus – Free 3-Day Conference Pass for Speakers

In addition to a presentation in front of MariaDB's global community, if your speaking proposal is accepted (you'll be notified by mid-December), you'll receive a free three-day pass to the show, including access to two deep-dive workshops. 

Submit your speaking proposal by October 31, 2018.

We look forward to seeing you in New York February 25-27, 2019!

 



by Shane Johnson at September 13, 2018 08:42 PM

Peter Zaitsev

Percona Toolkit 3.0.12 Is Now Available

percona toolkit

Percona announces the release of Percona Toolkit 3.0.12 on September 13, 2018.

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL®, MariaDB®, Percona Server for MongoDB and MongoDB.

Percona Toolkit, like all Percona software, is free and open source. You can download packages from the website or install from official repositories.

This release includes the following changes:

Fixed bugs:

  • PT-1611: pt-archiver failed to output UTF-8 characters.
  • PT-1603: pt-table-sync incorrectly calculated chunk boundaries in case of unsorted ENUM fields in indexes.
  • PT-1574: pt-online-schema-change failed on tables with a nullable unique key and a row with NULL values.
  • PT-1572: ENUM fields usage in keys was improved, resulting in higher speed for expressions with sorted ENUM items.
  • PT-1422: pt-mysql-summary could hang when NULL values appear in the processlist Time column.

Documentation change:

  • PT-1321: The required MySQL privileges are now detailed in the pt-online-schema-change documentation.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

The post Percona Toolkit 3.0.12 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at September 13, 2018 06:29 PM

Analyzing Amazon Aurora Slow Logs with pt-query-digest

Amazon Aurora MySQL slow query logs with pt-query-digest slow

In this blog post we shall discuss how you can analyze slow query logs from Amazon Aurora for MySQL (referred to as Amazon Aurora in the rest of this blog). The tools and techniques explained here also apply to the other MySQL-compatible services available under Amazon Aurora. However, we’ll focus specifically on analyzing slow logs from Amazon Aurora version 2 (MySQL 5.7 compatible) using pt-query-digest. We believe there is a bug in Aurora where it logs really big numbers for query execution and lock times for otherwise really fast queries.

So, the main steps we need are:

  1. Enable slow query logging on your Amazon Aurora DB parameter group, apply the change when appropriate.
  2. Download the slow log(s) that match the time period you are interested in investigating, and optionally concatenate them.
  3. Run pt-query-digest on the downloaded logs and check the results.

Enable slow query logging

For our testing we decided to capture all the SELECT queries that were hitting our Amazon Aurora instance, mainly because we had a sysbench OLTP read-only workload, which wouldn’t really generate a lot of slow queries. An easy way to capture everything is to enable slow query logging and set long_query_time to 0. To achieve that, we created a new DB parameter group and applied it to our test Aurora instance with the following three parameters set as below:

slow_query_log=1
long_query_time=0
min_examined_row_limit=0

Once you have the above configuration applied to Amazon RDS, you will be able to see slow query logs being created in the Amazon RDS console.

Download the log file

You can download the log file of your choice using either the Amazon RDS console or the following AWS CLI command:

$ aws rds download-db-log-file-portion --db-instance-identifier perconasupport  --starting-token 0 --output text --log-file-name slowquery/mysql-slowquery.log.2018-09-03.09 > mysql-slowquery.log.2018-09-03.09

Depending on the size of the chosen log file, the above command will take some time to complete the download.
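
If the window you are investigating spans several hourly files, a small shell loop over the same command will fetch and concatenate them (the hours listed are just an example):

$ for h in 08 09 10; do
    aws rds download-db-log-file-portion --db-instance-identifier perconasupport \
      --starting-token 0 --output text \
      --log-file-name slowquery/mysql-slowquery.log.2018-09-03.$h
  done > mysql-slowquery.log.2018-09-03.08-10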

Run pt-query-digest on the log file

Once the file has been downloaded, you can analyze it using the following pt-query-digest command:

$ pt-query-digest --group-by fingerprint --order-by Query_time:sum mysql-slowquery.log.2018-09-03.09

On our Aurora test slow log file, the initial results didn’t look right, so we had to apply a workaround. Here is the header of the initial results from pt-query-digest:

# 456.2s user time, 2.5s system time, 43.80M rss, 141.48M vsz
# Current date: Tue Sep 4 15:54:21 2018
# Hostname: aahmed-GL503VD
# Files: mysql-slowquery.log.2018-09-03.09
# Overall: 5.13M total, 60 unique, 1.43k QPS, 507.43Gx concurrency _______
# Time range: 2018-09-03T08:00:04 to 2018-09-03T09:00:03
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 1826227663297288s 1us 18446744073710s 355917782s 761us 80127878922s 93us
# Lock time 1401952549601936s 0 18446744073710s 273229812s 44us 70205933577s 23us
# Rows sent 94.71M 0 100 19.35 97.36 37.62 0.99
# Rows examine 216.26M 0 300 44.19 299.03 84.74 0.99
# Query size 196.24M 5 1.24k 40.08 72.65 18.90 36.69
# Profile
# Rank Query ID Response time Calls R/Call
# ==== ====================== =========================== ======= ========
# 1 0xE81D0B3DB4FB31BC5... 1346612317380813.0000 73.7% 3194111 421592210.5966 18... SELECT sbtest?
# 2 0x9934EF6887CC7A638... 147573952589685.0625 8.1% 319381 462062403.8051 18... SELECT sbtest?
# 3 0x8D589AFA4DFAEEED8... 110680464442264.1094 6.1% 319411 346514254.1812 18... BEGIN
# 4 0xFF7C69F51BBD3A736... 92233720368565.1875 5.1% 319388 288782673.0139 18... SELECT sbtest?
# 5 0xFFFCA4D67EA0A7888... 73786976294861.9844 4.0% 321238 229695665.8143 18... COMMIT
# MISC 0xMISC 55340232221335.8281 3.0% 657509 84166501.4796 0.0 <43 ITEMS>

What’s wrong with the above results is that the total query Exec time and Lock time are very large numbers. Digging deeper into the logs revealed a problem with the slow logs themselves, which had very large numbers for Query_time and Lock_time for some queries. For instance, in our case, out of the 5.13 million queries in the log file, only 111 had the anomaly. Even so, that was enough to skew the results.

# Time: 2018-09-03T08:41:47.363522Z
--
SELECT c FROM sbtest1 WHERE id=24278;
# Time: 2018-09-03T08:41:49.363224Z
# User@Host: perconasupport[perconasupport] @ [172.30.2.111] Id: 20869
# Query_time: 18446744073709.550781 Lock_time: 18446744073709.550781 Rows_sent: 1 Rows_examined: 1
SET timestamp=1535964109;
SELECT c FROM sbtest2 WHERE id=989322;
# Time: 2018-09-03T08:41:49.363296Z
--
BEGIN;
# Time: 2018-09-03T08:41:53.362947Z
# User@Host: perconasupport[perconasupport] @ [172.30.2.111] Id: 20873
# Query_time: 18446744073709.550781 Lock_time: 18446744073709.550781 Rows_sent: 1 Rows_examined: 1
SET timestamp=1535964113;
SELECT c FROM sbtest1 WHERE id=246889;
# Time: 2018-09-03T08:41:53.363003Z

Incorrect logging

The above two queries are, in fact, really fast, but for some reason the execution and lock times are wrongly logged in the slow query log. Since the number of such query log records is statistically negligible compared to the total number of queries, we decided to ask pt-query-digest to ignore them using the command line parameter --attribute-value-limit. The default value of this parameter is 0. We decided to increase it to 2^32, making pt-query-digest ignore the large numbers from the slow query log. So, the pt-query-digest command became:

$ pt-query-digest --group-by fingerprint --order-by Query_time:sum --attribute-value-limit=4294967296 mysql-slowquery.log.2018-09-03.09

This caused the 111 queries with the bad log times to be ignored and the results looked good. In our case, the ignored queries were bad variants of queries for which good versions existed. You can tell this because the number of unique queries remained the same as before after the bad variants were ignored. However, this may not always hold true and one should expect to lose some fidelity, especially if you are analyzing a smaller slow log.

# 441s user time, 450ms system time, 38.19M rss, 111.76M vsz
# Current date: Tue Sep 4 16:23:33 2018
# Hostname: aahmed-GL503VD
# Files: mysql-slowquery.log.2018-09-03.09
# Overall: 5.13M total, 60 unique, 1.43k QPS, 0.30x concurrency __________
# Time range: 2018-09-03T08:00:04 to 2018-09-03T09:00:03
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 1096s 1us 198ms 213us 761us 431us 93us
# Lock time 180s 0 103ms 34us 44us 161us 23us
# Rows sent 94.71M 0 100 19.35 97.36 37.62 0.99
# Rows examine 216.26M 0 300 44.19 299.03 84.74 0.99
# Query size 196.24M 5 1.24k 40.08 72.65 18.90 36.69
# Profile
# Rank Query ID Response time Calls R/Call V/M Ite
# ==== =========================== ============== ======= ====== ===== ===
# 1 0xE81D0B3DB4FB31BC558CAE... 400.1469 36.5% 3194111 0.0001 0.00 SELECT sbtest?
# 2 0xF0C5AE75A52E847D737F39... 161.4065 14.7% 319453 0.0005 0.00 SELECT sbtest?
# 3 0xFFFCA4D67EA0A788813031... 155.8740 14.2% 321238 0.0005 0.00 COMMIT
# 4 0x8D589AFA4DFAEEED85FFF5... 107.9827 9.9% 319411 0.0003 0.00 BEGIN
# 5 0x9934EF6887CC7A6384D1DE... 94.1002 8.6% 319381 0.0003 0.00 SELECT sbtest?
# 6 0xFF7C69F51BBD3A736EEB1B... 79.9279 7.3% 319388 0.0003 0.00 SELECT sbtest?
# 7 0xA729E7889F57828D3821AE... 75.3969 6.9% 319398 0.0002 0.00 SELECT sbtest?
# MISC 0xMISC 21.1212 1.9% 18658 0.0011 0.0 <41 ITEMS>
# Query 1: 1.27k QPS, 0.16x concurrency, ID 0xE81D0B3DB4FB31BC558CAEF5F387E929 at byte 358647353
# Scores: V/M = 0.00
# Time range: 2018-09-03T08:00:04 to 2018-09-03T08:42:00
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 62 3194111
# Exec time 36 400s 10us 198ms 125us 332us 300us 80us
# Lock time 74 134s 0 26ms 42us 49us 154us 27us
# Rows sent 3 3.01M 0 1 0.99 0.99 0.11 0.99
# Rows examine 1 3.01M 0 1 0.99 0.99 0.11 0.99
# Query size 57 112.37M 32 38 36.89 36.69 0.53 36.69
# String:
# Databases perconasupport
# Hosts 172.30.2.111
# Users perconasupport
# Query_time distribution
# 1us
# 10us ################################################################
# 100us ##############
# 1ms #
# 10ms #
# 100ms #
# 1s

That number looks familiar

The really big number 18446744073709.550781 seemed to ring a bell. A quick web search revealed that it could be a regression of an old bug in MySQL’s code. The following bugs were found to have the same value being reported for query exec time & query lock time.

  1. https://bugs.mysql.com/bug.php?id=59757
  2. https://bugs.mysql.com/bug.php?id=63524
  3. https://bugs.mysql.com/bug.php?id=35396
Once slow logs were enabled, we used this sysbench command to generate the workload for the Amazon Aurora instance. You might like to try it yourself. Please note that this used sysbench version 1.0.14.

$ sysbench --db-driver=mysql --mysql-user=perconasupport --mysql-host=perconasupport-1234567.cgmobiazycdv.eu-west-1.rds.amazonaws.com --mysql-password=XXXXXXX  --mysql-db=perconasupport --range_size=100 --table_size=1000000 --tables=2 --threads=6 --events=0 --time=600 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua run

If you are an Amazon Aurora user, have you found any problems analyzing slow query logs? You are welcome to use the comments section, below, to let me know.

Percona Toolkit

pt-query-digest is part of Percona Toolkit, a collection of advanced open source command-line tools, developed and used by the Percona technical staff. Percona Toolkit is open source and free to download and use.

The post Analyzing Amazon Aurora Slow Logs with pt-query-digest appeared first on Percona Database Performance Blog.

by Abbas Ahmed at September 13, 2018 01:17 PM

MariaDB Foundation

Improved MariaDB performance using Shannon Systems PCIe and atomic writes

We recently had a chance to test how the Shannon Direct-IO PCIe Flash G3i devices improves the performance of MariaDB. Anybody who cares about I/O performance has noticed that PCIe drives have become readily available in recent years and that they are significantly faster than normal SSD drives connected to the traditional SATA port (not to […]

The post Improved MariaDB performance using Shannon Systems PCIe and atomic writes appeared first on MariaDB.org.

by Otto Kekäläinen at September 13, 2018 05:30 AM

September 12, 2018

MariaDB AB

What Does It Take to Achieve Freedom from Oracle? 5 Factors That Made One Company Take the Leap


Financial Network, Inc. (FNI) was looking to evolve its credit strategy and loan-origination services after more than 30 years in business – but the "astronomical" fees they were paying to use Oracle Database were keeping them from it. They needed to move on, and chose MariaDB to power their growth. Let’s take a look at how (and why) they did it.

 

“We were doing business, we were making money, but we didn’t have money to put back into the business – to make the business thrive, to expand our marketing, to go after new customers, as well as to be competitive with other companies that fill the same little niche that FNI does. MariaDB has allowed us to do that.”
—William Wood, Director of Database Architecture, FNI

 

Here are five main factors that sealed FNI's decision to escape Oracle in favor of MariaDB:

1. ROI. As the quote above shows, FNI needed significantly greater ROI from its database solution in order to grow its business. Wood explains, "When we looked at the price to migrate to MariaDB from a licensing and vendor cost standpoint, it didn't compare. Oracle is out of the ballpark."

2. Scalability. Oracle's onerous $47,500-per-processor pricing model had FNI's hands tied. Wood explains, “We couldn’t afford to upgrade hardware because going from a quad-core processor to the latest and greatest that has 96 cores ... can you imagine the cost of that? Moving to MariaDB, the scalability is there.” Wood sums up MariaDB's enterprise open source model and per-server licensing fees this way: "It's simple. It just makes sense. It's a good way to do business."

3. Encryption of data at rest. "That was the clincher for us," says Wood. Getting a similar feature from Oracle required a pricy add-on, whereas advanced enterprise security is built into MariaDB TX, at no extra cost.

4. Truly supportive support. "You look at these big companies like Oracle and [IBM's] Informix. It's astronomical just to get that license, and then once you're licensed, then you're hit every year after for support," laments Wood. "MariaDB beats them out of the water. The support is great. They're responsive in a very timely manner. I can't say enough good about them."

5. Migration assistance. "You can greatly shorten your migration from any other database to MariaDB by leveraging onsite training – training from a developer standpoint, training from a database administrator standpoint – and we did that, and it helped us to be successful," Wood explains. And now migration is even easier: shortly after FNI took its first MariaDB server live in production, MariaDB TX 3.0 debuted, introducing built-in Oracle-compatibility features.

 

Learn more about FNI's journey from Oracle to MariaDB.



by MariaDB Team at September 12, 2018 06:30 PM

Peter Zaitsev

Percona Server for MySQL 5.7.23-23 Is Now Available


Percona announces the release of Percona Server for MySQL 5.7.23-23 on September 12, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.7.23, including all the bug fixes in it. Percona Server for MySQL 5.7.23-23 is now the current GA release in the 5.7 series. All of Percona’s software is open-source and free.

New Features
  • The max_binlog_files variable is deprecated, and the binlog_space_limit variable should be used instead. The behavior of binlog_space_limit is consistent with the variable relay-log-space-limit used for relay logs; both variables have the same semantics. For more information, see #275.
  • Starting with 5.7.23-23, it is possible to encrypt all data in the InnoDB system tablespace and in the parallel double write buffer. This feature is considered ALPHA quality. A new variable innodb_sys_tablespace_encrypt is introduced to encrypt the system tablespace. The encryption of the parallel double write buffer file is controlled by the variable innodb_parallel_dblwr_encrypt. Both variables are OFF by default. For more information, see #3822. A configuration sketch combining the new options follows this list.
  • Changing rocksdb_update_cf_options returns any warnings and errors to the client instead of printing them to the server error log. For more information, see #4258.
  • rocksdb_number_stat_computers and rocksdb_rate_limit_delay_millis variables have been removed. For more information, see #4780.
  • A number of new variables were introduced for MyRocks: rocksdb_rows_filtered to show the number of rows filtered out for TTL in MyRocks tables, rocksdb_bulk_load_allow_sk to allow adding secondary keys using the bulk loading feature, rocksdb_error_on_suboptimal_collation to toggle between a warning and an error when an index is created on a char field and the table has a sub-optimal collation, rocksdb_stats_recalc_rate to specify the number of indexes to recalculate per second, rocksdb_commit_time_batch_for_recovery to toggle writing the commit time write batch into the database, and rocksdb_write_policy to specify when two-phase commit data are actually written into the database.
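
As an illustration, here is a hypothetical my.cnf fragment combining the new binary log and encryption options described above; the variable names come from this release, while the values are placeholders (and the encryption options additionally require a keyring plugin to be configured):

[mysqld]
# Cap the total disk space used by binary logs (replaces the deprecated max_binlog_files)
binlog_space_limit = 50G
# ALPHA quality: encrypt the InnoDB system tablespace and the parallel double write buffer
innodb_sys_tablespace_encrypt = ON
innodb_parallel_dblwr_encrypt = ON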
Bugs Fixed
  • The statement SELECT...ORDER BY produced inconsistent results with the euckr charset or euckr_bin collation. Bug fixed #4513 (upstream #91091).
  • InnoDB statistics could incorrectly report zeros in the slow query log. Bug fixed #3828.
  • With the FIPS mode enabled and performance_schema=off, the instance crashed when running the CREATE VIEW command. Bug fixed #3840.
  • The soft limit of the core file size was set incorrectly starting with PS 5.7.21-20. Bug fixed #4479.
  • The option innodb-optimize-keys could fail when a dumped table has two columns such that the name of one of them contains the other as a prefix and is defined with the AUTO_INCREMENT attribute. Bug fixed #4524.
  • When innodb_temp_tablespace_encrypt was set to ON the CREATE TABLE command could ignore the value of the ENCRYPTION option. Bug fixed #4565.
  • If FLUSH STATUS was run from a different session, a statement could be counted twice in GLOBAL STATUS. Bug fixed #4570 (upstream #91541).
  • In some cases, it was not possible to set the flush_caches variable on systems that use systemd. Bug fixed #3796.
  • A message in the MyRocks log file did not clearly inform whether fast CRC32 was supported. Bug fixed #3988.
  • mysqld could not be started on Ubuntu if the database recovery had taken longer than ten minutes. Bug fixed #4546 (upstream #91423).
  • The ALTER TABLE command was slow when the number of dirty pages was high. Bug fixed #3702.
  • Setting the global variable version_suffix to NULL could lead to a server crash. Bug fixed #4785.
Other Bugs Fixed
  • #4620 “Enable encryption of temporary tablespace from foreground thread”
  • #4727 “intrinsic temp table behaviour shouldn’t depend on innodb_encrypt_tables”
  • #4046 “Ship assert failure: ‘res == 0’ (bulk loader)”
  • #3851 “Percona Ver 5.6.39-83.1 Failing assertion: sym_node->table != NULL”
  • #4533 “audit_log MTR tests should refer to include files without parent directories”
  • #4619 “main.flush_read_lock fails with timeout in wait_condition.inc.”
  • #4561 “Read after free at Binlog_crypt_data::load_latest_binlog_key()”
  • #4587 “ROCKSDB_INCLUDE_RFR macro in wrong file”

Find the release notes for Percona Server for MySQL 5.7.23-23 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.7.23-23 Is Now Available appeared first on Percona Database Performance Blog.

by Borys Belinsky at September 12, 2018 06:01 PM

Jean-Jerome Schmidt

How to Deploy a Production-Ready MySQL or MariaDB Galera Cluster using ClusterControl

Deploying a database cluster is not rocket science - there are many how-to’s on how to do that. But how do you know what you just deployed is production-ready? Manual deployments can also be tedious and repetitive. Depending on the number of nodes in the cluster, the deployment steps may be time-consuming and error-prone. Configuration management tools like Puppet, Chef and Ansible are popular in deploying infrastructure, but for stateful database clusters, you need to perform significant scripting to handle deployment of the whole database HA stack. Moreover, the chosen template/module/cookbook/role has to be meticulously tested before you can trust it as part of your infrastructure automation. Version changes require the scripts to be updated and tested again.

The good news is that ClusterControl automates deployments of the entire stack - and for free as well! We’ve deployed thousands of production clusters, and take a number of precautions to ensure they are production-ready. Different topologies are supported, from master-slave replication to Galera, NDB and InnoDB cluster, with different database proxies on top.

A high availability stack, deployed through ClusterControl, consists of three layers:

  • Database layer (e.g., Galera Cluster)
  • Reverse proxy layer (e.g., HAProxy or ProxySQL)
  • Keepalived layer, which, with use of Virtual IP, ensures high availability of the proxy layer

In this blog, we are going to show you how to deploy a production-grade Galera Cluster complete with load balancers for high availability setup. The complete setup consists of 6 hosts:

  • 1 host - ClusterControl (deployment, monitoring, management server)
  • 3 hosts - MySQL Galera Cluster
  • 2 hosts - Reverse proxies acting as load balancers in front of the cluster.

The following diagram illustrates our end result once deployment is complete:

Prerequisites

ClusterControl must reside on an independent node which is not part of the cluster. Download ClusterControl, and the page will generate a license unique for you and show the steps to install ClusterControl:

$ wget -O install-cc https://severalnines.com/scripts/install-cc
$ chmod +x install-cc
$ ./install-cc # as root or sudo user

Follow the instructions where you will be guided with setting up MySQL server, MySQL root password on the ClusterControl node, cmon password for ClusterControl usage and so on. You should get the following line once the installation has completed:

Determining network interfaces. This may take a couple of minutes. Do NOT press any key.
Public/external IP => http://{public_IP}/clustercontrol
Installation successful. If you want to uninstall ClusterControl then run install-cc --uninstall.

Then, on the ClusterControl server, generate an SSH key which we will use to setup the passwordless SSH later on. You can use any user in the system but it must have the ability to perform super-user operations (sudoer). In this example, we picked the root user:

$ whoami
root
$ ssh-keygen -t rsa

Set up passwordless SSH to all nodes that you would like to monitor/manage via ClusterControl. In this case, we will set this up on all nodes in the stack (including ClusterControl node itself). On ClusterControl node, run the following commands and specify the root password when prompted:

$ ssh-copy-id root@192.168.55.160 # clustercontrol
$ ssh-copy-id root@192.168.55.161 # galera1
$ ssh-copy-id root@192.168.55.162 # galera2
$ ssh-copy-id root@192.168.55.163 # galera3
$ ssh-copy-id root@192.168.55.181 # proxy1
$ ssh-copy-id root@192.168.55.182 # proxy2

You can then verify if it's working by running the following command on ClusterControl node:

$ ssh root@192.168.55.161 "ls /root"

Make sure you are able to see the result of the command above without the need to enter password.

Deploying the Cluster

ClusterControl supports all vendors for Galera Cluster (Codership, Percona and MariaDB). There are some minor differences which may influence your decision for choosing the vendor. If you would like to learn about the differences between them, check out our previous blog post - Galera Cluster Comparison - Codership vs Percona vs MariaDB.

For production deployment, a three-node Galera Cluster is the minimum you should have. You can always scale it out later once the cluster is deployed, manually or via ClusterControl. We’ll open our ClusterControl UI at https://192.168.55.160/clustercontrol and create the first admin user. Then, go to the top menu and click Deploy -> MySQL Galera and you will be presented with the following dialog:

There are two steps; the first one is "General & SSH Settings". Here we need to configure the SSH user that ClusterControl should use to connect to the database nodes, together with the path to the SSH key (as generated in the Prerequisites section) and the SSH port of the database nodes. ClusterControl presumes all database nodes are configured with the same SSH user, key and port. Next, give the cluster a name; in this case we will use "MySQL Galera Cluster 5.7". This value can be changed later on. Then select the options to instruct ClusterControl to install the required software, disable the firewall and disable the security enhancement module of the particular Linux distribution. It's recommended to toggle all of these on to maximize the chances of a successful deployment.

Click Continue and you will be presented with the following dialog:

In the next step, we need to configure the database servers - vendor, version, datadir, port, etc - which are pretty self-explanatory. "Configuration Template" is the template filename under /usr/share/cmon/templates of the ClusterControl node. "Repository" is how ClusterControl should configure the repository on the database node. By default, it will use the vendor repository and install the latest version provided by the repository. However, in some cases, the user might have a pre-existing repository mirrored from the original repository due to security policy restriction. Nevertheless, ClusterControl supports most of them, as described in the user guide, under Repository.

Lastly, add the IP address or hostname (must be a valid FQDN) of the database nodes. You will see a green tick icon on the left of the node, indicating ClusterControl was able to connect to the node via passwordless SSH. You are now good to go. Click Deploy to start the deployment. This may take 15 to 20 minutes to complete. You can monitor the deployment progress under Activity (top menu) -> Jobs -> Create Cluster:

Once the deployment is complete, our architecture can be illustrated as below:

Deploying the Load Balancers

In Galera Cluster, all nodes are equal - each node holds the same role and same dataset. Therefore, there is no failover within the cluster if a node fails. Only the application side requires failover, to skip the non-operational nodes while the cluster is partitioned. For this reason, it's highly recommended to place load balancers on top of a Galera Cluster to:

  • Unify the multiple database endpoints to a single endpoint (load balancer host or virtual IP address as the endpoint).
  • Balance the database connections between the backend database servers.
  • Perform health checks and only forward the database connections to healthy nodes.
  • Redirect/rewrite/block offending (badly written) queries before they hit the database servers.

There are three main choices of reverse proxies for Galera Cluster - HAProxy, MariaDB MaxScale or ProxySQL - all can be installed and configured automatically by ClusterControl. In this deployment, we picked ProxySQL because it checks all the above plus it understands the MySQL protocol of the backend servers.

In this architecture, we want to use two ProxySQL servers to eliminate any single point of failure (SPOF) in the database tier, tied together using a floating virtual IP address. We’ll explain this in the next section. One node will act as the active proxy and the other one as a hot-standby. Whichever node holds the virtual IP address at a given time is the active node.

To deploy the first ProxySQL server, simply go to the cluster action menu (right-side of the summary bar) and click on Add Load Balancer -> ProxySQL -> Deploy ProxySQL and you will see the following:

Again, most of the fields are self-explanatory. In the "Database User" section, note that ProxySQL acts as a gateway through which your application connects to the database. The application authenticates against ProxySQL, therefore you have to add all of the users from all the backend MySQL nodes, along with their passwords, into ProxySQL. From ClusterControl, you can either create a new user to be used by the application - deciding on its name, password, which databases it may access and what MySQL privileges it will have. Such a user will be created on both the MySQL and ProxySQL sides. The second option, more suitable for existing infrastructures, is to use existing database users. In that case you pass the username and password, and the user will be created only on ProxySQL.

In the last section, "Implicit Transaction", ClusterControl will configure ProxySQL to send all of the traffic to the master if transactions are started with SET autocommit=0. Otherwise, if you use BEGIN or START TRANSACTION to create a transaction, ClusterControl will configure read/write split in the query rules. This ensures ProxySQL handles transactions correctly. If you have no idea how your application does this, pick the latter.

Repeat the same configuration for the second ProxySQL node, except the "Server Address" value which is 192.168.55.182. Once done, both nodes will be listed under "Nodes" tab -> ProxySQL where you can monitor and manage them directly from the UI:

At this point, our architecture is now looking like this:

If you would like to learn more about ProxySQL, do check out this tutorial - Database Load Balancing for MySQL and MariaDB with ProxySQL - Tutorial.

Deploying the Virtual IP Address

The final part is the virtual IP address. Without it, our load balancers (reverse proxies) would be the weak link, as they would be a single point of failure - unless the application has the ability to automatically redirect failed database connections to another load balancer. Nevertheless, it's good practice to unify them both using a virtual IP address and simplify the connection endpoint to the database layer.

From ClusterControl UI -> Add Load Balancer -> Keepalived -> Deploy Keepalived and select the two ProxySQL hosts that we have deployed:

Also, specify the virtual IP address and the network interface to bind the IP address. The network interface must exist on both ProxySQL nodes. Once deployed, you should see the following green checks in the summary bar of the cluster:

At this point, our architecture can be illustrated as below:

Our database cluster is now ready for production usage. You can import your existing database into it or create a fresh new database. You can use the Schemas and Users Management feature if the trial license hasn't expired.

To understand how ClusterControl configures Keepalived, check out this blog post, How ClusterControl Configures Virtual IP and What to Expect During Failover.

Connecting to the Database Cluster

From the application and client standpoint, connections need to go to 192.168.55.180 on port 6033, which is the virtual IP address floating on top of the load balancers. For example, the WordPress database configuration will be something like this:

/** The name of the database for WordPress */
define( 'DB_NAME', 'wp_myblog' );

/** MySQL database username */
define( 'DB_USER', 'wp_myblog' );

/** MySQL database password */
define( 'DB_PASSWORD', 'mysecr3t' );

/** MySQL hostname - virtual IP address with ProxySQL load-balanced port*/
define( 'DB_HOST', '192.168.55.180:6033' );

If you would like to access the database cluster directly, bypassing the load balancer, you can just connect to port 3306 of the database hosts. This is usually required by the DBA staff for administration, management, and troubleshooting. With ClusterControl, most of these operations can be performed directly from the user interface.
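
For example, a DBA might connect straight to the first Galera node from the topology above (the admin user shown here is a placeholder for whatever administrative account exists on your nodes):

$ mysql -u admin_user -p -h 192.168.55.161 -P 3306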

Final Thoughts

As shown above, deploying a database cluster is no longer a difficult task. Once deployed, there is a full suite of free monitoring features, as well as commercial features for backup management, failover/recovery and others. Fast deployment of different types of cluster/replication topologies can be useful when evaluating high availability database solutions, and how they fit your particular environment.

by ashraf at September 12, 2018 09:54 AM

Peter Zaitsev

Announcement: Experimental Build of Percona XtraBackup 8.0


Experimental Build of Percona XtraBackup 8.0 released

An experimental alpha version of Percona XtraBackup 8.0.1 is now available in the Percona experimental software repositories.

A few things to note about this release:

  • We removed the deprecated innobackupex in this release
  • Due to the new MySQL redo log and data dictionary formats the Percona XtraBackup 8.0.x versions will only be compatible with MySQL 8.0.x and the upcoming Percona Server for MySQL 8.0.x
  • For experimental migrations from earlier database server versions, you will need to back up and restore using XtraBackup 2.4, and then use mysql_upgrade from MySQL 8.0.x (a sketch of those steps follows this list)
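
A minimal sketch of that migration path, with placeholder paths (xtrabackup here is the 2.4 binary; mysql_upgrade comes from the MySQL 8.0.x installation):

$ xtrabackup --backup --target-dir=/backups/full     # taken against the old server with XtraBackup 2.4
$ xtrabackup --prepare --target-dir=/backups/full
$ xtrabackup --copy-back --target-dir=/backups/full  # restore into the new server's empty datadir
$ mysql_upgrade -u root -p                           # run after starting MySQL 8.0.x on the restored datadir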

PXB 8.0.1 alpha is available for the following platforms:

  • RHEL/Centos 6.x
  • RHEL/Centos 7.x
  • Ubuntu 14.04 Trusty*
  • Ubuntu 16.04 Xenial
  • Ubuntu 18.04 Bionic
  • Debian 8 Jessie*
  • Debian 9 Stretch

Information on how to configure the Percona repositories for apt and yum systems and access the Percona experimental software is here.

* We might drop these platforms before GA release.

The post Announcement: Experimental Build of Percona XtraBackup 8.0 appeared first on Percona Database Performance Blog.

by David Bennett at September 12, 2018 07:25 AM

September 11, 2018

Peter Zaitsev

How To Deploy PMM on Linode With StackScripts


In my previous blog post, I showed how to deploy Percona Monitoring and Management (PMM) on Linode manually. It is pretty simple, but with a little coding it can be done even more easily using StackScripts.

Here’s how:

1. Click on the “Add a Linode” and pick a Linode type you want to deploy.

2. Click on the deployed Linode and then click on the "Rebuild" link.

Rebuild the Linode

3. Click on Deploy Using StackScripts

Deploy using StackScripts

4. On the resulting page search for “PMM” and pick PMMServer from PerconaLab.

Choose PMMServer from PerconaLab

5. Provide the hostname for the new Linode, pick the root password and click on "Rebuild".

6. Boot the server.

boot the server

7. You’re done. Wait about 5 minutes for the installation to complete, then you can see the PMM interface by going to the Linode's IP address.

view PMM on Linode IP

If you think that deploying with StackScripts through the UI is not much less hassle than doing it fully manually, you’re right. The real benefit comes from using the Linode API for deployment.

There are multiple ways to access this API, though for basic scripting I prefer the linode-cli tool for using the Linode API from the command line.

With linode-cli you can deploy your PMM Server on Linode using this one-liner:

linode-cli linodes create --label pmm-test  --root_pass MyRootPassword123 --stackscript_id 338458  --stackscript_data '{"hostname": "pmm-test"}'

Summary

As you can see, with Linode StackScripts you can get going with Percona Monitoring and Management on Linode in no time, especially if you chose to use the Linode API.

You might also like:

Here’s an overview from the Percona Monitoring and Management manual on deploying PMM. If you are new to PMM and would like to know more, you will find lots of resources on this site including my webinar MySQL Troubleshooting and Performance Optimization with PMM.

The post How To Deploy PMM on Linode With StackScripts appeared first on Percona Database Performance Blog.

by Peter Zaitsev at September 11, 2018 02:02 PM

Open Query Pty Ltd

Press Release 2018-09-11: Open Query acquired by Catalyst IT Australia Pty Limited

We are pleased to announce that Open Query, the Queensland-based provider of MySQL, MariaDB and related services which just celebrated its 11th anniversary, has been acquired by Catalyst IT Australia.

Founded in New Zealand in 1997, Catalyst is an experienced and respected Open Source integrator.  Catalyst is looking forward to the opportunity to work with the current Open Query clients as well as with new prospects. Catalyst offers a broad suite of Enterprise services, including support and custom development for Drupal, SilverStripe CMS, Moodle, Samba and other software, as well as fully managed hosting on AWS and other platforms.

“Catalyst’s core values are very much aligned with those of Open Query, which is why we are particularly pleased with this outcome”, notes Arjen Lentz, Founder and Exec.Director of Open Query.

Catalyst IT Australia has offices in Sydney, Melbourne and Brisbane.

Contacts

For Open Query Pty Ltd

Arjen Lentz, Exec.Director
https://openquery.com.au

For Catalyst IT Australia Pty Limited

Andrew Boag, Managing Director
https://www.catalyst-au.net/
Phone (02) 8203 9777

by Arjen Lentz at September 11, 2018 05:00 AM

September 10, 2018

Peter Zaitsev

Using ProxySQL to connect to IPv6-only databases over IPv4


It’s 2018. Maybe now is the time to start migrating your network to IPv6, and your database infrastructure is a great place to start. Unfortunately, many legacy applications don’t offer the option to connect to MySQL directly over IPv6 (sometimes even if passing a hostname). We can work around this by using ProxySQL’s IPv6 support, which was added in version 1.3. This will allow us to proxy incoming IPv4 connections to IPv6-only database servers.

Note that by default ProxySQL only listens on IPv4. We don’t recommend changing that until this bug is resolved. The bug causes ProxySQL to segfault frequently if listening on IPv6.

In this example I’ll use centos7-pxc57-1 as my database server. It’s running Percona XtraDB Cluster (PXC) 5.7 on CentOS 7, which is only accessible over IPv6. This is one node of a three-node cluster, but I treat this one node as a standalone server for this example. One node of a synchronous cluster can be thought of as equivalent to the entire cluster, and vice-versa. Using the PXC plugin for ProxySQL to split reads from writes is the subject of a future blog post.

The application server, centos7-app01, would be running the hypothetical legacy application.

Note: We use default passwords throughout this example. You should always change the default passwords.

We have changed the IPv6 address in these examples. Any resemblance to real IPv6 addresses, living or dead, is purely coincidental.

  • 2a01:5f8:261:748c::74 is the IPv6 address of the ProxySQL server
  • 2a01:5f8:261:748c::71 is the Percona XtraDB node

Step 1: Install ProxySQL for your distribution

Packages are available here but in this case I’m going to use the version provided by the Percona yum repository:

[...]
Installed:
proxysql.x86_64 0:1.4.9-1.1.el7
Complete!
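
The install command itself is elided above; assuming the Percona repository is already configured (for example via the percona-release package), it is typically a single yum install:

[root@centos7-app1 ~]# yum install proxysql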

Step 2: Configure ProxySQL to listen on IPv4 TCP port 3306 by editing /etc/proxysql.cnf and starting it

[root@centos7-app1 ~]# vim /etc/proxysql.cnf
[root@centos7-app1 ~]# grep interfaces /etc/proxysql.cnf
interfaces="127.0.0.1:3306"
[root@centos7-app1 ~]# systemctl start proxysql

Step 3: Configure ACLs on the destination database server to allow ProxySQL to connect over IPv6

mysql> GRANT SELECT on sys.* to 'monitor'@'2a01:5f8:261:748c::74' IDENTIFIED BY 'monitor';
Query OK, 0 rows affected, 1 warning (0.25 sec)
mysql> GRANT ALL ON legacyapp.* TO 'legacyappuser'@'2a01:5f8:261:748c::74' IDENTIFIED BY 'super_secure_password';
Query OK, 0 rows affected, 1 warning (0.25 sec)

Step 4: Add the IPv6 address of the destination server to ProxySQL and add users

We need to configure the IPv6 server as a mysql_server inside ProxySQL. We also need to add a user to ProxySQL as it will reuse these credentials when connecting to the backend server. We’ll do this by connecting to the admin interface of ProxySQL on port 6032:

[root@centos7-app1 ~]# mysql -h127.0.0.1 -P6032 -uadmin -padmin
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.
Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.5.30 (ProxySQL Admin Module)
Copyright (c) 2009-2018 Percona LLC and/or its affiliates
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (1,'2a01:5f8:261:748c::71',3306);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO mysql_users(username, password, default_hostgroup) VALUES ('legacyappuser', 'super_secure_password', 1);
Query OK, 1 row affected (0.00 sec)
mysql> LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)
mysql> SAVE MYSQL USERS TO DISK;
Query OK, 0 rows affected (0.27 sec)
mysql> LOAD MYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)
mysql> SAVE MYSQL SERVERS TO DISK;
Query OK, 0 rows affected (0.30 sec)
mysql> LOAD MYSQL VARIABLES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)
mysql> SAVE MYSQL VARIABLES TO DISK;
Query OK, 95 rows affected (0.12 sec)

Step 5: Configure your application to connect to ProxySQL over IPv4 on localhost4 (IPv4 localhost)

This is application specific and so not shown here, but I’d configure my application to use localhost4 as this is in /etc/hosts by default and points to 127.0.0.1 and not ::1
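
You can check what localhost4 resolves to with getent; on a stock CentOS 7 host, the output looks something like this:

[root@centos7-app1 ~]# getent hosts localhost4
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4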

Step 6: Verify

As I don’t have the application here, I’ll verify with mysql-client. Remember that ProxySQL is listening on 127.0.0.1 port 3306, so we connect via ProxySQL on IPv4 (the usage of 127.0.0.1 rather than a hostname is just to show this explicitly):

[root@centos7-app1 ~]# mysql -h127.0.0.1 -ulegacyappuser -psuper_secure_password
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql> SELECT host FROM information_schema.processlist WHERE ID=connection_id();
+-----------------------------+
| host                        |
+-----------------------------+
| 2a01:5f8:261:748c::74:57546 |
+-----------------------------+
1 row in set (0.00 sec)
mysql> CREATE TABLE legacyapp.legacy_test_table(id int);
Query OK, 0 rows affected (0.83 sec)

The query above shows the remote host (from MySQL’s point of view) for the current connection. As you can see, MySQL sees this connection established over IPv6. So to recap, we connected to MySQL on an IPv4 IP address (127.0.0.1) and were successfully proxied to a backend IPv6 server.

The post Using ProxySQL to connect to IPv6-only databases over IPv4 appeared first on Percona Database Performance Blog.

by James Lawrie at September 10, 2018 10:30 AM

Oli Sennhauser

Advanced MariaDB/MySQL Training, November 12-16 in Cologne

Our FromDual training "MariaDB/MySQL for Advanced Users", held from November 12 to 16 in Cologne, currently has 3 places left.

You can register directly with our training partner GfU in Cologne.

The training contents can be found on the FromDual website.

If you need specific in-house training, or MariaDB or MySQL consulting tailored to your needs, please get in touch with us.


by Shinguz at September 10, 2018 09:57 AM

September 08, 2018

MariaDB Foundation

MariaDB 10.1.36 and MariaDB Connector/C 2.3.7, Connector/J 2.3.0 and Connector/ODBC 2.0.18 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.1.36, the latest stable release in the MariaDB 10.1 series, as well as MariaDB Connector/C 2.3.7, MariaDB Connector/J 2.3.0 and MariaDB Connector/ODBC 2.0.18, the latest stable MariaDB Connector releases. See the release notes and changelogs for details. Download MariaDB 10.1.36 Release Notes Changelog What […]

The post MariaDB 10.1.36 and MariaDB Connector/C 2.3.7, Connector/J 2.3.0 and Connector/ODBC 2.0.18 now available appeared first on MariaDB.org.

by Ian Gilfillan at September 08, 2018 09:57 PM

Peter Zaitsev

Percona Monitoring and Management (PMM) 1.14.1 Is Now Available


Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.

We’re releasing hotfix 1.14.1 to address two issues found post-release of 1.14.0:

  • PMM-2963: Upgrading to PMM 1.14.0 fails due to attempting to create already existing Dashboard
    • Our upgrade script incorrectly tried to create dashboards that already existed, generating the failure message:
      A folder or dashboard in the general folder with the same name already exists
  • PMM-2958: Grafana did not update to 5.1 when upgrading from versions older than 1.11
    • We identified a niche case where PMM installations that were upgraded from < 1.11 would fail to upgrade Grafana to the correct release, 5.1 (users were left on Grafana 5.0)

Help us improve our software quality by reporting any Percona Monitoring and Management bugs you encounter using our bug tracking system.

The post Percona Monitoring and Management (PMM) 1.14.1 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at September 08, 2018 03:51 PM

September 07, 2018

Peter Zaitsev

Upcoming Webinar Tues 9/11: Migrating to AWS Aurora: A Checklist for Success


Please join Percona’s Senior Consultant, Jervin Real, as he presents Migrating to AWS Aurora: A Checklist for Success. The event will take place on Tuesday, September 11th, 2018, at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

 

In the last few weeks, we have shown you how to successfully migrate from on-premise MySQL installations to AWS Aurora. What comes next is how to successfully ensure that your Aurora cluster performs and operates as you expect it to.

While Aurora’s hands-off operational approach ensures agile practices remain agile, there are also trade-offs and subsequent growing pains.

This webinar will discuss the points on how to remain flexible and in full control of your data while using AWS Aurora.

Register for this webinar on how to make your Aurora migration a success.

The post Upcoming Webinar Tues 9/11: Migrating to AWS Aurora: A Checklist for Success appeared first on Percona Database Performance Blog.

by Jervin Real at September 07, 2018 10:20 PM

Setting up Streaming Replication in PostgreSQL


Configuring replication between two databases is one of the best strategies for achieving high availability during disasters, and it provides fault tolerance against unexpected failures. PostgreSQL satisfies this requirement through streaming replication. We shall talk about another option, called logical replication and logical decoding, in a future blog post.

Streaming replication works on log shipping. Every transaction in postgres is written to a transaction log called WAL (write-ahead log) to achieve durability. A slave uses these WAL segments to continuously replicate changes from its master.

Three mandatory processes – wal sender, wal receiver and startup – play a major role in achieving streaming replication in postgres.

A wal sender process runs on the master, whereas the wal receiver and startup processes run on its slave. When you start the replication, the wal receiver process tells the master the LSN (Log Sequence Number) up to which the WAL data has been replayed on the slave. The wal sender process on the master then sends the WAL data from that LSN through to the latest LSN to the slave. The wal receiver writes the WAL data sent by the wal sender to WAL segments, and it is the startup process on the slave that replays the data written to the WAL segments. At that point, the streaming replication has begun.

Note: Log Sequence Number, or LSN, is a pointer to a location in the WAL.
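
For instance, you can inspect the current WAL write location (an LSN) on the master. The function below exists from PostgreSQL 10 onwards; on 9.x the equivalent is pg_current_xlog_location():

$ psql -U postgres -c "SELECT pg_current_wal_lsn();"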

Steps to setup streaming replication between a master and one slave

Step 1:

Create a user on the master that the slave will use to connect and stream the WALs. This user must have the REPLICATION role.

CREATE USER replicator
WITH REPLICATION
ENCRYPTED PASSWORD 'replicator';

Step 2:

The following parameters on the master are considered as mandatory when setting up streaming replication.

  • archive_mode : Must be set to ON to enable archiving of WALs.
  • wal_level : Must be set to at least hot_standby until version 9.5, or replica in later versions.
  • max_wal_senders : Must be set to 3 if you are starting with one slave. For every slave, you may add 2 wal senders.
  • wal_keep_segments : Set the WAL retention in pg_xlog (until PostgreSQL 9.x) and pg_wal (from PostgreSQL 10). Every WAL requires 16MB of space unless you have explicitly modified the WAL segment size. You may start with 100 or more depending on the space and the amount of WAL that could be generated during a backup (see the sizing note after this list).
  • archive_command : This parameter takes a shell command or external programs. It can be a simple copy command to copy the WAL segments to another location or a script that has the logic to archive the WALs to S3 or a remote backup server.
  • listen_addresses : Set it to * or the range of IP Addresses that need to be whitelisted to connect to your master PostgreSQL server. Your slave IP should be whitelisted too, else, the slave cannot connect to the master to replicate/replay WALs.
  • hot_standby : Must be set to ON on standby/replica and has no effect on the master. However, when you setup your replication, parameters set on the master are automatically copied. This parameter is important to enable READS on slave. Otherwise, you cannot run your SELECT queries against slave.
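
As a quick sizing note for wal_keep_segments: with the default 16MB segment size, setting it to 100 reserves roughly 100 x 16MB = 1.6GB of disk in pg_xlog/pg_wal.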

The above parameters can be set on the master using these commands followed by a restart:

ALTER SYSTEM SET wal_level TO 'hot_standby';
ALTER SYSTEM SET archive_mode TO 'ON';
ALTER SYSTEM SET max_wal_senders TO '5';
ALTER SYSTEM SET wal_keep_segments TO '10';
ALTER SYSTEM SET listen_addresses TO '*';
ALTER SYSTEM SET hot_standby TO 'ON';
ALTER SYSTEM SET archive_command TO 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f';

$ pg_ctl -D $PGDATA restart -mf

Step 3:

Add an entry to the pg_hba.conf file of the master to allow replication connections from the slave. The default location of pg_hba.conf is the data directory. However, you may modify the location of this file in postgresql.conf. In Ubuntu/Debian, pg_hba.conf may be located in the same directory as the postgresql.conf file by default, and you can get the location of postgresql.conf by calling the OS command pg_lsclusters.

host replication replicator 192.168.0.28/32 md5

The IP address mentioned in this line must match the IP address of your slave server. Please change the IP accordingly.

In order to get the changes into effect, issue a SIGHUP:

$ pg_ctl -D $PGDATA reload
Or
$ psql -U postgres -p 5432 -c "select pg_reload_conf()"

Step 4:

pg_basebackup helps us to stream the data through the wal sender process from the master to a slave to set up replication. You can also take a tar format backup from the master and copy that to the slave server. You can read more about the tar format pg_basebackup here.

The following step can be used to stream the data directory from the master to the slave. This step can be performed on the slave.

$ pg_basebackup -h 192.168.0.28 -U replicator -p 5432 -D $PGDATA -P -Xs -R

Please replace the IP address with your master’s IP address.

In the above command, you see an optional argument -R. When you pass -R, it automatically creates a recovery.conf file that contains the role of the DB instance and the details of its master. It is mandatory to create the recovery.conf file on the slave in order to set up streaming replication. If you are not using the backup type mentioned above, and choose to take a tar format backup on the master to be copied to the slave, you must create this recovery.conf file manually. Here are the contents of the recovery.conf file:

$ cat $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.28 port=5432 user=replicator password=replicator'
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'

In the above file, the role of the server is defined by standby_mode, which must be set to ON for slaves in postgres. To stream WAL data, the details of the master server are configured using the parameter primary_conninfo.

The two parameters standby_mode and primary_conninfo are automatically created when you use the optional argument -R while taking a pg_basebackup. This recovery.conf file must exist in the data directory ($PGDATA) of the slave.

Step 5:

Start your slave once the backup and restore are completed.

If you have configured a backup (remotely) using the streaming method mentioned in Step 4, it simply copies all the files and directories to the data directory of the slave. This means it is both a backup of the master data directory and a restore, in a single step.

If you have taken a tar backup from the master and shipped it to the slave, you must unzip/untar the backup to the slave data directory, followed by creating a recovery.conf as mentioned in the previous step. Once done, you may proceed to start your PostgreSQL instance on the slave using the following command.

$ pg_ctl -D $PGDATA start

Step 6:

In a production environment, it is always advisable to have the parameter restore_command set appropriately. This parameter takes a shell command (or a script) that can be used to fetch the WAL needed by a slave, if the WAL is not available on the master.

For example:

If a network issue has caused a slave to lag behind the master for a substantial time, it is less likely that the WALs required by the slave are still available in the master’s pg_xlog or pg_wal location. Hence, it is sensible to archive the WALs to a safe location, and to set the commands needed to restore them in the restore_command parameter of the recovery.conf file on your slave. To achieve that, add a line similar to the next example to the recovery.conf file on the slave. You may substitute the cp command with a shell command/script or a copy command that helps the slave get the appropriate WALs from the archive location.

restore_command = 'cp /mnt/server/archivedir/%f "%p"'

Setting the above parameter requires a restart and cannot be done online.

Final step: validate that replication is setup

As discussed earlier, a wal sender and a wal receiver process are started on the master and the slave after setting up replication. Check for these processes on both master and slave using the following commands.

On Master
==========
$ ps -eaf | grep sender
On Slave
==========
$ ps -eaf | grep receiver
$ ps -eaf | grep startup

You should see all three processes running on the master and slave, as in the following example log.

On Master
=========
$ ps -eaf | grep sender
postgres  1287  1268  0 10:40 ?        00:00:00 postgres: wal sender process replicator 192.168.0.28(36924) streaming 0/50000D68
On Slave
=========
$ ps -eaf | egrep "receiver|startup"
postgres  1251  1249  0 10:40 ?        00:00:00 postgres: startup process   recovering 000000010000000000000050
postgres  1255  1249  0 10:40 ?        00:00:04 postgres: wal receiver process   streaming 0/50000D68

You can see more details by querying the master’s pg_stat_replication view.

$ psql
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 1287
usesysid         | 24615
usename          | replicator
application_name | walreceiver
client_addr      | 192.168.0.28
client_hostname  |
client_port      | 36924
backend_start    | 2018-09-07 10:40:48.074496-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/50000D68
write_lsn        | 0/50000D68
flush_lsn        | 0/50000D68
replay_lsn       | 0/50000D68
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

Reference : https://www.postgresql.org/docs/10/static/warm-standby.html#STANDBY-SERVER-SETUP

If you found this post interesting…

Did you know that Percona now provides PostgreSQL support services? If you’d like to read more about this, here’s some more information. We’re here to help.

The post Setting up Streaming Replication in PostgreSQL appeared first on Percona Database Performance Blog.

by Avinash Vallarapu at September 07, 2018 06:00 PM

Taboola and Percona Host a Meetup in Tel Aviv, Oct 3


From time to time, Percona hosts and co-hosts events around the world, often at no cost for attendees. The common focus for these events is, of course, open source databases. On October 3 2018, we team up with Taboola for an evening of talks suitable for all levels of expertise, addressing both the how and the why of choosing a database technology. The event is to be held at the Taboola offices in Tel Aviv. Here’s the schedule:

  • 6:00 pm Welcome with food & drink served
  • 6:45 pm How to pick database technologies that cover your requirements (Dimitri Vanoverbeke, Percona)
  • 7:20 pm Lessons from the field, extremely heavy loads on Percona Server for MySQL (Ariel Pisetzky, Taboola)
  • 7:40 pm Migrating to the cloud, the why and the how. (Dimitri Vanoverbeke, Percona)
  • 8:00 pm Event close

If you would like to join us, please sign up here so that we can be sure to cater for you.

Meanwhile, if you have not yet bought your tickets for Percona Live Europe from November 5 – 7 in Frankfurt, then hurry… I have extended Early Bird pricing one week. We’ll be publishing the exciting, high quality schedule soon so watch out for it! (But don’t miss the Early Bird…)

Finally, if you would like to be sure to not miss our future events, please subscribe to our newsletter–we don’t necessarily blog about every opportunity.

The post Taboola and Percona Host a Meetup in Tel Aviv, Oct 3 appeared first on Percona Database Performance Blog.

by Bronwyn Campbell at September 07, 2018 12:38 PM

MariaDB Foundation

Tencent Games becomes a Gold Sponsor of the MariaDB Foundation

The MariaDB Foundation today announces that Tencent Games – online game developer and operator by Tencent Interactive Entertainment – has become a Gold Sponsor of the MariaDB Foundation. As one of the world’s largest video game companies, Tencent Games shows the way to the ceaselessly-growing video game industry by supporting the open source development of MariaDB. […]

The post Tencent Games becomes a Gold Sponsor of the MariaDB Foundation appeared first on MariaDB.org.

by Otto Kekäläinen at September 07, 2018 06:00 AM

September 06, 2018

Peter Zaitsev

MongoDB: Investigate Queries with explain() and Index Usage (part 2)


This is the second part of a two-part series. In MongoDB: index usage and MongoDB explain() we introduced the main index types supported by MongoDB, and how to create and use them. In this second article, we are going to see some examples of how to use the explain() method to investigate queries. Do you need to optimize MongoDB queries? You’ll see how to use explain() to find out how your query uses indexes. Or, perhaps, that it doesn’t!

What is explain()

explain() is a method that you can apply to simple queries or to cursors to investigate the query execution plan. The execution plan is how MongoDB resolves a query. Looking at all the information returned by explain() we can find out things like:

  • how many documents were scanned
  • how many documents were returned
  • which index was used
  • how long the query took to be executed
  • which alternative execution plans were evaluated

…and other useful information.

The aim of using explain() is to find out how to improve the query. For example, by creating missing indexes or by rewriting it in order to use existing indexes more correctly. If you are familiar with the EXPLAIN command in MySQL, the goals of MongoDB’s explain() method are exactly the same.

Explainable object

You can apply the explain() method to a query or a cursor in the following way, as per any other method:

MongoDB > db.mycollection.find().explain()

However, the preferred way to investigate queries in the mongo shell is to first create the explainable object.

We can create an explainable object like this:

MongoDB > var myexp = db.mycollection.explain()

Once you have created the explainable object, then any kind of operation can be run against it to investigate a query or cursor execution plan. For example:

MongoDB > myexp.find()
MongoDB > myexp.update()
MongoDB > myexp.remove()
MongoDB > myexp.aggregate()

Restaurants test database

To see some examples we need a collection with some data.
For our purposes we can use the New York restaurants database. You can download this from the following url:

https://www.w3resource.com/mongodb-exercises/retaurants.zip

Unzip the archive, and import the JSON file into MongoDB:

$ unzip retaurants.zip
$ mongoimport --host 127.0.0.1 -d test -c restaurants --file retaurants.json

This collection has 3772 documents, all the restaurants in New York City. Here is a document sample.

MongoDB > use test
switched to db test
MongoDB > db.restaurants.find().pretty().limit(1)
{
	"_id" : ObjectId("5b71b3281979e24aa18c0121"),
	"address" : {
		"building" : "1007",
		"coord" : [
			-73.856077,
			40.848447
		],
		"street" : "Morris Park Ave",
		"zipcode" : "10462"
	},
	"borough" : "Bronx",
	"cuisine" : "Bakery",
	"grades" : [
		{
			"date" : ISODate("2014-03-03T00:00:00Z"),
			"grade" : "A",
			"score" : 2
		},
		{
			"date" : ISODate("2013-09-11T00:00:00Z"),
			"grade" : "A",
			"score" : 6
		},
		{
			"date" : ISODate("2013-01-24T00:00:00Z"),
			"grade" : "A",
			"score" : 10
		},
		{
			"date" : ISODate("2011-11-23T00:00:00Z"),
			"grade" : "A",
			"score" : 9
		},
		{
			"date" : ISODate("2011-03-10T00:00:00Z"),
			"grade" : "B",
			"score" : 14
		}
	],
	"name" : "Morris Park Bake Shop",
	"restaurant_id" : "30075445"
}

Explain() verbosity

The explain() method has three verbosity modes.

  • queryPlanner – this is the default mode. At this level, explain provides information about the winning plan, including the index used or if a collection scan is needed (COLLSCAN)
  • executionStats – this mode includes all the information provided by the queryPlanner, plus the statistics. Statistics include details such as the number of documents examined and returned, the execution time in milliseconds, and so on.
  • allPlansExecution – this mode includes all the information provided by the executionStats plus information about the discarded execution plans

We’ll see the explain() output in the following examples.
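
For instance, a minimal sketch of creating explainable objects at each verbosity level against the restaurants collection introduced above:

MongoDB > var expPlan = db.restaurants.explain()                    // queryPlanner (default)
MongoDB > var expStats = db.restaurants.explain("executionStats")
MongoDB > var expAll = db.restaurants.explain("allPlansExecution")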

Example 1

It’s time to use the restaurants collection to run our first example: find out all the restaurants in the Manhattan borough.

Let’s create first the explainable object with the executionStats mode.

MongoDB > var exp = db.restaurants.explain("executionStats")

Then let’s investigate the query.

MongoDB > exp.find( { borough: "Manhattan"} )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"borough" : {
				"$eq" : "Manhattan"
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"borough" : {
					"$eq" : "Manhattan"
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1883,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 3772,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"borough" : {
					"$eq" : "Manhattan"
				}
			},
			"nReturned" : 1883,
			"executionTimeMillisEstimate" : 0,
			"works" : 3774,
			"advanced" : 1883,
			"needTime" : 1890,
			"needYield" : 0,
			"saveState" : 29,
			"restoreState" : 29,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 3772
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Here we can see the output of the explain(). First of all, we can clearly distinguish the “queryPlanner” and the “executionStats” modes. We won’t describe each one of the values, as some are really intuitive. Let’s have a look just at some of them:

queryPlanner.winningPlan.stage = “COLLSCAN”
This provides very important information about the winning plan: it means that MongoDB needs to do a collection scan. The query is not optimized because all the documents must be read.

queryPlanner.winningPlan.rejectedPlans = []
It’s empty: there are no rejected plans. When the query has to be executed with COLLSCAN, the only execution plan is the winning plan. We don’t have any indexes in the collection, apart from the one on _id, so there are no other execution plans.

executionStats.nReturned = 1883
The number of documents returned is 1883, the number of restaurants located in Manhattan.

executionStats.totalDocsExamined = 3772
The number of documents examined is exactly the number of documents in the collection. This was expected because the query uses COLLSCAN.

executionStats.executionTimeMillis = 1
The execution time of the query. It’s just 1 millisecond. This might seem good, but remember that this is the time needed to scan just 3772 documents, a very small test collection. Think about what this time could be in the case of a collection with millions of documents!

How can we improve the query?

In this case it’s simple: let’s create a single field index on borough, the only condition we have in the find(), and then explain the same query again.

MongoDB > db.restaurants.createIndex( {borough: 1} )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
MongoDB > exp.find( { borough: "Manhattan"} )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"borough" : {
				"$eq" : "Manhattan"
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"borough" : 1
				},
				"indexName" : "borough_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"borough" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"borough" : [
						"[\"Manhattan\", \"Manhattan\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1883,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 1883,
		"totalDocsExamined" : 1883,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 1883,
			"executionTimeMillisEstimate" : 0,
			"works" : 1884,
			"advanced" : 1883,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 14,
			"restoreState" : 14,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 1883,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 1883,
				"executionTimeMillisEstimate" : 0,
				"works" : 1884,
				"advanced" : 1883,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 14,
				"restoreState" : 14,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"borough" : 1
				},
				"indexName" : "borough_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"borough" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"borough" : [
						"[\"Manhattan\", \"Manhattan\"]"
					]
				},
				"keysExamined" : 1883,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Interpreting the output

Now, the output is completely different. Let’s have a look at some of the most relevant values.

queryPlanner.winningPlan.inputStage.stage = “IXSCAN”
This is very important: IXSCAN means that MongoDB no longer needs to do a collection scan; an index can be used to find the documents.

queryPlanner.winningPlan.inputStage.indexName = “borough_1”
The name of the index used. This is the default index name: the field name followed by _1 for ascending or _-1 for descending order.
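
As a side note, if the default name is not descriptive enough, createIndex() also accepts an explicit name in its options document. A minimal sketch, for illustration only (the name field and the index name idx_restaurant_name are our own example, not something we use in these examples):

MongoDB > db.restaurants.createIndex( { name: 1 }, { name: "idx_restaurant_name" } )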

queryPlanner.winningPlan.inputStage.direction = “forward”
MongoDB traverses the index in a forward direction.

executionStats.nReturned = 1883
The number of documents returned. Obviously this is the same as before.

executionStats.totalKeysExamined = 1883
The number of keys examined in the index.

executionStats.totalDocsExamined = 1883
Now the number of documents examined corresponds to the number of elements examined in the index.

We have optimized the query.
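
One further refinement worth knowing about: when a query only needs fields that are part of the index, MongoDB can answer it from the index alone, without fetching any documents. This is a so-called covered query. A minimal sketch, assuming we only need the borough value and exclude _id from the projection:

MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > exp.find( { borough: "Manhattan" }, { borough: 1, _id: 0 } )

In that case totalDocsExamined should drop to 0, since no FETCH stage is needed.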

Example 2

Now we would like to examine a query that finds all the restaurants with Italian cuisine that received a grade score greater than 50.

MongoDB > var exp = db.restaurants.explain()
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"cuisine" : {
							"$eq" : "Italian"
						}
					},
					{
						"grades.score" : {
							"$gt" : 50
						}
					}
				]
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

We have a COLLSCAN again. Let’s try to improve the query by creating an index on the cuisine field.

MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > db.restaurants.createIndex({cuisine:1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 3,
	"ok" : 1
}
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"grades.score" : {
					"$gt" : 50
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 6,
		"executionTimeMillis" : 4,
		"totalKeysExamined" : 325,
		"totalDocsExamined" : 325,
		"executionStages" : {
			"stage" : "FETCH",
			"filter" : {
				"grades.score" : {
					"$gt" : 50
				}
			},
			"nReturned" : 6,
			"executionTimeMillisEstimate" : 0,
			"works" : 326,
			"advanced" : 6,
			"needTime" : 319,
			"needYield" : 0,
			"saveState" : 2,
			"restoreState" : 2,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 325,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 325,
				"executionTimeMillisEstimate" : 0,
				"works" : 326,
				"advanced" : 325,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 2,
				"restoreState" : 2,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					]
				},
				"keysExamined" : 325,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

The query has improved: the newly created index cuisine_1 is used, but we still have 325 documents examined for only 6 documents returned. Let’s see if we can do better by creating instead a compound index on both fields in the condition: cuisine and grades.score.

MongoDB > db.restaurants.createIndex({cuisine:1, "grades.score":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 3,
	"numIndexesAfter" : 4,
	"ok" : 1
}
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1,
					"grades.score" : 1
				},
				"indexName" : "cuisine_1_grades.score_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"grades.score" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					],
					"grades.score" : [
						"(50.0, inf.0]"
					]
				}
			}
		},
		"rejectedPlans" : [
			{
				"stage" : "FETCH",
				"filter" : {
					"grades.score" : {
						"$gt" : 50
					}
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"cuisine" : 1
					},
					"indexName" : "cuisine_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"cuisine" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"cuisine" : [
							"[\"Italian\", \"Italian\"]"
						]
					}
				}
			}
		]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 6,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 7,
		"totalDocsExamined" : 6,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 6,
			"executionTimeMillisEstimate" : 0,
			"works" : 9,
			"advanced" : 6,
			"needTime" : 1,
			"needYield" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 6,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 6,
				"executionTimeMillisEstimate" : 0,
				"works" : 8,
				"advanced" : 6,
				"needTime" : 1,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"cuisine" : 1,
					"grades.score" : 1
				},
				"indexName" : "cuisine_1_grades.score_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"grades.score" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					],
					"grades.score" : [
						"(50.0, inf.0]"
					]
				},
				"keysExamined" : 7,
				"seeks" : 1,
				"dupsTested" : 7,
				"dupsDropped" : 1,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Now, the winning plan uses the new compound index cuisine_1_grades.score_1 and we have only 6 documents examined. Please note also that now we have a rejected plan, the one that uses the single field index cuisine_1 previously created.

We have optimized the query.
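
If you are curious about how the rejected plan would have performed, you can force it with hint(), passing the index name (or its key pattern). A quick sketch using the single field index created earlier:

MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  }).hint("cuisine_1")

The output should report cuisine_1 as the plan used, with the higher number of examined documents we saw before.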

Example 3

Let’s find all the restaurants in Brooklyn that don’t serve “American” cuisine and that achieved a grade of ‘A’. We want the results ordered by cuisine, descending.

At this point you should be somewhat familiar with the explain() output, so in the next box we truncate it for the sake of simplicity, leaving only the relevant parts.

MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"cuisine":-1})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"borough" : {
							"$eq" : "Brooklyn"
						}
					},
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"cuisine" : [
						"[MaxKey, \"American \")",
						"(\"American \", MinKey]"
					]
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 9,
		"totalKeysExamined" : 2518,
		"totalDocsExamined" : 2517,
...
...
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Looking at the winning plan, we see an IXSCAN on the index cuisine_1. The important thing to notice here is the choice of the index on the cuisine field: it was chosen because we used sort({cuisine:-1}). There is no SORT stage, because the documents are extracted using the index and so are already sorted. Notice, too, direction:”backward”: this is because we specified descending order in the query. If we execute a slightly different query, changing the sorting to name:1 instead of cuisine:-1, we’ll see a completely different winning plan.

MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"name":1})
...
		"winningPlan" : {
			"stage" : "SORT",
			"sortPattern" : {
				"name" : -1
			},
			"inputStage" : {
				"stage" : "SORT_KEY_GENERATOR",
				"inputStage" : {
					"stage" : "FETCH",
					"filter" : {
						"$and" : [
							{
								"grades.grade" : {
									"$eq" : "A"
								}
							},
							{
								"$nor" : [
									{
										"cuisine" : {
											"$eq" : "American "
										}
									}
								]
							}
						]
					},
					"inputStage" : {
						"stage" : "IXSCAN",
						"keyPattern" : {
							"borough" : 1
						},
						"indexName" : "borough_1",
						"isMultiKey" : false,
						"multiKeyPaths" : {
							"borough" : [ ]
						},
						"isUnique" : false,
						"isSparse" : false,
						"isPartial" : false,
						"indexVersion" : 2,
						"direction" : "forward",
						"indexBounds" : {
							"borough" : [
								"[\"Brooklyn\", \"Brooklyn\"]"
							]
						}
					}
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 13,
		"totalKeysExamined" : 684,
		"totalDocsExamined" : 684,
...
...

In this case we have fewer documents examined, but since the cuisine_1 index cannot be used, a SORT stage is needed, and the index used to fetch the documents is borough_1. Even though MongoDB examined fewer documents, the execution time is worse because of the extra stage used to sort them.

Let’s now return to the original query. We can also notice that the number of documents examined (2517) is still too high compared to the number of documents returned (493). That’s not optimal. Let’s see if we can improve the query further by adding another compound index on (cuisine, borough, grades.grade).

MongoDB > db.restaurants.createIndex({cuisine:1,borough:1,"grades.grade":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 4,
	"numIndexesAfter" : 5,
	"ok" : 1
}
MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"cuisine":-1})
...
...		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$nor" : [
					{
						"cuisine" : {
							"$eq" : "American "
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1,
					"borough" : 1,
					"grades.grade" : 1
				},
				"indexName" : "cuisine_1_borough_1_grades.grade_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"borough" : [ ],
					"grades.grade" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"cuisine" : [
						"[MaxKey, \"American \")",
						"(\"American \", MinKey]"
					],
					"borough" : [
						"[\"Brooklyn\", \"Brooklyn\"]"
					],
					"grades.grade" : [
						"[\"A\", \"A\"]"
					]
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 6,
		"totalKeysExamined" : 591,
		"totalDocsExamined" : 493,
...
...

Now, MongoDB uses the new index and does not need the extra sorting stage. The number of documents examined is the same as the number of documents returned. Also, the execution time is better.

We have optimized the query.
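
A quick housekeeping note: we have been accumulating indexes throughout these examples, and every index adds overhead on writes. You can review them with getIndexes() and drop superfluous ones with dropIndex(). A minimal sketch, shown for reference only (we don’t actually drop anything before the next example, and whether dropping an index such as cuisine_1 is safe depends on your other queries):

MongoDB > db.restaurants.getIndexes()
MongoDB > db.restaurants.dropIndex("cuisine_1")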

Example 4

This is the final example. Let’s find out the restaurants where the grades array contains a grade of ‘A’ and a score of 9 for a specific date.

MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"),
... "grades.grade":"A" ,
... "grades.score" : 9})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"grades.date" : {
							"$eq" : ISODate("2014-08-11T00:00:00Z")
						}
					},
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					},
					{
						"grades.score" : {
							"$eq" : 9
						}
					}
				]
			},
			"direction" : "forward"
		},
...

A COLLSCAN again. In the query we notice that all the conditions refer to embedded fields in an array object, so let’s try to create a multikey index. Let’s just create the index on the date field only and see what happens.

MongoDB > db.restaurants.createIndex({"grades.date":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 5,
	"numIndexesAfter" : 6,
	"ok" : 1
}
MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"),
... "grades.grade":"A" ,
... "grades.score" : 9})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					},
					{
						"grades.score" : {
							"$eq" : 9
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"grades.date" : 1
				},
				"indexName" : "grades.date_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"grades.date" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"grades.date" : [
						"[new Date(1407715200000), new Date(1407715200000)]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 15,
		"executionTimeMillis" : 0,
		"totalKeysExamined" : 22,
		"totalDocsExamined" : 22,
...
...

MongoDB uses the index, and the winning plan is already good enough. As an exercise, you can try to create a compound index that includes the other embedded fields of the array, like {“grades.date”:1, “grades.grade”:1, “grades.score”:1}, and see what happens. You will probably see that the index created on date alone is good enough. Enlarging the compound index will only generate rejected plans. This is because the date field is the most selective one.
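
For reference, the suggested exercise could be run like this (just a sketch; compare the explain() outputs yourself):

MongoDB > db.restaurants.createIndex( { "grades.date": 1, "grades.grade": 1, "grades.score": 1 } )
MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"), "grades.grade": "A", "grades.score": 9})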

Hint: when dealing with compound indexes, remember that the order of the fields is important. The first field should be the most selective one, and the last should be the least selective. And in cases where you don’t need to put a lot of fields in the index, the most selective ones are probably all you need to improve the queries.

Conclusion

That’s all folks. In this two-part series we have seen the indexes available in MongoDB and how to use explain() to investigate queries and improve their performance. We don’t expect you to know everything by now, but we hope this will be a good starting point for you to practice MongoDB query optimization using explain(). You’ll find plenty more information about indexes and explain() in the manual and on the internet, if you are motivated to find out more.

While you are here

If you found this article interesting, you might like to check out some of our other resources. For example, you might find this recorded webinar MongoDB Sharding 101 by my colleague Adamo Tonete to be useful. Or perhaps Tim Vaillancourt’s MongoDB backup and recovery field guide would provide some useful information. If you want to catch our webinars as they happen, please subscribe to receive notifications in good time.

The post MongoDB: Investigate Queries with explain() and Index Usage (part 2) appeared first on Percona Database Performance Blog.

by Corrado Pandiani at September 06, 2018 05:15 PM

MariaDB Foundation

Percona Becomes Bronze Sponsor of MariaDB Foundation

The MariaDB Foundation is happy to announce that Percona, one of the oldest and most well known open source database support and services companies, has joined as a sponsor of the MariaDB Foundation. Percona’s ongoing commitment to the open source database community is reflected in several ways, including hosting the Percona Live Open Source Database Conference, […]

The post Percona Becomes Bronze Sponsor of MariaDB Foundation appeared first on MariaDB.org.

by Otto Kekäläinen at September 06, 2018 08:00 AM

Peter Zaitsev

Percona Becomes a Sponsor of the MariaDB Foundation

We are excited to announce we have joined the MariaDB Foundation as a Bronze Sponsor!

Percona’s mission is to be an unbiased champion of open source databases. We pride ourselves on helping customers avoid vendor lock-in by supporting them across a range of database technologies and deployment strategies. Sponsoring the MariaDB Foundation is a great way for us to show our support for the open source database community and highlight our single-source expertise in multi-vendor environments that eliminates lock-in, increases agility and enables business growth for our customers.

Open source continues to be an essential part of business strategy, with organizations deploying different open source databases in their production environments to meet various application requirements. Mixed environments can create new and interesting issues for businesses looking to deploy business-critical applications, whether they are on-premises, in the cloud or in hybrid environments. MariaDB remains a popular choice among enterprises, and is used by around 40% of our customers.

Percona has participated in the MariaDB community for many years. We have an advanced understanding of MariaDB’s unique features, performance and capabilities, and can support MariaDB installations wherever they are located (on-premises, in the cloud or using DBaaS providers such as Amazon RDS). At Percona, MariaDB Support is provided 24x7x365 by some of the world’s top MariaDB experts. These experts have helped thousands of customers resolve nearly every conceivable MariaDB issue. These solutions include deploying scale-out and high-availability solutions like ProxySQL, MariaDB with Galera Cluster, MHA, and more. With Percona services, organizations can increase uptime, reduce costs and obtain increased ROI for their MariaDB installations.

Percona demonstrates our commitment to the open source database community in a variety of ways, including hosting the Percona Live Open Source Database Conference, participating in meet-ups and webinars, and partnering with other key players in the open source database arena, such as AWS, Google, Mesosphere and Microsoft. Our sponsorship of the MariaDB Foundation helps them support continuity and open collaboration in the MariaDB ecosystem, drives adoption and serves an ever-growing community of users and developers.

The post Percona Becomes a Sponsor of the MariaDB Foundation appeared first on Percona Database Performance Blog.

by Laurie Coffin at September 06, 2018 06:55 AM

September 05, 2018

Peter Zaitsev

Percona Monitoring and Management (PMM) 1.14.0 Is Now Available

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.

We’ve included a plethora of visual improvements in this release:

  • PostgreSQL Metrics Collection – Visualize PostgreSQL performance!
  • Identify New Queries in Query Analytics
  • New Dashboard: Compare System Parameters
  • New Dashboard: PERFORMANCE_SCHEMA Wait Events Analysis
  • Dashboard Updates – Advanced Data Exploration, MyRocks, TokuDB, InnoDB Metrics
  • Disable SSL between Prometheus and Exporters
  • Dashboards grouped by Folder – We’ve organized the Dashboard drop-down to present a cleaner interface

We addressed 16 new features and improvements, and fixed 20 bugs.

PostgreSQL Metrics Collection

The PMM team is very proud to bring you native support for PostgreSQL! We’ve shipped a new dashboard called PostgreSQL Overview, and we now provide the ability to add PostgreSQL instances as native, first-class citizens as part of PMM. This means you can add PostgreSQL + Linux monitoring capabilities through the standard pmm-admin add postgresql syntax; see our documentation links for more details!


Identify New Queries in Query Analytics

A long-awaited feature is the ability to visually identify new queries that have appeared in Query Analytics: those queries whose first seen time is within the selected time range. New queries are highlighted in a soft blue band for quick identification, and we’ve provided a button called First Seen which you can toggle to display only those newly seen queries. A common use case for this feature is during code releases and deployments, when you want to review which new queries have been deployed and check their performance characteristics.


New Dashboard: Compare System Parameters

We’ve introduced a new dashboard to let you compare System Parameters across multiple servers so at a glance you can understand provisioning or configuration differences. This might be of help when comparing a pool of identical slaves or other logical groups of instances.


New Dashboard: PERFORMANCE_SCHEMA Wait Events Analysis

We’ve added a new dashboard that lets you drill down into great detail on one or several PERFORMANCE_SCHEMA wait event categories in order to visualize them over time.


Dashboards grouped by Folder

At long last we’ve addressed the sprawl of the long list of 30+ Dashboards, and grouped them into categories which match the pre-existing right-side navigation system. This should leave you with a more organized, less cluttered list of Dashboards.


Dashboard Updates – Advanced Data Exploration, MyRocks, TokuDB, InnoDB Metrics

We’ve improved four dashboards with minor but helpful improvements:

  • Advanced Data Exploration dashboard with the addition of a graph element plotting the Metric Rates, which will help you understand the scraping efficiency of this metric series, or whether scrapes have failed / are failing.
  • InnoDB Metrics to present the graph elements in two columns – previously we’d inconsistently use three columns or two columns, making it hard to visualize trends across graphs.
  • MyRocks formulas were improved to be more precise
  • TokuDB has many new graphs to expand our coverage of this storage engine

Disable SSL between PMM Server and Exporters

Lastly, we’ve delivered on a feature request from a Percona Customer to optionally disable SSL between PMM Server and Exporters, with the advantage that if you do not need encrypted traffic for your metric series, you can reduce the CPU overhead on PMM Server. We’d love to hear your feedback on this feature!

pmm-admin add mysql --disable-ssl ...

New Features & Improvements

  • PMM-1362: Update descriptions on MySQL InnoDB Metrics (Advanced) Dashboard – thanks to Yves Trudeau
  • PMM-2304: New Dashboard: Compare System Parameters
  • PMM-2331: Advanced Data Exploration: add graph for showing exporter scrapers over time intervals
  • PMM-2356: Grouping dashboards in folders with Grafana5
  • PMM-2472: Identify new queries in QAN
  • PMM-2486: Allow the disabling of SSL by means of an option – thanks to Dongchan Sung
  • PMM-2597: Improve MyRocks dashboard – thanks to Przemek Malkowski for the valuable ideas
  • PMM-2704: PostgreSQL Metrics Collection
  • PMM-2772: Display InnoDB Metrics dashboard using consistent two column view
  • PMM-2775: Display PERFORMANCE_SCHEMA Wait Events Analysis
  • PMM-2769: Display TokuDB Dashboard Improvements
  • PMM-2797: MySQL Performance Schema – Filter HOSTS
  • PMM-2798: Filter hosts on NUMA dashboard
  • PMM-2833: Added granularity interval for scraping AWS API – thanks to Aleksandr Stepanov
  • PMM-2846: Increase MySQL Max Connections in PMM Server

Fixed Bugs

  • PMM-946: QAN sparklines drop to zero when data is not available
  • PMM-1987: pt-archiver rule for agent_log is not correct – thanks to Yves Trudeau for providing a fix
  • PMM-2013: Styling of QAN allows overlapping content
  • PMM-2028: nginx shows “414 Request-URI Too Large” for 150 hosts – thanks to Nickolay Ihalainen for the bug report and fix
  • PMM-2166: Add RDS instance page refresh will head to “Page Not Found” error
  • PMM-2457: Improve External Exporter help documentation for duration interval
  • PMM-2459: Cross-Graph Crosshair not enabled on the PXC/Galera Cluster
  • PMM-2477: Frequent Access Denied prompts while using AWS Marketplace image
  • PMM-2566: CPU busy graph shows incorrect values
  • PMM-2763: Unknown version is available on Update widget
  • PMM-2784: What’s new link on Update widget has wrong URL
  • PMM-2793: Network Overview needs to be in OS menu, not insights
  • PMM-2796: Overview NUMA Metrics dashboard should be renamed to NUMA Overview
  • PMM-2801: Prometheus Exporters Overview – CPU metrics are strange
  • PMM-2804: Prometheus Graph is empty with PMM 1.13
  • PMM-2811: SQL to get Hosts in QAN – thanks to Forums member Fan
  • PMM-2821: Clean local storage if status is “You are up to date” and use animation for refresh button
  • PMM-2828: Weird Latency Graphs
  • PMM-2841: Change memory defaults for Prometheus 1.8 and use additional environment variable
  • PMM-2856: RDS/Aurora disk related graphs are empty
  • PMM-2885: System Overview dashboard has incorrect values

Help us improve our software quality by reporting any Percona Monitoring and Management bugs you encounter using our bug tracking system.

The post Percona Monitoring and Management (PMM) 1.14.0 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at September 05, 2018 08:23 PM

Jean-Jerome Schmidt

New Webinar - Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL and MongoDB

Monitoring is essential for operations teams to ensure that databases are up and running. However, as databases are increasingly being deployed in distributed topologies based on replication or clustering, what does that mean for our monitoring infrastructure? Is it ok to monitor individual components of a database cluster, or do we need a more holistic systems approach? Can we rely on SELECT 1 as a health check when determining whether a database is up or down? Do we need high-resolution time-series charts of status counters? Are there ways to predict problems before they actually become one?

In this webinar, we will discuss how to effectively monitor distributed database clusters or replication setups. We’ll look at different types of monitoring infrastructures, from on-prem to cloud and from agent-based to agentless. Then we’ll dive into the different monitoring features available in the free ClusterControl Community Edition - from time-series charts of metrics, dashboards, and queries to performance advisors.

If you would like to centralize the monitoring of your open source databases and achieve this at zero cost, please join us on September 25!

Date, Time & Registration

Europe/MEA/APAC

Tuesday, September 25th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, September 25th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

Agenda

  • Requirements for monitoring distributed database systems
  • Cloud-based vs On-prem monitoring solutions
  • Agent-based vs Agentless monitoring
  • Deep-dive into ClusterControl Community Edition
    • Architecture
    • Metrics Collection
    • Trending
    • Dashboards
    • Queries
    • Performance Advisors
    • Other features available to Community users

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA with over 15 years’ experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

We look forward to “seeing” you there!

by jj at September 05, 2018 01:30 PM

Peter Zaitsev

Modifying List of Collected Metrics on PMM Linux Exporter

Do you need to modify the metrics collected from Linux by Percona Monitoring and Management (PMM)? In this blog post we will see how to enable, disable, and update collected metrics on PMM’s linux:metrics exporter.

We will assume that the PMM client packages are installed, and they are configured already.

Using a custom list of metrics

Let’s now suppose we are not yet collecting any metrics on our desired client server, and we want to enable only the following: diskstats, meminfo, netdev and vmstat. We can use the following command:

pmm-admin add linux:metrics -- -collectors.enabled=diskstats,meminfo,netdev,vmstat

So, in order to enable or disable the functionality we want, we need to modify the collectors.enabled list accordingly. In the following online documentation page, we are able to find all collectors supported:

https://www.percona.com/doc/percona-monitoring-and-management/section.exporter.node.html

In this way, by adding or removing items from the collectors.enabled list, we can choose which functionality will be set in our PMM linux:metrics collectors.

Checking list of metrics used

We can use the ps aux command to check the collectors list that applies at any given time. Let’s see this in practical terms. After running the previous command, we should expect to see the following:

shell> ps aux | grep node_exporter
...
root     20450 3.2  0.0 1660248 15924 ?       Sl 13:48 0:13 /usr/local/percona/pmm-client/node_exporter -collectors.enabled=diskstats,filefd,filesystem,loadavg,meminfo,netdev,netstat,stat,time,uname,vmstat,meminfo_numa -web.listen-address=10.10.8.141:42000 -web.auth-file=/usr/local/percona/pmm-client/pmm.yml -web.ssl-cert-file=/usr/local/percona/pmm-client/server.crt -web.ssl-key-file=/usr/local/percona/pmm-client/server.key -collectors.enabled=diskstats,meminfo,netdev,vmstat

Note: there is currently a bug where you will see duplicate entries when using custom collectors. We are tracking this under issue PMM-2857. For now, if you see two different lists on ps aux outputs, the one that applies is the last one (as seen above).

Updating the list of metrics

There is currently no way to enable (or disable) metrics dynamically (however, we are working on it). We will need to stop the collector, manually get and edit the metrics list, and re-add it with the updated collectors in place. For this, we will need to know the current list of metrics used from the ps aux outputs.

Continuing with the same example, suppose we are not happy with what netdev metrics offer, and we wish to disable them to avoid unnecessary overhead. Knowing the metrics list was:

-collectors.enabled=diskstats,meminfo,netdev,vmstat

We will need to run the following commands to have it removed from the collectors list:

pmm-admin stop linux:metrics
pmm-admin rm linux:metrics
pmm-admin add linux:metrics -- -collectors.enabled=diskstats,meminfo,vmstat

Links for reference

  • Passing options to exporter: here and here
  • Collector options for linux:metrics exporter: here

Did you find this post useful?

You might also enjoy some of our other resources. Check out some of the recent technical webinars on using PMM, presented by my colleagues from Percona.

The post Modifying List of Collected Metrics on PMM Linux Exporter appeared first on Percona Database Performance Blog.

by Agustín at September 05, 2018 01:15 PM

September 04, 2018

Jean-Jerome Schmidt

How ClusterControl Monitors your Database Servers and Clusters Agentlessly

ClusterControl’s agentless approach allows sysadmins and DBAs to monitor their databases without having to install agent software on each monitored system. Monitoring is implemented using a remote data collector that uses the SSH protocol.

But first, let’s clarify the scope and meaning of monitoring in our context here. Monitoring comes after data trending (the metrics collection and storage process), which allows the monitoring system to process the collected data and produce the basis for tuning and alerting, as well as to display trending data for reporting.

Generally, ClusterControl performs its monitoring, alerting and trending duties by using the following three ways:

  • SSH - Host metrics collection (process, load balancers stats, resource usage and consumption etc) using SSH library.
  • Database client - Database metrics collection (status, queries, variables, usage etc) using the respective database client library.
  • Advisor - Mini programs written using ClusterControl DSL and running within ClusterControl itself, for monitoring, tuning and alerting purposes.

Some description of the above: SSH stands for Secure Shell, a secure network protocol which is used by most Linux-based servers for remote administration. ClusterControl Controller, or CMON, is the backend service performing automation, management, monitoring and scheduling tasks, written in C++.

ClusterControl DSL (Domain Specific Language) allows you to extend the functionality of your ClusterControl platform by creating Advisors, Auto Tuners, or "Mini Programs". The DSL syntax is based on JavaScript, with extensions to provide access to ClusterControl internal data structures and functions. The DSL allows you to execute SQL statements, run shell commands/programs across all your cluster hosts, and retrieve results to be processed for advisors/alerts or any other actions.

Monitoring Tools

All of the prerequisites will be met by the installer script, or will be installed automatically by ClusterControl during the database deployment stage, or whenever a required file/binary/package does not exist on the target server before executing a job. Generally speaking, ClusterControl’s monitoring duty requires only the OpenSSH server package on the monitored hosts. ClusterControl uses the libssh client library to collect host metrics for the monitored hosts: CPU, memory, disk, network, IO, process, etc. The OpenSSH client package is required on the ClusterControl host only for setting up passwordless SSH and for debugging purposes. Other SSH implementations like Dropbear and TinySSH are not supported.

When gathering database stats and metrics, the ClusterControl Controller (CMON) connects to the database server directly via database client libraries: libmysqlclient (MySQL/MariaDB and ProxySQL), libpq (PostgreSQL) and libmongocxx (MongoDB). That is why it’s crucial to set up proper privileges for the ClusterControl server from the database servers’ perspective. For MySQL-based clusters, ClusterControl requires the database user “cmon”, while for other databases any username can be used for monitoring, as long as it is granted super-user privileges. Most of the time, ClusterControl will set up the required privileges (or use the specified database user) automatically during the cluster import or cluster deployment stage.

For load balancers, ClusterControl requires the following tools:

  • Maxadmin on the MariaDB MaxScale server.
  • netcat and/or socat on the HAProxy server to connect to HAProxy socket file and retrieve the monitoring data.
  • ProxySQL requires mysql client on the ProxySQL server.

The following diagram illustrates both host and database monitoring processes executed by ClusterControl using libssh and database client libraries:

Although monitoring threads do not need database client packages to be installed on the monitored host, it's highly recommended to have them for management purposes. For example, MySQL client package comes with mysql, mysqldump, mysqlbinlog and mysqladmin programs which will be used by ClusterControl when performing backups and point-in-time recovery.

Monitoring Methods

For host and load balancer stats collection, ClusterControl executes this task via SSH with super-user privilege. Therefore, passwordless SSH with super-user privilege is vital, to allow ClusterControl to run the necessary commands remotely with proper escalation. With this pull approach, there are a couple of advantages compared to other mechanisms:

  • Agentless - There is no need for agent to be installed, configured and maintained.
  • Unifying the management and monitoring configuration - SSH can be used to pull monitoring metrics or push management jobs on the target nodes.
  • Simplified deployment - The only requirement is a proper passwordless SSH setup. SSH is also very secure and encrypted.
  • Centralized setup - One ClusterControl server can manage multiple servers and clusters, provided it has sufficient resources.

However, there are also drawbacks with the pull mechanism:

  • The monitoring data is accurate only from ClusterControl perspective. For example, if there is a network glitch and ClusterControl loses communication to the monitored host, the sampling will be skipped until the next available cycle.
  • For high-granularity monitoring, there will be network overhead due to the increased sampling rate, as ClusterControl needs to establish more connections to every target host.
  • ClusterControl will keep attempting to re-establish the connection to the target node, because it has no agent to do this on its behalf.
  • Redundant data sampling if you have more than one ClusterControl server monitoring a cluster, since each ClusterControl server has to pull the monitoring data for itself.

For MySQL query monitoring, ClusterControl monitors the queries in two different ways:

  1. Queries are retrieved from PERFORMANCE_SCHEMA, by querying the schema on the database node via SSH.
  2. If PERFORMANCE_SCHEMA is disabled or unavailable, ClusterControl will parse the content of the Slow Query Log via SSH.

If Performance Schema is enabled, ClusterControl will use it to look for the slow queries. Otherwise, ClusterControl will parse the content of the MySQL slow query log (via slow_query_log=ON dynamic variable) based on the following flow:

  1. Start slow log (during MySQL runtime).
  2. Run it for a short period of time (a second or couple of seconds).
  3. Stop log.
  4. Parse log.
  5. Truncate log (new log file).
  6. Go to 1.

The collected queries are hashed, calculated and digested (normalized, averaged, counted, sorted) and then stored in the ClusterControl CMON database. Take note that with this sampling method there is a slight chance that some queries will not be captured, especially during the “stop log, parse log, truncate log” steps. You can enable Performance Schema if losing some queries is not acceptable.

If you are using the Slow Query log, only queries that exceed the Long Query Time will be listed here. If the data is not populated correctly and you believe that there should be something in there, it could be:

  • ClusterControl did not collect enough queries to summarize and populate data. Try to lower the Long Query Time.
  • You have configured Slow Query Log configuration options in the my.cnf of the MySQL server, and Override Local Query is turned off. If you really want to use the value you defined inside my.cnf, you probably have to lower the long_query_time value so ClusterControl can calculate a more accurate result.
  • You have another ClusterControl node pulling the Slow Query log as well (in case you have a standby ClusterControl server). Only allow one ClusterControl server to do this job.

For more details (including how to enable the PERFORMANCE_SCHEMA), see this blog post, How to use the ClusterControl Query Monitor for MySQL, MariaDB and Percona Server.

For PostgreSQL query monitoring, ClusterControl requires pg_stat_statements module, to track execution statistics of all SQL statements. It populates the pg_stat_statements views and functions when displaying the queries in the UI (under Query Monitor tab).

Intervals and Timeouts

ClusterControl Controller (cmon) is a multi-threaded process. By default, the ClusterControl Controller sampling thread connects to each monitored host once and maintains a persistent connection for sampling host stats, until the host drops the connection or disconnects it. It may establish more connections depending on the jobs assigned to the host, since most of the management jobs run in their own threads. For example, cluster recovery runs on the recovery thread, Advisor execution runs on a cron-thread, and process monitoring runs on the process collector thread.

The ClusterControl monitoring thread performs the following sampling operations at the following intervals:

  • MySQL query/status metrics: every second
  • Process collection (/proc): every 10 seconds
  • Server detection: every 10 seconds
  • Host metrics (/proc, /sys): every 30 seconds (configurable via host_stats_collection_interval)
  • Database metrics (PostgreSQL and MongoDB only): every 30 seconds (configurable via db_stats_collection_interval)
  • Database schema metrics: every 3 hours (configurable via db_schema_stats_collection_interval)
  • Load balancer metrics: every 15 seconds (configurable via lb_stats_collection_interval)

The imperative scripts (Advisors) can make use of SSH and database client libraries that come with CMON with the following restrictions:

  • 5 seconds of hard limit for SSH execution,
  • 10 seconds of default limit for database connection, configurable via net_read_timeout, net_write_timeout, connect_timeout in CMON configuration file,
  • 60 seconds of total script execution time limit before CMON ungracefully aborts it.

Advisors can be created, compiled, tested and scheduled directly from ClusterControl UI, under Manage -> Developer Studio. The following screenshot shows an example of an Advisor to extract top 10 queries from PERFORMANCE_SCHEMA:

Whether an Advisor executes depends on whether it is activated and on its schedule, defined in cron format:

The results of the execution are displayed under Performance -> Advisors, as shown in the following screenshot:

For more information on what Advisors being provided by default, check out our Developer Studio product page.

Short-interval monitoring data, like MySQL queries and status, are stored directly in the CMON database, while long-interval data points (weekly/monthly/yearly) are aggregated every 60 seconds and stored in memory for 10 minutes. These behaviours are not configurable, due to the architecture design.


Parameters

There is a plethora of parameters you can configure for ClusterControl to suit your monitoring and alerting policy. Most of them are configurable through the ClusterControl UI -> pick a cluster -> Settings. The "Settings" tab provides many options to configure alerts, thresholds, notifications, graph layouts, database counters, query monitoring and so on. For example, warning and critical thresholds can be configured as follows:

There is also a "Runtime Configuration" page, a summarized list of the active ClusterControl Controller (CMON) runtime configuration parameters:

There are more than 170 ClusterControl Controller configuration options in total and some of the advanced settings can be configured to finely tune your monitoring and alerting policy. To list out some of them:

  • monitor_cpu_temperature
  • swap_warning
  • swap_critical
  • redobuffer_warning
  • redobuffer_critical
  • indexmemory_warning
  • indexmemory_critical
  • datamemory_warning
  • datamemory_critical
  • tablespace_warning
  • tablespace_critical
  • redolog_warning
  • redolog_critical
  • max_replication_lag
  • long_query_time
  • log_queries_not_using_indexes
  • query_monitor_use_local_settings
  • enable_query_monitor
  • enable_query_monitor_auto_purge_ps

The parameters listed on the "Runtime Configuration" page can be changed either by using the UI or via the CMON configuration file located at /etc/cmon.d/cmon_X.cnf, where X is the cluster ID. You can list all of the supported configuration options for CMON with the following command:

$ cmon --help-config

The same output is also available in the documentation page, ClusterControl Controller Configuration Options.

Final Thoughts

We hope this blog has given you a good understanding of how ClusterControl monitors your database servers and clusters agentlessly. We’ll be shortly announcing some significant new features in the next version of ClusterControl so stay tuned!

by ashraf at September 04, 2018 02:33 PM

Peter Zaitsev

Upcoming Webinar Thurs 9/6: Running MySQL in Kubernetes

Please join Percona’s Chief Technology Officer, Vadim Tkachenko, as he presents Running MySQL in Kubernetes on Thursday, September 6th, 2018, at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

Without question, Kubernetes is the most popular container orchestration platform. But can it handle databases? I think so, and in this webinar, I will show you how it does it. This talk presents a quick overview of the persistent storage options (the most critical part of data storage) and then what it takes to run highly available MySQL deployments with backup and recovery options.

By the end of the webinar, you will understand whether Kubernetes is suitable for handling your databases and, moreover, how to approach using it if it is.

Register for this webinar on running MySQL in Kubernetes.

The post Upcoming Webinar Thurs 9/6: Running MySQL in Kubernetes appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at September 04, 2018 12:36 PM

MongoDB: index usage and MongoDB explain() (part 1)

In this two-part series, I’m going to explain explain. No, the repetition is not a pun or a typo. I’m talking about explain(), the most useful MongoDB feature to investigate a query. Here, I’ll introduce the index types available in MongoDB, their properties and how to create and use them. In the next article we’ll see explain() in action on some real examples.

Using explain(), questions like these will find a proper answer:

  • Is the query using an index?
  • How is it using the index?
  • How many documents are scanned?
  • How many documents are returned?
  • For how many milliseconds does the query run?

And many others. The explain() command provides a lot of information.

If you are familiar with MySQL, then you probably know about the EXPLAIN command. In MongoDB we have a similar feature, and the goal of using it is exactly the same: find out how we can improve a query in order to reduce the execution time as much as possible. Based on the information provided, we can decide to create a new index or to rewrite the query to take advantage of indexes that already exist, just as you would with MySQL.

Indexes supported by MongoDB

The concept of an index in MongoDB is the same as in relational databases. An index is generally a small structure, in comparison to the collection size, that provides a faster way to access documents. Without an index, the only way MongoDB can retrieve documents is to do a collection scan: reading all the documents in the collection sequentially. This is exactly the same as a full table scan in MySQL. Another similarity to MySQL is that MongoDB indexes have a structure based on the well-known B-Tree. To understand explain(), you’ll need to understand the index structures.

Let’s have an overview of the index types available in MongoDB, and their features and properties. We focus, in particular, on the single field, compound and multikey index types, since these are by far the most used and most useful in the majority of cases. We’ll also present the other types, but when using explain() in the second part of this article series, we’ll use only single field and compound indexes in the examples.

Single Field Indexes

This is an index built on a single field of the documents. The indexed entry could be a single value, such as a string or a number, but it could also be an embedded document.

By default each collection has a single field index automatically created on the _id field, the primary key.
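
You can verify this with getIndexes(), which on a brand new collection returns a single entry named _id_ with key { "_id" : 1 }:

db.people.getIndexes()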

The index can be defined in ascending or descending order.

Let’s see some examples.

Assume we have a collection people containing the following type of document:

{
  "_id": 1,
  "person": { name: "John", surname: "Brown" },
  "age": 34,
  "city": "New York"
}

We can define, for example, a single field index on the age field.

db.people.createIndex( { age : 1} )

In this case we have created an ascending index. If we had wanted to create a descending index we would have used this syntax:

db.people.createIndex( {age : -1} )

With this kind of index we can improve all the queries that find documents with a condition on the age field, like the following:

db.people.find( { age : 20 } ) 
db.people.find( { name : "Antony", age : 30 } )
db.people.find( { age : { $gt : 25} } )

It’s interesting to highlight that the index can be used to improve even the sorting of the results. In this case, it doesn’t matter whether you have defined the index as ascending or descending. MongoDB can traverse the items in the index in both directions.

db.people.find().sort( { age: 1} )
db.people.find().sort( {age : -1} )

Both these queries use the index to retrieve all of the documents in the specified order.

The next query can use the index to retrieve the documents in the order specified by the sort.

db.people.find( { age : { $lt : 25 } } ).sort( { age : -1} )

Indexes on embedded documents

We can even define an index on an embedded document.

db.people.createIndex( { person: 1 } )

In this case, each item in the index is the embedded document as a whole. The index can be used when the condition in the query matches exactly the embedded document. This query can retrieve the document using the index:

db.people.find( { person : {name : "John", surname: "Brown" } } )

but the next two examples are not able to use the index, and MongoDB will do a collection scan:

db.people.find( { person : {surname : "Brown", name: "John" } } )
db.people.find( { "person.name": "John", "person.surname": "Brown" } )

It is more useful to create indexes on embedded fields than on entire embedded documents. To do this we can use the dot notation.

db.people.createIndex( { "person.name": 1} )
db.people.createIndex( { "person.surname": 1 } )

We have created two separate ascending indexes on the name and surname embedded fields. Now, queries like the following can rely on the new indexes to be resolved.

db.people.find( { "person.name": "John" } )
db.people.find( { "person.surname": "Brown, "person.name": "John" } )

Compound indexes

A compound index is an index on multiple fields. Using the same people collection, we can create a compound index combining, for example, the city, age and person.surname fields.

db.people.createIndex( { city: 1, age: 1, "person.surname": 1 } )

In this case, we have created a compound index where the first entry is the value of the city field, the second is the value of the age field, and the third is the value of the person.surname field. All the fields here are defined in ascending order.

Queries such as the following can benefit from the index:

db.people.find( { city: "Miami", age: { $gt: 50 } } )
db.people.find( { city: "Boston" } )
db.people.find( { city: "Atlanta", age: {$lt: 25}, "person.surname": "Green" } )

The order in which we define the fields is important. In fact, only the left prefix of the index can be used to retrieve the documents. In the examples above, the index can be used because the fields in the find() are always a left prefix of the index definition. The following queries will not use the index, because the field city is not specified in the query. A collection scan is required to solve them.

db.people.find( { age: 20 } )
db.people.find( { age: { $gt: 40 }, "person.surname": "Brown" } )
db.people.find( { "person.surname": "Green" }

The ascending or descending order matters more in the case of compound indexes. We need to pay attention when defining the direction of each field in the index, because not all queries can rely on it, in particular those using the sort() function.

Assume we have the compound index defined as follows:

db.people.createIndex( {city: 1, age: -1} )

When using sort(), the specified sort directions must match either the pattern of the index definition or its exact inverse. Any other pattern will not be supported by the index.

So, the following queries will use the index for sorting:

db.people.find().sort( { city: 1, age: -1 } )
db.people.find().sort( { city: -1, age: 1 } )

The following queries will not use the index:

db.people.find( ).sort( { city: 1, age: 1} )
db.people.find( ).sort( { city: -1, age: -1} )

The following queries will use the index both for retrieving the documents and for sorting.

db.people.find( { city: "New York" } ).sort( city: 1, age: -1)
db.people.find( { city: "Miami", age:{ $gt: 50 } } ).sort( city: -1, age: 1)

They use a left prefix, and the pattern of the sort() stage matches either the index pattern or its inverse exactly.

Multikey indexes

This is the index type for arrays. When creating an index on an array field, MongoDB creates an index entry for every element of the array.

Let’s assume we have these documents:

{
   "_id": 1,
   "person": { name: "John", surname: "Brown" },
   "age": 34,
   "city": "New York",
   "hobbies": [ "music", "gardening", "skiing" ]
 }

The multikey index can be created as:

db.people.createIndex( { hobbies: 1} )

Queries such as these next examples will use the index:

db.people.find( { hobbies: "music" } )
db.people.find( { hobbies: "music", hobbies: "gardening" } )

Geospatial and text indexes

MongoDB also supports specialized indexes for geospatial data and for text searches. A detailed discussion is out of the scope of this article, but they are worth mentioning; we may look at these in a future post.
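For reference, they are created through the same createIndex() call. A minimal sketch, assuming hypothetical places and articles collections:

// 2dsphere index for geospatial queries on GeoJSON data
db.places.createIndex( { location: "2dsphere" } )
// text index for keyword searches on a string field
db.articles.createIndex( { description: "text" } )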

Index options

TTL

The acronym stands for Time To Live. This is a special option that we can apply only to a single field index to permit the automatic deletion of documents after a certain time.

During index creation we can define an expiration time. After that time, all the documents that are older than the expiration time will be removed from the collection. This kind of feature is very useful when we are dealing with data that doesn't need to persist in the database. A good example of this is session data.

Let’s see how to define the TTL option on the sessionlog collection.

db.sessionlog.createIndex( { "lastUpdateTime": 1 }, { expireAfterSeconds: 1800 } )

In this case, MongoDB will drop the documents from the collection automatically once half an hour (1800 seconds) has passed since the value in the lastUpdateTime field.

MongoDB runs a background process every 60 seconds to drop the expired documents. This means the actual deletion of a document can happen with a short delay, rather than precisely after expireAfterSeconds.

There are some restrictions:

  • as already mentioned only single field indexes can have the TTL option
  • the _id single field index cannot support the TTL option
  • the indexed field must be a date type
  • a capped collection cannot have a TTL index

In a replica set, documents are deleted only in the primary node; the background process works only on the primary node, with deletions correctly replicated as per any other event.

Partial indexes

A partial index is an index that contains only a subset of the values based on a filter rule. They are useful in cases where:

  • the index size can be reduced
  • we want to index the most relevant and used values in the query conditions
  • we want to index the most selective values of a field

Here’s how to create a partial index using the clause partialFilterExpression.

db.people.createIndex(
   { "city": 1, "person.surname": 1 },
   { partialFilterExpression: { age : { $lt: 30 } } }
)

We have created a compound index on city and person.surname, but only for the documents with age less than 30.

In order for the partial index to be used, the query must contain a condition on the age field that falls within the filter expression (age less than 30, in this case).

db.people.find( { city: "New York", age: { $eq: 20 } } )

The following queries cannot use the partial index because the results would be incomplete. In these examples, MongoDB will use a collection scan:

db.people.find( { city: "Miami" } )
db.people.find( { city: "Orlando", age : { $gt: 25 } } )

Sparse indexes

Sparse indexes are a subset of partial indexes. A sparse index only contains elements for the documents that have the indexed field, even if it is null.

Since MongoDB is a schemaless database, the documents in a collection can have different fields, so an indexed field may not be present in some of them.

To create such an index use the sparse option:

db.people.createIndex( { city: 1 }, { sparse: true } )

In this case we are assuming there could be documents in the collection with the field city missing.

Regular indexes—without the sparse option—contain an entry for every document of the collection, storing a null value for the documents that don't contain the indexed field.

Sparse indexes are based on the existence of a field in the documents, and are useful to reduce the size of the index. Since they offer only a subset of what partial indexes can do, if you have to decide between the two, you should choose partial indexes.

Defining a partial index with the following filter is equivalent to having a sparse index:

db.people.createIndex(
  { city: 1 },
  { partialFilterExpression: { city: { $exists: true }  } }
)

Unique indexes

MongoDB can create an index as unique. An index defined this way cannot contain duplicate entries.

To create such an index use the unique option.

db.people.createIndex( { city: 1 }, { unique: true } )

Uniqueness can be defined for compound indexes too.

db.people.createIndex( { city: 1, "person.surname": 1 }, { unique: true } )

By default the index on _id is automatically created as unique.

You cannot create the index as unique if the collection already contains documents with duplicate values for the proposed key.

As we know, documents with a missing indexed field are inserted in the index with a null value, even in the case of a unique index. Consequently, only one such null value is permitted because of the uniqueness constraint. Our suggestion is to avoid missing fields when creating a unique index.
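A minimal sketch of the pitfall, assuming a hypothetical unique index on an email field:

db.people.createIndex( { email: 1 }, { unique: true } )
db.people.insert( { name: "John" } )  // indexed with a null email value
db.people.insert( { name: "Mark" } )  // fails with a duplicate key error on the null entry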

A couple of useful commands

Here are a couple of useful commands when dealing with indexes.

List of the indexes in a collection

Use the getIndexes() method.

MongoDB > db.restaurants.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "test.restaurants"
	},
	{
		"v" : 2,
		"key" : {
			"borough" : 1
		},
		"name" : "borough_1",
		"ns" : "test.restaurants"
	},
	{
		"v" : 2,
		"key" : {
			"cuisine" : 1
		},
		"name" : "cuisine_1",
		"ns" : "test.restaurants"
	},
	{
		"v" : 2,
		"key" : {
			"cuisine" : 1,
			"grades.score" : 1
		},
		"name" : "cuisine_1_grades.score_1",
		"ns" : "test.restaurants"
	}
]

In this sample collection we have 4 indexes, including the _id.

Drop an existing index

Use the dropIndex() method passing the name of the index to be dropped.

MongoDB > db.restaurants.dropIndex("cuisine_1_grades.score_1")
{ "nIndexesWas" : 4, "ok" : 1 }

Conclusions

In this first part of a two-part series I’ve described the indexes available in MongoDB, and how you can create and use them. We’ve also discussed some of their main features. In the next part, we’ll focus on the explain() method, and using some examples we’ll show how we can investigate queries and, if possible, how we can optimize them.

You won’t have to wait long! Come back on Thursday to read about MongoDB explain().

While you are here

If you found this article interesting, you might like to check out some of our other resources. For example, you might find this recorded webinar MongoDB Sharding 101 by my colleague Adamo Tonete to be useful. Or perhaps Tim Vaillancourt’s MongoDB backup and recovery field guide would provide some useful information. If you want to catch our webinars as they happen, please subscribe to receive notifications in good time.

The post MongoDB: index usage and MongoDB explain() (part 1) appeared first on Percona Database Performance Blog.

by Corrado Pandiani at September 04, 2018 11:37 AM

September 03, 2018

Peter Zaitsev

Monitoring S.M.A.R.T. Metrics with Prometheus and PMM


In his excellent blog post, Pavel Trukhanov showed the value of S.M.A.R.T. metric collections, so I wondered how hard it would be to enable their collection in Percona Monitoring and Management (PMM).

A quick search led me to the text_collector plugin SmartMon, which can be easily integrated with any Prometheus installation.

For PMM, Vadim Yalovets recently showed how to do custom integrations based on text_collector.

Let’s put those together:

  1. Ensure you have the smartctl tool installed. It is available in the repositories of most Linux distributions.
  2. Get smartmon.sh and place it in /usr/local/bin or another location.
  3. Install the cron job
    echo "*/5 * * * * root bash /usr/local/bin/smartmon.sh > /tmp/smart_metrics.prom" > /etc/cron.d/smartmon
  4. Enable textfile_collector as described in this blog post (see the sketch below for standalone setups)
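If you run a standalone node_exporter rather than PMM, step 4 boils down to pointing the textfile collector at the directory the cron job writes to. A sketch, using the flag syntax of recent node_exporter versions:

node_exporter --collector.textfile.directory=/tmp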

That’s it! You should get your data flowing. Now you can use Prometheus to query device information:

use prometheus to query device

Or if you want to get a specific S.M.A.R.T value, such as media_wearout indicator:

specific smart value wearout indicator
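As a sketch, such queries are simply the metric name plus optional label filters. Both metric names below appear in the sample output further down; substitute whichever attributes your drives actually expose:

smartmon_device_info
smartmon_temperature_celsius_raw_value{disk="/dev/sda"}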

If you would like to see a nicer visualization in Grafana, you can install the appropriate dashboard from the Grafana web site.

visualized using Grafana

The number and kind of metrics you’re going to get depend on the storage device vendor and model. Here is an example list from one of my test systems:

# HELP smartmon_smartctl_version SMART metric smartctl_version
# TYPE smartmon_smartctl_version gauge
smartmon_smartctl_version{version="6.5"} 1
# HELP smartmon_current_pending_sector_raw_value SMART metric current_pending_sector_raw_value
# TYPE smartmon_current_pending_sector_raw_value gauge
smartmon_current_pending_sector_raw_value{disk="/dev/sda",type="sat",smart_id="197"} 0.000000e+00
# HELP smartmon_current_pending_sector_threshold SMART metric current_pending_sector_threshold
# TYPE smartmon_current_pending_sector_threshold gauge
smartmon_current_pending_sector_threshold{disk="/dev/sda",type="sat",smart_id="197"} 0
# HELP smartmon_current_pending_sector_value SMART metric current_pending_sector_value
# TYPE smartmon_current_pending_sector_value gauge
smartmon_current_pending_sector_value{disk="/dev/sda",type="sat",smart_id="197"} 100
# HELP smartmon_current_pending_sector_worst SMART metric current_pending_sector_worst
# TYPE smartmon_current_pending_sector_worst gauge
smartmon_current_pending_sector_worst{disk="/dev/sda",type="sat",smart_id="197"} 100
# HELP smartmon_device_info SMART metric device_info
# TYPE smartmon_device_info gauge
smartmon_device_info{disk="/dev/sda",type="sat",vendor="",product="",revision="",lun_id="",model_family="",device_model="Crucial_CT275MX300SSD1",serial_number="16431465B53F",firmware_version="M0CR031"} 1
# HELP smartmon_device_smart_available SMART metric device_smart_available
# TYPE smartmon_device_smart_available gauge
smartmon_device_smart_available{disk="/dev/sda",type="sat"} 1
# HELP smartmon_device_smart_enabled SMART metric device_smart_enabled
# TYPE smartmon_device_smart_enabled gauge
smartmon_device_smart_enabled{disk="/dev/sda",type="sat"} 1
# HELP smartmon_device_smart_healthy SMART metric device_smart_healthy
# TYPE smartmon_device_smart_healthy gauge
smartmon_device_smart_healthy{disk="/dev/sda",type="sat"} 1
# HELP smartmon_end_to_end_error_raw_value SMART metric end_to_end_error_raw_value
# TYPE smartmon_end_to_end_error_raw_value gauge
smartmon_end_to_end_error_raw_value{disk="/dev/sda",type="sat",smart_id="184"} 0.000000e+00
# HELP smartmon_end_to_end_error_threshold SMART metric end_to_end_error_threshold
# TYPE smartmon_end_to_end_error_threshold gauge
smartmon_end_to_end_error_threshold{disk="/dev/sda",type="sat",smart_id="184"} 0
# HELP smartmon_end_to_end_error_value SMART metric end_to_end_error_value
# TYPE smartmon_end_to_end_error_value gauge
smartmon_end_to_end_error_value{disk="/dev/sda",type="sat",smart_id="184"} 100
# HELP smartmon_end_to_end_error_worst SMART metric end_to_end_error_worst
# TYPE smartmon_end_to_end_error_worst gauge
smartmon_end_to_end_error_worst{disk="/dev/sda",type="sat",smart_id="184"} 100
# HELP smartmon_offline_uncorrectable_raw_value SMART metric offline_uncorrectable_raw_value
# TYPE smartmon_offline_uncorrectable_raw_value gauge
smartmon_offline_uncorrectable_raw_value{disk="/dev/sda",type="sat",smart_id="198"} 0.000000e+00
# HELP smartmon_offline_uncorrectable_threshold SMART metric offline_uncorrectable_threshold
# TYPE smartmon_offline_uncorrectable_threshold gauge
smartmon_offline_uncorrectable_threshold{disk="/dev/sda",type="sat",smart_id="198"} 0
# HELP smartmon_offline_uncorrectable_value SMART metric offline_uncorrectable_value
# TYPE smartmon_offline_uncorrectable_value gauge
smartmon_offline_uncorrectable_value{disk="/dev/sda",type="sat",smart_id="198"} 100
# HELP smartmon_offline_uncorrectable_worst SMART metric offline_uncorrectable_worst
# TYPE smartmon_offline_uncorrectable_worst gauge
smartmon_offline_uncorrectable_worst{disk="/dev/sda",type="sat",smart_id="198"} 100
# HELP smartmon_power_cycle_count_raw_value SMART metric power_cycle_count_raw_value
# TYPE smartmon_power_cycle_count_raw_value gauge
smartmon_power_cycle_count_raw_value{disk="/dev/sda",type="sat",smart_id="12"} 2.000000e+01
# HELP smartmon_power_cycle_count_threshold SMART metric power_cycle_count_threshold
# TYPE smartmon_power_cycle_count_threshold gauge
smartmon_power_cycle_count_threshold{disk="/dev/sda",type="sat",smart_id="12"} 0
# HELP smartmon_power_cycle_count_value SMART metric power_cycle_count_value
# TYPE smartmon_power_cycle_count_value gauge
smartmon_power_cycle_count_value{disk="/dev/sda",type="sat",smart_id="12"} 100
# HELP smartmon_power_cycle_count_worst SMART metric power_cycle_count_worst
# TYPE smartmon_power_cycle_count_worst gauge
smartmon_power_cycle_count_worst{disk="/dev/sda",type="sat",smart_id="12"} 100
# HELP smartmon_power_on_hours_raw_value SMART metric power_on_hours_raw_value
# TYPE smartmon_power_on_hours_raw_value gauge
smartmon_power_on_hours_raw_value{disk="/dev/sda",type="sat",smart_id="9"} 1.313300e+04
# HELP smartmon_power_on_hours_threshold SMART metric power_on_hours_threshold
# TYPE smartmon_power_on_hours_threshold gauge
smartmon_power_on_hours_threshold{disk="/dev/sda",type="sat",smart_id="9"} 0
# HELP smartmon_power_on_hours_value SMART metric power_on_hours_value
# TYPE smartmon_power_on_hours_value gauge
smartmon_power_on_hours_value{disk="/dev/sda",type="sat",smart_id="9"} 100
# HELP smartmon_power_on_hours_worst SMART metric power_on_hours_worst
# TYPE smartmon_power_on_hours_worst gauge
smartmon_power_on_hours_worst{disk="/dev/sda",type="sat",smart_id="9"} 100
# HELP smartmon_raw_read_error_rate_raw_value SMART metric raw_read_error_rate_raw_value
# TYPE smartmon_raw_read_error_rate_raw_value gauge
smartmon_raw_read_error_rate_raw_value{disk="/dev/sda",type="sat",smart_id="1"} 0.000000e+00
# HELP smartmon_raw_read_error_rate_threshold SMART metric raw_read_error_rate_threshold
# TYPE smartmon_raw_read_error_rate_threshold gauge
smartmon_raw_read_error_rate_threshold{disk="/dev/sda",type="sat",smart_id="1"} 0
# HELP smartmon_raw_read_error_rate_value SMART metric raw_read_error_rate_value
# TYPE smartmon_raw_read_error_rate_value gauge
smartmon_raw_read_error_rate_value{disk="/dev/sda",type="sat",smart_id="1"} 100
# HELP smartmon_raw_read_error_rate_worst SMART metric raw_read_error_rate_worst
# TYPE smartmon_raw_read_error_rate_worst gauge
smartmon_raw_read_error_rate_worst{disk="/dev/sda",type="sat",smart_id="1"} 100
# HELP smartmon_reallocated_sector_ct_raw_value SMART metric reallocated_sector_ct_raw_value
# TYPE smartmon_reallocated_sector_ct_raw_value gauge
smartmon_reallocated_sector_ct_raw_value{disk="/dev/sda",type="sat",smart_id="5"} 0.000000e+00
# HELP smartmon_reallocated_sector_ct_threshold SMART metric reallocated_sector_ct_threshold
# TYPE smartmon_reallocated_sector_ct_threshold gauge
smartmon_reallocated_sector_ct_threshold{disk="/dev/sda",type="sat",smart_id="5"} 10
# HELP smartmon_reallocated_sector_ct_value SMART metric reallocated_sector_ct_value
# TYPE smartmon_reallocated_sector_ct_value gauge
smartmon_reallocated_sector_ct_value{disk="/dev/sda",type="sat",smart_id="5"} 100
# HELP smartmon_reallocated_sector_ct_worst SMART metric reallocated_sector_ct_worst
# TYPE smartmon_reallocated_sector_ct_worst gauge
smartmon_reallocated_sector_ct_worst{disk="/dev/sda",type="sat",smart_id="5"} 100
# HELP smartmon_reported_uncorrect_raw_value SMART metric reported_uncorrect_raw_value
# TYPE smartmon_reported_uncorrect_raw_value gauge
smartmon_reported_uncorrect_raw_value{disk="/dev/sda",type="sat",smart_id="187"} 0.000000e+00
# HELP smartmon_reported_uncorrect_threshold SMART metric reported_uncorrect_threshold
# TYPE smartmon_reported_uncorrect_threshold gauge
smartmon_reported_uncorrect_threshold{disk="/dev/sda",type="sat",smart_id="187"} 0
# HELP smartmon_reported_uncorrect_value SMART metric reported_uncorrect_value
# TYPE smartmon_reported_uncorrect_value gauge
smartmon_reported_uncorrect_value{disk="/dev/sda",type="sat",smart_id="187"} 100
# HELP smartmon_reported_uncorrect_worst SMART metric reported_uncorrect_worst
# TYPE smartmon_reported_uncorrect_worst gauge
smartmon_reported_uncorrect_worst{disk="/dev/sda",type="sat",smart_id="187"} 100
# HELP smartmon_smartctl_run SMART metric smartctl_run
# TYPE smartmon_smartctl_run gauge
smartmon_smartctl_run{disk="/dev/sda",type="sat"} 1535666337
# HELP smartmon_temperature_celsius_raw_value SMART metric temperature_celsius_raw_value
# TYPE smartmon_temperature_celsius_raw_value gauge
smartmon_temperature_celsius_raw_value{disk="/dev/sda",type="sat",smart_id="194"} 3.100000e+01
# HELP smartmon_temperature_celsius_threshold SMART metric temperature_celsius_threshold
# TYPE smartmon_temperature_celsius_threshold gauge
smartmon_temperature_celsius_threshold{disk="/dev/sda",type="sat",smart_id="194"} 0
# HELP smartmon_temperature_celsius_value SMART metric temperature_celsius_value
# TYPE smartmon_temperature_celsius_value gauge
smartmon_temperature_celsius_value{disk="/dev/sda",type="sat",smart_id="194"} 69
# HELP smartmon_temperature_celsius_worst SMART metric temperature_celsius_worst
# TYPE smartmon_temperature_celsius_worst gauge
smartmon_temperature_celsius_worst{disk="/dev/sda",type="sat",smart_id="194"} 59
# HELP smartmon_udma_crc_error_count_raw_value SMART metric udma_crc_error_count_raw_value
# TYPE smartmon_udma_crc_error_count_raw_value gauge
smartmon_udma_crc_error_count_raw_value{disk="/dev/sda",type="sat",smart_id="199"} 0.000000e+00
# HELP smartmon_udma_crc_error_count_threshold SMART metric udma_crc_error_count_threshold
# TYPE smartmon_udma_crc_error_count_threshold gauge
smartmon_udma_crc_error_count_threshold{disk="/dev/sda",type="sat",smart_id="199"} 0
# HELP smartmon_udma_crc_error_count_value SMART metric udma_crc_error_count_value
# TYPE smartmon_udma_crc_error_count_value gauge
smartmon_udma_crc_error_count_value{disk="/dev/sda",type="sat",smart_id="199"} 100
# HELP smartmon_udma_crc_error_count_worst SMART metric udma_crc_error_count_worst
# TYPE smartmon_udma_crc_error_count_worst gauge
smartmon_udma_crc_error_count_worst{disk="/dev/sda",type="sat",smart_id="199"} 100

The post Monitoring S.M.A.R.T. Metrics with Prometheus and PMM appeared first on Percona Database Performance Blog.

by Peter Zaitsev at September 03, 2018 06:58 PM

40 million tables in MySQL 8.0 with ZFS

40 million tables in MySQL 8

In my previous blog post about millions of tables in MySQL 8, I was able to create one million tables and test the server’s performance with them. My next challenge is to create 40 million tables in MySQL 8 using shared tablespaces (one tablespace per schema). In this blog post I’m showing how to do it and what challenges we can expect.

Background

Once again – why do we need so many tables in MySQL, what is the use case? The main reason is: customer isolation. With the new focus on security and privacy (take GDPR for example) it is much easier and more beneficial to create a separate schema (or “database” in MySQL terms) for each customer. That creates a new set of challenges that we will need to solve. Here is the summary:

  1. Too many files. For each table MySQL creates an FRM file. With MySQL 8.0, this is not the case for InnoDB tables (new data dictionary): it does not create FRM files, only IBD files.
  2. Too much storage overhead. Just to create 40 million tables we will need to have ~4–5TB of space. The ZFS filesystem can help here a lot, through compression – see below.
  3. MySQL does not work well with so many tables. We have observed a lot of overhead (MySQL needs to open/close table definition files) and contention (table definitions need to be stored in memory to avoid a performance penalty, which introduces mutex contention).

Challenges

When I approached the task of creating 40 million tables, my first challenge was disk space. Just to create them, I needed at least 5Tb of fast disk storage. The good news is: we have the ZFS filesystem which provides compression out of the box. With compression I was able to use just a 250G drive with ZFS – the compression ratio is > 10x:

# du -sh --apparent-size /var/lib/mysql-data
4.7T    /var/lib/mysql-data
# du -sh /var/lib/mysql-data
131G    /var/lib/mysql-data

The second challenge is how to create those tables in a reasonable amount of time. I created a script to “provision” the databases (create all 40 million tables). The good news is that the performance regression in “create table” speed and the scalability bug were fixed, so I was able to use this script to create 40 million tables using shared tablespaces (one tablespace per schema):

#!/bin/bash
function do_db {
        db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
        if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi;
        mysql -vvv -e "create database $db";
        mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
        for i in {1..100}
        do
                table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
                mysql $db -e "$table"
        done
}
c=0
for m in {1..4000000}
do
        for i in {1..40}
        do
                let c=$c+1
                echo $c
                db="sbtest_$c"
                do_db &
        done
        wait
        #if [ $c > 4000000 ]; then exit; fi
done

40 million tables in MySQL 8

Now it’s time for a real test. I’m using the latest MySQL 8 version (at the time of writing): 8.0.12. This implements the new data dictionary.

MySQL config file:

[mysqld]
datadir=/var/lib/mysql-data
socket=/var/lib/mysql-data/mysql.sock
log-error = /var/lib/mysql-log/error.log
server_id = 12345
log_bin = /var/lib/mysql-log/binlog
relay_log=/var/lib/mysql-log/relay-bin
skip-log-bin=1
innodb_log_group_home_dir = /var/lib/mysql-log
innodb_doublewrite = 0
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=2G
innodb_buffer_pool_size=4G
tablespace_definition_cache = 524288
schema_definition_cache = 524288
table_definition_cache = 524288
table_open_cache=524288
open-files-limit=1000000

Sysbench shell script:

function run_sb() {
conn=" --db-driver=mysql --mysql-socket=/var/lib/mysql-data/mysql.sock  --mysql-db=sbtest_1 --mysql-user=sbtest --mysql-password=abc "
sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt
}
let db_count=400000
table_count=100
max_time=10000
num_threads=32
run_sb

Sysbench lua script:

pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/"
if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end
function thread_init(thread_id)
   set_vars()
end
function event()
   local table_name
   local i
   local c_val
   local k_val
   local pad_val
   oltp_db_count = tonumber(oltp_db_count) or 1
   -- local oltp_db_count = 4
   table_name = "sbtest_" .. sb_rand(1, oltp_db_count)..".sbtest".. sb_rand(1, oltp_tables_count)
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])
      rs = db_query("SELECT id FROM " .. table_name .." LIMIT 1")
end

Please note that the tables are empty – no data.

Now we can run the benchmark. Unfortunately, we hit serious mutex contention in the data dictionary. Here are the results:

[ 453s ] thds: 32 tps: 1203.96 qps: 1203.96 (r/w/o: 1203.96/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 454s ] thds: 32 tps: 1202.32 qps: 1202.32 (r/w/o: 1202.32/0.00/0.00) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00
[ 455s ] thds: 32 tps: 1196.74 qps: 1196.74 (r/w/o: 1196.74/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 456s ] thds: 32 tps: 1197.18 qps: 1197.18 (r/w/o: 1197.18/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 457s ] thds: 32 tps: 887.11 qps: 887.11 (r/w/o: 887.11/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 458s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 459s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 460s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 461s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 462s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 463s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 464s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 465s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 466s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 467s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 468s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 469s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 470s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 471s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 472s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 473s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 474s ] thds: 32 tps: 403.96 qps: 403.96 (r/w/o: 403.96/0.00/0.00) lat (ms,95%): 16819.24 err/s: 0.00 reconn/s: 0.00
[ 475s ] thds: 32 tps: 1196.00 qps: 1196.00 (r/w/o: 1196.00/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 476s ] thds: 32 tps: 1208.96 qps: 1208.96 (r/w/o: 1208.96/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 477s ] thds: 32 tps: 1192.06 qps: 1192.06 (r/w/o: 1192.06/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 478s ] thds: 32 tps: 1173.89 qps: 1173.89 (r/w/o: 1173.89/0.00/0.00) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00

As we can see, for ~15 seconds no queries were processed: a complete MySQL stall. That situation – complete stall – happens constantly, every ~25-30 seconds.

SHOW ENGINE INNODB STATUS shows the mutex contention:

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 498635
--Thread 140456572004096 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451898689280 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451896919808 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140456571119360 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457044215552 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140456572299008 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457043035904 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140456571709184 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451897214720 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451896624896 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457042740992 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451899279104 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457042446080 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
OS WAIT ARRAY INFO: signal count 89024
RW-shared spins 11216, rounds 14847, OS waits 3641

I’ve filed a new MySQL bug: DICT_SYS mutex contention causes complete stall when running with 40 mill tables.

I’ve also tested with the pareto distribution in sysbench, and even set the ratio to 0.05 (5%) and 0.01 (1%); the mutex contention is still an issue. I have used the following updated sysbench script:

function run_sb() {
conn=" --db-driver=mysql --mysql-socket=/var/lib/mysql-data/mysql.sock  --mysql-db=sbtest_1 --mysql-user=sbtest --mysql-password=abc "
sysbench $conn --rand-type=$rand_type --rand-pareto-h=$pareto_h --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time $test_name run | tee -a sysbench_2.txt
}
let db_count=400000
table_count=100
max_time=10000
num_threads=32
rand_type="pareto"
pareto_h=0.01
test_name="./select_custom.lua"
echo "Now running $rand_type for $max_time seconds, test=$test_name"
run_sb

And the results with 0.01 (1%) are the following:

[ 55s ] thds: 32 tps: 72465.29 qps: 72465.29 (r/w/o: 72465.29/0.00/0.00) lat (ms,95%): 0.53 err/s: 0.00 reconn/s: 0.00
[ 56s ] thds: 32 tps: 68641.04 qps: 68641.04 (r/w/o: 68641.04/0.00/0.00) lat (ms,95%): 0.61 err/s: 0.00 reconn/s: 0.00
[ 57s ] thds: 32 tps: 70479.82 qps: 70479.82 (r/w/o: 70479.82/0.00/0.00) lat (ms,95%): 0.57 err/s: 0.00 reconn/s: 0.00
[ 58s ] thds: 32 tps: 31395.55 qps: 31395.55 (r/w/o: 31395.55/0.00/0.00) lat (ms,95%): 0.49 err/s: 0.00 reconn/s: 0.00
[ 59s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 61s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 62s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 63s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 64s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 65s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 66s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 67s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 68s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 69s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 70s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 71s ] thds: 32 tps: 18879.04 qps: 18879.04 (r/w/o: 18879.04/0.00/0.00) lat (ms,95%): 0.75 err/s: 0.00 reconn/s: 0.00
[ 72s ] thds: 32 tps: 70924.82 qps: 70924.82 (r/w/o: 70924.82/0.00/0.00) lat (ms,95%): 0.48 err/s: 0.00 reconn/s: 0.00
[ 73s ] thds: 32 tps: 72395.57 qps: 72395.57 (r/w/o: 72395.57/0.00/0.00) lat (ms,95%): 0.47 err/s: 0.00 reconn/s: 0.00
[ 74s ] thds: 32 tps: 72483.22 qps: 72484.22 (r/w/o: 72484.22/0.00/0.00) lat (ms,95%): 0.58 err/s: 0.00 reconn/s: 0.00

ZFS

The ZFS filesystem provides compression, which helps tremendously in this case. When MySQL creates an InnoDB table it will create a new .ibd file and pre-allocate some pages, which are initially blank. I have configured ZFS compression and can see > 10x compression ratio:

# zfs get all | grep compressratio
mysqldata                       compressratio         12.47x                      -
mysqldata                       refcompressratio      1.00x                       -
mysqldata/mysql                 compressratio         12.47x                      -
mysqldata/mysql                 refcompressratio      1.00x                       -
mysqldata/mysql/data            compressratio         12.51x                      -
mysqldata/mysql/data            refcompressratio      12.54x                      -
mysqldata/mysql/log             compressratio         2.79x                       -
mysqldata/mysql/log             refcompressratio      4.57x                       -
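For reference, here is a sketch of how datasets like these could be created with gzip compression enabled. The dataset names match the output above; the choice of gzip follows the conclusion below, and everything else is illustrative:

# create the datasets and enable gzip compression (children inherit the property)
zfs create mysqldata/mysql
zfs create mysqldata/mysql/data
zfs create mysqldata/mysql/log
zfs set compression=gzip mysqldata/mysql
# verify the setting recursively
zfs get -r compression mysqldata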

Conclusion

It is possible to create 40 million tables with MySQL 8.0 using shared tablespaces. ZFS provides an excellent compression ratio (with gzip) which can help by reducing the overhead of “schema per customer” architecture. Unfortunately, the new data dictionary in MySQL 8.0.12 suffers from the DICT_SYS mutex contention and causes constant “stalls”.

The post 40 million tables in MySQL 8.0 with ZFS appeared first on Percona Database Performance Blog.

by Alexander Rubin at September 03, 2018 12:02 PM

August 31, 2018

Peter Zaitsev

This Week in Data With Colin Charles 51: Debates Emerging on the Relicensing of OSS

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

There has been a lot of talk around licenses in open source software, and it has hit the database world in the past weeks. Redis Labs relicensed some AGPL software to the Commons Clause (in their case, Apache + Commons Clause; so you can’t really call it Apache any longer). I’ll have more to say on this topic soon, but in the meantime you might enjoy reading Open-source licensing war: Commons Clause. This was the most balanced article I read about this move and the kerfuffle it has caused. We also saw this with Lerna (not database related), and here’s another good read: Open Source Devs Reverse Decision to Block ICE Contractors From Using Software.

Reviewing is under way for Percona Live Europe 2018 talks: the review of the tutorials is complete. We can expect to see a schedule by mid-September, so hang in there—I’ve received a lot of messages asking if talks are going to be approved or not.

Releases

  • While not a new release, MySQL Shell 8.0.12 is worth spending some time with, especially since you might enjoy the pluggable password store.
  • SqlKata for C# – SqlKata is an elegant SQL query builder for C#. It gives you a higher order of freedom when talking to your database engine and allows you to write complex queries in an object-oriented manner. Works with MySQL, PostgreSQL, and more.

Link List

Industry Updates

  • Balazs Pocze is now a database SRE at Wikimedia Foundation. He has spoken at several Percona Live events too!

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.


The post This Week in Data With Colin Charles 51: Debates Emerging on the Relicensing of OSS appeared first on Percona Database Performance Blog.

by Colin Charles at August 31, 2018 05:29 PM

Webinar Wed 9/5: Choosing the Right Open Source Database

open source database

Please join Percona’s CEO, Peter Zaitsev as he presents Choosing the Right Open Source Database on Wednesday, September 5th, 2018 at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).


The world of open-source databases is overwhelming. There are dozens of types of databases – relational DBMS, time-series, graph, document, and so on – not to mention the dozens of software options within each of those categories. More and more, the strategies of enterprises involve open source software and open source database software. This allows companies to be more agile, quick to market, and cost-effective.

So how do you know which open source database to choose? Open-source database companies worldwide are competing for our attention. In this talk, we will distill the madness. I will give an overview of the types of open-source databases, use-cases, and key players in the ecosystem.

Register for this webinar on choosing the right open source database.

The post Webinar Wed 9/5: Choosing the Right Open Source Database appeared first on Percona Database Performance Blog.

by Peter Zaitsev at August 31, 2018 05:14 PM

Tuning PostgreSQL Database Parameters to Optimize Performance

PostgreSQL parameters for database performance tuning

Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. Default values are set to ensure that PostgreSQL runs everywhere, with the fewest resources it can consume, and without introducing vulnerabilities. It is primarily the responsibility of the database administrator or developer to tune PostgreSQL according to their system’s workload. In this blog, we will establish basic guidelines for setting PostgreSQL database parameters to improve database performance according to workload.

Bear in mind that while optimizing PostgreSQL server configuration improves performance, a database developer must also be diligent when writing queries for the application. If queries perform full table scans where an index could be used or perform heavy joins or expensive aggregate operations, then the system can still perform poorly even if the database parameters are tuned. It is important to pay attention to performance when writing database queries.

Nevertheless, database parameters are very important too, so let’s take a look at the eight that have the greatest potential to improve performance.

PostgreSQL’s Tuneable Parameters

shared_buffers

PostgreSQL uses its own buffer and also uses kernel buffered IO. That means data is stored in memory twice, first in the PostgreSQL buffer and then in the kernel buffer. Unlike other databases, PostgreSQL does not provide direct IO. This is called double buffering. The PostgreSQL buffer is configured by shared_buffers, which is the most effective tunable parameter for most operating systems. This parameter sets how much dedicated memory will be used by PostgreSQL for its cache.

The default value of shared_buffers is set very low and you will not get much benefit from it. It’s low because certain machines and operating systems do not support higher values. But on most modern machines, you need to increase this value for optimal performance.

The recommended value is 25% of your total machine RAM. You should try some lower and higher values because in some cases we achieve good performance with a setting over 25%. The configuration really depends on your machine and the working data set. If your working set of data can easily fit into your RAM, then you might want to increase the shared_buffers value to contain your entire database, so that the whole working set of data can reside in cache. That said, you obviously do not want to reserve all RAM for PostgreSQL.

In production environments, it is observed that a large value for shared_buffers gives really good performance, though you should always benchmark to find the right balance.

testdb=# SHOW shared_buffers;
shared_buffers
----------------
128MB
(1 row)
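As a sketch, applying the 25% guideline on a machine with 32GB of RAM might look like this; the value is purely illustrative, and a server restart is required for shared_buffers changes to take effect:

testdb=# ALTER SYSTEM SET shared_buffers = '8GB';
ALTER SYSTEM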

Note: Be careful, as some kernels do not allow a bigger value; specifically, on Windows there is no benefit to a higher value.

wal_buffers

PostgreSQL writes its WAL (write ahead log) records into buffers, and then these buffers are flushed to disk. The default size of the buffer, defined by wal_buffers, is 16MB, but if you have a lot of concurrent connections then a higher value can give better performance.

effective_cache_size

The effective_cache_size provides an estimate of the memory available for disk caching. It is just a guideline, not the exact allocated memory or cache size. It does not allocate actual memory but tells the optimizer the amount of cache available in the kernel. If this value is set too low, the query planner can decide not to use some indexes, even if they’d be helpful. Therefore, setting a generously large—but realistic—value is beneficial.
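Since this parameter is only a hint to the planner rather than an allocation, it can be changed without a restart. A sketch, assuming the OS typically has around 24GB of file cache available:

testdb=# ALTER SYSTEM SET effective_cache_size = '24GB';
ALTER SYSTEM
testdb=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)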

work_mem

This configuration is used for complex sorting. If you have to do complex sorting, then increase the value of work_mem for good results. In-memory sorts are much faster than sorts spilling to disk. Setting a very high value can cause a memory bottleneck for your deployment environment, because this parameter applies per user sort operation. Therefore, if you have many users trying to execute sort operations, the system will allocate work_mem * total sort operations for all users. Setting this parameter globally can cause very high memory usage, so it is highly recommended to modify it at the session level.

testdb=# SET work_mem TO "2MB";
testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
Gather Merge  (cost=509181.84..1706542.14 rows=10000116 width=24)
   Workers Planned: 4
   ->  Sort  (cost=508181.79..514431.86 rows=2500029 width=24)
         Sort Key: b
         ->  Parallel Seq Scan on bar  (cost=0.00..88695.29 rows=2500029 width=24)
(5 rows)

The initial query’s sort node has an estimated cost of 514431.86. Cost is an arbitrary unit of computation. For the above query, we have a work_mem of only 2MB. For testing purposes, let’s increase this to 256MB and see if there is any impact on cost.

testdb=# SET work_mem TO "256MB";
testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
Gather Merge  (cost=355367.34..1552727.64 rows=10000116 width=24)
   Workers Planned: 4
   ->  Sort  (cost=354367.29..360617.36 rows=2500029 width=24)
         Sort Key: b
         ->  Parallel Seq Scan on bar  (cost=0.00..88695.29 rows=2500029 width=24)

The query cost is reduced to 360617.36 from 514431.86 — a 30% reduction.

maintenance_work_mem

maintenance_work_mem is a memory setting used for maintenance tasks. The default value is 64MB. Setting a large value helps in tasks like VACUUM, RESTORE, CREATE INDEX, ADD FOREIGN KEY and ALTER TABLE.

postgres=# CHECKPOINT;
postgres=# SET maintenance_work_mem to '10MB';
postgres=# CREATE INDEX foo_idx ON foo (c);
CREATE INDEX
Time: 170091.371 ms (02:50.091)

postgres=# CHECKPOINT;
postgres=# set maintenance_work_mem to '256MB';
postgres=# CREATE INDEX foo_idx ON foo (c);
CREATE INDEX
Time: 111274.903 ms (01:51.275)

The index creation time is 170091.371ms when maintenance_work_mem is set to only 10MB, but that is reduced to 111274.903 ms when we increase maintenance_work_mem setting to 256MB.

synchronous_commit

This parameter enforces that a commit will wait for the WAL to be written to disk before returning a success status to the client. This is a trade-off between performance and reliability. If your application is designed such that performance is more important than reliability, then turn off synchronous_commit. This means that there will be a time gap between the success status and a guaranteed write to disk. In the case of a server crash, data might be lost even though the client received a success message on commit. In this case, a transaction commits very quickly because it will not wait for a WAL file to be flushed, but reliability is compromised.
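Note that synchronous_commit does not have to be disabled globally: it can be relaxed per session or even per transaction, limiting the durability trade-off to the parts of the application that can tolerate it. A sketch:

-- relax durability for the current session only
SET synchronous_commit TO off;
-- or for a single transaction
BEGIN;
SET LOCAL synchronous_commit TO off;
-- ... statements whose commit may lag the success status ...
COMMIT;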

checkpoint_timeout, checkpoint_completion_target

PostgreSQL writes changes into WAL. The checkpoint process flushes the data into the data files. This activity is done when a CHECKPOINT occurs. This is an expensive operation and can cause a huge amount of IO, since the whole process involves expensive disk read/write operations. Users can issue CHECKPOINT whenever it seems necessary, or automate it with PostgreSQL’s parameters checkpoint_timeout and checkpoint_completion_target.

The checkpoint_timeout parameter is used to set time between WAL checkpoints. Setting this too low decreases crash recovery time, as more data is written to disk, but it hurts performance too since every checkpoint ends up consuming valuable system resources. The checkpoint_completion_target is the fraction of time between checkpoints for checkpoint completion. A high frequency of checkpoints can impact performance. For smooth checkpointing, checkpoint_timeout must be a low value. Otherwise the OS will accumulate all the dirty pages until the ratio is met and then go for a big flush.
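A sketch of how this pair might be set in postgresql.conf; the values are illustrative, not recommendations for any particular workload:

# postgresql.conf
checkpoint_timeout = 15min            # time between automatic WAL checkpoints
checkpoint_completion_target = 0.9    # spread checkpoint writes over 90% of the interval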

Conclusion

There are more parameters that can be tuned to gain better performance, but they have less impact than the ones highlighted here. In the end, we must always keep in mind that not all parameters are relevant for all application types. Some applications perform better by tuning a parameter and some don’t. Database parameters must be tuned for the specific needs of an application and the OS it runs on.

Related posts

You can read my post about tuning Linux parameters for PostgreSQL database performance.

Plus another recent post on benchmarks:

Tuning PostgreSQL for sysbench-tpcc

The post Tuning PostgreSQL Database Parameters to Optimize Performance appeared first on Percona Database Performance Blog.

by Ibrar Ahmed at August 31, 2018 03:38 PM

Open Query Pty Ltd

My oldest still-open MySQL bug

This morning I received an update on a MySQL bug: someone added a comment on an issue I filed in November 2003, originally for MySQL 4.0.12 and 4.1.0. It’s MySQL bug#1956 (note the very low number!), “Parser allows naming of PRIMARY KEY”:

[25 Nov 2003 16:23] Arjen Lentz
Description:
When specifying a PRIMARY KEY, the parser still accepts a name for the index even though the MySQL server will always name it PRIMARY.
So, the parser should NOT accept this, otherwise a user can get into a confusing situation as his input is ignored/changed.

How to repeat:
CREATE TABLE pk (i INT NOT NULL);
ALTER TABLE pk ADD PRIMARY KEY bla (i);

'bla' after PRIMARY KEY should not be accepted.

Suggested fix:
Fix grammar in parser.

Most likely we found it during a MySQL training session; training days have always been a great bug-catching source, as students would be encouraged to try lots of quirky things and explore.

It’s not a critical issue, but one from the “era of sloppiness” as I think we may now call it, with the benefit of hindsight.  At the time, it was just regarded as lenient: the parser would silently ignore things it couldn’t process in a particular context.  For instance, creating a foreign key on a table using an engine that didn’t support foreign keys would see the foreign keys silently disappear, rather than reporting an error.  Many if not most of those quirks have been cleaned up over the years.  Some are very difficult to get rid of, as people use them in code, and thus essentially we’re stuck with old bad behaviour as otherwise we’d break too many applications.  I think that one neat example of that is auto-casting:

SELECT "123 apples" + 1;
+------------------+
| "123 apples" + 1 |
+------------------+
|              124 |
+------------------+
1 row in set, 1 warning (0.000 sec)

SHOW WARNINGS;
+---------+------+------------------------------------------------+
| Level   | Code | Message                                        |
+---------+------+------------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: '123 apples' |
+---------+------+------------------------------------------------+

At least that one chucks a warning now, which is good as it allows developers to catch mistakes and thus prevent trouble.  A lenient parser (or grammar) can be convenient, but it tends to also enable application developers to be sloppy, which is not really a good thing.  Something that creates a warning may be harmless, or an indication of a nasty mistake: the parser can’t tell.  So it’s best to ensure that code is clean, free even of warnings.

Going back to the named PRIMARY KEY issue… effectively, a PRIMARY KEY has the name ‘PRIMARY’ internally.  So it can’t really have another name, and the parser should not accept an attempt to name it.  The name silently disappears so when you check back in SHOW CREATE TABLE or SHOW INDEXES, you won’t ever see it.
CREATE TABLE pkx (i INT PRIMARY KEY x) reports a syntax error. So far so good.
But, CREATE TABLE pkx (i INT, PRIMARY KEY x (i)) is accepted, as is ALTER TABLE pkx ADD PRIMARY KEY x (i).

MySQL 8.0 still allows these constructs, as does MariaDB 10.3.9 !

The bug is somewhere in the parser, so that’s sql/sql_yacc.yy.  I’ve tweaked and added things in there before, but it’s been a while and particularly in parser grammar you have to be very careful that changes don’t have other side-effects.  I do think it’s worthwhile fixing even these minor issues.

by Arjen Lentz at August 31, 2018 02:53 AM

August 30, 2018

Peter Zaitsev

Is It a Read Intensive or a Write Intensive Workload?

innodb row operations featured

One of the common ways to classify database workloads is whether it is  “read intensive” or “write intensive”. In other words, whether the workload is dominated by reads or writes.

Why should you care? Because recognizing if the workload is read intensive or write intensive will impact your hardware choices, database configuration as well as what techniques you can apply for performance optimization and scalability.

This question looks trivial on the surface, but as you go deeper—complexity emerges. There are different “levels” of reads and writes for you to consider. You can also choose to look at event counts or at the time it takes to do operations. These can provide very different responses, especially as the cost difference between a single read and a single write can be an order of magnitude.

Let’s examine the TPC-C Benchmark from this point of view, or more specifically its implementation in Sysbench. The illustrations below are taken from Percona Monitoring and Management (PMM) while running this benchmark.

Analyzing read/write workload by counts

analyzing read write workload by counts
At the highest level, you can think about queries that are sent to the database. In this case we can see about 30K of SELECT queries versus 20K of UPDATE+INSERT queries, making this benchmark slightly more read intensive by this measure.

innodb row operations
Another way to look at the load is through actual operations at the row level – a single query may touch just one row or may touch millions. In this benchmark, looking at the workload from a SQL command standpoint and from a row operation standpoint happens to yield the same conclusion, but that will not always be the case.

io activity
Let’s now look at the operating system level. We can see the amount of data written to the disk is 2x more than the amount of data being read from the disk. This workload is write intensive by this measure.
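If you want to sanity-check the same ratio outside of PMM, iostat gives a quick OS-level view (a sketch; exact column names vary with the sysstat version):

# extended per-device statistics in MB/s, refreshed every 5 seconds
iostat -dxm 5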

top tables by row read

top tables by rows changed

Yet another way to take a look at your workload is from the aspect of tables. This view shows how each table is being accessed for reads and writes, which in turn allows us to see whether a given table is getting more reads or writes. This is helpful, for example, if you are considering moving some of the tables to a different server and want to clearly understand how your workload will be impacted.

Analyzing Read/Write Workload by Response Time

As I mentioned already, the counts often do not reflect the time to respond, which is typically more representative of the real work being done. To look at timing information from a query point of view, we want to look at query analytics.

query analytics providing time analysis
The “Load” column here is a measure of such a combined response time, versus count which is reflective of query counts. Looking at this list we can see that three out of top five queries are SELECT queries. Looking at the numbers overall, we can see we have a read intensive application from this perspective.

In terms of row level operations, there is currently no easy way to see if reads or writes are dominating overall, but you can get an idea from the table operations dashboard:

table operations dashboard
This shows the load on a per table basis. It labels reads “Fetch” and breaks down writes in more detail—“Update”, “Delete”, “Inserts”—which is helpful. Not all writes are equal either.

disk io load

If we want a response time based view of reads vs writes at the operating system level, we can check out this disk IO Load graph. You can see in this case it happens to match the IO activity graph, with storage taking more time to serve write requests than read requests.

Summary

As you can see, the question about whether a workload is read intensive or write intensive, while simple on the surface, can have many different answers. You might ask me “OK, so what should I use?” Well… it really depends.

Looking at query counts is a great way to understand the application’s demands on the database—these demands are not something you can usually change without changing the application itself. However, by changing the database configuration and schema you may drastically alter the impact of these queries, both from the standpoint of the number of rows they crunch and in terms of the disk IO they require.

The response time based statistics, gathered from the impact your queries cause on the system or disk IO, provide a better representation of the load these queries currently generate.

Another thing to keep in mind—reads and writes are not created equal. My rule of thumb for InnoDB is that a single row write is about 10x more expensive than a single row read.

More resources that you might enjoy

If you found this post useful, you might also like to see some of Percona’s other resources.

For an introduction to PMM, our free and open source management and monitoring software, you might find value in my recorded webinar, MySQL Troubleshooting and Performance Optimization with PMM

While our white paper Performance at Scale could provide useful insight if you are at the planning or review stage.

The post Is It a Read Intensive or a Write Intensive Workload? appeared first on Percona Database Performance Blog.

by Peter Zaitsev at August 30, 2018 11:19 AM

August 29, 2018

Peter Zaitsev

Scaling IO-Bound Workloads for MySQL in the Cloud

InnoDB / MyRocks throughput on IO1

Is increasing gp2 volume size, or increasing IOPS for io1 volumes, a valid method for scaling IO-bound workloads? In this post I’ll focus on one question: how much can we improve performance if we use faster cloud volumes? This post is a continuation of previous cloud research posts:

To recap: in Amazon EC2 we can use gp2 and io1 volumes. gp2 performance scales with size at a rate of 3 IOPS per GB: a 500GB volume gets 1,500 IOPS; 1,000GB gets 3,000 IOPS; and 3,334GB gets 10,000 IOPS (the maximum possible value). For io1 volumes, we can “buy” throughput of up to 30,000 IOPS.
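As a back-of-the-envelope check, the gp2 scaling rule is easy to compute yourself. This is a hedged sketch using the limits quoted in this post (AWS has since changed them); the numbers are illustrative:

# gp2 provides 3 IOPS per GB, with a floor of 100 and (at the time of writing) a cap of 10000
size_gb=500
iops=$(( size_gb * 3 ))
[ "$iops" -gt 10000 ] && iops=10000
[ "$iops" -lt 100 ] && iops=100
echo "gp2 ${size_gb}GB -> ~${iops} IOPS"    # prints: gp2 500GB -> ~1500 IOPS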

So I wanted to check how both InnoDB and RocksDB storage engines perform on these volumes with different throughput.

Benchmark Scenario

I will use the same data size that I used in Saving With MyRocks in The Cloud: sysbench-tpcc with 50 tables of 100 warehouses (100W) each, about 500GB of data in InnoDB and 100GB in RocksDB (compressed with LZ4).

Volume settings: gp2 volumes from 500GB (1000GB for InnoDB) to 3400GB with 100GB increments (so each increment increases throughput by 300 IOPS); io1 volumes: 1TB in size, IOPS from 1000 to 30000 in increments of 1000.

Let’s take a look at the results. I will use a slightly different format than usual, but hopefully it represents the results better. The plots show the distribution of the throughput as a density: the higher and narrower the shape, the less variance there is in the throughput.

Results on GP2 volumes:

InnoDB/MyRocks throughput on gp2

It’s quite interesting to see how the results scale with better IO throughput. InnoDB does not improve its throughput after a gp2 size of 2600GB, while MyRocks continues to scale linearly. The problem with MyRocks is that there is a lot of variance in throughput (I will show a one-second resolution chart below).

Results on IO1 volumes

InnoDB / MyRocks throughput on IO1

Here MyRocks again shows impressive growth as we add more IO capacity, but it also shows a lot of variance on high capacity volumes.

Let’s compare how engines perform with one second resolution. GP2 volume, 3400GB:

InnoDB/MyRocks throughput on gp2 3400GB

IO1 volume, 30000 iops:

InnoDB/MyRocks throughput on IO1 30000 IOPS

So for MyRocks there seems to be periodic background activity that prevents it from achieving stable throughput.

Raw results, if you’d like to review them, can be found here: https://github.com/Percona-Lab-results/201808-rocksdb-cloudio

Conclusions

If you are looking to improve throughput in IO-bound workloads, increasing either gp2 volume size or provisioned IOPS for io1 volumes is a valid method, especially for the MyRocks engine.

The post Scaling IO-Bound Workloads for MySQL in the Cloud appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at August 29, 2018 03:55 PM

Tune Linux Kernel Parameters For PostgreSQL Optimization

Linux parameters for PostgreSQL performance tuning

For optimum performance, a PostgreSQL database depends on the operating system parameters being defined correctly. Poorly configured OS kernel parameters can cause degradation in database server performance. Therefore, it is imperative that these parameters are configured according to the database server and its workload. In this post, we will discuss some important kernel parameters that can affect database server performance and how these should be tuned.

SHMMAX / SHMALL

SHMMAX is a kernel parameter used to define the maximum size of a single shared memory segment a Linux process can allocate. Up to and including version 9.2, PostgreSQL used System V (SysV) shared memory, which requires the SHMMAX setting. From version 9.3 onwards, PostgreSQL switched to POSIX shared memory, so it now requires far fewer bytes of System V shared memory.

Prior to version 9.3, SHMMAX was the most important kernel parameter. Its value is expressed in bytes.

Similarly, SHMALL is another kernel parameter, used to define the system-wide total amount of shared memory pages. To view the current values for SHMMAX, SHMALL or SHMMIN, use the ipcs command:

$ ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1073741824
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

$ ipcs -M
IPC status from  as of Thu Aug 16 22:20:35 PKT 2018
shminfo:
	shmmax: 16777216	(max shared memory segment size)
	shmmin:       1	(min shared memory segment size)
	shmmni:      32	(max number of shared memory identifiers)
	shmseg:       8	(max shared memory segments per process)
	shmall:    1024	(max amount of shared memory in pages)

PostgreSQL uses System V IPC to allocate the shared memory, which makes this one of the most important kernel parameters. Whenever you get the error messages below, it means you are running an older version of PostgreSQL with a very low SHMMAX value. You are expected to adjust and increase the value according to the shared memory you are going to use.

Possible misconfiguration errors

If SHMMAX is misconfigured, you can get an error when trying to initialize a PostgreSQL cluster using the initdb command.

DETAIL: Failed system call was shmget(key=1, size=2072576, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.
You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 2072576 bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration. child process exited with exit code 1

Similarly, you can get an error when starting the PostgreSQL server using the pg_ctl command.

DETAIL: Failed system call was shmget(key=5432001, size=14385152, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.
You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 14385152 bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.

Be aware of differing definitions

The definition of the SHMMAX/SHMALL parameters is slightly different between Linux and MacOS X. These are the definitions:

  • Linux: kernel.shmmax, kernel.shmall
  • MacOS X: kern.sysv.shmmax, kern.sysv.shmall

The sysctl command can be used to change the value temporarily. To set the value permanently, add an entry to /etc/sysctl.conf. The details are given below, first for MacOS X and then for Linux.

# Get the value of SHMMAX
sudo sysctl kern.sysv.shmmax
kern.sysv.shmmax: 4096
# Get the value of SHMALL
sudo sysctl kern.sysv.shmall
kern.sysv.shmall: 4096
# Set the value of SHMMAX
sudo sysctl -w kern.sysv.shmmax=16777216
kern.sysv.shmmax: 4096 -> 16777216
# Set the value of SHMALL
sudo sysctl -w kern.sysv.shmall=16777216
kern.sysv.shmall: 4096 -> 16777216

# Get the value of SHMMAX
sudo sysctl kernel.shmmax
kernel.shmmax: 4096
# Get the value of SHMALL
sudo sysctl kernel.shmall
kernel.shmall: 4096
# Set the value of SHMMAX
sudo sysctl -w kernel.shmmax=16777216
kernel.shmmax: 4096 -> 16777216
# Set the value of SHMALL
sudo sysctl -w kernel.shmall=16777216
kernel.shmall: 4096 -> 16777216

Remember: to make the change permanent, add these values to /etc/sysctl.conf.
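As a minimal sketch using the Linux variable names above (the values shown are examples only and must be sized for your own server):

# /etc/sysctl.conf
kernel.shmmax = 17179869184    # max single segment size, in bytes (16 GB here)
kernel.shmall = 4194304        # max total shared memory, in pages (16 GB with 4K pages)

# Load the file without a reboot:
sudo sysctl -p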
 

Huge Pages

Linux by default uses 4K memory pages, BSD has Super Pages, and Windows has Large Pages. A page is a chunk of RAM allocated to a process. A process may own more than one page depending on its memory requirements: the more memory a process needs, the more pages are allocated to it. The OS maintains a table mapping pages to processes. The smaller the page size, the bigger that table, and the more time required to look up a page in it. Huge pages therefore make it possible to use large amounts of memory with reduced overhead: fewer page lookups, fewer page faults, and faster read/write operations through larger buffers. The result is improved performance.

PostgreSQL supports bigger pages on Linux only. By default, Linux uses 4K memory pages, so where there are many memory operations there is a need to set bigger pages. Performance gains have been observed using huge pages with sizes of 2 MB and up to 1 GB. The huge page size can be set at boot time. You can easily check the huge page settings and utilization on your Linux box using the command cat /proc/meminfo | grep -i huge.

Note: this is only for Linux; for other operating systems this operation is ignored.

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

In this example, although the huge page size is set at 2,048 kB (2 MB), the total number of huge pages is 0, which signifies that huge pages are disabled.

Script to quantify Huge Pages

This is a simple script which returns the number of huge pages required. Execute it on your Linux box while PostgreSQL is running, and ensure that the $PGDATA environment variable points to PostgreSQL’s data directory.

#!/bin/bash
# Read the postmaster PID from the first line of postmaster.pid
pid=$(head -1 "$PGDATA/postmaster.pid")
echo "Pid:            $pid"
# Peak virtual memory used by the postmaster process, in kB
peak=$(grep ^VmPeak /proc/$pid/status | awk '{ print $2 }')
echo "VmPeak:         $peak kB"
# Size of one huge page, in kB
hps=$(grep ^Hugepagesize /proc/meminfo | awk '{ print $2 }')
echo "Hugepagesize:   $hps kB"
# Number of huge pages needed to cover the peak usage
hp=$((peak/hps))
echo "Set Huge Pages: $hp"

The output of the script looks like this:

Pid:            12737
VmPeak:         180932 kB
Hugepagesize:   2048 kB
Set Huge Pages: 88

The script recommends 88 huge pages, therefore you should set the value to 88:

sysctl -w vm.nr_hugepages=88

Check the huge pages again: you will see that no huge pages are in use yet (HugePages_Free = HugePages_Total).

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       88
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now set the parameter huge_pages to “on” in $PGDATA/postgresql.conf and restart the server.
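In postgresql.conf that is a one-line change; the comment reflects the documented behaviour of this setting:

# $PGDATA/postgresql.conf
huge_pages = on    # 'try' is the default; with 'on' the server refuses to start if huge pages cannot be allocated

After the restart, /proc/meminfo reflects the reservation: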

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       81
HugePages_Rsvd:       64
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that only a few of the huge pages are in use. Let’s try to add some data into the database.

postgres=# CREATE TABLE foo(a INTEGER);
CREATE TABLE
postgres=# INSERT INTO foo VALUES(generate_Series(1,10000000));
INSERT 0 10000000

Let’s see if we are now using more huge pages than before.

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       18
HugePages_Rsvd:        1
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that most of the huge pages are in use.

Note: The sample value for HugePages used here is very low, which is not a normal value for a big production machine. Please assess the required number of pages for your system and set those accordingly depending on your systems workload and resources.

vm.swappiness

vm.swappiness is another kernel parameter that can affect database performance. It controls the swappiness behaviour (swapping pages between RAM and swap space) on a Linux system. The value ranges from 0 to 100 and determines how aggressively memory is swapped or paged out: 0 tells the kernel to avoid swapping as much as possible (it does not disable swap entirely), while 100 means swap aggressively.

You may get good performance by setting lower values.

Setting a value of 0 in newer kernels may cause the OOM Killer (Linux’s out-of-memory killer process) to kill the process. Therefore, to be on the safe side, set the value to 1 if you want to minimize swapping. The default value on a Linux system is 60. A higher value causes the MMU (memory management unit) to utilize more swap space than RAM, whereas a lower value preserves more data/code in memory.

A smaller value is a good bet to improve performance in PostgreSQL.
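A hedged example of checking and lowering the value with sysctl, following the advice above:

# Check the current value (60 on most distributions)
sudo sysctl vm.swappiness
# Lower it for the running system
sudo sysctl -w vm.swappiness=1
# Persist it across reboots
echo "vm.swappiness = 1" | sudo tee -a /etc/sysctl.conf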

vm.overcommit_memory / vm.overcommit_ratio

Applications acquire memory and free it when it is no longer needed. But in some cases an application acquires too much memory and does not release it, which can invoke the OOM killer. Here are the possible values for the vm.overcommit_memory parameter, with a description of each:

  • 0: Heuristic overcommit, the kernel decides intelligently (default)
  • 1: Always allow overcommit
  • 2: Don’t overcommit beyond the overcommit ratio

Reference: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

vm.overcommit_ratio is the percentage of RAM that counts toward the commit limit when vm.overcommit_memory is 2: the limit is swap space plus that percentage of RAM. For example, with a ratio of 50, a system with 2 GB of RAM and 2 GB of swap may commit up to 3 GB.

A value of 2 for vm.overcommit_memory yields better performance for PostgreSQL. It maximises RAM utilization by the server process without any significant risk of being killed by the OOM killer: an application can still overcommit, but only within the overcommit ratio. Hence a value of 2 gives better performance than the default of 0, and reliability is improved by ensuring that memory beyond the allowable range is never overcommitted.

Note that on systems without swap, you may experience problems when vm.overcommit_memory is set to 2; see the PostgreSQL documentation on Linux memory overcommit:

https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
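A sketch of applying these two settings with sysctl; the ratio of 50 is only an example, and the swapless caveat above still applies:

# Only allow commits up to swap + 50% of RAM
sudo sysctl -w vm.overcommit_memory=2
sudo sysctl -w vm.overcommit_ratio=50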

vm.dirty_background_ratio / vm.dirty_background_bytes

vm.dirty_background_ratio is the percentage of memory that can be filled with dirty pages before they are flushed to disk, with the flushing done in the background. The value ranges from 0 to 100; however, a value lower than 5 may not be effective, and some kernels do not support it internally. The default on most Linux systems is 10. You can gain performance for write-intensive workloads with a lower ratio, which makes Linux flush dirty pages in the background sooner.

Alternatively, you can set vm.dirty_background_bytes (an absolute number of bytes rather than a ratio) depending on your disk speed.

There are no universally “good” values for these two parameters since both depend on the hardware. However, setting vm.dirty_background_ratio to 5, or vm.dirty_background_bytes to about 25% of your disk’s write throughput, improves performance by up to ~25% in most cases.

vm.dirty_ratio / vm.dirty_bytes

This is the same as vm.dirty_background_ratio / vm.dirty_background_bytes, except that the flushing is done in the foreground, blocking the application. vm.dirty_ratio should therefore be higher than vm.dirty_background_ratio: this ensures background flushing kicks in before foreground flushing, avoiding blocking the application as much as possible. You can tune the gap between the two ratios depending on your disk IO load.
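As an illustrative starting point only (both values depend on your hardware and workload), the two thresholds could be set like this, keeping the foreground one well above the background one:

# Start background flushing at 5% dirty memory, block writers only at 15%
sudo sysctl -w vm.dirty_background_ratio=5
sudo sysctl -w vm.dirty_ratio=15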

Summing up

You can tune other parameters for performance, but the improvement gains are likely to be minimal. Keep in mind that not all parameters are relevant for all application types: some applications perform better with certain parameters tuned, and some don’t. You need to find a good balance between these parameter configurations for the expected application workload and type, and OS behaviour must also be kept in mind when making adjustments. Tuning kernel parameters is not as easy as tuning database parameters: it’s harder to be prescriptive.

In my next post, I’ll take a look at tuning PostgreSQL’s database parameters. You might also enjoy this post:

Tuning PostgreSQL for sysbench-tpcc

 

The post Tune Linux Kernel Parameters For PostgreSQL Optimization appeared first on Percona Database Performance Blog.

by Ibrar Ahmed at August 29, 2018 12:33 PM

August 28, 2018

Peter Zaitsev

Extend Metrics for Percona Monitoring and Management Without Modifying Code

PMM Extended Metrics

Percona Monitoring and Management (PMM) provides an excellent solution for system monitoring. Sometimes, though, you’ll have the need for a metric that’s not present in the list of node_exporter metrics out of the box. In this post, we introduce a simple method and show how to extend the list of available metrics without modifying the node_exporter code. It’s based on the textfile collector.

Enable the textfile collector in pmm-client

This collector is not enabled by default in the latest version of pmm-client. So, first let’s enable the textfile collector.

# pmm-admin rm linux:metrics
OK, removed system pmm-client-hostname from monitoring.
# pmm-admin add linux:metrics -- --collectors.enabled=diskstats,filefd,filesystem,loadavg,meminfo,netdev,netstat,stat,time,uname,vmstat,textfile --collector.textfile.directory="/tmp"
OK, now monitoring this system.
# pmm-admin ls
pmm-admin 1.13.0
PMM Server      | 10.178.1.252  
Client Name     | pmm-client-hostname
Client Address  | 10.178.1.252  
Service Manager | linux-upstart
-------------- -------------------- ----------- -------- ------------ --------
SERVICE TYPE   NAME                 LOCAL PORT  RUNNING  DATA SOURCE  OPTIONS  
-------------- -------------------- ----------- -------- ------------ --------
linux:metrics  pmm-client-hostname  42000       YES      -

Notice that the whole list of default collectors has to be re-enabled. Also, don't forget to specify the directory for reading files with the metrics (--collector.textfile.directory="/tmp"). The exporter reads files with the .prom extension.

Add a crontab task

The second step is to add a crontab task to collect metrics and place them into a file.

Here are the cron commands for collecting the total number of docker containers and the number of running ones.

*/1 * * * *     root   echo -n "" > /tmp/docker_all.prom; /usr/bin/docker ps -a | sed -n '1!p'| /usr/bin/wc -l | sed -ne 's/^/node_docker_containers_total /p' >> /tmp/docker_all.prom;
*/1 * * * *     root   echo -n "" > /tmp/docker_running.prom; /usr/bin/docker ps | sed -n '1!p'| /usr/bin/wc -l | sed -ne 's/^/node_docker_containers_running_total /p' >>/tmp/docker_running.prom;

The results of the commands are placed into the files /tmp/docker_all.prom and /tmp/docker_running.prom, which are then read by the exporter.
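Each file holds one metric in the Prometheus text format; the contents look something like this (the counts are, of course, illustrative):

$ cat /tmp/docker_all.prom
node_docker_containers_total 7
$ cat /tmp/docker_running.prom
node_docker_containers_running_total 5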

Look - we got a new metric!

Adding the crontab tasks by using a script

Also, we have a few bash scripts that make it much easier to add crontab tasks.

The first one allows you to collect the number of logged-in users and the size of InnoDB data files.

Modifying the cron job - a script

You may use the suggested names of files and metrics or set new ones.

The second script is more universal: it gets the size of any directories or files, and can be placed directly into a crontab task. You just specify the list of monitored objects (e.g. /var/log /var/cache/apt /var/lib/mysql/ibdata1):

echo  "*/5 * * * * root bash  /root/object_sizes.sh /var/log /var/cache/apt /var/lib/mysql/ibdata1"  > /etc/cron.d/object_size
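The post does not show the body of /root/object_sizes.sh, so here is a minimal sketch of what such a script might look like; the metric name and output path are my assumptions, not the original script:

#!/bin/bash
# Hypothetical sketch of object_sizes.sh: write the size in bytes of every
# argument as a node_exporter textfile metric. The real script may differ.
out=/tmp/object_sizes.prom
: > "$out"                                   # truncate the output file
for obj in "$@"; do
  size=$(du -sb "$obj" 2>/dev/null | awk '{print $1}')
  echo "node_object_size_bytes{object=\"$obj\"} ${size:-0}" >> "$out"
done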

So, I hope this has provided useful insight into how to set up the collection of new PMM metrics without the need to write code. Please feel free to use the scripts or configure commands similar to the ones provided above.

More resources you might enjoy

If you are new to PMM, there is a great demo site of the latest version, showing you those out of the box metrics. Or how about our free webinar on monitoring Amazon RDS with PMM?

The post Extend Metrics for Percona Monitoring and Management Without Modifying Code appeared first on Percona Database Performance Blog.

by Vadim Yalovets at August 28, 2018 01:56 PM

Webinar Wed 8/29: Databases in the Hosted Cloud

databases-in-the-cloud

Please join Percona’s Chief Evangelist, Colin Charles, on Wednesday, August 29th, 2018, as he presents Databases in the Hosted Cloud at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

Nearly everyone today uses some form of database in the hosted cloud. You can use hosted MySQL, MariaDB, Percona Server, and PostgreSQL in several cloud providers as a database as a service (DBaaS).

In this webinar, Colin Charles explores how to efficiently deploy a cloud database configured for optimal performance, with a particular focus on MySQL.

You’ll learn the differences between the various public cloud offerings for Amazon RDS including Aurora, Google Cloud SQL, Rackspace OpenStack DBaaS, Microsoft Azure, and Alibaba Cloud, as well as the access methods and the level of control you have. Hosting in the cloud can be a challenge but after today’s webinar, we’ll make sure you walk away with a better understanding of how you can leverage the cloud for your business needs.

Topics include:

  • Backup strategies
  • Planning multiple data centers for availability
  • Where to host your application
  • How to get the most performance out of the solution
  • Cost
  • Monitoring
  • Moving from one DBaaS to another
  • Moving from a DBaaS to your own hosted platform

Register Now.

The post Webinar Wed 8/29: Databases in the Hosted Cloud appeared first on Percona Database Performance Blog.

by Colin Charles at August 28, 2018 01:54 PM

August 27, 2018

Peter Zaitsev

MySQL 8: Load Fine Tuning With Resource Groups

MySQL Resource Groups, introduced in MySQL 8, provide the ability to manipulate the assignment of running threads to specific resources, thereby allowing the DBA to manage application priorities. Essentially, you can assign a thread to a specific virtual CPU. In this post, I’m going to take a look at how these might work in practice.

Let us start with a disclaimer.

What I am going to discuss here is NOT common practice. This is advanced load optimization, and you should approach/implement it ONLY if you are 100% sure of what you are doing and, more importantly, why you are doing it.

Overview

MySQL 8 introduced a feature that is explained only in a single documentation page. This feature can help a lot if used correctly, and hopefully they will not deprecate or remove it after five minutes. It is well hidden in the Optimization: Optimizing the MySQL Server chapter.

I am talking about resource groups. Resource groups permit assigning threads running within MySQL to particular groups so that threads execute according to the resources available to this group. Group attributes enable control over resources to enable or restrict resource consumption by threads in the group. DBAs can modify these attributes as appropriate for different workloads.

Currently, CPU affinity (i.e., assigning to a specific CPU) is a manageable resource, represented by the concept of “virtual CPU” as a term that includes CPU cores, hyperthreads, hardware threads, and so forth. MySQL determines, at startup, how many virtual CPUs are available. Database administrators with appropriate privileges can associate virtual CPUs with resource groups and assign threads to these groups.

In short, you can define that a specific thread (and therefore connection, unless you use connection pooling or ProxySQL with multiplexing) will use a given CPU and have a given priority.

Setting this by thread can be:

  1. Dangerous
  2. Not useful

Dangerous, because if you assign this to a thread when using connection pooling or ProxySQL with multiplexing, you may end up applying the limitation to queries that you actually wanted to run at full speed.

Not useful, because unless you spend the time looking at the (full) processlist, and/or have a script running all the time to catch what you need, 99% of the time you will not be able to assign the group efficiently.
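For completeness, this is what that manual, per-thread assignment looks like; the thread id is illustrative, and the resource group is one we create later in this post:

-- Find the threads belonging to the app2 user
SELECT THREAD_ID, PROCESSLIST_ID FROM performance_schema.threads WHERE PROCESSLIST_USER = 'app2';
-- Pin one of them (thread id 1234 is an example) to a resource group
SET RESOURCE GROUP Select_app2 FOR 1234;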

So? Another cool useless feature???

Nope…

Resource groups can be referenced inside a single statement, which means I can have only that query utilizing that resource group. Something like this will do the magic:

SELECT /*+ RESOURCE_GROUP(NAME OF THE RG) */ id, millid, date,active,kwatts_s FROM sbtest29 WHERE id=44

But if I run:

SELECT id, millid, date,active,kwatts_s FROM sbtest29 WHERE id=44

No resource group utilization even if I am using the same connection.

This is cool, isn’t it?

What is the possible usage?

In general, you can see this as a way to limit the negative impact of queries that you know will be problematic for others.

Good examples are:

  • ETL processes for data archiving, reporting, data consolidation and so on
  • Applications that are not business critical and can wait, while your revenue generator application cannot
  • GUI Client applications, used by some staff of your company, that mainly create problems for you while they claim they are working.

“Marco, that could make sense … but what should I do to have it working? Rewrite my whole application to add this feature?”

Good question! Thanks!

We can split the task of having a good Resource Group implementation into three steps:

  1. Perform an analysis of what you want to control. You need to identify the source (such as a fixed IP address, or a username) and design the settings for your resource groups: decide whether you only want to reduce the CPU priority, isolate the queries on a specific CPU, or combine the two.
  2. Implement the resource groups in MySQL.
  3. Implement a way to inject the string comment into the SQL.

About the last step, I will show you how to do this in a straightforward way with ProxySQL, but hey… this is really up to you. I will show you the easy way, but if you prefer a more difficult route, that’s good for me too.

The Setup

In my scenario, I have a very noisy secondary application, written by a very, very bad developer, that accesses my servers mostly with read queries and occasionally with write updates. Reads and writes are obsessive and create an impact on the MAIN application. My task is to limit the impact of this secondary application without affecting the main one.

To do that I will create two resource groups, one for WRITE and another for READ.

The first group, Write_app2, will have no CPU affinity, but will have the lowest priority (19):

CREATE RESOURCE GROUP Write_app2 TYPE=USER THREAD_PRIORITY=19;

The second group, Select_app2, will have CPU affinity AND the lowest priority:

CREATE RESOURCE GROUP Select_app2 TYPE=USER VCPU=5 THREAD_PRIORITY=19;

Finally, I have identified that the application connects from several sources BUT always uses the username APP2. Given that, I will use the username to inject the instructions into the SQL using ProxySQL (I could also have used the IP, the schema name, the destination port, or something in the submitted SQL; in short, any possible filter in the query rules).

To do that I will need four query rules:

insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(80,6033,'app1',80,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(81,6033,'app1',81,1,3,'^SELECT.*',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(82,6033,'app2',80,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(83,6033,'app2',81,1,3,'^SELECT.*',1);

These rules identify and redirect queries for read/write splitting.

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,comment) VALUES (32,1,'app2',"(^SELECT)\s*(.*$)","\1 /*+ RESOURCE_GROUP(Select_app2) */ \2 ",0,"Lower prio and CPU bound on Reader");
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,comment) VALUES (33,1,'app2',"^(INSERT|UPDATE|DELETE)\s*(.*$)","\1 /*+ RESOURCE_GROUP(Write_app2) */ \2 ",0,"Lower prio on Writer");

And a user definition like:

insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('app2','test',1,80,'mysql',1);
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('app1','test',1,80,'mysql',1);

One important step you need to do ON ALL the servers you want to include in the resource group utilization is to be sure the CAP_SYS_NICE capability is set.

On Linux, resource group thread priorities are ignored unless the CAP_SYS_NICE capability is set. MySQL package installers for Linux systems should set this capability. For installations using a compressed tar file binary distribution, or built from source, the CAP_SYS_NICE capability can be set manually using the setcap command, specifying the path to the mysqld executable (this requires sudo access). You can check the capabilities using getcap. For example:

shell> sudo setcap cap_sys_nice+ep <path to your mysqld executable>
shell> getcap ./bin/mysqld
./bin/mysqld = cap_sys_nice+ep

If manual setting of CAP_SYS_NICE is required, then you will need to do it every time you perform a new install.

As a reference, here is how the thread priority ranges map to priority levels:

Priority Range    Windows Priority Level
--------------    ----------------------------
-20 to -10        THREAD_PRIORITY_HIGHEST
-9 to -1          THREAD_PRIORITY_ABOVE_NORMAL
0                 THREAD_PRIORITY_NORMAL
1 to 10           THREAD_PRIORITY_BELOW_NORMAL
11 to 19          THREAD_PRIORITY_LOWEST

 

Summarizing here the whole set of steps on my environment:

1) Check the CAP_SYS_NICE

getcap /opt/mysql_templates/mysql-8P/bin/mysqld
setcap cap_sys_nice+ep /opt/mysql_templates/mysql-8P/bin/mysqld

2) Create the user in MySQL and resource groups

create user app2@'%' identified by 'test';
GRANT ALL PRIVILEGES ON `windmills2`.* TO `app2`@`%`;
CREATE RESOURCE GROUP Select_app2 TYPE=USER VCPU=5 THREAD_PRIORITY=19;
CREATE RESOURCE GROUP Write_app2 TYPE=USER THREAD_PRIORITY=19;

To check:

SELECT * FROM INFORMATION_SCHEMA.RESOURCE_GROUPS;
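With the two groups created, the output should look along these lines; VCPU_IDS depends on your host (an 8-vCPU box is assumed here), so treat this as illustrative:

+---------------------+---------------------+------------------------+----------+-----------------+
| RESOURCE_GROUP_NAME | RESOURCE_GROUP_TYPE | RESOURCE_GROUP_ENABLED | VCPU_IDS | THREAD_PRIORITY |
+---------------------+---------------------+------------------------+----------+-----------------+
| USR_default         | USER                |                      1 | 0-7      |               0 |
| SYS_default         | SYSTEM              |                      1 | 0-7      |               0 |
| Select_app2         | USER                |                      1 | 5        |              19 |
| Write_app2          | USER                |                      1 | 0-7      |              19 |
+---------------------+---------------------+------------------------+----------+-----------------+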

3) Create ProxySQL user and rules

insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('app2','test',1,80,'mysql',1);
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('app1','test',1,80,'mysql',1);
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(83,6033,'app2',80,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(84,6033,'app2',81,1,3,'^SELECT.*',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(85,6033,'app2',80,0,3,'.',1);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,comment) VALUES (32,0,'app2',"(^SELECT)\s*(.*$)","\1 /*+ RESOURCE_GROUP(Select_app2) */ \2 ",0,"Lower prio and CPU bound on Reader");
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,comment) VALUES (33,0,'app2',"^(INSERT|UPDATE|DELETE)\s*(.*$)","\1 /*+ RESOURCE_GROUP(Write_app2) */ \2 ",0,"Lower prio on Writer");
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

For several reasons I will add the resource groups query rules as INACTIVE for now.

Done…

Testing

Will this work?
We need to see the impact of the bad application on my production application.
Then we need to see IF implementing the tuning will work or not.

To do a basic check I run four tests:

  • test 1: run both apps with read/write and the RG rules disabled
  • test 2: run one application at a time, without RG
  • test 3: run only App2 with RG, to see the cost on its execution
  • test 4: run both, to see what happens with RG

Test 1

Master

Test 1 master 1 current CPU core utilization

Slave

This test aims to give us an immediate idea of what happens when both applications run without limits.
As we can see during the test, all cores are utilized, some more consistently and some a bit less so, but nothing huge.
What is interesting is the effect on the response time and on the number of events each application can execute:

The execution graph shows very high times for inserts and deletes for App1, and the results confirm very bad performance: only nine inserts, 1,333 deletes, and 165 selects.

But what is the application actually supposed to do? Test 2 will tell us, creating de facto our baseline.

Test 2

In this test I ran each application separately, so there was no interference.
Master App1

Test 2 master 1 current CPU core utilization

Master App2

Slave App1

Test 2 slave1 current CPU core utilization

Slave App2

Test 2 slave 2 current CPU core utilization

There is nothing significantly different in the CPU utilization when App1 is running, while we can see a bit less utilization in the case of App2.

The impact on the performance is, however, more significant:

Execution times for inserts and deletes drop significantly for App1, and we can see that the application SHOULD be able to insert ~1,320 events and perform a significantly higher number of operations. The same goes for App2, but here we care more about the OLTP application than the ETL one.

So, what will happen to App2’s (ETL) performance IF we activate the resource group flags? Let’s see with test 3.

Test 3

Running only App2 with active resource groups
Master App2

Slave App2

On the master, all the RG settings do is reduce the priority; given that no other process is running and no other application is connected, the impact is not high.
On the slave, on the other hand, we can clearly see that App2 can now only use core 5, as specified in our configuration.
So far so good. What will be the performance loss? Let’s see:

Comparing tests 2 and 3, we can see that applying the resource groups has a minimal but measurable impact on our ETL application. That is expected and desired, but must be noted: the impact is not high in this test, but in the real world it can extend the running time.

It’s time to combine all and see what is going on.

Test 4

Run our OLTP application while the ETL is running under Resource Group.
Master

Slave

Looking at the CPU utilization, these graphs are very similar to the ones from test 1, but the result is totally different:

The execution time for App1 (OLTP) has dropped significantly, and its performance has increased almost as if nothing else were running. At the same time, App2 has lost performance; this must be taken into account, but it will not stop or prevent the ETL process from running.


It is possible to do more tuning if the ETL is too compromised, or to modify the server layout, for example by adding a slave dedicated to ETL reads. The combinations and possibilities are many.

Conclusion

Looking at the final graphs will help us reach our conclusions:

Comparing tests 1 and 4, we can see how using resource groups helps us correctly balance the workload and optimize performance in the case of unavoidable contention between different applications.

At the same time, using resource groups alone, as a blanket setting, is not optimal, because it can defeat its own purpose: instead of providing an improvement, it can unpredictably affect all the traffic. Nor is it desirable to modify application code in order to implement it at the query level, given the cost and time of doing so.
Introducing ProxySQL with query rewriting lets us use the per-query option, without any code modification, and with a very high level of granularity.

Once more: do not do this by yourself unless you are more than competent and know 100% what you are doing. In any case, remember that an ETL process may take longer, and that you need to plan your work/schedule accordingly.

Good MySQL everyone.

The post MySQL 8: Load Fine Tuning With Resource Groups appeared first on Percona Database Performance Blog.

by Marco Tusa at August 27, 2018 10:38 PM

Webinar Tuesday, 8/28: Forking or Branching – Lessons from the MySQL Community

forking or branching

Please join Percona’s CEO, Peter Zaitsev, as he presents Forking or Branching – Lessons from the MySQL Community on Tuesday, August 28th, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

The MySQL Community offers a great example of various forks and branches, with MariaDB being the most well-known fork, and companies like Percona, Facebook and Alibaba maintaining their own branches.

In this presentation we will look at the history of MySQL, the causes of MySQL forking and branching, and discuss the benefits and drawbacks of both approaches, using specific examples from the MySQL ecosystem.

Register for the webinar.

Peter Zaitsev, CEO and Co-Founder

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Inc. 5000 named Percona to their list in 2013, 2014, 2015 and 2016. Peter was an early employee at MySQL AB, eventually leading the company’s High-Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High-Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his ebook Practical MySQL Performance Optimization is one of Percona’s most popular downloads.

 

The post Webinar Tuesday, 8/28: Forking or Branching – Lessons from the MySQL Community appeared first on Percona Database Performance Blog.

by Peter Zaitsev at August 27, 2018 09:50 PM

August 24, 2018

Peter Zaitsev

This Week in Data with Colin Charles 50: Percona Live Europe Sessions, PostgreSQL in Google Cloud

Colin Charles


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Grading is underway for talks at Percona Live Europe 2018. I understand that by next week you will see the tutorial schedule released. As part of the program committee, I have enjoyed reviewing tutorials, and I reckon there is great competition for the schedule. I suggest you register now, and don’t forget to book your accommodation (need a discount?).

A video worth watching: How we Live Migrated Millions of BBM Users & its Infrastructure Across the Pacific (Cloud Next ’18). I was a big Blackberry Messenger (BBM) user in its heyday. They moved from their Canadian data center to Google Cloud. Blackberry used MySQL, migrated Oracle to PostgreSQL, as Oracle isn’t sanctioned to run on Google Cloud Platform. They also moved from MySQL to Google CloudSQL since the databases weren’t critical to running the application. Some cases write to BigQuery and have Tableau query them. They make use of Cassandra native replication. With PostgreSQL, there is master-slave replication: they got users onto the new master and then promoted the master. When they did the master promotion, there was minimal (5-10 minutes) downtime while this process was ongoing. This didn’t impact their key services, messages continued to work, and this was seen as acceptable.

Releases

  • MariaDB 10.3.9 – InnoDB from 5.7.23, a new variable innodb_log_optimize_ddl for avoiding delay due to page flushing and allowing concurrent backup, and many fixes and improvements around ALTER TABLE too.
  • Percona Server for MySQL 5.6.41-84.1 – full-text search index with InnoDB table bug fixed, as well as ensuring queries on a table with CHARSET=euckr COLLATE=euckr_bin always return the same results.
  • Percona Server for MySQL 5.5.61-38.13 – bug fixes.

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 50: Percona Live Europe Sessions, PostgreSQL in Google Cloud appeared first on Percona Database Performance Blog.

by Colin Charles at August 24, 2018 08:18 PM

PostgreSQL Accessing MySQL as a Data Source Using mysql_fdw

PostgreSQL foreign tables in MySQL

There are many organizations where front/web-facing applications use MySQL and back end processing uses PostgreSQL®. Any system integration between these applications generally involves the replication, or duplication, of data from system to system. We recently blogged about pg_chameleon, which can be used to replicate data from MySQL® to PostgreSQL. mysql_fdw can play a key role in eliminating the problem of replicating/duplicating data: rather than maintaining the same data physically in both postgres and MySQL, it allows PostgreSQL to access MySQL tables and use them as if they were local tables in PostgreSQL. mysql_fdw can be used, too, with Percona Server for MySQL, our drop-in replacement for MySQL.

This post showcases how easy it is to set that up and get them working together. We will also address a few points that we skipped while discussing FDWs in general in our previous post.

Preparing MySQL for fdw connectivity

On the MySQL server side, we need to set up a user to allow for access to MySQL from the PostgreSQL server side. We recommend Percona Server for MySQL if you are setting it up for the first time.

mysql> create user 'fdw_user'@'%' identified by 'Secret!123';

This user needs to have privileges on the tables which are to be presented as foreign tables in PostgreSQL.

mysql> grant select,insert,update,delete on EMP to fdw_user@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> grant select,insert,update,delete on DEPT to fdw_user@'%';
Query OK, 0 rows affected (0.00 sec)

Installing mysql_fdw on PostgreSQL server

Under the hood, the MySQL FDW (mysql_fdw) lets the PostgreSQL server act as a client for MySQL Server, which means it can fetch data from the MySQL database as a client. Obviously, mysql_fdw uses the MySQL client libraries. Nowadays, many Linux distributions are packaged with MariaDB® libraries, which work well enough for mysql_fdw to function. If we install mysql_fdw from the PGDG repo, the mariadb-devel.x86_64 package will be installed alongside other development packages. To switch to Percona packages as client libraries, you need the Percona development packages too.

sudo yum install Percona-Server-devel-57-5.7.22-22.1.el7.x86_64.rpm

Now we should be able to install the mysql_fdw from PGDG repository:

sudo yum install mysql_fdw_10.x86_64

Connect to the PostgreSQL server where we are going to create the foreign table, and using the command line tool, create mysql_fdw extension:

postgres=# create extension mysql_fdw;
CREATE EXTENSION

Create a server definition to point to the MySQL server running on a host machine by specifying the hostname and port:

postgres=# CREATE SERVER mysql_svr  FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host 'hr',port '3306');
CREATE SERVER

Now we can create a user mapping. This maps a database user in PostgreSQL to a user on the remote server (MySQL). While creating the user mapping, we need to specify the user credentials for the MySQL server, as shown below. For this demonstration we are using the PUBLIC pseudo-user in PostgreSQL; however, we could map a specific user as an alternative.

postgres=# CREATE USER MAPPING FOR PUBLIC SERVER mysql_svr OPTIONS (username 'fdw_user',password 'Secret!123');
CREATE USER MAPPING

Import schema objects

Once we complete the user mapping, we can import the foreign schema.

postgres=# IMPORT FOREIGN SCHEMA hrdb FROM SERVER mysql_svr INTO public;

Or we have the option to import only selected tables from the foreign schema.

postgres=# IMPORT FOREIGN SCHEMA hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO public;

This statement says that the tables “EMP” and “DEPT” from the foreign schema named “hrdb” on the server mysql_svr are to be imported into the public schema of the PostgreSQL database.

FDWs in PostgreSQL allow us to import the tables to any schema in postgres.

Let’s create a schema in postgres:

postgres=# create schema hrdb;
postgres=# IMPORT FOREIGN SCHEMA hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO hrdb;

Suppose we need the foreign table to be part of multiple schemas in PostgreSQL. Yes, that is possible too.

postgres=# create schema payroll;
CREATE SCHEMA
postgres=# create schema finance;
CREATE SCHEMA
postgres=# create schema sales;
CREATE SCHEMA
postgres=# IMPORT FOREIGN SCHEMA  hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO payroll;
IMPORT FOREIGN SCHEMA
postgres=# IMPORT FOREIGN SCHEMA  hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO finance;
IMPORT FOREIGN SCHEMA
postgres=# IMPORT FOREIGN SCHEMA  hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO sales;
IMPORT FOREIGN SCHEMA

You might be wondering if there’s a benefit to doing this. Yes: in a multi-tenant environment, it allows us to centralize many of the master/lookup tables. These can even sit on a remote server, and that server can be MySQL as well!

IMPORTANT: PostgreSQL extensions are database-specific. So if you have more than one database inside a PostgreSQL instance/cluster, you have to create a separate fdw extension, foreign server definition and user mapping for each.

Foreign tables with a subset of columns

Another important property of foreign tables is that you can map only a subset of columns, provided you are not planning to issue DMLs against the remote table. For example, MySQL’s famous sample database Sakila contains a table “film” with the following definition:

CREATE TABLE `film` (
`film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`description` text,
`release_year` year(4) DEFAULT NULL,
`language_id` tinyint(3) unsigned NOT NULL,
`original_language_id` tinyint(3) unsigned DEFAULT NULL,
`rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
`rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
`length` smallint(5) unsigned DEFAULT NULL,
`replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
`rating` enum('G','PG','PG-13','R','NC-17') DEFAULT 'G',
`special_features` set('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`film_id`),
KEY `idx_title` (`title`),
KEY `idx_fk_language_id` (`language_id`),
KEY `idx_fk_original_language_id` (`original_language_id`),
CONSTRAINT `fk_film_language` FOREIGN KEY (`language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE,
CONSTRAINT `fk_film_language_original` FOREIGN KEY (`original_language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8

Imagine that we don’t need all of these fields to be available to the PostgreSQL database and its application. In such cases, we can create a foreign table with only the necessary columns on the PostgreSQL side. For example:

CREATE FOREIGN TABLE film (
film_id smallint NOT NULL,
title varchar(255) NOT NULL
) SERVER mysql_svr OPTIONS (dbname 'sakila', table_name 'film');

The challenges of incompatible syntax and datatypes

There are many syntactical differences between MySQL and PostgreSQL, so you may need to intervene manually when creating foreign tables. For example, MySQL tables accept enumeration definitions in place, whereas PostgreSQL expects an enumeration type to be defined before creating the table, like this:

CREATE TYPE rating_t AS enum('G','PG','PG-13','R','NC-17');

Many such things are not handled perfectly, so it is better to specify them as a text datatype; the same applies to the set datatype.

CREATE FOREIGN TABLE film (
film_id smallint NOT NULL,
title varchar(255) NOT NULL,
rating text,
special_features text
) SERVER mysql_svr OPTIONS (dbname 'sakila', table_name 'film');

I’m used to receiving scepticism from people about treating enum and set as text. Well, please don’t forget that we are not storing them in PostgreSQL: the text datatype is just a method for handling input and output for the table. The data is pulled from and pushed to the foreign server, which is MySQL, and MySQL converts the text into the corresponding enumeration before storing it.

IMPORTANT : mysql_fdw has the capability to do data type conversion (casting) automatically behind the scenes when a user fires DML against foreign tables.

Generally, DML against a remote MySQL database from the PostgreSQL side can be quite challenging because of the architecture differences. These impose restrictions, such as the requirement that the first column of the foreign table be unique. We will cover these in more depth in a future post.

Handling views on the MySQL side

Foreign tables are not limited to tables on the MySQL side: a view can also be mapped as a foreign table. Let’s create a view in the MySQL database.

mysql> create view v_film as select film_id,title,description,release_year from film;

PostgreSQL can treat this view as a foreign table:

postgres=# CREATE FOREIGN TABLE v_film (
film_id smallint,
title varchar(255) NOT NULL,
description text,
release_year smallint ) SERVER mysql_svr OPTIONS (dbname 'sakila', table_name 'v_film');
CREATE FOREIGN TABLE

Views on top of foreign tables in PostgreSQL

PostgreSQL allows us to create views on top of foreign tables, which might even point to a view on the remote MySQL server. Let’s try creating a view using the newly created foreign table v_film.

postgres=# create view v2_film as select film_id,title from v_film;
postgres=# explain verbose select * from v2_film;
QUERY PLAN
--------------------------------------------------------------------------
Foreign Scan on public.v_film  (cost=10.00..1010.00 rows=1000 width=518)
Output: v_film.film_id, v_film.title
Local server startup cost: 10
Remote query: SELECT `film_id`, `title` FROM `sakila`.`v_film`
(4 rows)

Materializing the foreign tables (Materialized Views)

One of the key features mysql_fdw implements is support for persistent connections: after query execution, the connection to the remote MySQL database is not dropped, but retained for the next query from the same session. Nevertheless, in some situations there will be concerns about continuously streaming data from the source database (MySQL) to the destination (PostgreSQL). If you need frequent access to data in foreign tables, you could consider materializing the data locally: it is possible to create a materialized view on top of a foreign table.

postgres=# CREATE MATERIALIZED VIEW mv_film as select * from film;
SELECT 1000

Whenever required, we can just refresh the materialized view.

postgres=# REFRESH MATERIALIZED VIEW mv_film;
REFRESH MATERIALIZED VIEW
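If readers must not be blocked while the view is refreshed, PostgreSQL also supports REFRESH MATERIALIZED VIEW CONCURRENTLY, which requires a unique index on the materialized view. A sketch:

postgres=# CREATE UNIQUE INDEX mv_film_pk ON mv_film (film_id);
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY mv_film;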

Automated Cleanup

One of the features I love about the FDW framework is its ability to clean up foreign tables in a single shot. This is very useful when we set up foreign tables for a temporary purpose, like a data migration. At the very top level we can drop the extension, and PostgreSQL will walk through the dependencies and drop those too.

postgres=# drop extension mysql_fdw cascade;
NOTICE:  drop cascades to 12 other objects
DETAIL:  drop cascades to server mysql_svr
drop cascades to user mapping for public on server mysql_svr
drop cascades to foreign table "DEPT"
drop cascades to foreign table "EMP"
drop cascades to foreign table hrdb."DEPT"
drop cascades to foreign table hrdb."EMP"
drop cascades to foreign table payroll."DEPT"
drop cascades to foreign table payroll."EMP"
drop cascades to foreign table finance."DEPT"
drop cascades to foreign table finance."EMP"
drop cascades to foreign table sales."DEPT"
drop cascades to foreign table sales."EMP"
DROP EXTENSION
postgres=#

Conclusion

I should concede that the features offered by mysql_fdw are far fewer compared to postgres_fdw. Many features are not yet implemented, including column renaming. But the good news is that the key developer and maintainer of mysql_fdw is here at Percona! Hopefully, we will be able to put more effort into implementing some of the missing features. Even so, the features implemented so far are powerful enough to support system integration. We really can make the two sing together!

Percona’s support for PostgreSQL

As part of our commitment to being unbiased champions of the open source database eco-system, Percona offers support for PostgreSQL – you can read more about that here.

The post PostgreSQL Accessing MySQL as a Data Source Using mysql_fdw appeared first on Percona Database Performance Blog.

by Jobin Augustine at August 24, 2018 03:38 PM

August 23, 2018

Peter Zaitsev

Comparing Data At-Rest Encryption Features for MariaDB, MySQL and Percona Server for MySQL

Encryption at rest MariaDB MySQL Percona Server

Protecting the data stored in your database may have been at the top of your priorities recently, especially with the changes that were introduced earlier this year with GDPR.

There are a number of ways to protect this data, which until not so long ago would have meant either using an encrypted filesystem (e.g. LUKS), or encrypting the data before it is stored in the database (e.g. AES_ENCRYPT or other abstraction within the application). A few years ago, the options started to change, as Alexander Rubin discussed in MySQL Data at Rest Encryption, and now MariaDB®, MySQL®, and Percona Server for MySQL all support encryption at-rest. However, the options that you have—and, indeed, the variable names—vary depending upon which database version you are using.

In this blog post we will take a look at what constitutes the maximum level of at-rest encryption that can be achieved with each of the latest major GA releases from each provider. To allow a fair comparison across the three, we will focus on file-based key management: the keyring_file plugin for MySQL and Percona Server for MySQL, along with the file_key_management plugin for MariaDB.

MariaDB 10.3

The MariaDB team take the credit for leading the way with at-rest encryption, as most of their features have been present since the 10.1 release (most notably the beta release of 10.1.3 in March 2015). Google donated the tablespace encryption, and eperi donated per-table encryption and key identifier support.

The current feature set for MariaDB 10.3 comprises the following variables:

Maximising at-rest encryption with MariaDB 10.3

Using the following configuration would give you maximum at-rest encryption with MariaDB 10.3:

plugin_load_add = file_key_management
file_key_management_filename = /etc/mysql/keys.enc
file_key_management_filekey = FILE:/etc/mysql/.key
file_key_management_encryption_algorithm = aes_cbc
innodb_encrypt_log = ON
innodb_encrypt_tables = FORCE
innodb_encrypt_threads = 4
encrypt_binlog = ON
encrypt_tmp_disk_tables = ON
encrypt_tmp_files = ON
aria_encrypt_tables = ON

This configuration would provide the following at-rest protection:

  • automatic and enforced InnoDB tablespace encryption
  • automatic encryption of existing tables that have not been marked with ENCRYPTED=NO
  • 4 parallel encryption threads
  • encryption of temporary files and tables
  • encryption of Aria tables
  • binary log encryption
  • an encrypted file that contains the main encryption key

You can read more about preparing the keys, as well as the other key management plugins in the Encryption Key Management docs.
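
As a rough sketch of how those key files might be prepared, based on the examples in the MariaDB docs (the paths are chosen to match the configuration above; treat this as illustrative rather than canonical):

# generate the "filekey" used to decrypt the key file
openssl rand -hex 128 > /etc/mysql/.key
# create a plain-text key file holding key id 1, a 256-bit AES key
echo "1;$(openssl rand -hex 32)" > /tmp/keys.txt
# encrypt the key file with the filekey
openssl enc -aes-256-cbc -md sha1 -pass file:/etc/mysql/.key -in /tmp/keys.txt -out /etc/mysql/keys.enc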

There is an existing bug related to encrypt_tmp_files (MDEV-14884) which causes mysqld --help --verbose to fail. If you are using the official MariaDB Docker container for 10.3, this will leave you unable to keep mysqld up and running. Messages similar to these would be visible in the Docker logs:

ERROR: mysqld failed while attempting to check config
command was: "mysqld --verbose --help --log-bin-index=/tmp/tmp.HDiERM4SPx"
2018-08-15 13:38:15 0 [Note] Plugin 'FEEDBACK' is disabled.
2018-08-15 13:38:15 0 [ERROR] Failed to enable encryption of temporary files
2018-08-15 13:38:15 0 [ERROR] Aborting

N.B. you should be aware of the limitations of the implementation, most notably that log tables and log files are not encrypted and may contain data along with any query text.

One of the key features supported by MariaDB that is not yet supported by the other providers is the automatic, parallel encryption of tables that occurs when simply enabling innodb_encrypt_tables. This avoids the need to mark existing tables for encryption using ENCRYPTED=YES, although at the same time it also does not automatically add the comment, so you would not see this information. Instead, to check for encrypted InnoDB tables in MariaDB you should check information_schema.INNODB_TABLESPACES_ENCRYPTION, an example query being:

mysql> SELECT SUBSTRING_INDEX(name, '/', 1) AS db_name,
   ->   SUBSTRING_INDEX(name, '/', -1) AS db_table,
   ->   COALESCE(ENCRYPTION_SCHEME, 0) AS encrypted
   -> FROM information_schema.INNODB_SYS_TABLESPACES
   -> LEFT JOIN INNODB_TABLESPACES_ENCRYPTION USING(name);
+---------+----------------------+-----------+
| db_name | db_table             | encrypted |
+---------+----------------------+-----------+
| mysql   | innodb_table_stats   |      1    |
| mysql   | innodb_index_stats   |      0    |
| mysql   | transaction_registry |      0    |
| mysql   | gtid_slave_pos       |      0    |
+---------+----------------------+-----------+

As can be inferred from this query, the system tables in MariaDB 10.3 are still predominantly MyISAM and as such cannot be encrypted.

MySQL

Whilst the enterprise version of MySQL has supported a number of data at-rest encryption features since 5.7, most of them are not available to the community edition. In the latest major community release the feature set centres on InnoDB tablespace and redo/undo log encryption using the keyring_file plugin, while the enterprise edition adds further key management options (such as AWS KMS, as reflected in the summary table at the end of this post).

Maximising at-rest encryption with MySQL 8.0

Using the following configuration would give you maximum at-rest encryption with MySQL 8.0:

early-plugin-load=keyring_file.so
keyring_file_data=/var/lib/mysql/keyring
innodb_redo_log_encrypt=ON
innodb_undo_log_encrypt=ON

This configuration would provide the following at-rest protection:

  • optional InnoDB tablespace encryption
  • redo and undo log encryption

You would need to create new tables, or alter existing ones, with the ENCRYPTION='Y' option; encrypted tables are then visible by examining information_schema.INNODB_TABLESPACES, an example query being:

mysql> SELECT TABLE_SCHEMA AS db_name,
   ->    TABLE_NAME AS db_table,
   ->    CREATE_OPTIONS LIKE '%ENCRYPTION="Y"%' AS encrypted
   -> FROM information_schema.INNODB_TABLESPACES ts
   -> INNER JOIN information_schema.TABLES t ON t.TABLE_SCHEMA = SUBSTRING_INDEX(ts.name, '/', 1)
   ->                                        AND t.TABLE_NAME = SUBSTRING_INDEX(ts.name, '/', -1);
+---------+-----------------+-----------+
| db_name | db_table        | encrypted |
+---------+-----------------+-----------+
| sys     | sys_config      |     1     |
+---------+-----------------+-----------+
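
For reference, a minimal sketch of marking tables for encryption in the first place (hypothetical table names; assumes the keyring_file configuration above is in place):

-- hypothetical tables: encrypt at creation time, or retro-fit with ALTER
CREATE TABLE app.customer (id INT PRIMARY KEY, email VARCHAR(255)) ENCRYPTION='Y';
ALTER TABLE app.orders ENCRYPTION='Y';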

N.B. You are also able to encrypt tablespaces in 5.7, in which case you should use information_schema.INNODB_SYS_TABLESPACES, as the internal system views on the data dictionary were renamed in 8.0 (InnoDB Changes).

Unfortunately, whilst all of the tables in the mysql schema use the InnoDB engine (except for the log tables), you cannot encrypt them and instead get the following error:

ERROR 3183 (HY000): This tablespace can't be encrypted.

Interestingly, you are led to believe that you can indeed encrypt the general_log and slow_log tables, but this is in fact a bug (#91791).

Percona Server for MySQL

Last, but not least we have Percona Server for MySQL, which, whilst not completely matching MariaDB for features, is getting very close. As we shall see shortly, it does in fact have some interesting differences to both MySQL and MariaDB.

The current feature set for 5.7 does indeed exceed the features provided by MySQL 5.7 and, for the most part, 8.0, as the configuration below shows.

Maximising at-rest encryption with Percona Server for MySQL 5.7

Using the following configuration would give you maximum at-rest encryption with Percona Server 5.7:

early-plugin-load=keyring_file.so
keyring_file_data=/var/lib/mysql-keyring/keyring
innodb_temp_tablespace_encrypt=ON
innodb_encrypt_online_alter_logs=ON
innodb_encrypt_tables=FORCE
encrypt_binlog=ON
encrypt_tmp_files=ON

This configuration would provide the following at-rest protection:

  • automatic and enforced InnoDB tablespace encryption
  • encryption of temporary files and tables
  • binary log encryption
  • encryption when performing online DDL

There are some additional features that are due for release in the near future:

  • Encryption of the doublewrite buffer
  • Automatic key rotation
  • Undo log and redo log encryption
  • InnoDB system tablespace encryption
  • InnoDB tablespace and redo log scrubbing
  • Amazon KMS keyring plugin

Just like MySQL, encryption of any existing tables needs to be specified via ENCRYPTION='Y' in an ALTER TABLE statement; new tables, however, are automatically encrypted. Another difference is that, in order to check which tables are encrypted, you should check the flag set against the tablespace in information_schema.INNODB_SYS_TABLESPACES, an example query being:

mysql> SELECT SUBSTRING_INDEX(name, '/', 1) AS db_name,
   ->    SUBSTRING_INDEX(name, '/', -1) AS db_table,
   ->    (flag & 8192) != 0 AS encrypted
   -> FROM information_schema.INNODB_SYS_TABLESPACES;
+---------+---------------------------+-----------+
| db_name | db_table                  | encrypted |
+---------+---------------------------+-----------+
| sys     | sys_config                |      1    |
| mysql   | engine_cost               |      1    |
| mysql   | help_category             |      1    |
| mysql   | help_keyword              |      1    |
| mysql   | help_relation             |      1    |
| mysql   | help_topic                |      1    |
| mysql   | innodb_index_stats        |      1    |
| mysql   | innodb_table_stats        |      1    |
| mysql   | plugin                    |      1    |
| mysql   | servers                   |      1    |
| mysql   | server_cost               |      1    |
| mysql   | slave_master_info         |      1    |
| mysql   | slave_relay_log_info      |      1    |
| mysql   | slave_worker_info         |      1    |
| mysql   | time_zone                 |      1    |
| mysql   | time_zone_leap_second     |      1    |
| mysql   | time_zone_name            |      1    |
| mysql   | time_zone_transition      |      1    |
| mysql   | time_zone_transition_type |      1    |
| mysql   | gtid_executed             |      0    |
+---------+---------------------------+-----------+

Here you will see something interesting! We are able to encrypt most of the system tables, including two that are of significance, as they can contain plain text passwords:

+---------+-------------------+-----------+
| db_name | db_table          | encrypted |
+---------+-------------------+-----------+
| mysql   | servers           |      1    |
| mysql   | slave_master_info |      1    |
+---------+-------------------+-----------+

In addition to the above, Percona Server for MySQL also supports using the open source HashiCorp Vault to host the keyring decryption information using the keyring_vault plugin; utilizing this setup (provided Vault is not on the same device as your mysql service, and is configured correctly) gains you an additional layer of security.
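
As a hedged sketch of what enabling keyring_vault instead of keyring_file might look like (the values are placeholders; consult the Percona Server keyring_vault documentation for the definitive settings):

early-plugin-load=keyring_vault.so
keyring_vault_config=/var/lib/mysql-keyring/keyring_vault.conf

# /var/lib/mysql-keyring/keyring_vault.conf (placeholder values):
# vault_url = https://vault.example.com:8200
# secret_mount_point = secret
# token = <your vault token>
# vault_ca = /etc/ssl/vault-ca.pem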

You may also be interested in my earlier blog post on using Vault with MySQL, showing you how to store your credentials in a central location and use them to access your database, including the setup and configuration of Vault with Let’s Encrypt certificates.

Summary

There are significant differences, both in terms of features and indeed variable names, but all three are able to provide encryption of the InnoDB tablespaces that will contain your persistent, sensitive data. The temporary tablespaces, InnoDB logs and temporary files contain transient data, so whilst they should ideally be encrypted, only a small amount of data exists in them for a finite period, which makes them less of a risk, albeit a risk nonetheless.

Here are the highlights:

                            MariaDB 10.3   MySQL 8.0         Percona Server 5.7
encrypted InnoDB data       Y              Y                 Y
encrypted non-InnoDB data   Aria-only      N                 N
encrypted InnoDB logs       Y              Y                 TBA
automatic encryption        Y              N                 Y
enforced encryption         Y              N                 Y
automatic key rotation      Y              N                 TBA
encrypted binary logs       Y              N                 Y
encrypted online DDL        ?              N                 Y
encrypted keyring           Y              Enterprise-only   N
mysql.slave_master_info     N              N                 Y
mysql.servers               N              N                 Y
HashiCorp Vault             N              N                 Y
AWS KMS                     Y              Enterprise-only   TBA

 


The post Comparing Data At-Rest Encryption Features for MariaDB, MySQL and Percona Server for MySQL appeared first on Percona Database Performance Blog.

by Ceri Williams at August 23, 2018 05:00 PM

August 22, 2018

Peter Zaitsev

How to Compile Percona Server for MySQL 5.7 in Raspberry Pi 3

In this post I'll give you the steps to compile Percona Server for MySQL 5.7.22 on a Raspberry Pi 3. Why? Well, because in general this little computer is cheap, has low power consumption, and is great to use as a test machine for developers.

By default, Raspbian OS includes very few versions of MySQL to install:

$ apt-cache search mysql | grep server
...
mariadb-server-10.0 - MariaDB database server binaries
mariadb-server-10.1 - MariaDB database server binaries
mariadb-server-core-10.0 - MariaDB database core server files
mariadb-server-core-10.1 - MariaDB database core server files
mysql-server - MySQL database server binaries and system database setup [transitional] (5.5)
...

If you want to install MySQL or MariaDB on an ARM architecture using official pre-built binaries, you are limited to those distributions and versions.

Some time ago, Roel Van de Paar wrote the post "Percona Server on the Raspberry Pi: Your own MySQL Database Server for Under $80", using Fedora ARM as the OS on the first generation of Raspberry Pi boards.

Now we will use the latest version of the Raspberry Pi 3 with Raspbian OS. In my case, the OS version is:

$ cat /etc/issue
Raspbian GNU/Linux 9 \n \l

The Installation of Percona Server for MySQL on Raspberry Pi 3

Let’s start. We will need many dev packages and cmake to compile the source code. Here is the command line to install all these packages:

apt-get install screen cmake debhelper autotools-dev libaio-dev wget automake libtool bison libncurses-dev libz-dev bzr libgcrypt11-dev build-essential flex autoconf mysql-client zlib1g-dev libboost-dev

Now we need to download the Percona Server for MySQL 5.7.22 source code and then we can proceed to compile.

Before starting to compile the source code, we will need to extend the swap memory. This is necessary to avoid encountering memory problems at compilation time.

$ dd if=/dev/zero of=/swapfile1GB bs=1M count=1024
$ chmod 0600 /swapfile1GB
$ mkswap /swapfile1GB
$ swapon /swapfile1GB

Now we can check the memory and confirm that the swap is active:

$ free -m

This is the output in my case

$ free -m
total used free shared buff/cache available
Mem: 927 176 92 2 658 683
Swap: 1123 26 1097

I recommend using a screen session to compile the source code, because it takes a long time.

$ cd /root
$ screen -SL compile_percona_server
$ wget https://www.percona.com/downloads/Percona-Server-LATEST/Percona-Server-5.7.22-22/source/tarball/percona-server-5.7.22-22.tar.gz
$ tar xzf percona-server-5.7.22-22.tar.gz
$ cd percona-server-5.7.22-22
$ cmake  -DDOWNLOAD_BOOST=ON -DWITH_BOOST=$HOME/my_boost .
$ make
$ make install

After it has compiled and installed successfully, it's time to create our datadir directory for this Percona Server version, and the mysql user. Feel free to use other directory names:

$ mkdir /var/lib/mysql
$ useradd mysql -d /var/lib/mysql
$ chown mysql.mysql /var/lib/mysql

Now it's time to create a minimal my.cnf config file to start mysql; you can use the example below (note that the startup command later points at /etc/my.cnf):

$ vim /etc/my.cnf
[mysql]
socket=/var/lib/mysql/mysql.sock
[mysqld]
datadir = /var/lib/mysql
server_id = 2
binlog-format = row
log_bin = /var/lib/mysql/binlog
innodb_buffer_pool_size = 128M
socket=/var/lib/mysql/mysql.sock
symbolic-links=0
[mysqld_safe]
log-error=/var/lib/mysql/mysqld.log
pid-file=/var/lib/mysql/mysqld.pid

Then we need to initialize the system databases/schemas, along with the ibdata and ib_logfile files, with the next command:

$ /usr/local/mysql/bin/mysqld --initialize-insecure --user=mysql --basedir=/usr/local/mysql --datadir=/var/lib/mysql

Now, finally, it's time to start MySQL:

$ /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf --user=mysql &

We can check whether MySQL started by taking a look at the mysqld.log file:

$ cat /var/lib/mysql/mysqld.log
...
2018-08-13T16:44:55.067352Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2018-08-13T16:44:55.067576Z 0 [Note] IPv6 is available.
2018-08-13T16:44:55.067680Z 0 [Note]   - '::' resolves to '::';
2018-08-13T16:44:55.067939Z 0 [Note] Server socket created on IP: '::'.
2018-08-13T16:44:55.258576Z 0 [Note] Event Scheduler: Loaded 0 events
2018-08-13T16:44:55.259525Z 0 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.7.22-22-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution

In our example it started OK.

Remember, the MySQL server was installed and started using an alternative path.
Now it's time to connect and check that everything is running well.

$ /usr/local/mysql/bin/mysql -uroot
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.22-22-log Source distribution
Copyright (c) 2009-2018 Percona LLC and/or its affiliates
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)

To check the version that's running, you can use the following command:

mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+---------------------+
| Variable_name | Value |
+-------------------------+---------------------+
| innodb_version | 5.7.22-22 |
| protocol_version | 10 |
| slave_type_conversions | |
| tls_version | TLSv1,TLSv1.1 |
| version | 5.7.22-22-log |
| version_comment | Source distribution |
| version_compile_machine | armv7l |
| version_compile_os | Linux |
| version_suffix | -log |
+-------------------------+---------------------+
9 rows in set (0.02 sec)

Improving Performance

Keep in mind that if you have configured the datadir directory on the same microSD card where you are running the OS, MySQL will run slowly: if you create a new table, it may take a few seconds. So, I recommend that you use a separate USB SSD disk, and move the datadir directory to this SSD disk. That's more practical, and the performance is much better.
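
A rough sketch of that relocation (the device name /dev/sda1 is an assumption; it depends on how your SSD enumerates, so check with lsblk first, and stop mysqld before copying):

$ mkfs.ext4 /dev/sda1
$ mkdir -p /mnt/ssd
$ mount /dev/sda1 /mnt/ssd
$ cp -rp /var/lib/mysql /mnt/ssd/mysql
$ chown -R mysql.mysql /mnt/ssd/mysql

Then point datadir (and the socket/log paths, if you wish) at /mnt/ssd/mysql in /etc/my.cnf and start MySQL again.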

I hope you enjoyed this guide on how to use a tiny server to install Percona Server for MySQL.
If you want to test other versions, please go ahead: the steps will be very similar to these.


The post How to Compile Percona Server for MySQL 5.7 in Raspberry Pi 3 appeared first on Percona Database Performance Blog.

by Walter Garcia at August 22, 2018 03:04 PM

Webinar Thurs 8/23: MySQL vs MongoDB – Choosing the Right Technology for Your Application

Please join Percona's CEO, Peter Zaitsev, as he presents MySQL vs MongoDB – Choosing the Right Technology for Your Application on Thursday, August 23, 2018, at 10:30 AM PDT (UTC-7) / 1:30 PM EDT (UTC-4).

Are you considering adopting the most popular open source relational database, or the most popular open source NoSQL database? Which one is right for your particular application?

In this presentation, we will look into the advantages and disadvantages of both, and examine the applications where MySQL or MongoDB is the more appropriate choice.

Register Now

The post Webinar Thurs 8/23: MySQL vs MongoDB – Choosing the Right Technology for Your Application appeared first on Percona Database Performance Blog.

by Peter Zaitsev at August 22, 2018 02:23 PM

August 21, 2018

Peter Zaitsev

Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw

There are a few features in PostgreSQL that are very compelling, and that I rarely see in other RDBMSs. Some of these features are the driving force behind the growing popularity of PostgreSQL. This blog post is about one of my favourite features: FDW (Foreign Data Wrapper). As the name indicates, this feature allows a PostgreSQL database to treat tables in a remote PostgreSQL database as locally available tables.

The history of FDW began when SQL/MED came out as part of the ANSI SQL standard specification in 2003. MED stands for "Management of External Data". By definition, "external data" is data that the DBMS is able to access but does not manage. There are two parts to this specification:

  1. Foreign Table : this is about how to access external data sources and present them as relational tables.
  2. Datalink : this extends the functionality of database systems to include control over external files without the need to store their contents directly in the database, such as LOBs. A column of a table could directly refer to a file.

PostgreSQL's FDW capability addresses foreign tables only. It was introduced in PostgreSQL 9.1 and has been receiving improvements ever since.

Today there are a variety of FDWs which allow PostgreSQL to talk to most of the data sources we can think of. However, most FDWs are independent open source projects implemented as Postgres Extensions, and not officially supported by the PostgreSQL Global Development Group.

postgres_fdw

In this blog post we will take a closer look at postgres_fdw, which can be considered the "reference implementation" for other FDW development efforts, and showcase its capabilities. This is the one FDW which comes with the PostgreSQL source as a contrib extension module. The only other FDW which is part of the PostgreSQL source tree is file_fdw.

Let's look into postgres_fdw with a use case. In many organizations, there could be multiple systems catering to different functionalities/departments. For example, while an HR database may be holding the employee information, the finance and payroll systems may need to access that same data. A common—but bad—solution for this is to duplicate the data in both systems. Data duplication often leads to problems, starting with data maintenance and accuracy. A smarter option, which avoids duplication while granting foreign databases access to only the required data, is to use FDWs.

Installing postgres_fdw

The Postgres Development Group (PGDG) offers PostgreSQL packages for all major Linux distributions. postgres_fdw itself is provided as a module that is usually included in the contrib package. In the example below we install that package for PostgreSQL 10 running on Red Hat/CentOS:

$ sudo yum install postgresql10-contrib.x86_64

Steps to setup

Let's consider two PostgreSQL instances: a source instance and a destination instance.

  • source is the remote postgres server, whose tables are accessed by the destination database server as foreign tables.
  • destination is another postgres server, where the foreign tables that refer to tables on the source database server are created.

We are going to use these definitions of source and destination in the rest of the post. Let's assume that the current application connects to the destination database using a user called app_user.

Step 1 : Create a user on the source

Create a user in the source server using the following syntax. This user account will be used by the destination server to access the source tables:

postgres=# CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';

Step 2 : Create test tables (optional)

Let’s create a test table in the source server and insert a few records.

postgres=> create table employee (id int, first_name varchar(20), last_name varchar(20));
CREATE TABLE
postgres=# insert into employee values (1,'jobin','augustine'),(2,'avinash','vallarapu'),(3,'fernando','camargos');
INSERT 0 3

Step 3 : Grant privileges to user in the source

Give appropriate privileges to the fdw_user on the source table. Always try to limit the scope of the privileges to the minimum required, to improve security.
An example syntax is as follows:

postgres=# GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE employee TO fdw_user;
GRANT

Step 4 : Modify ACL in pg_hba.conf

We need to ensure that proper authentication is set up for accessing the source server from the destination server.
Add an entry to pg_hba.conf as shown below, preferably at the beginning of the file.

host    all all     destination_server_ip/32          md5

Step 5 : Test connectivity and privileges on source

Before proceeding further, it is a good idea to make sure that we are able to connect to the source machine from the destination machine using the newly created database user (fdw_user).

In order to validate, on the destination server, use psql to connect to the source server:

$ psql -h hr -U fdw_user postgres

You could even validate all privileges on the tables which are to be presented as foreign tables using this connection.

Step 6 : Create postgres_fdw extension on the destination

Connect to the destination server, and create the postgres_fdw extension in the destination database from which you wish to access the tables of the source server. You must be a superuser to create the extension.

No postgres_fdw extension is needed on the source server.

postgres=# create extension postgres_fdw;
CREATE EXTENSION

Validate that the extension has been created using \dx. The following is an example validation log.

postgres=# \dx postgres_fdw
                            List of installed extensions
    Name    | Version | Schema |                    Description
--------------+---------+--------+----------------------------------------------------
postgres_fdw | 1.0     | public | foreign-data wrapper for remote PostgreSQL servers
(1 row)

Step 7: Grant privileges to user in the destination

It is always better to limit the scope of the server definition to an application user. If a regular user needs to define a server, that user needs to have USAGE permission on the foreign data wrapper. A superuser can grant the privilege:

postgres=# grant usage on FOREIGN DATA WRAPPER postgres_fdw to app_user;
GRANT

Alternatively, superuser (postgres) can create a server definition and then grant USAGE permission on that server definition to the application user like this:

postgres=# GRANT USAGE ON FOREIGN SERVER hr TO app_user;
GRANT

Step 8: Create a server definition

Now we can create a server definition. This foreign server is created using the connection details of the source server running on host "hr". Let's name the foreign server "hr" as well:

postgres=> CREATE SERVER hr
 FOREIGN DATA WRAPPER postgres_fdw
 OPTIONS (dbname 'postgres', host 'hr', port '5432');
CREATE SERVER

Step 9: Create user mapping from destination user to source user

Create a mapping on the destination side from the destination user (app_user) to the remote source user (fdw_user):

postgres=> CREATE USER MAPPING for app_user
SERVER hr
OPTIONS (user 'fdw_user', password 'secret');
CREATE USER MAPPING

Step 10 : Create foreign table definition on the destination

Create a foreign table in the destination server with the same structure as the source table, but with OPTIONS specifying schema_name and table_name:

postgres=# CREATE FOREIGN TABLE employee
(id int, first_name character varying(20), last_name character varying(20))
SERVER hr
OPTIONS (schema_name 'public', table_name 'employee');

Step 11 : Test foreign table

Validate whether we can query the foreign table we just created in the destination server.

postgres=> select * from employee;
id | first_name | last_name
----+------------+-----------
1 | jobin | augustine
2 | avinash | vallarapu
3 | fernando | camargos
(3 rows)

As we can see from the above example, data is being accessed from the source database.

Now you might be thinking: "creating foreign tables one by one like this on the destination server is painful. Is it possible to do it automatically?" The answer is yes: there is an option to import a full schema.

On the destination server, you can use the following syntax to import a schema.

postgres=# IMPORT FOREIGN SCHEMA "public" FROM SERVER hr INTO public;

If you wish to choose a certain list of tables for import, you can use the following syntax.

postgres=# IMPORT FOREIGN SCHEMA "public" limit to (employee) FROM SERVER hr INTO public;

In the above example, it will import the definition of only one table (employee).

Advantages of foreign tables

The main use case of foreign tables is to make data available to other systems without actually duplicating/replicating it. There are even simple implementations of sharding using FDWs, because data in the other shards can be made available for queries through FDWs.

A person coming from an Oracle-like background might think: “I can get data from a remote database table using simple DBLinks so what is the difference?“. The main difference is that FDW will maintain the meta-data/table definition about the foreign table locally. This results in better decisions compared to sending a simple SELECT * FROM <TABLE> to pull all results. We are going to see some of these advantages.

Note: in the following sections, always pay special attention to the lines starting with "Remote SQL:".

Query optimization

Since the definition of the foreign table is held locally, all query optimizations are made for remote executions too. Let’s consider a slightly more complex example where we have EMP (employee) and DEPT (department) tables in the HR database and SALGRADE (salary grade) table in the finance database. Suppose we want to know how many employees there are with a particular salary grade:

SELECT COUNT(*)
FROM EMP
JOIN SALGRADE ON EMP.SAL > SALGRADE.LOSAL AND EMP.SAL < SALGRADE.HISAL
WHERE SALGRADE.GRADE = 4;

Let’s see how PostgreSQL handles this:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=805.44..805.45 rows=1 width=8)
   Output: count(*)
   ->  Nested Loop  (cost=100.00..798.33 rows=2844 width=0)
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision))
         ->  Foreign Scan on public.emp  (cost=100.00..186.80 rows=2560 width=8)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno
               Remote SQL: SELECT sal FROM public.emp
         ->  Materialize  (cost=0.00..35.55 rows=10 width=8)
               Output: salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=8)
                     Output: salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 4)

Please pay special attention to the line reading:

Remote SQL: SELECT sal FROM public.emp

It knows that only the sal column needs to be fetched from the remote database.
If we change the count(*) to the ename (employee name) column, the remote SQL changes to:

Remote SQL: SELECT ename, sal FROM public.emp

PostgreSQL tries to pull only the absolutely necessary data from the remote server.

Writable foreign tables

Initially, foreign tables were read-only. But, with time, the community introduced writable foreign table functionality in PostgreSQL. Let us consider the following situation, where management wants to give a salary increase of 10% to grade 3 employees:

UPDATE emp
SET    sal = sal * 1.1
FROM   salgrade
WHERE  emp.sal > salgrade.losal
AND emp.sal < salgrade.hisal
AND salgrade.grade = 3;

In this case, we are updating data on a remote table using a join condition with a local table. As we can see in the explain plan, an UPDATE statement is more complex because it involves 2 steps. First it needs to fetch the data from the remote table to complete the join operation. Then, it updates the rows in the foreign table.

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on public.emp  (cost=100.00..300.71 rows=669 width=118)
   Remote SQL: UPDATE public.emp SET sal = $2 WHERE ctid = $1
   ->  Nested Loop  (cost=100.00..300.71 rows=669 width=118)
         Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, (emp.sal * '1.1'::double precision), emp.comm, emp.deptno, emp.ctid, salgrade.ctid
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision))
         ->  Foreign Scan on public.emp  (cost=100.00..128.06 rows=602 width=112)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno, emp.ctid
               Remote SQL: SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno, ctid FROM public.emp FOR UPDATE
         ->  Materialize  (cost=0.00..35.55 rows=10 width=14)
               Output: salgrade.ctid, salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=14)
                     Output: salgrade.ctid, salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 3)

Operator and function pushdown

The PostgreSQL 9.5 release included the capability to assess whether it is safe to push a function execution down to the remote server. Built-in functions are good candidates for this:

SELECT avg(sal)
FROM EMP
WHERE EMP.SAL > (SELECT LOSAL FROM SALGRADE WHERE GRADE = 4);

This statement results in the following query plan

QUERY PLAN
---------------------------------------------------------------------------
 Foreign Scan  (cost=137.63..186.06 rows=1 width=8)
   Output: (avg(emp.sal))
   Relations: Aggregate on (public.emp)
   Remote SQL: SELECT avg(sal) FROM public.emp WHERE ((sal > $1::integer))
   InitPlan 1 (returns $0)
     ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=4)
           Output: salgrade.losal
           Filter: (salgrade.grade = 4)

If the planner finds that the majority of records need to be fetched from the remote server, it may not push the function execution down to it. For example:

SELECT avg(sal)
FROM EMP
JOIN SALGRADE ON EMP.SAL > SALGRADE.LOSAL AND EMP.SAL < SALGRADE.HISAL
WHERE SALGRADE.GRADE = 4;

In this case, the planner decides to do the function execution on the local server:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=805.44..805.45 rows=1 width=8)
   Output: avg(emp.sal)
   ->  Nested Loop  (cost=100.00..798.33 rows=2844 width=8)
         Output: emp.sal
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision))
         ->  Foreign Scan on public.emp  (cost=100.00..186.80 rows=2560 width=8)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno
               Remote SQL: SELECT sal FROM public.emp
         ->  Materialize  (cost=0.00..35.55 rows=10 width=8)
               Output: salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=8)
                     Output: salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 4)
(13 rows)

A great improvement in PostgreSQL 9.6 is that the function doesn't even need to be a built-in function. If a user-defined function or operator is immutable, it becomes a good candidate for execution on the remote server.

Join push down

In many cases, it is worth pushing entire join operations down to the remote server so that only the results need to be fetched by the local server. PostgreSQL handles this switching intelligently. Here's an example:

postgres=# EXPLAIN VERBOSE SELECT COUNT(*)
FROM EMP JOIN  DEPT ON EMP.deptno = DEPT.deptno AND DEPT.deptno=10;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=100.56..194.84 rows=1 width=8)
   Output: (count(*))
   Relations: Aggregate on ((public.emp) INNER JOIN (public.dept))
   Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))
(4 rows)

Predicate push down

There are two options when executing a query against a foreign table:

  1. Fetch the data locally and apply the predicates like filtering condition locally.
  2. Send the filtering condition to the remote server and have it applied there.

The latter can be the best option in many cases.

If you consider the previous example, we can see that the predicate specification "DEPT.deptno=10" is pushed down to the remote server through the foreign table and applied there separately, like this:

Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))

PostgreSQL not only pushed the predicate down, it also rewrote the query we sent to avoid one extra AND condition.

Aggregate push down

Just like predicate push down, here PostgreSQL also considers 2 options:

  1. Execute the aggregates on the remote server and pull the result back to the local server
  2. Do the aggregate calculations on the local database instance after collecting all required data from remote database

We’ve already seen an aggregate pushdown example as part of the function pushdown, since we’ve used an aggregate function for that example. Here’s another simple example:

postgres=# explain verbose select deptno,count(*) from emp group by deptno;
                            QUERY PLAN
------------------------------------------------------------------
 Foreign Scan  (cost=114.62..159.88 rows=200 width=12)
   Output: deptno, (count(*))
   Relations: Aggregate on (public.emp)
   Remote SQL: SELECT deptno, count(*) FROM public.emp GROUP BY 1
(4 rows)

In this case, all of the aggregate calculation happens on the remote server.

Triggers and Check constraints on Foreign tables

We have seen that foreign tables can be writable. PostgreSQL provides features to implement check constraints and triggers on the foreign table as well. This allows us to have powerful capabilities in the local database. For example, all validations and auditing can take place on the local server. The remote DMLs can be audited separately, or a different logic can be applied for local and remote triggers and constraint validations.
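
As a hedged sketch of what a local audit trigger on the employee foreign table from earlier might look like (the audit table and function names are hypothetical; row-level triggers on foreign tables are supported since PostgreSQL 9.4):

-- hypothetical local audit table on the destination server
CREATE TABLE emp_audit_log (changed_at timestamptz DEFAULT now(), emp_id int, action text);

CREATE OR REPLACE FUNCTION audit_employee() RETURNS trigger AS $$
BEGIN
  -- record which row changed and what kind of DML it was
  INSERT INTO emp_audit_log (emp_id, action) VALUES (NEW.id, TG_OP);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER employee_audit
  AFTER INSERT OR UPDATE ON employee
  FOR EACH ROW EXECUTE PROCEDURE audit_employee();

The trigger fires on the local (destination) server, while the DML itself is shipped to the source server by postgres_fdw.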

Conclusion

FDWs in PostgreSQL, and postgres_fdw in particular, provide very powerful and useful features by which, in many cases, we can avoid complex duplication and replication of data. They provide a mechanism for ACID-compliant transactions between two database systems, and postgres_fdw works as a reference implementation for the development of other FDW implementations. In the coming days we will be covering some of these.

If you found this article useful, why not take a look at some of our other posts on PostgreSQL?


The post Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw appeared first on Percona Database Performance Blog.

by Jobin Augustine at August 21, 2018 03:44 PM

August 20, 2018

Jean-Jerome Schmidt

Database Security Monitoring for MySQL and MariaDB

Data protection is one of the most significant aspects of administering a database. Depending on the organizational structure, whether you are a developer, sysadmin or DBA, if you are managing the production database, you must monitor data for unauthorized access and usage. The purpose of security monitoring is twofold. One, to identify unauthorized activity on the database. And two, to check if databases and their configurations, on a company-wide basis, are compliant with security policies and standards.

In this article, we will divide monitoring for security in two categories. One will be related to auditing of MySQL and MariaDB databases activities. The second category will be about monitoring your instances for potential security gaps.

Query and connection policy-based monitoring

Continuous auditing is an imperative task for monitoring your database environment. By auditing your database, you can achieve accountability for actions taken or content accessed. Moreover, the audit may include some critical system components, such as the ones associated with financial data to support a precise set of regulations like SOX, or the EU GDPR regulation. Usually, it is achieved by logging information about DB operations on the database to an external log file.

By default, auditing in MySQL or MariaDB is disabled. You can achieve it by installing additional plugins or by capturing all queries via the general query log (the general_log parameter). The general query log file is a general record of what MySQL is performing. The server records some information to this log when clients connect or disconnect, and it logs each SQL statement received from clients. Due to performance issues and lack of configuration options, the general_log is not a good solution for security audit purposes.
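
For completeness, the general query log can be switched on dynamically, as shown below, but as noted it is a poor fit for auditing:

SET GLOBAL general_log_file = '/var/log/mysql/general.log';
SET GLOBAL general_log = 'ON';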

If you use MySQL Enterprise, you can use the MySQL Enterprise Audit plugin, which is an extension to the proprietary MySQL version. The MySQL Enterprise Audit plugin is only available with MySQL Enterprise, which is a commercial offering from Oracle. Percona and MariaDB have created their own open source versions of the audit plugin. Lastly, the McAfee plugin for MySQL can also be used with various versions of MySQL. In this article, we will focus on the open source plugins, although the Enterprise version from Oracle seems to be the most robust and stable.

Characteristics of MySQL open source audit plugins

While the open source audit plugins do the same job as the Enterprise plugin from Oracle - they produce output covering database queries and connections - there are some major architectural differences.

MariaDB Audit Plugin – The MariaDB Audit Plugin works with MariaDB, MySQL (as of version 5.5.34 and 10.0.7) and Percona Server. MariaDB started including the Audit Plugin by default from versions 10.0.10 and 5.5.37, and it can be installed in any version from MariaDB 5.5.20. It is the only plugin that supports Oracle MySQL, Percona Server, and MariaDB. It is available on the Windows and Linux platforms. Versions starting from 1.2 are the most stable, and it may be risky to use earlier versions in your production environment.

McAfee MySQL Audit Plugin – This plugin does not use MySQL audit API. It was recently updated to support MySQL 5.7. Some tests show that API based plugins may provide better performance but you need to check it with your environment.

Percona Audit Log Plugin – Percona provides an open source auditing solution that installs with Percona Server 5.5.37+ and 5.6.17+ as part of the installation process. Compared to the other open source plugins, this plugin has richer output features, as it can output XML, JSON and to syslog.

As it has some internal hooks to the server to be feature-compatible with Oracle’s plugin, it is not available as a standalone plugin for other versions of MySQL.

Plugin installation based on MariaDB audit extension

The installation of open source MySQL plugins is quite similar for MariaDB, Percona, and McAfee versions.
Percona and MariaDB add their plugins as part of the default server binaries, so there is no need to download plugins separately. The Percona version only officially supports its own fork of MySQL, so there is no direct download from the vendor's website (if you want to use this plugin with MySQL, you will have to obtain the plugin from a Percona Server package). If you would like to use the MariaDB plugin with other forks of MySQL, then you can find it at https://downloads.mariadb.com/Audit-Plugin/MariaDB-Audit-Plugin/. The McAfee plugin is available at https://github.com/mcafee/mysql-audit/wiki/Installation.

Before you start the plugin installation, you can check if the plugin is present in the system. The dynamic plugin (doesn’t require instance restart) location can be checked with:

SHOW GLOBAL VARIABLES LIKE 'plugin_dir';

+---------------+--------------------------+
| Variable_name | Value                    |
+---------------+--------------------------+
| plugin_dir    | /usr/lib64/mysql/plugin/ |
+---------------+--------------------------+

Check the directory returned at the filesystem level to make sure you have a copy of the plugin library. If you do not have server_audit.so or server_audit.dll inside /usr/lib64/mysql/plugin/, then most likely your MariaDB version is not supported and you should upgrade it to the latest version.

The syntax to install the MariaDB plugin is:

INSTALL SONAME 'server_audit';

To check installed plugins you need to run:

SHOW PLUGINS;
MariaDB [(none)]> show plugins;
+-------------------------------+----------+--------------------+--------------------+---------+
| Name                          | Status   | Type               | Library            | License |
+-------------------------------+----------+--------------------+--------------------+---------+
| binlog                        | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| mysql_native_password         | ACTIVE   | AUTHENTICATION     | NULL               | GPL     |
| mysql_old_password            | ACTIVE   | AUTHENTICATION     | NULL               | GPL     |
| wsrep                         | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| MRG_MyISAM                    | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| MEMORY                        | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| CSV                           | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| MyISAM                        | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| CLIENT_STATISTICS             | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INDEX_STATISTICS              | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| TABLE_STATISTICS              | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| USER_STATISTICS               | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| PERFORMANCE_SCHEMA            | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| InnoDB                        | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| INNODB_TRX                    | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_LOCKS                  | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_LOCK_WAITS             | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_CMP                    | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
...
| INNODB_MUTEXES                | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_SEMAPHORE_WAITS    | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_TABLESPACES_ENCRYPTION | ACTIVE   | INFORMATION SCHEMA | NULL               | BSD     |
| INNODB_TABLESPACES_SCRUBBING  | ACTIVE   | INFORMATION SCHEMA | NULL               | BSD     |
| Aria                          | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| SEQUENCE                      | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| user_variables                | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| FEEDBACK                      | DISABLED | INFORMATION SCHEMA | NULL               | GPL     |
| partition                     | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| rpl_semi_sync_master          | ACTIVE   | REPLICATION        | semisync_master.so | GPL     |
| rpl_semi_sync_slave           | ACTIVE   | REPLICATION        | semisync_slave.so  | GPL     |
| SERVER_AUDIT                  | ACTIVE   | AUDIT              | server_audit.so    | GPL     |
+-------------------------------+----------+--------------------+--------------------+---------+

If you need additional information, check the PLUGINS table in the information_schema database which contains more detailed information.

Another way to install the plugin is to enable it in my.cnf and restart the instance. An example of a basic audit plugin configuration for MariaDB could be:

server_audit_events=CONNECT
server_audit_file_path=/var/log/mysql/audit.log
server_audit_file_rotate_size=1073741824
server_audit_file_rotations=8
server_audit_logging=ON
server_audit_incl_users=
server_audit_excl_users=
server_audit_output_type=FILE
server_audit_query_log_limit=1024

The above settings should be placed in my.cnf. The audit plugin will create the file /var/log/mysql/audit.log, which will rotate when it reaches 1GB in size, with eight rotations kept before the oldest file is overwritten. The file will contain only information about connections.

Currently, there are sixteen settings which you can use to adjust the MariaDB audit plugin.

server_audit_events
server_audit_excl_users
server_audit_file_path
server_audit_file_rotate_now
server_audit_file_rotate_size
server_audit_file_rotations
server_audit_incl_users
server_audit_loc_info
server_audit_logging
server_audit_mode
server_audit_output_type
server_audit_query_log_limit
server_audit_syslog_facility
server_audit_syslog_ident
server_audit_syslog_info
server_audit_syslog_priority

Among them, you can find options to include or exclude users, set different logging events (CONNECT or QUERY) and switch between file and syslog.
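
Most of these variables are dynamic, so the audit scope can be adjusted at runtime; for example, to log queries as well as connections, but only for a single (hypothetical) account:

SET GLOBAL server_audit_events = 'CONNECT,QUERY';
SET GLOBAL server_audit_incl_users = 'app_user';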

To make sure the plugin will be enabled upon server startup, you have to set plugin_load=server_audit=server_audit.so in your my.cnf settings. Such a configuration can be additionally protected with server_audit=FORCE_PLUS_PERMANENT, which disables the option to uninstall the plugin:

UNINSTALL PLUGIN server_audit;

ERROR 1702 (HY000):
Plugin 'server_audit' is force_plus_permanent and can not be unloaded

Here are some sample entries produced by the MariaDB audit plugin:

20180817 20:00:01,slave,cmon,cmon,31,0,DISCONNECT,information_schema,,0
20180817 20:47:01,slave,cmon,cmon,17,0,DISCONNECT,information_schema,,0
20180817 20:47:02,slave,cmon,cmon,19,0,DISCONNECT,information_schema,,0
20180817 20:47:02,slave,cmon,cmon,18,0,DISCONNECT,information_schema,,0
20180819 17:19:19,slave,cmon,cmon,12,0,CONNECT,information_schema,,0
20180819 17:19:19,slave,root,localhost,13,0,FAILED_CONNECT,,,1045
20180819 17:19:19,slave,root,localhost,13,0,DISCONNECT,,,0
20180819 17:19:20,slave,cmon,cmon,14,0,CONNECT,mysql,,0
20180819 17:19:20,slave,cmon,cmon,14,0,DISCONNECT,mysql,,0
20180819 17:19:21,slave,cmon,cmon,15,0,CONNECT,information_schema,,0
20180819 17:19:21,slave,cmon,cmon,16,0,CONNECT,information_schema,,0
20180819 19:00:01,slave,cmon,cmon,17,0,CONNECT,information_schema,,0
20180819 19:00:01,slave,cmon,cmon,17,0,DISCONNECT,information_schema,,0

Schema changes report

If you need to track only DDL changes, you can use the ClusterControl Operational Report on Schema Change. The Schema Change Detection Report shows any DDL changes on your database. This functionality requires an additional parameter in the ClusterControl configuration file; if it is not set, you will see the following information: "schema_change_detection_address is not set in /etc/cmon.d/cmon_1.cnf".

It can be set up with a schedule, and the reports emailed to recipients.

ClusterControl: Schedule Operational Report

MySQL Database Security Assessment

Package upgrade check

First, we will start with security checks. Being up-to-date with MySQL patches will help reduce risks associated with known vulnerabilities present in the MySQL server. You can keep your environment up-to-date by using the vendors’ package repository. Based on this information you can build your own reports, or use tools like ClusterControl to verify your environment and alert you on possible updates.

The ClusterControl Upgrade Report gathers information from the operating system and compares it to packages available in the repository. The report is divided into four sections: upgrade summary, database packages, security packages, and other packages. You can quickly compare what you have installed on your system and find a recommended upgrade or patch.

ClusterControl: Upgrade Report
ClusterControl: Upgrade Report details

To compare versions manually you can run:

SHOW VARIABLES WHERE variable_name LIKE "version";

With security bulletins like:
https://www.oracle.com/technetwork/topics/security/alerts-086861.html
https://nvd.nist.gov/view/vuln/search-results?adv_search=true&cves=on&cpe_vendor=cpe%3a%2f%3aoracle&cpe_produ
https://www.percona.com/doc/percona-server/LATEST/release-notes/release-notes_index.html
https://downloads.mariadb.org/mariadb/+releases/
https://www.cvedetails.com/vulnerability-list/vendor_id-12010/Mariadb.html
https://www.cvedetails.com/vulnerability-list/vendor_id-13000/Percona.html

Or vendor repositories:

On Debian

sudo apt list mysql-server

On RHEL/Centos

yum list | grep -i mariadb-server

Accounts without password

Blank passwords allow a user to log in without using a password. MySQL used to come with a set of pre-created users, some of which could connect to the database without a password or, even worse, as anonymous users. Fortunately, this has changed in MySQL 5.7. Finally, it comes only with a root account that uses the password you choose at installation time.

To find such accounts, run the following query, and set a password for each row returned:

SELECT User,host
FROM mysql.user
WHERE authentication_string='';
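
Setting the password is then a one-liner per account; a sketch with a hypothetical account (MySQL 5.7 syntax; on 5.6 and earlier use SET PASSWORD instead):

ALTER USER 'app_user'@'192.168.0.14' IDENTIFIED BY 'a-Str0ng-Passw0rd!';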

Additionally, you can install a password validation plugin and implement a more secure policy:

INSTALL PLUGIN validate_password SONAME 'validate_password.so';

SHOW VARIABLES LIKE 'default_password_lifetime';
SHOW VARIABLES LIKE 'validate_password%';

A good starting point can be:

plugin-load=validate_password.so
validate-password=FORCE_PLUS_PERMANENT
validate_password_length=14
validate_password_mixed_case_count=1
validate_password_number_count=1
validate_password_special_char_count=1
validate_password_policy=MEDIUM

Of course, these settings will depend on your business needs.

Remote access monitoring

Avoiding the use of wildcards within hostnames helps control the specific locations from which a given user may connect to and interact with the database.

You should make sure that every user can connect to MySQL only from specific hosts. You can always define several entries for the same user; this should help reduce the need for wildcards.

Execute the following SQL statement to assess this recommendation (make sure no rows are returned):

SELECT user, host FROM mysql.user WHERE host = '%';
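
For any offending rows, the wildcard host can be replaced with a specific address while preserving the account's privileges; a sketch with hypothetical names:

RENAME USER 'app_user'@'%' TO 'app_user'@'192.168.0.14';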

Test database

The default MySQL installation comes with an unused database called test, and the test database is available to every user, especially anonymous users. Such users can create tables and write to them. This can potentially become a problem on its own - and the writes would add some overhead and reduce database performance. It is recommended that the test database is dropped. To determine if the test database is present, run:

SHOW DATABASES LIKE 'test';

If you notice that the test database is present, this could mean that the mysql_secure_installation script, which drops the test database (as well as performing other security-related activities), was not executed.
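
If the database is present and genuinely unused, removing it is a single statement:

DROP DATABASE test;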

LOAD DATA INFILE

If both the server and client have the ability to run LOAD DATA LOCAL INFILE, a client will be able to load data from a local file to a remote MySQL server. The local_infile parameter dictates whether files located on the MySQL client's computer can be loaded or selected via LOAD DATA INFILE or SELECT local_file.

This, potentially, can be used to read files the client has access to - for example, on an application server, one could access any data that the HTTP server has access to. To avoid it, you need to set local-infile=0 in my.cnf.

Execute the following SQL statement and ensure the Value field is set to OFF:

SHOW VARIABLES WHERE Variable_name = 'local_infile';

Monitor for non-encrypted tablespaces

Starting from MySQL 5.7.11, InnoDB supports data encryption for tables stored in file-per-table tablespaces. This feature provides at-rest encryption for physical tablespace data files. To examine if your tables have been encrypted run:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME, CREATE_OPTIONS FROM INFORMATION_SCHEMA.TABLES
       WHERE CREATE_OPTIONS LIKE '%ENCRYPTION="Y"%';

+--------------+------------+----------------+
| TABLE_SCHEMA | TABLE_NAME | CREATE_OPTIONS |
+--------------+------------+----------------+
| test         | t1         | ENCRYPTION="Y" |
+--------------+------------+----------------+

As a part of the encryption, you should also consider encryption of the binary log. The MySQL server writes plenty of information to binary logs.

Encryption connection validation

In some setups, the database should not be accessible through the network at all, with every connection managed locally through the Unix socket. In such cases, you can add the skip-networking variable to my.cnf. skip-networking prevents MySQL from using any TCP/IP connections, and only the Unix socket would be possible on Linux.

However, this is a rather rare situation, as it is common to access MySQL over the network. You then need to monitor that your connections are encrypted. MySQL supports SSL as a means of encrypting traffic both between MySQL servers (replication) and between MySQL servers and clients. If you use Galera cluster, similar features are available - both intra-cluster communication and connections with clients can be encrypted using SSL. To check if you use SSL encryption, run the following queries:

SHOW variables WHERE variable_name = 'have_ssl'; 
select ssl_verify_server_cert from mysql.slave_master_info;

That’s it for now. This is not a complete list, do let us know if there are any other checks that you are doing today on your production databases.

by Bart Oles at August 20, 2018 12:30 PM

Peter Zaitsev

Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost

If you are using large EBS GP2 volumes for MySQL (i.e. 10TB+) on AWS EC2, you can increase performance and save a significant amount of money by moving to local SSD (NVMe) instance storage. Interested? Then read on for a more detailed examination of how to achieve cost-benefits and increase performance from this implementation.

EBS vs Local instance store

We have heard from customers that large EBS GP2 volumes can be affected by short term outages—IO “stalls” where no IO is going in or out for a couple of minutes. Statistically, with so many disks in disk arrays (which back EBS volumes) we can expect frequent disk failures. If we allocate a very large EBS GP2 volume, i.e. 10Tb+, hitting such failure events can be common.

In the case of MySQL/InnoDB, such an IO “stall” will be obvious, particularly with the highly loaded system where MySQL needs to do physical IO. During the stall, you will see all write queries are waiting, or “hang”.  Some of the writes may error out with “Error 1030” (MySQL error code 1030 (ER_GET_ERRNO): Got error %d from storage engine). There is nothing MySQL can do here – if the IO subsystem is not available, it will need to wait for it.

The good news is: many of the newer EC2 instances (i.e. i3, m5d, etc) have local SSD disks attached (NVMe). Those disks are local to the physical server and should not suffer from the EBS issues described above. Using local disks can be a very good solution:

  1. They are faster, as they are local to the server, and do not suffer from the EBS issues
  2. They are much cheaper compared to large EBS volumes.

Please note, however, that local storage does not guarantee persistence. More about this below.

Another potential option would be to use IO1 volumes with provisioned IOPS. However, these will be significantly more expensive for large volumes and high traffic.

A look at costs

To estimate the costs, I’ve used the AWS simple monthly calculator. Estimated costs are based on 1 year reserved instances. Let’s imagine we will need to use 14TB volume (to store ~10Tb of MySQL data including binary logs). The pricing estimates will look like this:

r4.4xlarge, 122GB RAM, 16 vCPUs + EBS, 14TB volume (this is what we are presumably using now)

Amazon EC2 Service (US East (N. Virginia)) $ 1890.56 / month
Compute: $ 490.56
EBS Volumes: $1400.00

Local storage price estimate:
i3.4xlarge, 122GB RAM, 16 vCPUs, 3800 GiB disk (2 x 1900 NVMe SSD)

Amazon EC2 Service (US East (N. Virginia)) $ 627.21 / month
Compute: $ 625.61

i3.8xlarge, 244GB RAM, 32 vCPUs, 7600 GiB disk (4 x 1900 NVMe SSD)

Amazon EC2 Service (US East (N. Virginia)) $1252.82 / month
Compute: $ 1251.22

As we can see, even if we switch to i3.8xlarge and get 2x more RAM, 2x more virtual CPUs, faster storage, and a 10 gigabit network, we still pay about 1.5x less per box than what we are presumably paying now ($1890.56 / $1252.82 ≈ 1.5). Include replication, and that’s 1.5x less for each of the replication servers as well.

But wait … there is a catch.

How to migrate to local storage from EBS

Well, we face some challenges when migrating from EBS to local instance NVMe storage.

  1. Wait, we are storing ~10TB and i3.8xlarge has only 7600 GiB of disk. The answer is simple: compression (see below)
  2. Wait, but the local storage is ephemeral; if we lose the box, we lose our data, and that is unacceptable. The answer is also simple: replication (see below)
  3. Wait, but we use EBS snapshots for backups. That answer is simple too: we can still use EBS (and its snapshots) on one of the replication slaves (see below)

Compression

To fit on i3.8xlarge we only need 2x compression. This can be done with InnoDB row compression (row_format=compressed) or InnoDB page compression, which requires sparse file and hole punching support. However, InnoDB compression may be slower, and it will only compress ibd files: it does not compress binary logs, frm files, etc.
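
As a minimal sketch, enabling InnoDB row compression on an existing table could look like this (the table name t1 is hypothetical; innodb_file_per_table is assumed to be enabled):

ALTER TABLE t1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;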

ZFS

Another option: use the ZFS filesystem. ZFS will compress all files, including binary logs and frm files. That can be very helpful if we use a “schema per customer” or “table per customer” approach and need to store 100K-200K tables in a single MySQL instance. If the data is compressible, or new tables were provisioned without much data in them, ZFS can give significant disk savings.

I’ve used ZFS (following Yves’ blog post, Hands-On Look at ZFS with MySQL). Here are the results of data compression with ZFS (this is real data, not generated data):

# du -sh --apparent-size /mysqldata/mysql/data
8.6T	/mysqldata/mysql/data
# du -sh /mysqldata/mysql/data
3.2T	/mysqldata/mysql/data

Compression ratio:

# zfs get all | grep -i compress
...
mysqldata/mysql/data  compressratio         2.42x                  -
mysqldata/mysql/data  compression           gzip                   inherited from mysqldata/mysql
mysqldata/mysql/data  refcompressratio      2.42x                  -
mysqldata/mysql/log   compressratio         3.75x                  -
mysqldata/mysql/log   compression           gzip                   inherited from mysqldata/mysql
mysqldata/mysql/log   refcompressratio      3.75x                  -

As we can see, the original 8.6TB of data was compressed to 3.2TB; the compression ratio for MySQL tables is 2.42x, and for binary logs 3.75x. That will definitely fit on i3.8xlarge.

(For another test, I’ve generated 40 million tables spread across multiple schemas (databases). I’ve added some data only to one schema, leaving others blank. For that test I achieved ~10x compression ratio.)

Conclusion: ZFS can provide you with very good compression ratio, will allow you to use different EC2 instances on AWS, and save you a substantial amount of money. Although compression is not free performance-wise, and ZFS can be slower for some workloads, using local NVMe storage can compensate.
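
For reference, a minimal sketch of provisioning a gzip-compressed ZFS setup similar to the one above (the device names are hypothetical):

# create a pool on the local NVMe devices
zpool create mysqldata /dev/nvme0n1 /dev/nvme1n1
# enable gzip compression; child datasets inherit it
zfs set compression=gzip mysqldata
# separate datasets for data files and binary logs
zfs create -p mysqldata/mysql/data
zfs create -p mysqldata/mysql/log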

You can find some performance testing for ZFS on Linux in this blog post: About ZFS Performance. Some benchmarks comparing EBS and local NVMe SSD storage (i3 instances) can be found in this blog post: Percona XtraDB Cluster on Amazon GP2 Volumes.

MyRocks

Another option for compression would be using the MyRocks storage engine in Percona Server for MySQL, which provides compression.

Replication and using local volumes

As the local instance storage is ephemeral, we need redundancy: we can use MySQL replication or Percona XtraDB Cluster (PXC). In addition, we can use one replication slave, or attach a replication slave to PXC, and have it use EBS storage.

Local storage is not durable. If you stop the instance and then start it again, the local storage will probably disappear. (Reboot is an exception: you can reboot the instance and the local storage will be fine.) In addition, if the local storage disappears, we will have to recreate the MySQL local storage partition (zpool create for ZFS, or mkfs for EXT4/XFS).

For example, using MySQL replication:

master - local storage (AZ 1, i.e. 1a)
-> slave1 - local storage (AZ 2, i.e. 1b)
-> slave2 - ebs storage (AZ 3, i.e. 1c)
   (other replication slaves if needed with local storage - optional)

Then we can use slave2 for EBS snapshots (if needed). This slave will be more expensive (as it uses EBS), but it can also be used to serve production traffic (i.e. we can route a smaller amount of traffic to it) or for other purposes (for example, analytical queries).

For Percona XtraDB Cluster (PXC) we can just use 3 nodes, 1 in each AZ. PXC uses auto-provisioning with SST if a node comes back blank. For MySQL replication we need some additional things:

  1. Failover from master to a slave if the master goes down. This can be done with MHA or Orchestrator.
  2. The ability to clone a slave. This can be done with XtraBackup or ZFS snapshots (if using ZFS); see the sketch after this list.
  3. The ability to set up a new MySQL local storage partition (zpool create for ZFS, or mkfs for EXT4/XFS).
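
A minimal sketch of cloning a slave with Percona XtraBackup, streaming the backup to a new node (XtraBackup 2.4 and the hostname new-slave are assumptions):

# on the donor: stream a full backup to the new node
xtrabackup --backup --stream=xbstream --target-dir=/tmp | \
    ssh new-slave "xbstream -x -C /var/lib/mysql"
# on the new node: apply the redo log and fix ownership
xtrabackup --prepare --target-dir=/var/lib/mysql
chown -R mysql:mysql /var/lib/mysql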

Other options

Here are some totally different options we could consider:

  1. Use IO1 volumes (as discussed). That can be far more expensive.
  2. Use local storage and the MyRocks storage engine. However, switching to another storage engine is a bigger project in itself and requires lots of testing.
  3. Switch to AWS Aurora. That can be even more expensive for this particular case, and switching to Aurora can be another big project by itself.

Conclusions

  1. Using EC2 i3 instances with local NVMe storage can increase performance and save money. There are some limitations: local storage is ephemeral and will disappear if the node is stopped (a reboot is fine).
  2. ZFS filesystem with compression enabled can decrease the storage requirements so that a MySQL instance will fit into local storage. Another option for compression could be to use InnoDB compression (row_format=compressed).

That may not work for everyone, as it requires additional changes to the existing server provisioning: failover from master to a slave, the ability to clone replication slaves (or use PXC), the ability to set up a new MySQL local storage partition, and compression.

The post Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost appeared first on Percona Database Performance Blog.

by Alexander Rubin at August 20, 2018 12:07 PM

Webinar Tues 8/21: MariaDB 10.3 vs. MySQL 8.0

MariaDB 10.3 vs MySQL 8.0

Please join Percona’s Chief Evangelist, Colin Charles, on Tuesday, August 21st, 2018, as he presents MariaDB 10.3 vs. MySQL 8.0 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

Are they syntactically similar? Where do these two databases differ? Why would I use one over the other?

MariaDB 10.3 is on the path of gradually diverging from MySQL 8.0. One obvious example is the internal data dictionary currently under development for MySQL 8.0. This is a major change to the way metadata is stored and used within the server: MariaDB doesn’t have an equivalent feature. Implementing this feature could mark the end of datafile-level compatibility between MySQL and MariaDB.

There are also non-technical differences between MySQL 8.0 and MariaDB 10.3, including:

  • Licensing: MySQL offers their code as open-source under the GPL, and provides the option of non-GPL commercial distribution in the form of MySQL Enterprise. MariaDB can only use the GPL because their work is derived from the MySQL source code under the terms of that license.
  • Support services: Oracle provides technical support, training, certification and consulting for MySQL, while MariaDB has their own support services. Some people prefer working with smaller companies, as traditionally it affords them more leverage as a customer.
  • Community contributions: MariaDB touts the fact that they accept more community contributions than Oracle. Part of the reason for this disparity is that developers like to contribute features, bug fixes and other code without a lot of paperwork overhead (and they complain about the Oracle Contributor Agreement). However, MariaDB has its own MariaDB Contributor Agreement — which more or less serves the same purpose.

Colin will take a look at some of the differences between MariaDB 10.3 and MySQL 8.0 and help answer some of the common questions our Database Performance Experts get about the two databases.

Register Now

The post Webinar Tues 8/21: MariaDB 10.3 vs. MySQL 8.0 appeared first on Percona Database Performance Blog.

by Colin Charles at August 20, 2018 10:22 AM

August 18, 2018

Valeriy Kravchuk

On Fine MySQL Manual

Today I am going to provide some details on the last item in my list of problems with Oracle's way of MySQL server development: the maintenance of the MySQL Manual. I stated that:
"MySQL Manual still have many details missing and is not fixed fast enough.
Moreover, it is not open source...
"
Let me explain the above:
  1. MySQL Reference Manual is not open source. It used to be built from DocBook XML sources. Probably that's still the case. But you cannot find the source code in open repositories (please correct me if I am wrong; I tried to search...). That's because it is NOT open source. It says so clearly in the Preface:
    "This documentation is NOT distributed under a GPL license. Use of this documentation is subject to the following terms:
    ...
    Any other use, such as any dissemination of printed copies or use of this documentation, in whole or in part, in another publication, requires the prior written consent from an authorized representative of Oracle. Oracle and/or its affiliates reserve any and all rights to this documentation not expressly granted above.
    "
    It was NOT Oracle who closed the source (as far as I remember, the manual was not GPL even in 2004, when I started to care about MySQL in general). But Oracle had a chance to change the license and set up a better contribution process for MySQL Community users, with many good writers among them. They decided not to do this, so creators of forks and derived software have to duplicate efforts and rewrite everything themselves, and the only real way to help make the manual better is to report bugs.
  2. Quality of new documentation has not improved much. We, MySQL Community users, have to report bugs in the manual, as it still has some details missing or documented wrongly. Let me illustrate this with the 10 or so most recent documentation requests users made (I skipped reports for features I do not care about for now, like group replication):
    • Bug #91955 - "8.0 broke plugin API for C". According to the comment from Oracle developer in this bug reported by Laurynas Biveinis, writing plugins in C is not supported any more... But this change is NOT properly documented.
    • Bug #91781 - "Add version matrix in "Chapter 2 Keywords and Reserved Words". A good idea for the manual that is not consistently implemented.
    • Bug #91743 - "performance_schema.data_locks not working correctly". Nice feature was added to Performance Schema (now we can check and study InnoDB locks with SQL!), but it is still not completely and properly documented.
    • Bug #91654 - "Handling an Unexpected Halt of a Replication Slave" Documentation is uncertain". This is also an example of improper bug handling: when I worked in Oracle, newer bugs were usually declared duplicates of older ones. Here the opposite decision was made, even though both reports were from the same user, Uday Varagani, who explicitly asked to change the decision. Obviously, documentation requests do not get any extra care compared to other kinds of bugs, quite the contrary...
    • Bug #91648 - "Numeric literals must be a number between 0 and 9999". Surely ports with numbers larger than 9999 can be used.
    • Bug #91642 - "Document how protocol version 4.0 and 4.1 differ on how strings are terminated". As noted by Rene' Cannao', comments in the code are still sometimes more useful than documentation.
    • Bug #91549 - "MDL lock queue order seems to depend on table names". Make sure to read last comment in this nice request by Shane Bester. Dmitry Lenev provides some details on priorities of MDL locks in it. There are still cases when bugs and documentation requests document some details better than fine MySQL Manual!
    • Bug #90997 - "Handling an Unexpected Halt of a Replication Slave" manual page is wrong". Sveta Smirnova highlighted a totally misleading statement in the manual.
    • Bug #90935 - "Modify relay_log_recovery Documentation". A simple documentation request has stayed "Open" for 3 months. Clearly, processing documentation requests is not the highest priority for Oracle engineers.
    • Bug #90680 - "dragnet logging: document how to use / find error_symbol codes". Even if a request comes from a customer or an otherwise well-known bug reporter, like Simon Mudd, and it's related to a totally new cool feature of MySQL 8, it can wait months for a fix...
    You can draw your own conclusions from the above. But I do not see any good trends in the way new documentation is created or documentation requests are processed recently. These are the same problems as 4 years ago (see more on that in a side note below).
  3. Older documentation requests sometimes get even less attention than recent ones, even though they may highlight problems with the software itself, not just the MySQL Manual. Let me illustrate this with a few bugs I reported:
    • Bug #83640 - "Locks set by DELETE statement on already deleted record". I explicitly asked to
      "...document in the manual what locks this kind of DELETE sets when it encountered a record already marked as deleted, and why"
      This still has NOT happened. Moreover, eventually both MySQL and MariaDB engineers decided that the current implementation of locking for this case is wrong and can be improved, so this report ended up as an InnoDB bug. Check the related MDEV-11215 for more details.
    • Bug #73413 - "Manual does not explain MTS implementation in details". This is one of my favorite documentation requests. I got a suggestion to explain, in detail, what I want to see documented. I tried my best; if you want more details, read this blog.
    • Bug #72368 - "Empty/zero statistics for imported tablespace until explicit ANALYZE TABLE". This is a bug in the way persistent statistics (the feature I truly hate) in InnoDB are re-estimated "automatically". But until the bug is fixed, I asked to document the current implementation (and its problems). So where is the current implementation properly documented? Only in the bug reports...
    • Bug #71294 - "Manual page for P_S.FILE_INSTANCES table does not explain '~' in FILE_NAME".  They pretend they do not get the point:
      "Per Marc Alff, the Performance Schema is simply reporting what it gets from runtime. If runtime is reporting nonexistent filenames, that's a server issue.

      Recategorizing this as a server bug.
      "
    • Bug #71293 - "Manual page for P_S.FILE_INSTANCES table does not explain EVENT_NAME values". Now they are waiting for the new DOCUMENTATION column in the setup_instruments table to be filled in with something by server developers... The code is the documentation? OK, but as we know from experience (see slides 44-46 here), the chances of getting a bug in Performance Schema fixed fast are even lower than of seeing it properly and completely documented...
There are more problems with MySQL documentation (not only the reference manual), but at the moment I consider the 3 highlighted and illustrated above to be the most serious.

Regent's Canal is nice. If I only knew how to operate the locks there... MySQL Manual also misses some information about locks.
As a side note, it's not the first time I write about MySQL Manual. You can find some useful details in the following posts:
  • "What's wrong with MySQL Manual". In 2014, after spending few weeks reporting up to 5 documentation bugs per day, I thought that, essentially, there is nothing much wrong with it - it's fine, well indexed by Google and has meaningful human-readable URLs. Few problems listed were lack of careful readers (I tried my best to fix that), limited attention to old documentation requests, some pages with not so much useful content and lack of "How to" documents. The later I also tried to fix to some small extent in this blog, see howto tagged posts. The real fix came mostly from Percona's blog, though.
  • I have a special "missing manual" tag for blog posts that mention at least one bug in the manual.
  • I tried to use "missing_manual" tag consistently for my own documentation requests. Last year I shared a detailed enough review of the bugs with that tag that were still active.
As yet another side note, I once tried to create a kind of community-driven "missing manual" project, but failed. Writing a manual from scratch is a (hard) full-time job, while my full-time job is providing support to users of MySQL, MariaDB and Percona software...

That said, I also wanted to join MySQL's documentation team in the past, but this was not possible, not least because I am not a native speaker. If anything has changed in this regard, I am still open to a job offer of this kind. My conditions for an acceptable offer from Oracle are known to all interested parties, and they include (but are not limited to) at least 4 times the salary I had before I quit (almost 6 years ago) and working from Oracle's office in Paris (because in Oracle, the office you are formally employed by matters a lot for your happiness and success as an employee).

If no offer arrives in the upcoming week, I'll just continue to write my annoying but hopefully somewhat useful blog posts (until the MySQL Reference Manual becomes a true open source project ;)

by Valeriy Kravchuk (noreply@blogger.com) at August 18, 2018 11:53 AM

August 17, 2018

Peter Zaitsev

Percona Server for MySQL 5.6.41-84.1 Is Now Available

Percona Server for MySQL 5.6

Percona announces the release of Percona Server for MySQL 5.6.41-84.1 on August 17, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.6.41, including all the bug fixes in it. Percona Server for MySQL 5.6.41-84.1 is now the current GA release in the 5.6 series. All of Percona’s software is open-source and free.

Bugs Fixed
  • A simple SELECT query on a table with CHARSET=euckr COLLATE=euckr_bin could return different results each time it was executed. Bug fixed #4513 (upstream 91091).
  • Percona Server 5.6.39-83.1 could crash when altering an InnoDB table that has a full-text search index defined. Bug fixed #3851 (upstream 68987).
Other Bugs Fixed
  • #3325 “online upgrade GTID cause binlog damage in high write QPS situation”
  • #3976 “Errors in MTR tests main.variables-big, main.information_schema-big, innodb.innodb_bug14676111”
  • #4506 “Backport fixes from 8.0 for InnoDB memcached Plugin”

Find the release notes for Percona Server for MySQL 5.6.41-84.1 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.6.41-84.1 Is Now Available appeared first on Percona Database Performance Blog.

by Borys Belinsky at August 17, 2018 10:20 PM

Percona Server for MySQL 5.5.61-38.13 Is Now Available

Percona Server for MySQL 5.5

Percona announces the release of Percona Server for MySQL 5.5.61-38.13 on August 17, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.5.61, including all the bug fixes in it. Percona Server for MySQL 5.5.61-38.13 is now the current GA release in the 5.5 series. All of Percona’s software is open-source and free.

Bugs Fixed
  • The --innodb-optimize-keys option of the mysqldump utility fails when a column name is used as a prefix of a column which has the AUTO_INCREMENT attribute. Bug fixed #4524.
Other Bugs Fixed
  • #4566 “stack-use-after-scope in reinit_io_cache()” (upstream 91603)
  • #4581 “stack-use-after-scope in _db_enter_() / mysql_select_db()” (upstream 91604)
  • #4600 “stack-use-after-scope in _db_enter_() / get_upgrade_info_file_name()” (upstream 91617)
  • #3976 “Errors in MTR tests main.variables-big, main.information_schema-big, innodb.innodb_bug14676111”

Find the release notes for Percona Server for MySQL 5.5.61-38.13 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.5.61-38.13 Is Now Available appeared first on Percona Database Performance Blog.

by Borys Belinsky at August 17, 2018 07:30 PM

Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon

postgres mysql replication using pg_chameleon

Replication is one of the well-known features that allows us to build an identical copy of a database. It is supported in almost every RDBMS. The advantages of replication may be huge, especially HA (High Availability) and load balancing. But what if we need to build replication between 2 heterogeneous databases like MySQL and PostgreSQL? Can we continuously replicate changes from a MySQL database to a PostgreSQL database? The answer to this question is pg_chameleon.

For replicating continuous changes, pg_chameleon uses the mysql-replication library to pull the row images from MySQL, which are transformed into a jsonb object. A pl/pgsql function in postgres decodes the jsonb and replays the changes into the postgres database. In order to set up this type of replication, your mysql binlog_format must be “ROW”.

A few points you should know before setting up this tool:

  1. Tables that need to be replicated must have a primary key.
  2. Works for PostgreSQL versions > 9.5 and MySQL > 5.5
  3. binlog_format must be ROW in order to setup this replication.
  4. Python version must be > 3.3

When you initialize the replication, pg_chameleon pulls the data from MySQL using the CSV format in slices, to prevent memory overload. This data is flushed to postgres using the COPY command. If COPY fails, it tries INSERT, which may be slow. If INSERT fails, then the row is discarded.

To replicate changes from mysql, pg_chameleon mimics the behavior of a mysql slave. It creates the schema in postgres, performs the initial data load, connects to the MySQL replication protocol, and stores the row images in a table in postgres. Then the respective functions in postgres decode those rows and apply the changes. This is similar to storing relay logs in postgres tables and applying them to a postgres schema. You do not have to create a postgres schema using any DDLs. This tool automatically does that for the tables configured for replication. If you need to specifically convert any types, you can specify this in the configuration file.

The following is just an exercise that you can experiment with and implement if it completely satisfies your requirement. We performed these tests on CentOS Linux release 7.4.

Prepare the environment

Set up Percona Server for MySQL

Install MySQL 5.7 and add the appropriate parameters for replication.

In this exercise, I have installed Percona Server for MySQL 5.7 using YUM repo.

yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm
yum install Percona-Server-server-57
echo "mysql ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
usermod -s /bin/bash mysql
sudo su - mysql

pg_chameleon requires the following parameters to be set in your my.cnf file (the parameter file of your MySQL server). You may add the following parameters to /etc/my.cnf

binlog_format= ROW
binlog_row_image=FULL
log-bin = mysql-bin
server-id = 1

Now start your MySQL server after adding the above parameters to your my.cnf file.

$ service mysql start

Fetch the temporary root password from mysqld.log, and reset the root password using mysqladmin

$ grep "temporary" /var/log/mysqld.log
$ mysqladmin -u root -p password 'Secret123!'

Now, connect to your MySQL instance and create sample schema/tables. I have also created an emp table for validation.

$ wget http://downloads.mysql.com/docs/sakila-db.tar.gz
$ tar -xzf sakila-db.tar.gz
$ mysql -uroot -pSecret123! < sakila-db/sakila-schema.sql
$ mysql -uroot -pSecret123! < sakila-db/sakila-data.sql
$ mysql -uroot -pSecret123! sakila -e "create table emp (id int PRIMARY KEY, first_name varchar(20), last_name varchar(20))"

Create a user for configuring replication using pg_chameleon and give appropriate privileges to the user using the following steps.

$ mysql -uroot -p
create user 'usr_replica'@'%' identified by 'Secret123!';
GRANT ALL ON sakila.* TO 'usr_replica'@'%';
GRANT RELOAD, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'usr_replica'@'%';
FLUSH PRIVILEGES;

While creating the user in your mysql server (‘usr_replica’@’%’), you may wish to replace % with the appropriate IP or hostname of the server on which pg_chameleon is running.

Set up PostgreSQL

Install PostgreSQL and start the database instance.

You may use the following steps to install PostgreSQL 10.x

yum install https://yum.postgresql.org/10/redhat/rhel-7.4-x86_64/pgdg-centos10-10-2.noarch.rpm
yum install postgresql10*
su - postgres
$/usr/pgsql-10/bin/initdb
$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start

As seen in the following logs, create a user in PostgreSQL using which pg_chameleon can write changed data to PostgreSQL. Also create the target database.

postgres=# CREATE USER usr_replica WITH ENCRYPTED PASSWORD 'secret';
CREATE ROLE
postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica;
CREATE DATABASE

Steps to install and setup replication using pg_chameleon

Step 1: In this exercise, I installed Python 3.6 and pg_chameleon 2.0.8 using the following steps. You may skip the python install steps if you already have the desired python release. We can create a virtual environment if the OS does not include Python 3.x by default.

yum install gcc openssl-devel bzip2-devel wget
cd /usr/src
wget https://www.python.org/ftp/python/3.6.6/Python-3.6.6.tgz
tar xzf Python-3.6.6.tgz
cd Python-3.6.6
./configure --enable-optimizations
make altinstall
python3.6 -m venv venv
source venv/bin/activate
pip install pip --upgrade
pip install pg_chameleon

Step 2: This tool requires a configuration file to store the source/target server details, and a directory to store the logs. Use the following command to let pg_chameleon create the configuration file template and the respective directories for you.

$ chameleon set_configuration_files

The above command would produce the following output, which shows that it created some directories and a file in the location where you ran the command.

creating directory /var/lib/pgsql/.pg_chameleon
creating directory /var/lib/pgsql/.pg_chameleon/configuration/
creating directory /var/lib/pgsql/.pg_chameleon/logs/
creating directory /var/lib/pgsql/.pg_chameleon/pid/
copying configuration example in /var/lib/pgsql/.pg_chameleon/configuration//config-example.yml

Copy the sample configuration file to another file, let's say default.yml:

$ cd .pg_chameleon/configuration/
$ cp config-example.yml default.yml

Here is how my default.yml file looks after adding all the required parameters. In this file, we can optionally specify data type conversions, tables to be skipped from replication, and the DML events that need to be skipped for a selected list of tables.

---
#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''
# type_override allows the user to override the default type conversion into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"
#postgres  destination connection
pg_conn:
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "secret"
  database: "db_replica"
  charset: "utf8"
sources:
  mysql:
    db_conn:
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      sakila: sch_sakila
    limit_tables:
#      - delphis_mediterranea.foo
    skip_tables:
#      - delphis_mediterranea.bar
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
    skip_events:
      insert:
#        - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo
      delete:
#        - delphis_mediterranea #skips deletes on schema delphis_mediterranea
      update:

Step 3: Initialize the replica using this command:

$ chameleon create_replica_schema --debug

The above command creates a schema and nine tables in the PostgreSQL database that you specified in the .pg_chameleon/configuration/default.yml file. These tables are needed to manage replication from source to destination. The same can be observed in the following log.

db_replica=# \dn
List of schemas
Name | Owner
---------------+-------------
public | postgres
sch_chameleon | target_user
(2 rows)
db_replica=# \dt sch_chameleon.t_*
List of relations
Schema | Name | Type | Owner
---------------+------------------+-------+-------------
sch_chameleon | t_batch_events | table | target_user
sch_chameleon | t_discarded_rows | table | target_user
sch_chameleon | t_error_log | table | target_user
sch_chameleon | t_last_received | table | target_user
sch_chameleon | t_last_replayed | table | target_user
sch_chameleon | t_log_replica | table | target_user
sch_chameleon | t_replica_batch | table | target_user
sch_chameleon | t_replica_tables | table | target_user
sch_chameleon | t_sources | table | target_user
(9 rows)

Step 4: Add the source details to pg_chameleon using the following command. Provide the name of the source as specified in the configuration file. In this example, the source name is mysql and the target is the postgres database defined under pg_conn.

$ chameleon add_source --config default --source mysql --debug

Once you run the above command, you should see that the source details are added to the t_sources table.

db_replica=# select * from sch_chameleon.t_sources;
-[ RECORD 1 ]-------+----------------------------------------------
i_id_source | 1
t_source | mysql
jsb_schema_mappings | {"sakila": "sch_sakila"}
enm_status | ready
t_binlog_name |
i_binlog_position |
b_consistent | t
b_paused | f
b_maintenance | f
ts_last_maintenance |
enm_source_type | mysql
v_log_table | {t_log_replica_mysql_1,t_log_replica_mysql_2}
$ chameleon show_status --config default
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql ready Yes N/A N/A

Step 5: Initialize the replica/slave using the following command. Specify the source from which you are replicating the changes to the PostgreSQL database.

$ chameleon init_replica --config default --source mysql --debug

Initialization involves the following tasks on the MySQL server (source).

1. Flush the tables with read lock
2. Get the master’s coordinates
3. Copy the data
4. Release the locks

The above command creates the target schema in your postgres database automatically.
In the default.yml file, we mentioned the following schema_mappings.

schema_mappings:
sakila: sch_sakila

So now it has created the new schema sch_sakila in the target database db_replica.

db_replica=# \dn
List of schemas
Name | Owner
---------------+-------------
public | postgres
sch_chameleon | usr_replica
sch_sakila | usr_replica
(3 rows)

Step 6: Now, start replication using the following command.

$ chameleon start_replica --config default --source mysql

Step 7: Check replication status and any errors using the following commands.

$ chameleon show_status --config default
$ chameleon show_errors

This is how the status looks:

$ chameleon show_status --source mysql
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql running No N/A N/A
== Schema mappings ==
Origin schema Destination schema
--------------- --------------------
sakila sch_sakila
== Replica status ==
--------------------- ---
Tables not replicated 0
Tables replicated 17
All tables 17
Last maintenance N/A
Next maintenance N/A
Replayed rows
Replayed DDL
Skipped rows

Now, you should see that the changes are continuously getting replicated from MySQL to PostgreSQL.

Step 8: To validate, you may insert a record into the MySQL table that we created for validation purposes, and check that it is replicated to postgres.

$ mysql -u root -pSecret123! -e "INSERT INTO sakila.emp VALUES (1,'avinash','vallarapu')"
mysql: [Warning] Using a password on the command line interface can be insecure.
$ psql -d db_replica -c "select * from sch_sakila.emp"
 id | first_name | last_name
----+------------+-----------
  1 | avinash    | vallarapu
(1 row)

In the above log, we see that the record that was inserted to the MySQL table was replicated to the PostgreSQL table.

You may also add multiple sources for replication to PostgreSQL (target).
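
For example, a second source could be added under sources in default.yml; here is a minimal sketch (the source name mysql_2, the host 10.0.0.2 and the schema mapping world: sch_world are hypothetical):

sources:
  mysql:
    # ... first source, as shown above ...
  mysql_2:
    db_conn:
      host: "10.0.0.2"
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
    schema_mappings:
      world: sch_world
    type: mysql

The new source would then be registered and initialized with the same commands used earlier, e.g. chameleon add_source --config default --source mysql_2.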

Reference : http://www.pgchameleon.org/documents/

Please refer to the above documentation to find out about the many more options that are available with pg_chameleon.

The post Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon appeared first on Percona Database Performance Blog.

by Avinash Vallarapu at August 17, 2018 03:38 PM

Jean-Jerome Schmidt

How to Monitor your Database Servers using ClusterControl CLI

How would you like to merge the "top" process output for all your 5 database nodes and sort it by CPU usage with just a one-liner command? Yeah, you read it right! How about interactive graph displays in the terminal interface? We introduced the CLI client for ClusterControl, called s9s, about a year ago, and it's been a great complement to the web interface. It's also open source.

In this blog post, we’ll show you how you can monitor your databases using your terminal and s9s CLI.

Introduction to s9s, The ClusterControl CLI

ClusterControl CLI (or s9s or s9s CLI) is an open source project and optional package introduced with ClusterControl version 1.4.1. It is a command line tool to interact with, control, and manage your database infrastructure using ClusterControl. The s9s command line project is open source and can be found on GitHub.

Starting from version 1.4.1, the installer script will automatically install the package (s9s-tools) on the ClusterControl node.

Some prerequisites. In order for you to run s9s-tools CLI, the following must be true:

  • A running ClusterControl Controller (cmon).
  • s9s client, install as a separate package.
  • Port 9501 must be reachable by the s9s client.

Installing the s9s CLI is straightforward if you install it on the ClusterControl Controller host itself:

$ rm -Rf ~/.s9s
$ wget http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh
$ chmod +x install-s9s-tools.sh
$ ./install-s9s-tools.sh

You can install s9s-tools outside of the ClusterControl server (your workstation laptop or bastion host), as long as the ClusterControl Controller RPC (TLS) interface is exposed to the public network (default to 127.0.0.1:9501). You can find more details on how to configure this in the documentation page.

To verify if you can connect to ClusterControl RPC interface correctly, you should get the OK response when running the following command:

$ s9s cluster --ping
PING OK 2.000 ms

As a side note, also look at the limitations when using this tool.

Example Deployment

Our example deployment consists of 8 nodes across 3 clusters:

  • PostgreSQL Streaming Replication - 1 master, 2 slaves
  • MySQL Replication - 1 master, 1 slave
  • MongoDB Replica Set - 1 primary, 2 secondary nodes

All database clusters were deployed by ClusterControl by using "Deploy Database Cluster" deployment wizard and from the UI point-of-view, this is what we would see in the cluster dashboard:

Cluster Monitoring

We will start by listing out the clusters:

$ s9s cluster --list --long
ID STATE   TYPE              OWNER  GROUP  NAME                   COMMENT
23 STARTED postgresql_single system admins PostgreSQL 10          All nodes are operational.
24 STARTED replication       system admins Oracle 5.7 Replication All nodes are operational.
25 STARTED mongodb           system admins MongoDB 3.6            All nodes are operational.

We see the same clusters as the UI. We can get more details on the particular cluster by using the --stat flag. Multiple clusters and nodes can also be monitored this way, the command line options can even use wildcards in the node and cluster names:

$ s9s cluster --stat *Replication
Oracle 5.7 Replication                                                                                                                                                                                               Name: Oracle 5.7 Replication              Owner: system/admins
      ID: 24                                  State: STARTED
    Type: REPLICATION                        Vendor: oracle 5.7
  Status: All nodes are operational.
  Alarms:  0 crit   1 warn
    Jobs:  0 abort  0 defnd  0 dequd  0 faild  7 finsd  0 runng
  Config: '/etc/cmon.d/cmon_24.cnf'
 LogFile: '/var/log/cmon_24.log'

                                                                                HOSTNAME    CPU   MEMORY   SWAP    DISK       NICs
                                                                                10.0.0.104 1  6% 992M 120M 0B 0B 19G 13G   10K/s 54K/s
                                                                                10.0.0.168 1  6% 992M 116M 0B 0B 19G 13G   11K/s 66K/s
                                                                                10.0.0.156 2 39% 3.6G 2.4G 0B 0B 19G 3.3G 338K/s 79K/s

The output above gives a summary of our MySQL replication together with the cluster status, state, vendor, configuration file and so on. Down the line, you can see the list of nodes that fall under this cluster ID with a summarized view of system resources for each host, like the number of CPUs, total memory, memory usage, swap, disk and network interfaces. All information shown is retrieved from the CMON database, not directly from the actual nodes.

You can also get a summarized view of all databases on all clusters:

$ s9s  cluster --list-databases --long
SIZE        #TBL #ROWS     OWNER  GROUP  CLUSTER                DATABASE
  7,340,032    0         0 system admins PostgreSQL 10          postgres
  7,340,032    0         0 system admins PostgreSQL 10          template1
  7,340,032    0         0 system admins PostgreSQL 10          template0
765,460,480   24 2,399,611 system admins PostgreSQL 10          sbtest
          0  101         - system admins Oracle 5.7 Replication sys
Total: 5 databases, 789,577,728, 125 tables.

The last line summarizes that we have a total of 5 databases with 125 tables; 4 of the databases are on our PostgreSQL cluster.

For a complete example of usage on s9s cluster command line options, check out s9s cluster documentation.

Node Monitoring

For node monitoring, the s9s CLI has features similar to the cluster option. To get a summarized view of all nodes, you can simply do:

$ s9s node --list --long
STAT VERSION    CID CLUSTER                HOST       PORT  COMMENT
coC- 1.6.2.2662  23 PostgreSQL 10          10.0.0.156  9500 Up and running
poM- 10.4        23 PostgreSQL 10          10.0.0.44   5432 Up and running
poS- 10.4        23 PostgreSQL 10          10.0.0.58   5432 Up and running
poS- 10.4        23 PostgreSQL 10          10.0.0.60   5432 Up and running
soS- 5.7.23-log  24 Oracle 5.7 Replication 10.0.0.104  3306 Up and running.
coC- 1.6.2.2662  24 Oracle 5.7 Replication 10.0.0.156  9500 Up and running
soM- 5.7.23-log  24 Oracle 5.7 Replication 10.0.0.168  3306 Up and running.
mo-- 3.2.20      25 MongoDB 3.6            10.0.0.125 27017 Up and Running
mo-- 3.2.20      25 MongoDB 3.6            10.0.0.131 27017 Up and Running
coC- 1.6.2.2662  25 MongoDB 3.6            10.0.0.156  9500 Up and running
mo-- 3.2.20      25 MongoDB 3.6            10.0.0.35  27017 Up and Running
Total: 11

The leftmost column specifies the type of the node. For this deployment, "c" represents the ClusterControl Controller, "p" PostgreSQL, "m" MongoDB, "e" Memcached, and "s" generic MySQL nodes. The next one is the host status: "o" for online, "l" for off-line, "f" for failed nodes, and so on. The next one is the role of the node in the cluster: "M" for master, "S" for slave, "C" for controller, and "-" for everything else. The remaining columns are pretty self-explanatory.

You can get all the list by looking at the man page of this component:

$ man s9s-node

From there, we can jump into more detailed stats for all nodes with the --stat flag:

$ s9s node --stat --cluster-id=24
 10.0.0.104:3306
    Name: 10.0.0.104              Cluster: Oracle 5.7 Replication (24)
      IP: 10.0.0.104                 Port: 3306
   Alias: -                         Owner: system/admins
   Class: CmonMySqlHost              Type: mysql
  Status: CmonHostOnline             Role: slave
      OS: centos 7.0.1406 core     Access: read-only
   VM ID: -
 Version: 5.7.23-log
 Message: Up and running.
LastSeen: Just now                    SSH: 0 fail(s)
 Connect: y Maintenance: n Managed: n Recovery: n Skip DNS: y SuperReadOnly: n
     Pid: 16592  Uptime: 01:44:38
  Config: '/etc/my.cnf'
 LogFile: '/var/log/mysql/mysqld.log'
 PidFile: '/var/lib/mysql/mysql.pid'
 DataDir: '/var/lib/mysql/'
 10.0.0.168:3306
    Name: 10.0.0.168              Cluster: Oracle 5.7 Replication (24)
      IP: 10.0.0.168                 Port: 3306
   Alias: -                         Owner: system/admins
   Class: CmonMySqlHost              Type: mysql
  Status: CmonHostOnline             Role: master
      OS: centos 7.0.1406 core     Access: read-write
   VM ID: -
 Version: 5.7.23-log
 Message: Up and running.
  Slaves: 10.0.0.104:3306
LastSeen: Just now                    SSH: 0 fail(s)
 Connect: n Maintenance: n Managed: n Recovery: n Skip DNS: y SuperReadOnly: n
     Pid: 975  Uptime: 01:52:53
  Config: '/etc/my.cnf'
 LogFile: '/var/log/mysql/mysqld.log'
 PidFile: '/var/lib/mysql/mysql.pid'
 DataDir: '/var/lib/mysql/'
 10.0.0.156:9500
    Name: 10.0.0.156              Cluster: Oracle 5.7 Replication (24)
      IP: 10.0.0.156                 Port: 9500
   Alias: -                         Owner: system/admins
   Class: CmonHost                   Type: controller
  Status: CmonHostOnline             Role: controller
      OS: centos 7.0.1406 core     Access: read-write
   VM ID: -
 Version: 1.6.2.2662
 Message: Up and running
LastSeen: 28 seconds ago              SSH: 0 fail(s)
 Connect: n Maintenance: n Managed: n Recovery: n Skip DNS: n SuperReadOnly: n
     Pid: 12746  Uptime: 01:10:05
  Config: ''
 LogFile: '/var/log/cmon_24.log'
 PidFile: ''
 DataDir: ''

Printing graphs with the s9s client can also be very informative. This presents the data the controller collected in various graphs. There are almost 30 graphs supported by this tool, as listed here, and s9s-node enumerates them all. The following shows the server load histogram of all nodes for cluster ID 1, as collected by CMON, right from your terminal:

It is possible to set the start and end date and time. One can view short periods (like the last hour) or longer periods (like a week or a month). The following is an example of viewing the disk utilization for the last hour:

Using the --density option, a different view can be printed for every graph. This density graph shows not the time series, but how frequently the given values were seen (X-axis represents the density value):

If the terminal does not support Unicode characters, the --only-ascii option can switch them off:

The graphs have colors, where dangerously high values for example are shown in red. The list of nodes can be filtered with --nodes option, where you can specify the node names or use wildcards if convenient.
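
As a sketch, such a graph invocation could look like the following (the graph name load, the time range, and the node wildcard are assumptions based on the options described above):

$ s9s node --stat --cluster-id=1 --begin="08:00" --end="14:00" --graph=load --nodes="10.0.0.*"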

Process Monitoring

Another cool thing about the s9s CLI is that it provides a process list for the entire cluster: a “top” for all nodes, with all processes merged into one view. The following command runs the "top" command on all database nodes for cluster ID 24, sorted by CPU consumption, and updated continuously:

$ s9s process --top --cluster-id=24
Oracle 5.7 Replication - 04:39:17                                                                                                                                                      All nodes are operational.
3 hosts, 4 cores, 10.6 us,  4.2 sy, 84.6 id,  0.1 wa,  0.3 st,
GiB Mem : 5.5 total, 1.7 free, 2.6 used, 0.1 buffers, 1.1 cached
GiB Swap: 0 total, 0 used, 0 free,

PID   USER     HOST       PR  VIRT      RES    S   %CPU   %MEM COMMAND
12746 root     10.0.0.156 20  1359348    58976 S  25.25   1.56 cmon
 1587 apache   10.0.0.156 20   462572    21632 S   1.38   0.57 httpd
  390 root     10.0.0.156 20     4356      584 S   1.32   0.02 rngd
  975 mysql    10.0.0.168 20  1144260    71936 S   1.11   7.08 mysqld
16592 mysql    10.0.0.104 20  1144808    75976 S   1.11   7.48 mysqld
22983 root     10.0.0.104 20   127368     5308 S   0.92   0.52 sshd
22548 root     10.0.0.168 20   127368     5304 S   0.83   0.52 sshd
 1632 mysql    10.0.0.156 20  3578232  1803336 S   0.50  47.65 mysqld
  470 proxysql 10.0.0.156 20   167956    35300 S   0.44   0.93 proxysql
  338 root     10.0.0.104 20     4304      600 S   0.37   0.06 rngd
  351 root     10.0.0.168 20     4304      600 R   0.28   0.06 rngd
   24 root     10.0.0.156 20        0        0 S   0.19   0.00 rcu_sched
  785 root     10.0.0.156 20   454112    11092 S   0.13   0.29 httpd
   26 root     10.0.0.156 20        0        0 S   0.13   0.00 rcuos/1
   25 root     10.0.0.156 20        0        0 S   0.13   0.00 rcuos/0
22498 root     10.0.0.168 20   127368     5200 S   0.09   0.51 sshd
14538 root     10.0.0.104 20        0        0 S   0.09   0.00 kworker/0:1
22933 root     10.0.0.104 20   127368     5200 S   0.09   0.51 sshd
28295 root     10.0.0.156 20   127452     5016 S   0.06   0.13 sshd
 2238 root     10.0.0.156 20   197520    10444 S   0.06   0.28 vc-agent-007
  419 root     10.0.0.156 20    34764     1660 S   0.06   0.04 systemd-logind
    1 root     10.0.0.156 20    47628     3560 S   0.06   0.09 systemd
27992 proxysql 10.0.0.156 20    11688      872 S   0.00   0.02 proxysql_galera
28036 proxysql 10.0.0.156 20    11688      876 S   0.00   0.02 proxysql_galera

There is also a --list flag which returns a similar result without continuous update (similar to "ps" command):

$ s9s process --list --cluster-id=25

Job Monitoring

Jobs are tasks performed by the controller in the background, so that the client application does not need to wait until the entire job is finished. ClusterControl executes management tasks by assigning an ID for every task and lets the internal scheduler decide whether two or more jobs can be run in parallel. For example, more than one cluster deployment can be executed simultaneously, as well as other long running operations like backup and automatic upload of backups to cloud storage.

In any management operation, it would be helpful if we could monitor the progress and status of a specific job, e.g., scaling out a new slave for our MySQL replication. The following command adds a new slave, 10.0.0.77, to scale out our MySQL replication:

$ s9s cluster --add-node --nodes="10.0.0.77" --cluster-id=24
Job with ID 66992 registered.

We can then monitor the jobID 66992 using the job option:

$ s9s job --log --job-id=66992
addNode: Verifying job parameters.
10.0.0.77:3306: Adding host to cluster.
10.0.0.77:3306: Testing SSH to host.
10.0.0.77:3306: Installing node.
10.0.0.77:3306: Setup new node (installSoftware = true).
10.0.0.77:3306: Setting SELinux in permissive mode.
10.0.0.77:3306: Disabling firewall.
10.0.0.77:3306: Setting vm.swappiness = 1
10.0.0.77:3306: Installing software.
10.0.0.77:3306: Setting up repositories.
10.0.0.77:3306: Installing helper packages.
10.0.0.77: Upgrading nss.
10.0.0.77: Upgrading ca-certificates.
10.0.0.77: Installing socat.
...
10.0.0.77: Installing pigz.
10.0.0.77: Installing bzip2.
10.0.0.77: Installing iproute2.
10.0.0.77: Installing tar.
10.0.0.77: Installing openssl.
10.0.0.77: Upgrading openssl openssl-libs.
10.0.0.77: Finished with helper packages.
10.0.0.77:3306: Verifying helper packages (checking if socat is installed successfully).
10.0.0.77:3306: Uninstalling existing MySQL packages.
10.0.0.77:3306: Installing replication software, vendor oracle, version 5.7.
10.0.0.77:3306: Installing software.
...

Or we can use the --wait flag and get a spinner with progress bar:

$ s9s job --wait --job-id=66992
Add Node to Cluster
- Job 66992 RUNNING    [         █] ---% Add New Node to Cluster

That's it for today's monitoring supplement. We hope that you’ll give the CLI a try and get value out of it. Happy clustering!

by ashraf at August 17, 2018 01:05 PM

Peter Zaitsev

This Week in Data with Colin Charles 49: MongoDB Conference Opportunities and Serverless Aurora MySQL

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Beyond the MongoDB content that will be at Percona Live Europe 2018, there is also a bit of an agenda for MongoDB Europe 2018, happening on November 8 in London—a day after Percona Live in Frankfurt. I expect you’ll see a diverse set of MongoDB content at Percona Live.

The Percona Live Europe Call for Papers closes TODAY! (Friday August 17, 2018)

From Amazon, there have been some good MySQL changes. You now have access to time delayed replication as a strategy for your High Availability and disaster recovery. This works with versions 5.7.22, 5.6.40 and later. It is worth noting that this isn’t documented as working for MariaDB (yet?). It arrived in MariaDB Server in 10.2.3.

Another MySQL change from Amazon? Aurora Serverless MySQL is now generally available. You can build and run applications without thinking about instances: previously, the database function was not all that focused on serverless. This on-demand auto-scaling serverless Aurora should be fun to use. Only Aurora MySQL 5.6 is supported at the moment and also, be aware that this is not available in all regions yet (e.g. Singapore).

Releases

  • pgmetrics is described as an open-source, zero-dependency, single-binary tool that can collect a lot of information and statistics from a running PostgreSQL server and display it in easy-to-read text format or export it as JSON for scripting.
  • PostgreSQL 10.5, 9.6.10, 9.5.14, 9.4.19, 9.3.24, and 11 Beta 3 fix two security vulnerabilities that may inspire an upgrade.

Link List

Industry Updates

  • Martin Arrieta (LinkedIn) is now a Site Reliability Engineer at Fastly. Formerly of Pythian and Percona.
  • Ivan Zoratti (LinkedIn) is now Director of Product Management at Neo4j. He was previously on founding teams, was the CTO of MariaDB Corporation (then SkySQL), and is a long time MySQL veteran.

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 

The post This Week in Data with Colin Charles 49: MongoDB Conference Opportunities and Serverless Aurora MySQL appeared first on Percona Database Performance Blog.

by Colin Charles at August 17, 2018 12:49 PM

August 16, 2018

Peter Zaitsev

MongoDB: how to use the JSON Schema Validator

JSON Schema Validator for MongoDB

The flexibility of MongoDB as a schemaless database is one of its strengths. In early versions, it was left to application developers to ensure that any necessary data validation is implemented. With the introduction of JSON Schema Validator there are new techniques to enforce data integrity for MongoDB. In this article, we use examples to show you how to use the JSON Schema Validator to introduce validation checks at the database level—and consider the pros and cons of doing so.

Why validate?

MongoDB is a schemaless database. This means that we don’t have to define a fixed schema for a collection. We just need to insert a JSON document into a collection and that’s all. Documents in the same collection can have a completely different set of fields, and even the same fields can have different types in different documents. The same field can be a string in some documents and a number in others.

The schemaless feature has given MongoDB great flexibility and the capability to adapt the database to the changing needs of applications. Let’s say that this flexibility is one of the main reasons to use MongoDB. Relational databases are not so flexible: you always need to define a schema first. Then, when you need to add new columns, create new tables, or change the existing architecture to respond to the needs of the application, it can sometimes be a very hard task.

The real world can often be messy and MongoDB can really help, but in most cases the real world requires some kind of backbone architecture too. In real applications built on MongoDB there is always some kind of “fixed schema” or “validation rules” in collections and in documents. It’s possible to have two documents in a collection that represent two completely different things.

Well, it’s technically possible, but it doesn’t make sense in most cases for the application. Most of the arguments for enforcing a schema on the data are well known: schemas maintain structure, giving a clear idea of what’s going into the database, reducing preventable bugs and allowing for cleaner code. Schemas are a form of self-documenting code, as they describe exactly what type of data something should be, and they let you know what checks will be performed. It’s good to be flexible, but behind the scenes we need some strong regulations.

So, what we need to do is to find a balance between flexibility and schema validation. In real world applications, we need to define a sort of “backbone schema” for our data and retain the flexibility to manage specific particularities. In the past, developers implemented schema validation in their applications, but starting from version 3.6, MongoDB supports the JSON Schema Validator. We can rely on it to define a fixed schema and validation rules directly in the database and free the applications from taking care of it.

Let’s have a look at how it works.

JSON Schema Validator

In fact, a “Validation Schema” was already introduced in 3.2, but the new “JSON Schema Validator” introduced in the 3.6 release is by far the better and friendlier way to manage validations in MongoDB.

What we need to do is define the rules using the $jsonSchema operator in the db.createCollection command. The $jsonSchema operator requires a JSON document in which we specify all the rules to be applied to each inserted or updated document: which fields are required, what type each field must be, the permitted ranges of values, the pattern a specific field must match, and so on.

Let's look at the following example, where we create a people collection and define validation rules with the JSON Schema Validator.

db.createCollection( "people" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "name", "surname", "email" ],
      properties: {
         name: {
            bsonType: "string",
            description: "required and must be a string" },
         surname: {
            bsonType: "string",
            description: "required and must be a string" },
         email: {
            bsonType: "string",
            pattern: "^.+\@.+$",
            description: "required and must be a valid email address" },
         year_of_birth: {
            bsonType: "int",
            minimum: 1900,
            maximum: 2018,
            description: "the value must be in the range 1900-2018" },
         gender: {
            enum: [ "M", "F" ],
            description: "can be only M or F" }
      }
   }
}})

Based on what we have defined, only three fields are strictly required in every document of the collection: name, surname, and email. In particular, the email field must match a specific pattern to ensure the content is a valid address. (Note: to properly validate an email address you need a more complex regular expression; here we use a simpler one that just checks for the presence of the @ symbol.) The other fields are not required, but if someone inserts them, the validation rules we have defined apply.
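
For instance, a slightly stricter pattern (still far from RFC-complete, and only a sketch) could also require at least one dot in the domain part. This fragment would replace the email entry in the properties document above:

email: {
   bsonType: "string",
   pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$",
   description: "required and must look like a valid email address" },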

Let's insert some example documents to test that everything works as expected.

Insert a document with one of the required fields missing:

MongoDB > db.people.insert( { name : "John", surname : "Smith" } )
WriteResult({
   "nInserted" : 0,
   "writeError" : {
      "code" : 121,
      "errmsg" : "Document failed validation"
   }
})

Insert a document with all the required fields but with an invalid email address:

MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith.gmail.com" } )
WriteResult({
   "nInserted" : 0,
   "writeError" : {
      "code" : 121,
      "errmsg" : "Document failed validation"
   }
})

Finally, insert a valid document:

MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith@gmail.com" } )
WriteResult({ "nInserted" : 1 })

Now let's try some more inserts, including the optional fields.

MongoDB > db.people.insert( { name : "Bruce", surname : "Dickinson", email : "bruce@gmail.com", year_of_birth : NumberInt(1958), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Corrado", surname : "Pandiani", email : "corrado.pandiani@percona.com", year_of_birth : NumberInt(1971), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Marie", surname : "Adamson", email : "marie@gmail.com", year_of_birth : NumberInt(1992), gender : "F" } )
WriteResult({ "nInserted" : 1 })

The records were inserted correctly because all the rules, on both the required and the optional fields, were satisfied. Now let's see cases where the year_of_birth or gender fields are not valid.

MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(1980), gender : "X" } )
WriteResult({
   "nInserted" : 0,
   "writeError" : {
      "code" : 121,
      "errmsg" : "Document failed validation"
   }
})
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", year_of_birth : NumberInt(1899), gender : "F" } )
WriteResult({
   "nInserted" : 0,
   "writeError" : {
      "code" : 121,
      "errmsg" : "Document failed validation"
   }
})

In the first insert the gender is X, but the only valid values are M or F. In the second insert the year of birth is outside the permitted range.

Let’s try now to insert documents with arbitrary extra fields that are not in the JSON Schema Validator.

MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(2000), gender : "M", shirt_size : "XL", preferred_band : "Coldplay" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", gender : "F", shirt_size : "M", preferred_band : "Maroon Five" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.find().pretty()
{
   "_id" : ObjectId("5b6b12e0f213dc83a7f5b5e8"),
   "name" : "John",
   "surname" : "Smith",
   "email" : "john.smith@gmail.com"
}
{
   "_id" : ObjectId("5b6b130ff213dc83a7f5b5e9"),
   "name" : "Bruce",
   "surname" : "Dickinson",
   "email" : "bruce@gmail.com",
   "year_of_birth" : 1958,
   "gender" : "M"
}
{
   "_id" : ObjectId("5b6b1328f213dc83a7f5b5ea"),
   "name" : "Corrado",
   "surname" : "Pandiani",
   "email" : "corrado.pandiani@percona.com",
   "year_of_birth" : 1971,
   "gender" : "M"
}
{
   "_id" : ObjectId("5b6b1356f213dc83a7f5b5ed"),
   "name" : "Marie",
   "surname" : "Adamson",
   "email" : "marie@gmail.com",
   "year_of_birth" : 1992,
   "gender" : "F"
}
{
   "_id" : ObjectId("5b6b1455f213dc83a7f5b5f0"),
   "name" : "Tom",
   "surname" : "Tom",
   "email" : "tom@gmail.com",
   "year_of_birth" : 2000,
   "gender" : "M",
   "shirt_size" : "XL",
   "preferred_band" : "Coldplay"
}
{
   "_id" : ObjectId("5b6b1476f213dc83a7f5b5f1"),
   "name" : "Luise",
   "surname" : "Luise",
   "email" : "tom@gmail.com",
   "gender" : "F",
   "shirt_size" : "M",
   "preferred_band" : "Maroon Five"
}

As we can see, we have the flexibility to add new fields with no restrictions on the permitted values.

Having a really fixed schema

The behavior we have seen so far, permitting the addition of extra fields that are not in the validation rules, is the default. If we want to be more restrictive and have a truly fixed schema for the collection, we need to add the additionalProperties: false parameter to the createCollection command.

In the following example, we create a validator that permits only the listed fields; no extra fields are allowed.

db.createCollection( "people2" , {
   validator: {
     $jsonSchema: {
        bsonType: "object",
        additionalProperties: false,
        properties: {
           _id : {
              bsonType: "objectId" },
           name: {
              bsonType: "string",
              description: "required and must be a string" },
           age: {
              bsonType: "int",
              minimum: 0,
              maximum: 100,
              description: "required and must be in the range 0-100" }
        }
     }
}})

Note a couple of differences:

  • we don't need a required list here; additionalProperties: false restricts documents to the fields listed under properties (it does not, by itself, make those fields mandatory; to get that too, combine it with required, as in the sketch after the following test)
  • we need to list even the _id field explicitly

As you can see in the following test, we are no longer allowed to add extra fields: documents may contain only the fields name and age (plus _id).

MongoDB > db.people2.insert( {name : "George", age: NumberInt(30)} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people2.insert( {name : "Maria", age: NumberInt(35), surname: "Peterson"} )
WriteResult({
   "nInserted" : 0,
   "writeError" : {
      "code" : 121,
      "errmsg" : "Document failed validation"
   }
})
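
As mentioned above, additionalProperties: false does not make the listed fields mandatory: a document containing only name would still pass. To get a truly fixed and mandatory schema, combine it with a required array. A minimal sketch, using a hypothetical people4 collection:

db.createCollection( "people4" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "name", "age" ],
      additionalProperties: false,
      properties: {
         _id : { bsonType: "objectId" },
         name: { bsonType: "string" },
         age: { bsonType: "int", minimum: 0, maximum: 100 }
      }
   }
}})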

In this case we give up flexibility, which is one of the main benefits of a NoSQL database like MongoDB.

Well, it's up to you whether to use it or not, depending on the nature and goals of your application. I wouldn't recommend it in most cases.

Add validation to existing collections

So far we have seen how to create a new collection with validation rules, but what about existing collections? How can we add rules to them?

This is quite straightforward. The $jsonSchema syntax remains the same; we just need to use the collMod command instead of createCollection. The following example shows how to add validation rules to an existing collection.

First we create a simple new collection, people3, and insert some documents:

MongoDB > db.people3.insert( {name: "Corrado", surname: "Pandiani", year_of_birth: NumberLong(1971)} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( {name: "Tom", surname: "Cruise", year_of_birth: NumberLong(1961), gender: "M"} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( {name: "Kevin", surname: "Bacon", year_of_birth: NumberLong(1964), gender: "M", shirt_size: "L"} )
WriteResult({ "nInserted" : 1 })

Let’s create the validator.

MongoDB > db.runCommand( { collMod: "people3",
   validator: {
      $jsonSchema : {
         bsonType: "object",
         required: [ "name", "surname", "gender" ],
         properties: {
            name: {
               bsonType: "string",
               description: "required and must be a string" },
            surname: {
               bsonType: "string",
               description: "required and must be a string" },
            gender: {
               enum: [ "M", "F" ],
               description: "required and must be M or F" }
         }
       }
},
validationLevel: "moderate",
validationAction: "warn"
})

The two new options validationLevel and validationAction are important in this case.

validationLevel can have the following values:

  • "off": validation is not applied
  • "strict": the default value. Validation applies to all inserts and updates
  • "moderate": validation applies to inserts, and to updates of existing documents that already satisfy the rules. Updates of existing documents that do not satisfy the rules are not checked

When creating validation rules on existing collections, "moderate" is the safest option.
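
Before switching validation on, it can also be useful to find the existing documents that would fail it. Since version 3.6 the $jsonSchema operator can be used in queries as well, so a sketch like the following returns the offending documents (in our people3 collection, the first document, which has no gender field):

MongoDB > db.people3.find( { $nor: [ { $jsonSchema: {
   bsonType: "object",
   required: [ "name", "surname", "gender" ]
} } ] } )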

validationAction can have the following values:

  • "error": the default value. The document must pass validation in order to be written
  • "warn": a document that fails validation is written anyway, but a warning message is logged

When adding validation rules to an existing collection, the safest option is "warn".

These two options can also be applied with createCollection. We didn't use them above because the default values are suitable in most cases.
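
For reference, here is a sketch of a hypothetical people5 collection created with both options set explicitly, so that invalid documents are still written but logged as warnings:

db.createCollection( "people5" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "name" ],
      properties: {
         name: { bsonType: "string" }
      }
   } },
   validationLevel: "strict",
   validationAction: "warn"
})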

How to investigate a collection definition

If we want to see how a collection was defined and, in particular, what its validator rules are, we can use the db.getCollectionInfos() command. The following example shows how to inspect the "schema" we created for the people collection.

MongoDB > db.getCollectionInfos( {name: "people"} )
[
  {
    "name" : "people",
    "type" : "collection",
    "options" : {
      "validator" : {
        "$jsonSchema" : {
          "bsonType" : "object",
          "required" : [
            "name",
            "surname",
            "email"
          ],
          "properties" : {
            "name" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "surname" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "email" : {
              "bsonType" : "string",
              "pattern" : "^.+@.+$",
              "description" : "required and must be a valid email address"
            },
            "year_of_birth" : {
              "bsonType" : "int",
              "minimum" : 1900,
              "maximum" : 2018,
              "description" : "the value must be in the range 1900-2018"
            },
            "gender" : {
              "enum" : [
                "M",
                "F"
              ],
              "description" : "can be only M or F"
            }
          }
        }
      }
    },
    "info" : {
      "readOnly" : false,
      "uuid" : UUID("5b98c6f0-2c9e-4c10-a3f8-6c1e7eafd2b4")
    },
    "idIndex" : {
      "v" : 2,
      "key" : {
        "_id" : 1
      },
      "name" : "_id_",
      "ns" : "test.people"
    }
  }
]

Limitations and restrictions

Validators cannot be defined for collections in the following databases: admin, local, config.

Validators cannot be defined for system.* collections.

A limitation of the current implementation of the JSON Schema Validator is that its error messages do little to help you understand which rule the document failed. You have to work it out manually with tests, which is not easy when dealing with complex documents. More specific error strings, ideally derived from the validator definition, would be very useful when debugging application errors and warnings. This is definitely something that could be improved in future releases.

While waiting for improvements, someone has developed a wrapper for the mongo client that produces more descriptive error strings: have a look at https://www.npmjs.com/package/mongo-schemer. You can test it and use it, but pay attention to its caveat: "Running in prod is not recommended due to the overhead of validating documents against the schema".

Conclusions

Schema validation in the application remains, in general, a best practice, but the JSON Schema Validator is a good tool for enforcing validation directly in the database.

Even though it still needs some improvements, the JSON Schema feature is good enough for most common cases. We suggest testing it, and using it when you really need to create a backbone structure for your data.


The post MongoDB: how to use the JSON Schema Validator appeared first on Percona Database Performance Blog.

by Corrado Pandiani at August 16, 2018 10:57 AM

Jean-Jerome Schmidt

MySQL on Docker: How to Monitor MySQL Containers with Prometheus - Part 1 - Deployment on Standalone and Swarm

Monitoring is a concern for containers, as the infrastructure is dynamic. Containers can be routinely created and destroyed, and are ephemeral. So how do you keep track of your MySQL instances running on Docker?

As with any software component, there are many options out there. We'll look at Prometheus, a solution built for distributed infrastructure that works very well with Docker.

This is a two-part blog. In part 1, we cover how to deploy our MySQL containers with Prometheus and its components, running as standalone Docker containers and as Docker Swarm services. In part 2, we will look at the important metrics to monitor on our MySQL containers, as well as integration with paging and notification systems.

Introduction to Prometheus

Prometheus is a full monitoring and trending system that includes built-in, active scraping, storing, querying, graphing, and alerting based on time series data. Prometheus collects metrics through a pull mechanism from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when a condition is observed to be true. It covers all the targets we want to measure when running MySQL in Docker containers: physical host metrics, Docker container metrics, and MySQL server metrics.

Take a look at the following diagram which illustrates Prometheus architecture (taken from Prometheus official documentation):

We are going to deploy some MySQL containers (standalone and Docker Swarm) complete with a Prometheus server, MySQL exporters (i.e., Prometheus agents that expose MySQL metrics, which the Prometheus server can then scrape) and also Alertmanager to handle alerts based on the collected metrics.

For more details check out the Prometheus documentation. In this example, we are going to use the official Docker images provided by the Prometheus team.

Standalone Docker

Deploying MySQL Containers

Let's run two standalone MySQL servers on Docker to simplify our deployment walkthrough. One container will run the latest MySQL 8.0 and the other MySQL 5.7. Both containers are attached to the same Docker network, called "db_network":

$ docker network create db_network
$ docker run -d \
--name mysql80 \
--publish 3306 \
--network db_network \
--restart unless-stopped \
--env MYSQL_ROOT_PASSWORD=mypassword \
--volume mysql80-datadir:/var/lib/mysql \
mysql:8 \
--default-authentication-plugin=mysql_native_password

MySQL 8 defaults to a new authentication plugin called caching_sha2_password. For compatibility with the Prometheus MySQL exporter container, let's use the widely supported mysql_native_password plugin whenever we create a new MySQL user on this server.
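
If you prefer not to change the server-wide default, MySQL 8.0 also lets you pick the plugin per user. As a sketch, an alternative form of the CREATE USER statement we run later would be:

mysql> CREATE USER 'exporter'@'%' IDENTIFIED WITH mysql_native_password BY 'exporterpassword';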

For the second MySQL container running 5.7, we execute the following:

$ docker run -d \
--name mysql57 \
--publish 3306 \
--network db_network \
--restart unless-stopped \
--env MYSQL_ROOT_PASSWORD=mypassword \
--volume mysql57-datadir:/var/lib/mysql \
mysql:5.7

Verify that our MySQL servers are running:

[root@docker1 mysql]# docker ps | grep mysql
cc3cd3c4022a        mysql:5.7           "docker-entrypoint.s…"   12 minutes ago      Up 12 minutes       0.0.0.0:32770->3306/tcp   mysql57
9b7857c5b6a1        mysql:8             "docker-entrypoint.s…"   14 minutes ago      Up 14 minutes       0.0.0.0:32769->3306/tcp   mysql80

At this point, our architecture is looking something like this:

Let's start monitoring them.

Exposing Docker Metrics to Prometheus

Docker has built-in support for acting as a Prometheus target, which we can use to monitor the Docker engine statistics. We can simply enable it by creating a text file called "daemon.json" on the Docker host:

$ vim /etc/docker/daemon.json

And add the following lines:

{
  "metrics-addr" : "12.168.55.161:9323",
  "experimental" : true
}

Where 192.168.55.161 is the Docker host's primary IP address. Then restart the Docker daemon to load the change:

$ systemctl restart docker

Since we defined --restart unless-stopped in our MySQL containers' run commands, the containers will be started automatically once Docker is running again.
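
To confirm that the engine is now exposing metrics, we can query the endpoint directly (assuming the address configured above):

$ curl -s 192.168.55.161:9323/metrics | head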

Deploying MySQL Exporter

Before we move further, the mysqld exporter requires a MySQL user to be used for monitoring purposes. On our MySQL containers, create the monitoring user:

$ docker exec -it mysql80 mysql -uroot -p
Enter password:
mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporterpassword' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';

Take note that it is recommended to set a max connection limit for the user to avoid overloading the server with monitoring scrapes under heavy load. Repeat the above statements onto the second container, mysql57:

$ docker exec -it mysql57 mysql -uroot -p
Enter password:
mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporterpassword' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';

Let's run the mysqld exporter container, called "mysql80-exporter", to expose the metrics for our MySQL 8.0 instance:

$ docker run -d \
--name mysql80-exporter \
--publish 9104 \
--network db_network \
--restart always \
--env DATA_SOURCE_NAME="exporter:exporterpassword@(mysql80:3306)/" \
prom/mysqld-exporter:latest \
--collect.info_schema.processlist \
--collect.info_schema.innodb_metrics \
--collect.info_schema.tablestats \
--collect.info_schema.tables \
--collect.info_schema.userstats \
--collect.engine_innodb_status

And also another exporter container for our MySQL 5.7 instance:

$ docker run -d \
--name mysql57-exporter \
--publish 9104 \
--network db_network \
--restart always \
-e DATA_SOURCE_NAME="exporter:exporterpassword@(mysql57:3306)/" \
prom/mysqld-exporter:latest \
--collect.info_schema.processlist \
--collect.info_schema.innodb_metrics \
--collect.info_schema.tablestats \
--collect.info_schema.tables \
--collect.info_schema.userstats \
--collect.engine_innodb_status

We enabled a number of collector flags on the container to expose additional MySQL metrics. You can also enable --collect.slave_status and --collect.slave_hosts if you have MySQL replication running on the containers.

We should be able to retrieve the MySQL metrics via curl directly from the Docker host (port 32771 is the published port that Docker automatically assigned to the mysql80-exporter container):

$ curl 127.0.0.1:32771/metrics
...
mysql_info_schema_threads_seconds{state="waiting for lock"} 0
mysql_info_schema_threads_seconds{state="waiting for table flush"} 0
mysql_info_schema_threads_seconds{state="waiting for tables"} 0
mysql_info_schema_threads_seconds{state="waiting on cond"} 0
mysql_info_schema_threads_seconds{state="writing to net"} 0
...
process_virtual_memory_bytes 1.9390464e+07
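
If you are unsure which host port Docker assigned to an exporter, docker port prints the mapping, for example:

$ docker port mysql80-exporter 9104
0.0.0.0:32771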

At this point, our architecture is looking something like this:

We are now ready to set up the Prometheus server.

Deploying Prometheus Server

First, create a Prometheus configuration file at ~/prometheus.yml and add the following lines:

$ vim ~/prometheus.yml
global:
  scrape_interval:     5s
  scrape_timeout:      3s
  evaluation_interval: 5s

# Our alerting rule files
rule_files:
  - "alert.rules"

# Scrape endpoints
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'mysql'
    static_configs:
      - targets: ['mysql57-exporter:9104','mysql80-exporter:9104']

  - job_name: 'docker'
    static_configs:
      - targets: ['192.168.55.161:9323']

In this Prometheus configuration file, we have defined three jobs: "prometheus", "mysql" and "docker". The first monitors the Prometheus server itself. The second, "mysql", monitors our MySQL containers; we define its endpoints as the MySQL exporters on port 9104, which expose the Prometheus-compatible metrics of the MySQL 8.0 and 5.7 instances respectively. The third, "docker", scrapes the Docker engine metrics we enabled earlier. The "alert.rules" entry refers to the rule file that we will fill in in the next blog post for alerting purposes.

We can then map the configuration into the Prometheus container. We also need to create a Docker volume for the Prometheus data, for persistence, and expose port 9090 publicly:

$ docker run -d \
--name prometheus-server \
--publish 9090:9090 \
--network db_network \
--restart unless-stopped \
--mount type=volume,src=prometheus-data,target=/prometheus \
--mount type=bind,src="$(pwd)"/prometheus.yml,target=/etc/prometheus/prometheus.yml \
--mount type=bind,src="$(pwd)
prom/prometheus

Our Prometheus server is now running and can be accessed directly on port 9090 of the Docker host. Open a web browser and go to http://192.168.55.161:9090/ to access the Prometheus web UI. Verify the target status under Status -> Targets and make sure all targets are green:

At this point, our container architecture is looking something like this:

Our Prometheus monitoring system for the standalone MySQL containers is now deployed.

Docker Swarm

Deploying a 3-node Galera Cluster

Suppose we want to deploy a three-node Galera Cluster in Docker Swarm. We have to create three different services, each representing one Galera node. Using this approach, we can keep a static, resolvable hostname for each Galera container, together with a MySQL exporter container to accompany each of them. We will use the MariaDB 10.2 image maintained by the Docker team to run our Galera cluster.

First, create a MySQL configuration file to be used by our Swarm services:

$ vim ~/my.cnf
[mysqld]

default_storage_engine          = InnoDB
binlog_format                   = ROW

innodb_flush_log_at_trx_commit  = 0
innodb_flush_method             = O_DIRECT
innodb_file_per_table           = 1
innodb_autoinc_lock_mode        = 2
innodb_lock_schedule_algorithm  = FCFS # MariaDB >10.1.19 and >10.2.3 only

wsrep_on                        = ON
wsrep_provider                  = /usr/lib/galera/libgalera_smm.so
wsrep_sst_method                = mariabackup

Create a dedicated database network in our Swarm called "db_swarm":

$ docker network create --driver overlay db_swarm

Import our MySQL configuration file into a Docker config so we can load it into the Swarm services we create later:

$ cat ~/my.cnf | docker config create my-cnf -

Create the first Galera bootstrap service, called "galera0", with "gcomm://" as the cluster address. This is a transient service for the bootstrapping process only; we will delete it once the three other Galera services are running:

$ docker service create \
--name galera0 \
--replicas 1 \
--hostname galera0 \
--network db_swarm \
--publish 3306 \
--publish 4444 \
--publish 4567 \
--publish 4568 \
--config src=my-cnf,target=/etc/mysql/mariadb.conf.d/my.cnf \
--env MYSQL_ROOT_PASSWORD=mypassword \
--mount type=volume,src=galera0-datadir,dst=/var/lib/mysql \
mariadb:10.2 \
--wsrep_cluster_address=gcomm:// \
--wsrep_sst_auth="root:mypassword" \
--wsrep_node_address=galera0

At this point, our database architecture can be illustrated as below:

Then, repeat the following command three times to create three more Galera services. Replace {name} with galera1, galera2 and galera3 respectively:

$ docker service create \
--name {name} \
--replicas 1 \
--hostname {name} \
--network db_swarm \
--publish 3306 \
--publish 4444 \
--publish 4567 \
--publish 4568 \
--config src=my-cnf,target=/etc/mysql/mariadb.conf.d/my.cnf \
--env MYSQL_ROOT_PASSWORD=mypassword \
--mount type=volume,src={name}-datadir,dst=/var/lib/mysql \
mariadb:10.2 \
--wsrep_cluster_address=gcomm://galera0,galera1,galera2,galera3 \
--wsrep_sst_auth="root:mypassword" \
--wsrep_node_address={name}

Verify our current Docker services:

$ docker service ls 
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
wpcxye3c4e9d        galera0             replicated          1/1                 mariadb:10.2        *:30022->3306/tcp, *:30023->4444/tcp, *:30024-30025->4567-4568/tcp
jsamvxw9tqpw        galera1             replicated          1/1                 mariadb:10.2        *:30026->3306/tcp, *:30027->4444/tcp, *:30028-30029->4567-4568/tcp
otbwnb3ridg0        galera2             replicated          1/1                 mariadb:10.2        *:30030->3306/tcp, *:30031->4444/tcp, *:30032-30033->4567-4568/tcp
5jp9dpv5twy3        galera3             replicated          1/1                 mariadb:10.2        *:30034->3306/tcp, *:30035->4444/tcp, *:30036-30037->4567-4568/tcp

Our architecture is now looking something like this:

We need to remove the Galera bootstrap Swarm service, galera0, to stop it from running, because if its container were rescheduled by Docker Swarm, a new replica would be started with a fresh, empty volume. Since the --wsrep_cluster_address of the other Galera nodes (or Swarm services) still contains "galera0", that would put us at risk of data loss. So, let's remove it:

$ docker service rm galera0

At this point, we have our three-node Galera Cluster:
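
To double-check that all three nodes have joined, we can query the wsrep status from any of them. A sketch (we look up the task container by service name, since Swarm suffixes container names with slot and task IDs):

$ docker exec -it $(docker ps -qf name=galera1) mysql -uroot -p -e 'SHOW GLOBAL STATUS LIKE "wsrep_cluster_size"'

The expected Value is 3.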

We are now ready to deploy our MySQL exporter and Prometheus Server.

MySQL Exporter Swarm Service

Log in to one of the Galera nodes and create the exporter user with the proper privileges:

$ docker exec -it $(docker ps -qf name=galera1) mysql -uroot -p
Enter password:
mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporterpassword' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';

Then, create the exporter service for each of the Galera services (replace {name} with galera1, galera2 and galera3 respectively):

$ docker service create \
--name {name}-exporter \
--network db_swarm \
--replicas 1 \
-p 9104 \
-e DATA_SOURCE_NAME="exporter:exporterpassword@({name}:3306)/" \
prom/mysqld-exporter:latest \
--collect.info_schema.processlist \
--collect.info_schema.innodb_metrics \
--collect.info_schema.tablestats \
--collect.info_schema.tables \
--collect.info_schema.userstats \
--collect.engine_innodb_status

At this point, our architecture is looking something like this with exporter services in the picture:

Prometheus Server Swarm Service

Finally, let's deploy our Prometheus server. As with the Galera deployment, we have to prepare the Prometheus configuration file first and then import it into Swarm using the docker config command:

$ vim ~/prometheus.yml
global:
  scrape_interval:     5s
  scrape_timeout:      3s
  evaluation_interval: 5s

# Our alerting rule files
rule_files:
  - "alert.rules"

# Scrape endpoints
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'galera'
    static_configs:
      - targets: ['galera1-exporter:9104','galera2-exporter:9104', 'galera3-exporter:9104']

In this Prometheus configuration file, we have defined two jobs: "prometheus" and "galera". The first monitors the Prometheus server itself. The second, "galera", monitors our MySQL containers; we define its endpoints as the MySQL exporters on port 9104, which expose the Prometheus-compatible metrics of the three Galera nodes. The "alert.rules" entry refers to the rule file that we will fill in in the next blog post for alerting purposes.

Import the configuration file into a Docker config, to be used by the Prometheus service later:

$ cat ~/prometheus.yml | docker config create prometheus-yml -

Let's create the Prometheus server service, publishing port 9090 on all Docker hosts for the Prometheus web UI:

$ docker service create \
--name prometheus-server \
--publish 9090:9090 \
--network db_swarm \
--replicas 1 \
--config src=prometheus-yml,target=/etc/prometheus/prometheus.yml \
--mount type=volume,src=prometheus-data,dst=/prometheus \
prom/prometheus

Verify with the docker service ls command that we have three Galera services, three exporter services and one Prometheus service:

$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                         PORTS
jsamvxw9tqpw        galera1             replicated          1/1                 mariadb:10.2                  *:30026->3306/tcp, *:30027->4444/tcp, *:30028-30029->4567-4568/tcp
hbh1dtljn535        galera1-exporter    replicated          1/1                 prom/mysqld-exporter:latest   *:30038->9104/tcp
otbwnb3ridg0        galera2             replicated          1/1                 mariadb:10.2                  *:30030->3306/tcp, *:30031->4444/tcp, *:30032-30033->4567-4568/tcp
jq8i77ch5oi3        galera2-exporter    replicated          1/1                 prom/mysqld-exporter:latest   *:30039->9104/tcp
5jp9dpv5twy3        galera3             replicated          1/1                 mariadb:10.2                  *:30034->3306/tcp, *:30035->4444/tcp, *:30036-30037->4567-4568/tcp
10gdkm1ypkav        galera3-exporter    replicated          1/1                 prom/mysqld-exporter:latest   *:30040->9104/tcp
gv9llxrig30e        prometheus-server   replicated          1/1                 prom/prometheus:latest        *:9090->9090/tcp

Our Prometheus server is now running and can be accessed on port 9090 from any Docker node. Open a web browser and go to http://192.168.55.161:9090/ to access the Prometheus web UI. Verify the target status under Status -> Targets and make sure all targets are green:

At this point, our Swarm architecture is looking something like this:

To be continued…

We now have our database and monitoring stack deployed on Docker. In part 2 of this blog, we will look into the different MySQL metrics to keep an eye on, and see how to configure alerting with Prometheus.

by ashraf at August 16, 2018 10:18 AM

August 15, 2018

MariaDB AB

MariaDB Server 10.3.9 now available


The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.3.9. See the release notes and changelog for details and visit mariadb.com/downloads to download.

 



by dbart at August 15, 2018 05:04 PM