Planet MariaDB

February 28, 2020

SeveralNines

An Introduction to Percona Server for MongoDB 4.2

When choosing a NoSQL database technology important considerations should be taken into account, such as performance, resilience, reliability, and security. These key factors should also be aligned with achieving business goals, at least as far as the database is concerned. 

Many technologies have come into play to improve these aspects, and it is advisable for an organisation to improve the salient options and try integrating them into the database systems. 

New technologies should ensure maximum performance to enhance achievement of business goals at an affordable operating cost but with more manipulative features such as error detection and alerting systems.

In this blog we will discuss the Percona version of MongoDB and how it expands the power of MongoDB in a variety of ways.

What is Percona Server for MongoDB?

For a database to perform well, there must be an optimally established underlying server for enhancing read and write transactions. Percona Server for MongoDB is a free open-source drop-in replacement for the MongoDB Community Edition, but with additional enterprise-grade functionality. It is designed with some major improvements on the default MongoDB server setup. 

It delivers high performance, improved security, and reliability for optimum performance with reduced expenditure on proprietary software vendor relationships. 

Percona Server for MongoDB Salient Features

MongoDB Community Edition is core to the Percona server considering that it already constitutes important features such as the flexible schema, distributed transactions, familiarity of the JSON documents and native high availability. Besides this, Percona Server for MongoDB integrates the following salient features that enables it to satisfy the aspects we have mentioned above:

  • Hot Backups
  • Data at rest encryption
  • Audit Logging
  • Percona Memory Engine
  • External LDAP Authentication with SASL
  • HashiCorp Vault Integration
  • Enhanced query profiling

Hot Backups 

Percona server for MongoDB creates a physical data backup on a running server in the background without any noticeable operation degradation. This is achievable by running the createBackup command as an administrator on the admin database and specifying the backup directory. 

> use admin

switched to db admin

> db.runCommand({createBackup: 1, backupDir: "/my/backup/data/path"})

{ "ok" : 1 }

When you receive { "ok" : 1 } then the backup was successful. Otherwise, if for example you specify an empty backup directory path, you may receive an error response i.e:

{ "ok" : 0, "errmsg" : "Destination path must be absolute" }

Restoring the backup requires one to first stop the mongod instance, clean the data directory, copy the files from the directory and then restart the mongod service. This can be done by running the command below

$ service mongod stop && rm -rf /var/lib/mongodb/* && cp --recursive /my/backup/data/path /var/lib/mongodb/ && service mongod start

You can also store the backup in archive format if using Percona server for MongoDB 4.2.1-1 

> use admin

> db.runCommand({createBackup: 1, archive: "path/to/archive.tar" })

You can also backup directly to AWS S3 using the default settings or with more configurations. For a default S3 bucket backup:

> db.runCommand({createBackup: 1,  s3: {bucket: "backup", path: "newBackup"}})

Data-at-Rest Encryption

MongoDB version 3.2 introduced data at rest encryption for the WiredTiger storage engine towards ensuring that data files can be decrypted and read by parties with decryption key only. Data encryption at rest in Percona Server for MongoDB was introduced in version 3.6 to go in hand with data encryption at rest interface in MongoDB. However the latest version does not include support for Amazon AWS and KIMP key management services.

The encryption can also be applied to rollback files when data at rest is enabled. Percona Server for MongoDB uses encryptionCipherMode option with 2 selective cipher modes:

  1. AES256-CBC (default cipher mode)
  2. AES256-GCM

You can encrypt data with the command below

$ mongod ... --encryptionCipherMode or 

$ mongod ... --encryptionCipherMode AES256-GCM

We use the --ecryptionKeyFile option to specify the path to a file that contains the encryption key.

$ mongod ... --enableEncryption --encryptionKeyFile <fileName>

Audit Logging

For every database system, administrators have a mandate to keep track on activities taking place. In Percona Server for MongoDB, when auditing is enabled, the server generates an audit log file than constitutes information about different user events such as authorization and authentication. However, starting the server with auditing enabled, the logs won’t be displayed dynamically during runtime. 

The Audit Logging in MongoDB Community Edition can take two data formats that is, JSON and BSON. However, for Percona Server for MongoDB, audit logging is limited to JSON file only. The server also logs only important commands contrary to MongoDB that logs everything. Since the filtering procedure in Percona is so unclear in terms of the filtering syntax, enabling the audit log without filtering would offer more entries from which one can narrow down to own specifications.

Percona Memory Engine

This is a special configuration of the WiredTiger storage engine that does not store user data on disk. The data fully resides and is readily available in the main memory except for diagnostic data that is written to disk. This makes data processing much faster but with a consideration that you must ensure there is enough memory to hold the data set and the server should not shut down. One can select a storage engine to use with the  --storageEngine command. Data created for one storage engine cannot be compatible with other storage engines because each storage engine has its own data model. For instance to select the in-memory storage engine. You first stop any running mongod instance and then issue the commands:

$ service mongod stop

$ mongod --storageEngine inMemory --dbpath <newDataDir>

If you already have some data with your default MongoDB Community edition and you would like to migrate to Percona Memory Engine, just use the mongodumb and mongorestore utilities by issuing the command:

$ mongodump --out <dumpDir>

$ service mongod stop

$ rm -rf /var/lib/mongodb/*

$ sed -i '/engine: .*inMemory/s/#//g' /etc/mongod.conf

$ service mongod start

$ mongorestore <dumpDir>

External LDAP Authentication With SASL

Whenever  clients make either  a read or write request to MongoDB mongod instance, they need to authenticate against the MongoDB server user database first. The external authentication allows the MongoDB server to verify the client credentials (username and password) against a separate service. The external authentication architecture involves:

  1. LDAP Server which remotely stores all user credentials
  2. SASL Daemon that is used as a MongoDB server-local proxy for the remote LDAP service.
  3. SASL Library: creates necessary authentication data for MongoDB client and server.

Authentication session sequence

  • The Client gets connected to a running mongod instance and creates a PLAIN authentication request using the SASL library.
  • The auth request is then sent to the server as a special Mongo command which is then received by the mongod server with its request payload.
  • The server creates some SASL sessions derived with client credentials using its own reference to the SASL library.
  • The mongod server passes the auth payload to the SASL library which hands it over to the saslauthd daemon. The daemon passes it to the LDAP and awaits a YES or NO response upon the authentication request by checking if the user exists and the submitted password is correct.
  • The saslauthd passes this response to the mongod server through the SASL library which then authenticates or rejects the request accordingly.

 Here is an illustration for this process:

To add an external user to a mongod server:

> db.getSiblingDB("$external").createUser( {user : username, roles: [ {role: "read", db: "test"} ]} );

External users however  cannot have roles assigned in the admin database.

HashiCorp Vault Integration

HashCorp Vault is a product designed to manage secrets and protect sensitive data by securely storing and tightly controlling access to confidential information. With the previous Percona version, data at rest encryption key was stored locally on the server inside the key file. The integration with HashCorp Vault secures the encryption key much far better.

Enhanced Query Profiling

Profiling has a degradation impact on the database  performance especially when there are so many queries issued. Percona server for MongoDB comes in hand by limiting the number of queries collected by the database profiler hence decreases its impact on performance.

Conclusion

Percona Server for MongoDB is an enhanced open source and highly scalable database that may act as a compatible drop-in replacement for MongoDB Community Edition but with similar syntax and configuration. It enhances extensive data security especially one at rest and improved database performance through provision of Percona Server engine, limiting on the profiling rate among other features.

Percona Server for MongoDB is fully supported by ClusterControl as an option for deployment.

by Onyancha Brian Henry at February 28, 2020 06:02 PM

February 27, 2020

SeveralNines

What to Look for if Your PostgreSQL Replication is Lagging

Replication lag issues in PostgreSQL is not a widespread issue for most setups. Although, it can occur and when it does it can impact your production setups. PostgreSQL is designed to handle multiple threads, such as query parallelism or deploying worker threads to handle specific tasks based on the assigned values in the configuration. PostgreSQL is designed to handle heavy and stressful loads, but sometimes (due to a bad configuration) your server might still go south.

Identifying the replication lag in PostgreSQL is not a complicated task to do, but there are a few different approaches to look into the problem. In this blog, we'll take a look at what things to look at when your PostgreSQL replication is lagging.

Types of Replication in PostgreSQL

Before diving into the topic, let's see first how replication in PostgreSQL evolves as there are diverse set of approaches and solutions when dealing with replication.

Warm standby for PostgreSQL was implemented in version 8.2 (back in 2006) and was based on the log shipping method. This means that the WAL records are directly moved from one database server to another to be applied, or simply an analogous approach to PITR, or very much like what you are doing with rsync.

This approach, even old, is still used today and some institutions actually prefer this older approach. This approach implements a file-based log shipping by transferring WAL records one file (WAL segment) at a time. Though it has a downside; A major failure on the primary servers, transactions not yet shipped will be lost. There is a window for data loss (you can tune this by using the archive_timeout parameter, which can be set to as low as a few seconds, but such a low setting will substantially increase the bandwidth required for file shipping).

In PostgreSQL version 9.0, Streaming Replication was introduced. This feature allowed us to stay more up-to-date when compared to file-based log shipping. Its approach is by transferring WAL records (a WAL file is composed of WAL records) on the fly (merely a record based log shipping), between a master server and one or several standby servers. This protocol does not need to wait for the WAL file to be filled, unlike file-based log shipping. In practice, a process called WAL receiver, running on the standby server, will connect to the primary server using a TCP/IP connection. In the primary server, another process exists named WAL sender. It's role is in charge of sending the WAL registries to the standby server(s) as they happen.

Asynchronous Replication setups in streaming replication can incur problems such as data loss or slave lag, so version 9.1 introduces synchronous replication. In synchronous replication, each commit of a write transaction will wait until confirmation is received that the commit has been written to the write-ahead log on disk of both the primary and standby server. This method minimizes the possibility of data loss, as for that to happen we will need for both, the master and the standby to fail at the same time. 

The obvious downside of this configuration is that the response time for each write transaction increases, as we need to wait until all parties have responded. Unlike MySQL, there's no support such as in a semi-synchronous environment of MySQL, it will failback to asynchronous if timeout has occurred. So in With PostgreSQL, the time for a commit is (at minimum) the round trip between the primary and the standby. Read-only transactions will not be affected by that.

As it evolves, PostgreSQL is continuously improving and yet its replication is diverse. For example, you can use physical streaming asynchronous replication or use logical streaming replication. Both are monitored differently though use the same approach when sending data over replication, which is still streaming replication. For more details check in the manual for different types of solutions in PostgreSQL when dealing with replication.

Causes of PostgreSQL Replication Lag

As defined in our previous blog, a replication lag is the cost of delay for transaction(s) or operation(s) calculated by its time difference of execution between the primary/master against the standby/slave node.  

Since PostgreSQL is using streaming replication, it's designed to be fast as changes are recorded as a set of sequence of log records (byte-by-byte) as intercepted by the WAL receiver then writes these log records to the WAL file. Then the startup process by PostgreSQL replays the data from that WAL segment and streaming replication begins. In PostgreSQL, a replication lag can occur by these factors:

  • Network issues
  • Not able to find the WAL segment from the primary. Usually, this is due to the checkpointing behavior where WAL segments are rotated or recycled
  • Busy nodes (primary and standby(s)). Can be caused by external processes or some bad queries caused to be a resource intensive
  • Bad hardware or hardware issues causing to take some lag
  • Poor configuration in PostgreSQL such as small numbers of max_wal_senders being set while processing tons of transaction requests (or large volume of changes).

What To Look for With PostgreSQL Replication Lag

PostgreSQL replication is yet diverse but monitoring the replication health is subtle yet not complicated. In this approach we'll showcase are based on a primary-standby setup with asynchronous streaming replication. The logical replication cannot benefit most of the cases we're discussing here but the view pg_stat_subscription can help you collect information. However, we'll not focus on that in this blog.

Using pg_stat_replication View

The most common approach is to run a query referencing this view in the primary node. Remember, you can only harvest information from the primary node using this view. This view contains the following table definition based on PostgreSQL 11 as shown below:

postgres=# \d pg_stat_replication

                    View "pg_catalog.pg_stat_replication"

      Column      | Type           | Collation | Nullable | Default 

------------------+--------------------------+-----------+----------+---------

 pid              | integer           | | | 

 usesysid         | oid           | | | 

 usename          | name           | | | 

 application_name | text                     | | | 

 client_addr      | inet           | | | 

 client_hostname  | text           | | | 

 client_port      | integer           | | | 

 backend_start    | timestamp with time zone |           | | 

 backend_xmin     | xid           | | | 

 state            | text           | | | 

 sent_lsn         | pg_lsn           | | | 

 write_lsn        | pg_lsn           | | | 

 flush_lsn        | pg_lsn           | | | 

 replay_lsn       | pg_lsn           | | | 

 write_lag        | interval           | | | 

 flush_lag        | interval           | | | 

 replay_lag       | interval           | | | 

 sync_priority    | integer           | | | 

 sync_state       | text           | | | 

Where the fields are defined as (includes PG < 10 version),

  • pid: Process id of walsender process
  • usesysid: OID of user which is used for Streaming replication.
  • username: Name of user which is used for Streaming replication
  • application_name: Application name connected to master
  • client_addr: Address of standby/streaming replication
  • client_hostname: Hostname of standby.
  • client_port: TCP port number on which standby communicating with WAL sender
  • backend_start: Start time when SR connected to Master.
  • backend_xmin: standby's xmin horizon reported by hot_standby_feedback.
  • state: Current WAL sender state i.e streaming
  • sent_lsn/sent_location: Last transaction location sent to standby.
  • write_lsn/write_location: Last transaction written on disk at standby
  • flush_lsn/flush_location: Last transaction flush on disk at standby.
  • replay_lsn/replay_location: Last transaction flush on disk at standby.
  • write_lag: Elapsed time during committed WALs from primary to the standby (but not yet committed in the standby)
  • flush_lag: Elapsed time during committed WALs from primary to the standby (WAL's has already been flushed but not yet applied)
  • replay_lag: Elapsed time during committed WALs from primary to the standby (fully committed in standby node)
  • sync_priority: Priority of standby server being chosen as synchronous standby
  • sync_state: Sync State of standby (is it async or synchronous).

A sample query would look as follows in PostgreSQL 9.6,

paultest=# select * from pg_stat_replication;

-[ RECORD 1 ]----+------------------------------

pid              | 7174

usesysid         | 16385

usename          | cmon_replication

application_name | pgsql_1_node_1

client_addr      | 192.168.30.30

client_hostname  | 

client_port      | 10580

backend_start    | 2020-02-20 18:45:52.892062+00

backend_xmin     | 

state            | streaming

sent_location    | 1/9FD5D78

write_location   | 1/9FD5D78

flush_location   | 1/9FD5D78

replay_location  | 1/9FD5D78

sync_priority    | 0

sync_state       | async

-[ RECORD 2 ]----+------------------------------

pid              | 7175

usesysid         | 16385

usename          | cmon_replication

application_name | pgsql_80_node_2

client_addr      | 192.168.30.20

client_hostname  | 

client_port      | 60686

backend_start    | 2020-02-20 18:45:52.899446+00

backend_xmin     | 

state            | streaming

sent_location    | 1/9FD5D78

write_location   | 1/9FD5D78

flush_location   | 1/9FD5D78

replay_location  | 1/9FD5D78

sync_priority    | 0

sync_state       | async

This basically tells you what blocks of location in the WAL segments that have been written, flushed, or applied. It provides you a granular overlook of the replication status.

Queries to Use In the Standby Node

In the standby node, there are functions that are supported for which you can mitigate this into a query and provide you the overview of your standby replication's health. To do this, you can run the following query below (query is based on PG version > 10),

postgres=#  select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------

pg_is_in_recovery             | t

pg_is_wal_replay_paused       | f

pg_last_wal_receive_lsn       | 0/2705BDA0

pg_last_wal_replay_lsn        | 0/2705BDA0

pg_last_xact_replay_timestamp | 2020-02-21 02:18:54.603677+00

In older versions, you can use the following query:

postgres=# select pg_is_in_recovery(),pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------

pg_is_in_recovery             | t

pg_last_xlog_receive_location | 1/9FD6490

pg_last_xlog_replay_location  | 1/9FD6490

pg_last_xact_replay_timestamp | 2020-02-21 08:32:40.485958-06

What does the query tell? Functions are defined accordingly here,

  • pg_is_in_recovery(): (boolean) True if recovery is still in progress.
  • pg_last_wal_receive_lsn()/pg_last_xlog_receive_location():  (pg_lsn) The write-ahead log location received and synced to disk by streaming replication. 
  • pg_last_wal_replay_lsn()/pg_last_xlog_replay_location():  (pg_lsn) The last write-ahead log location replayed during recovery. If recovery is still in progress this will increase monotonically.
  • pg_last_xact_replay_timestamp():  (timestamp with time zone) Get timestamp of last transaction replayed during recovery. 

Using some basic math, you can combine these functions. The most common used function that are used by DBA's are,

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()

THEN 0

ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

END AS log_delay;

or in versions PG < 10,

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()

THEN 0

ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

END AS log_delay;

Although this query has been in-practice and is used by DBA's. Still, it doesn't provide you an accurate view of the lag. Why? Let's discuss this in the next section.

Identifying Lag Caused by WAL Segment's Absence

PostgreSQL standby nodes, which are in recovery mode, does not report to you the exact state of what's happening of your replication. Not unless you view the PG log, you can gather information of what's going on. There's no query you can run to determine this. In most cases, organizations and even small institutions come up with 3rd party softwares to let them be alerted when an alarm is raised. 

One of these is ClusterControl, which offers you observability, sends alerts when alarms are raised or recovers your node in case of disaster or catastrophe happens. Let's take this scenario, my primary-standby async streaming replication cluster has failed. How would you know if something's wrong? Let's combine the following:

Step 1: Determine if There's a Lag

postgres=# SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()

postgres-# THEN 0

postgres-# ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

postgres-# END AS log_delay;

-[ RECORD 1 ]

log_delay | 0

Step 2: Determine the WAL Segments Received From the Primary and Compare with Standby Node

## Get the master's current LSN. Run the query below in the master

postgres=# SELECT pg_current_wal_lsn();

-[ RECORD 1 ]------+-----------

pg_current_wal_lsn | 0/925D7E70

For older versions of PG < 10, use pg_current_xlog_location.

## Get the current WAL segments received (flushed or applied/replayed)

postgres=# select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------

pg_is_in_recovery             | t

pg_is_wal_replay_paused       | f

pg_last_wal_receive_lsn       | 0/2705BDA0

pg_last_wal_replay_lsn        | 0/2705BDA0

pg_last_xact_replay_timestamp | 2020-02-21 02:18:54.603677+00

That seems to look bad. 

Step 3: Determine How Bad it Could Be

Now, let's mix the formula from step #1 and in step #2 and get the diff. How to do this, PostgreSQL has a function called pg_wal_lsn_diff which is defined as,

pg_wal_lsn_diff(lsn pg_lsn, lsn pg_lsn) / pg_xlog_location_diff (location pg_lsn, location pg_lsn):  (numeric) Calculate the difference between two write-ahead log locations

Now, let's use it to determine the lag. You can run it in any PG node, since it's we'll just provide the static values:

postgres=# select pg_wal_lsn_diff('0/925D7E70','0/2705BDA0');                                                                                                                                     -[ RECORD 1 ]---+-----------

pg_wal_lsn_diff | 1800913104

Let's estimate how much is 1800913104, that seems to be about 1.6GiB might have been absent in the standby node,

postgres=# select round(1800913104/pow(1024,3.0),2) missing_lsn_GiB;

-[ RECORD 1 ]---+-----

missing_lsn_gib | 1.68

Lastly, you can proceed or even prior to the query look at the logs like using tail -5f to follow and check what's going on. Do this for both primary/standby nodes. In this example, we'll see that it has a problem,

## Primary

root@debnode4:/var/lib/postgresql/11/main# tail -5f log/postgresql-2020-02-21_033512.log

2020-02-21 16:44:33.574 UTC [25023] ERROR:  requested WAL segment 000000030000000000000027 has already been removed

...



## Standby

root@debnode5:/var/lib/postgresql/11/main# tail -5f log/postgresql-2020-02-21_014137.log 

2020-02-21 16:45:23.599 UTC [26976] LOG:  started streaming WAL from primary at 0/27000000 on timeline 3

2020-02-21 16:45:23.599 UTC [26976] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000030000000000000027 has already been removed

...

When encountering this issue, it's better to rebuild your standby nodes. In ClusterControl, it's as easy as one click. Just go to the Nodes/Topology section, and rebuild the node just like below:

Other Things to Check

You can use the same approach in our previous blog (in MySQL), using system tools such as ps, top, iostat, netstat combination. For example, you can also get the current recovered WAL segment from the standby node,

root@debnode5:/var/lib/postgresql/11/main# ps axufwww|egrep "postgre[s].*startup"

postgres  8065 0.0 8.3 715820 170872 ?       Ss 01:41 0:03 \_ postgres: 11/main: startup   recovering 000000030000000000000027

How Can ClusterControl Help?

ClusterControl offers an efficient way on monitoring your database nodes from primary to the slave nodes. When going to the Overview tab, you have already the view of your replication health:

Basically, the two screenshots above displays how's the replication health and what's the current WAL segments. That's not at all. ClusterControl also shows the current activity of what's going on with your Cluster.

Conclusion

Monitoring the replication health in PostgreSQL can end up on a different approach as long as you are able to meet your needs. Using third party tools with observability that can notify you in case of catastrophe is your perfect route, whether an open source or enterprise. The most important thing is, you have your disaster recovery plan and business continuity planned ahead of such trouble.

by Paul Namuag at February 27, 2020 07:22 PM

February 26, 2020

SeveralNines

How to Protect Your MySQL & MariaDB Database Against Cyberattacks When on a Public Network

It is sometimes inevitable to run MySQL database servers on a public or exposed network. This is a common setup in a shared hosting environment, where a server is configured with multiple services and often running within the same server as the database server. For those who have this kind of setup, you should always have some kind of protection against cyberattacks like denial-of-service, hacking, cracking, data breaches; all which can result in data loss. These are things that we always want to avoid for our database server. 

Here are some of the tips that we can do to improve our MySQL or MariaDB security.

Scan Your Database Servers Regularly

Protection against any malicious files in the server is very critical. Scan the server regularly to look for any viruses, spywares, malwares or rootkits especially if the database server is co-located with other services like mail server, HTTP, FTP, DNS, WebDAV, telnet and so on. Commonly, most of the database hacked issues originated from the application tier that is facing the public network. Thus, it's important to scan all files, especially web/application files since they are one of the entry points to get into the server. If those are compromised, the hacker can get into the application directory, and have the ability to read the application files. These might contain sensitive information, for instance, the database login credentials. 

ClamAV is one of the most widely known and widely trusted antivirus solutions for a variety of operating systems, including Linux. It's free and very easy to install and comes with a fairly good detection mechanism to look for unwanted things in your server. Schedule periodic scans in the cron job, for example:

0 3 * * * /bin/freshclam ; /bin/clamscan / --recursive=yes -i > /tmp/clamav.log ; mail -s clamav_log_`hostname` monitor@mydomain.local < /tmp/clamav.log

The above will update the ClamAV virus database, scan all directories and files and send you an email on the status of the execution and report every day at 3 AM.

Use Stricter User Roles and Privileges

When creating a MySQL user, do not allow all hosts to access the MySQL server with wildcard host (%). You should scan your MySQL host and look for any wildcard host value, as shown in the following statement:

mysql> SELECT user,host FROM mysql.user WHERE host = '%';
+---------+------+
| user    | host |
+---------+------+
| myadmin | %    |
| sbtest  | %    |
| user1   | %    |
+---------+------+

From the above output, strict or remove all users that have only '%' value under Host column. Users that need to access the MySQL server remotely can be enforced to use SSH tunnelling method, which does not require remote host configuration for MySQL users. Most of the MySQL administration clients such as MySQL Workbench and HeidiSQL can be configured to connect to a MySQL server via SSH tunelling, therefore it's possible to completely eliminate remote connection for MySQL users.

Also, limit the SUPER privilege to only users from localhost, or connecting via UNIX socket file. Be more cautious when assigning FILE privilege to non-root users since it permits read and write files on the server using the LOAD DATA INFILE and SELECT ... INTO OUTFILE statements. Any user to whom this privilege is granted can also read or write any file that the MySQL server can read or write.

Change the Database Default Settings

By moving away from the default setup, naming and configurations, we can reduce the attack vector to a number of folds. The following actions are some examples on default configurations that DBAs could easily change but commonly overlooked related to MySQL:

  • Change default MySQL port to other than 3306.
  • Rename the MySQL root username to other than "root".
  • Enforce password expiration and reduce the password lifetime for all users.
  • If MySQL is co-located with the application servers, enforce connection through UNIX socket file only, and stop listening on port 3306 for all IP addresses.
  • Enforce client-server encryption and server-server replication encryption.

We actually have covered this in detail in this blog post, How to Secure MySQL/MariaDB Servers.

Setup a Delayed Slave

A delayed slave is just a typical slave, however the slave server intentionally executes transactions later than the master by at least a specified amount of time, available from MySQL 5.6. Basically, an event received from the master is not executed until at least N seconds later than its execution on the master. The result is that the slave will reflect the state of the master some time back in the past.

A delayed slave can be used to recover data, which would be helpful when the problem is found immediately, within the period of delay. Suppose we configured a slave with a 6-hour delay from the master. If our database were modified or deleted (accidentally by a developer or deliberately by a hacker) within this time range, there is a possibility for us to revert to the moment right before it happened by stopping the current master, then bringing the slave server up until certain point with the following command:

# on delayed slave
mysql> STOP SLAVE;
mysql> START SLAVE UNTIL MASTER_LOG_FILE='xxxxx', MASTER_LOG_POS=yyyyyy;

Where 'xxxxx' is the binary log file and 'yyyyy' is the position right before the disaster happens (use mysqlbinlog tool to examine those events). Finally, promote the slave to become the new master and your MySQL service is now back operational as usual. This method is probably the fastest way to recover your MySQL database in production environment without having to reload a backup. Having a number of delayed slaves with different length durations, as shown in this blog, Multiple Delayed Replication Slaves for Disaster Recovery with Low RTO on how to set up a cost-effective delayed replication servers on top of Docker containers.

Enable Binary Logging

Binary logging is generally recommended to be enabled even though you are running on a standalone MySQL/MariaDB server. The binary log contains information about SQL statements that modify database contents. The information is stored in the form of "events" that describe the modifications. Despite performance impact, having binary log allows you to have the possibility to replay your database server to the exact point where you want it to be restored, also known as point-in-time recovery (PITR). Binary logging is also mandatory for replication. 

With binary logging enabled, one has to include the binary log file and position information when taking up a full backup. For mysqldump, using the --master-data flag with value 1 or 2 will print out the necessary information that we can use as a starting point to roll forward the database when replaying the binary logs later on. 

With binary logging enabled, you can use another cool recovery feature called flashback, which is described in the next section.

Enable Flashback

The flashback feature is available in MariaDB, where you can restore back the data to the previous snapshot in a MySQL database or in a table. Flashback uses the mysqlbinlog to create the rollback statements and it needs a FULL binary log row image for that. Thus, to use this feature, the MySQL/MariaDB server must be configured with the following:

[mysqld]
...
binlog_format = ROW
binlog_row_image = FULL

The following architecture diagram illustrates how flashback is configured on one of the slave:

To perform the flashback operation, firstly you have to determine the date and time when you want to "see" the data, or binary log file and position. Then, use the --flashback flag with mysqlbinlog utility to generate SQL statements to rollback the data to that point. In the generated SQL file, you will notice that the DELETE events are converted to INSERTs and vice versa, and also it swaps WHERE and SET parts of the UPDATE events. 

The following command line should be executed on the slave2 (configured with binlog_row_image=FULL):

$ mysqlbinlog --flashback --start-datetime="2020-02-17 01:30:00"  /var/lib/mysql/mysql-bin.000028 -v --database=shop --table=products > flashback_to_2020-02-17_013000.sql

Then, detach slave2 from the replication chain because we are going to break it and use the server to rollback our data:

mysql> STOP SLAVE;
mysql> RESET MASTER;
mysql> RESET SLAVE ALL;

Finally, import the generated SQL file into the MariaDB server for database shop on slave2:

$ mysql -u root -p shop < flashback_to_2020-02-17_013000.sql

When the above is applied, the table "products" will be at the state of 2020-02-17 01:30:00. Technically, the generated SQL file can be applied to both MariaDB and MySQL servers. You could also transfer the mysqlbinlog binary from MariaDB server so you can use the flashback feature on a MySQL server. However, MySQL GTID implementation is different than MariaDB thus restoring the SQL file requires you to disable MySQL GTID.

A couple of advantages using flashback is you do not need to stop the MySQL/MariaDB server to carry out this operation. When the amount of data to revert is small, the flashback process is much faster than recovering the data from a full backup. 

Log All Database Queries

General log basically captures every SQL statement being executed by the client in the MySQL server. However, this might not be a popular decision on a busy production server due to the performance impact and space consumption. If performance matters, binary log has the higher priority to be enabled. General log can be enabled during runtime by running the following commands:

mysql> SET global general_log_file='/tmp/mysql.log'; 
mysql> SET global log_output = 'file';
mysql> SET global general_log = ON;

You can also set the general log output to a table:

mysql> SET global log_output = 'table';

You can then use the standard SELECT statement against the mysql.general_log table to retrieve queries. Do expect a bit more performance impact when running with this configuration as shown in this blog post.

Otherwise, you can use external monitoring tools that can perform query sampling and monitoring so you can filter and audit the queries that come into the server. ClusterControl can be used to collect and summaries all your queries, as shown in the following screenshots where we filter all queries that contain DELETE string:

Similar information is also available under ProxySQL's top queries page (if your application is connecting via ProxySQL):

This can be used to track recent changes that have happened to the database server and can also be used for auditing purposes. 

Conclusion

Your MySQL and MariaDB servers must be well-protected at all times since it usually contains sensitive data that attackers are looking after. You may also use ClusterControl to manage the security aspects of your database servers, as showcased by this blog post, How to Secure Your Open Source Databases with ClusterControl.

by ashraf at February 26, 2020 07:43 PM

February 25, 2020

SeveralNines

How to Identify PostgreSQL Performance Issues with Slow Queries

When working with OLTP (OnLine Transaction Processing) databases, query performance is paramount as it directly impacts the user experience. Slow queries mean that the application feels unresponsive and slow and this results in bad conversion rates, unhappy users, and all sets of problems. 

OLTP is one of the common use cases for PostgreSQL therefore you want your queries to run as smooth as possible. In this blog we’d like to talk about how you can identify problems with slow queries in PostgreSQL.

Understanding the Slow Log

Generally speaking, the most typical way of identifying performance problems with PostgreSQL is to collect slow queries. There are a couple of ways you can do it. First, you can enable it on a single database:

pgbench=# ALTER DATABASE pgbench SET log_min_duration_statement=0;

ALTER DATABASE

After this all new connections to ‘pgbench’ database will be logged into PostgreSQL log.

It is also possible to enable this globally by adding:

log_min_duration_statement = 0

to PostgreSQL configuration and then reload config:

pgbench=# SELECT pg_reload_conf();

 pg_reload_conf

----------------

 t

(1 row)

This enables logging of all queries across all of the databases in your PostgreSQL. If you do not see any logs, you may want to enable logging_collector = on as well. The logs will include all of the traffic coming to PostgreSQL system tables, making it more noisy. For our purposes let’s stick to the database level logging.

What you’ll see in the log are entries as below:

2020-02-21 09:45:39.022 UTC [13542] LOG:  duration: 0.145 ms statement: SELECT abalance FROM pgbench_accounts WHERE aid = 29817899;

2020-02-21 09:45:39.022 UTC [13544] LOG:  duration: 0.107 ms statement: SELECT abalance FROM pgbench_accounts WHERE aid = 11782597;

2020-02-21 09:45:39.022 UTC [13529] LOG:  duration: 0.065 ms statement: SELECT abalance FROM pgbench_accounts WHERE aid = 16318529;

2020-02-21 09:45:39.022 UTC [13529] LOG:  duration: 0.082 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + 3063 WHERE tid = 3244;

2020-02-21 09:45:39.022 UTC [13526] LOG:  duration: 16.450 ms statement: UPDATE pgbench_branches SET bbalance = bbalance + 1359 WHERE bid = 195;

2020-02-21 09:45:39.023 UTC [13523] LOG:  duration: 15.824 ms statement: UPDATE pgbench_accounts SET abalance = abalance + -3726 WHERE aid = 5290358;

2020-02-21 09:45:39.023 UTC [13542] LOG:  duration: 0.107 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + -2716 WHERE tid = 1794;

2020-02-21 09:45:39.024 UTC [13544] LOG:  duration: 0.112 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + -3814 WHERE tid = 278;

2020-02-21 09:45:39.024 UTC [13526] LOG:  duration: 0.060 ms statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (4876, 195, 39955137, 1359, CURRENT_TIMESTAMP);

2020-02-21 09:45:39.024 UTC [13529] LOG:  duration: 0.081 ms statement: UPDATE pgbench_branches SET bbalance = bbalance + 3063 WHERE bid = 369;

2020-02-21 09:45:39.024 UTC [13523] LOG:  duration: 0.063 ms statement: SELECT abalance FROM pgbench_accounts WHERE aid = 5290358;

2020-02-21 09:45:39.024 UTC [13542] LOG:  duration: 0.100 ms statement: UPDATE pgbench_branches SET bbalance = bbalance + -2716 WHERE bid = 210;

2020-02-21 09:45:39.026 UTC [13523] LOG:  duration: 0.092 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + -3726 WHERE tid = 67;

2020-02-21 09:45:39.026 UTC [13529] LOG:  duration: 0.090 ms statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (3244, 369, 16318529, 3063, CURRENT_TIMESTAMP);

You can see information about the query and its duration. Not much else but it’s definitely a good place to start. The main thing to keep in mind is that not every slow query is a problem. Sometimes queries have to access a significant amount of data and it is expected for them to take longer to access and analyze all of the information user asked for. Another question is what “slow” means? This mostly depends on the application. If we are talking about interactive applications, most likely anything slower than a second is noticeable. Ideally everything is executed within 100 - 200 milliseconds limit.

Developing a Query Execution Plan

Once we determine that given query is indeed something we want to improve, we should take a look at the query execution plan. First of all, it may happen that there’s nothing we can do about it and we’ll have to accept that given query is just slow. Second, query execution plans may change. Optimizers always try to pick the most optimal execution plan but they make their decisions based on just a sample of data therefore it may happen that the query execution plan changes in time. In PostgreSQL you can check the execution plan in two ways. First, the estimated execution plan, using EXPLAIN:

pgbench=# EXPLAIN SELECT abalance FROM pgbench_accounts WHERE aid = 5290358;

                                          QUERY PLAN

----------------------------------------------------------------------------------------------

 Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.56..8.58 rows=1 width=4)

   Index Cond: (aid = 5290358)

As you can see, we are expected to access data using primary key lookup. If we want to double-check how exactly the query will be executed, we can use EXPLAIN ANALYZE:

pgbench=# EXPLAIN ANALYZE SELECT abalance FROM pgbench_accounts WHERE aid = 5290358;

                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------

 Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.56..8.58 rows=1 width=4) (actual time=0.046..0.065 rows=1 loops=1)

   Index Cond: (aid = 5290358)

 Planning time: 0.053 ms

 Execution time: 0.084 ms

(4 rows)

Now, PostgreSQL has executed this query and it can tell us not just the estimates but exact numbers when it comes to the execution plan, number of rows accessed and so on. Please keep in mind that logging all of the queries may become a serious overhead on your system. You should also keep an eye on the logs and ensure they are properly rotated.

Pg_stat_statements

Pg_stat_statements is the extension that collects execution statistics for different query types.

pgbench=# select query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows from public.pg_stat_statements order by calls desc LIMIT 10;

                                                query                                                 | calls | total_time | min_time | max_time |     mean_time | stddev_time | rows

------------------------------------------------------------------------------------------------------+-------+------------------+----------+------------+---------------------+---------------------+-------

 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2                                  | 30437 | 6636.83641200002 | 0.006533 | 83.832148 | 0.218051595492329 | 1.84977058799388 | 30437

 BEGIN                                                                                                | 30437 | 231.095600000001 | 0.000205 | 20.260355 | 0.00759258796859083 | 0.26671126085716 | 0

 END                                                                                                  | 30437 | 229.483213999999 | 0.000211 | 16.980678 | 0.0075396134310215 | 0.223837608828596 | 0

 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2                                  | 30437 | 290021.784321001 | 0.019568 | 805.171845 | 9.52859297305914 | 13.6632712046825 | 30437

 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2                                   | 30437 | 6667.27243200002 | 0.00732 | 212.479269 | 0.219051563294674 | 2.13585110968012 | 30437

 SELECT abalance FROM pgbench_accounts WHERE aid = $1                                                 | 30437 | 3702.19730600006 | 0.00627 | 38.860846 | 0.121634763807208 | 1.07735927551245 | 30437

 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP) | 30437 | 2349.22475800002 | 0.003218 |  61.372127 | 0.0771831901304325 | 0.971590327400244 | 30437

 SELECT $1                                                                                            | 6847 | 60.785467 | 0.002321 | 7.882384 | 0.00887767883744706 | 0.105198744982906 | 6847

 insert into pgbench_tellers(tid,bid,tbalance) values ($1,$2,$3)                                      | 5000 | 18.592042 | 0.001572 | 0.741427 | 0.0037184084 | 0.0137660355678027 | 5000

 insert into pgbench_tellers(tid,bid,tbalance) values ($1,$2,$3)                                      | 3000 | 7.323788 | 0.001598 | 0.40152 | 0.00244126266666667 | 0.00834442591085048 | 3000

(10 rows)

As you can see on the data above, we have a list of different queries and information about their execution times - this is just a part of the data you can see in the pg_stat_statements but it is enough for us to understand that our primary key lookup takes sometimes almost 39 seconds to complete - this does not look good and it is definitely something we want to investigate.

If you do not have pg_stat_statements enabled, you can do it in a standard way. Either via configuration file and

shared_preload_libraries = 'pg_stat_statements'

Or you can enable it via PostgreSQL command line:

pgbench=# CREATE EXTENSION pg_stat_statements;

CREATE EXTENSION

Using ClusterControl to Eliminate Slow Queries

If you happen to use ClusterControl to manage your PostgreSQL database, you can use it to collect data about slow queries.

As you can see, it collects data about query execution - rows sent and examined, execution time statistics and so on. With it you can easily pinpoint the most expensive queries, and see what the average and maximum execution times looks like. By default ClusterControl collects queries that took longer than 0.5 second to complete, you can change this in the settings:

Conclusion

This short blog by no means covers all of the aspects and tools helpful in identifying and solving query performance problems in PostgreSQL. We hope it is a good start and that it will help you to understand what you can do to pinpoint the root cause of the slow queries.

by krzysztof at February 25, 2020 07:17 PM

February 24, 2020

SeveralNines

My PostgreSQL Database is Out of Disk Space

Disk space is a demanding resource nowadays. You usually will want to store data as long as possible, but this could be a problem if you don’t take the necessary actions to prevent a potential “out of disk space” issue. 

In this blog, we will see how we can detect this issue for PostgreSQL, prevent it, and if it is too late, some options that probably will help you to fix it.

How to Identify PostgreSQL Disk Space Issues

If you, unfortunately, are in this out of disk space situation, you will able to see some errors in the PostgreSQL database logs:

2020-02-20 19:18:18.131 UTC [4400] LOG:  could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device

or even in your system log:

Feb 20 19:29:26 blog-pg1 rsyslogd: imjournal: fclose() failed for path: '/var/lib/rsyslog/imjournal.state.tmp': No space left on device [v8.24.0-41.el7_7.2 try http://www.rsyslog.com/e/2027 ]

PostgreSQL can continue works for awhile running read-only queries, but eventually, it will fail trying to write to disk, then you will see something like this in your client session:

WARNING:  terminating connection because of crash of another server process

DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

HINT:  In a moment you should be able to reconnect to the database and repeat your command.

server closed the connection unexpectedly

This probably means the server terminated abnormally

before or while processing the request.

The connection to the server was lost. Attempting reset: Failed.

Then, if you take a look at the disk space, you will have this unwanted output…

$ df -h

Filesystem                        Size Used Avail Use% Mounted on

/dev/mapper/pve-vm--125--disk--0   30G 30G 0 100% /

How to Prevent PostgreSQL Disk Space Issues

The main way to prevent this kind of issue is by monitoring the disk space usage, and database or disk usage growth. For this, a graph should be a friendly way to monitor the disk space increment:

PostgreSQL Disk Space - ClusterControl

And the same for the database growth:

PostgreSQL Database Growth - ClusterControl

Another important thing to monitor is the replication status. If you have a replica and, for some reason, this stops working, depending on the configuration, it could be possible that PostgreSQL store all the WAL files to restore the replica when it comes back.

PostgreSQL Topology

All this monitoring system doesn’t make sense without an alerting system to know when you need to take actions:

How to Fix PostgreSQL Disk Space Issues

Well, if you are facing this out of disk space issue even with the monitoring and alerting system implemented (or not), there are many options to try to fix this issue without data lost (or the less as possible).

What is Consuming Your Disk Space?

The first step should be determining where my disk space is. A best practice is having separate partitions, at least one separate partition for your database storage, so you can easily confirm if your database or your system is using excessive disk space. Another advantage of this is to minimize the damage. If your root partition is full, your database can still write in his own partition without issues.

Database Space Usage

Let’s see now some useful commands to check your database disk space usage.

A basic way to check the database space usage is checking the data directory in the filesystem:

$ du -sh /var/lib/pgsql/11/data/

819M /var/lib/pgsql/11/data/

Or if you have a separate partition for your data directory, you can use df -h directly.

The PostgreSQL command “\l+” list the databases adding the size information:

$ postgres=# \l+

                                                               List of databases

   Name    | Owner   | Encoding | Collate | Ctype |   Access privileges | Size | Tablespace

|                Description

-----------+----------+-----------+---------+-------+-----------------------+---------+------------

+--------------------------------------------

 postgres  | postgres | SQL_ASCII | C       | C | | 7965 kB | pg_default

| default administrative connection database

 template0 | postgres | SQL_ASCII | C       | C | =c/postgres +| 7817 kB | pg_default

| unmodifiable empty database

           |          | |         | | postgres=CTc/postgres |         |

|

 template1 | postgres | SQL_ASCII | C       | C | =c/postgres +| 7817 kB | pg_default

| default template for new databases

           |          | |         | | postgres=CTc/postgres |         |

|

 world     | postgres | SQL_ASCII | C       | C | | 8629 kB | pg_default

|

(4 rows)

Using pg_database_size and the database name you can see the database size:

postgres=# SELECT pg_database_size('world');

 pg_database_size

------------------

          8835743

(1 row)

And using the pg_size_pretty to see this value in a human-readable way could be even better:

postgres=# SELECT pg_size_pretty(pg_database_size('world'));

 pg_size_pretty

----------------

 8629 kB

(1 row)

When you know where space is, you can take the corresponding action to fix it. Keep in mind that just deleting rows is not enough to recover the disk space, you will need to run a VACUUM or VACUUM FULL to finish the task. 

Log Files

The easiest way to recover disk space is by deleting log files. You can check the PostgreSQL log directory or even the system logs to verify if you can gain some space from there. If you have something like this:

$ du -sh /var/lib/pgsql/11/data/log/

18G /var/lib/pgsql/11/data/log/

You should check the directory content to see if there is a log rotation/retention problem or something is happening in your database and writing it to the logs.

$ ls -lah /var/lib/pgsql/11/data/log/

total 18G

drwx------  2 postgres postgres 4.0K Feb 21 00:00 .

drwx------ 21 postgres postgres 4.0K Feb 21 00:00 ..

-rw-------  1 postgres postgres  18G Feb 21 14:46 postgresql-Fri.log

-rw-------  1 postgres postgres 9.3K Feb 20 22:52 postgresql-Thu.log

-rw-------  1 postgres postgres 3.3K Feb 19 22:36 postgresql-Wed.log

Before deleting the logs, if you have a huge one, a good practice is to keep the last 100 lines or so, and then delete it. So, you can check what is happening after generating free space.

$ tail -100 postgresql-Fri.log > /tmp/log_temp.log

And then:

$ cat /dev/null > /var/lib/pgsql/11/data/log/postgresql-Fri.log

If you just delete it with “rm” and the log file is being used by the PostgreSQL server (or another service) space won’t be released, so you should truncate this file using this cat /dev/null command instead.

This action is only for PostgreSQL and system log files. Don’t delete the pg_wal content or another PostgreSQL file as it could generate critical damage to your database.

Bloat

In a normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from the table; they are present until a VACUUM is performed. So, it is necessary to do the VACUUM periodically (AUTOVACUUM), especially in frequently-updated tables.

The problem here is space is not returned to the operating system using just VACUUM, it is only available for use in the same table.

VACUUM FULL rewrites the table into a new disk file, returning the unused space to the operating system. Unfortunately, it requires an exclusive lock on each table while it is running.

You should check the tables to see if a VACUUM (FULL) process is required.

Replication Slots

If you are using replication slots, and it is not active for some reason:

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;

 slot_name | slot_type | active

-----------+-----------+--------

 slot1     | physical  | f

(1 row)

It could be a problem for your disk space because it will store the WAL files until they have been received by all the standby nodes.

The way to fix it is recovering the replica (if possible), or deleting the slot:

postgres=# SELECT pg_drop_replication_slot('slot1');

 pg_drop_replication_slot

--------------------------

(1 row)

So, the space used by the WAL files will be released.

Conclusion

As we mentioned, monitoring and alerting systems are the keys to avoiding these kinds of issues. In this way, ClusterControl can help you to have your systems up and running, sending you alarms when needed or even taking recovery action to keep your database cluster working. You can also deploy/import different database technologies and scaling them out if needed.

by Sebastian Insausti at February 24, 2020 09:47 PM

February 22, 2020

Valeriy Kravchuk

Fun with Bugs #94 - On MySQL Bug Reports I am Subscribed to, Part XXVIII

I may get a chance to speak about proper bugs processing for open source projects later this year, so I have to keep reviewing recent MySQL bugs to be ready for that. In my previous post in this series I listed some interesting MySQL bug reports created in December, 2019. Time to move on to January, 2020! Belated Happy New Year of cool MySQL Bugs!

As usual I mostly care about InnoDB, replication and optimizer bugs and explicitly mention bug reporter by name and give link to his other active reports (if any). I also pick up examples of proper (or improper) reporter and Oracle engineers attitudes. Here is the list:
  • Bug #98103 - "unexpected behavior while logging an aborted query in the slow query log".  Query that was killed while waiting for the table metadata lock is not only get logged, but also lock wait time is saved as query execution time. I'd like to highlight how bug reporter, Pranay Motupalli, used gdb to study what really happens in the code in this case. Perfect bug report!
  • Bug #98113 - "Crash possible when load & unload a connection handler". The (quite obvious) bug was verified based on code review, but only after some efforts were spent by Oracle engineer on denial to accept the problem and its importance. This bug was reported by Fangxin Flou.
  • Bug #98132 - "Analyze table leads to empty statistics during online rebuild DDL ". Nice addition to my collections! This bug with a nice and clear test case was reported by Albert Hu, who also suggested a fix.
  • Bug #98139 - "Committing a XA transaction causes a wrong sequence of events in binlog". This bug reported by Dehao Wang was verified as a "documentation" one, but I doubt documenting current behavior properly is an acceptable fix. Bug reporter suggested to commit in the binary log first, for example. Current implementation that allows users to commit/rollback a XA transaction by using another connection if the former connection is closed or killed, is risky. A lot of arguing happened in comments in the process, and my comment asking for a clear quote from the manual:
    Would you be so kind to share some text from this page you mentioned:

    https://dev.mysql.com/doc/refman/8.0/en/xa.html

    or any other fine MySQL 8 manual page stating that XA COMMIT is NOT supported when executed from session/connection/thread other than those prepared the XA transaction? I am doing something wrong probably, but I can not find such text anywhere.
    was hidden. Let's see what happens to this bug report next.
  • Bug #98211 - "Auto increment value didn't reset correctly.". Not sure what this bug reported by Zhao Jianwei has to do with "Data Types", IMHO it's more about DDL or data dictionary. Again, some sarcastic comments from Community users were needed to put work on this bug back on track...
  • Bug #98220 - "with log_slow_extra=on Errno: info not getting updated correctly for error". This bug was reported by lalit Choudhary from Percona.
  • Bug #98227 - "innodb_stats_method='nulls_ignored' and persistent stats get wrong cardinalities". I think category is wrong for this bug. It's a but in InnoDB's persistent statistics implementation, one of many. The bug was reported by Agustín G from Percona.
  • Bug #98231 - "show index from a partition table gets a wrong cardinality value". Yet another by report by Albert Hu. that ended up as a "documentation" bug for now, even though older MySQL versions provided better cardinality estimations than MySQL 8.0 in this case (so this is a regression of a kind). I hope the bug will be re-classified and properly processed later.
  • Bug #98238 - "I_S.KEY_COLUMN_USAGE is very slow". I am surprised to see such a bug in MySQL 8. According to the bug reporter, Manuel Mausz, this is also a kind of regression comparing to older MySQL version, where these queries used to run faster. Surely, no "regression" tag in this case was added.
  • Bug #98284 - "Low sysbench score in the case of a large number of connections". This notable performance regression of MySQL 8 vs 5.7 was reported by zanye zjy. perf profiling pointed out towards ppoll() where a lot of time is spent. There is a fix suggested by Fangxin Flou (to use poll() instead), but the bug is still "Open".
  • Bug #98287 - "Explanation of hash joins is inconsistent across EXPLAIN formats". This bug was reported by Saverio M and ended up marked as a duplicate of Bug #97299 fixed in upcoming 8.0.20. Use EXPLAIN FORMAT=TREE in the meantime to see proper information about hash joins usage in the plan.
  • Bug #98288 - "xa commit crash lead mysql replication error". This bug report from Phoenix Zhang (who also suggested a patch) was declared a duplicate of Bug #76233 - "XA prepare is logged ahead of engine prepare" (that I've already discussed among other XA transactions bugs here).
  • Bug #98324 - "Deadlocks more frequent since version 5.7.26". Nice regression bug report by Przemyslaw Malkowski from Percona, with additional test provided later by Stephen Wei . Interestingly enough, test results shared by Umesh Shastry show that MySQL 8.0.19 is affected in the same way as 5.7.26+, but 8.0.19 is NOT listed as one of versions affected. This is a mistake to fix, along with missing regression tag.
  • Bug #98427 - "InnoDB FullText AUX Tables are broken in 8.0". Yet another regression in MySQL 8 was found by Satya Bodapati. Change in default collation for utf8mb4 character set caused this it seems. InnoDB FULLTEXT search was far from perfect anyway...
The are clouds in the sky of MySQL bugs processing.
To summarize:
  1.  Still too much time and efforts are sometimes spent on arguing with bug reporter instead of accepting and processing bugs properly. This is unfortunate.
  2. Sometimes bugs are wrongly classified when verified (documentation vs code bug, wrong category, wrong severity, not all affected versions are listed, ignoring regression etc). This is also unfortunate.
  3. Percona engineers still help to make MySQL better.
  4. There are some fixes in upcoming MySQL 8.0.20 that I am waiting for :)
  5. XA transactions in MySQL are badly broken (they are not atomic in storage engine + binary log) and hardly safe to use in reality.

by Valerii Kravchuk (noreply@blogger.com) at February 22, 2020 08:21 PM

February 21, 2020

SeveralNines

What to Check if MySQL Memory Utilisation is High

One of the key factors of a performant MySQL database server is having good memory allocation and utilization, especially when running it in a production environment. But how can you determine if the MySQL utilization is optimized? Is it reasonable to have high memory utilization or does it require fine tuning? What if I come up against a memory leak?

Let's cover these topics and show the things you can check in MySQL to determine traces of high memory utilization.

Memory Allocation in MySQL

Before we delve into the specific subject title, I'll just give a short information about how MySQL uses memory. Memory plays a significant resource for speed and efficiency when handling concurrent transactions and running big queries. Each thread in MySQL demands memory which is used to manage client connections, and these threads share the same base memory. Variables like thread_stack (stack for threads), net_buffer_length (for connection buffer and result buffer), or with max_allowed_packet where connection and result will dynamically enlarge up to this value when needed, are variables that do affect memory utilization. When a thread is no longer needed, the memory allocated to it is released and returned to the system unless the thread goes back into the thread cache. In that case, the memory remains allocated. Query joins, query caches, sorting, table cache, table definitions do require memory in MySQL but these are attributed with system variables that you can configure and set.

In most cases, the memory-specific variables set for a configuration are targeted on a storage-based specific configuration such as MyISAM or InnoDB. When a mysqld instance spawns within the host system, MySQL allocates buffers and caches to improve performance of database operations based on the set values set on a specific configuration. For example, the most common variables every DBA will set in InnoDB are variables innodb_buffer_pool_size and innodb_buffer_pool_instances which are both related to buffer pool memory allocation that holds cached data for InnoDB tables. It's desirable if you have large memory and are expecting to handle big transactions by setting innodb_buffer_pool_instances to improve concurrency by dividing the buffer pool into multiple buffer pool instances. 

While for MyISAM, you have to deal with key_buffer_size to handle the amount of memory that the key buffer will handle. MyISAM also allocates buffer for every concurrent threads which contains a table structure, column structures for each column, and a buffer of size 3 * N are allocated (where N is the maximum row length, not counting BLOB columns).  MyISAM also maintains one extra row buffer for internal use.

MySQL also allocates memory for temporary tables unless it becomes too large (determined by tmp_table_size and max_heap_table_size). If you are using MEMORY tables and variable max_heap_table_size is set very high, this can also take a large memory since max_heap_table_size system variable determines how large a table can grow, and there is no conversion to on-disk format.

MySQL also has a Performance Schema which is a feature for monitoring MySQL activities at a low level. Once this is enabled, it dynamically allocates memory incrementally, scaling its memory use to actual server load, instead of allocating required memory during server startup. Once memory is allocated, it is not freed until the server is restarted. 

MySQL can also be configured to allocate large areas of memory for its buffer pool if using Linux and if kernel is enabled for large page support, i.e. using HugePages

What To Check Once MySQL Memory is High

Check Running Queries

It's very common for MySQL DBAs to touch base first what's going on with the running MySQL server. The most basic procedures are check processlist, check server status, and check the storage engine status. To do these things, basically, you have just to run the series of queries by logging in to MySQL. See below:

To view the running queries,

mysql> SHOW [FULL] PROCESSLIST;

Viewing the current processlist reveals queries that are running actively or even idle or sleeping processes. It is very important and is a significant routine to have a record of queries that are running. As noted on how MySQL allocates memory, running queries will utilize memory allocation and can drastically cause performance issues if not monitored.

View the MySQL server status variables,

mysql> SHOW SERVER STATUS\G

or filter specific variables like

mysql> SHOW SERVER STATUS WHERE variable_name IN ('<var1>', 'var2'...);

MySQL's status variables serve as your statistical information to grab metric data to determine how your MySQL performs by observing the counters given by the status values. There are certain values here which gives you a glance that impacts memory utilization. For example, checking the number of threads, the number of table caches, or the buffer pool usage,

...

| Created_tmp_disk_tables                 | 24240 |

| Created_tmp_tables                      | 334999 |

…

| Innodb_buffer_pool_pages_data           | 754         |

| Innodb_buffer_pool_bytes_data           | 12353536         |

...

| Innodb_buffer_pool_pages_dirty          | 6         |

| Innodb_buffer_pool_bytes_dirty          | 98304         |

| Innodb_buffer_pool_pages_flushed        | 30383         |

| Innodb_buffer_pool_pages_free           | 130289         |

…

| Open_table_definitions                  | 540 |

| Open_tables                             | 1024 |

| Opened_table_definitions                | 540 |

| Opened_tables                           | 700887 |

...

| Threads_connected                             | 5 |

...

| Threads_cached    | 2 |

| Threads_connected | 5     |

| Threads_created   | 7 |

| Threads_running   | 1 |

View the engine's monitor status, for example, InnoDB status

mysql> SHOW ENGINE INNODB STATUS\G

The InnoDB status also reveals the current status of transactions that the storage engine is processing. It gives you the heap size of a transaction, adaptive hash indexes revealing its buffer usage, or shows you the innodb buffer pool information just like the example below:

---TRANSACTION 10798819, ACTIVE 0 sec inserting, thread declared inside InnoDB 1201

mysql tables in use 1, locked 1

1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 8801

MySQL thread id 68481, OS thread handle 139953970235136, query id 681821 localhost root copy to tmp table

ALTER TABLE NewAddressCode2_2 ENGINE=INNODB



…

-------------------------------------

INSERT BUFFER AND ADAPTIVE HASH INDEX

-------------------------------------

Ibuf: size 528, free list len 43894, seg size 44423, 1773 merges

merged operations:

 insert 63140, delete mark 0, delete 0

discarded operations:

 insert 0, delete mark 0, delete 0

Hash table size 553193, node heap has 1 buffer(s)

Hash table size 553193, node heap has 637 buffer(s)

Hash table size 553193, node heap has 772 buffer(s)

Hash table size 553193, node heap has 1239 buffer(s)

Hash table size 553193, node heap has 2 buffer(s)

Hash table size 553193, node heap has 0 buffer(s)

Hash table size 553193, node heap has 1 buffer(s)

Hash table size 553193, node heap has 1 buffer(s)

115320.41 hash searches/s, 10292.51 non-hash searches/s

...

----------------------

BUFFER POOL AND MEMORY

----------------------

Total large memory allocated 2235564032

Dictionary memory allocated 3227698

Internal hash tables (constant factor + variable factor)

    Adaptive hash index 78904768        (35404352 + 43500416)

    Page hash           277384 (buffer pool 0 only)

    Dictionary cache    12078786 (8851088 + 3227698)

    File system         1091824 (812272 + 279552)

    Lock system         5322504 (5313416 + 9088)

    Recovery system     0 (0 + 0)

Buffer pool size   131056

Buffer pool size, bytes 2147221504

Free buffers       8303

Database pages     120100

Old database pages 44172

Modified db pages  108784

Pending reads      0

Pending writes: LRU 2, flush list 342, single page 0

Pages made young 533709, not young 181962

3823.06 youngs/s, 1706.01 non-youngs/s

Pages read 4104, created 236572, written 441223

38.09 reads/s, 339.46 creates/s, 1805.87 writes/s

Buffer pool hit rate 1000 / 1000, young-making rate 12 / 1000 not 5 / 1000

Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s

LRU len: 120100, unzip_LRU len: 0

I/O sum[754560]:cur[8096], unzip sum[0]:cur[0]

…

Another thing to add, you can also use Performance Schema and sys schema for monitoring memory consumption and utilization by your MySQL server. By default, most instrumentations are disabled by default so there are manual things to do to use this. 

Check for Swappiness 

Either way, it's probable that MySQL is swapping out its memory to disk. This is oftentimes a very common situation especially when MySQL server and the underlying hardware is not set optimally in parallel to the expected requirements. There are certain cases that the demand of traffic has not been anticipated, memory could grow increasingly especially if bad queries are run causing to consume or utilize a lot of memory space causing degrading performance as data are picked on disk instead of on the buffer. To check for swappiness, just run freemem command or vmstat just like below,

[root@node1 ~]# free -m

              total        used free      shared buff/cache available

Mem:           3790 2754         121 202 915         584

Swap:          1535 39        1496

[root@node1 ~]# vmstat 5 5

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----

 r  b swpd   free buff  cache si so    bi bo in cs us sy id wa st

 2  0 40232 124100      0 937072 2 3 194  1029 477 313 7 2 91 1  0

 0  0 40232 123912      0 937228 0 0   0 49 1247 704 13 3 84  0 0

 1  0 40232 124184      0 937212 0 0   0 35 751 478 6 1 93  0 0

 0  0 40232 123688      0 937228 0 0   0 15 736 487 5 1 94  0 0

 0  0 40232 123912      0 937220 0 0   3 74 1065 729 8 2 89  0 0

You may also check using procfs and gather information such as going to /proc/vmstat or /proc/meminfo.

Using Perf, gdb, and Valgrind with Massif

Using tools like perf, gdb, and valgrind helps you dig into a more advanced method of determining MySQL memory utilization. There are times that an interesting outcome becomes a mystery of solving memory consumption that leads to your bewilderment in MySQL. This turns in the need to have more skepticism and using these tools helps you investigate how MySQL is using handling memory from allocating it to utilizing it for processing transactions or processes. This is useful for example if you are observing MySQL is behaving abnormally that might cause bad configuration or could lead to a findings of memory leaks.

For example, using perf in MySQL reveals more information in a system level report:

[root@testnode5 ~]# perf report --input perf.data --stdio

# To display the perf.data header info, please use --header/--header-only options.

#

#

# Total Lost Samples: 0

#

# Samples: 54K of event 'cpu-clock'

# Event count (approx.): 13702000000

#

# Overhead  Command Shared Object        Symbol                                                                                                                                                                                             

# ........  ....... ...................  ...................................................................................................................................................................................................

#

    60.66%  mysqld [kernel.kallsyms]    [k] _raw_spin_unlock_irqrestore

     2.79%  mysqld   libc-2.17.so         [.] __memcpy_ssse3

     2.54%  mysqld   mysqld             [.] ha_key_cmp

     1.89%  mysqld   [vdso]             [.] __vdso_clock_gettime

     1.05%  mysqld   mysqld             [.] rec_get_offsets_func

     1.03%  mysqld   mysqld             [.] row_sel_field_store_in_mysql_format_func

     0.92%  mysqld   mysqld             [.] _mi_rec_pack

     0.91%  mysqld   [kernel.kallsyms]    [k] finish_task_switch

     0.90%  mysqld   mysqld             [.] row_search_mvcc

     0.86%  mysqld   mysqld             [.] decimal2bin

     0.83%  mysqld   mysqld             [.] _mi_rec_check

….

Since this can be a special topic to dig in, we suggest you look into these really good external blogs as your references, perf Basics for MySQL Profiling, Finding MySQL Scaling Problems Using perf, or learn how to debug using valgrind with massif.

Efficient Way To Check MySQL Memory Utilization

Using ClusterControl relieves any hassle routines like going over through your runbooks or even creating your own playbooks that would deliver reports for you. In ClusterControl, you have Dashboards (using SCUMM) where you can have a quick overview of your MySQL node(s). For example, viewing the MySQL General dashboard,

you can determine how the MySQL node performs,

You see that the images above reveal variables that impact MySQL memory utilization. You can check how the metrics for sort caches, temporary tables, threads connected, query cache, or storage engines innodb buffer pool or MyISAM's key buffer.

Using ClusterControl offers you a one-stop utility tool where you can also check queries running to determine those processes (queries) that can impact high memory utilization. See below for an example,

Viewing the status variables of MySQL is quiet easy,

You can even go to Performance -> Innodb Status as well to reveal the current InnoDB status of your database nodes. Also, in ClusterControl, an incident is detected, it will try to collect incident and shows history as a report that provides you InnoDB status as shown in our previous blog about MySQL Freeze Frame.

Summary

Troubleshooting and diagnosing your MySQL database when suspecting high memory utilization isn't that difficult as long as you know the procedures and tools to use. Using the right tool offers you more flexibility and faster productivity to deliver fixes or solutions with a chance of greater result.

by Paul Namuag at February 21, 2020 10:45 AM

February 20, 2020

Percona

A Hidden Gem in MySQL: MyRocks

using MyRocks in MySQL

using MyRocks in MySQLIn this blog post, we will share some experiences with the hidden gem in MySQL called MyRocks, a storage engine for MySQL’s famous pluggable storage engine system. MyRocks is based on RocksDB which is a fork of LevelDB. In short, it’s another key-value store based on LSM-tree, thus granting it some distinctive features compared with other MySQL engines. It was introduced in 2016 by Facebook and later included, respectively, in Percona Server for MySQL and MariaDB

Background and History

The original paper on LSM was published in 1996, and if you need a single takeaway, the following quote is the one: “The LSM-tree uses an algorithm that defers and batches index changes, cascading the changes from a memory-based component through one or more disk components in an efficient manner reminiscent of merge sort.”  At the time, disks were slow and IOPS expensive, and the idea was to minimize the write costs by essentially turning random write load into a sequential one. The technology is quite popular, being a foundation or inspiration in a multitude of databases and storage engines: HBase, LevelDB, RocksDB, Tarantool, WiredTiger, and more. Even in 2020, when storage is faster and cheaper, LSM-tree can still provide substantial benefits for some workloads.

The development of MyRocks was started around 2015 by Facebook. Yoshinori Matsunobu gave multiple presentations, detailing the reasoning behind using RocksDB inside MySQL. They were underutilizing the servers because they were constrained in disk space and MyRocks allowed for better space efficiency. This better space efficiency is inherent for LSM tree storage engines.

So far, MyRocks continues to be a somewhat niche solution, and, frankly, not a lot of people know about it and consider its use. Without further ado, let’s see how it works and why would you want to use it.

Working Dynamics of MyRocks

MyRocks in MySQL

MyRocks engine is based on LSM-tree structure, which we have mentioned above. That makes it a very different beast than InnoDB. So let’s take a high-level overview of MyRocks internals. First, how does row-based data fit into key-value store? You can think of a regular clustered index as a key-value structure on its own: there’s a key, which value is a whole row. Secondary indexes can have primary indexes’ key as value, and additionally a column data value.

Writes

All writes in MyRocks are done sequentially to a special structure called memtable, one of the few mutable structures in the engine. Since we need the writes to actually be durable, all writes are also written to WAL (a concept similar to InnoDB redo log), which is flushed to disk. Once the memtable becomes full, it’s copied in memory and made immutable. In the background, the immutable memtables will be flushed to disk in the form of sorted string tables (SSTs), forming the L0 of the multi-leveled compaction scheme. During this initial flush, changes in the memtable are deduplicated (a thousand updates for one key become a single update). Resulting SSTs are immutable, and, on L0, have overlapping data.

As more SSTs are created on L0, they will start to pour over to L1…L6. On each level after L0, data within SSTs is not overlapping, thus compaction can proceed in parallel. Compaction takes an SST from the higher level, and merges it with one (or more) SSTs on the lower level, deleting the originals and creating new ones on the lower level. Eventually, data reaches the lowest level. As you can see below, each level has more and more data, so most data is actually stored at the lower levels. The merge mentioned happens for Key Value pairs, and during the merge KV on the lower level will always be older than KV on the higher one, and thus can be discarded.

LSM Leveled Compaction

 

Having immutable SSTs allows them to be filled to 100% all the time, improving space utilization. In fact, that’s one of the selling points of MyRocks, as it allows for greater space efficiency. In addition to the inherent compactness of the SSTs, data there is also compressed, which further minimizes the footprint. An interesting feature here is that you can specify different compression algorithms for the bottommost (where, by nature, most of the data is) and other levels.

Another important component for the MyRocks engine is Column Family (CF). Each key-value pair (or, in familiar terms, each index) is associated with a CF. Quoting the Percona Server for MySQL docs: “Each column family has distinct attributes, such as block size, compression, sort order, and MemTable.” In addition to controlling physical storage characteristics, this provides atomicity for queries across different key spaces.

MyRocks in MySQL

Reads

So far we’ve only been talking about writing the data. Reading it is also quite different in MyRocks due to its structure. Since the data is leveled, to find a value for a key, you need to look at memtables, L0, L1 … L6. This is an LSM read penalty. However, you don’t always have to scan the whole dataset to find the row, and not all scans go to disk. The read path starts in memtables, which will by definition have the most recent data. Then the block cache will be used, which might contain the recently-accessed data.

Once in-memory options are exhausted, reads will spill to disk and start traversing SSTs on consecutive levels. L0 has to be scanned whole since data in SSTs overlaps, but only a subset of SSTs on other levels has to be scanned, as we know key ranges of data inside each SST. To further improve this scanning, bloom filters are utilized, which helps the scan operation answer a question: “is key present in given SST?” – but only if we are sure it’s not present. Thus, we can avoid reading some SSTs, whose key range covers the key we look for. Unfortunately, for now, there’s no BF-like technique for range scans, though prefix bloom filters might help.

Each time we find the data we’re looking for, we populate the block cache for future use. In addition to that, index and bloom filter data is also cached, thus speeding up the SST scans even if the data is not in block cache. Even with all of these improvements, you can see that in general, the reads are more involved than they are in regular b-tree storage engines. The negative effects, however, become less pronounced the more data there’s in the data set.

Tools and Utilities

Production readiness of a solution is defined not only by its own maturity but also by the ecosystem around it. Let’s review how MyRocks fits with existing tools and regular maintenance activities.

First and foremost, can we back it up online with minimal locking as we can innodb? The answer is yes (with some catches). Original Facebook’s MySQL 5.6 includes myrocks_hotbackup script, which enables hot backups of MyRocks, but no other engines. Starting with Percona XtraBackup version 8.0.6 and Mariabackup 10.2.16/10.3.8, we have the ability to use a single tool to back up heterogeneous clusters.

One of the significant MyRocks limitations is that it doesn’t support online DDL as InnoDB does. You can use solutions like pt-online-schema-change and gh-ost, which are preferred anyway when doing large table changes. For pt-osc, there are some details to note. Global transaction isolation should be set to Read Committed, or pt-osc will fail when a target table is already in RocksDB engine. It also needs binlog_format to be set to ROW. Both of these settings are usually advisable for MyRocks anyway, as it doesn’t support gap locking yet, and so its repeatable read implementation differs.

Because we’re limited to ROW-level replication, tools like pt-table-checksum and pt-table-sync will not work, so be careful with the data consistency.

Monitoring is another important consideration for production use. MyRocks is quite well-instrumented internally, providing more than a hundred metrics, extensive show engine output, and verbose logging. Here’s an overview of some of the available metrics: MyRocks Information Schema. With Percona Monitoring and Management, you get a dedicated dashboard for MyRocks, providing an overview of the internals of the engine.

Partitioning in MyRocks is supported and has an interesting feature where you can assign partitions to different column families: Column Families on Partitioned Tables.

Unfortunately, for now, encryption does not work with MyRocks, even though RocksDB supports pluggable encryption.

Load Test and Comparison Versus InnoDB

We have compiled a basic load test on MyRocks vs InnoDB with the following details. 

We downloaded Ontime Performance Data Reporting for the year 2019 and loaded it to both engines. The test consisted of loading to a single table data for one year worth of information (about 14million rows). Load scripts can be found at github repo.

AWS Instance : t2.large – 8Gb Ram – 16Gb SSD

Engine  Size Duration Rows Method
innodb + log_bin off 5.6Gb 9m56 14,009,743 Load Infile
innodb + log_bin on 5.6Gb ** 11m58 14,009,743 Load Infile
innodb compressed + log_bin on 2.6Gb ** 17m9 14,009,743 Load Infile
innodb compressed + log_bin off 2.6Gb 15m56 14,009,743 Load Infile
myrocks/lz4 + log_bin on 1.4G* 9m24 14,009,743 Load Infile
myrocks/lz4 + log_bin off 1.4G* 8m2 14,009,743 Load Infile

 

* MyRocks WAL files aren’t included (This is a configurable parameter) 

**InnoDB Redo logs aren’t included

Conclusion

As we’ve shown above, MyRocks can be a surprisingly versatile choice of the storage engine. While usually it’s sold on space efficiency and write load, benchmarks show that it’s quite good in TPC-C workload. So when would you use MyRocks?

 In the simplest terms:

  • You have extremely large data sets, much bigger than the memory available
  • The bulk of your load is write-only
  • You need to save on space

This best translates to servers with expensive storage (SSDs), and to the cloud, where these could be significant price points.

But real databases rarely consist of pure log data. We do selects, be it point lookups or range queries, we modify the data. As it happens, if you can sacrifice some database-side constraints, MyRocks can be surprisingly good as a general-purpose storage engine, more so the larger the data set you have. Give it a try, and let us know. 

Limitations to consider before moving forward:

  • Foreign Keys
  • Full-Text Keys
  • Spatial Keys
  • No Tablespaces (instead, Column Families)
  • No Online DDL (pt-osc and gh-ost help here)
  • Other limitations listed in the documentation
  • Not supported by Percona XtraDB Cluster/Galera
  • Only binary collations supported for indexes

Warnings:

It’s designed for small transactions, so configure for bulk operations. For loading data, use rocksdb_bulk_load=1, and for deleting large data sets use rocksdb-commit-in-the-middle.

Mixing different storage engines in one transaction will work, but be aware of the differences of how isolation levels work between InnoDB and RocksDB engines, and limitations like the lack of Savepoints. Another important thing to note when mixing storage engines is that they use different memory structures, so plan carefully.

Corrupted immutable files are not recoverable.

References 

MyRocks Deep Dive

Exposing MyRocks Internals Via System Variables: Part 1, Data Writing

Webinar: How to Rock with MyRocks

MyRocks Troubleshooting

MyRocks Introduction

Optimizer Statistics in MyRocks

MyRocks and InnoDB: a summary

RocksDB Is Eating the Database World

by Alkin Tezuysal at February 20, 2020 04:47 PM

SeveralNines

How to Protect your MySQL or MariaDB Database From SQL Injection: Part Two

In the first part of this blog we described how ProxySQL can be used to block incoming queries that were deemed dangerous. As you saw in that blog, achieving this is very easy. This is not a full solution, though. You may need to design an even more tightly secured setup - you may want to block all of the queries and then allow just some select ones to pass through. It is possible to use ProxySQL to accomplish that. Let’s take a look at how it can be done.

There are two ways to implement whitelist in ProxySQL. First, the historical one, would be to create a catch-all rule that will block all the queries. It should be the last query rule in the chain. An example below:

We are matching every string and generate an error message. This is the only rule existing at this time, it prevents any query from being executed.

mysql> USE sbtest;

Database changed

mysql> SELECT * FROM sbtest1 LIMIT 10;

ERROR 1148 (42000): This query is not on the whitelist, you have to create a query rule before you'll be able to execute it.

mysql> SHOW TABLES FROM sbtest;

ERROR 1148 (42000): This query is not on the whitelist, you have to create a query rule before you'll be able to execute it.

mysql> SELECT 1;

ERROR 1148 (42000): This query is not on the whitelist, you have to create a query rule before you'll be able to execute it.

As you can see, we can’t run any queries. In order for our application to work we would have to create query rules for all of the queries that we want to allow to execute. It can be done per query, based on the digest or pattern. You can also allow traffic based on the other factors: username, client host, schema. Let’s allow SELECTs to one of the tables:

Now we can execute queries on this table, but not on any other:

mysql> SELECT id, k FROM sbtest1 LIMIT 2;

+------+------+

| id   | k |

+------+------+

| 7615 | 1942 |

| 3355 | 2310 |

+------+------+

2 rows in set (0.01 sec)

mysql> SELECT id, k FROM sbtest2 LIMIT 2;

ERROR 1148 (42000): This query is not on the whitelist, you have to create a query rule before you'll be able to execute it.

The problem with this approach is that it is not efficiently handled in ProxySQL, therefore in ProxySQL 2.0.9 comes with new mechanism of firewalling which includes new algorithm, focused on this particular use case and as such more efficient. Let’s see how we can use it.

First, we have to install ProxySQL 2.0.9. You can download packages manually from https://github.com/sysown/proxysql/releases/tag/v2.0.9 or you can set up the ProxySQL repository.

 

Once this is done, we can start looking into it and try to configure it to use SQL firewall. 

The process itself is quite easy. First of all, you have to add a user to the mysql_firewall_whitelist_users table. It contains all the users for which firewall should be enabled.

mysql> INSERT INTO mysql_firewall_whitelist_users (username, client_address, mode, comment) VALUES ('sbtest', '', 'DETECTING', '');

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL FIREWALL TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

In the query above we added ‘sbtest’ user to the list of users which should have firewall enabled. It is possible to tell that only connections from a given host are tested against the firewall rules. You can also have three modes: ‘OFF’, when firewall is not used, ‘DETECTING’, where incorrect queries are logged but not blocked and ‘PROTECTING’, where not allowed queries will not be executed.

Let’s enable our firewall:

mysql> SET mysql-firewall_whitelist_enabled=1;

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL VARIABLES TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

ProxySQL firewall bases on the digest of the queries, it does not allow for regular expressions to be used. The best way to collect data about which queries should be allowed is to use stats.stats_mysql_query_digest table, where you can collect queries and their digests. On top of that, ProxySQL 2.0.9 comes with a new table: history_mysql_query_digest, which is an persistent extension to the previously mentioned in-memory table. You can configure ProxySQL to store data on disk from time to time:

mysql> SET admin-stats_mysql_query_digest_to_disk=30;

Query OK, 1 row affected (0.00 sec)

Every 30 seconds data about queries will be stored on disk. Let’s see how it goes. We’ll execute couple of queries and then check their digests:

mysql> SELECT schemaname, username, digest, digest_text FROM history_mysql_query_digest;

+------------+----------+--------------------+-----------------------------------+

| schemaname | username | digest             | digest_text |

+------------+----------+--------------------+-----------------------------------+

| sbtest     | sbtest | 0x76B6029DCBA02DCA | SELECT id, k FROM sbtest1 LIMIT ? |

| sbtest     | sbtest | 0x1C46AE529DD5A40E | SELECT ?                          |

| sbtest     | sbtest | 0xB9697893C9DF0E42 | SELECT id, k FROM sbtest2 LIMIT ? |

+------------+----------+--------------------+-----------------------------------+

3 rows in set (0.00 sec)

As we set the firewall to ‘DETECTING’ mode, we’ll also see entries in the log:

2020-02-14 09:52:12 Query_Processor.cpp:2071:process_mysql_query(): [WARNING] Firewall detected unknown query with digest 0xB9697893C9DF0E42 from user sbtest@10.0.0.140

2020-02-14 09:52:17 Query_Processor.cpp:2071:process_mysql_query(): [WARNING] Firewall detected unknown query with digest 0x76B6029DCBA02DCA from user sbtest@10.0.0.140

2020-02-14 09:52:20 Query_Processor.cpp:2071:process_mysql_query(): [WARNING] Firewall detected unknown query with digest 0x1C46AE529DD5A40E from user sbtest@10.0.0.140

Now, if we want to start blocking queries, we should update our user and set the mode to ‘PROTECTING’. This will block all the traffic so let’s start by whitelisting queries above. Then we’ll enable the ‘PROTECTING’ mode:

mysql> INSERT INTO mysql_firewall_whitelist_rules (active, username, client_address, schemaname, digest, comment) VALUES (1, 'sbtest', '', 'sbtest', '0x76B6029DCBA02DCA', ''), (1, 'sbtest', '', 'sbtest', '0xB9697893C9DF0E42', ''), (1, 'sbtest', '', 'sbtest', '0x1C46AE529DD5A40E', '');

Query OK, 3 rows affected (0.00 sec)

mysql> UPDATE mysql_firewall_whitelist_users SET mode='PROTECTING' WHERE username='sbtest' AND client_address='';

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL FIREWALL TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

mysql> SAVE MYSQL FIREWALL TO DISK;

Query OK, 0 rows affected (0.08 sec)

That’s it. Now we can execute whitelisted queries:

mysql> SELECT id, k FROM sbtest1 LIMIT 2;

+------+------+

| id   | k |

+------+------+

| 7615 | 1942 |

| 3355 | 2310 |

+------+------+

2 rows in set (0.00 sec)

But we cannot execute non-whitelisted ones:

mysql> SELECT id, k FROM sbtest3 LIMIT 2;

ERROR 1148 (42000): Firewall blocked this query

ProxySQL 2.0.9 comes with yet another interesting security feature. It has embedded libsqlinjection and you can enable the detection of possible SQL injections. Detection is based on the algorithms from the libsqlinjection. This feature can be enabled by running:

mysql> SET mysql-automatic_detect_sqli=1;

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL VARIABLES TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

It works with the firewall in a following way:

  • If the firewall is enabled and the user is in PROTECTING mode, SQL injection detection is not used as only explicitly whitelisted queries can pass through.
  • If the firewall is enabled and the user is in DETECTING mode, whitelisted queries are not tested for SQL injection, all others will be tested.
  • If the firewall is enabled and the user is in ‘OFF’ mode, all queries are assumed to be whitelisted and none will be tested for SQL injection.
  • If the firewall is disabled, all queries will be tested for SQL intection.

Basically, it is used only if the firewall is disabled or for users in ‘DETECTING’ mode. SQL injection detection, unfortunately, comes with quite a lot of false positives. You can use table mysql_firewall_whitelist_sqli_fingerprints to whitelist fingerprints for queries which were detected incorrectly. Let’s see how it works. First, let’s disable firewall:

mysql> set mysql-firewall_whitelist_enabled=0;

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL VARIABLES TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

Then, let’s run some queries.

mysql> SELECT id, k FROM sbtest2 LIMIT 2;

ERROR 2013 (HY000): Lost connection to MySQL server during query

Indeed, there are false positives. In the log we could find:

2020-02-14 10:11:19 MySQL_Session.cpp:3393:handler(): [ERROR] SQLinjection detected with fingerprint of 'EnknB' from client sbtest@10.0.0.140 . Query listed below:

SELECT id, k FROM sbtest2 LIMIT 2

Ok, let’s add this fingerprint to the whitelist table:

mysql> INSERT INTO mysql_firewall_whitelist_sqli_fingerprints VALUES (1, 'EnknB');

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL FIREWALL TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

Now we can finally execute this query:

mysql> SELECT id, k FROM sbtest2 LIMIT 2;

+------+------+

| id   | k |

+------+------+

|   84 | 2456 |

| 6006 | 2588 |

+------+------+

2 rows in set (0.01 sec)

We tried to run sysbench workload, this resulted in two more fingerprints added to the whitelist table:

2020-02-14 10:15:55 MySQL_Session.cpp:3393:handler(): [ERROR] SQLinjection detected with fingerprint of 'Enknk' from client sbtest@10.0.0.140 . Query listed below:

SELECT c FROM sbtest21 WHERE id=49474

2020-02-14 10:16:02 MySQL_Session.cpp:3393:handler(): [ERROR] SQLinjection detected with fingerprint of 'Ef(n)' from client sbtest@10.0.0.140 . Query listed below:

SELECT SUM(k) FROM sbtest32 WHERE id BETWEEN 50053 AND 50152

We wanted to see if this automated SQL injection can protect us against our good friend, Booby Tables.

mysql> CREATE TABLE school.students (id INT, name VARCHAR(40));

Query OK, 0 rows affected (0.07 sec)

mysql> INSERT INTO school.students VALUES (1, 'Robert');DROP TABLE students;--

Query OK, 1 row affected (0.01 sec)

Query OK, 0 rows affected (0.04 sec)

mysql> SHOW TABLES FROM school;

Empty set (0.01 sec)

Unfortunately, not really. Please keep in mind this feature is based on automated forensic algorithms, it is far from perfect. It may come as an additional layer of defence but it will never be able to replace properly maintained firewall created by someone who knows the application and its queries.

We hope that after reading this short, two-part series you have a better understanding of how you can protect your database against SQL injection and malicious attempts (or just plainly user errors) using ProxySQL. If you have more ideas, we’d love to hear from you in the comments.

by krzysztof at February 20, 2020 10:45 AM

February 19, 2020

SeveralNines

When Should I Add an Extra Database Node?

The fact that people are not easily convinced to have an additional database node in production due to cost is somewhat absurd and is an idea that should be put aside. While adding a new node would bring more complexity to the current database infrastructure, there is a plethora of automation and helper tools in the market that can help you manage the scalability and continuity of the database layer. 

There are diverse reasons that may influence this somewhat costly decision, and you will probably realize it only when something is going south or starting to fall apart. This blog post provides common reasons when you should add an extra database node into your existing database infrastructure, whether you are running on a standalone or a clustered setup.

Faster Recovery Time

The ultimate reason for having an extra database node for redundancy is to achieve better availability and faster recovery time when something goes wrong. It's a protection against malfunctions that could occur on the primary database node and you would have a standby node which is ready to take over the primary role from the problematic node at any given time. 

A standby node replicating to a primary node is probably the most cost-effective solution that you can have to improve the recovery time. When the primary database node is down, promote the standby node as the new master and change the database connection string in the applications to connect to the new master and you are pretty much back in business. The failover process can then be automated and fine tuned over time, or you could introduce a reverse proxy tier which acts as the gateway on top of the database tier.

Improved Performance

Application grows to be more demanding over time. The magnitude of growth could be exponential depending on the success of your business. Scaling out your database tier to cater for bigger workloads is commonly necessary to improve the performance and responsiveness of your applications. 

Database workloads can be categorized into two - reads or writes. For read-intensive workload, adding more database replicas will help to spread out the load to multiple database servers. For write-intensive workload, adding more database masters will likely reduce the contention that commonly happens in a single node and improve parallelism processing. Just make sure that the multi-master clustering technology that you use supports conflict detection and resolution, otherwise the application has to handle this part separately.

Approaching the Thresholds

As your database usage grows, there will be a point of time where the database node is fast approaching the defined threshold for the server and database resources. Resources like CPU clock, RAM, disk I/O and disk space are frequently becoming the limiting factors for your database to keep up with the demand.

For example, one would probably hit the limit of storage allocated for the database and also approaching the maximum connections allowed to the database. In this case, partitioning your data into multiple nodes would make more sense because you would get more storage space and I/O operations with the ability to process bigger write workloads for the database, just like killing two birds with one stone.

Upgrade Testing

Before upgrading to another major version, it's recommended to test out your current dataset on the new version just to make sure you can operate smoothly and eliminate the element of surprise later on. It's pretty common for the new major version to deprecate some legacy options or parameters that we have been using in the current version and some incompatibilities where application programming changes might be required. Also, you can measure the performance improvement (or regression) that you will get after upgrading, which could justify the reason for this exercise.

Major version upgrade commonly requires extra attention to the upgrade step, if compared to minor version patching which usually can be performed with few steps. Minor releases never change the internal storage format and are always compatible with earlier and later minor releases of the same major version number.

Generally, there are 3 ways to perform database major version upgrade:

  • In-place
  • Logical upgrade
  • Replication

In-place, where you use existing data directory against the new database major version, with just running upgrade script after binaries are upgraded. For logical upgrade, use the logical backup on an old version and then restore it on a new version. This usually requires an additional database node, unless you would like to restore the logical backup on the new version installed in the same server as the old one.

For replication, create a standby server with the updated database version and replicate from the old major version. Once everything is synced up, connect your application to the to the standby (or slave) server and verify if necessary adjustments are required. Then, you can promote the standby server as the new master and your database server is officially upgraded, with a very minimal downtime.

Backup Verification

We have stressed out this a couple of times in the older blog posts - backup is not a backup if it is not restorable. Backup verification is an important process to ensure you meet your RTO, which basically represents how long it takes to restore from the incident until normal operations are available to the mass users.

You can measure the amount of time it takes to recover by observing the backup verification process, which is the best to be performed on a separate node, as you don't want to increase the burden or put the production database servers under risks.

The ClusterControl backup verification feature allows you to estimate your total mean recovery time, with the extra database node used for verification process can be configured to shut down automatically right after verification process completes. Check out this blog post if you want to learn more about how ClusterControl performs this job.

Conclusion

As your database grows, scaling out your database nodes is going to be necessary and must be well-thought since the beginning. The actual cost of having more database nodes for your environment sometimes justify your requirements and could be more than worth it to keep up with the growth of your business.

 

by ashraf at February 19, 2020 10:45 AM

February 18, 2020

SeveralNines

What to Check if PostgreSQL Memory Utilization is High

Reading from memory will always be more performant than going to disk, so for all database technologies you would want to use as much memory as possible. If you are not sure about the configuration, or you have an error, this could generate high memory utilization or even an out-of-memory issue.

In this blog, we’ll look at how to check your PostgreSQL memory utilization and which parameter you should take into account to tune it. For this, let’s start by seeing an overview of PostgreSQL's architecture.

PostgreSQL Architecture

PostgreSQL's architecture is based on three fundamental parts: Processes, Memory, and Disk.

The memory can be classified into two categories:

  • Local Memory: It is loaded by each backend process for its own use for queries processing. It is divided into sub-areas:
    • Work mem: The work mem is used for sorting tuples by ORDER BY and DISTINCT operations, and for joining tables.
    • Maintenance work mem: Some kinds of maintenance operations use this area. For example, VACUUM, if you’re not specifying autovacuum_work_mem.
    • Temp buffers: It is used for store temporary tables.
  • Shared Memory: It is allocated by the PostgreSQL server when it is started, and it is used by all the processes. It is divided into sub-areas:
    • Shared buffer pool: Where PostgreSQL loads pages with tables and indexes from disk, to work directly from memory, reducing the disk access.
    • WAL buffer: The WAL data is the transaction log in PostgreSQL and contains the changes in the database. WAL buffer is the area where the WAL data is stored temporarily before writing it to disk into the WAL files. This is done every some predefined time called checkpoint. This is very important to avoid the loss of information in the event of a server failure.
    • Commit log: It saves the status of all transactions for concurrency control.

How to Know What is Happening

If you are having high memory utilization, first, you should confirm which process is generating the consumption.

Using the “Top” Linux Command

The top linux command is probably the best option here (or even a similar one like htop). With this command, you can see the process/processes that are consuming too much memory. 

When you confirm that PostgreSQL is responsible for this issue, the next step is to check why.

Using the PostgreSQL Log

Checking both the PostgreSQL and systems logs is definitely a good way to have more information about what is happening in your database/system. You could see messages like:

Resource temporarily unavailable

Out of memory: Kill process 1161 (postgres) score 366 or sacrifice child

If you don’t have enough free memory.

Or even multiple database message errors like:

FATAL:  password authentication failed for user "username"

ERROR:  duplicate key value violates unique constraint "sbtest21_pkey"

ERROR:  deadlock detected

When you are having some unexpected behavior on the database side. So, the logs are useful to detect these kinds of issues and even more. You can automate this monitoring by parsing the log files looking for works like “FATAL”, “ERROR” or “Kill”, so you will receive an alert when it happens.

Using Pg_top

If you know that the PostgreSQL process is having a high memory utilization, but the logs didn’t help, you have another tool that can be useful here, pg_top.

This tool is similar to the top linux tool, but it’s specifically for PostgreSQL. So, using it, you will have more detailed information about what is running your database, and you can even kill queries, or run an explain job if you detect something wrong. You can find more information about this tool here.

But what happens if you can’t detect any error, and the database is still using a lot of RAM. So, you will probably need to check the database configuration.

Which Configuration Parameters to Take into Account

If everything looks fine but you still have the high utilization problem, you should check the configuration to confirm if it is correct. So, the following are parameters that you should take into account in this case.

shared_buffers

This is the amount of memory that the database server uses for shared memory buffers. If this value is too low, the database would use more disk, which would cause more slowness, but if it is too high, could generate high memory utilization. According to the documentation, if you have a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of the memory in your system.

work_mem

It specifies the amount of memory that will be used by the ORDER BY, DISTINCT and JOIN before writing to the temporary files on disk. As with the shared_buffers, if we configure this parameter too low, we can have more operations going into disk, but too high is dangerous for the memory usage. The default value is 4 MB.

max_connections

Work_mem also goes hand to hand with the max_connections value, as each connection will be executing these operations at the same time, and each operation will be allowed to use as much memory as specified by this value before it starts to write data in temporary files. This parameter determines the maximum number of simultaneous connections to our database, if we configure a high number of connections, and don’t take this into account, you can start having resource issues. The default value is 100.

temp_buffers

The temporary buffers are used to store the temporary tables used in each session. This parameter sets the maximum amount of memory for this task. The default value is 8 MB.

maintenance_work_mem

This is the max memory that an operation like Vacuuming, adding indexes or foreign keys can consume. The good thing is that only one operation of this type can be run in a session, and is not the most common thing to be running several of these at the same time in the system. The default value is 64 MB.

autovacuum_work_mem

The vacuum uses the maintenance_work_mem by default, but we can separate it using this parameter. We can specify the maximum amount of memory to be used by each autovacuum worker here.

wal_buffers

The amount of shared memory used for WAL data that has not yet been written to disk. The default setting is 3% of shared_buffers, but not less than 64kB nor more than the size of one WAL segment, typically 16MB. 

Conclusion

There are different reasons to have a high memory utilization, and detecting the root issue could be a time-consuming task. In this blog, we mentioned different ways to check your PostgreSQL memory utilization and which parameter should you take into account to tune it, to avoid excessive memory usage.

by Sebastian Insausti at February 18, 2020 07:20 PM

Shlomi Noach

The state of Orchestrator, 2020 (spoiler: healthy)

This post serves as a pointer to my previous announcement about The state of Orchestrator, 2020.

Thank you to Tom Krouper who applied his operational engineer expertise to content publishing problems.

by shlomi at February 18, 2020 07:14 PM

Oli Sennhauser

InnoDB Page Cleaner intended loop takes too long

Recently we migrated a database system from MySQL 5.7 to MariaDB 10.3. Everything went fine so far just the following message started to pop-up in the MariaDB Error Log File with the severity Note:

InnoDB: page_cleaner: 1000ms intended loop took 4674ms. The settings might not be optimal. (flushed=102 and evicted=0, during the time.)

I remember that this message also appeared in earlier MySQL 5.7 releases but somehow disappeared in later releases. I assume MySQL has just disabled the Note?

You can find various advices in the Internet on to get rid of this Note:

innodb_lru_scan_depth        = 1024, 256
innodb_buffer_pool_instances = 1, 8
innodb_io_capcity            = 100, 200 or 1000
innodb_page_cleaners         = 1, 4 or 8

But non of these changes made the Note go away in our case. I only found one voice claiming it could be an external reason which makes this message appear. Because we are actually running on a Cloud-Machine the appearance of this message could really be an effect of the Cloud and not caused by the Database or the Application.

We further know that our MariaDB Database has a more or less uniform workload over the day. Further it is a Master/Master (active/passive) set-up. So both nodes should see more or less the same write load at the same time.

But as our investigation clearly shows that the Note does not appear at the same time on both nodes. So I strongly assume it is a noisy-neighbour problem.

First we tried to find any trend or correlation between these 2 Master/Master Databases maas1 and maas2:

What we can see here is, that the message appeared on different days on maas1 and maas2. The database maas1 had a problem in the beginning of December and end of January. Database maas1 had much less problems in general but end of December there was a problem.

During night both instances seem to have less problems than during the day. And maas2 has more problems in the afternoon and evening.

If we look at the distribution per minute we can see that maas2 has some problems around xx:45 to xx:50 and maas1 more at xx:15.

Then we had a closer look at 28 January at about 12:00 to 15:00 on maas2:

We cannot see any anomalies which would explain a huge increase of dirty pages and and a page_cleaner stuck.

The only thing we could see at the specified time is that I/O latency significantly increased on server side. Because we did not cause more load and over-saturated the system it must be triggered externally:

This correlates quite well to the Notes we see in the MariaDB Error Log on maas2:

2020-01-28 12:45:27 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5760ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 12:46:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6908ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 12:46:32 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5339ms. The settings might not be optimal. (flushed=17 and evicted=0, during the time.)
2020-01-28 12:47:36 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4379ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 12:48:08 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5053ms. The settings might not be optimal. (flushed=7 and evicted=0, during the time.)
2020-01-28 12:48:42 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5760ms. The settings might not be optimal. (flushed=102 and evicted=0, during the time.)
2020-01-28 12:49:38 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4202ms. The settings might not be optimal. (flushed=100 and evicted=0, during the time.)
2020-01-28 12:57:28 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4615ms. The settings might not be optimal. (flushed=18 and evicted=0, during the time.)
2020-01-28 12:58:01 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5593ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 12:58:34 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5442ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 12:59:31 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4327ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 13:00:05 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5154ms. The settings might not be optimal. (flushed=82 and evicted=0, during the time.)
2020-01-28 13:08:01 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4321ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 13:10:46 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 21384ms. The settings might not be optimal. (flushed=100 and evicted=0, during the time.)
2020-01-28 13:14:16 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4180ms. The settings might not be optimal. (flushed=20 and evicted=0, during the time.)
2020-01-28 13:14:49 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4935ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 13:15:20 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4472ms. The settings might not be optimal. (flushed=25 and evicted=0, during the time.)
2020-01-28 13:15:47 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4358ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2020-01-28 13:48:31 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6212ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2020-01-28 13:55:44 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4280ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 13:59:43 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5817ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 14:00:16 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5384ms. The settings might not be optimal. (flushed=100 and evicted=0, during the time.)
2020-01-28 14:00:52 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 9460ms. The settings might not be optimal. (flushed=5 and evicted=0, during the time.)
2020-01-28 14:01:25 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7727ms. The settings might not be optimal. (flushed=103 and evicted=0, during the time.)
2020-01-28 14:01:57 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7154ms. The settings might not be optimal. (flushed=5 and evicted=0, during the time.)
2020-01-28 14:02:29 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7501ms. The settings might not be optimal. (flushed=5 and evicted=0, during the time.)
2020-01-28 14:03:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4322ms. The settings might not be optimal. (flushed=78 and evicted=0, during the time.)
2020-01-28 14:32:02 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4927ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 14:32:34 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4506ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)

by Shinguz at February 18, 2020 04:50 PM

Shlomi Noach

The state of Orchestrator, 2020 (spoiler: healthy)

Yesterday was my last day at GitHub, and this post explains what this means for orchestrator. First, a quick historical review:

  • 2014: I began work on orchestrator at Outbrain, as https://github.com/outbrain/orchestrator. I authored several open source projects while working for Outbrain, and created orchestrator to solve discovery, visualization and simple refactoring needs. Outbrain was happy to have the project developed as a public, open source repo from day 1, and it was released under the Apache 2 license. Interestingly, the idea to develop orchestrator came after I attended Percona Live Santa Clara 2014 and watched "ChatOps: How GitHub Manages MySQL" by one Sam Lambert.
  • 2015: Joined Booking.com where my main focus was to redesign and solve issues with the existing high availability setup. With Booking.com's support, I continued work on orchestrator, pursuing better failure detection and recovery processes. Booking.com was an incredible playground and testbed for orchestrator, a massive deployment of multiple MySQL/MariaDB flavors and configuration.
  • 2016 - 2020: Joined GitHub. GitHub adopted orchestrator and I developed it under GitHub's own org, at https://github.com/github/orchestrator. It became a core component in github.com's high availability design, running failure detection and recoveries across sites and geographical regions, with more to come. These 4+ years have been critical to orchestrator's development and saw its widespread use. At this time I'm aware of multiple large-scale organizations using orchestrator for high availability and failovers. Some of these are GitHub, Booking.com, Shopify, Slack, Wix, Outbrain, and more. orchestrator is the underlying failover mechanism for vitess, and is also included in Percona's PMM. These years saw a significant increase in community adoption and contributions, in published content, such as Pythian and Percona technical blog posts, and, not surprisingly, increase in issues and feature requests.


2020

GitHub was very kind to support moving the orchestrator repo under my own https://github.com/openark org. This means all issues, pull requests, releases, forks, stars and watchers have automatically transferred to the new location: https://github.com/openark/orchestrator. The old links do a "follow me" and implicitly direct to the new location. All external links to code and docs still work. I'm grateful to GitHub for supporting this transfer.

I'd like to thank all the above companies for their support of orchestrator and of open source in general. Being able to work on the same product throughout three different companies is mind blowing and an incredible opportunity. orchestrator of course remains open source and licensed with Apache 2. Existing Copyrights are unchanged.

As for what's next: some personal time off, please understand if there's delays to reviews/answers. My intention is to continue developing orchestrator. Naturally, the shape of future development depends on how orchestrator meets my future work. Nothing changes in that respect: my focus on orchestrator has always been first and foremost the pressing business needs, and then community support as possible. There are some interesting ideas by prominent orchestrator users and adopters and I'll share more thoughts in due time.

 

by shlomi at February 18, 2020 08:09 AM

February 17, 2020

Oli Sennhauser

FromDual Ops Center for MariaDB and MySQL 0.9.3 has been released

FromDual has the pleasure to announce the release of the new version 0.9.3 of its popular FromDual Ops Center focmm, a Graphical User Interface (GUI) for MariaDB and MySQL.

The FromDual Ops Center for MariaDB and MySQL (focmm) helps DBA's and System Administrators to better manage their MariaDB and MySQL database farms. Ops Center makes DBA and Admins life easier!

The main task of Ops Center is to support you in your daily MySQL and MariaDB operation tasks. More information about FromDual Ops Center you can find here.

Download

The new FromDual Ops Center for MariaDB and MySQL (focmm) can be downloaded from here. How to install and use focmm is documented in the Ops Center User Guide.

In the inconceivable case that you find a bug in the FromDual Ops Center for MariaDB and MySQL please report it to the FromDual bug tracker or just send us an email.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Installation of Ops Center 0.9.3

A complete guide on how to install FromDual Ops Center you can find in the Ops Center User Guide.

Upgrade from 0.9.x to 0.9.3

Upgrade from 0.9.x to 0.9.3 should happen automatically. Please do a backup of your Ops Center Instance before you upgrade! Please also check Upgrading.

Changes in Ops Center 0.9.3

Machine

  • Machine without a usergroup is not allowed, fixed.
  • Machine name number is incremented by one if server already exists.
  • Delete machine fixed by setting server_id on instance to null.
  • Refresh repo automatization added.
  • Monitoring link added to menu.
  • Machine performance graphs added to menu.

Instance

  • Node renamed to Instance.
  • Create instance for CentOS and Ubuntu integrated.
  • Instance version comment was too long for PXC. Fixed.
  • Data Dictionary version added.
  • Special case for error log to syslog added (MariaDB packages on CentOS).
  • performance_schema last seen queries added.
  • Instance show overview made nicer.
  • Bug in instance check fixed.
  • Generate password improved thus bad special characters are not suggested any more.
  • Instance edit, eye added and field length shortened.
  • Instance name checking improved for creating and add.
  • Various minor bugs fixed.
  • Monitoring link added to menu.

Cluster

  • Cluster is clickable now in instance overview.
  • Minor Cluster bugs fixed.
  • Galera cluster added.
  • Master/Slave replication made smoother.

Load-Balancer

  • HAproxy and glb load-balancer added.

Tools

  • Jobs: Error logging improved to get more info about aborted jobs.
  • Crontab: Run job now icon added.
  • Schema compare: Schema drop-down sorted ascending and related code cleaned and refactored.

Configuration

  • Crontab: start_jobs.php was removed from crontab and is now started by run_crontab.php.

Database-as-a-Service (DBaaS)

  • Pricing plan added.
  • Database pricing added.
  • Machine cost added.
  • Resource cost included into job structure.

Building and Packaging

  • Installer: Repository installation hint added to installer.
  • Upgrade: Table fixed for MySQL 5.7.
  • Packaging: Session folder included into packaging.
  • Packaging: DEB and RPM improved for Upgrade.

Themes / UI

  • Default theme made a bit nicer.
  • Link more visible in default theme.

General

  • Disable license warning.
  • Link to fromdual license information added.
  • Jquery upgraded from 1.12.0 to 1.12.1.
  • http authentication brought to Apache version 2.4.
  • Session store changed because we loose very often ours sessions.
  • Session path also for all frag adapted.
  • Function implode syntax made compatible with newer PHP versions.
  • Minor typos fixed.
  • Minor errors fixed.

by Shinguz at February 17, 2020 03:37 PM

SeveralNines

Migrating PostgreSQL to the Cloud - Comparing Solutions from Amazon, Google & Microsoft

From a bird’s eye view, it would appear that when it comes to migrating the PostgreSQL workloads into the cloud, the choice of cloud provider should make no difference. Out of the box, PostgreSQL makes it easy to replicate data, with no downtime, via Logical Replication, although with some restrictions. In order to make their service offering more attractive cloud providers may work out some of those restrictions. As we start thinking about differences in the available PostgreSQL versions, compatibility, limits, limitations, and performance, it becomes clear that the migration services are key factors in the overall service offering. It isn’t any longer a case of “we offer it, we migrate it”. It’s become more like “we offer it, we migrate it, with the least limitations”.

Migration is important to small and large organizations alike. It is not as much about the size of the PostgreSQL cluster, as it is about the acceptable downtime and post-migration effort.

Selecting a Strategy

The migration strategy should take into consideration the size of the database, the network link between the source and the target, as well as the migration tools offered by the cloud provider.

Hardware or Software?

Just as mailing USB keys, and DVDs back in the early days of Internet, in cases where the network bandwidth isn’t enough for transferring data at the desired speed, cloud providers are offering hardware solutions, able to carry up to hundreds of petabytes of data. Below are the current solutions from each of the big three:

A handy table provided by Google showing the available options:

GCP migration options

GCP appliance is Transfer Appliance

A similar recommendation from Azure based on the data size vs network bandwidth:

Azure migration options

Azure appliance is Data box

Towards the end of its data migrations page, AWS provides a glimpse of what we can expect, along with their recommendation of the solution:

AWS migration choices: managed or unmanaged.

In cases where the database sizes exceed 100GB and limited network bandwidth AWS suggest a hardware solution.

AWS appliance is Snowball Edge

Data Export/Import

Organizations that tolerate downtime, can benefit from the simplicity of common tools provided by PostgreSQL out of the box. However, when migrating data from one cloud (or hosting) provider to another cloud provider, beware of the egress cost.

AWS

For testing the migrations I used a local installation of my Nextcloud database running on one of my home network servers:

postgres=# select pg_size_pretty(pg_database_size('nextcloud_prod'));

pg_size_pretty

----------------

58 MB

(1 row)



nextcloud_prod=# \dt

                     List of relations

Schema |             Name | Type  | Owner

--------+-------------------------------+-------+-----------

public | awsdms_ddl_audit              | table | s9sdemo

public | oc_accounts                   | table | nextcloud

public | oc_activity                   | table | nextcloud

public | oc_activity_mq                | table | nextcloud

public | oc_addressbookchanges         | table | nextcloud

public | oc_addressbooks               | table | nextcloud

public | oc_appconfig                  | table | nextcloud

public | oc_authtoken                  | table | nextcloud

public | oc_bruteforce_attempts        | table | nextcloud

public | oc_calendar_invitations       | table | nextcloud

public | oc_calendar_reminders         | table | nextcloud

public | oc_calendar_resources         | table | nextcloud

public | oc_calendar_resources_md      | table | nextcloud

public | oc_calendar_rooms             | table | nextcloud

public | oc_calendar_rooms_md          | table | nextcloud

...

public | oc_termsofservice_terms       | table | nextcloud

public | oc_text_documents             | table | nextcloud

public | oc_text_sessions              | table | nextcloud

public | oc_text_steps                 | table | nextcloud

public | oc_trusted_servers            | table | nextcloud

public | oc_twofactor_backupcodes      | table | nextcloud

public | oc_twofactor_providers        | table | nextcloud

public | oc_users                      | table | nextcloud

public | oc_vcategory                  | table | nextcloud

public | oc_vcategory_to_object        | table | nextcloud

public | oc_whats_new                  | table | nextcloud

(84 rows)

The database is running PostgreSQL version 11.5:

postgres=# select version();

                                                version

------------------------------------------------------------------------------------------------------------

PostgreSQL 11.5 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), 64-bit

(1 row)

I have also created a PostgreSQL user to be used by AWS DMS which is Amazon’s service for importing PostgreSQL into Amazon RDS:

postgres=# \du s9sdemo

            List of roles

Role name | Attributes |  Member of

-----------+------------+-------------

s9sdemo   |   | {nextcloud}

AWS DMS provides many advantages, just as we’d expect from a managed solution in the cloud: 

  • auto-scaling (storage only, as compute instance must be right sized)
  •  automatic provisioning
  •  pay-as-you-go model
  •  automatic failover

However, maintaining data consistency for a live database is a best effort. A 100% consistency is achieved only when the database is in read-only mode — that is a consequence of how table changes are captured.

In other words, tables have a different point-in-time cutover:

AWS DMS: tables have different point in time cutover.

Just as with everything in the cloud, there is a cost associated with the migration service.

In order to create the migration environment, follow the Getting Started guide to setup a replication instance, a source, a target endpoint, and one or more tasks.

Replication Instance

Creating the replication instance is straightforward to anyone familiar with EC2 instances on AWS:

The only change from the defaults was in selecting AWS DMS 3.3.0 or later due to my local PostgreSQL engine being 11.5:

AWS DMS: Supported PostgreSQL versions.

And here’s the list of currently available AWS DMS versions:

Current AWS DMS versions.

Large installations should also take note of the AWS DMS Limits:

AWS DMS limits.

There is also a set of limitations that are a consequence of PostgreSQL logical replication restrictions. For example, AWS DMS will not migrate secondary objects:

AWS DMS: secondary objects are not migrated.

It is worth mentioning that in PostgreSQL all indexes are secondary indexes, and that is not a bad thing, as noted in this more detailed discussion.

Source Endpoint

Follow the wizard to create the Source Endpoint:

AWS DMS: Source Endpoint configuration.

In the setup scenario Configuration for a Network to a VPC Using the Internet my home network required a few tweaks in order to allow the source endpoint IP address to access my internal server. First, I created a port forwarding on the edge router (173.180.222.170) to sent traffic on port 30485 to my internal gateway (10.11.11.241) on port 5432 where I can fine tune access based on the source IP address via iptables rules. From there, network traffic flows through an SSH tunnel to the web server running the PostgreSQL database. With the described configuration the client_addr in the output of pg_stat_activity will show up as 127.0.0.1.

Before allowing incoming traffic, iptables logs show 12 attempts from replication instance at ip=3.227.167.58):

Jan 19 17:35:28 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=23973 DF PROTO=TCP SPT=54662 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:29 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=23974 DF PROTO=TCP SPT=54662 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:31 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=23975 DF PROTO=TCP SPT=54662 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:35 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=23976 DF PROTO=TCP SPT=54662 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:48 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=4328 DF PROTO=TCP SPT=54667 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:49 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=4329 DF PROTO=TCP SPT=54667 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:51 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=4330 DF PROTO=TCP SPT=54667 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:55 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=4331 DF PROTO=TCP SPT=54667 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:36:08 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=8298 DF PROTO=TCP SPT=54670 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:36:09 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=8299 DF PROTO=TCP SPT=54670 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:36:11 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=8300 DF PROTO=TCP SPT=54670 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:36:16 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=8301 DF PROTO=TCP SPT=54670 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Once allowing the source endpoint IP address (3.227.167.58) the connection test succeed and the source endpoint configuration is complete. We also have an SSL connection in order to encrypt the traffic through public networks. This can be confirmed on the PostgreSQL server using the query below as well as in the AWS console:

postgres=# SELECT datname, usename, client_addr, ssl, cipher, query, query_start FROM pg_stat_activity a, pg_stat_ssl s where a.pid=s.pid and usename = 's9sdemo';

datname | usename | client_addr | ssl | cipher | query | query_start

---------+---------+-------------+-----+--------+-------+-------------

(0 rows)

…and then watch while running the connection from the AWS console. The results should looks similar to the following:

postgres=# \watch



                                                                           Sun 19 Jan 2020 06:50:51 PM PST (every 2s)



    datname     | usename | client_addr | ssl |           cipher |                 query | query_start

----------------+---------+-------------+-----+-----------------------------+------------------------------------------------------------------------------------+-------------------------------

 nextcloud_prod | s9sdemo | 127.0.0.1   | t | ECDHE-RSA-AES256-GCM-SHA384 | select cast(setting as integer) from pg_settings where name = 'server_version_num' | 2020-01-19 18:50:51.463496-08

(1 row)

…while AWS console should report a success:

AWS DMS: Source Endpoint connection test successful.

As indicated in the prerequisites section, if we choose the migration option Full load, ongoing replication, we will need to alter the permissions for the PostgreSQL user. This migration option requires superuser privileges, therefore I adjusted the settings for the PostgreSQL user created earlier:

nextcloud_prod=# \du s9sdemo

         List of roles

Role name | Attributes | Member of

-----------+------------+-----------

s9sdemo   | Superuser  | {}

The same document contains instructions for modifying postgresql.conf. Here’s a diff from the original one:

--- a/var/lib/pgsql/data/postgresql.conf

+++ b/var/lib/pgsql/data/postgresql.conf

@@ -95,7 +95,7 @@ max_connections = 100                 # (change requires restart)



# - SSL -



-#ssl = off

+ssl = on

#ssl_ca_file = ''

#ssl_cert_file = 'server.crt'

#ssl_crl_file = ''

@@ -181,6 +181,7 @@ dynamic_shared_memory_type = posix  # the default is the first option



# - Settings -



+wal_level = logical

#wal_level = replica                   # minimal, replica, or logical

                                       # (change requires restart)

#fsync = on                            # flush data to disk for crash safety

@@ -239,6 +240,7 @@ min_wal_size = 80MB

#max_wal_senders = 10          # max number of walsender processes

                              # (change requires restart)

#wal_keep_segments = 0         # in logfile segments; 0 disables

+wal_sender_timeout = 0

#wal_sender_timeout = 60s      # in milliseconds; 0 disables



#max_replication_slots = 10    # max number of replication slots

@@ -451,6 +453,7 @@ log_rotation_size = 0                       # Automatic rotation of logfiles will

#log_duration = off

#log_error_verbosity = default         # terse, default, or verbose messages

Lastly, don’t forget to adjust the pg_hba.conf settings in order to allow SSL connection from the replication instance IP address.

We are now ready for the next step.

Target Endpoint

Follow the wizard to create the Target Endpoint:

AWS DMS: Target Endpoint configuration.

This step assumes that the RDS instance with the specified endpoint already exists along with the empty database nextcloud_awsdms. The database can be created during the RDS instance setup.

At this point, if the AWS networking is correctly setup, we should be ready to run the connection test:

AWS DMS: Target Endpoint connection test successful.

With the environment in place, it is now time to create the migration task:

Migration Task

Once the wizard completed the configuration looks like this:

AWS DMS: Migration Task configuratoin - part 1.

...and the second part of the same view:

AWS DMS: Migration Task configuration - part 2.

Once the task is started we can monitor the progress —open up the task details and scroll down to Table Statistics:

AWS DMS: Table Statistics for running tasks.

 AWS DMS is using the cached schema in order to migrate the database tables. While migration progresses, we can continue “watching” the queries on the source database, and the PostgreSQL error log, in addition to the AWS console:

psql: `\watch'-ing the AWS DMS queries.

In case of errors, the failure state is displayed in the console:

AWS DMS: failed task display.

One place to look for clues is CloudWatch, although during my tests the logs didn’t end up being published, which could likely be just another glitch in the beta version of the AWS DMS 3.3.0 as it turned out to be towards the end of this exercise:

AWS DMS: logs not published to CloudWatch - 3.3.0 beta version glitch?

The migration progress is nicely displayed in the AWS DMS console:

AWS DMS: migration progress displayed in console.

Once the migration is complete, reviewing one more time, the PostgreSQL error log, reveals a surprising message:

PostgreSQL error log: relhaspkey error - another AWS DMS 3.3.0 beta version glitch?

What seems to happen, is that in PostgreSQL 9.6, 10 the pg_class table contains the named column relhaspkey, but that’s not the case in 11. And that’s the glitch in beta version of AWS DMS 3.3.0 that I was referring to earlier.

GCP

Google’s approach is based on the open source tool PgBouncer. The excitement was short lived, as the official documentation talks about migrating PostgreSQL into a compute engine environment.

Further attempts to find a migration solution to Cloud SQL that resembles AWS DMS failed. The Database migration strategies contain no reference to PostgreSQL:

GCP: migrating to Cloud SQL - not available for PostgreSQL.

On-prem PostgreSQL installations can be migrated to Cloud SQL by using the services of one of the Google Cloud partners.

A potential solution may be PgBouncer to Cloud SQL, but that is not within the scope of this blog.

Microsoft Cloud Services (Azure)

In order to facilitate the migration of PostgreSQL workloads from on-prem to the managed Azure Database for PostgreSQL Microsoft provides Azure DMS which according to documentation can be used to migrate with minimal downtime. The tutorial Migrate PostgreSQL to Azure Database for PostgreSQL online using DMS describes these steps in detail.

The Azure DMS documentation discusses in great detail the issues and limitations associated with migrating the PostgreSQL workloads into Azure.

One notable difference from AWS DMS is the requirement to manually create the schema:

Azure DMS: schema must be migrated manually.

A demo of this will be the topic of a future blog. Stay tuned.

 

by Viorel Tabara at February 17, 2020 10:45 AM

February 14, 2020

MariaDB Foundation

MariaDB 10.5.1 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.5.1, the first beta release in the MariaDB 10.5 development series.
See the release notes and changelogs for details. […]

The post MariaDB 10.5.1 now available appeared first on MariaDB.org.

by Ian Gilfillan at February 14, 2020 05:25 PM

SeveralNines

What's New in MongoDB 4.2

Database updates come with improved features for performance, security, and with new integrated features. It is always advisable to test a new version before deploying it into production, just to ensure that it suits your needs and there is no possibility of crashes. 

Considering many products, those preceding the first minor versions of a new major version have the most important fixes. For instance I would prefer to have MongoDB version 4.2.1 in production a few days after release than I would for version 4.2.0. 

In this blog we are going to discuss what has been included and what improvements have been made to MongoDB version 4.2

What’s New in MongoDB 4.2

  1. Distributed transactions
  2. Wildcard indexes
  3. Retryable reads and writes
  4. Automatic Client-Side Field-level encryption.
  5. Improved query language for expressive updates
  6. On-demand materialized Views
  7. Modern maintenance operations

Distributed Transactions

Transactions are important database features that ensure data consistency and integrity especially those that guarantees the ACID procedures. MongoDB version 4.2  now supports multi-document transactions on replica sets and a sharded cluster through the distributed transaction approach. The same syntax for using transactions has been maintained as the previous 4.0 version.

However, the client driver specs have changed a bit hence if one intends to use transactions in MongoDB 4.2, you must  upgrade the drivers to versions that are compatible with 4.2 servers.

This version does not limit the size of a transaction in terms of memory usage but only dependent on the size of your hardware and the hardware handling capability.

Global cluster locale reassignment is now possible with version 4.2. This is to say, for a geo zone sharding implementation, if a user residing in region A moves to region B, by changing the value of their location field, the data can be automatically moved from region A to B through a transaction.

The sharding system now allows one to change a shard key contrary to the previous version. Literally, when a shard key is changed, it is an equivalent to moving the document to another shard. In this version, MongoDB wraps this update and if the document needs to be moved from one shard to another, the update will be executed inside a transaction in a background manner.

Using transactions is not an advisable approach since they degrade database performance especially if they occur multiple times. During a transaction period,  there is a stretched window for operations that may cause conflicts when making writes to a document that is affected. As much as a transaction can be retried, there might be an update made to the document before this retry, and whenever the retry happens, it may deal with the old rather than the latest document version. Retries obviously exert more processing cost besides increasing the application downtime through growing latency. 

A good practice around using transactions include:

  1. Avoid using unindexed queries inside a transaction as a way of ensuring the Op will not be slow.
  2. Your transaction should involve a few documents.

With the MongoDB dynamic schema format and embedding feature, you can opt to put all fields in the same collection to avoid the need to use transaction as a first measure.

Wildcard Indexes

Wildcard indexes were introduced in MongoDB version 4.2 to enhance queries against arbitrary fields or fields whose names are not known in advance, by indexing the entire document or subdocument.They are not intended to replace the workload based indexes but suit working with data involving polymorphic pattern. A polymorphic pattern is where all documents in a collection are similar but not having an identical structure. Polymorphic data patterns can be generated from application involving, product catalogs or social data. Below is an example of Polymorphic collection data

{

Sport: ‘Chess’,

playerName: ‘John Mah’,

Career_earning: {amount: NumberDecimal(“3000”), currency: “EUR”},

gamesPlayed:25,

career_titles:10

},

{

Sport: Tennis,

playerName: ‘Semenya Jones,

Career_earning: {amount: NumberDecimal(“34545”), currency: “USD”},

Event: {

name:”Olympics”,

career_titles:10,

career_tournaments:14

}

By indexing the entire document using Wildcard indexes, you can make a query using any arbitrary field as an index.

To create a Wildcard index

$db.collection.createIndex({“fieldA.$**”: 1})

If the selected field is a nested document or an array, the Wildcard index recurses into the document and stores the value for all the fields in the document or array.

Retryable Reads and Writes

Normally a database may incur some frequent network transient outages that may result in a query being partially or fully unexecuted. These network errors may not be that serious hence offer a chance for a retry of these queries once reconnected. Starting with MongoDB 4.2, the retry configuration is enabled by default. The MongoDB drivers can retry failed reads and writes for certain transactions whenever they encounter minor network errors or rather when they are unable to find some healthy primary in the sharded cluster/ replica set. However, if you don’t want the retryable writes, you can explicitly disable them in your configurations but I don’t find a compelling reason why one should disable them.

This feature is to ensure that in any case MongoDB infrastructure changes, the application code shouldn’t be affected. Regarding an example explained by Eliot Horowitz, the Co-founder of MongoDB, for a webpage that does 20 different database operations, instead of reloading the entire thing or having to wrap the entire web page in some sort of loop, the driver under the covers can just decide to retry the operation. Whenever a write fails, it will retry automatically and will have a contract with the server to guarantee that every write happens only once.

The retryable writes only makes a single retry attempt which helps to address replica set elections and  transient network errors but not for persistent network errors.

Retryable writes do not address instances where the failover period exceeds serverSelectionTimoutMs value in the parameter configurations.

With this MongoDB version, one can update document shard key values (except if the shardkey is the immutable _id field) by issuing a single-document findAndModify/update operations either in a transaction or as a retryable write.

MongoDB version 4.2 can now retry a single-document upsert operation (i.e upsert: true and multi: false) which may have failed because of duplicate key error if the operation meets these key conditions:

  1. Target collection contains a unique index that made the duplicate key error.
  2. The update operation will not modify any of the fields in the query predicate.
  3. The update match condition is either a single equality predicate {field: “value”} or a logical AND of equality predicates {filed: “value”, field0: “value0”}
  4. Set of fields in the unique index key pattern matches the set of fields in the update query predicate.

Automatic Client-Side Field-level Encryption

MongoDB version 4.2 comes with the Automatic Client-Side Field Level encryption (CSFLE), a feature that allows  developers to selectively encrypt individual fields of a document on the client side before it is sent to the server. The encrypted data is thus kept private from the providers hosting the database and any user that may have direct access to the database.

Only applications with the access to the correct encryption keys can decrypt and read the protected data. In case the encryption key is deleted, all data that was encrypted will be rendered unreadable.

Note: this feature is only available with MongoDB enterprise only.

Improved query language for expressive updates

MongoDB version 4.2 provides a richer query language than its predecessors. It now supports aggregations and modern use-case operations along the lines of geo-based searches, graph search and text search. It has integrated a third-party search engine which makes searches faster considering that the search engine is running on a different process/server. This generally improves on database performance contrary to if all searches were made to the mongod process which would rather make the database operation latency volatile whenever the search engine reindexes.

With this version, you can now handle arrays, do sums and other maths operations directly through an update statement.

On-Demand Materialized Views

The data aggregation pipeline framework in MongoDB is a great feature with different stages of transforming a document to some desired state. The MongoDB version 4.2 introduces a new stage $merge which for me I will say it saved me some time working with the final output that needed to be stored in a collection. Initially, the $out stage allows creating a new collection based on aggregation and populates the collection with the results obtained. If the collection already exist, it would overwrite the collection with the new results contrary to the $merge stage which only incorporates the pipeline results into an existing output rather than fully replacing the collection. Regenerating an entire collection everytime with the $out stage consumes a lot of CPU and IO which may degrade the database performance. The output content will therefore be timely updated enabling users to create on-demand materialized views

Modern Maintenance Operations

Developers can now have a great operational experience with MongoDB version 4.2 with integrated features that enhance high availability, cloud managed backup strategy,  improve the monitoring power and alerting systems. MongoDB Atlas and MongoDB Ops Manager are the providing platforms for these features. The latter has been labeled as the best for running MongoDB on-enterprise. It has also been integrated with Kubernetes operator for on-premise users who are moving to private cloud. This interface enables one to directly control Ops Manager.

There are some internal changes made to MongoDB version 4.2 which include:

  1. Listing open cursors.
  2. Removal of the MMAPv1 storage engine.
  3. Improvement on the WiredTiger data file repair.
  4. Diagnostic fields can now have queryHash
  5. Auto-splitting thread for mongos nodes has been removed.

Conclusion

MongoDB version 4.2 comes with some improvements along the lines of security and database performance. It has included an Automatic Client-Side Field Level Encryption that ensures data is protected from the client angle. More features like a Third party search engine and inclusion of the $merge stage in the aggregation framework render some improvement in the database performance. Before putting this version in production, please ensure that all your needs are fully addressed.

by Onyancha Brian Henry at February 14, 2020 10:45 AM

February 13, 2020

SeveralNines

Steps to Take if You Have a MySQL Outage

A MySQL outage simply means your MySQL service is not accessible or unresponsive from the other's perspective. Outages can be originated by a bunch of possible causes..

  • Network issue - Connectivity issue, switch, routing, resolver, load-balancer tier.
  • Resource issue - Whether you have reached resources limit or bottleneck.
  • Misconfiguration - Wrong permission or ownership, unknown variable, wrong password, privilege changed.
  • Locking - Global or table lock prevent others from accessing the data.

In this blog post, we’ll look at some steps to take if you’re having a MySQL outage (Linux environment).

Step One: Get the Error Code

When you have an outage, your application will throw out some errors and exceptions. These errors commonly come with an error code, that will give you a rough idea on what you’re facing and what to do next to troubleshoot the issue and recover the outage. 

To get more details on the error, check the MySQL Error Code or MariaDB Error Code pages respectively to figure out what the error means.

Step Two: Is the MySQL Server Running?

Log into the server via terminal and see if MySQL daemon is running and listening to the correct port. In Linux, one would do the following:

Firstly, check the MySQL process:

$ ps -ef | grep -i mysql

You should get something in return. Otherwise, MySQL is not running. If MySQL is not running, try to start it up:

$ systemctl start mysql # systemd

$ service mysql start # sysvinit/upstart

$ mysqld_safe # manual

If you are seeing an error on the above step, you should go look at the MySQL error log, which varies depending on the operating system and MySQL variable configuration for log_error in MySQL configuration file. For RedHat-based server, the file is commonly located at:

$ cat /var/log/mysqld.log

Pay attention to the most recent lines with log level "[Error]". Some lines labelled with "[Warning]" could indicate some problems, but those are pretty uncommon. Most of the time, misconfiguration and resource issues can be detected from here.

If MySQL is running, check whether it's listening to the correct port:

$ netstat -tulpn | grep -i mysql

tcp6       0 0 :::3306                 :::* LISTEN   1089/mysqld

You would get the process name "mysqld", listening on all interfaces (:::3306 or 0.0.0.0:3306) on port 3306 with PID 1089 and the state is "LISTEN". If you see the above line shows 127.0.0.1:3306, MySQL is only listening locally. You might need to change the bind_address value in MySQL configuration file to listen to all IP addresses, or simply comment on the line. 

Step Three: Check for Connectivity Issues

If the MySQL server is running fine without error inside the MySQL error log, the chance that connectivity issues are happening is pretty high. Start by checking connectivity to the host via ping (if ICMP is enabled) and telnet to the MySQL server from the application server:

(application-server)$ ping db1.mydomain.com

(application-server)$ telnet db1.mydomain.com 3306

Trying db1.mydomain.com...

Connected to 192.168.0.16.

Escape character is '^]'.

O

5.6.46-86.2sN&nz9NZ�32?&>H,EV`_;mysql_native_password

You should see some lines in the telnet output if you can get connected to the MySQL port. Now, try once more by using MySQL client from the application server:

(application-server)$ mysql -u db_user -p -h db1.mydomain.com -P3306

ERROR 1045 (28000): Access denied for user 'db_user'@'db1.mydomain.com' (using password: YES)

In the above example, the error gives us a bit of information on what to do next. The above probably because someone has changed the password for "db_user" or the password for this user has expired. This is a rather normal behaviour from MySQL 5.7. 4 and above, where the automatic password expiration policy is enabled by default with a 360 days threshold - meaning that all passwords will expire once a year.

Step Four: Check the MySQL Processlist

If MySQL is running fine without connectivity issues, check the MySQL process list to see what processes are currently running:

mysql> SHOW FULL PROCESSLIST;

+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+

| Id  | User | Host      | db | Command | Time | State | Info                  | Rows_sent | Rows_examined |

+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+

| 117 | root | localhost | NULL | Query   | 0 | init | SHOW FULL PROCESSLIST |       0 | 0 |

+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+

1 row in set (0.01 sec)

Pay attention to the Info and Time column. Some MySQL operations could be destructive enough to make the database stalls and become unresponsive. The following SQL statements, if running, could block others to access the database or table (which could bring a brief outage of MySQL service from the application perspective):

  • FLUSH TABLES WITH READ LOCK
  • LOCK TABLE ...
  • ALTER TABLE ...

Some long running transactions could also stall others, which eventually would cause timeouts to other transactions waiting to access the same resources. You may either kill the offensive transaction to let others access the same rows or retry the enqueue transactions after the long transaction finishes.

Conclusion

Proactive monitoring is really important to minimize the risk of MySQL outage. If your database is managed by ClusterControl, all the mentioned aspects are being monitored automatically without any additional configuration from the user. You shall receive alarms in your inbox for anomaly detections like long running queries, server misconfiguration, resource exceeding threshold and many more. Plus, ClusterControl will automatically attempt to recover your database service if something goes wrong with the host or network.

You can also learn more about MySQL & MariaDB Disaster Recovery by reading our whitepaper.

by ashraf at February 13, 2020 10:45 AM

February 12, 2020

SeveralNines

What to Look for if Your MySQL Replication is Lagging

A master/slave replication cluster setup is a common use case in most organizations. Using MySQL Replication enables your data to be replicated across different environments and guarantees that the information gets copied. It is asynchronous and single-threaded (by default), but replication also allows you to configure it to be synchronous (or actually “semi-synchronous”) and can run slave thread to multiple threads or in parallels.

This idea is very common and usually arrives with a simple setup, making its slave serving as its recovery or for backup solutions. However, this always comes to a price especially when bad queries (such as lack of primary or unique keys) are replicated or some trouble with the hardware (such as network or disk IO issues). When these issues occur, the most common problem to face is the replication lag. 

A replication lag is the cost of delay for transaction(s) or operation(s) calculated by its time difference of execution between the primary/master against the standby/slave node. The most certain cases in MySQL relies on bad queries being replicated such as lack of primary keys or bad indexes, a poor network hardware or malfunctioning network card, a distant location between different regions or zones, or some processes such as physical backups running can cause your MySQL database to delay applying the current replicated transaction. This is a very common case when diagnosing these issues. In this blog, we'll check how to deal with these cases and what to look if you are experiencing MySQL replication lag.

The "SHOW SLAVE STATUS": The MySQL DBA's Mantra

In some cases, this is the silver bullet when dealing with replication lag and it reveals mostly everything the cause of an issue in your MySQL database. Simply run this SQL statement in your slave node that is suspected experiencing a replication lag. 

The initial fields that are common to trace for problems are,

  • Slave_IO_State - It tells you what the thread is doing. This field will provide you good insights if the replication health is running normally, facing network problems such as reconnecting to a master, or taking too much time to commit data which can indicate disk problems when syncing data to disk. You can also determine this state value when running SHOW PROCESSLIST.
  • Master_Log_File -  Master's binlog file name where the I/O thread is currently fetch.
  • Read_Master_Log_Pos - binlog file position from the master where the replication I/O thread has already read.
  • Relay_Log_File - the relay log filename for which the SQL thread is currently executing the events
  • Relay_Log_Pos - binlog position from the file specified in Relay_Log_File for which SQL thread has already executed.
  • Relay_Master_Log_File - The master's binlog file that the SQL thread has already executed and is a congruent to  Read_Master_Log_Pos value.
  • Seconds_Behind_Master -  this field shows an approximation for difference between the current timestamp on the slave against the timestamp on the master for the event currently being processed on the slave. However, this field might not be able to tell you the exact lag if the network is slow because the difference in seconds are taken between the slave SQL thread and the slave I/O thread. So there can be cases that it can be caught up with slow-reading slave I/O thread, but i master it's already different.
  • Slave_SQL_Running_State - state of the SQL thread and the value is identical to the state value displayed in SHOW PROCESSLIST.
  • Retrieved_Gtid_Set - Available when using GTID replication. This is the set of GTID's corresponding to all transactions received by this slave. 
  • Executed_Gtid_Set - Available when using GTID replication. It's the set of GTID's written in the binary log.

For example, let's take the example below which uses a GTID replication and is experiencing a replication lag:

mysql> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.10.70

                  Master_User: cmon_replication

                  Master_Port: 3306

                Connect_Retry: 10

              Master_Log_File: binlog.000038

          Read_Master_Log_Pos: 826608419

               Relay_Log_File: relay-bin.000004

                Relay_Log_Pos: 468413927

        Relay_Master_Log_File: binlog.000038

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB: 

          Replicate_Ignore_DB: 

           Replicate_Do_Table: 

       Replicate_Ignore_Table: 

      Replicate_Wild_Do_Table: 

  Replicate_Wild_Ignore_Table: 

                   Last_Errno: 0

                   Last_Error: 

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 826608206

              Relay_Log_Space: 826607743

              Until_Condition: None

               Until_Log_File: 

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File: 

           Master_SSL_CA_Path: 

              Master_SSL_Cert: 

            Master_SSL_Cipher: 

               Master_SSL_Key: 

        Seconds_Behind_Master: 251

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error: 

               Last_SQL_Errno: 0

               Last_SQL_Error: 

  Replicate_Ignore_Server_Ids: 

             Master_Server_Id: 45003

                  Master_UUID: 36272880-a7b0-11e9-9ca6-525400cae48b

             Master_Info_File: mysql.slave_master_info

                    SQL_Delay: 0

          SQL_Remaining_Delay: NULL

      Slave_SQL_Running_State: copy to tmp table

           Master_Retry_Count: 86400

                  Master_Bind: 

      Last_IO_Error_Timestamp: 

     Last_SQL_Error_Timestamp: 

               Master_SSL_Crl: 

           Master_SSL_Crlpath: 

           Retrieved_Gtid_Set: 36272880-a7b0-11e9-9ca6-525400cae48b:7631-9192

            Executed_Gtid_Set: 36272880-a7b0-11e9-9ca6-525400cae48b:1-9191,

864dd532-a7af-11e9-85f2-525400cae48b:1-173,

df68c807-a7af-11e9-9b56-525400cae48b:1-4

                Auto_Position: 1

         Replicate_Rewrite_DB: 

                 Channel_Name: 

           Master_TLS_Version: 

1 row in set (0.00 sec)

Diagnosing issues like this, mysqlbinlog can also be your tool to identify what query has been running on a specific binlog x & y position. To determine this, let's take the Retrieved_Gtid_Set, Relay_Log_Pos, and the Relay_Log_File. See the command below:

[root@testnode5 mysql]# mysqlbinlog --base64-output=DECODE-ROWS --include-gtids="36272880-a7b0-11e9-9ca6-525400cae48b:9192" --start-position=468413927 -vvv relay-bin.000004

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;

/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;

DELIMITER /*!*/;

# at 468413927

#200206  4:36:14 server id 45003  end_log_pos 826608271 CRC32 0xc702eb4c        GTID last_committed=1562 sequence_number=1563    rbr_only=no

SET @@SESSION.GTID_NEXT= '36272880-a7b0-11e9-9ca6-525400cae48b:9192'/*!*/;

# at 468413992

#200206  4:36:14 server id 45003  end_log_pos 826608419 CRC32 0xe041ec2c        Query thread_id=24 exec_time=31 error_code=0

use `jbmrcd_date`/*!*/;

SET TIMESTAMP=1580963774/*!*/;

SET @@session.pseudo_thread_id=24/*!*/;

SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;

SET @@session.sql_mode=1436549152/*!*/;

SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;

/*!\C utf8 *//*!*/;

SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;

SET @@session.lc_time_names=0/*!*/;

SET @@session.collation_database=DEFAULT/*!*/;

ALTER TABLE NewAddressCode ADD INDEX PostalCode(PostalCode)

/*!*/;

SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;

DELIMITER ;

# End of log file

/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

It tells us that it was trying to replicate and execute a DML statement which tries to be the source of the lag. This table is a huge table containing 13M of rows. 

Check SHOW PROCESSLIST, SHOW ENGINE INNODB STATUS, with ps, top, iostat command combination

In some cases, SHOW SLAVE STATUS is not enough to tell us the culprit. It's possible that the replicated statements are affected by internal processes running in the MySQL database slave. Running the statements SHOW [FULL] PROCESSLIST and SHOW ENGINE INNODB STATUS also provides informative data that gives you insights about the source of the problem. 

For example, let's say a benchmarking tool is running causing to saturate the disk IO and CPU. You can check by running both SQL statements. Combine it with ps and top commands. 

You can also determine bottlenecks with your disk storage by running iostat which provides statistics of the current volume you are trying to diagnose. Running iostat can show how busy or loaded your server is. For example, taken by a slave which is lagging but also experiencing high IO utilization at the same time, 

[root@testnode5 ~]# iostat -d -x 10 10

Linux 3.10.0-693.5.2.el7.x86_64 (testnode5)     02/06/2020 _x86_64_ (2 CPU)



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.42 3.71   60.65 218.92 568.39   24.47 0.15 2.31 13.79    1.61 0.12 0.76

dm-0              0.00 0.00 3.70   60.48 218.73 568.33   24.53 0.15 2.36 13.85    1.66 0.12 0.76

dm-1              0.00 0.00 0.00    0.00 0.04 0.01 21.92     0.00 63.29 2.37 96.59 22.64   0.01



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.20 392.30 7983.60  2135.60 49801.55 12.40 36.70    3.84 13.01 3.39 0.08 69.02

dm-0              0.00 0.00 392.30 7950.20  2135.60 50655.15 12.66 36.93    3.87 13.05 3.42 0.08 69.34

dm-1              0.00 0.00 0.00    0.30 0.00 1.20   8.00 0.06 183.67 0.00  183.67 61.67 1.85



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 1.40 370.93 6775.42  2557.04 46184.22 13.64 43.43    6.12 11.60 5.82 0.10 73.25

dm-0              0.00 0.00 370.93 6738.76  2557.04 47029.62 13.95 43.77    6.20 11.64 5.90 0.10 73.41

dm-1              0.00 0.00 0.00    0.30 0.00 1.20   8.00 0.03 107.00 0.00  107.00 35.67 1.07



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.00 299.80 7253.35  1916.88 52766.38 14.48 30.44    4.59 15.62 4.14 0.10 72.09

dm-0              0.00 0.00 299.80 7198.60  1916.88 51064.24 14.13 30.68    4.66 15.70 4.20 0.10 72.57

dm-1              0.00 0.00 0.00    0.00 0.00 0.00   0.00 0.00 0.00 0.00    0.00 0.00 0.00



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.10 215.50 8939.60  1027.60 67497.10 14.97 59.65    6.52 27.98 6.00 0.08 72.50

dm-0              0.00 0.00 215.50 8889.20  1027.60 67495.90 15.05 60.07    6.60 28.09 6.08 0.08 72.75

dm-1              0.00 0.00 0.00    0.30 0.00 1.20   8.00 0.01 32.33 0.00   32.33 30.33 0.91



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.90 140.40 8922.10   625.20 54709.80 12.21 11.29    1.25 9.88 1.11 0.08 68.60

dm-0              0.00 0.00 140.40 8871.50   625.20 54708.60 12.28 11.39    1.26 9.92 1.13 0.08 68.83

dm-1              0.00 0.00 0.00    0.30 0.00 1.20   8.00 0.01 27.33 0.00   27.33 9.33 0.28



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 1.70 284.50 8621.30 24228.40 51535.75    17.01 34.14 3.27 8.19 3.11 0.08 72.78

dm-0              0.00 0.00 290.90 8587.10 25047.60 53434.95    17.68 34.28 3.29 8.02 3.13 0.08 73.47

dm-1              0.00 0.00 0.00    2.00 0.00 8.00   8.00 0.83 416.45 0.00  416.45 63.60 12.72



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.30 851.60 11018.80 17723.60 85165.90    17.34 142.59 12.44 7.61 12.81 0.08 99.75

dm-0              0.00 0.00 845.20 10938.90 16904.40 83258.70    17.00 143.44 12.61 7.67 12.99 0.08 99.75

dm-1              0.00 0.00 0.00    0.00 0.00 0.00   0.00 0.00 0.00 0.00    0.00 0.00 0.00



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 1.10 24.60 12965.40   420.80 51114.45 7.93 39.44    3.04 0.33 3.04 0.07 93.39

dm-0              0.00 0.00 24.60 12890.20   420.80 51114.45 7.98 40.23    3.12 0.33 3.12 0.07 93.35

dm-1              0.00 0.00 0.00    0.00 0.00 0.00   0.00 0.00 0.00 0.00    0.00 0.00 0.00



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.00 3.60 13420.70    57.60 51942.00 7.75 0.95   0.07 0.33 0.07 0.07 92.11

dm-0              0.00 0.00 3.60 13341.10    57.60 51942.00 7.79 0.95   0.07 0.33 0.07 0.07 92.08

dm-1              0.00 0.00 0.00    0.00 0.00 0.00   0.00 0.00 0.00 0.00    0.00 0.00 0.00

The result above displays the high IO utilization and a high writes. It also reveals that the average queue size and average request size are moving which means it's an indication of a high workload. In these cases, you need to determine if there are external processes that cause MySQL to choke the replication threads.

How Can ClusterControl Help?

With ClusterControl, dealing with slave lag and determining the culprit is very easy and efficient. It directly tells you in the web UI, see below:

It reveals to you the current slave lag your slave nodes are experiencing. Not only that, with SCUMM dashboards, if enabled, provides you more insights of what your slave node's health or even the whole cluster is doing:

ClusterControl Replication Slave Dashboard
ClusterControl Cluster Overview Dashboard
ClusterControl Cluster Overview Dashboard

Not only that these things are available in ClusterControl, it does provide you also the capability to avoid bad queries from occurring with these features as seen below,

The redundant indexes allows you to determine that these indexes can cause performance issues for incoming queries that references the duplicate indexes. It also tells you tables that have no Primary Keys which are usually a common problem of slave lag when a certain SQL query or transactions that references big tables without primary or unique keys when it's replicated to the slaves.

Conclusion

Dealing with MySQL Replication lag is a frequent problem in a master-slave replication setup. It can be easy to diagnose, but difficult to solve. Make sure you have your tables with primary key or unique key existing, and determine the steps and tools on how to troubleshoot and diagnose the cause of slave lag. Efficiency is always the key when solving problems though.

by Paul Namuag at February 12, 2020 07:34 PM

February 11, 2020

SeveralNines

How Do I Know if My PostgreSQL Backup is Good?

Backups are a must in all Disaster Recovery Plan. It might not always be enough to guarantee an acceptable Recovery Point Objective, but is a good first approach. The problem is what happens if, in case of failure, you need to use this backup, and it’s not usable for some reason? Probably you don’t want to be in that situation, so, in this blog, we’ll see how to confirm if your backup is good to use.

Types of PostgreSQL Backups

Let’s start talking about the different types of backups. There are different types, but in general, we can separate it in two simple categories:

  • Logical: The backup is stored in a human-readable format like SQL.
  • Physical: The backup contains binary data.

Why are we mentioning this? Because we’ll see that there are some checks we can do for one type and not for the other one.

Checking the Backup Logs

The first way to confirm if everything goes fine is by checking the backup logs.

The simplest command to run a PostgreSQL backup could be for example:

$ pg_dumpall > /path/to/dump.sql

But, how can I know if there was an error when the command was running? You can just add to send the output to some specific log file:

$ pg_dumpall > /path/to/dump.sql > /var/log/postgres/pg_dump.log

So, you can add this line in the server cron to run it every day:

30 0 * * * pg_dumpall > /path/to/dump.sql > /var/log/postgres/pg_dump.log

And you should monitor the log file to look for errors, for example, adding it into some monitoring tool like Nagios.

Checking logs is not enough to confirm that the backup will work, because for example, if the backup file is corrupted for some reason, you probably won’t see that in the log file.

Checking the Backup Content

If you are using logical backups, you can verify the content of the backup file, to confirm you have all databases there.

You can list your current PostgreSQL databases using, for example, this command:

$ psql -l | awk '{ print $1 }'| awk 'FNR > 3' |grep '^[a-zA-Z0-9]' |grep -v 'template0'

postgres

template1

world

And check which databases you have in the backup file:

$ grep '^[\]connect' /path/to/dump.sql |awk '{print $2}'

template1

postgres

world

The problem with this check is you don’t check the size or data, so it could be possible that you have some data loss if there was some error when the backup was executed.

Restoring to Check the Backup Manually

The most secure way to confirm if a backup is working is restoring it and access the database.

After the backup is completed, you can restore it manually in another host by copying the dump file and running for example:

$ psql -f /path/to/dump.sql postgres

Then, you can access it and check the databases:

$ psql

postgres=# \l

                                  List of databases

   Name    | Owner   | Encoding |   Collate | Ctype    | Access privileges

-----------+----------+----------+-------------+-------------+-----------------------

 postgres  | postgres | UTF8     | en_US.utf-8 | en_US.utf-8 |

 template0 | postgres | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/postgres          +

           |          | |             | | postgres=CTc/postgres

 template1 | postgres | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/postgres          +

           |          | |             | | postgres=CTc/postgres

 world     | postgres | UTF8     | en_US.utf-8 | en_US.utf-8 |

(4 rows)

The problem with this method is, of course, you should run it manually, or find a way to automate this, which could be a time-consuming task.

Automatic ClusterControl Backup Verification

Now, let’s see how ClusterControl can automate the verification of PostgreSQL backups and help avoid any surprises or manual tasks.

In ClusterControl, select your cluster and go to the "Backup" section, then, select “Create Backup”.

The automatic verify backup feature is available for the scheduled backups. So, let’s choose the “Schedule Backup” option.

When scheduling a backup, in addition to selecting the common options like method or storage, you also need to specify schedule/frequency.

In the next step, you can compress and encrypt your backup and specify the retention period. Here, you also have the “Verify Backup” feature.

To use this feature, you need a dedicated host (or VM) that is not part of the cluster.

ClusterControl will install the software and it’ll restore the backup in this host. After restoring, you can see the verification icon in the ClusterControl Backup section.

Conclusion

As we mentioned, backups are mandatory in any environment, but backup is not a backup if you can’t use it. So, you should make sure that your backup is useful in case you need it one day. In this blog, we showed different ways to check your backup to avoid problems when you want to restore it.

 

by Sebastian Insausti at February 11, 2020 06:51 PM

Federico Razzoli

Use cases for MariaDB Invisible Columns

Invisible columns are columns that are not returned by a SELECT *. Their use cases are not obvious.

by Federico Razzoli at February 11, 2020 09:47 AM

February 10, 2020

SeveralNines

How to Protect your MySQL or MariaDB Database From SQL Injection: Part One

Security is one of the most important elements of the properly designed database environment. There are numerous attack vectors used with SQL injection being probably the most popular one. You can design layers of defence in the application code but what can you do on the database layer? Today we would like to show you how easily you can implement SQL firewall on top of MySQL using ProxySQL. In the second part of this blog we will explain how you can create a whitelist of queries that are allowed to access the database.

First, we want to deploy ProxySQL. The easiest way to do it is to use ClusterControl. With a couple of clicks you can deploy it to your cluster.

Deploy ProxySQL to Database Cluster

Define where to deploy it, you can either pick existing host in the cluster or just write down any IP or hostname. Set credentials for administrative and monitoring users.

Then you can create a new user in the database to be used with ProxySQL or you can import one of the existing ones. You also need to define the database nodes you want to include in the ProxySQL. Answer if you use implicit transactions or not and you are all set to deploy ProxySQL. In a couple of minutes a ProxySQL with configuration prepared based on your input is ready to use.

Given our issue is security, we want to be able to tell ProxySQL how to handle inappropriate queries. Let’s take a look at the query rules, the core mechanism that governs how ProxySQL handles the traffic that passes through it. The list of query rules may look like this:

They are being applied from the lowest ID onwards.

Let’s try to create a query rule which will allow only SELECT queries for a particular user:

We are adding a query rule at the beginning of the rules list. We are going to match anything that is not SELECTs (please note Negate Match Pattern is enabled). The query rule will be used only when the username is ‘devuser’. If all the conditions are matched, the user will see the error as in the “Error Msg” field.

root@vagrant:~# mysql -u devuser -h 10.0.0.144 -P6033 -ppass

mysql: [Warning] Using a password on the command line interface can be insecure.

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 3024

Server version: 5.5.30 (ProxySQL)



Copyright (c) 2009-2019 Percona LLC and/or its affiliates

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.



Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.



Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.



mysql> create schema myschema;

ERROR 1148 (42000): The query is not allowed

mysql> SELECT 1;

+---+

| 1 |

+---+

| 1 |

+---+

1 row in set (0.01 sec)



mysql> SELECT * FROM sbtest.sbtest1 LIMIT 1\G

*************************** 1. row ***************************

 id: 1

  k: 503019

  c: 18034632456-32298647298-82351096178-60420120042-90070228681-93395382793-96740777141-18710455882-88896678134-41810932745

pad: 43683718329-48150560094-43449649167-51455516141-06448225399

1 row in set (0.00 sec)

Another example, this time we will try to prevent accidents related to the Bobby Tables situation.

With this query rule in place, your ‘students’ table won’t be dropped by Bobby:

mysql> use school;

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A



Database changed

mysql> INSERT INTO students VALUES (1, 'Robert');DROP TABLE students;--

Query OK, 1 row affected (0.01 sec)



ERROR 1148 (42000): Only superuser can execute DROP TABLE;

As you can see, Bobby was not able to remove our ‘students’ table. He was only nicely inserted into the table.

 

by krzysztof at February 10, 2020 08:58 PM

MariaDB Foundation

FOSDEM 2020: Some memories and todos

FOSDEM gives energy. FOSDEM gives ideas. FOSDEM opens up opportunities, FOSDEM allows you to connect with old friends and colleagues. Hence, no big surprise that MariaDB Foundation attended FOSDEM, in order to promote Open Source and to get ourselves closer to the community. […]

The post FOSDEM 2020: Some memories and todos appeared first on MariaDB.org.

by Kaj Arnö at February 10, 2020 03:10 PM

February 08, 2020

Valeriy Kravchuk

Fun with Bugs #93 - On MySQL Bug Reports I am Subscribed to, Part XXVII

No matter what I write and present about dynamic tracing, blog posts about MySQL bugs are more popular based on statistics. So, to make more readers happy, I'd like to continue my review of interesting bugs reported in November with this post on bugs reported during December, 2019.

As usual, I'll try to concentrate on bug reports related to InnoDB, replication and optimizer, but some other categories also got my attention:
  • Bug #97911 - "[FATAL] Semaphore wait has lasted > 600 seconds. We intentionally crash the serv...". This bug got marked as a duplicate of other, older long semaphore wait bug (in "Can't repeat" status!) without much analysis. I think all Oracle engineers who added comments to that bug missed one interesting point:
    ... [ERROR] [MY-012872] [InnoDB] Semaphore wait has lasted > 39052544 seconds.
    even though bug reporter highlighted it in a comment. Reported wait time is a problem and is surely a bug, no matter how to reproduce the long wait itself and what is its root cause!
  • Bug #97913 - "Undo logs growing during partitioning ALTER queries". This bug (affecting only MySQL 5.7.x) was reported by Przemyslaw Malkowski from Percona, who also presented useful examples of monitoring queries to the information_schema.innodb_metrics and performance_schema. Check also comments that may explain why 8.0 is not affected in a similar way.
  • Bug #97935 - "Memory leak in client connection using information_schema". It took some efforts (starting from but not limited to Valgrind Massif profiling of heap memory usage) and time for Daniel Nichter to prove the point and get this bug "Verified". It is also not clear if MySQL 8 is also affected.
  • Bug #97950 - "buf_read_page_handle_error can trigger assert failure". Bug reporter, Shu Lin, tried his best to make the point. It's clear enough how to repeat this, and one could use one of documented test synchonisation methods if gdb is too much for bug verification. I do not think this bug was handled properly or got the level of attention it truly deserved.
  • Bug #97966 - "XA COMMIT in another session will not write binlog event". This bug was reported by Lujun Wang and immediately verified, but again with no documented check if MySQL 8 is affected. This happens too often, unfortunately.
  • Bug #97971 - "Roles not handling column level privileges correctly; Can SELECT, but not UPDATE". Clear and simple bug report with a workaround from Travis Bement. It was immediately verified.
  • Bug #98014 - "Lossy implicit conversion in conditional breaks ONLY_FULL_GROUP_BY". Yet another case of (IMHO) improper bug processing. The argument presented (from the manual):
    "MySQL 5.7.5 and later also permits a nonaggregate column not named in a GROUP BY clause when ONLY_FULL_GROUP_BY SQL mode is enabled, provided that this column is limited to a single value"
    does not apply, as "single value" for = 0 is NOT selected, we have multiple Host values matching it due to conversion. This is how proper version (guess what it is) works:
    mysql> SELECT User, Host, COUNT(*) FROM mysql.user WHERE Host = 0 GROUP BY 1;
    ERROR 1055 (42000): 'mysql.user.Host' isn't in GROUP BY
    mysql> select @@sql_mode;
    +--------------------+
    | @@sql_mode         |
    +--------------------+
    | ONLY_FULL_GROUP_BY |
    +--------------------+
    1 row in set (0.001 sec)

    mysql> set session sql_mode='';
    Query OK, 0 rows affected (0.029 sec)

    mysql> SELECT User, Host, COUNT(*) FROM mysql.user WHERE Host = 0 GROUP BY 1;+---------------+-----------+----------+
    | User          | Host      | COUNT(*) |
    +---------------+-----------+----------+
    | data_engineer |           |        1 |
    | en            | localhost |        1 |
    | ro1           |           |        1 |
    | ro2           |           |        1 |
    | role-1        |           |        1 |
    | root          | ::1       |        3 |
    ...
    | user1         | %         |        1 |
    +---------------+-----------+----------+
    13 rows in set, 17 warnings (0.003 sec)
    I think this bug reported by Joshua Varner must be verified.
  • Bug #98046 - "Inconsistent behavior while logging a killed query in the slow query log". Bug reporter, Pranay Motupalli, provided a clear test case and a detailed analysis, including the gdb debugging session that proves the point. Nice bug report.
  • Bug #98055 - "MySQL Optimizer Bug not picking right index". Both the bug reporter (Sudheer Gadipathi) and engineer who verified the bug stated that MySQL 8.0.x is similarly affected (UNIQUE key is preferred for the partitioned table, even though there is a better non-unique index). But 8.0.x is NOT listed in the "Version:" filed. Weird.
  • Bug #98068 - "SELECT FOR UPDATE not-exist table in PERFORMANCE SCHEMA reports confusing error". This is a funny (but still a regression) bug report by William ZHANG. Proper versions work like this:
    mysql> select database();+--------------------+
    | database()         |
    +--------------------+
    | performance_schema |
    +--------------------+
    1 row in set (0.001 sec)

    mysql> select * from not_exist_table;
    ERROR 1146 (42S02): Table 'performance_schema.not_exist_table' doesn't exist
    mysql> select * from not_exist_table for update;
    ERROR 1146 (42S02): Table 'performance_schema.not_exist_table' doesn't exist
  • Bug #98072 - "innochecksum summary shows blob pages as other type of page for 8.0 tables". The bug was reported by SERGEY KUZMICHEV. This time "regression" tag is missing, even though it's clearly stated that MySQL 5.7 worked differently. This is from the proper version:
    ...
    File::..\data\test\blob_test.ibd
    ================PAGE TYPE SUMMARY==============
    #PAGE_COUNT     PAGE_TYPE
    ===============================================
           1        Index page
           0        Undo log page
           1        Inode page
           0        Insert buffer free list page
         508        Freshly allocated page
           1        Insert buffer bitmap
           0        System page
           0        Transaction system page
           1        File Space Header
           0        Extent descriptor page
          64        BLOB page
           0        Compressed BLOB page
           0        Page compressed page
           0        Page compressed encrypted page
           0        Other type of page

    ...
  • Bug #98083 - "Restarting the computer when deleting the database will cause directory residues". One would expect that MySQL 8 with a data dictionary should have some means to figure out the remaining database directory for a dropped database upon startup (as it stores information about databases elsewhere) and do proper cleanup. I think this bug reported by Jinming Liao must be verified and fixed. There is no "... manual creation or deletion of tables or databases..." involved in this case.
  • Bug #98091 - "InnoDB does not initialize raw disk partitions". As simple as that and both 5.7.29 and 8.0.219 are surely affected. It was not always the case, I've used raw devices myself with older MySQL versions, so this bug reported by Saverio M is a regression. Still, "regression" tag is missing.
That's all bugs reported in December, 2019 that I cared to subscribe to  and mention here. Next time I'll check bugs reported in January, 2020. There are at least 16 in my list already, so stay tuned.

Follow the links in this post to get more details about profiling and creating off-CPU FlameGraphs for MySQL. This post is devoted to bugs, though :)


To summarize:
  1. I am happy to see bug reports from people whom I never noticed before. MySQL Community is alive.
  2. Some flexibility in following common sense based bugs verification procedures is still visible. Bugs reported for 5.7 are not checked on 8.0 (or the results of this check are not documented in public), nobody cares to read what bug reporter says carefully or go extra mile, "regression" tag not added, and so on. 
  3. Probably at this stage my writings are mostly ignored by Oracle's decision makers. But I keep watching them all anyway.

by Valerii Kravchuk (noreply@blogger.com) at February 08, 2020 05:25 PM

February 07, 2020

SeveralNines

How to Identify MySQL Performance Issues with Slow Queries

Performance issues are common problems when administering MySQL databases. Sometimes these problems are, in fact, due to slow queries. In this blog, we'll deal with slow queries and how to identify these.

Checking Your Slow Query Logs

MySQL has the capability to filter and log slow queries. There are various ways you can investigate these, but the most common and efficient way is to use the slow query logs. 

You need to determine first if your slow query logs are enabled. To deal with this, you can go to your server and query the following variable:

MariaDB [(none)]> show global variables like 'slow%log%';

+---------------------+-------------------------------+

| Variable_name       | Value           |

+---------------------+-------------------------------+

| slow_query_log      | ON           |

| slow_query_log_file | /var/log/mysql/mysql-slow.log |

+---------------------+-------------------------------+

2 rows in set (0.001 sec)

You must ensure that the variable slow_query_log is set to ON, while the slow_query_log_file determines the path where you need to place your slow query logs. If this variable is not set, it will use the DATA_DIR of your MySQL data directory.

Accompanied by the slow_query_log variable are the long_query_time and min_examined_row_limit which impacts how the slow query logging works. Basically, the slow query logs work as SQL statements that take more than long_query_time seconds to execute and also require at least min_examined_row_limit rows to be examined. It can be used to find queries that take a long time to execute and are therefore candidates for optimization and then you can use external tools to bring the report for you, which will talk later.

By default, administrative statements (ALTER TABLE, ANALYZE TABLE, CHECK TABLE, CREATE INDEX, DROP INDEX, OPTIMIZE TABLE, and REPAIR TABLE) do not fall into slow query logs. In order to do this, you need to enable variable log_slow_admin_statements

Querying Process List and InnoDB Status Monitor

In a normal DBA routine, this step is the most common way to determine the long running queries or active running queries that causes performance degradation. It might even cause your server to be stuck followed by piled up queues that are slowly increasing due to a lock covered by a running query. You can just simply run,

SHOW [FULL] PROCESSLIST;

or

SHOW ENGINE INNODB STATUS \G

If you are using ClusterControl, you can find it by using <select your MySQL cluster> → Performance → InnoDB Status just like below,

or using <select your MySQL cluster> → Query Monitor → Running Queries (which will discuss later) to view the active processes, just like how a SHOW PROCESSLIST works but with better control of the queries.

Analyzing MySQL Queries

The slow query logs will show you a list of  queries that have been identified as slow, based on the given values in the system variables as mentioned earlier. The slow queries definition might differ in different cases since there are certain occasions that even a 10 second query is acceptable and still not slow. However, if your application is an OLTP, it's very common that a 10 second or even a 5 second query is an issue or causes performance degradation to your database. MySQL query log does help you this but it's not enough to open the log file as it does not provide you an overview of what are those queries, how they perform, and what are the frequency of their occurrence. Hence, third party tools can help you with these.

pt-query-digest

Using Percona Toolkit, which I can say the most common DBA tool, is to use pt-query-digest. pt-query-digest provides you a clean overview of a specific report coming from your slow query log. For example, this specific report shows a clean perspective of understanding the slow query reports in a specific node:

# A software update is available:



# 100ms user time, 100ms system time, 29.12M rss, 242.41M vsz

# Current date: Mon Feb  3 20:26:11 2020

# Hostname: testnode7

# Files: /var/log/mysql/mysql-slow.log

# Overall: 24 total, 14 unique, 0.00 QPS, 0.02x concurrency ______________

# Time range: 2019-12-12T10:01:16 to 2019-12-12T15:31:46

# Attribute          total min max     avg 95% stddev median

# ============     ======= ======= ======= ======= ======= ======= =======

# Exec time           345s 1s 98s   14s 30s 19s 7s

# Lock time             1s 0 1s 58ms    24ms 252ms 786us

# Rows sent          5.72M 0 1.91M 244.14k   1.86M 629.44k 0

# Rows examine      15.26M 0 1.91M 651.23k   1.86M 710.58k 961.27k

# Rows affecte       9.54M 0 1.91M 406.90k 961.27k 546.96k       0

# Bytes sent       305.81M 11 124.83M  12.74M 87.73M 33.48M 56.92

# Query size         1.20k 25 244   51.17 59.77 40.60 38.53



# Profile

# Rank Query ID                         Response time Calls R/Call V/M   

# ==== ================================ ============= ===== ======= ===== 

#    1 0x00C8412332B2795DADF0E55C163... 98.0337 28.4%     1 98.0337 0.00 UPDATE sbtest?

#    2 0xDEF289292EA9B2602DC12F70C7A... 74.1314 21.5%     3 24.7105 6.34 ALTER TABLE sbtest? sbtest3

#    3 0x148D575F62575A20AB9E67E41C3... 37.3039 10.8%     6 6.2173 0.23 INSERT SELECT sbtest? sbtest

#    4 0xD76A930681F1B4CC9F748B4398B... 32.8019  9.5% 3 10.9340 4.24 SELECT sbtest?

#    5 0x7B9A47FF6967FD905289042DD3B... 20.6685  6.0% 1 20.6685 0.00 ALTER TABLE sbtest? sbtest3

#    6 0xD1834E96EEFF8AC871D51192D8F... 19.0787  5.5% 1 19.0787 0.00 CREATE

#    7 0x2112E77F825903ED18028C7EA76... 18.7133  5.4% 1 18.7133 0.00 ALTER TABLE sbtest? sbtest3

#    8 0xC37F2569578627487D948026820... 15.0177  4.3% 2 7.5088 0.00 INSERT SELECT sbtest? sbtest

#    9 0xDE43B2066A66AFA881D6D45C188... 13.7180  4.0% 1 13.7180 0.00 ALTER TABLE sbtest? sbtest3

# MISC 0xMISC                           15.8605 4.6% 5 3.1721 0.0 <5 ITEMS>



# Query 1: 0 QPS, 0x concurrency, ID 0x00C8412332B2795DADF0E55C1631626D at byte 5319

# Scores: V/M = 0.00

# Time range: all events occurred at 2019-12-12T13:23:15

# Attribute    pct total min     max avg 95% stddev  median

# ============ === ======= ======= ======= ======= ======= ======= =======

# Count          4 1

# Exec time     28 98s 98s     98s 98s 98s   0 98s

# Lock time      1 25ms 25ms    25ms 25ms 25ms       0 25ms

# Rows sent      0 0 0       0 0 0 0       0

# Rows examine  12 1.91M 1.91M   1.91M 1.91M 1.91M       0 1.91M

# Rows affecte  20 1.91M 1.91M   1.91M 1.91M 1.91M       0 1.91M

# Bytes sent     0 67 67      67 67 67   0 67

# Query size     7 89 89      89 89 89   0 89

# String:

# Databases    test

# Hosts        localhost

# Last errno   0

# Users        root

# Query_time distribution

#   1us

#  10us

# 100us

#   1ms

#  10ms

# 100ms

#    1s

#  10s+  ################################################################

# Tables

#    SHOW TABLE STATUS FROM `test` LIKE 'sbtest3'\G

#    SHOW CREATE TABLE `test`.`sbtest3`\G

update sbtest3 set c=substring(MD5(RAND()), -16), pad=substring(MD5(RAND()), -16) where 1\G

# Converted for EXPLAIN

# EXPLAIN /*!50100 PARTITIONS*/

select  c=substring(MD5(RAND()), -16), pad=substring(MD5(RAND()), -16) from sbtest3 where  1\G



# Query 2: 0.00 QPS, 0.01x concurrency, ID 0xDEF289292EA9B2602DC12F70C7A041A9 at byte 3775

# Scores: V/M = 6.34

# Time range: 2019-12-12T12:41:47 to 2019-12-12T15:25:14

# Attribute    pct total min     max avg 95% stddev  median

# ============ === ======= ======= ======= ======= ======= ======= =======

# Count         12 3

# Exec time     21 74s 6s     36s 25s 35s 13s     30s

# Lock time      0 13ms 1ms     8ms 4ms 8ms   3ms 3ms

# Rows sent      0 0 0       0 0 0 0       0

# Rows examine   0 0 0       0 0 0 0       0

# Rows affecte   0 0 0       0 0 0 0       0

# Bytes sent     0 144 44      50 48 49.17   3 49.17

# Query size     8 99 33      33 33 33   0 33

# String:

# Databases    test

# Hosts        localhost

# Last errno   0 (2/66%), 1317 (1/33%)

# Users        root

# Query_time distribution

#   1us

#  10us

# 100us

#   1ms

#  10ms

# 100ms

#    1s ################################

#  10s+  ################################################################

# Tables

#    SHOW TABLE STATUS FROM `test` LIKE 'sbtest3'\G

#    SHOW CREATE TABLE `test`.`sbtest3`\G

ALTER TABLE sbtest3 ENGINE=INNODB\G

Using performance_schema

Slow query logs might be an issue if you don't have direct access to the file such as using RDS or using fully-managed database services such Google Cloud SQL or Azure SQL. Although it might need you some variables to enable these features, it comes handy when querying for queries logged into your system. You can order it by using a standard SQL statement in order to retrieve a partial result. For example,

mysql> SELECT SCHEMA_NAME, DIGEST, DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT/1000000000000 SUM_TIMER_WAIT_SEC, MIN_TIMER_WAIT/1000000000000 MIN_TIMER_WAIT_SEC, AVG_TIMER_WAIT/1000000000000 AVG_TIMER_WAIT_SEC, MAX_TIMER_WAIT/1000000000000 MAX_TIMER_WAIT_SEC, SUM_LOCK_TIME/1000000000000 SUM_LOCK_TIME_SEC, FIRST_SEEN, LAST_SEEN FROM events_statements_summary_by_digest;

+--------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------+--------------------+--------------------+--------------------+-------------------+---------------------+---------------------+

| SCHEMA_NAME        | DIGEST               | DIGEST_TEXT                                                                                                                                                                                                                                                                                                                               | COUNT_STAR | SUM_TIMER_WAIT_SEC | MIN_TIMER_WAIT_SEC | AVG_TIMER_WAIT_SEC | MAX_TIMER_WAIT_SEC | SUM_LOCK_TIME_SEC | FIRST_SEEN | LAST_SEEN |

+--------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------+--------------------+--------------------+--------------------+-------------------+---------------------+---------------------+

| NULL               | 390669f3d1f72317dab6deb40322d119 | SELECT @@`skip_networking` , @@`skip_name_resolve` , @@`have_ssl` = ? , @@`ssl_key` , @@`ssl_ca` , @@`ssl_capath` , @@`ssl_cert` , @@`ssl_cipher` , @@`ssl_crl` , @@`ssl_crlpath` , @@`tls_version`                                                                                                                                                             | 1 | 0.0373 | 0.0373 | 0.0373 | 0.0373 | 0.0000 | 2020-02-03 20:22:54 | 2020-02-03 20:22:54 |

| NULL               | fba95d44e3d0a9802dd534c782314352 | SELECT `UNIX_TIMESTAMP` ( )                                                                                                                                                                                                                                                                                                                                     | 2 | 0.0002 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | 18c649da485456d6cdf12e4e6b0350e9 | SELECT @@GLOBAL . `SERVER_ID`                                                                                                                                                                                                                                                                                                                                   | 2 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | dd356b8a5a6ed0d7aee2abd939cdb6c9 | SET @? = ?                                                                                                                                                                                                                                                                                                                                                      | 6 | 0.0003 | 0.0000 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | 1c5ae643e930af6d069845d74729760d | SET @? = @@GLOBAL . `binlog_checksum`                                                                                                                                                                                                                                                                                                                           | 2 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | ad5208ffa004a6ad7e26011b73cbfb4c | SELECT @?                                                                                                                                                                                                                                                                                                                                                       | 2 | 0.0001 | 0.0000 | 0.0000 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | ed0d1eb982c106d4231b816539652907 | SELECT @@GLOBAL . `GTID_MODE`                                                                                                                                                                                                                                                                                                                                   | 2 | 0.0001 | 0.0000 | 0.0000 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | cb47e22372fdd4441486b02c133fb94f | SELECT @@GLOBAL . `SERVER_UUID`                                                                                                                                                                                                                                                                                                                                 | 2 | 0.0001 | 0.0000 | 0.0000 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | 73301368c301db5d2e3db5626a21b647 | SELECT @@GLOBAL . `rpl_semi_sync_master_enabled`                                                                                                                                                                                                                                                                                                                | 2 | 0.0001 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | 0ff7375c5f076ba5c040e78a9250a659 | SELECT @@`version_comment` LIMIT ?                                                                                                                                                                                                                                                                                                                              | 1 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:45:59 | 2020-02-03 20:45:59 |

| NULL               | 5820f411e67a393f987c6be5d81a011d | SHOW TABLES FROM `performance_schema`                                                                                                                                                                                                                                                                                                                           | 1 | 0.0008 | 0.0008 | 0.0008 | 0.0008 | 0.0002 | 2020-02-03 20:46:11 | 2020-02-03 20:46:11 |

| NULL               | a022a0ab966c51eb820da1521349c7ef | SELECT SCHEMA ( )                                                                                                                                                                                                                                                                                                                                               | 1 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0000 | 2020-02-03 20:46:29 | 2020-02-03 20:46:29 |

| performance_schema | e4833a7c1365b0b4492e9a514f7b3bd4 | SHOW SCHEMAS                                                                                                                                                                                                                                                                                                                                                    | 1 | 0.1167 | 0.1167 | 0.1167 | 0.1167 | 0.0001 | 2020-02-03 20:46:29 | 2020-02-03 20:46:29 |

| performance_schema | 1107f048fe6d970cb6a553bd4727a1b4 | SHOW TABLES                                                                                                                                                                                                                                                                                                                                                     | 1 | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0000 | 2020-02-03 20:46:29 | 2020-02-03 20:46:29 |

...

You can use the table performance_schema.events_statements_summary_by_digest. Although there are chances that the entries on the tables from performance_schema will be flush, you can decide to save this in a specific table. Take a look at this external post from Percona MySQL query digest with Performance Schema

In case you're wondering why we need to divide the wait time columns (SUM_TIMER_WAIT, MIN_TIMER_WAIT_SEC, AVG_TIMER_WAIT_SEC), these columns are using picoseconds so you might need to do some math or some round ups to make it more readable to you.

Analyzing Slow Queries Using ClusterControl

If you are using ClusterControl, there are different ways to deal with this. For example, in a MariaDB Cluster I have below, it shows you the following tab (Query Monitor) and it's drop-down items (Top Queries, Running Queries, Query Outliers):

  • Top Queries -   aggregated list of all your top queries running on all the nodes of your database cluster
  • Running Queries - View current running queries on your database cluster similar to SHOW FULL PROCESSLIST command in MySQL
  • Query Outliers - Shows queries that are outliers. An outlier is a query taking longer time than the normal query of that type.

On top of that, ClusterControl also captures query performance using graphs which provides you a quick overlook of how your database system performs in relation to query performance. See below,

Wait, it's not over yet. ClusterControl also offers a high resolution metric using Prometheus and showcases very detailed metrics and captures real-time statistics from the server. We have discussed this in our previous blogs which are divided into two part series of blog. Check out part 1 and then the part 2 blogs. It offers you on how to efficiently monitor not only the slow queries but the overall performance of your MySQL, MariaDB, or Percona database servers. 

There are also other tools in ClusterControl which provide pointers and hints that can cause slow query performance even if it's not yet occurred or captured by the slow query log. Check out the Performance Tab as seen below,

these items provides you the following:

  • Overview - You can view graphs of different database counters on this page
  • Advisors - Lists of scheduled advisors’ results created in ClusterControl > Manage > Developer Studio using ClusterControl DSL.
  • DB Status - DB Status provides a quick overview of MySQL status across all your database nodes, similar to SHOW STATUS statement
  • DB Variables - DB Variables provide a quick overview of MySQL variables that are set across all your database nodes, similar to SHOW GLOBAL VARIABLES statement
  • DB Growth - Provides a summary of your database and table growth on daily basis for the last 30 days. 
  • InnoDB Status - Fetches the current InnoDB monitor output for selected host, similar to SHOW ENGINE INNODB STATUS command.
  • Schema Analyzer - Analyzes your database schemas for missing primary keys, redundant indexes and tables using the MyISAM storage engine. 
  • Transaction Log - Lists out long-running transactions and deadlocks across database cluster where you can easily view what transactions are causing the deadlocks. The default query time threshold is 30 seconds.

Conclusion

Tracing your MySQL Performance issue is not really difficult with MySQL. There are various external tools that provide you the efficiency and capabilities that you are looking for. The most important thing is that, it's easy to use and you are able to provide productivity at work. Fix the most outstanding issues or even avoid a certain disaster before it can happen.

by Paul Namuag at February 07, 2020 08:53 PM

February 06, 2020

SeveralNines

My MySQL Database is Out of Disk Space

When the MySQL server ran out of disk space, you would see one of the following error in your application (as well as in the MySQL error log):

ERROR 3 (HY000) at line 1: Error writing file '/tmp/AY0Wn7vA' (Errcode: 28 - No space left on device)

For binary log:

[ERROR] [MY-000035] [Server] Disk is full writing './binlog.000019' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.

For relay log:

[ERROR] [MY-000035] [Server] Disk is full writing './relay-bin.000007' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.

For slow query log:

[ERROR] [MY-011263] [Server] Could not use /var/log/mysql/mysql-slow.log for logging (error 28 - No space left on device). Turning logging off for the server process. To turn it on again: fix the cause, then either restart the query logging by using "SET GLOBAL SLOW_QUERY_LOG=ON" or restart the MySQL server.

For InnoDB:

[ERROR] [MY-012144] [InnoDB] posix_fallocate(): Failed to preallocate data for file ./#innodb_temp/temp_8.ibt, desired size 16384 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/8.0/en/operating-system-error-codes.html
[Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
[ERROR] [MY-012639] [InnoDB] Write to file ./#innodb_temp/temp_8.ibt failed at offset 81920, 16384 bytes should have been written, only 0 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
[ERROR] [MY-012640] [InnoDB] Error number 28 means 'No space left on device'
[Warning] [MY-012145] [InnoDB] Error while writing 16384 zeroes to ./#

They are all reporting the same error code number which is 28. Alternatively, we can use the error code to see the actual error with perror command:

$ perror 28
OS error code  28: No space left on device

The above simply means the MySQL server is out of disk space, and most of the time MySQL is stopped or stalled at this point. In this blog post, we are going to look into ways to solve this issue for MySQL running in a Linux-based environment.

Troubleshooting

First of all, we have to determine which disk partition is full. MySQL can be configured to store data on different disk or partition. Look at the path as stated in the error to start with. In this example, our directory is located in the default location, /var/lib/mysql which is under the / partition. We can use df command and specify the full path to the datadir to get the partition the data is stored:

$ df -h /var/lib/mysql
Filesystem      Size Used Avail Use% Mounted on
/dev/sda1        40G 40G 20K 100% /

The above means we have to clear up some space in the root partition.

Temporary Workarounds

The temporary workaround is to clear up some disk space so MySQL can write to the disk and resume the operation. Things that we can do if we face this kind of problems are:

  • Remove unnecessary files
  • Purge binary logs
  • Drop old tables, or rebuild a very big table

Remove Unnecessary Files

This is commonly the first step to do if MySQL server is down or unresponsive, or you have no binary logs enabled. For example, files under /var/log/ are commonly the first place to look for unnecessary files:

$ cd /var/log
$ find . -type f -size +5M -exec du -sh {} +
8.1M ./audit/audit.log.6
8.1M ./audit/audit.log.5
8.1M ./audit/audit.log.4
8.1M ./audit/audit.log.3
8.1M ./audit/audit.log.2
8.1M ./audit/audit.log.1
11M ./audit/audit.log
8.5M ./secure-20190429
8.0M ./wtmp

The above example shows how to retrieve files that are bigger than 5MB. We can safely remove the rotated log files which are usually in {filename}.{number} format, for example audit.log.1 until audit.log.6. The same thing goes to any huge older backups that are stored in the server. If you had performed a restoration via Percona Xtrabackup or MariaDB Backup, all files prefixed with xtrabackup_ can be removed from the MySQL datadir, as they are no longer necessary for the restoration. The xtrabackup_logfile usually is the biggest file since it contains all transactions executed while the xtrabackup process copying the datadir to the destination. The following example shows all the related files in MySQL datadir:

$ ls -lah /var/lib/mysql | grep xtrabackup_
-rw-r-----.  1 mysql root   286 Feb 4 11:30 xtrabackup_binlog_info
-rw-r--r--.  1 mysql root    24 Feb 4 11:31 xtrabackup_binlog_pos_innodb
-rw-r-----.  1 mysql root    83 Feb 4 11:31 xtrabackup_checkpoints
-rw-r-----.  1 mysql root   808 Feb 4 11:30 xtrabackup_info
-rw-r-----.  1 mysql root  179M Feb 4 11:31 xtrabackup_logfile
-rw-r--r--.  1 mysql root     1 Feb 4 11:31 xtrabackup_master_key_id
-rw-r-----.  1 mysql root   248 Feb 4 11:31 xtrabackup_tablespaces

Therefore, the mentioned files are safe to be deleted. Start MySQL service once there is at least 10% more free space.

Purge the Binary Logs

If the MySQL server is still responsive and it has binary log enabled, e.g, for replication or point-in-time recovery, we can purge the old binary log files by using PURGE statement and provide the interval. In this example, we are deleting all binary logs before 3 days ago:

mysql> SHOW BINARY LOGS;
mysql> PURGE BINARY LOGS BEFORE DATE(NOW() - INTERVAL 3 DAY);
mysql> SHOW BINARY LOGS;

For MySQL Replication, it's safe to delete all logs that have been replicated and applied on slaves. Check the Relay_Master_Log_File value on the server:

mysql> SHOW SLAVE STATUS\G
...
        Relay_Master_Log_File: binlog.000008
...

And delete the older log files for example binlog.000007 and older. It's good practice to restart MySQL server to make sure that it has enough resources. We can also let the binary log rotation to happen automatically via expire_logs_days variable (<MySQL 8.0). For example, to keep only 3 days of binary logs, run the following statement:

mysql> SET GLOBAL expire_logs_days = 3;

Then, add the following line into MySQL configuration file under [mysqld] section:

expire_logs_days=3

In MySQL 8.0, use binlog_expire_logs_seconds instead, where the default value is 2592000 seconds (30 days). In this example, we reduce it to only 3 days (60 seconds x 60 minutes x 24 hours x 3 days):

mysql> SET GLOBAL binlog_expire_logs_seconds = (60*60*24*3);
mysql> SET PERSIST binlog_expire_logs_seconds = (60*60*24*3);

SET PERSIST will make sure the configuration is loaded in the next restart. Configuration set by this command is stored inside /var/lib/mysql/mysqld-auto.cnf.

Drop Old Tables / Rebuild Tables

Note that DELETE operation won't free up the disk space unless OPTIMIZE TABLE is executed afterward. Thus, if you have deleted many rows, and you would like to return the free space back to the OS after a huge DELETE operation, run the OPTIMIZE TABLE, or rebuild it. For example:

mysql> DELETE tbl_name WHERE id < 100000; -- remove 100K rows
mysql> OPTIMIZE TABLE tbl_name;

We can also force to rebuild a table by using ALTER statement:

mysql> ALTER TABLE tbl_name FORCE;
mysql> ALTER TABLE tbl_name; -- a.k.a "null" rebuild

Note that the above DDL operation is performed via online DDL, meaning MySQL permits concurrent DML operations while the rebuilding is ongoing. Another way to perform a defragmentation operation is to use mysqldump to dump the table to a text file, drop the table, and reload it from the dump file. Ultimately, we can also use DROP TABLE to remove the unused table or TRUNCATE TABLE to clear up all rows in the table, which consequently return the space back to the OS.

Permanent Solutions to Disk Space Issues

The permanent solution is of course adding more space to the corresponding disk or partition, or applying a shorter retention rule to keep unnecessary files in the server. If you are running on top of a scalable file storage system, you should be able to scale the resource up without too much hassle, or with minimal disruption and downtime to the MySQL service. To learn more on how to dimension your storage and understand MySQL and MariaDB capacity planning, check out this blog post.

You can be least worried with ClusterControl proactive monitoring, where you would get a warning notification when the disk space has reached 80%, and critical notification if the disk usage is 90% and higher.

by ashraf at February 06, 2020 08:19 PM

February 05, 2020

SeveralNines

Is My Database Vulnerable to Attack? A Security Checklist

Data is probably the most important asset in a company, so you should make sure your database is secured to avoid any possible data theft. It’s hard to create an environment that is 100% secure, but in this blog we’ll share a checklist to help you make your database as secure as possible.

Controlling Database Access

You should always restrict both physical and remote access.

  • Physical access (on-prem): Restrict unauthorized physical access to the database server.
  • Remote access: Limit the remote access to only the necessary people, and from the less amount of source possibles. Using a VPN to access it is definitely a must here.

Managing Database User Accounts

Depending on the technology, there are many ways to improve security for your user accounts.

  • Remove inactive users.
  • Grant only the necessary privileges.
  • Restrict the source for each user connection.
  • Define a secure password policy (or, depending on the technology, enable a plugin for this if there is one).

Secure Installations and Configurations

There are some changes to do to secure your database installation.

  • Install only the necessary packages and services on the server.
  • Change the default admin user password and restrict the usage from only the localhost.
  • Change the default port and specify the interface to listen in.
  • Enable password security policy plugin.
  • Configure SSL certificates to encrypt data in-transit.
  • Encrypt data at-rest (if it’s possible).
  • Configure the local firewall to allow access to the database port only from the local network (if it’s possible).

Employ a WAF to Avoid SQL Injections or DoS attack (Denial of Service)

These are the most common attacks to a database, and the most secure way to avoid it is by using a WAF (Web Application Firewall) to catch this kind of SQL queries or a SQL Proxy to analyze the traffic.

Keep Your OS and Database Up-to-Date

There are several fixes and improvements that the database vendor or the operating system release in order to fix or avoid vulnerabilities. It’s important to keep your system as up-to-date as possible applying patches and security upgrades.

Check CVE (Common Vulnerabilities and Exposures) Frequently

Every day, new vulnerabilities are detected for your database server. You should check it frequently to know if you need to apply a patch or change something in your configuration. One way to know it is by reviewing the CVE website, where you can find a list of vulnerabilities with a description, and you can look for your database version and vendor, to confirm if there is something critical to fix ASAP.

Conclusion

Following the tips above, your server will be safer, but unfortunately, there is always a risk of being hacked.

To minimize this risk, you should have a good monitoring system like ClusterControl, and run periodically some security scan tool looking for vulnerabilities like Nessus.

by Sebastian Insausti at February 05, 2020 07:40 PM

Percona

Observability Differences Between MySQL 8 and MariaDB 10.4

mysql mariadb observability

mysql mariadb observabilityI did a MariaDB Observability talk at MariaDB Day in Brussels, which  I roughly based on the MySQL 8 Observability talk I gave earlier in the year. This process pushed me to contrast MySQL and MariaDB observability.

In summary, there are a lot of differences that have accumulated through the years; a lot more than I expected.  Here are some highlights.

SHOW STATUS and SHOW VARIABLES

If you want to access SHOW [GLOBAL] STATUS output through tables, they have been moved to performance_schema in MySQL 8 but they are in  information_schema in MariaDB 10.4, meaning you need to use different queries.

mysql> select * from performance_schema.global_status where variable_name='questions';
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| Questions     | 401146958      |
+---------------+----------------+
1 row in set (0.00 sec)

MariaDB [(none)]> select * from information_schema.global_status where variable_name='questions';
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| QUESTIONS     | 21263834       |
+---------------+----------------+
1 row in set (0.002 sec)

The other difference you may notice is how VARIABLE_NAME is capitalized. It is all capitals for MariaDB and leading capital in MySQL, which can be a problem if you store data in a case-sensitive datastore.

The same applies to SHOW VARIABLES tables which are exposed as information_schema.global_variables in MariaDB 10.4 and performance_schema.global_variables in MySQL 8.

MariaDB 10.4 also exposes more variables in the SHOW STATUS (542) while in the current version of MySQL 8 it is less than 500.

INFORMATION_SCHEMA

Besides the location of the named tables, there are a lot of other differences in INFORMATION_SCHEMA.  For example, MariaDB 10.4 has INNODB_MUTEXES to expose “SHOW ENGINE INNODB MUTEX” in a table format which is easier to extract and report rather than parsing strings.  MySQL 8 does not have an INFORMATION_SCHEMA.INNODB_MUTEXES table.

MariaDB [information_schema]> select * from innodb_mutexes;
+------+-------------+-------------+----------+
| NAME | CREATE_FILE | CREATE_LINE | OS_WAITS |
+------+-------------+-------------+----------+
|      | log0log.cc  |         578 |        1 |
|      | btr0sea.cc  |         243 |      232 |
+------+-------------+-------------+----------+
2 rows in set (0.008 sec)

Another example of the tables that MariaDB 10.4 provides is current InnoDB Semaphore waits as INNODB_SYS_SEMAPHORE_WAITS  or  USER_VARIABLES to show currently set User Variables:

MariaDB [information_schema]> select * from user_variables;
+---------------+----------------+---------------+--------------------+
| VARIABLE_NAME | VARIABLE_VALUE | VARIABLE_TYPE | CHARACTER_SET_NAME |
+---------------+----------------+---------------+--------------------+
| a             | 2              | INT           | utf8               |
+---------------+----------------+---------------+--------------------+
1 row in set (0.001 sec)

MySQL 8 does not have this particular table but provides similar functionality via the USER_VARIABLES_BY_THREAD table in PERFORMANCE_SCHEMA.

mysql> select *  from performance_schema.user_variables_by_thread;
+-----------+---------------+----------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE |
+-----------+---------------+----------------+
|    202312 | a             | 2              |
+-----------+---------------+----------------+
1 row in set (0.00 sec)

Note that quite different information is provided in those tables!

There is also a lot of difference in what is available from the MariaDB 10.4 processlist table. Most significantly, you can discover how many rows were accessed (EXAMINED_ROWS) as well as the memory used by the query:

MariaDB [performance_schema]> select * from information_schema.processlist \G
*************************** 1. row ***************************
             ID: 118
           USER: root
           HOST: localhost
             DB: performance_schema
        COMMAND: Query
           TIME: 0
          STATE: Filling schema table
           INFO: select * from information_schema.processlist
        TIME_MS: 0.696
          STAGE: 0
      MAX_STAGE: 0
       PROGRESS: 0.000
    MEMORY_USED: 106592
MAX_MEMORY_USED: 2267712
  EXAMINED_ROWS: 0
       QUERY_ID: 21264066
    INFO_BINARY: select * from information_schema.processlist
            TID: 9977

Compare this to MySQL 8:

mysql> select * from information_schema.processlist \G
*************************** 1. row ***************************
           ID: 202266
         USER: root
         HOST: localhost
           DB: performance_schema
      COMMAND: Query
         TIME: 0
        STATE: executing
         INFO: select * from information_schema.processlist

I like how MariaDB adds a couple of practical fields here which are available simply and efficiently.  MySQL provides much more extended sys.processlist table as part of SYS_SCHEMA (driven by data from Performance Schema), but it is a lot more difficult to query.

mysql> select * from sys.processlist \G
*************************** 13. row ***************************
                thd_id: 202312
               conn_id: 202266
                  user: root@localhost
                    db: performance_schema
               command: Query
                 state: NULL
                  time: 0
     current_statement: select * from sys.processlist
     statement_latency: 83.48 ms
              progress: NULL
          lock_latency: 789.00 us
         rows_examined: 0
             rows_sent: 0
         rows_affected: 0
            tmp_tables: 4
       tmp_disk_tables: 0
             full_scan: YES
        last_statement: NULL
last_statement_latency: NULL
        current_memory: 1.38 MiB
             last_wait: NULL
     last_wait_latency: NULL
                source: NULL
           trx_latency: 82.71 ms
             trx_state: ACTIVE
        trx_autocommit: YES
                   pid: 24746
          program_name: mysql

There are many more differences than outlined above, so take it as an example of what amount of information available through INFORMATION_SCHEMA is substantially different in MySQL 8 and MariaDB 10.4, not as a complete list.

PERFORMANCE_SCHEMA

MySQL 8 is focused on observability through Performance Schema which is where all the new information is being exposed in a consistent manner.  MariaDB 10.4 does not place as high a value on Performance Schema.

Also, MySQL 8 has Performance Schema enabled by default while MariaDB 10.4 has it disabled. It also is missing a lot of instrumentations added in later MySQL series and MariaDB Performance Schema looks similar to one in MySQL 5.6.

Performance Schema Tables in MySQL 8

mysql> show tables;
+------------------------------------------------------+
| Tables_in_performance_schema                         |
+------------------------------------------------------+
| accounts                                             |
| cond_instances                                       |
| data_lock_waits                                      |
| data_locks                                           |
| events_errors_summary_by_account_by_error            |
| events_errors_summary_by_host_by_error               |
| events_errors_summary_by_thread_by_error             |
| events_errors_summary_by_user_by_error               |
| events_errors_summary_global_by_error                |
| events_stages_current                                |
| events_stages_history                                |
| events_stages_history_long                           |
| events_stages_summary_by_account_by_event_name       |
| events_stages_summary_by_host_by_event_name          |
| events_stages_summary_by_thread_by_event_name        |
| events_stages_summary_by_user_by_event_name          |
| events_stages_summary_global_by_event_name           |
| events_statements_current                            |
| events_statements_histogram_by_digest                |
| events_statements_histogram_global                   |
| events_statements_history                            |
| events_statements_history_long                       |
| events_statements_summary_by_account_by_event_name   |
| events_statements_summary_by_digest                  |
| events_statements_summary_by_host_by_event_name      |
| events_statements_summary_by_program                 |
| events_statements_summary_by_thread_by_event_name    |
| events_statements_summary_by_user_by_event_name      |
| events_statements_summary_global_by_event_name       |
| events_transactions_current                          |
| events_transactions_history                          |
| events_transactions_history_long                     |
| events_transactions_summary_by_account_by_event_name |
| events_transactions_summary_by_host_by_event_name    |
| events_transactions_summary_by_thread_by_event_name  |
| events_transactions_summary_by_user_by_event_name    |
| events_transactions_summary_global_by_event_name     |
| events_waits_current                                 |
| events_waits_history                                 |
| events_waits_history_long                            |
| events_waits_summary_by_account_by_event_name        |
| events_waits_summary_by_host_by_event_name           |
| events_waits_summary_by_instance                     |
| events_waits_summary_by_thread_by_event_name         |
| events_waits_summary_by_user_by_event_name           |
| events_waits_summary_global_by_event_name            |
| file_instances                                       |
| file_summary_by_event_name                           |
| file_summary_by_instance                             |
| global_status                                        |
| global_variables                                     |
| host_cache                                           |
| hosts                                                |
| keyring_keys                                         |
| log_status                                           |
| memory_summary_by_account_by_event_name              |
| memory_summary_by_host_by_event_name                 |
| memory_summary_by_thread_by_event_name               |
| memory_summary_by_user_by_event_name                 |
| memory_summary_global_by_event_name                  |
| metadata_locks                                       |
| mutex_instances                                      |
| objects_summary_global_by_type                       |
| performance_timers                                   |
| persisted_variables                                  |
| prepared_statements_instances                        |
| replication_applier_configuration                    |
| replication_applier_filters                          |
| replication_applier_global_filters                   |
| replication_applier_status                           |
| replication_applier_status_by_coordinator            |
| replication_applier_status_by_worker                 |
| replication_connection_configuration                 |
| replication_connection_status                        |
| replication_group_member_stats                       |
| replication_group_members                            |
| rwlock_instances                                     |
| session_account_connect_attrs                        |
| session_connect_attrs                                |
| session_status                                       |
| session_variables                                    |
| setup_actors                                         |
| setup_consumers                                      |
| setup_instruments                                    |
| setup_objects                                        |
| setup_threads                                        |
| socket_instances                                     |
| socket_summary_by_event_name                         |
| socket_summary_by_instance                           |
| status_by_account                                    |
| status_by_host                                       |
| status_by_thread                                     |
| status_by_user                                       |
| table_handles                                        |
| table_io_waits_summary_by_index_usage                |
| table_io_waits_summary_by_table                      |
| table_lock_waits_summary_by_table                    |
| threads                                              |
| user_defined_functions                               |
| user_variables_by_thread                             |
| users                                                |
| variables_by_thread                                  |
| variables_info                                       |
+------------------------------------------------------+
103 rows in set (0.01 sec)

Performance Schema Tables in MariaDB 10.4

MariaDB [performance_schema]> show tables;
+----------------------------------------------------+
| Tables_in_performance_schema                       |
+----------------------------------------------------+
| accounts                                           |
| cond_instances                                     |
| events_stages_current                              |
| events_stages_history                              |
| events_stages_history_long                         |
| events_stages_summary_by_account_by_event_name     |
| events_stages_summary_by_host_by_event_name        |
| events_stages_summary_by_thread_by_event_name      |
| events_stages_summary_by_user_by_event_name        |
| events_stages_summary_global_by_event_name         |
| events_statements_current                          |
| events_statements_history                          |
| events_statements_history_long                     |
| events_statements_summary_by_account_by_event_name |
| events_statements_summary_by_digest                |
| events_statements_summary_by_host_by_event_name    |
| events_statements_summary_by_thread_by_event_name  |
| events_statements_summary_by_user_by_event_name    |
| events_statements_summary_global_by_event_name     |
| events_waits_current                               |
| events_waits_history                               |
| events_waits_history_long                          |
| events_waits_summary_by_account_by_event_name      |
| events_waits_summary_by_host_by_event_name         |
| events_waits_summary_by_instance                   |
| events_waits_summary_by_thread_by_event_name       |
| events_waits_summary_by_user_by_event_name         |
| events_waits_summary_global_by_event_name          |
| file_instances                                     |
| file_summary_by_event_name                         |
| file_summary_by_instance                           |
| host_cache                                         |
| hosts                                              |
| mutex_instances                                    |
| objects_summary_global_by_type                     |
| performance_timers                                 |
| rwlock_instances                                   |
| session_account_connect_attrs                      |
| session_connect_attrs                              |
| setup_actors                                       |
| setup_consumers                                    |
| setup_instruments                                  |
| setup_objects                                      |
| setup_timers                                       |
| socket_instances                                   |
| socket_summary_by_event_name                       |
| socket_summary_by_instance                         |
| table_io_waits_summary_by_index_usage              |
| table_io_waits_summary_by_table                    |
| table_lock_waits_summary_by_table                  |
| threads                                            |
| users                                              |
+----------------------------------------------------+
52 rows in set (0.000 sec)

MariaDB also lacks “sys schema” shipped with a server, which means it does not provide a built-in interface to access Performance Schema data, which would make it easy and convenient for humans. In the end, for me, it all points to Performance Schema not being a priority for MariaDB.

SLOW QUERY LOG

Both MySQL 8  and MariaDB 10.4 support basic Slow Query Log.  When it comes to additional options, though, there is quite a divergence. MariaDB supports quite a few extended slow query logging options from Percona Server for MySQL, both for enhancing the data logged as well as for filtering. It also supports logging Query EXPLAIN Plan. On the other hand, MySQL 8 can log  additional information:

MariaDB 10.4 Slow Query Log (with Explain)

# Time: 200201 22:32:37
# User@Host: root[root] @ localhost []
# Thread_id: 113  Schema: sbtest  QC_hit: No
# Query_time: 0.000220  Lock_time: 0.000091  Rows_sent: 1  Rows_examined: 1
# Rows_affected: 0  Bytes_sent: 190
#
# explain: id   select_type     table   type    possible_keys   key     key_len ref     rows    r_rowsfiltered r_filtered      Extra
# explain: 1    SIMPLE  sbtest1 const   PRIMARY PRIMARY 4       const   1       NULL    100.00  NULL
#
SET timestamp=1580596357;
SELECT c FROM sbtest1 WHERE id=101985;

MySQL 8 Slow Query Log with Extended Metrics

# Time: 2019-06-14T14:14:22.980797Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.005342  Lock_time: 0.000451 Rows_sent: 33  Rows_examined: 197 Thread_id: 8 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 664 Read_first: 1 Read_last: 0 Read_key: 71 Read_next: 127 Read_prev: 0 Read_rnd: 33 Read_rnd_next:
34 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 33 Sort_scan_count: 1 Created_tmp_disk_tables: 0 Created_tmp_tables: 1 Start: 2019-06-14T14:14:22.975455Z
 End: 2019-06-14T14:14:22.980797Z
SET timestamp=1560521662;
show tables;

EXPLAIN

Both MySQL and MariaDB support the classic “Table” EXPLAIN output. Although, even in this output there may be format differences. This actually makes sense as optimizers in MySQL and MariaDB have different features and optimizations so it only makes sense the EXPLAIN outputs are different:

MySQL 8.0 EXPLAIN

mysql> explain select count(*) from sbtest1 s1,sbtest1 s2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: s1
   partitions: NULL
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 987292
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: s2
   partitions: NULL
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 987292
     filtered: 100.00
        Extra: Using index
2 rows in set, 1 warning (0.00 sec)

 

MariaDB 10.4  EXPLAIN

MariaDB [sbtest]> explain select count(*) from sbtest1 s1,sbtest1 s2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: s1
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 986499
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: s2
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 986499
        Extra: Using index; Using join buffer (flat, BNL join)
2 rows in set (0.001 sec)

Where things get more interesting though is advanced EXPLAIN features. If you want to explain running query you need to use SHOW EXPLAIN FOR <thread_id> in MariaDB but EXPLAIN FOR CONNECTION <connection_id> for MySQL. 

EXPLAIN FORMAT=JSON works both with MariaDB 10.4 and MySQL 8 but the output is so different you would surely need to handle it separately.

EXPLAIN FORMAT=TREE is only supported in MySQL 8.  It is a very new feature so it may appear in MariaDB sometime in the future. TREE format strives to provide an easier-to-read output, especially for users not familiar with MySQL query execution details or terminology.  For example, for this query it gives this output:

mysql> explain FORMAT=TREE select count(*) from sbtest1 s1,sbtest1 s2 \G
*************************** 1. row ***************************
EXPLAIN: -> Count rows in s1

1 row in set (0.00 sec)

This leaves a lot of questions unanswered but is very human-readable.

Finally, both MySQL and MariaDB allow you to Analyze (profile) the query to see how it is really executed. Both syntaxes for this feature and output are significantly different between MySQL 8 and MariaDB 10.4.

MySQL 8.0  EXPLAIN ANALYZE

mysql> explain analyze  select count(*) from sbtest1 where k>2 \G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)  (actual time=506.084..506.085 rows=1 loops=1)
    -> Filter: (sbtest1.k > 2)  (cost=99211.38 rows=493646) (actual time=0.037..431.186 rows=999997 loops=1)
        -> Index range scan on sbtest1 using k_1  (cost=99211.38 rows=493646) (actual time=0.035..312.929 rows=999997 loops=1)

1 row in set (0.51 sec)

MariaDB 10.4  ANALYZE

MariaDB [sbtest]> analyze select count(*) from sbtest1 where k>2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
         type: range
possible_keys: k_1
          key: k_1
      key_len: 4
          ref: NULL
         rows: 493249
       r_rows: 999997.00
     filtered: 100.00
   r_filtered: 100.00
        Extra: Using where; Using index
1 row in set (0.365 sec)

Summary

I’ve been saying for a while now that “MariaDB is not MySQL” and you need to treat MySQL and MariaDB as separate databases.  It is even more important when you’re looking at observability functionality, as this space is where MySQL and MariaDB are unconstrained by SQL standards and can innovate as they like, which they really have been doing a lot of and diverging rapidly as a result.

by Peter Zaitsev at February 05, 2020 04:56 PM

February 04, 2020

SeveralNines

What to Check if the MySQL I/O Utilisation is High

The I/O performance is vital for MySQL databases. Data is read and written to the disk in numerous places. Redo logs, tablespaces, binary and relay logs. With an increase of the usage of solid state drives I/O performance has significantly increased allowing users to push their databases even faster but even then I/O may become a bottleneck and a limiting factor of the performance of the whole database. In this blog post we will take a look at the things you want to check if you notice your I/O performance is high on your MySQL instance.

What does “High” I/O utilisation mean? In short, if the performance of your database is affected by it, it is high. Typically you would notice it as writes slowing down in the database. It will also clearly manifest as high I/O wait on your system. Please keep in mind, though, on hosts with 32 and more CPU cores, even if one core will show 100% I/O wait, you may not notice it on a aggregated view - it will represent only 1/32 of the whole load. Seems not impacting but in fact some single-threaded I/O operation is saturating your CPU and some application is waiting for that I/O activity to finish.

Let’s say we did notice an increase in the I/O activity, just as in the screenshot above. What to look at if you noticed high I/O activity? First, check the list of the processes in the system. Which one is responsible for an I/O wait? You can use iotop to check that:

In our case it is quite clear that it is MySQL which is responsible for most of it. We should start with the simplest check - what exactly is running in the MySQL right now?

We can see there is replication activity on our slave. What is happening to the master?

We can clearly see some batch load job is running. This sort of ends our journey here as we managed to pinpoint the problem quite easily.

There are other cases, though, which may not be that easy to understand and track. MySQL comes with some instrumentation, which is intended to help with understanding the I/O activity in the system. As we mentioned, I/O can be generated in numerous places in the system. Writes are the most clear ones but we may also have on-disk temporary tables - it’s good to see if your queries do use such tables or not.

If you have performance_schema enabled, one way to check which files are responsible for the I/O load can be to query ‘table_io_waits_summary_by_table’:

*************************** 13. row ***************************

                FILE_NAME: /tmp/MYfd=68

               EVENT_NAME: wait/io/file/sql/io_cache

    OBJECT_INSTANCE_BEGIN: 140332382801216

               COUNT_STAR: 17208

           SUM_TIMER_WAIT: 23332563327000

           MIN_TIMER_WAIT: 1596000

           AVG_TIMER_WAIT: 1355913500

           MAX_TIMER_WAIT: 389600380500

               COUNT_READ: 10888

           SUM_TIMER_READ: 20108066180000

           MIN_TIMER_READ: 2798750

           AVG_TIMER_READ: 1846809750

           MAX_TIMER_READ: 389600380500

 SUM_NUMBER_OF_BYTES_READ: 377372793

              COUNT_WRITE: 6318

          SUM_TIMER_WRITE: 3224434875000

          MIN_TIMER_WRITE: 16699500

          AVG_TIMER_WRITE: 510356750

          MAX_TIMER_WRITE: 223219960500

SUM_NUMBER_OF_BYTES_WRITE: 414000000

               COUNT_MISC: 2

           SUM_TIMER_MISC: 62272000

           MIN_TIMER_MISC: 1596000

           AVG_TIMER_MISC: 31136000

           MAX_TIMER_MISC: 60676000

*************************** 14. row ***************************

                FILE_NAME: /tmp/Innodb Merge Temp File

               EVENT_NAME: wait/io/file/innodb/innodb_temp_file

    OBJECT_INSTANCE_BEGIN: 140332382780800

               COUNT_STAR: 1128

           SUM_TIMER_WAIT: 16465339114500

           MIN_TIMER_WAIT: 8490250

           AVG_TIMER_WAIT: 14596931750

           MAX_TIMER_WAIT: 583930037500

               COUNT_READ: 540

           SUM_TIMER_READ: 15103082275500

           MIN_TIMER_READ: 111663250

           AVG_TIMER_READ: 27968670750

           MAX_TIMER_READ: 583930037500

 SUM_NUMBER_OF_BYTES_READ: 566231040

              COUNT_WRITE: 540

          SUM_TIMER_WRITE: 1234847420750

          MIN_TIMER_WRITE: 286167500

          AVG_TIMER_WRITE: 2286754250

          MAX_TIMER_WRITE: 223758795000

SUM_NUMBER_OF_BYTES_WRITE: 566231040

               COUNT_MISC: 48

           SUM_TIMER_MISC: 127409418250

           MIN_TIMER_MISC: 8490250

           AVG_TIMER_MISC: 2654362750

           MAX_TIMER_MISC: 43409881500

As you can see above, it also shows temporary tables that are in use.

To double-check if a particular query uses temporary table you can use EXPLAIN FOR CONNECTION:

mysql> EXPLAIN FOR CONNECTION 3111\G

*************************** 1. row ***************************

           id: 1

  select_type: SIMPLE

        table: sbtest1

   partitions: NULL

         type: ALL

possible_keys: NULL

          key: NULL

      key_len: NULL

          ref: NULL

         rows: 986400

     filtered: 100.00

        Extra: Using temporary; Using filesort

1 row in set (0.16 sec)

On the example above a temporary table is used for filesort.

Another way of catching up disk activity is, if you happen to use Percona Server for MySQL, to enable full slow log verbosity:

mysql> SET GLOBAL log_slow_verbosity='full';

Query OK, 0 rows affected (0.00 sec)

Then, in the slow log, you may see entries like this:

# Time: 2020-01-31T12:05:29.190549Z

# User@Host: root[root] @ localhost []  Id: 12395

# Schema:   Last_errno: 0  Killed: 0

# Query_time: 43.260389  Lock_time: 0.031185 Rows_sent: 1000000  Rows_examined: 2000000 Rows_affected: 0

# Bytes_sent: 197889110  Tmp_tables: 0 Tmp_disk_tables: 0  Tmp_table_sizes: 0

# InnoDB_trx_id: 0

# Full_scan: Yes  Full_join: No Tmp_table: No  Tmp_table_on_disk: No

# Filesort: Yes  Filesort_on_disk: Yes  Merge_passes: 141

#   InnoDB_IO_r_ops: 9476  InnoDB_IO_r_bytes: 155254784  InnoDB_IO_r_wait: 5.304944

#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000

#   InnoDB_pages_distinct: 8191

SET timestamp=1580472285;

SELECT * FROM sbtest.sbtest1 ORDER BY RAND();

As you can see, you can tell if there was a temporary table on disk or if the data was sorted on disk. You can also check the number of I/O operations and amount of data accessed.

We hope this blog post will help you understand the I/O activity in the system and let you manage it better.

 

by krzysztof at February 04, 2020 08:59 PM

February 03, 2020

SeveralNines

An Overview of Job Scheduling Tools for PostgreSQL

Unlike other database management systems that have their own built-in scheduler (like Oracle, MSSQL or MySQL), PostgreSQL still doesn’t have this kind of feature.

In order to provide scheduling functionality in PostgreSQL you will need to use an external tool like...

  • Linux crontab
  • Agent pgAgent
  • Extension pg_cron

In this blog we will explore these tools and highlight how to operate them and their main features.

Linux crontab

It’s the oldest one, however, an efficient and useful way to execute scheduling tasks. This program is based on a daemon (cron) that allows tasks to be automatically run in the background periodically and regularly verifies the configuration files ( called crontab files) on which are defined the script/command to be executed and its scheduling.

Each user can have his own crontab file and for the newest Ubuntu releases are located in: 

/var/spool/cron/crontabs (for other linux distributions the location could be different):

root@severalnines:/var/spool/cron/crontabs# ls -ltr

total 12

-rw------- 1 dbmaster crontab 1128 Jan 12 12:18 dbmaster

-rw------- 1 slonik   crontab 1126 Jan 12 12:22 slonik

-rw------- 1 nines    crontab 1125 Jan 12 12:23 nines

The syntax of the configuration file is the following:

mm hh dd mm day <<command or script to execute>>



mm: Minute(0-59)

hh: Hour(0-23)

dd: Day(1-31)

mm: Month(1-12)

day: Day of the week(0-7 [7 or 0 == Sunday])

A few operators could be used with this syntax to streamline the scheduling definition and these symbols allow to specify multiple values in a field:

Asterisk (*)  - it means all possible values for a field

The comma (,) - used to define a list of values

Dash (-) - used to define a range of values

Separator (/) - specifies a step value

The script all_db_backup.sh will be executed according each scheduling expression:

0 6 * * * /home/backup/all_db_backup.sh

At 6 am every day

20 22 * * Mon, Tue, Wed, Thu, Fri /home/backup/all_db_backup.sh

At 10:20 PM, every weekday

0 23 * * 1-5 /home/backup/all_db_backup.sh

At 11 pm during the week

0 0/5 14 * * /home/backup/all_db_backup.sh

Every five hours starting at 2:00 p.m. and ending at 2:55 p.m., every day

Although it’s not very difficult, this syntax can be automatically generated on multiple web pages.

If the crontab file doesn’t exist for a user it can be created by the following command:

slonik@severalnines:~$ crontab -e

or presented it using the -l parameter:

slonik@severalnines:~$ crontab -l

If necessary to remove this file, the appropriate parameter is -r:

slonik@severalnines:~$ crontab -r

The cron daemon status is shown by the execution of the following command:

Agent pgAgent

The pgAgent is a job scheduling agent available for PostgreSQL that allows the execution of stored procedures, SQL statements, and shell scripts. Its configuration is stored on the postgres database in the cluster.

The purpose is to have this agent running as a daemon on Linux systems and periodically does a connection to the database to check if there are any jobs to execute.

This scheduling is easily managed by PgAdmin 4, but it’s not installed by default once the pgAdmin installed, it’s necessary to download and install it on your own.

Hereafter are described all the necessary steps to have the pgAgent working properly:

Step One

Installation of pgAdmin 4

$ sudo apt install pgadmin4 pgadmin4-apache

Step Two

Creation of plpgsql procedural language if not defined

CREATE TRUSTED PROCEDURAL LANGUAGE ‘plpgsql’

     HANDLER plpgsql_call_handler

          HANDLER plpgsql_validator;

Step Three

Installation of  pgAgent

$ sudo apt-get install pgagent

Step Four

Creation of the pgagent extension

CREATE EXTENSION pageant 

This extension will create all the tables and functions for the pgAgent operation and hereafter is showed the data model used by this extension:

Now the pgAdmin interface already has the option “pgAgent Jobs” in order to manage the pgAgent: 

In order to define a new job, it’s only necessary select "Create" using the right button on “pgAgent Jobs”, and it’ll insert a designation for this job and define the steps to execute it:

In the tab “Schedules” must be defined the scheduling for this new job:

Finally, to have the agent running in the background it’s necessary to launch the following process manually:

/usr/bin/pgagent host=localhost dbname=postgres user=postgres port=5432 -l 1

Nevertheless, the best option for this agent is to create a daemon with the previous command.

Extension pg_cron

The pg_cron is a cron-based job scheduler for PostgreSQL that runs inside the database as an extension (similar to the DBMS_SCHEDULER in Oracle) and allows the execution of database tasks directly from the database, due to a background worker.

The tasks to perform can be any of the following ones:

  • stored procedures
  • SQL statements
  • PostgreSQL commands (as VACUUM, or VACUUM ANALYZE)

pg_cron can run several jobs in parallel, but only one instance of a program can be running at a time. 

If a second run should be started before the first one finishes, then it is queued and will be started as soon as the first run completes.

This extension was defined for the version 9.5 or higher of PostgreSQL.

Installation of pg_cron

The installation of this extension only requires the following command:

slonik@sveralnines:~$ sudo apt-get -y install postgresql-10-cron

Updating of Configuration Files

In order to start the pg_cron background worker once PostgreSQL server starts, it’s necessary to set pg_cron to shared_preload_libraries parameter in postgresql.conf: 

shared_preload_libraries = ‘pg_cron’

It’s also necessary to define in this file, the database on which the pg_cron extension will be created, by adding the following parameter:

cron.database_name= ‘postgres’

On the other hand, in pg_hba.conf file that manages the authentication, it’s necessary to define the postgres login as trust for the IPV4 connections, because pg_cron requires such user to be able to connect to the database without providing any password, so the following line needs to be added to this file:

host postgres postgres 192.168.100.53/32 trust

The trust method of authentication allows anyone to connect to the database(s) specified in the pg_hba.conf file, in this case the postgres database. It's a method used often to allow connection using Unix domain socket on a single user machine to access the database and should only be used when there isan  adequate operating system-level protection on connections to the server.

Both changes require a PostgreSQL service restart:

slonik@sveralnines:~$ sudo system restart postgresql.service

It’s important to take into account that pg_cron does not run any jobs as long as the server is in hot standby mode, but it automatically starts when the server is promoted.

Creation of pg_cron extension

This extension will create the meta-data and the procedures to manage it, so the following command should be executed on psql:

postgres=#CREATE EXTENSION pg_cron;

CREATE EXTENSION

Now, the needed objects to schedule jobs are already defined on the cron schema:

This extension is very simple, only the job table is enough to manage all this functionality:

Definition of New Jobs

The scheduling syntax to define jobs on pg_cron is the same one used on the cron tool, and the definition of new jobs is very simple, it’s only necessary to call the function cron.schedule:

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(12356,''DAILY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(998934,''WEEKLY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(45678,''DAILY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(1010,''WEEKLY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(1001,''MONTHLY_DATA'');')

select cron.schedule('*/5 * * * *','select reporting.f_reset_client_data(0,''DATA'')')

select cron.schedule('*/5 * * * *','VACUUM')

select cron.schedule('*/5 * * * *','$$DELETE FROM reporting.rep_request WHERE create_dt<now()- interval '60 DAYS'$$)

The job setup is stored on the job table: 

Another way to define a job is by inserting the data directly on the cron.job table:

INSERT INTO cron.job (schedule, command, nodename, nodeport, database, username)

VALUES ('0 11 * * *','call loader.load_data();','postgresql-pgcron',5442,'staging', 'loader');

and use custom values for nodename and nodeport to connect to a different machine (as well as other databases).

Deactivation of a Jobs

On the other hand, to deactivate a job it’s only necessary to execute the following function:

select cron.schedule(8)

Jobs Logging

The logging of these jobs can be found on the PostgreSQL log file /var/log/postgresql/postgresql-12-main.log:

by Hugo Dias at February 03, 2020 08:01 PM

February 02, 2020

MariaDB Foundation

2020 MariaDB Day presentations

Our first MariaDB Day in Brussels is seeing some interesting presentations. Slides and videos are posted below.
This post will be updated as more slides and presentations become available. […]

The post 2020 MariaDB Day presentations appeared first on MariaDB.org.

by Ian Gilfillan at February 02, 2020 10:19 AM

January 30, 2020

Percona

Webinar 2/6: MySQL 8 vs. MariaDB 10.4

MySQL 8 vs. MariaDB 10.4

At the moment, MySQL 8 and MariaDB 10.4 are the latest versions of the corresponding database management systems. Each of these DBMS has a unique set of features. For example, specific MariaDB features might be unavailable in MySQL, and vice versa. In this presentation, we’ll cover these new features and provide recommendations regarding which will work best on which DBMS.

Please join Percona Senior Technical Manager Alkin Tezuysal on Thursday, February 6, 2020, at 9 am EST for his webinar “MySQL 8 vs MariaDB 10.4”.

Register Now

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

by David Quilty at January 30, 2020 02:46 PM

SeveralNines

Managing Database Backup Retention Schedules

Attention: Skip reading this blog post if you can afford unlimited storage space. 

If you could afford unlimited storage space, you wouldn't have to worry about backup retention at all, since you could store your backups infinitely without any restriction, provided your storage space provider can assure the data won't be missing. Database backup retention is commonly overlooked because it doesn't seem important at first, and only comes into sheer attention when you have stumbled upon resource limit or hit a bottleneck.

In this blog post, we are going to look into database backup retention management and scheduling and how we can manage them efficiently with ClusterControl.

Database Backup Retention Policy

Database backup retention policy refers to how long the database backups are kept within our possession. Some examples would be:

  • daily backups for big databases are kept for one week using local storage, 
  • weekly backups for small databases are kept for eight weeks on disk storage (both local and remote),
  • monthly backups for all databases are kept for 3 months on cloud storage,
  • no backups are saved beyond 3 months.

The main advantage of having a database backup retention policy is to make sure we efficiently manage our storage resources, without impacting our database recovery process if something goes wrong. You don't want to get caught up when an urgent recovery is needed and the necessary backup file is no longer there to help us perform restoration because we had got it deleted to clear some space up.

To build a good backup retention policy, we need to consider the two most important aspects:

  • Backup storage size.
  • Database backup size.

Backup Storage Size

The first priority is to ensure that we have enough space to store our backups as a starter. A simple rule of thumb is the storage space must at least have the same size as the data directory size for the database server. Generally, the bigger the storage size, the bigger the cost is. If you can opt for a bigger storage space, you can keep older backups longer. This aspect hugely influences your retention policy in terms of the number of backups that you can store. 

Storing the backups off-site, in the cloud, can be a good way to secure your backups against disaster. It comes with a higher price per GB ratio but it's still affordable considering the advantages that you will get from it. Most of the cloud storage providers now offer a secure, scalable, highly available with decent IO performance. Either way, ClusterControl supports storing your backup in the local storage, remote storage or in the cloud.

Database Backup Size

The size of backups are directly affected by the following factors:

  • Backup tools - Physical backup is commonly bigger than logical backup.
  • Backup method - Incremental and partial backups are smaller than a full backup.
  • Compression ratio - Higher compression level produces smaller backup, with a tradeoff of processing power.

Mix and match these 3 factors will allow you to have a suitable backup size to fit into your backup storage and restoration policy. If storing a full backup is considered too big and costly, we can combine incremental backups with a full backup to have a backup for one particular set. Incremental backups commonly stores the delta between two points, and usually only takes a relatively small amount of disk space if compared to a full backup. Or, you can opt for a full partial backup, just backs up chosen databases or tables that can potentially impact the business operation.

If a full physical backup with a compression ratio of 50%, producing 100M of backup size, you could increase the compression ratio to 100% in order to reduce the disk space usage but with a slower backup creation time. Just make sure that you are complying with your database recovery policy when deciding which backup tools, method and compression ratio to use.

Managing Retention Schedules Using ClusterControl

ClusterControl sophisticated backup management features includes the retention management for all database backup methods when creating or scheduling a backup:

The default value is 31 days, which means the backup will be kept in possession for 31 days, and will be automatically deleted on the 32nd day after it was successfully created. The default retention value (in day) can be changed under Backup Settings. One can customize this value for every backup schedule or on-demand backup creation job to any number of days or keep it forever. ClusterControl also supports retention for backup that is stored in the supported cloud platforms (AWS S3, Google Cloud Storage and Azure Blob Storage).

When a backup is successfully created, you will see the retention period in the backup list, as highlighted in the following screenshot:

For backup purging process, ClusterControl triggers backup purge thread every single time after any backup process for that particular cluster is completed. The purge backup thread looks for all "expired" backups and performs the necessary deletion process automatically. The purging interval sounds a bit excessive in some environments but this is the best purging scheduling configuration that we have figured out for most configurations thus far.  To understand this easily, consider the following backup retention setting for a cluster:

  1. One creates a weekly backup, with a retention period of 14 days.
  2. One creates an hourly backup, with a retention period of 7 days.
  3. One creates a monthly backup, without a retention period (keep forever).

For the above configuration, ClusterControl will initiate a backup purge thread for (a) and (b) every hour because of (b), although the retention period for (a) is 14 days. Created backups that have been marked as "Keep Forever" (c) will be skipped by the purge thread. This configuration protects ClusterControl from excessive purging if the job is scheduled daily. Thus, don't be surprised if you see the following lines in job messages after any of the backup job is completed:

Advanced Retention Management with ClusterControl CLI

ClusterControl CLI a.k.a s9s, can be used to perform advanced retention management operations like deleting old backup files while keeping a number of copies exist for safety purpose. This can be very useful when you need to clear up some space and have no idea which backups that will be purged by ClusterControl, and you want to make sure that a number of copies of the backup must exist regardless of its expiration as a precaution. We can easily achieve this with the following command:

$ s9s backup \
--delete-old \
--cluster-id=4 \
--backup-retention=60 \
--cloud-retention=180 \
--safety-copies=3 \
--log

Deleting old backups.
Local backup retention is 60 day(s).
Cloud backup retention is 180 day(s).
Kept safety backup copies 3.
Querying records older than 60 day(s).
Checking for backups to purge.
No old backup records found, nothing to delete.
Checking for old backups is finished.

The above job will force ClusterControl to look for local backups that have been created which are older than 60 days and backups that are stored in the cloud which are older than 180 days. If ClusterControl finds something that matches this query, ClusterControl will make sure only the 4th copy and older will be deleted, regardless of the retention period.

The --backup-retention and --cloud-retention parameters accept a number of values:

  • A positive number  value can control how long (in days) the taken backups will be preserved.
  • -1 has a very special meaning, it means the backup will be kept forever.
  • 0 is the default, it means prefer the global setting which can be configured from the UI.

Apart from the above, the standard backup creation job can be triggered directly from the command line. The following command create a mysqldump backup for cluster ID 4 on node 192.168.1.24, where we will keep the backup forever:

$ s9s backup --create \
--backup-method=mysqldump \
--cluster-id=4 \
--nodes=192.168.1.24:3306 \
--backup-retention=-1 \
--log

192.168.1.24:3306: Preparing for backup - host state (MYSQL_OK) is acceptable.
192.168.1.24:3306: Verifying connectivity and credentials.
Checking backup creation job.
192.168.1.24:3306: Timezone of backup host is UTC.
Backup title is     ''.
Backup host is      192.168.1.24:3306.
Backup directory is /backups/production/mysqldump/.
Backup method is    mysqldump.
PITR compatible     no.
Backup record created.
Backup record saved.
192.168.1.24: Creating backup dir '/backups/production/mysqldump/BACKUPPERDB-190-mysqldump-2020-01-25_093526'.
Using gzip to compress archive.
192.168.1.24:3306: detected version 5.7.28-31-log.
Extra-arguments be passed to mysqldump:  --set-gtid-purged=OFF
Backup (mysqldump, storage node): '192.168.1.24: /usr/bin/mysqldump --defaults-file=/etc/my.cnf  --flush-privileges --hex-blob --opt --master-data=2 --single-transaction --skip-lock-tables --triggers --routines --events   --set-gtid-purged=OFF --databases mysql backupninja backupninja_doc proxydemo severalnines_prod severalnines_service --ignore-table='mysql.innodb_index_stats'  --ignore-table='mysql.innodb_table_stats' |gzip -c > /backups/production/mysqldump/BACKUPPERDB-190-mysqldump-2020-01-25_093526/mysqldump_2020-01-25_093546_dbdumpfile.sql.gz'.
192.168.1.24: MySQL >= 5.7.6 detected, enabling 'show_compatibility_56'
192.168.1.24: A progress message will be written every 1 minutes
192.168.1.24: Backup 190 completed and is stored in 192.168.1.24:/backups/production/mysqldump/BACKUPPERDB-190-mysqldump-2020-01-25_093526.
192.168.1.24:/backups/production/mysqldump/BACKUPPERDB-190-mysqldump-2020-01-25_093526: Custom retention period: never delete.
Checking for backup retention (clearing old backups).
Local backup retention is 31 day(s).
Cloud backup retention is 180 day(s).
Kept safety backup copies 1.
Querying records older than 31 day(s).
Checking for backups to purge.
Found 4 backups older than 31 day(s).
We have 9 completed full backups.

For more explanation and examples, check out the s9s backup guide.

Conclusion

ClusterControl backup retention management allows you to manage your backup storage space efficiently, without compromising your database recovery policy.

by ashraf at January 30, 2020 10:45 AM

Federico Razzoli

Practical advice for MySQL/MariaDB live migrations

Modifying table structures is sometimes necessary, or desirable. Modifying them online can be a pain, especially with big tables. Migrations should be ran properly in production.

by Federico Razzoli at January 30, 2020 10:42 AM

January 29, 2020

MariaDB Foundation

MariaDB Day Brussels 02.02.2020 – Introducing speakers – Sveta Smirnova on How to Avoid Pitfalls in Schema Upgrade with Galera

Galera Cluster for MySQL is a 100% synchronized cluster in regards to data modification operations (DML). It is ensured by the optimistic locking model and ability to rollback a transaction, which cannot be applied on all nodes. […]

The post MariaDB Day Brussels 02.02.2020 – Introducing speakers – Sveta Smirnova on How to Avoid Pitfalls in Schema Upgrade with Galera appeared first on MariaDB.org.

by Anna Widenius at January 29, 2020 03:59 PM

SeveralNines

What to Monitor in MySQL 8.0

Monitoring is a must in all environments, and databases aren’t the exception. Once you have your database infrastructure up-and-running, you’ll need to keep tabs on what’s happening. Monitoring is a must if you want to be sure everything is going fine but also if you make necessary adjustments while your system grows and evolves. That will enable you to identify trends, plan for upgrades or improvements, or react adequately to any problems or errors that may arise with new versions, different purposes, and so on.

For each database technology, there are different things to monitor. Some of these are specific to the database engine, vendor, or even the particular version that you’re using. Database clusters heavily depend on the underlying infrastructure, so network and operating stats are interesting to see by the database administrators too. 

When running multiple database systems, the monitoring of these systems can become quite a chore. 

In this blog, we’ll take a look at what you need to monitor a MySQL 8.0 environment. We will also take a look at cluster control monitoring features, which may help you to track the health of your databases for free.

OS and Database System Monitoring

When observing a database cluster or node, there are two main points to take into account: the operating system and the MySQL instance itself. You will need to define which metrics you are going to monitor from both sides and how you are going to do it. You need to follow the parameter always in the meaning of your system, and you should look for alterations on the behavior model.

Grip in mind that when one of your parameters is affected, it can also affect others, making troubleshooting of the issue more complicated. Having a proper monitoring and alerting system is essential to make this task as simple as possible.

In most cases, you will need to use some tools, as it is difficult to find one to cover all the wanted metrics. 

OS System Monitoring

One major thing (which is common to all database engines and even to all systems) is to monitor the Operating System behavior. Here are some points to check here. Below you can find top system resources to watch on a database server. It's actually also the list of very first things to check.

CPU Usage

A high CPU usage is not a bad thing as long as you don’t reach the limit. Excessive percentage of CPU usage could be a problem if it’s not usual behavior. In this case, it is essential to identify the process/processes that are generating this issue. If the problem is the database process, you will need to check what is happening inside the database.

RAM Memory or SWAP Usage

Ideally, your entire database should be stored in memory, but this is not always possible. Give MySQL as much as you can afford but leave enough for other processes to function.

If you see a high value for this metric and nothing has changed in your system, you probably need to check your database configuration. Parameters like shared_buffers and work_mem can affect this directly as they define the amount of memory to be able to use for the MySQL database. Swap is for emergencies only, and it should not be used, make sure you also have your operating system set to let MySQL decide about swap usage.

Disk Usage 

Disk usage is one of the key metrics to monitor and alert. Make sure you always have free space for new data, temporary files, snapshots, or backups.

Monitoring hard metric values is not good enough. An abnormal increase in the use of disk space or an excessive disk access consumption is essential things to watch as you could have a high number of errors logged in the MySQL log file or a lousy cache configuration that could generate a vital disk access consumption instead of using memory to process the queries. Make sure you are able to catch abnormal behaviors even if your warning and critical metrics are not reached yet.

Along with monitoring space we also should monitor disk activity.  The top values to monitor are:

  • Read/Write requests
  • IO Queue length
  • Average IO wait
  • Average Read/Write time
  • Read/Write bandwidth

You can use iostat or pt-diskstats from Percona to see all these details. 

Things that can affect your disk performance are often related to data transfer from and towards your disk so monitor abnormal processes than can be started from other users.

Load Average

An all-in-one performance metric. Understanding Linux Load is a key to monitor OS and database dependent systems.

Load average related to the three points mentioned above. A high load average could be generated by an excessive CPU, RAM, or disk usage.

Network

Unless doing backups or transferring vast amounts of data, it shouldn’t be the bottleneck.

A network issue can affect all the systems as the application can’t connect (or connect losing packages) to the database, so this is an important metric to monitor indeed. You can monitor latency or packet loss, and the main issue could be a network saturation, a hardware issue, or just a lousy network configuration.

Database Monitoring

While monitoring is a must, it’s not typically free. There is always a cost on the database performance, depending on how much you are monitoring, so you should avoid monitoring things that you won’t use.

In general, there are two ways to monitor your databases, from the logs or from the database side by querying.

In the case of logs, to be able to use them, you need to have a high logging level, which generates high disk access and it can affect the performance of your database.

For the querying mode, each connection to the database uses resources, so depending on the activity of your database and the assigned resources, it may affect the performance too.

Of course, there are many metrics in MySQL. Here we will focus on the top important.

Monitoring Active Sessions

You should also track the number of active sessions and DB up down status. Often to understand the problem you need to see how long the database is running. so we can use this to detect respawns.

The next thing would be a number of sessions. If you are near the limit, you need to check if something is wrong or if you just need to increment the max_connections value. The difference in the number can be an increase or decrease of connections. Improper usage of connection pooling, locking or network issues are the most common problems related to the number of connections.

The key values here are

  • Uptime
  • Threads_connected
  • Max_used_connections
  • Aborted_connects

Database Locks

If you have a query waiting for another query, you need to check if that another query is a normal process or something new. In some cases, if somebody is making an update on a big table, for example, this action can be affecting the normal behavior of your database, generating a high number of locks.

Monitoring Replication

The key metrics to monitor for replication are the lag and the replication state. Not only the up down status but also the lag because a continuous increase in this value is not a very good sign as it means that the slave is not able to catch up with its master.

The most common issues are networking issues, hardware resource issues, or under dimensioning issues. If you are facing a replication issue you will need to know this asap as you will need to fix it to ensure the high availability environment. 

Replication is best monitored by checking SLAVE STATUS and the following parameters:

  • SLAVE_RUNNING
  • SLAVE_IO_Running
  • SLAVE_SQL_RUNNING
  • LAST_SQL_ERRNO
  • SECONDS_BEHIND_MASTER

Backups

Unfortunately, the vanilla community edition doesn't come with the backup manager. You should know if the backup was completed, and if it’s usable. Usually, this last point is not taken into account, but it’s probably the most critical check in a backup process. Here we would have to use external tools like percona-xtrabackup or ClusterControl.

Database Logs

You should monitor your database log for errors like FATAL or deadlock, or even for common errors like authentication issues or long-running queries. Most of the errors are written in the log file with detailed useful information to fix it. Common failure points you need to keep an eye on are errors, log file sizes. The location of the error log can be found under the log_error variable.

External Tools

Last but not least you can find a list of useful tools to monitor your database activity. 

Percona Toolkit - is the set of Linux tools from Percona to analyze MySQL and OS activities. You can find it here. It supports the most popular 64 bit Linux distributions like Debian, Ubuntu, and Redhat. 

mysqladmin - mysqladmin is an administration program for the MySQL daemon. It can be used to check server health (ping), list the processes, see the values of the variables, but also do some administrative work like create/drop databases, flush (reset) logs, statistics, and tables, kill running queries, stop the server and control replication.

innotop - offers an extended view of SHOW statements. It's very powerful and can significantly reduce the investigation time. Among vanilla MySQL support, you can see the Galera view and Master-slave replication details. 

mtop - monitors a MySQL server showing the queries which are taking the most amount of time to complete. Features include 'zooming' in on a process to show the complete query, 'explaining' the query optimizer information for a query and 'killing' queries. In addition, server performance statistics, configuration information, and tuning tips are provided.

Mytop -  runs in a terminal and displays statistics about threads, queries, slow queries, uptime, load, etc. in tabular format, much similar to the Linux

Conclusion

This blog is not intended to be an exhaustive guide to how to enhance database monitoring, but it hopefully gives a clearer picture of what things can become essential and some of the basic parameters that can be watched. Do not hesitate to let us know if we’ve missed any important ones in the comments below.

 

by Bart Oles at January 29, 2020 09:09 AM

MariaDB Foundation

MariaDB Day Brussels 02.02.2020 – Introducing speakers – Seppo Jaakola on MariaDB 10.5 new Galera features

Galera R&D team is currently finalizing new features targeted for the next MariaDB 10.5 release. This presentation is a high level overview of the most prominent Galera clustering features under work, such as:
* Non Blocking DDL – […]

The post MariaDB Day Brussels 02.02.2020 – Introducing speakers – Seppo Jaakola on MariaDB 10.5 new Galera features appeared first on MariaDB.org.

by Anna Widenius at January 29, 2020 08:45 AM

January 28, 2020

MariaDB Foundation

MariaDB day Brussels 02.02.2020 – Introducing speakers – Vicențiu Ciorbaru on comparing MariaDB and MySQL Roles.

MySQL 8.0 has introduced roles, a feature that was present since MariaDB 10.0. There are quite a number of differences between the two databases.
During the MariaDB day Vicențiu will present a comparison between them and see how roles are useful for your application and what are the key differences to consider when working with both databases. […]

The post MariaDB day Brussels 02.02.2020 – Introducing speakers – Vicențiu Ciorbaru on comparing MariaDB and MySQL Roles. appeared first on MariaDB.org.

by Anna Widenius at January 28, 2020 03:47 PM

MariaDB Day Brussels 02.02.2020 – Introducing speakers – Andrew Hutchings on ColumnStore engine

MariaDB has an Open Source engine called ColumnStore which provides columnar storage capabilities. During the MariaDB Day Andrew Hutchings (a.k.a. LinuxJedi) will hold a talk explaining what columnar storage is, how it works and the advantages / disadvantages of it. […]

The post MariaDB Day Brussels 02.02.2020 – Introducing speakers – Andrew Hutchings on ColumnStore engine appeared first on MariaDB.org.

by Anna Widenius at January 28, 2020 12:25 PM

SeveralNines

Understanding the ProxySQL Audit Log

ProxySQL became a very important bit of infrastructure in the database environments. It works as a load balancer, it helps to shape the flow of the traffic and reduce the downtime. With great power comes great responsibility. How can you stay up to date on who is accessing the ProxySQL configuration? Who is connecting to the database through ProxySQL? Those questions can be answered using ProxySQL Audit Log, which is available starting from ProxySQL 2.0.5. In this blog post we will look into how to enable this feature and how the log contents look like.

The initial steps will be to deploy ProxySQL. We can easily do that using ClusterControl - both MySQL Replication and Galera Cluster types support ProxySQL deployment.

Assuming we have a cluster up and running, we can deploy ProxySQL from Manage -> LoadBalancers:

We have to decide on which node ProxySQL should be installed, its version (we’ll keep the default 2.x) and define credentials for ProxySQL administrative and monitoring users.

Below we can either import existing application users from the database or create a new one by assigning name, password, schema and MySQL privileges. We can then configure which nodes should be included in ProxySQL and decide if we use implicit transactions or not. Once everything is done, we can deploy ProxySQL. For high availability you probably want to add a second ProxySQL and then keepalived on top of them. Keepalived can also be easily deployed from ClusterControl:

Here we have to pick nodes on which ProxySQL is deployed, pass the Virtual IP and network interface VIP should be assigned to. Once this is done, ClusterControl can deploy Keepalived for you.

Now, let’s take a look at the audit log. All configurations should be performed on both ProxySQL nodes. Alternatively you can use an option to sync the nodes:

There are two settings that govern how the audit log should work:

The first one defines the file where data should be stored, the second tells how large the log file should be before it’ll be rotated. Let’s configure log in ProxySQL data directory:

Now, we can take a look at the data we see in the audit log file. First of all, the format in which data is stored is JSON. There are two types of events, one related to MySQL connectivity and second related to ProxySQL admin interface connectivity.

Here is an example of entries triggered by MySQL traffic:

  "client_addr": "10.0.0.100:40578",

  "event": "MySQL_Client_Connect_OK",

  "proxy_addr": "0.0.0.0:6033",

  "schemaname": "sbtest",

  "ssl": false,

  "thread_id": 810,

  "time": "2020-01-23 14:24:17.595",

  "timestamp": 1579789457595,

  "username": "sbtest"

}

{

  "client_addr": "10.0.0.100:40572",

  "event": "MySQL_Client_Quit",

  "proxy_addr": "0.0.0.0:6033",

  "schemaname": "sbtest",

  "ssl": false,

  "thread_id": 807,

  "time": "2020-01-23 14:24:17.657",

  "timestamp": 1579789457657,

  "username": "sbtest"

}

{

  "client_addr": "10.0.0.100:40572",

  "creation_time": "2020-01-23 14:24:17.357",

  "duration": "299.653ms",

  "event": "MySQL_Client_Close",

  "extra_info": "MySQL_Thread.cpp:4307:process_all_sessions()",

  "proxy_addr": "0.0.0.0:6033",

  "schemaname": "sbtest",

  "ssl": false,

  "thread_id": 807,

  "time": "2020-01-23 14:24:17.657",

  "timestamp": 1579789457657,

  "username": "sbtest"

}

As you can see, most of the data repeats: client address, ProxySQL address, schema name, if SSL was used in connections, related thread number in MySQL, user that created the connection. The “MySQL_Client_Close” event also contains information about the time when the connection was created and the duration of the connection. You can also see which part of ProxySQL code was responsible for closing the connection.

Admin connections are quite similar:

{

  "client_addr": "10.0.0.100:52056",

  "event": "Admin_Connect_OK",

  "schemaname": "information_schema",

  "ssl": false,

  "thread_id": 815,

  "time": "2020-01-23 14:24:19.490",

  "timestamp": 1579789459490,

  "username": "proxysql-admin"

}

{

  "client_addr": "10.0.0.100:52056",

  "event": "Admin_Quit",

  "schemaname": "information_schema",

  "ssl": false,

  "thread_id": 815,

  "time": "2020-01-23 14:24:19.494",

  "timestamp": 1579789459494,

  "username": "proxysql-admin"

}

{

  "client_addr": "10.0.0.100:52056",

  "creation_time": "2020-01-23 14:24:19.482",

  "duration": "11.795ms",

  "event": "Admin_Close",

  "extra_info": "MySQL_Thread.cpp:3123:~MySQL_Thread()",

  "schemaname": "information_schema",

  "ssl": false,

  "thread_id": 815,

  "time": "2020-01-23 14:24:19.494",

  "timestamp": 1579789459494,

  "username": "proxysql-admin"

}

The data collected is very similar, the main difference is that it is related to connections to the ProxySQL administrative interface.

Conclusion

As you can see, in a very easy way you can enable auditing of the access to ProxySQL. This, especially the administrative access, is something which should be monitored from the security standpoint. Audit plugin makes it quite easy to accomplish.

by krzysztof at January 28, 2020 10:45 AM

MariaDB Foundation

January 27, 2020

SeveralNines

A Comparison Between the MySQL Clone Plugin and Xtrabackup

In one of our previous blogs we explained how Clone Plugin, one of new features that showed in MySQL 8.0.17, can be used to rebuild a replication slave. Currently the go-to tool for that, as well as for backups, is Xtrabackup. We thought it is interesting to compare how those tools work and behave.

Comparing Performance

The first thing we decided to test is how both perform when it comes to storing the copy of the data locally. We used AWS and m5d.metal instance with two NVMe SSD and we ran the clone to local copy:

mysql> CLONE LOCAL DATA DIRECTORY='/mnt/clone/';

Query OK, 0 rows affected (2 min 39.77 sec)

Then we tested Xtrabackup and made the local copy:

rm -rf /mnt/backup/ ; time xtrabackup --backup --target-dir=/mnt/backup/ --innodb-file-io-threads=8 --innodb-read-io-threads=8  --innodb-write-io-threads=8 --innodb-io-capacity=20000 --parallel=16

200120 13:12:28 completed OK!

real 2m38.407s

user 0m45.181s

sys 4m18.642s

As you can see, the time required to copy the data was basically the same. In both cases the limitation was the hardware, not the software.

Transferring data to another server will be the most common use case for both tools. It can be a slave you want to provision or rebuild. In the future it may be a backup, Clone Plugin doesn’t have such functionality as of now but we are pretty sure in the future someone will make it possible to use it as a backup tool. Given that hardware is the limitation for local backup in both cases, hardware will also be a limitation for transferring the data across the network. Depending on your setup, it could be either the network, disk I/O or CPU.

In a I/O-intensive operations CPU is the least common bottleneck. This makes it quite common to trade some CPU utilization for reduction in the data set size. You can accomplish that through compression. If it is done on the fly, you still have to read the same amount of data but you send less of it (as it is compressed) over the network. Then, you will have to decompress it and write it down. It is also possible that the files themselves are compressed. In that case you reduce the amount of data read, transferred and written to disk.

Both Clone Plugin and Xtrabackup come with an on-the-fly compression (we would like to thanks Kenny Gryp, who corrected us on this bit). In Clone Plugin you can enable it through clone_enable_compression, which is disabled by default. Xtrabackup can also utilize external tools to compress the data. In case of compressed InnoDB tables, external compression won't make too much of a difference so both tools should perform in a similar manner in case the network bandwidth is the limiting factor.

Comparing Usability

Performance is just one thing to compare, there are many others like how easy tools are to use. In both cases there are several steps you have to perform. For Clone Plugin it is:

  1. Install the plugin on all nodes
  2. Create users on both donor and receiver nodes
  3. Set up the donor list on the receiver

Those three steps have to be performed once. When they are set, you can use Clone Plugin to copy the data. Based on the init system you may need to start MySQL node after the clone process has completed. This is not required if, like in the case of systemd, MySQL will be automatically restarted.

Xtrabackup requires a couple more steps to get things done.

  1. Install the software on all nodes
  2. Create user on the donor

Those two steps have to be executed once. For every backup you have to execute following steps:

  1. Configure network streaming. Simple and secure way would be to use SSH, something like:
xtrabackup --backup --innodb-file-io-threads=8 --innodb-read-io-threads=8  --innodb-write-io-threads=8 --innodb-io-capacity=20000 --parallel=8 --stream=xbstream --target-dir=/mnt/backup/ | ssh root@172.32.4.70 "xbstream -x -C /mnt/backup/"

We found, though, for faster harddrives, with single-threaded SSH, CPU becomes a bottleneck. Setting up netcat requires additional step on the receiver to ensure netcat is up, listening and redirecting the traffic to the proper software (xbstream).

  1. Stop MySQL on the receiver node

  2. Run the Xtrabackup

  3. Apply InnoDB logs

  4. Copy back the data

  5. Start MySQL on the receiver node

As you can see, Xtrabackup requires more steps to be taken.

Security Considerations

Clone Plugin can be configured to use SSL for data transfer even though by default it uses plain text. Cloning of the encrypted tablespaces is possible but there is no option to encrypt, for example, the local clone. User would have to do it separately, after the clone process is completed.

Xtrabackup itself doesn’t provide any security. Security is determined by how you stream the data. If you use SSH for streaming, data in transit will be encrypted. If you decide to use netcat, it will be sent as a plain text. Of course, if the data is encrypted in tablespaces, it is already secured, just like in the case of the Clone Plugin. Xtrabackup can also be used along with on-the-fly encryption to ensure your data is encrypted also at rest.

Plugin Features

Clone Plugin is a new product, still in an infant phase. Its primary task is to provide ways of provisioning nodes in InnoDB Cluster and it does that just fine. For other tasks, like backups or provisioning of replication slaves, it can be used to some extent but it suffers from several limitations. We covered some of them in our previous blog so we won’t repeat it here but the most serious one, when talking about provisioning and backups, is that only InnoDB tables are cloned. If you happen to use any other storage engine, you cannot really use Clone Plugin. On the other hand Xtrabackup will happily backup and transfer most commonly used storage engines: InnoDB, MyISAM (unfortunately, it’s still used in many places) and CSV. Xtrabackup comes also with a set of tools that are intended to help with streaming the data from node to node or even stream backup to S3 buckets.

To sum it up, when it comes to backing up data and provisioning replication slaves, xtrabackup is and most likely will still be the most popular pick. On the other hand, Clone Plugin, most likely, will improve and evolve. We will see what the future holds and how things will look like in a year’s time.

Let us know if you have any thoughts on the Clone Plugin, we are very interested to see what is your opinion on this new tool.

 

by krzysztof at January 27, 2020 10:45 AM

MariaDB Foundation

January 26, 2020

Valeriy Kravchuk

Using bpftrace on Fedora 29 - More Advanced Examples

It happened so that I did not have much time to devote to further eBPF and bpftrace-based tracing and profiling since October 2019, when I posted a simple example of how it can be used on Fedora 29 to "emulate" slow query log (to some extent, by recording queries and time it took to execute them) with dynamic probes. Recently I started preparing for my talk about ftrace, eBPF and bpftrace planned for FOSDEM MySQL, MariaDB and Friends Devroom, but rejected there and later accepted for "MariaDB Day Brussels 0202 2020" (that I suggest everyone to attend, at least if you do not have better plans for the second day of FOSDEM 2020).

So, I decided to check for any ways to use bpftrace as a quick profiler, maybe "embed" it into pt-pmp one day and, surely, get what MariaDB developers ask me most often while working on performance problems or hangs - stack traces. This blog post describes my "achievements" so far on this way.

I had the same version of bpftrace coming from Fedora RPM (that had not updated in the meantime):
[openxs@fc29 server]$ rpm -q -a | grep bpftrace
bpftrace-0.9-0.fc29.x86_64
So, after checking this fine reference guide quickly, I came up with a simple one-liner:
[openxs@fc29 ~]$ sudo bpftrace -e 'profile:hz:99 { @[ustack] = count(); }'
Error creating map: '@'
Error creating stack id map
This might have happened because kernel.perf_event_max_stackis smaller than 127. Try to tweak this value with sysctl kernel.perf_event_max_stack=<new value>
Error creating perf event map (-1)
Attaching 1 probe...
Segmentation fault
Ups! I tried to do what error message suggested:
[openxs@fc29 ~]$ sudo sysctl -w kernel.perf_event_max_stack=1000
kernel.perf_event_max_stack = 1000
But it had not helped. This is how I hit my first problem with bpftrace. Fedora had not cared to update it when it stopped matching "something" in a new kernel, for whatever reason. This is actually a real problem for eBPF programs - what they assume may not be "there" in a kernel, so they have to be checked carefully. There is a separate "BPF Type Format (BTF)" for encoding the debug info related to BPF program (or map) that may help to find out if the program is "compatible" without trying it. In my case incompatibility was obvious and I needed some fast way to proceed with bpftrace, in a hope to come up with some useful examples for my talk.

So, i decided to build new version of bpftrace from recent GitHub source (against bcc tools I had installed and supposedly up to date). This ended up with errors no matter what I tried:
[openxs@fc29 build]$ make
[  3%] Built target parser
[  4%] Built target arch
[  5%] Building CXX object src/ast/CMakeFiles/ast.dir/irbuilderbpf.cpp.o
/mnt/home/openxs/git/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'void bpftrace::ast::IRBuilderBPF::CreateSignal(llvm::Value*)':
/mnt/home/openxs/git/bpftrace/src/ast/irbuilderbpf.cpp:669:16: error: 'BPF_FUNC_send_signal' was not declared in this scope
       getInt64(BPF_FUNC_send_signal),
                ^~~~~~~~~~~~~~~~~~~~
...
Quick search led me to the idea that it's actually a know and recently reported problem, see Issue #1014. As a solution I decided to rebuild bcc tools from GitHub source as well (the alternative to to change bpftrace source) and removed Fedora packages. To make a long story short, this is what I did eventually (from fc -l output):
1051     rm -rf bcc/
1052     rm -rf bpftrace/
1053     sudo dnf remove bcc
1054     git clone https://github.com/iovisor/bcc.git
1055     cd bcc/
1056     git submodule update --init --recursive
1057     sudo dnf install -y bison cmake ethtool flex git iperf libstdc++-static   python-netaddr python-pip gcc gcc-c++ make zlib-devel   elfutils-libelf-devel
1058     sudo dnf install -y luajit luajit-devel
1059     sudo dnf install -y   http://repo.iovisor.org/yum/extra/mageia/cauldron/x86_64/netperf-2.7.0-1.mga6.x86_64.rpm
1060     sudo pip install pyroute2
1061     sudo dnf install -y clang clang-devel llvm llvm-devel llvm-static ncurses-devel
1062     mkdir build
1063     cd build/
1064     cmake .. -DCMAKE_INSTALL_PREFIX=/usr
1065     make
1066     make test
1067     sudo make install
1068     cd ../..
1069     git clone https://github.com/iovisor/bpftrace.git
1070     cd bpftrace/
1071     git submodule update --init --recursive
1072     mkdir build
1073     cd build/
1074     cmake -DCMAKE_BUILD_TYPE=Release ../
1075     make
1076     ./tests/bpftrace_test
1077     fc -l
Basically I did it all as documented in related projects for my version of Fedora. Step highlighted (1056 above) is essential though, as it was missed here in any obvious way as a mandatory (while it is in recent versions). Step 1076 was successful:
...
[----------] Global test environment tear-down
[==========] 350 tests from 9 test cases ran. (2590 ms total)
[  PASSED  ] 350 tests.

[openxs@fc29 build]$ make testRunning tests...
Test project /mnt/home/openxs/git/bpftrace/build
    Start 1: bpftrace_test
1/3 Test #1: bpftrace_test ....................   Passed    2.42 sec
    Start 2: runtime_test
2/3 Test #2: runtime_test .....................***Failed   21.10 sec
    Start 3: tools-parsing-test
3/3 Test #3: tools-parsing-test ...............***Failed    5.88 sec

33% tests passed, 2 tests failed out of 3

Total Test time (real) =  29.41 sec

The following tests FAILED:
          2 - runtime_test (Failed)
          3 - tools-parsing-test (Failed)
Errors while running CTest
make: *** [Makefile:128: test] Error 8
while make test shown some failures (same as make test for bcc tools), but form previous experience this had not looked like a big deal. I continued basic testing that the resulting bpftrace now works (I tried to add kernel probe, guess what software runs on this workstation):
[openxs@fc29 build]$ sudo ./src/bpftrace -e 'kprobe:do_nanosleep { printf("sleep by %s\n", comm); }'
Attaching 1 probe...
sleep by percona-qan-age
sleep by percona-qan-age
sleep by skypeforlinux
sleep by mysqld
...
^C
So, with some confidence built up, I've installed my new and very recent bpftrace tool I built myself:
[openxs@fc29 build]$ sudo make install
[  3%] Built target parser
...
-- Installing: /usr/local/bin/bpftrace
...
-- Installing: /usr/local/man/man8/xfsdist.8.gz

[openxs@fc29 build]$ which bpftrace
/usr/local/bin/bpftrace

[openxs@fc29 build]$ cd ..
[openxs@fc29 bpftrace]$ git log -1
commit afafbf561522dd33fa316be3e33375bc662399ac (HEAD -> master, origin/master, origin/HEAD)
Author: bas smit <bas@baslab.org>
Date:   Fri Jan 17 12:33:43 2020 +0100

    Add `--info` flag

    This prints the supported kernel features
Quick test involved starting MariaDB 10.3 with thread pool enabled (more on "why thread pool?" in some next blog post):
[openxs@fc29 maria10.3]$ ./bin/mysqld_safe --no-defaults --thread_handling=pool-of-threads --thread_pool_idle_timeout=10 &
[1] 32643
and running sysbench:
[openxs@fc29 maria10.3]$ sysbench /usr/local/share/sysbench/oltp_update_index.lua --mysql-host=127.0.0.1 --mysql-user=root --mysql-port=3306 --threads=10 --tables=4 --table-size=1000000 --time=600 --report-interval=5 run
sysbench 1.1.0-174f3aa (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 10
Report intermediate results every 5 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 5s ] thds: 10 tps: 54.37 qps: 54.37 (r/w/o: 0.00/54.37/0.00) lat (ms,95%): 467.30 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 10 tps: 40.60 qps: 40.60 (r/w/o: 0.00/40.60/0.00) lat (ms,95%): 530.08 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 10 tps: 40.00 qps: 40.00 (r/w/o: 0.00/40.00/0.00) lat (ms,95%): 520.62 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 10 tps: 38.00 qps: 38.00 (r/w/o: 0.00/38.00/0.00) lat (ms,95%): 623.33 err/s: 0.00 reconn/s: 0.00
[ 25s ] thds: 10 tps: 34.00 qps: 34.00 (r/w/o: 0.00/34.00/0.00) lat (ms,95%): 746.32 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 10 tps: 36.40 qps: 36.40 (r/w/o: 0.00/36.40/0.00) lat (ms,95%): 549.52 err/s: 0.00 reconn/s: 0.00
[ 35s ] thds: 10 tps: 37.80 qps: 37.80 (r/w/o: 0.00/37.80/0.00) lat (ms,95%): 601.29 err/s: 0.00 reconn/s: 0.00
...
and adding the probe like this in another shell:
[openxs@fc29 ~]$ sudo bpftrace -e 'profile:hz:99 { @[ustack] = count(); }' > /tmp/bpf.stack
^C
When I stopped collecting with Ctrl-C, bpftrace printed the content of associative arrays collected (with unique user space stack 9for all processes!) as index and count of time this stack was as a value). No segmentation fault, unlike with the version from package! In the resulting file we can see (among other things):
[openxs@fc29 bpftrace]$ cat /tmp/bpf.stack | more
Attaching 1 probe...

...
@[
    rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned short*, bool, unsigned long, mem_block_info_t**)+31
    page_cur_search_with_match_bytes(buf_block_t const*, dict_index_t const*, dtuple_t const*, page_cur_mode_t, unsigned long*, unsigned long*, unsigned long*, unsigned long*, page_cur_t*)+1773
    btr_cur_search_to_nth_level_func(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_cur_t*, rw_lock_t*, char const*, unsigned int, mtr_t*, unsigned long)+8316
    btr_pcur_open_low(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_pcur_t*, char const*, unsigned int, unsigned long, mtr_t*) [clone .constprop.33]+146
    row_search_on_row_ref(btr_pcur_t*, unsigned long, dict_table_t const*, dtuple_t const*, mtr_t*)+85
    row_purge_poss_sec(purge_node_t*, dict_index_t*, dtuple_t const*, btr_pcur_t*, mtr_t*, bool)+503
    row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, dtuple_t const*)+1293
    row_purge_record_func(purge_node_t*, unsigned char*, que_thr_t const*, bool)+2271
    row_purge_step(que_thr_t*)+1057
    que_run_threads(que_thr_t*)+2280
    srv_worker_thread+956
    start_thread+234
]: 1

...

@[
    rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned short*, bool, unsigned long, mem_block_info_t**)+31
    btr_search_build_page_hash_index(dict_index_t*, buf_block_t*, rw_lock_t*, unsigned long, unsigned long, unsigned long)+2781
    btr_search_info_update_slow(btr_search_t*, btr_cur_t*)+622
    btr_cur_search_to_nth_level_func(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_cur_t*, rw_lock_t*, char const*, unsigned int, mtr_t*, unsigned long)+10130
    row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+4823
    ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+338
    handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+280
    handler::read_range_first(st_key_range const*, st_key_range const*, bool, bool)+98
    handler::multi_range_read_next(void**)+183
    Mrr_simple_index_reader::get_next(void**)+74
    DsMrr_impl::dsmrr_next(void**)+66
    QUICK_RANGE_SELECT::get_next()+39
    rr_quick(READ_RECORD*)+25
    mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, bool, unsigned long long*, unsigned long long*)+3560
    mysql_execute_command(THD*)+8446
    Prepared_statement::execute(String*, bool)+979
    Prepared_statement::execute_loop(String*, bool, unsigned char*, unsigned char*)+154
    mysql_stmt_execute_common(THD*, unsigned long, unsigned char*, unsigned char*, unsigned long, bool, bool)+349
    mysqld_stmt_execute(THD*, char*, unsigned int)+37
    dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool)+5123
    do_command(THD*)+368
    tp_callback(TP_connection*)+314
    worker_main(void*)+160
    start_thread+234
]: 1

...

@[
    trx_purge(unsigned long, bool)+1054
    srv_purge_coordinator_thread+1872
    start_thread+234
]: 2
@[
    ut_crc32_sw(unsigned char const*, unsigned long)+1611
    buf_calc_page_crc32(unsigned char const*)+61
    buf_flush_init_for_writing(buf_block_t const*, unsigned char*, void*, unsigned long)+872
    buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool)+149
    buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool)+607
    buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*)+1627
    buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long)+392
    pc_flush_slot()+2617
    buf_flush_page_cleaner_coordinator.cold.151+1235
    start_thread+234
]: 2

...
@[]: 6474
So, this is the default format to output stacks (one line per call, no address, frame number or anything, symbols properly resolved for my MariaDB built from GitHub source etc). But I had something like this mind eventually (from pt-pmp):
[openxs@fc29 bpftrace]$ pt-pmp
Tue Jan 21 13:52:12 EET 2020

...
     10 syscall(libc.so.6),__io_getevents_0_4(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:1932),LinuxAIOHandler::poll(os0file.cc:2093),os_aio_linux_handler(stl_vector.h:805),os_aio_handler(stl_vector.h:805),fil_aio_wait(fil0fil.cc:4435),io_handler_thread(srv0start.cc:326),start_thread(libpthread.so.0),clone(libc.so.6)
      6 pthread_cond_wait,os_event::wait(sync0types.h:476),os_event::wait_low(sync0types.h:476),log_write_up_to(log0log.cc:929),trx_flush_log_if_needed_low(trx0trx.cc:1218),trx_flush_log_if_needed(trx0trx.cc:1218),trx_commit_complete_for_mysql(trx0trx.cc:1218),innobase_commit(ha_innodb.cc:4637),commit_one_phase_2(handler.cc:1642),ha_commit_one_phase(handler.cc:1622),ha_commit_trans(handler.cc:1484),trans_commit_stmt(transaction.cc:505),mysql_execute_command(sql_parse.cc:6106),Prepared_statement::execute(sql_prepare.cc:4807),Prepared_statement::execute_loop(sql_prepare.cc:4235),Prepared_statement::execute_loop(sql_prepare.cc:4235),mysql_stmt_execute_common(sql_prepare.cc:3235),mysqld_stmt_execute(sql_prepare.cc:3132),dispatch_command(sql_parse.cc:1798),do_command(sql_parse.cc:1401),threadpool_process_request(threadpool_common.cc:359),tp_callback(threadpool_common.cc:359),worker_main(threadpool_generic.cc:1606),start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_wait,os_event::wait(sync0types.h:476),os_event::wait_low(sync0types.h:476),srv_resume_thread(srv0srv.cc:2533),srv_worker_thread(srv0srv.cc:2533),start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,inline_mysql_cond_timedwait(mysql_thread.h:1215),get_event(mysql_thread.h:1215),worker_main(threadpool_generic.cc:1602),start_thread(libpthread.so.0),clone(libc.so.6)
      ...
That is, number of times collapsed stack was noted and then the stack itself, without function arguments if possible. This kind of stack is generated from gdb outputs that looks as follows:
Thread 1 (Thread 0x7f29eee66900 (LWP 32728)):
#0  0x00007f29ef25f431 in poll () from /lib64/libc.so.6
#1  0x000055cc4ece656f in poll (__timeout=-1, __nfds=2, __fds=0x7ffe174108e0) at /usr/include/bits/poll2.h:41
#2  handle_connections_sockets () at /mnt/home/openxs/git/server/sql/mysqld.cc:6753
#3  0x000055cc4ece7a7c in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /mnt/home/openxs/git/server/sql/mysqld.cc:6222
#4  0x00007f29ef190413 in __libc_start_main () from /lib64/libc.so.6
#5  0x000055cc4ecd9ece in _start () at /mnt/home/openxs/git/server/sql/sql_list.h:158
So, each tread stack starts from 'Thread..." line, each frame is numbered and function call is actually the fourth word in the output (in awk terms).
 
Based on that I tried add probe to produce stack in perf format for mysqld process, with some decoration to have a clear marker of when new capture starts:
[openxs@fc29 bpftrace]$ sudo bpftrace -e 'profile:hz:99 /comm == "mysqld"/ {printf("# %s\n", ustack(perf));}' > /tmp/ustack.txt
If you wonder about the details of command format, functions used etc, check the Reference Guide. The result looks as follows in hexdump -C output:
00000000  41 74 74 61 63 68 69 6e  67 20 31 20 70 72 6f 62  |Attaching 1 prob|
00000010  65 2e 2e 2e 0a 23 20 0a  09 37 66 32 39 65 66 62  |e....# ..7f29efb|
00000020  38 34 33 30 30 20 6e 61  6e 6f 73 6c 65 65 70 2b  |84300 nanosleep+|
00000030  36 34 20 28 2f 75 73 72  2f 6c 69 62 36 34 2f 6c  |64 (/usr/lib64/l|
00000040  69 62 70 74 68 72 65 61  64 2d 32 2e 32 38 2e 73  |ibpthread-2.28.s|
00000050  6f 29 0a 0a 23 20 0a 09  35 35 63 63 34 66 32 30  |o)..# ..55cc4f20|
...
This is how you can see where are the new lines, how many etc, for proper parsing. Next step was to check pt-pmp source code and http://poormansprofiler.org/, review and adapt some awk magic used there to collapse stacks, get rid of function arguments etc, and I've ended up with this draft version:
[openxs@fc29 maria10.3]$ cat /tmp/ustack.txt | awk '
BEGIN { s = ""; }
/^\#/ { print s; s = ""; }
/^\t/ { if (index($2, "(") > 0) {targ = substr($2, 1, index($2, "(") - 1)} else {targ = substr($2, 1, index($2, "+") - 1)} ; if (s != "") { s = s "," targ } else { s = targ } }
END { print s }' | sort | uniq -c | sort -r -n -k 1,1 | more

    199 __sched_yield,srv_purge_coordinator_thread,start_thread
     16 srv_get_task_queue_length,trx_purge,srv_purge_coordinator_thread,start_thread
     13 trx_purge,srv_purge_coordinator_thread,start_thread
      4 srv_purge_coordinator_thread,start_thread
      4 srv_get_task_queue_length,srv_purge_coordinator_thread,start_thread
      2 ut_crc32_sw,buf_calc_page_crc32,buf_page_is_corrupted,buf_page_io_complete,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level_func,row_search_mvcc,ha_innobase::index_read,handler::ha_index_read_map,handler::read_range_first,handler::multi_range_read_next,Mrr_simple_index_reader::get_next,DsMrr_impl::dsmrr_next,QUICK_RANGE_SELECT::get_next,rr_quick,mysql_update,mysql_execute_command,Prepared_statement::execute,Prepared_statement::execute_loop,mysql_stmt_execute_common,mysqld_stmt_execute,dispatch_command,do_command,tp_callback,worker_main,start_thread
      2 syscall,
      2 __pwrite,
      2 mtr_t::commit,row_ins_sec_index_entry_low,row_ins_sec_index_entry,row_upd_sec_index_entry,row_upd_step,row_update_for_mysql,ha_innobase::update_row,handler::ha_update_row,mysql_update,mysql_execute_command,Prepared_statement::execute,Prepared_statement::execute_loop,mysql_stmt_execute_common,mysqld_stmt_execute,dispatch_command,do_command,tp_callback,worker_main,start_thread
...
The above looks reasonable (even if not yet a direct match to what pt-pmp produces, I am too lazy to do much awk these days).

What's more interesting is that with bpftrace we can also get kernel stack for thread while profiling. I took a quick look at what we may get as a result of combining both:
[openxs@fc29 bpftrace]$ sudo bpftrace -e 'profile:hz:99 /comm == "mysqld"/ {printf("# %s\n%s\n", kstack(perf), ustack(perf));}' > /tmp/kstack.txt
[sudo] password for openxs:
^C
I just pring one after the other without any separator in perf format, so I can apply the same trick as above to the result:
openxs@fc29 maria10.3]$ cat /tmp/kstack.txt | awk '                   BEGIN { s = ""; }
/^\#/ { print s; s = ""; }
/^\t/ { if (index($2, "(") > 0) {targ = substr($2, 1, index($2, "(") - 1)} else {targ = substr($2, 1, index($2, "+") - 1)} ; if (s != "") { s = s "," targ } else { s = targ } }
END { print s }' | sort | uniq -c | sort -r -n -k 1,1 | more

     93 __sched_text_start,schedule,__x64_sys_sched_yield,do_syscall_64,entry_SYSCALL_64_after_hwframe,__sched_yield,srv_purge_coordinator_thread,start_thread
     49 __sched_yield,srv_purge_coordinator_thread,start_thread
     42 do_syscall_64,entry_SYSCALL_64_after_hwframe,__sched_yield,srv_purge_coordinator_thread,start_thread
     20 srv_get_task_queue_length,trx_purge,srv_purge_coordinator_thread,start_thread
     11 trx_purge,srv_purge_coordinator_thread,start_thread
      2 srv_get_task_queue_length,srv_purge_coordinator_thread,start_thread
      1 __x86_indirect_thunk_rax,do_syscall_64,entry_SYSCALL_64_after_hwframe,__sched_yield,srv_purge_coordinator_thread,start_thread
      1 ut_crc32_sw,log_block_calc_checksum_crc32,log_write_up_to,trx_commit_complete_for_mysql,innobase_commit,commit_one_phase_2,ha_commit_trans,trans_commit_stmt,mysql_execute_command,Prepared_statement::execute,Prepared_statement::execute_loop,mysql_stmt_execute_common,mysqld_stmt_execute,dispatch_command,do_command,tp_callback,worker_main,start_thread
...
      1 _raw_spin_unlock_irqrestore,ata_scsi_queuecmd,scsi_queue_rq,blk_mq_dispatch_rq_list,blk_mq_do_dispatch_sched,blk_mq_sched_dispatch_requests,__blk_mq_run_hw_queue,__blk_mq_delay_run_hw_queue,blk_mq_run_hw_queue,blk_mq_sched_insert_requests,blk_mq_flush_plug_list,blk_flush_plug_list,blk_finish_plug,ext4_writepages,do_writepages,__filemap_fdatawrite_range,file_write_and_wait_range,ext4_sync_file,do_fsync,__x64_sys_fsync,do_syscall_64,entry_SYSCALL_64_after_hwframe,fsync,fil_flush_low,fil_flush_file_spaces,buf_dblwr_update,buf_page_io_complete,fil_aio_wait,io_handler_thread,start_thread
      1 page_simple_validate_new,buf_dblwr_flush_buffered_writes,pc_flush_slot,buf_flush_page_cleaner_coordinator.cold.151,start_thread
      1 page_cur_insert_rec_write_log,page_cur_insert_rec_low,page_copy_rec_list_end_no_locks,btr_page_reorganize_low,btr_can_merge_with_page,btr_compress,btr_cur_compress_if_useful,btr_cur_pessimistic_delete,row_purge_remove_sec_if_poss_tree,row_purge_record_func,row_purge_step,que_run_threads,srv_worker_thread,start_thread
      1 os_thread_yield,srv_purge_coordinator_thread,start_thread
      1 os_thread_get_curr_id,fseg_free_page_func,btr_page_free,btr_compress,btr_cur_compress_if_useful,btr_cur_pessimistic_delete,row_purge_remove_sec_if_poss_tree,row_purge_record_func,row_purge_step,que_run_threads,srv_worker_thread,start_thread
...
Looks cool to me, we see where we spend time in kernel while on-CPU, not only in user space code.

It's sometimes scary and not that easy to use bpftrace, but you'll get there eventually, same as we got to North Greenwich via that Emirates Air Line back in 2019. The only limit is the sky!
* * *
To summarize:
  1. bpftrace and bcc tools you get from vendor packages may not work as expected, so be ready to get nightly builds etc or, as I prefer, build from source. If this is a problem better use ftrace :) 
  2. It seems getting stack traces with bpftrace for pt-pmp is an easy next stap to make. I may be integrated as one of tracers supported here.
  3. Now we can also add kernel stack to the picure.
  4. "JUST DO BPF"!
I have many more details to share about dynamic tracing for MySQL and MariaDB DBAs. Stay tuned!

by Valerii Kravchuk (noreply@blogger.com) at January 26, 2020 04:38 PM

January 25, 2020

Valeriy Kravchuk

Dynamic Tracing of MariaDB Server With ftrace - Basic Example

Today I'd like to continue my series of blog posts about dynamic tracing of MySQL server (or Percona Server for MySQL, or MariaDB, or basically any other software, whatever you prefer) on Linux. This post ideally had to start it, as the approach discussed here is truly dynamic and suitable production servers under high load (unlike gdb tricks, most of the time), works more or less the same way starting from Linux 2.6.27, is "always there" at your fingertips (if you are root, but this is a requirement in a general case for all kinds of dynamic tracing), and does not even strictly require -debuginfo packages with symbolic information for basic usage. I mean ftrace.

To prove that this interface that appeared in 2008 is still usable, let me add the same kind of probe for tracing dispatch_command() function and printing SQL statement that somebody tries to execute (so we can avoid enabling general query log etc) with pure ftrace interface, without much explanations or background (that is presented below). In this case I use recent MariaDB Server 10.3.22 built from source on my Ubuntu 16.04 netbook (because I do not care about being shared in the feed of Planet MySQL anymore in any case).

You just have to do the following:
  1. Check that dynamic tracing with ftrace is possible:
    openxs@ao756:~$ mount | grep tracefs
    tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
    If you do not see tracefs mounted, mount it with the following command (as root):
    mount -t tracefs nodev /sys/kernel/tracing
  2. Find the address of (mangled) dispatch_command() function in the mysqld binary:
    openxs@ao756:~$ objdump -T /home/openxs/dbs/maria10.3/bin/mysqld | grep dispatch_command
    0000000000587b90
    g    DF .text  000000000000236e  Base        _Z16dispatch_command19enum_server_commandP3THDPcjbb
    The address you need (in many cases) is the first word in the output, in hex. So, 0x0000000000587b90
  3. If you want to print values of parameters passed, you need to know how they are passed by the compiler used on the architecture used. Let me share the quote that applies to x86_64:
    "First six arguments are in rdi, rsi, rdx, rcx, r8d, r9d; remaining arguments are on the stack."
    In my case (see this post or details below, or check the source code yourself if in doubts) we need to print third argument, so it must be in the rdx (64-bit) register. For ftrace we have to use just %dx, trust me for now, I tried ;)
  4. Check probe syntax (see the picture below for quick reference) and come up with something like this to add a probe named "dc". Note that on Ubuntu 16.04 this had not worked via sudo directly, I had to become root via "sudo su -" (weird, something to check one day):
    root@ao756:~# echo 'p:dc /home/openxs/dbs/maria10.3/bin/mysqld:0x0000000000587b90 query=+0(%dx):string' > /sys/kernel/debug/tracing/uprobe_events
    All these happens while MariaDB server is running and procerssing queries, in a truly dynamic manner. There is no impact at all so far. Creators of MariaDB never added anything ot the code (or to the build process) to make this work. It's enough to have a binary that runs of Linux kernel that supports ftrace.
  5. Enable tracing for the probe added (this is when magic happens) and tracing in general:
    root@ao756:~# echo 1 > /sys/kernel/debug/tracing/events/uprobes/dc/enable
    root@ao756:~# echo 1 > /sys/kernel/debug/tracing/tracing_on
  6. Now assuming you run the following:
    root@ao756:~# mysql -uroot --socket=/tmp/mariadb.sock
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 92
    Server version: 5.5.5-10.3.22-MariaDB Source distribution

    Copyright (c) 2009-2019 Percona LLC and/or its affiliates
    Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql> select 1+1;
    +-----+
    | 1+1 |
    +-----+
    |   2 |
    +-----+
    1 row in set (0.00 sec)

    mysql> select version();
    +-----------------+
    | version()       |
    +-----------------+
    | 10.3.22-MariaDB |
    +-----------------+
    1 row in set (0.00 sec)
    you may get the trace as follows, for example:
    root@ao756:~# cat /sys/kernel/debug/tracing/trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 9851/9851   #P:2
    #
    #                              _-----=> irqs-off
    #                             / _----=> need-resched
    #                            | / _---=> hardirq/softirq
    #                            || / _--=> preempt-depth
    #                            ||| /     delay
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #              | |       |   ||||       |         |
              python-21369 [001] d... 209044.826580: : sys_wr: 0000000000000008
    ...
    (I had show the top of the output to give hint about format), or:
    root@ao756:~# cat /sys/kernel/debug/tracing/trace_pipe
              mysqld-1082  [000] d... 273258.971401: dc: (0x560d8a20fb90) query="select @@version_comment limit 1"
              mysqld-1082  [000] d... 273263.187839: dc: (0x560d8a20fb90) query="select 1+1"
              mysqld-1082  [001] d... 273269.128542: dc: (0x560d8a20fb90) query="select version()"
    ...
  7. If you want to stop this basic tracing, run these:
    root@ao756:~# echo 0 > /sys/kernel/debug/tracing/events/uprobes/dc/enable
    root@ao756:~# echo 0 > /sys/kernel/debug/tracing/tracing_on
So, with 7 simple steps we can enable dynamic tracing and see SQL statements executed and time when this happened.
Check https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/trace/uprobetracer.rst
The magic behind ftrace and steps that I really had to perform when I tried to apply this approach for the very first time will be presented in the next posts. Stay tuned! 

by Valerii Kravchuk (noreply@blogger.com) at January 25, 2020 07:55 PM

January 24, 2020

SeveralNines

Open Source Database Adoption - Taking a Look at Percona’s Survey Results

Are you still struggling to pick the best open source database software for your organisation? You would have penned down a must-have feature list right from deploying, managing, and monitoring a database but finding the best fit?... still not there. 

There are many software vendors out there trying their best to offer a variety of combinations of features to manage the open-source databases, so it would be wise to get an insight on the current happenings around this space before making any decision. 

Recently Percona conducted a survey on 750 respondents from small, medium, and large companies to try to understand how they have managed their open source database environments. The survey results have led to interesting findings to understand the trends in open source database adoption by the open source community. 

This blog walks you through important points on open source database features, leading technologies, adoption factors, and concerns evaluated by those organisations. 

Multiple Open Source Environments

Many companies now have multiple databases on multiple platforms across different locations, to keep up with the rapid needs and changes in their business. The need for multiple database instances often increases as the data volume increases. On average, over half of the open source community uses at least 25 instances for this purpose. 

73% of relational database users, shows that the relational database is still the market preference over the multiple-model databases like time-Series, graph, wide column, and other niche database technologies. 

With many different databases, companies are now overwhelmed with choices, allowing them to have a combination of database types to support the various applications they have in their environment. The combination usually depends on the interaction and data support between the database and the applications. To maintain these multi environments, companies should be prepared to invest in either a multi-skilled DBA or have a good open source database management system (like ClusterControl) to deploy, manage, and monitor the various databases in their environment.

The Leading Open Source Databases

The survey on open source databases  also highlighted the top databases installed in 2019, and the five leading databases are as below

Postgres-XL, Clustrix, Alibaba Cloud: ApsaraDB RDS for PostgreSQL, FoundationDB Document Layer and Azure Cosmos DB are at the bottom 5 with less than 1% installation for the year.

MySQL - Variants and Combinations

The MySQL Community Edition secured the title for most deployed database for the year of 2019. The top five most popular MySQL-compatible softwares after the community version are...

  • MariaDB Community
  • Percona Server for MySQL
  • MySQL on Amazon RDS
  • Percona XtraDB Cluster
  • Amazon Aurora

MySQL combinations with other databases differ based on the editions. PostgreSQL, Elastic, Redis and MongoDB commonly used with the MySQL Community version. On the other hand, with the Enterprise Version, proprietary databases, SQL Server and Oracle are used as the best combination. 

64% of the community also selected PostgreSQL as a popular database to use alongside the enterprise edition. These results show clearly that the community version is not usually paired with a proprietary database. It is assumed that there could be two main reasons for this decision; lack of skills to manage multiple open source databases and/or the fact that management has concerns over the support of stability of the open source products.

PostgreSQL - Variants and Combinations

PostgreSQL has gained a lot of attention in the last few years and has the most installation after the MySQL database. Its strength lies in the large community base which contributes to its upgrades and expansions. Although there are many compatible versions, only the standard version is preferred one. PostgreSQL is coupled most with Elastic, MongoDB, SQL, and Redis. The enterprise version, like MySQL Enterprise, is commonly paired with the enterprise databases like Oracle and SQL Server.

MongoDB - Variants and Combinations

MongoDB gained its popularity along with big data and its ability to overcome the limitation of a rigid relational database with NoSQL. NoSQL paved the way for agile development and supports flexibility and scalability. Like the other two, MongoDB Community is still the most widely used version by small and medium companies, and the enterprise version is only used by large organisations. 

Open Source Database Adoption

Open-source databases gained popularity because of cost-saving and to avoid vendor lock-in situations. Another bit of good news is that these databases work for any business size, hence it is widely used by small, medium, and large companies alike. 

Open source tools give a platform for experimentation, which allows the users to use a community edition, and get comfortable with it, before moving on to further deployments. On average 53% of companies are moving into the open-source software adoption. 

Open Source Community Contributions

Enhancement in the open source world really depends on the contributions from its user community. This is why open source software with a large community (like PostgreSQL) is always adopted widely by small, medium, and large companies.  Although companies are geared up for open source adoption, and do know the need to contribute, many of the users have said they don't have the time to contribute back to the libraries. 

Support Services

The next main concern on the adoption of open source is around support preferences. Generally, small scale companies management and technical staff prefer a self-support option. 

Support services are also a limiting factor. Companies are often worried about the support mechanism, especially during times of crisis. They lack confidence in their own support team or it could be the internal team just has too many other tasks, making it impossible to give adequate support. 

Small companies usually rely on self-support to minimize cost. To increase confidence in the open source solution, some companies appoint external vendors for support services. Another option which can be considered is to opt for an open source database management system which includes support services as well.

Enterprise or Subscribed Database Preferences

There is still a large percentage of companies using proprietary databases for three major reasons; the strong 24x7 dedicated support line, brand trust, and the enhanced security.  Trust is tagged with a long established brand which gives peace of mind to the users. Despite these factors, community open source still wins with one major factor which is cost saving. 

Open Source Database Adoption Concerns

The survey showed there are three main adoption concerns (besides vendor lock-in). 

The first concern is the lack of support which has been discussed in the earlier sections of this blog. Next is the concern around the lack of fixes and bugs from the small and medium companies. The worry could be on the cost incurred to fix any bugs. 

Large companies are not worried about this, because they can afford the cost to hire someone to fix any bugs and even further enhance the system. 

Security is the third reason, and this concern is mainly from the technical team because they are responsible for the security compliance of its systems in the organisation. 

Conclusion

Adopting an open source database is the way-to-go for any size business and is the best fit if cost and avoiding vendor lock-in is a concern. You also need to be aware of, and check on, the support mechanism, patch support, and security aspects before making a choice. 

Along with the open source technology adoption, you would need a proper technical team to manage the database and have a proper support mechanism to handle any limitations. 

Open source technologies allow you to experiment with the available free or community versions and then decide to go ahead with the licensed or enterprise version if required. 

The great thing is that with open source technologies, you won't have to settle for one database anymore, as you have more than one to serve the different aspects of your business.

by Sarojini Devi Nagappan at January 24, 2020 06:48 PM

January 23, 2020

SeveralNines

An Introduction to MySQL Deployment Using an Ansible Role

Ansible automates and simplifies repetitive, complex, and tedious operations. It is an IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs. It requires no agents, using only SSH to push changes from a single source to multiple remote resources with no additional custom security infrastructure configuration and use a simple language format (YAML) to describe the automation jobs.

Installing a standalone MySQL server is a simple straightforward task, but this can be problematic if you have multiple database servers, versions, platforms and environments to support. Thus, having a configuration management tool is the way to go to improve efficiency, remove repetitiveness and reduce human errors.

In this blog post, we are going to go walk you through the basics of Ansible's automation for MySQL, as well as configuration management with examples and explanations. We will start with a simple standalone MySQL deployment, as illustrated in the following high-level diagram:

Installing Ansible

For this walkthrough, we need to have at least two hosts - One host is for Ansible (you could use a workstation instead of a server) and another one is the target host that we want to deploy a MySQL server. 

To install Ansible on CentOS 7, simply run the following commands:

(ansible-host)$ yum install -y epel-release

(ansible-host)$ yum install -y ansible

For other OS distributions, check out the Ansible installation guide.

Setting up Passwordless SSH

Using password during SSH is supported, but passwordless SSH keys with ssh-agent are one of the best ways to use Ansible. The initial step is to configure passwordless SSH since Ansible will perform the deployment solely by this channel. Firstly, generate a SSH key on the Ansible host:

(ansible-host)$ whoami

root

(ansible-host)$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

You should get at least the following files generated:

(ansible-host)$ ls -al ~/.ssh/

-rw-------. 1 root root 1679 Jan 14 03:40 id_rsa

-rw-r--r--. 1 root root  392 Jan 14 03:40 id_rsa.pub

To allow passwordless SSH, we need to copy the SSH public key (id_rsa.pub) to the remote host that we want to access. We can use a tool called ssh-copy-id to do this task for us. However, you must know the user's password of the target host and the password authentication is allowed on the target host:

(ansible-host)$ whoami

root

(ansible-host)$ ssh-copy-id root@192.168.0.221

The above command will prompt out for root password of 192.168.0.221, simply enter the password and the SSH key for the current user of the Ansible host will be copied over to the target host, 192.168.0.221 into ~/.ssh/authorized_keys, meaning we authorize that particular key to access this server remotely. To test out, you should be able to run the following remote command without any password from Ansible host:

(ansible-host)$ ssh root@192.168.0.221 "hostname -I"

192.168.0.221

In case where you are not allowed to use root user for SSH (e.g, "PermitRootLogin no" in SSH configuration), you can use a sudo user instead. In the following example, we set up passwordless SSH for a sudo user called "vagrant":

(ansible-host)$ whoami

vagrant

(ansible-host)$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

(ansible-host)$ ls -al ~/.ssh/

-rw-------. 1 vagrant vagrant 1679 Jan 14 03:45 id_rsa

-rw-r--r--. 1 vagrant vagrant  392 Jan 14 03:45 id_rsa.pub

(ansible-host)$ ssh-copy-id vagrant@192.168.0.221

If the target server doesn't allow password authentication via SSH, simply copy the content of SSH public key at ~/.ssh/id_rsa.pub manually into the target hosts' ~/.ssh/authorized_keys file. For example, on the Ansible host, retrieve the public key content:

(ansible-host)$ cat ~/.ssh/id_rsa.pub

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5MZjufN0OiKyKa2OG0EPBEF/w23FnOG2x8qpAaYYuqHlVc+ZyRugtGm+TdTJDfLA1Sr/rtZpXmPDuLUdlAvPmmwqIhgiatKiDw5t2adNUwME0sVgAlBv/KvbusTTdtpFQ1o+Z9CltGiENDCFytr2nVeBFxImoZu2H0ilZed/1OY2SZejUviXTQ0Dh0QYdIeiQHkMf1CiV2sNYs8j8+ULV26OOKCd8c1h1O9M5Dr4P6kt8E1lVSl9hbd4EOHQmeZ3R3va5zMesLk1A+iadIGJCJNCVOA2RpxDHmmaX28zQCwrpCliH00g9iCRixlK+cB39d1coUWVGy7SeaI8bzfv3 vagrant@cc

Connect to the target host and paste the Ansible's host public key into ~/.ssh/authorized_keys:

(target-host)$ whoami

root

(target-host)$ vi ~/.ssh/authorized_keys

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5MZjufN0OiKyKa2OG0EPBEF/w23FnOG2x8qpAaYYuqHlVc+ZyRugtGm+TdTJDfLA1Sr/rtZpXmPDuLUdlAvPmmwqIhgiatKiDw5t2adNUwME0sVgAlBv/KvbusTTdtpFQ1o+Z9CltGiENDCFytr2nVeBFxImoZu2H0ilZed/1OY2SZejUviXTQ0Dh0QYdIeiQHkMf1CiV2sNYs8j8+ULV26OOKCd8c1h1O9M5Dr4P6kt8E1lVSl9hbd4EOHQmeZ3R3va5zMesLk1A+iadIGJCJNCVOA2RpxDHmmaX28zQCwrpCliH00g9iCRixlK+cB39d1coUWVGy7SeaI8bzfv3 vagrant@cc

You may now try to run a remote command from Ansible host to verify and you should not be prompted with any password. At this point, our passwordless SSH is configured.

Defining the Target Host

Next we need to define the target host, the host that we want to manage using Ansible. Based on our architecture, we are going to deploy only one MySQL server which is 192.168.0.221. Add the following lines into /etc/ansible/hosts:

[db-mysql]

192.168.0.221

The above simply means we defined a group called "db-mysql", which will be the identifier when we refer to the target host in Ansible playbook. We can also list out all IP addresses or hostnames of the target hosts under this group. At this point, we only have one MySQL server to deploy, thus only one entry is there. You can also specify a any matching rule to match the hosts under one group, for example:

[db-mysql]

192.168.0.[221:223]

The above definition means we are having 3 hosts under this very group with the following IP addresses:

  • 192.168.0.221
  • 192.168.0.222
  • 192.168.0.223

There are a lot of ways and rules to match and group the target hosts as shown in the Ansible inventory guide.

Choosing an Ansible Role

To tell Ansible what to deploy, we need to define the deployment steps in a YML formatted file called playbook. As you might know, installing a complete MySQL server requires multiple steps to satisfy all MySQL dependencies, post-installation configuration, user and schema creation and so on. Ansible has provided a number of MySQL modules that can help us out, but still we have to write a playbook for the deployment steps.

To simplify the deployment steps, we can use existing Ansible roles. Ansible role is an independent component which allows reuse of common configuration steps. An Ansible role has to be used within the playbook. There are a number of MySQL Ansible roles available in the Ansible Galaxy, a repository for Ansible roles that are available to drop directly into your playbooks.

If you lookup "mysql", you will get plenty of Ansible roles for MySQL:

We will use the most popular one named "mysql" by geerlingguy. You can opt to use other roles but mostly the most downloaded one tends to be for general purpose which usually works fine in most cases.

On the Ansible host, run the following command to download the Ansible role:

(ansible-host)$ ansible-galaxy install geerlingguy.mysql

The role will be downloaded into ~/.ansible/roles/geerlingguy.mysql/ of the current user.

Writing the Ansible Playbook

By looking at the Readme of the Ansible role, we can follow the example playbook that is being provided. Firstly, create a playbook file called deploy-mysql.yml and add the following lines:

(ansible-host)$ vim ~/deploy-mysql.yml

- hosts: db-mysql

  become: yes

  vars_files:

    - vars/main.yml

  roles:

    - { role: geerlingguy.mysql }

In the above lines, we define the target host which is all hosts under db-mysql entries in /etc/ansible/hosts. The next line (become) tells Ansible to execute the playbook as a root user, which is necessary for the role (it is stated there in the Readme file). Next, we define the location of variables file (var_files) located at vars/main.yml, relative to the playbook path.

Let's create the variable directory and file and specify the following line:

(ansible-host)$ mkdir vars

(ansible-host)$ vim vars/main.yml

mysql_root_password: "theR00tP455w0rd"

For more information check out the Role Variables section in the Readme file of this role.

Start the Deployment

Now we are ready to start the MySQL deployment. Use the ansible-playbook command to execute our playbook definitions:

(ansible-host)$ ansible-playbook deploy-mysql.yml

You should see a bunch of lines appear in the output. Focus on the last line where it summarizes the deployment:

PLAY RECAP ***************************************************************************************************************************************

192.168.0.221              : ok=36 changed=8 unreachable=0    failed=0 skipped=16 rescued=0 ignored=0

If everything turns up green and OK, you can verify on the database host that our MySQL server is already installed and running:

(mysql-host)$ rpm -qa | grep -i maria

mariadb-server-5.5.64-1.el7.x86_64

mariadb-libs-5.5.64-1.el7.x86_64

mariadb-5.5.64-1.el7.x86_64



(mysql-host)$ mysqladmin -uroot -p ping

Enter password:

mysqld is alive

As you can see from the above, for CentOS 7, the default MySQL installation is MariaDB 5.5 as part of the standard package repository. At this point, our deployment is considered complete, however, we would like to further customize our deployment as shown in the next sections.

Customizing the Deployment

The simplest definition in playbook gives us a very basic installation and uses all default configuration options. We can further customize the MySQL installation by extending/modifying/appending the playbook to do the following:

  • modify MySQL configuration options
  • add database user
  • add database schema
  • configure user privileges
  • configure MySQL replication
  • install MySQL from other vendors
  • import a custom MySQL configuration file

Installing MySQL from Oracle repository

By default, the role will install the default MySQL package that comes with the OS distribution. As for CentOS 7, you would get MariaDB 5.5 installed by default. Suppose we want to install MySQL from another vendor, we can extend the playbook with pre_tasks, a task which Ansible executes before executing any tasks mentioned in any .yml file, as shown in the following example:

(ansible-host)$ vim deploy-mysql.yml

- hosts: db-mysql

  become: yes

  vars_files:

    - vars/main.yml

  roles:

    - { role: geerlingguy.mysql }

  pre_tasks:

    - name: Install the MySQL repo.

      yum:

        name: http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm

        state: present

      when: ansible_os_family == "RedHat"

    - name: Override variables for MySQL (RedHat).

      set_fact:

        mysql_daemon: mysqld

        mysql_packages: ['mysql-server']

        mysql_log_error: /var/lib/mysql/error.log

        mysql_syslog_tag: mysqld

        mysql_pid_file: /var/run/mysqld/mysqld.pid

        mysql_socket: /var/lib/mysql/mysql.sock

      when: ansible_os_family == "RedHat"

Execute the playbook:

(ansible-host)$ ansible-playbook deploy-mysql.yml

The above will install MySQL from Oracle repository instead. The default version you would get is MySQL 5.6. Executing the above playbook on a target host that already has a running older version of MySQL/MariaDB would likely fail because of the incompatibility.

Creating MySQL Databases and Users

Inside vars/main.yml, we can define the MySQL database and users that we want Ansible to configure on our MySQL server by using the mysql_database and mysql_users modules, right after our previous definition on mysql_root_password:

(ansible-host)$ vim vars/main.yml

mysql_root_password: "theR00tP455w0rd"

mysql_databases:

  - name: myshop

    encoding: latin1

    collation: latin1_general_ci

  - name: sysbench

    encoding: latin1

    collation: latin1_general_ci

mysql_users:

  - name: myshop_user

    host: "%"

    password: mySh0pPassw0rd

    priv: "myshop.*:ALL"

  - name: sysbench_user

    host: "192.168.0.%"

    password: sysBenchPassw0rd

    priv: "sysbench.*:ALL"

The definition instructs Ansible to create two databases, "myshop" and "sysbench", followed its respective MySQL user with proper privileges, allowed host and password.

Re-execute the playbook to apply the change into our MySQL server:

(ansible-host)$ ansible-playbook deploy-mysql.yml

This time, Ansible will pick up all the changes we made in vars/main.yml to be applied to our MySQL server. We can verify in the MySQL server with the following commands:

(mysql-host)$ mysql -uroot -p -e 'SHOW DATABASES'

Enter password:

+--------------------+

| Database           |

+--------------------+

| information_schema |

| myshop             |

| mysql              |

| performance_schema |

| sysbench           |

+--------------------+

(mysql-host)$ mysql -uroot -p -e 'SHOW GRANTS FOR sysbench_user@"192.168.0.%"'

Enter password:

+------------------------------------------------------------------------------------------------------------------------+

| Grants for sysbench_user@192.168.0.%                                                                                   |

+------------------------------------------------------------------------------------------------------------------------+

| GRANT USAGE ON *.* TO 'sysbench_user'@'192.168.0.%' IDENTIFIED BY PASSWORD '*4AC2E8AD02562E8FAAF5A958DC2AEA4C47451B5C' |

| GRANT ALL PRIVILEGES ON `sysbench`.* TO 'sysbench_user'@'192.168.0.%'                                                  |

+------------------------------------------------------------------------------------------------------------------------+

Enabling Slow Query Log

This role supports enabling MySQL slow query log, we can define the location of the log file as well as the slow query time. Add the necessary variables inside vars/main.yml file:

mysql_root_password: "theR00tP455w0rd"

mysql_databases:

  - name: example_db

    encoding: latin1

    collation: latin1_general_ci

  - name: sysbench

    encoding: latin1

    collation: latin1_general_ci

mysql_users:

  - name: example_user

    host: "%"

    password: similarly-secure-password

    priv: "example_db.*:ALL"

  - name: sysbench_user

    host: "192.168.0.%"

    password: sysBenchPassw0rd

    priv: "sysbench.*:ALL"

mysql_slow_query_log_enabled: true

mysql_slow_query_log_file: 'slow_query.log'

mysql_slow_query_time: '5.000000'

Re-run the playbook to apply the changes:

(ansible-host)$ ansible-playbook deploy-mysql.yml

The playbook will make necessary changes to MySQL slow query related options and restart the MySQL server automatically to load the new configurations. We can then verify if the new configuration options are loaded correctly on the MySQL server:

(mysql-host)$ mysql -uroot -p -e 'SELECT @@slow_query_log, @@slow_query_log_file, @@long_query_time'

+------------------+-----------------------+-------------------+

| @@slow_query_log | @@slow_query_log_file | @@long_query_time |

+------------------+-----------------------+-------------------+

|                1 | slow_query.log        | 5.000000 |

+------------------+-----------------------+-------------------+

Including Custom MySQL Configuration File

Ansible role variables and MySQL variables are two different things. The author of this role has created a number of MySQL related variables that can be represented with Ansible role variables. Taken from the Readme file, here are some of them:

mysql_port: "3306"

mysql_bind_address: '0.0.0.0'

mysql_datadir: /var/lib/mysql

mysql_socket: *default value depends on OS*

mysql_pid_file: *default value depends on OS*

mysql_log_file_group: mysql *adm on Debian*

mysql_log: ""

mysql_log_error: *default value depends on OS*

mysql_syslog_tag: *default value depends on OS*

If the generated configuration does not satisfy our MySQL requirement, we can include custom MySQL configuration files into the deployment by using mysql_config_include_files variable. It accepts an array of values separated by a comma, with a "src" as the prefix for the actual path on the Ansible host.

First of all, we have to prepare the custom configuration files on the Ansible host. Create a directory and a simple MySQL configuration file:

(ansible-host)$ mkdir /root/custom-config/

(ansible-host)$ vim /root/custom-config/my-severalnines.cnf

[mysqld]

max_connections=250

log_bin=binlog

expire_logs_days=7

Let's say we have another configuration file specifically for mysqldump configuration:

(ansible-host)$ vim /root/custom-config/mysqldump.cnf

[mysqldump]

max_allowed_packet=128M

To import these configuration files into our deployment, define them in the mysql_config_include_files array in vars/main.yml file:

mysql_root_password: "theR00tP455w0rd"

mysql_databases:

  - name: example_db

    encoding: latin1

    collation: latin1_general_ci

  - name: sysbench

    encoding: latin1

    collation: latin1_general_ci

mysql_users:

  - name: example_user

    host: "%"

    password: similarly-secure-password

    priv: "example_db.*:ALL"

  - name: sysbench_user

    host: "192.168.0.%"

    password: sysBenchPassw0rd

    priv: "sysbench.*:ALL"

mysql_slow_query_log_enabled: true

mysql_slow_query_log_file: slow_query.log

mysql_slow_query_time: 5

mysql_config_include_files: [

  src: '/root/custom-config/my-severalnines.cnf',

  src: '/root/custom-config/mysqldump.cnf'

]

Note that /root/custom-config/mysqld-severalnines.cnf and /root/custom-config/mysqldump.cnf exist inside the Ansible host.

Re-run the playbook:

(ansible-host)$ ansible-playbook deploy-mysql.yml

The playbook will import those configuration files and put them into the include directory (depending on the OS) which is /etc/my.cnf.d/ for CentOS 7. The playbook will auto-restart the MySQL server to load the new configuration options. We can then verify if the new configuration options are loaded correctly:

(mysql-host)$ mysql -uroot -p -e 'select @@max_connections'

250

(mysql-host)$ mysqldump --help | grep ^max-allowed-packet

max-allowed-packet                134217728

Conclusion

Ansible can be used to automate the database deployment and configuration management with a little knowledge of scripting. Meanwhile, ClusterControl uses a similar passwordless SSH approach to deploy, monitor, manage and scale your database cluster from A to Z, with a user interface and needs no additional skill to achieve the same result.

by ashraf at January 23, 2020 04:35 PM

MariaDB Foundation

MariaDB Day Brussels 0202 2020 Provisional Schedule

A provisional schedule for the first MariaDB Day, to be held as part of the FOSDEM Fringe in Brussels at the Bedford Hotel and Congress Centre on Sunday February 2, is now available. […]

The post MariaDB Day Brussels 0202 2020 Provisional Schedule appeared first on MariaDB.org.

by Ian Gilfillan at January 23, 2020 06:05 AM

January 22, 2020

SeveralNines

Using PostgreSQL Replication Slots

What are Replication Slots?

Back in the days when "Replication Slots" were not yet introduced, managing the WAL segments were a challenge. In standard streaming replication, the master has no knowledge of the slave status.  Take the example of a master that executes a large transaction, while a standby node is in maintenance mode for a couple of hours (such as upgrading the system packages, adjusting network security, hardware upgrade, etc.). At some point, the master removes its transaction log (WAL segments) as checkpoint passes. Once the slave is off maintenance, it possibly has a huge slave lag and has to catch up with the master. Eventually, the slave will get a fatal issue like below:

LOG:  started streaming WAL from primary at 0/73000000 on timeline 1

FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000073 has already been removed

The typical approach is to specify in your postgresql.conf a WAL archival script that will copy WAL files to one or more long-term archive locations. If you don’t have any standbys or other streaming replication clients, then basically the server can discard the WAL file once the archive script is done or responds OK. But you’ll still need some recent WAL files for crash recovery (data from recent WAL files get replayed during crash recovery. In our example of a standby node which is placed for a long maintenance period, problems arise when it comes back online and asks the primary for a WAL file that the primary no longer has, then the replication fails.

This problem was addressed in PostgreSQL 9.4 via "Replication Slots".

If not using replication slots, a common way to reduce the risk of failing replication is to set the wal_keep_segments high enough so that WAL files that might be needed won't be rotated or recycled. The disadvantage of this approach is that it's hard to determine what value is best for your setup. You won't need maintenance on a daily basis or you won't need to retain a large pile of WAL files that eats your disk storage. While this works, it's not an ideal solution as risking disk space on the master can cause incoming transactions to fail.

Alternative approaches of not using replication slots is to configure PostgreSQL with continuous archiving and provide a restore_command to give the replica access to the archive. To avoid WAL build-up on the primary, you may use a separate volume or storage device for the WAL files, e.g., SAN or NFS. Another thing is with synchronous replication since it requires that primary has to wait for standby nodes to commit transaction. This means, it assures that WAL files have been applied to the standby nodes. But still, it's best that you provide archiving commands from the primary so that once WAL's are recycled in the primary, rest assured that you have WAL backups in case for recovery. Although in some situations, synchronous replication is not an ideal solution as it comes with some performance overhead as compared with asynchronous replication.

Types of Replication Slots

There are two types of replication slots. These are:

Physical Replication Slots 

Can be used for standard streaming replication. They will make sure that data is not recycled too early. 

Logical Replication Slots

Logical replication does the same thing as physical replication slots and are used for logical replication. However, they are used for logical decoding. The idea behind logical decoding is to give users a chance to attach to the transaction log and decode it with a plugin. It allows to extract changes made to the database and therefore to the transaction log in any format and for any purpose.

In this blog, we'll be using physical replication slots and how to achieve this using ClusterControl.

Advantages and Disadvantages of Using Replication Slots

Replications slots are definitely beneficial once enabled. By default, "Replication Slots" are not enabled and have to be set  manually. Among the advantages of using Replication Slots are

  • Ensures master retains enough WAL segments for all replicas to receive them
  • Prevents the master from removing rows that could cause recovery conflict on the replicas
  • A master can only recycle the transaction log once it has been consumed by all replicas. The advantage here is that a slave can never fall behind so much that a re-sync is needed.

Replication slots also come with some caveats.

  • An orphan replication slot can cause unbounded disk growth due to piled up WAL files from the master
  • Slave nodes placed under long maintenance (such as days or weeks) and that are tied to a replication slot will have unbounded disk growth due to piled up WAL files from the master

You can monitor this by querying pg_replication_slots to determine the slots that are not used. We'll check back on this a bit later.

Using Replication Slots 

As stated earlier, there are two types of replication slots. For this blog, we'll use physical replication slots for streaming replication.

Creating A Replication Slot

Creating a replication is simple. You need to invoke the existing function pg_create_physical_replication_slot to do this and has to be run and created in the master node. The function is simple,

maximus_db=# \df pg_create_physical_replication_slot

Schema              | pg_catalog

Name                | pg_create_physical_replication_slot

Result data type    | record

Argument data types | slot_name name, immediately_reserve boolean DEFAULT false, OUT slot_name name, OUT xlog_position pg_lsn

Type                | normal

e.g. Creating a replication slot named slot1,

postgres=# SELECT pg_create_physical_replication_slot('slot1');

-[ RECORD 1 ]-----------------------+---------

pg_create_physical_replication_slot | (slot1,)

The replication slot names and its underlying configuration is only system-wide and not cluster-wide. For example, if you have nodeA (current master), and standby nodes nodeB and nodeC, creating the slot on a master nodeA namely "slot1", then data will not be available to nodeB and nodeC. Therefore, when failover/switchover is about to happen, you need to re-create the slots you have created.

Dropping A Replication Slot

Unused replication slots have to be dropped or deleted. As stated earlier, when there are orphaned replication slots or slots that have not been assigned to any client or standby nodes, it can lead to boundless disk space issues if left undropped. So it is very important that these have to be dropped when it's no longer use. To drop it, simply invoke pg_drop_replication_slot. This function has the following definition:

maximus_db=# \df pg_drop_replication_slot

Schema              | pg_catalog

Name                | pg_drop_replication_slot

Result data type    | void

Argument data types | name

Type                | normal

Dropping it is simple:

maximus_db=# select pg_drop_replication_slot('slot2');

-[ RECORD 1 ]------------+-

pg_drop_replication_slot |

Monitoring Your PostgreSQL Replication Slots

Monitoring your replication slots is something that you don't want to miss. Just collect the information from view pg_replication_slots in the primary/master node just like below:

postgres=# select * from pg_replication_slots;

-[ RECORD 1 ]-------+-----------

slot_name           | main_slot

plugin              |

slot_type           | physical

datoid              |

database            |

active              | t

active_pid          | 16297

xmin                |

catalog_xmin        |

restart_lsn         | 2/F4000108

confirmed_flush_lsn |

-[ RECORD 2 ]-------+-----------

slot_name           | main_slot2

plugin              |

slot_type           | physical

datoid              |

database            |

active              | f

active_pid          |

xmin                |

catalog_xmin        |

restart_lsn         |

confirmed_flush_lsn |

The above result shows that the main_slot has been taken, but not main_slot2.

Another thing you can do is to monitor how much lag behind the slots you have. To achieve this, you can simply use the query based on the sample result below:

postgres=# SELECT redo_lsn, slot_name,restart_lsn, 

round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind 

FROM pg_control_checkpoint(), pg_replication_slots;

redo_lsn    | slot_name | restart_lsn | gb_behind 

------------+-----------+-------------+-----------

 1/8D400238 |     slot1 | 0/9A000000 | 3.80

But redo_lsn is not present in 9.6, shall use redo_location, so in 9.6,

imbd=# SELECT redo_location, slot_name,restart_lsn, 

round((redo_location-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind 

FROM pg_control_checkpoint(), pg_replication_slots;

-[ RECORD 1 ]-+-----------

redo_location | 2/F6008BE0

slot_name     | main_slot

restart_lsn   | 2/F6008CC0

gb_behind     | 0.00

-[ RECORD 2 ]-+-----------

redo_location | 2/F6008BE0

slot_name     | main_slot2

restart_lsn   | 2/F6008CC0

gb_behind     | 0.00

System Variable Requirements

Implementing replication slots requires manual setting. There are variables that you have to keep in mind that require changes and be specified in your postgresql.conf. See below:

  • max_replication_slots – If set to 0, this means that replication slots are totally disabled. If you're using PostgreSQL < 10 versions, this slot has to be specified other than 0 (default). Since PostgreSQL 10, the default is 10. This variable specifies the maximum number of replication slots. Setting it to a lower value than the number of currently existing replication slots will prevent the server from starting.
  • wal_level – must at least be replica or higher (replica is default). Setting hot_standby or archive will map to replica. For a physical replication slot, replica is enough. For logical replication slots, logical is preferred.
  • max_wal_senders – set to 10 by default, 0 in 9.6 version which means replication is disabled. We suggest you set this at least to 16 especially when running with ClusterControl.
  • hot_standby – in versions < 10, you need to set this to on which is off by default. This is important for standby nodes which means when on, you can connect and run queries during recovery or in standby mode.
  • primary_slot_name –  this variable is set via recovery.conf on the standby node. This is the slot to be used by the receiver or standby node when connecting with the sender (or primary/master).

You have to take note that these variables mostly require a database service restart in order to reload new values.

Using Replication Slots in a ClusterControl PostgreSQL Environment

Now, let’s see how we can use physical replication slots and implement them within a Postgres setup managed by ClusterControl.

Deploying of PostgreSQL Database Nodes

Let's start deploying a 3-node PostgreSQL Cluster using ClusterControl using PostgreSQL 9.6 version this time.

ClusterControl will deploy nodes with the following system variables defined accordingly based on their defaults or tuned up values. In:

postgres=# select name, setting from pg_settings where name in ('max_replication_slots', 'wal_level', 'max_wal_senders', 'hot_standby');

         name          | setting 

-----------------------+---------

 hot_standby           | on

 max_replication_slots | 0

 max_wal_senders       | 16

 wal_level             | replica

(4 rows)

In versions PostgreSQL > 9.6, max_replication_slots default value is 10 which is enabled by default but not in 9.6 or lower versions which is disabled by default. You need to assign max_replication_slots higher than 0. In this example, I set max_replication_slots to 5.

root@debnode10:~# grep 'max_replication_slots' /etc/postgresql/9.6/main/postgresql.conf 

# max_replication_slots = 0                     # max number of replication slots

max_replication_slots = 5

and restarted the service,

root@debnode10:~# pg_lsclusters 

Ver Cluster Port Status Owner    Data directory Log file

9.6 main    5432 online postgres /var/lib/postgresql/9.6/main pg_log/postgresql-%Y-%m-%d_%H%M%S.log



root@debnode10:~# pg_ctlcluster 9.6 main restart

Setting The Replication Slots For Primary and Standby Nodes

There's no option in ClusterControl to do this, so you have to create your slots manually. In this example, I created the slots in the primary in host 192.168.30.100:

192.168.10.100:5432 pgdbadmin@maximus_db=# SELECT pg_create_physical_replication_slot('slot1'), pg_create_physical_replication_slot('slot2');

 pg_create_physical_replication_slot | pg_create_physical_replication_slot 

-------------------------------------+-------------------------------------

 (slot1,)                            | (slot2,)

(1 row)

Checking what we have just created shows,

192.168.10.100:5432 pgdbadmin@maximus_db=# select * from pg_replication_slots;

 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 

-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------

 slot1     | | physical  | | | f      | | |       | | 

 slot2     | | physical  | | | f      | | |       | | 

(2 rows)

Now in the standby nodes, we need to update the recovery.conf and add the variable primary_slot_name and change the application_name so it's easier to identify the node. Here's how it looks like in host 192.168.30.110 recovery.conf: 

root@debnode11:/var/lib/postgresql/9.6/main/pg_log# cat ../recovery.conf 

standby_mode = 'on'

primary_conninfo = 'application_name=node11 host=192.168.30.100 port=5432 user=cmon_replication password=m8rLmZxyn23Lc2Rk'

recovery_target_timeline = 'latest'

primary_slot_name = 'slot1'

trigger_file = '/tmp/failover_5432.trigger'

Doing the same thing as well in host 192.168.30.120 but changed the application_name and set the primary_slot_name = 'slot2'.

Checking the replication slot health:

192.168.10.100:5432 pgdbadmin@maximus_db=# select * from pg_replication_slots;

 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 

-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------

 slot1     | | physical  | | | t      | 24252 | |       | 0/CF0A4218 | 

 slot2     | | physical  | | | t      | 11635 | |       | 0/CF0A4218 | 

(2 rows)

What Else Do You Need?

Since ClusterControl doesn't support Replication Slots as of this time, there are things that you need to take into account. What are these? Let's go into details.

Failover/Switchover Process

When an auto failover or switchover via ClusterControl has been attempted, slots will not be retained from the primary and on the standby nodes. You need to re-create this manually, check the variables if set correctly, and modify the recovery.conf accordingly.

Rebuilding a Slave from a Master

When rebuilding a slave, the recovery.conf will not be retained. This means that your recovery.conf settings having the primary_slot_name will be erased. You need to specify this manually again and check the pg_replication_slots view to determine if slots are properly used or left orphaned.

If you want to rebuild the slave/standby node from a master, you might have to consider specifying the PGAPPNAME env variable just like the command below:

$ export PGAPPNAME="app_repl_testnode15"; /usr/pgsql-9.6/bin/pg_basebackup -h 192.168.10.190 -U cmon_replication -D /var/lib/pgsql/9.6/data -p5434 -W -S main_slot -X s -R -P

Specifying the -R param is very important so it will re-create the recovery.conf, while -S shall specify what slot name to use when rebuilding the standby node.

Conclusion

Implementing the Replication Slots in PostgreSQL is straightforward yet there are certain caveats that you must remember. When deploying with ClusterControl, you’ll need to update some settings during failover or slave rebuilds.

by Paul Namuag at January 22, 2020 05:15 PM

January 21, 2020

SeveralNines

Moving from MySQL 5.7 to MySQL 8.0 - What You Should Know

April 2018 is not just a date for the MySQL world. MySQL 8.0 was released there, and more than 1 year after, it’s probably time to consider migrating to this new version.

MySQL 8.0 has important performance and security improvements, and, as in all migration to a new database version, there are several things to take into account before going into production to avoid hard issues like data loss, excessive downtime, or even a rollback during the migration task.

In this blog, we’ll mention some of the new MySQL 8.0 features, some deprecated stuff, and what you need to keep in mind before migrating.

What’s New in MySQL 8.0?

Let’s now summarize some of the most important features mentioned in the official documentation for this new MySQL version.

  • MySQL incorporates a transactional data dictionary that stores information about database objects.
  • An atomic DDL statement combines the data dictionary updates, storage engine operations, and binary log writes associated with a DDL operation into a single, atomic transaction.
  • The MySQL server automatically performs all necessary upgrade tasks at the next startup to upgrade the system tables in the mysql schema, as well as objects in other schemas such as the sys schema and user schemas. It is not necessary for the DBA to invoke mysql_upgrade.
  • It supports the creation and management of resource groups, and permits assigning threads running within the server to particular groups so that threads execute according to the resources available to the group. 
  • Table encryption can now be managed globally by defining and enforcing encryption defaults. The default_table_encryption variable defines an encryption default for newly created schemas and general tablespace. Encryption defaults are enforced by enabling the table_encryption_privilege_check variable. 
  • The default character set has changed from latin1 to utf8mb4.
  • It supports the use of expressions as default values in data type specifications. This includes the use of expressions as default values for the BLOB, TEXT, GEOMETRY, and JSON data types.
  • Error logging was rewritten to use the MySQL component architecture. Traditional error logging is implemented using built-in components, and logging using the system log is implemented as a loadable component.
  • A new type of backup lock permits DML during an online backup while preventing operations that could result in an inconsistent snapshot. The new backup lock is supported by LOCK INSTANCE FOR BACKUP and UNLOCK INSTANCE syntax. The BACKUP_ADMIN privilege is required to use these statements.
  • MySQL Server now permits a TCP/IP port to be configured specifically for administrative connections. This provides an alternative to the single administrative connection that is permitted on the network interfaces used for ordinary connections even when max_connections connections are already established.
  • It supports invisible indexes. This index is not used by the optimizer and makes it possible to test the effect of removing an index on query performance, without removing it.
  • Document Store for developing both SQL and NoSQL document applications using a single database.
  • MySQL 8.0 makes it possible to persist global, dynamic server variables using the SET PERSIST command instead of the usual SET GLOBAL one. 

MySQL Security and Account Management

As there are many improvements related to security and user management, we'll list them in a separate section.

  • The grant tables in the mysql system database are now InnoDB tables. 
  • The new caching_sha2_password authentication plugin is now the default authentication method in MySQL 8.0. It implements SHA-256 password hashing, but uses caching to address latency issues at connect time. It provides more secure password encryption than the mysql_native_password plugin, and provides better performance than sha256_password.
  • MySQL now supports roles, which are named collections of privileges. Roles can have privileges granted to and revoked from them, and they can be granted to and revoked from user accounts. 
  • MySQL now maintains information about password history, enabling restrictions on reuse of previous passwords. 
  • It enables administrators to configure user accounts such that too many consecutive login failures due to incorrect passwords cause temporary account locking. 

InnoDB enhancements

As the previous point, there are also many improvements related to this topic, so we'll list them in a separate section too.

  • The current maximum auto-increment counter value is written to the redo log each time the value changes, and saved to an engine-private system table on each checkpoint. These changes make the current maximum auto-increment counter value persistent across server restarts
  • When encountering index tree corruption, InnoDB writes a corruption flag to the redo log, which makes the corruption flag crash-safe. InnoDB also writes in-memory corruption flag data to an engine-private system table on each checkpoint. During recovery, InnoDB reads corruption flags from both locations and merges results before marking in-memory table and index objects as corrupt.
  • A new dynamic variable, innodb_deadlock_detect, may be used to disable deadlock detection. On high concurrency systems, deadlock detection can cause a slowdown when numerous threads wait for the same lock. At times, it may be more efficient to disable deadlock detection and rely on the innodb_lock_wait_timeout setting for transaction rollback when a deadlock occurs.
  • InnoDB temporary tables are now created in the shared temporary tablespace, ibtmp1.
  • mysql system tables and data dictionary tables are now created in a single InnoDB tablespace file named mysql.ibd in the MySQL data directory. Previously, these tables were created in individual InnoDB tablespace files in the mysql database directory.
  • By default, undo logs now reside in two undo tablespaces that are created when the MySQL instance is initialized. Undo logs are no longer created in the system tablespace.
  • The new innodb_dedicated_server variable, which is disabled by default, can be used to have InnoDB automatically configure the following options according to the amount of memory detected on the server: innodb_buffer_pool_size, innodb_log_file_size, and innodb_flush_method. This option is intended for MySQL server instances that run on a dedicated server. 
  • Tablespace files can be moved or restored to a new location while the server is offline using the innodb_directories option. 

Now, let’s take a look at some of the features that you shouldn’t use anymore in this new MySQL version.

What is Deprecated in MySQL 8.0?

The following features are deprecated and will be removed in a future version.

  • The utf8mb3 character set is deprecated. Please use utf8mb4 instead.
  • Because caching_sha2_password is the default authentication plugin in MySQL 8.0 and provides a superset of the capabilities of the sha256_password authentication plugin, sha256_password is deprecated.
  • The validate_password plugin has been reimplemented to use the server component infrastructure. The plugin form of validate_password is still available but is deprecated.
  • The ENGINE clause for the ALTER TABLESPACE and DROP TABLESPACE statements.
  • The PAD_CHAR_TO_FULL_LENGTH SQL mode.
  • AUTO_INCREMENT support is deprecated for columns of type FLOAT and DOUBLE (and any synonyms). Consider removing the AUTO_INCREMENT attribute from such columns, or convert them to an integer type.
  • The UNSIGNED attribute is deprecated for columns of type FLOAT, DOUBLE, and DECIMAL (and any synonyms). Consider using a simple CHECK constraint instead for such columns.
  • FLOAT(M,D) and DOUBLE(M,D) syntax to specify the number of digits for columns of type FLOAT and DOUBLE (and any synonyms) is a nonstandard MySQL extension. This syntax is deprecated.
  • The nonstandard C-style &&, ||, and ! operators that are synonyms for the standard SQL AND, OR, and NOT operators, respectively, are deprecated. Applications that use the nonstandard operators should be adjusted to use the standard operators.
  • The mysql_upgrade client is deprecated because its capabilities for upgrading the system tables in the mysql system schema and objects in other schemas have been moved into the MySQL server.
  • The mysql_upgrade_info file, which is created data directory and used to store the MySQL version number.
  • The relay_log_info_file system variable and --master-info-file option are deprecated. Previously, these were used to specify the name of the relay log info log and master info log when relay_log_info_repository=FILE and master_info_repository=FILE were set, but those settings have been deprecated. The use of files for the relay log info log and master info log has been superseded by crash-safe slave tables, which are the default in MySQL 8.0.
  • The use of the MYSQL_PWD environment variable to specify a MySQL password is deprecated.

And now, let’s take a look at some of the features that you must stop using in this MySQL version.

What Was Removed in MySQL 8.0?

The following features have been removed in MySQL 8.0.

  • The innodb_locks_unsafe_for_binlog system variable was removed. The READ COMMITTED isolation level provides similar functionality.
  • Using GRANT to create users. Instead, use CREATE USER. Following this practice makes the NO_AUTO_CREATE_USER SQL mode immaterial for GRANT statements, so it too is removed, and an error now is written to the server log when the presence of this value for the sql_mode option in the options file prevents mysqld from starting.
  • Using GRANT to modify account properties other than privilege assignments. This includes authentication, SSL, and resource-limit properties. Instead, establish such properties at account-creation time with CREATE USER or modify them afterward with ALTER USER.
  • IDENTIFIED BY PASSWORD 'auth_string' syntax for CREATE USER and GRANT. Instead, use IDENTIFIED WITH auth_plugin AS 'auth_string' for CREATE USER and ALTER USER, where the 'auth_string' value is in a format compatible with the named plugin. 
  • The PASSWORD() function. Additionally, PASSWORD() removal means that SET PASSWORD ... = PASSWORD('auth_string') syntax is no longer available.
  • The old_passwords system variable.
  • The FLUSH QUERY CACHE and RESET QUERY CACHE statements.
  • These system variables: query_cache_limit, query_cache_min_res_unit, query_cache_size, query_cache_type, query_cache_wlock_invalidate.
  • These status variables: Qcache_free_blocks, Qcache_free_memory, Qcache_hits, Qcache_inserts, Qcache_lowmem_prunes, Qcache_not_cached, Qcache_queries_in_cache, Qcache_total_blocks.
  • These thread states: checking privileges on cached query, checking query cache for a query, invalidating query cache entries, sending cached result to the client, storing result in the query cache, Waiting for query cache lock.
  • The tx_isolation and tx_read_only system variables have been removed. Use transaction_isolation and transaction_read_only instead.
  • The sync_frm system variable has been removed because .frm files have become obsolete.
  • The secure_auth system variable and --secure-auth client option have been removed. The MYSQL_SECURE_AUTH option for the mysql_options() C API function was removed.
  • The log_warnings system variable and --log-warnings server option have been removed. Use the log_error_verbosity system variable instead.
  • The global scope for the sql_log_bin system variable was removed. sql_log_bin has session scope only, and applications that rely on accessing @@GLOBAL.sql_log_bin should be adjusted.
  • The unused date_format, datetime_format, time_format, and max_tmp_tables system variables are removed.
  • The deprecated ASC or DESC qualifiers for GROUP BY clauses are removed. Queries that previously relied on GROUP BY sorting may produce results that differ from previous MySQL versions. To produce a given sort order, provide an ORDER BY clause.
  • The parser no longer treats \N as a synonym for NULL in SQL statements. Use NULL instead. This change does not affect text file import or export operations performed with LOAD DATA or SELECT ... INTO OUTFILE, for which NULL continues to be represented by \N. 
  • The client-side --ssl and --ssl-verify-server-cert options have been removed. Use --ssl-mode=REQUIRED instead of --ssl=1 or --enable-ssl. Use --ssl-mode=DISABLED instead of --ssl=0, --skip-ssl, or --disable-ssl. Use --ssl-mode=VERIFY_IDENTITY instead of --ssl-verify-server-cert options.
  • The mysql_install_db program has been removed from MySQL distributions. Data directory initialization should be performed by invoking mysqld with the --initialize or --initialize-insecure option instead. In addition, the --bootstrap option for mysqld that was used by mysql_install_db was removed, and the INSTALL_SCRIPTDIR CMake option that controlled the installation location for mysql_install_db was removed.
  • The mysql_plugin utility was removed. Alternatives include loading plugins at server startup using the --plugin-load or --plugin-load-add option, or at runtime using the INSTALL PLUGIN statement.
  • The resolveip utility is removed. nslookup, host, or dig can be used instead.

There are a lot of new, deprecated, and removed features. You can check the official website for more detailed information.

Considerations Before Migrating to MySQL 8.0

Let’s mention now some of the most important things to consider before migrating to this MySQL version.

Authentication Method

As we mentioned, caching_sha2_password is not the default authentication method, so you should check if your application/connector supports it. If not, let’s see how you can change the default authentication method and the user authentication plugin to ‘mysql_native_password’ again.

To change the default  authentication method, edit the my.cnf configuration file, and add/edit the following line:

$ vi /etc/my.cnf

[mysqld]

default_authentication_plugin=mysql_native_password

To change the user authentication plugin, run the following command with a privileged user:

$ mysql -p

ALTER USER ‘username’@’hostname’ IDENTIFIED WITH ‘mysql_native_password’ BY ‘password’;

Anyway, these changes aren’t a permanent solution as the old authentication could be deprecated soon, so you should take it into account for a future database upgrade.

Also the roles are an important feature here. You can reduce the individual privileges assigning it to a role and adding the corresponding users there. 

For example, you can create a new role for the marketing and the developers teams:

$ mysql -p

CREATE ROLE 'marketing', 'developers';

Assign privileges to these new roles:

GRANT SELECT ON *.* TO 'marketing';

GRANT ALL PRIVILEGES ON *.* TO 'developers';

And then, assign the role to the users:

GRANT 'marketing' TO 'marketing1'@'%';

GRANT 'marketing' TO 'marketing2'@'%';

GRANT 'developers' TO 'developer1'@'%';

And that’s it. You’ll have the following privileges:

SHOW GRANTS FOR 'marketing1'@'%';

+-------------------------------------------+

| Grants for marketing1@%                   |

+-------------------------------------------+

| GRANT USAGE ON *.* TO `marketing1`@`%`    |

| GRANT `marketing`@`%` TO `marketing1`@`%` |

+-------------------------------------------+

2 rows in set (0.00 sec)

SHOW GRANTS FOR 'marketing';

+----------------------------------------+

| Grants for marketing@%                 |

+----------------------------------------+

| GRANT SELECT ON *.* TO `marketing`@`%` |

+----------------------------------------+

1 row in set (0.00 sec)

Character Sets

As the new default character set is utf8mb4, you should make sure you’re not using the default one as it’ll change.

To avoid some issues, you should specify the character_set_server and the collation_server variables in the my.cnf configuration file.

$ vi /etc/my.cnf

[mysqld]

character_set_server=latin1

collation_server=latin1_swedish_ci

MyISAM Engine

The MySQL privilege tables in the MySQL schema are moved to InnoDB. You can create a table engine=MyISAM, and it will work as before, but coping a MyISAM table into a running MySQL server will not work because it will not be discovered.

Partitioning

There must be no partitioned tables that use a storage engine that does not have native partitioning support. You can run the following query to verify this point.

$ mysql -p

SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE NOT IN ('innodb', 'ndbcluster') AND CREATE_OPTIONS LIKE '%partitioned%';

If you need to change the engine of a table, you can run:

ALTER TABLE table_name ENGINE = INNODB;

Upgrade Check

As a last step, you can run the mysqlcheck command using the check-upgrade flag to confirm if everything looks fine.

$ mysqlcheck -uroot -p --all-databases --check-upgrade

Enter password:

mysql.columns_priv                                 OK

mysql.component                                    OK

mysql.db                                           OK

mysql.default_roles                                OK

mysql.engine_cost                                  OK

mysql.func                                         OK

mysql.general_log                                  OK

mysql.global_grants                                OK

mysql.gtid_executed                                OK

mysql.help_category                                OK

mysql.help_keyword                                 OK

mysql.help_relation                                OK

mysql.help_topic                                   OK

mysql.innodb_index_stats                           OK

mysql.innodb_table_stats                           OK

mysql.password_history                             OK

mysql.plugin                                       OK

mysql.procs_priv                                   OK

mysql.proxies_priv                                 OK

mysql.role_edges                                   OK

mysql.server_cost                                  OK

mysql.servers                                      OK

mysql.slave_master_info                            OK

mysql.slave_relay_log_info                         OK

mysql.slave_worker_info                            OK

mysql.slow_log                                     OK

mysql.tables_priv                                  OK

mysql.time_zone                                    OK

mysql.time_zone_leap_second                        OK

mysql.time_zone_name                               OK

mysql.time_zone_transition                         OK

mysql.time_zone_transition_type                    OK

mysql.user                                         OK

sys.sys_config                                     OK

world_x.city                                       OK

world_x.country                                    OK

world_x.countryinfo                                OK

world_x.countrylanguage                            OK

There are several things to check before performing the upgrade. You can check the official MySQL documentation for more detailed information.

Upgrade Methods

There are different ways to upgrade MySQL 5.7 to 8.0. You can use the upgrade in-place or even create a replication slave in the new version, so you can promote it later. 

But before upgrading, step 0 must be backing up your data. The backup should include all the databases including the system databases. So, if there is any issue, you can rollback asap. 

Another option, depending on the available resources, can be creating a cascade replication MySQL 5.7 -> MySQL 8.0 -> MySQL 5.7, so after promoting the new version, if something went wrong, you can promote the slave node with the old version back. But it could be dangerous if there was some issue with the data, so the backup is a must before it.

For any method to be used, it’s necessary a test environment to verify that the application is working without any issue using the new MySQL 8.0 version.

Conclusion

More than 1 year after the MySQL 8.0 release, it is time to start thinking to migrate your old MySQL version, but luckily, as the end of support for MySQL 5.7 is 2023, you have time to create a migration plan and test the application behavior with no rush. Spending some time in that testing step is necessary to avoid any issue after migrating it.

by Sebastian Insausti at January 21, 2020 08:57 PM

January 20, 2020

Valeriy Kravchuk

Dynamic Tracing of MariaDB Server With bcc trace - Basic Example

This is a yet another blog post in my series about dynamic tracing of MySQL server (and friends) on Linux. Logically it had to appear after this one about perf and another one about bpftrace. For older Linux systems or when you are in a hurry with customer and have no time to upgrade, build from source etc perf just works and is really flexible (but it comes with some cost of writing many samples to disk and then processing them). For happy users of Linux with kernels 4.9+ (the newer the better), like recent Ubuntu, RHEL 8, Debian 9+ or Fedora there entire world of new efficient tracing with bpftrace is open and extending with every new kernel release.

For those in between, like me with this Ubuntu 16.04:
openxs@ao756:~/git/bcc/build$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
openxs@ao756:~/git/bcc/build$ uname -a
Linux ao756 4.4.0-171-generic #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
the fancy world of eBPF and more efficient dynamic tracing is still mostly open, as we can try to use BCC tools. BCC is a toolkit for creating efficient kernel tracing and manipulation programs that includes several potentially useful tools and examples for MySQL DBAs. It makes use of extended BPF (Berkeley Packet Filters), formally known as eBPF.

I have a draft of this blog post hanging around since October 2019, but every time I tried to complete it I was not happy with the content. I wanted to get back, test more, try to present more tools, find out how to be able to access structure members in probes as easy as I can do it with gdb or perf, but then I hit some problem and put the draft aside...

When I started again some time later I often hit some new problem, so today I just decided to finally write down what I already know for sure, and provide at least a very basic example of dynamic tracing along the lines of those used in earlier posts (capturing queries executed by different threads using dynamic probes).

The first problem in case of Ubuntu 16.04 is to get the binaries of BCC tools. One of the ways is to build from GitHub source. The INSTALL.md document is clear enough when describing build dependencies and steps:
git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr
make
sudo make install
but still there is something to note. In recent versions you surely have to update the libbpf submodule, or you'll end up with compilation errors at early stage. My steps today were the following:
openxs@ao756:~/dbs/maria10.3$ cd ~/git/bcc/
openxs@ao756:~/git/bcc$ git pull
Already up-to-date.
openxs@ao756:~/git/bcc$ git log -1
commit dce8e9daf59f44dec4e3500d39a82a8ce59e43ba
Author: Yonghong Song <yhs@fb.com>
Date:   Fri Jan 17 22:06:52 2020 -0800

    sync with latest libbpf repo

    sync libbpf submodule upto the following commit:
        commit 033ad7ee78e8f266fdd27ee2675090ccf4402f3f
        Author: Andrii Nakryiko <andriin@fb.com>
        Date:   Fri Jan 17 16:22:23 2020 -0800

            sync: latest libbpf changes from kernel

    Signed-off-by: Yonghong Song <yhs@fb.com>

openxs@ao756:~/git/bcc$ git submodule init
openxs@ao756:~/git/bcc$ git submodule update
Submodule path 'src/cc/libbpf': checked out '033ad7ee78e8f266fdd27ee2675090ccf4402f3f'
Now I can proceed to build subdirectory and complete the build:
openxs@ao756:~/git/bcc/build$ cmake .. -DCMAKE_INSTALL_PREFIX=/usr...
openxs@ao756:~/git/bcc/build$ make
...
[ 99%] Building CXX object tests/cc/CMakeFiles/test_libbcc.dir/test_usdt_probes.cc.o
[100%] Building CXX object tests/cc/CMakeFiles/test_libbcc.dir/utils.cc.o
[100%] Linking CXX executable test_libbcc
[100%] Built target test_libbcc
It's always interesting to check if tests pass:
openxs@ao756:~/git/bcc/build$ make test
Running tests...
Test project /home/openxs/git/bcc/build
      Start  1: style-check
 1/40 Test  #1: style-check ......................   Passed    0.01 sec
      Start  2: c_test_static
 2/40 Test  #2: c_test_static ....................   Passed    0.30 sec
...
40/40 Test #40: lua_test_standalone ..............***Failed    0.06 sec

75% tests passed, 10 tests failed out of 40

Total Test time (real) = 450.78 sec

The following tests FAILED:
          3 - test_libbcc (Failed)
          4 - py_test_stat1_b (Failed)
          5 - py_test_bpf_log (Failed)
          6 - py_test_stat1_c (Failed)
          7 - py_test_xlate1_c (Failed)
          8 - py_test_call1 (Failed)
         16 - py_test_brb (Failed)
         17 - py_test_brb2 (Failed)
         18 - py_test_clang (Failed)
         40 - lua_test_standalone (Failed)
Errors while running CTest
Makefile:105: recipe for target 'test' failed
make: *** [test] Error 8
I've always had some tests failed, and one day I probably have to report the issue for the project, but for the purpose of this post (based on previous experience with older code) I'll get at least trace tool working as expected. So, I decided to process with installation:
openxs@ao756:~/git/bcc/build$ sudo make install
...
-- Up-to-date: /usr/share/bcc/tools/old/stackcount
-- Up-to-date: /usr/share/bcc/tools/old/oomkill
The tools are installed by default to /usr/share/bcc/tools.

For adding dynamic probes I'll use trace tool  that probes functions you specify and displays trace messages if a particular condition is met. You can control the message format to display function
arguments and return values.

Brendan Gregg explains the usage of this and other tools here with a lot of details. I'll just add a nice chart from that page here:
There is a separate tutorial with examples. You may want to check section for trace tool there.

For the purpose of this blog post I think it's enough to quickly check help output:
openxs@ao756:~/git/bcc/build$ sudo /usr/share/bcc/tools/trace
usage: trace [-h] [-b BUFFER_PAGES] [-p PID] [-L TID] [-v] [-Z STRING_SIZE]
             [-S] [-M MAX_EVENTS] [-t] [-u] [-T] [-C] [-c CGROUP_PATH]
             [-n NAME] [-f MSG_FILTER] [-B] [-s SYM_FILE_LIST] [-K] [-U] [-a]
             [-I header]
             probe [probe ...]
trace: error: too few arguments
and note the following basic syntax is used to define probes (see man /usr/share/bcc/man/man8/trace.8.gz after buuilding the tools from source as described above):
PROBE SYNTAX
       The general probe syntax is as follows:

       [{p,r}]:[library]:function[(signature)]      [(predicate)]     ["format
       string"[, arguments]]

       {t:category:event,u:library:probe}  [(predicate)]  ["format   string"[,
       arguments]]

       {[{p,r}],t,u}
              Probe  type  -  "p" for function entry, "r" for function return,
              "t" for kernel tracepoint, "u" for USDT probe. The default probe
              type is "p".
...
Fir simplicity here we do not consider conditional probes, so predicate is skipped. At the moment we are not interested in kernel or or user defined static tracepoints (the are not defined in default recent builds of MySQL or MariaDB server anway and require -DENABLE_DTRACE=ON to be explictly added to cmake command line used). For user defined dynamic probes in the mysqld process we need p (for probe at function entry) and maybe r (for function return).

We need to refer to library, and in our case this is a full path name to the mysqld binary (or just mysqld if it's in PATH). We also need to refer to some function by names. Quick test will show you that by default trace does NOT accept plain function names in MySQL or MariaDB code (as perf does), and require mangled ones to be used (same as bpftrace). We can find the names with nm command:
openxs@ao756:~/git/bcc/build$ nm -na /home/openxs/dbs/maria10.3/bin/mysqld | grep dispatch_command
00000000004a1eef t _Z16dispatch_command19enum_server_commandP3THDPcjbb.cold.344
00000000005c5180 T _Z16dispatch_command19enum_server_commandP3THDPcjbb
00000000005c5180 t _Z16dispatch_command19enum_server_commandP3THDPcjbb.localalias.256
In the example above I was specifically looking for dispatch_command() function of MariaDB server version 10.3.x that I assume (see previous post) has a string with SQL statement as the third argument, packet. So, I can refer to this function in probe as "_Z16dispatch_command19enum_server_commandP3THDPcjbb".

The "format string" that define how to output arguments of probe  is a printf-style format string.  You  can  use the following format specifiers: %s, %d%u, ...  with the same semantics as printf's. In our case for zero-terminating string we'll use "%s".

Arguments of the function traced are named arg1, arg2, ... argN (unless we provide a signature for function), and are numbered starting from 1. So, in out case we can add a probe to print third argument of the dispatch_command() function upon entry as follows:
openxs@ao756:~/dbs/maria10.3$ sudo /usr/share/bcc/tools/trace -T 'p:/home/openxs/dbs/maria10.3/bin/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb "%s" arg3'
[sudo] password for openxs:
TIME     PID     TID     COMM            FUNC             -
16:16:53 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb select @@version_comment limit 1
16:17:02 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb select 1
16:17:05 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb select 2
16:17:07 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb
^C
I've got the output above for this sample session:
openxs@ao756:~/dbs/maria10.3$ bin/mysql -uroot --socket=/tmp/mariadb.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 16
Server version: 10.3.22-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0,000 sec)

MariaDB [(none)]> select 2;
+---+
| 2 |
+---+
| 2 |
+---+
1 row in set (0,001 sec)

MariaDB [(none)]> exit
Bye
Note that I've added -T option to the command line to output timestamp.
  
It's a bit more complex with recent versions of Percona Server or MySQL (or any function parameters that are complex structures). It's also more complex if we want to process prepared statements and process packet content depending on the first argument (the above is correct only for COM_QUERY) and so on. But these are basic steps to get a log of SQL queries with timestamps by adding a dynamic probe with BCC trace tool. Enjoy!

by Valerii Kravchuk (noreply@blogger.com) at January 20, 2020 02:25 PM

SeveralNines

Rebuilding a MySQL 8.0 Replication Slave Using a Clone Plugin

With MySQL 8.0 Oracle adopted a new approach to development. Instead of pushing features with major versions, almost every minor MySQL 8.0 version comes with new features or improvements. One of these new features is what we would like to focus on in this blog post. 

Historically MySQL did not come with good tools for provisioning. Sure, you had mysqldump, but it is just a logical backup tool, not really suitable for larger environments. MySQL enterprise users could benefit from MySQL Enterprise Backup while community users could use xtrabackup. Neither of those came with a clean MySQL Community deployments though. It was quite annoying as provisioning is a task you do quite often. You may need to build a new slave, rebuild a failed one - all of this will require some sort of a data transfer between separate nodes.

MySQL 8.0.17 introduced a new way of provisioning MySQL data - clone plugin. It was intended with MySQL Group Replication in mind to introduce a way of automatic provisioning and rebuilding of failed nodes, but its usefulness is not limited to that area. We can as well use it to rebuild a slave node or provision a new server. In this blog post we would like to show you how to set up MySQL Clone plugin and how to rebuild a replication slave.

First of all, the plugin has to be enabled as it is disabled by default. Once you do this, it will stay enabled through restarts. Ideally, you will do it on all of the nodes in the replication topology.

mysql> INSTALL PLUGIN clone SONAME 'mysql_clone.so';

Query OK, 0 rows affected (0.00 sec)

Clone plugin requires MySQL user with proper privileges. On donor it has to have “BACKUP_ADMIN” privilege while on the joiner it has to have “CLONE_ADMIN” privilege. Assuming you want to use the clone plugin extensively, you can just create user with both privileges. Do it on the master so the user will be created also on all of the slaves. After all, you never know which node will be a master some time in the future therefore it’s more convenient to have everything prepared upfront.

mysql> CREATE USER clone_user@'%' IDENTIFIED BY 'clonepass';

Query OK, 0 rows affected (0.01 sec)

mysql> GRANT BACKUP_ADMIN, CLONE_ADMIN ON *.* to clone_user@'%';

Query OK, 0 rows affected (0.00 sec)

MySQL Clone plugin has some prerequisites thus sanity checks should be performed. You should ensure that both donor and joiner will have the same values in the following configuration variables:

mysql> SHOW VARIABLES LIKE 'innodb_page_size';

+------------------+-------+

| Variable_name    | Value |

+------------------+-------+

| innodb_page_size | 16384 |

+------------------+-------+

1 row in set (0.01 sec)

mysql> SHOW VARIABLES LIKE 'innodb_data_file_path';

+-----------------------+-------------------------+

| Variable_name         | Value   |

+-----------------------+-------------------------+

| innodb_data_file_path | ibdata1:100M:autoextend |

+-----------------------+-------------------------+

1 row in set (0.01 sec)

mysql> SHOW VARIABLES LIKE 'max_allowed_packet';

+--------------------+-----------+

| Variable_name      | Value |

+--------------------+-----------+

| max_allowed_packet | 536870912 |

+--------------------+-----------+

1 row in set (0.00 sec)

mysql> SHOW GLOBAL VARIABLES LIKE '%character%';

+--------------------------+--------------------------------+

| Variable_name            | Value       |

+--------------------------+--------------------------------+

| character_set_client     | utf8mb4       |

| character_set_connection | utf8mb4                        |

| character_set_database   | utf8mb4       |

| character_set_filesystem | binary                         |

| character_set_results    | utf8mb4       |

| character_set_server     | utf8mb4       |

| character_set_system     | utf8       |

| character_sets_dir       | /usr/share/mysql-8.0/charsets/ |

+--------------------------+--------------------------------+

8 rows in set (0.00 sec)



mysql> SHOW GLOBAL VARIABLES LIKE '%collation%';

+-------------------------------+--------------------+

| Variable_name                 | Value |

+-------------------------------+--------------------+

| collation_connection          | utf8mb4_0900_ai_ci |

| collation_database            | utf8mb4_0900_ai_ci |

| collation_server              | utf8mb4_0900_ai_ci |

| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |

+-------------------------------+--------------------+

4 rows in set (0.00 sec)

Then, on the master, we should double-check that undo tablespaces have unique names:

mysql> SELECT TABLESPACE_NAME, FILE_NAME FROM INFORMATION_SCHEMA.FILES

    ->        WHERE FILE_TYPE LIKE 'UNDO LOG';

+-----------------+------------+

| TABLESPACE_NAME | FILE_NAME  |

+-----------------+------------+

| innodb_undo_001 | ./undo_001 |

| innodb_undo_002 | ./undo_002 |

+-----------------+------------+

2 rows in set (0.12 sec)

Default verbosity level does not show too much data regarding cloning process therefore we would recommend to increase it to have better insight into what is happening:

mysql> SET GLOBAL log_error_verbosity=3;

Query OK, 0 rows affected (0.00 sec)

To be able to start the process on our joiner, we have to configure a valid donor:

mysql> SET GLOBAL clone_valid_donor_list ='10.0.0.101:3306';

Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'clone_valid_donor_list';

+------------------------+-----------------+

| Variable_name          | Value |

+------------------------+-----------------+

| clone_valid_donor_list | 10.0.0.101:3306 |

+------------------------+-----------------+

1 row in set (0.00 sec)

Once it is in place, we can use it to copy the data from:

mysql> CLONE INSTANCE FROM 'clone_user'@'10.0.0.101':3306 IDENTIFIED BY 'clonepass';

Query OK, 0 rows affected (18.30 sec)

That’s it, the progress can be tracked in the MySQL error log on the joiner. Once everything is ready, all you have to do is to setup the replication:

mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.101', MASTER_AUTO_POSITION=1;

Query OK, 0 rows affected (0.05 sec)

mysql> START SLAVE USER='rpl_user' PASSWORD='afXGK2Wk8l';

Query OK, 0 rows affected, 1 warning (0.01 sec)

Please keep in mind that Clone plugin comes with a set of limitations. For starters, it transfers only InnoDB tables so if you happen to use any other storage engines, you would have to either convert them to InnoDB or use another provisioning method. It also interferes with Data Definition Language - ALTERs will block and be blocked by cloning operations.

By default cloning is not encrypted so it could be used only in a secure environment. If needed, you can set up SSL encryption for the cloning process by ensuring that the donor has SSL configured and then define following variables on the joiner:

clone_ssl_ca=/path/to/ca.pem

clone_ssl_cert=/path/to/client-cert.pem

clone_ssl_key=/path/to/client-key.pem

Then, you need to add “REQUIRE SSL;” at the end of the CLONE command and the process will be executed with SSL encryption. Please keep in mind this is the only method to clone databases with data-at-rest encryption enabled.

As we mentioned at the beginning, cloning was, most likely, designed with MySQL Group Replication/InnoDB Cluster in mind but, as long as the limitations are not affecting particular use case, it can be used as a native way of provisioning any MySQL instance. We will see how broad of adoption it will have - possibilities are numerous. What’s already great is we now have another hardware-agnostic method we can use to provision servers in addition to Xtrabackup. Competition is always good and we are looking forward to see what the future holds.

 

by krzysztof at January 20, 2020 10:45 AM

January 18, 2020

Valeriy Kravchuk

Fun with Bugs #92 - On MySQL Bug Reports I am Subscribed to, Part XXVI

I'd like to continue reviewing MySQL bug reports from Community users that I considered interesting and subscribed to. Unlike in the previous post in this series, I am not going to check test cases on any competitor product, but will use only recently released MySQL 5.7.29 and 8.0.19 for checks, if any. This time I'll concentrate on bugs reported in November 2019.

As usual, I mostly care about optimizer, InnoDB and replication related bugs. Here is the list:
  • Bug #97476 - "Range optimizer skips rows". This bug reported by Ilya Raudsepp looks like a clear regression in MySQL 8.0.x comparing to MySQL 5.7.x at least. I get the following correct results with 5.7.29:
    mysql> SELECT t.id
        -> FROM Test t
        -> JOIN (
        ->     SELECT item_id, MAX(created_at) AS created_at
        ->     FROM Test t
        ->     WHERE (platform_id = 2) AND (item_id IN (3,2,111)) AND (type = 'Default')
        ->     GROUP BY item_id
        -> ) t2 ON t.item_id = t2.item_id
        ->   t.item_id = t2.item_id
        ->   AND t.created_at = t2.created_at
        ->   AND t.type = 'Default'
        -> WHERE t.platform_id = 2;
    +----+
    | id |
    +----+
    |  6 |
    |  3 |
    |  5 |
    +----+
    3 rows in set (0,03 sec)

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 5.7.29    |
    +-----------+
    1 row in set (0,02 sec)
  • Bug #97531 - "5.7 replication breakage with syntax error with GRANT management". This tricky bug reported by Simon Mudd applies also to MySQL 8.0.x. It is closed as fixed, but the fix had not made it to recent 5.7.29 and 8.0.19 releases, so you'll have to wait for few more months.
  • Bug #97552 - "Regression: LEFT JOIN with Impossible ON condition performs slowly". Yet another optimizer regression in MySQL 8 (comparing to 5.7.x) that is fixed only in MySQL 8.0.20+. The bug was reported by Fredric Johansson.
  • Bug #97648 - "Bug in order by clause in union clause". Yet another regression (at least from user's point of view) in recent MySQL 5.7.x and 8.0.x comparing to 5.6.x. This time without a "regression" tag. The bug was reported by Andrei Mart.
  • Bug #97662 - "MySQL v8.0.18 FIPS mode is no longer supported". According to Ryan L, MySQL 8.0.18+ is no longer supporting ssl_fips_mode=STRICT, as OpenSSL 1.1.1 is not FIPS-compatible and MySQL Server must be compiled using OpenSSL 1.1.1 or higher. That's interesting. Check also this link.
  • Bug #97682 - "Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception". This regression (comparing to MySQL 5.7) was reported by Jericho Rivera. It is fixed in MySQL 8.0.20. The patch was provided by Kamil Holubicki.
  • Bug #97692 - "Querying information_schema.TABLES issue". I do not see any documented attempt to check on MySQL 8.0, so I had to add a comment to the bug report. From what I see, in MySQL 8.0.19 we still get different (empty) result from the second query, but at least now we have a warning:
    mysql> SELECT ts.TABLE_SCHEMA
        -> FROM information_schema.TABLES ts
        -> WHERE ts.TABLE_TYPE ='VIEW'
        -> AND ts.TABLE_SCHEMA NOT IN ('sys')
        -> AND ts.TABLE_COMMENT LIKE '%invalid%';
    +--------------+
    | TABLE_SCHEMA |
    +--------------+
    | test         |
    +--------------+
    1 row in set, 1 warning (0,00 sec)

    mysql> show warnings\G
    *************************** 1. row ***************************
      Level: Warning
       Code: 1356
    Message: View 'test.v' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
    1 row in set (0,00 sec)

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 8.0.19    |
    +-----------+
    1 row in set (0,00 sec)
    The bug was reported by Vinicius Malvestio Grippa.
  • Bug #97693 - "ALTER USER user IDENTIFIED BY 'password' broken by invalid authentication_string". The bug was reported by Nikolai Ikhalainen. MySQL 8.0.19 is still affected.
  • Bug #97694 - "MySQL 8.0.18 fails on STOP SLAVE/START SLAVE stress test". For some reason I do not see any documented attempt to verify this on MySQL 5.7 also. The bug was reported by Przemysław Skibiński, who also suggested a fix.
  • Bug #97734 - "Document the correct method to stop slaving with MTS without a warning or error". I can only agree with this request from Buchan Milne. Please. do :)
  • Bug #97735 - "ALTER USER IF EXISTS ... WITH_MAX_USER_CONNECTIONS 9999 not applied correctly". yet another bug report by Simon Mudd in this list. For some reason, again, I do not see any documented attempt to verify the bug on MySQL 8.0.x, while there is no clear reason to think it is not affected.
  • Bug #97742 - "bad item ref from correlated subquery to outer distinct table". This bug was reported by Song Zhibai, who also had contributed a patch. Based on further comments from  Øystein Grøvlen and these results:
    mysql> EXPLAIN SELECT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1);
    +----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    | id | select_type        | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
    +----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    |  1 | PRIMARY            | t1    | NULL       | ALL   | NULL          | NULL    | NULL    | NULL |    3 |   100.00 | NULL        |
    |  2 | DEPENDENT SUBQUERY | t2    | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |   100.00 | Using index |
    +----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    2 rows in set, 2 warnings (0,00 sec)

    mysql> show warnings\G
    *************************** 1. row ***************************
      Level: Note
       Code: 1276
    Message: Field or reference 'f2' of SELECT #2 was resolved in SELECT #1
    *************************** 2. row ***************************
      Level: Note
       Code: 1003
    Message: /* select#1 */ select `test`.`t1`.`f3` AS `f3` from `test`.`t1` having (/* select#2 */ select 1 from `test`.`t2` having `test`.`t1`.`f2` limit 1)
    2 rows in set (0,00 sec)

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 5.7.29    |
    +-----------+
    1 row in set (0,00 sec)
    I'd say that MySQL 5.7.x is also affected, but for some reason nobody documented any attempt to verify it there. So, I've added a comment.
  • Bug #97777 - "separate global variables (from hot variables) using linker script (ELF)". Beautiful bug report from Daniel Black.With a lot of details, perf and readelf outputs and patch contributed. See also his Bug #97822 - "buf_page_get_gen buf_pool->stat.n_page_gets++ is a cpu waste", with perf analysis up to a single assembler instruction level and fix suggested.
  • Bug #97825 - "dd_mdl_acquire in dd_table_open with dict_sys->mutex hold may cause deadlock". Here I am really puzzled by no visible attempt to check the arguments of bug reporter, Dave Do, who tried to perform lock order analysis by code review. All we see as a result is this:
    "Lock order could be different, but it is irrelevant, since these are locks on totally different levels and can't, in themselves, cause any deadlock."
    What a great argument! Not a bug, surely... We trust you.
    "What bugs are you talking about? I have no bugs, neither does MySQL 8!"
    To summarize:
    1. MySQL 8 introduces some optimizer (and some other) regressions. They seem to be fixed fast enough, but I wonder why only Community users were able to find them not Oracle's QA...
    2. MySQL 8.0.19 is surely great, but I see many serious bugs fixed o0nly in 8.0.20+.
    3. Percona, Booking and Facebook engineers still continue contributing high quality bug reports, comments/verification details and patches. Oracle is lucky to have such nice partners in making MySQL better.
    4. I still see problems with following proper verification procedures and documenting the results. Too often the bug reported for 8.0.x is NOT checked on 5.7.x as well, regression tag is not set, and so on. Sometimes reports are closed as "Not a bug" without any attempt to follow the analysis provided or prove the point. This is sad and wrong.

    by Valerii Kravchuk (noreply@blogger.com) at January 18, 2020 07:49 PM

    January 17, 2020

    SeveralNines

    MongoDB 4.2 Management & Monitoring Without Vendor Lockin

    With the release of a new version of ClusterControl (1.7.5), we can see several new features, one of the main ones being the support for MongoDB 4.2.

    MongoDB 4.2 is on the market for a while. It was initially announced at MongoDB World in June 2019, with GA ready in August. Since then, a lot of you have been putting it through its paces. It brings many awaited features, which makes NoSQL a more straightforward choice over RDBMS.

    The most significant feature in 4.X was transaction support. It dramatically reduces the gap between RDBMS and NoSQL systems. MongoDB transactions were added in version 4.0, but that didn't work with the most powerful feature of MongoDB clusters. Now MongoDB extends multi-document ACID, which is now guaranteed from the replica set to sharded clusters, enabling you to serve an even broader range of use cases.

    The most prominent features of version 4.2 are:

    • On-Demand Materialized Views using the new $merge operator. 
    • Distributed transactions
    • Wildcard Indexes
    • Server-side updates 
    • MongoDB Query Language enhancements
    • Field-level encryption to selectively protect sensitive files

    To install MongoDB 4.2 manually, we must first add the repositories or download the necessary packages for the installation, install them, and configure them correctly, depending on our infrastructure. All these steps take time, so let's see how we could speed it up.

    In this blog, we will see how to deploy this new MongoDB version with a few clicks using ClusterControl and how to manage it. As a prerequisite, please install the 1.7.5 version of ClusterControl on a dedicated host or VM.

    Deploying a MongoDB 4.2 Replica Shard

    To perform a new installation from ClusterControl, select the option "Deploy" and follow the instructions that appear. Note that if you already have a MongoDB 4.2 instance running, then you need to choose the 'Import Existing Server/Database' instead.

    Deploy MongoDB 4.2

    ClusterControl Deployment Options

    When selecting MongoDB, we must specify User, Key or Password and port to connect by SSH to our MongoDB nodes. We also need the name for our new cluster and if we want ClusterControl to install the corresponding software and configurations for us.

    After setting up the SSH access information, we must define the database user, version, and datadir (optional). We can also specify which repository to use. In this case, we want to deploy MongoDB 4.2, so select it and continue.

    In the next step, we need to add our servers to the cluster we are going to create.

    ClusterControl Percona 4.2 MongoDB Deployment

    When adding our servers, we can enter IP or hostname.

    ClusterControl MongoDB 4.2 Deployment

    We can monitor the status of the creation of our new cluster from the ClusterControl activity monitor.

    ClusterControl Job Details

    Once the task is finished, we can see our new MongoDB replicaSet in the main ClusterControl screen.

    ClusterContorol Dashboard Status

    Once we have our cluster created, we can perform several tasks on it, like adding a backup job

    Scaling MongoDB 4.2 

    If we go to cluster actions and select "Add  Node", we can either create a new replica from scratch or add an existing MongoDB database as a replica.

    ClusterControl MongoDB 4.2 Add a Node

    As you can see in the image, we only need to choose our new or existing server, enter the IP address for our new slave server and the database port. Then, we can choose if we want ClusterControl to install the software for us and configure cluster.

    The other option is to convert replica set clusters to MongoDB shard. CusterControl will walk you through the process. We need to provide details about Configuration Server and Routers as you can see on the below screen.  

    ClusterControl Convert MongoDB 4.2 ReplicaSet to Shard

    Conclusion

    As we have seen above, you can now deploy the latest MongoDB (version 4.2) using ClusterControl. Once deployed, ClusterControl provides a whole range of features, from monitoring, alerting, automatic failover, backup, point-in-time recovery, backup verification, to scaling of reading replicas.

    by Bart Oles at January 17, 2020 10:45 AM

    January 16, 2020

    SeveralNines

    Why Did My MySQL Database Crash? Get Insights with the New MySQL Freeze Frame

    In case you haven't seen it, we just released ClusterControl 1.7.5  with major improvements and new useful features. Some of the features include Cluster Wide Maintenance, support for version CentOS 8 and Debian 10, PostgreSQL 12 Support, MongoDB 4.2 and Percona MongoDB v4.0 support, as well as the new MySQL Freeze Frame. 

    Wait, but What is a MySQL Freeze Frame? Is This Something New to MySQL? 

    Well it's not something new within the MySQL Kernel itself. It's a new feature we added to ClusterControl 1.7.5 that is specific to MySQL databases. The MySQL Freeze Frame in ClusterControl 1.7.5 will cover these following things:

    • Snapshot MySQL status before cluster failure.
    • Snapshot MySQL process list before cluster failure (coming soon).
    • Inspect cluster incidents in operational reports or from the s9s command line tool.

    These are valuable sets of information that can help trace bugs and fix your MySQL/MariaDB clusters when things go south. In the future, we are planning to include also snapshots of the SHOW ENGINE InnoDB status values as well. So please stay tuned to our future releases.

    Note that this feature is still in beta state, we expect to collect more datasets as we work with our users. In this blog, we will show you how to leverage this feature, especially when you need further information when diagnosing your MySQL/MariaDB cluster.

    ClusterControl on Handling Cluster Failure

    For cluster failures, ClusterControl does nothing unless Auto Recovery (Cluster/Node) is enabled just like below:

    Once enabled, ClusterControl will try to recover a node or recover the cluster by bringing up the entire cluster topology. 

    For MySQL, for example in a master-slave replication, it must have at least one master alive at any given time, regardless of the number of available slave/s. ClusterControl attempts to correct the topology at least once for replication clusters, but provides more retries for multi-master replication like NDB Cluster and Galera Cluster. Node recovery attempts to recover a failing database node, e.g. when the process was killed (abnormal shutdown), or the process suffered an OOM (Out-of-Memory). ClusterControl will connect to the node via SSH and try to bring up MySQL. We have previously blogged about How ClusterControl Performs Automatic Database Recovery and Failover, so please visit that article to learn more about the scheme for ClusterControl auto recovery.

    In the previous version of ClusterControl < 1.7.5, those attempted recoveries triggered alarms. But one thing our customers missed was a more complete incident report with state information just before the cluster failure. Until we realized this shortfall and  added this feature in ClusterControl 1.7.5. We called it the "MySQL Freeze Frame". The MySQL Freeze Frame, as of this writing, offers a brief summary of incidents leading to cluster state changes just before the crash. Most importantly, it includes at the end of the report the list of hosts and their MySQL Global Status variables and values.

    How Does MySQL Freeze Frame Differs With Auto Recovery?

    The MySQL Freeze Frame is not part of the auto recovery of ClusterControl. Whether Auto Recovery is disabled or enabled, the MySQL Freeze Frame will always do its work as long as a cluster or node failure has been detected.

    How Does MySQL Freeze Frame Work?

    In ClusterControl, there are certain states that we classify as different types of Cluster Status. MySQL Freeze Frame will generate an incident report when these two states are triggered:

    • CLUSTER_DEGRADED
    • CLUSTER_FAILURE

    In ClusterControl, a CLUSTER_DEGRADED is when you can write to a cluster, but one or more nodes are down. When this happens, ClusterControl will generate the incident report.

    For CLUSTER_FAILURE, though its nomenclature explains itself, it is the state where your cluster fails and is no longer able to process reads or writes. Then that is a CLUSTER_FAILURE state. Regardless of whether an auto-recovery process is attempting to fix the problem or whether it's disabled, ClusterControl will generate the incident report.

    How Do You Enable MySQL Freeze Frame?

    ClusterControl's MySQL Freeze Frame is enabled by default and only generates an incident report only when the states CLUSTER_DEGRADED or CLUSTER_FAILURE are triggered or encountered. So there's no need on the user end to set any ClusterControl configuration setting, ClusterControl will do it for you automagically.

    Locating the MySQL Freeze Frame Incident Report

    As of this writing, there are 4-ways you can locate the incident report. These can be found by doing the following sections below.

    Using the Operational Reports Tab

    The Operational Reports from the previous versions are used only to create, schedule, or list the operational reports that have been generated by users. Since version 1.7.5, we included the incident report generated by our MySQL Freeze Frame feature. See the example below:

    The checked items or items with Report type == incident_report, are the incident reports generated by MySQL Freeze Frame feature in ClusterControl.

    Using Error Reports

    By selecting the cluster and generating an error report, i.e. going through this process: <select the cluster> → Logs → Error Reports→ Create Error Report. This will include the incident report under the ClusterControl host.

    Using s9s CLI Command Line

    On a generated incident report, it does include instructions or hint on how you can use this with s9s CLI command. Below are what's shown in the incident report:

    Hint! Using the s9s CLI tool allows you to easily grep data in this report, e.g:

    s9s report --list --long
    
    s9s report --cat --report-id=N

    So if you want to locate and generate an error report, you can use this approach:

    [vagrant@testccnode ~]$ s9s report --list --long --cluster-id=60
    
    ID CID TYPE            CREATED TITLE                            
    
    19  60 incident_report 16:50:27 Incident Report - Cluster Failed
    
    20  60 incident_report 17:01:55 Incident Report

    If I want to grep the wsrep_* variables on a specific host, I can do the following:

    [vagrant@testccnode ~]$ s9s report --cat --report-id=20 --cluster-id=60|sed -n '/WSREP.*/p'|sed 's/  */ /g'|grep '192.168.10.80'|uniq -d
    
    | WSREP_APPLIER_THREAD_COUNT | 4 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_CLUSTER_CONF_ID | 18446744073709551615 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_CLUSTER_SIZE | 1 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_CLUSTER_STATE_UUID | 7c7a9d08-2d72-11ea-9ef3-a2551fd9f58d | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_EVS_DELAYED | 27ac86a9-3254-11ea-b104-bb705eb13dde:tcp://192.168.10.100:4567:1,9234d567-3253-11ea-92d3-b643c178d325:tcp://192.168.10.90:4567:1,9234d567-3253-11ea-92d4-b643c178d325:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b25e-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b25f-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b260-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b261-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b262-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b263-cfcbda888ea9:tcp://192.168.10.90:4567:1,b0b7cb15-3241-11ea-bdbc-1a21deddc100:tcp://192.168.10.100:4567:1,b0b7cb15-3241-11ea-bdbd-1a21deddc100:tcp://192.168.10.100:4567:1,b0b7cb15-3241-11ea-bdbe-1a21deddc100:tcp://192.168.10.100:4567:1,b0b7cb15-3241-11ea-bdbf-1a21deddc100:tcp://192.168.10.100:4567:1,b0b7cb15-3241-11ea-bdc0-1a21deddc100:tcp://192.168.10.100:4567:1,dea553aa-32b9-11ea-b321-9a836d562a47:tcp://192.168.10.100:4567:1,dea553aa-32b9-11ea-b322-9a836d562a47:tcp://192.168.10.100:4567:1,e27f4eff-3256-11ea-a3ab-e298880f3348:tcp://192.168.10.100:4567:1,e27f4eff-3256-11ea-a3ac-e298880f3348:tcp://192.168.10.100:4567:1 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_GCOMM_UUID | 781facbc-3241-11ea-8a22-d74e5dcf7e08 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_LAST_COMMITTED | 443 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_LOCAL_CACHED_DOWNTO | 98 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_LOCAL_RECV_QUEUE_MAX | 2 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_LOCAL_STATE_UUID | 7c7a9d08-2d72-11ea-9ef3-a2551fd9f58d | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_PROTOCOL_VERSION | 10 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_PROVIDER_VERSION | 26.4.3(r4535) | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_RECEIVED | 112 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_RECEIVED_BYTES | 14413 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPLICATED | 86 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPLICATED_BYTES | 40592 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPL_DATA_BYTES | 31734 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPL_KEYS | 86 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPL_KEYS_BYTES | 2752 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_ROLLBACKER_THREAD_COUNT | 1 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_THREAD_COUNT | 5 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_EVS_REPL_LATENCY | 4.508e-06/4.508e-06/4.508e-06/0/1 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |

    Manually Locating via System File Path

    ClusterControl generates these incident reports in the host where ClusterControl runs. ClusterControl creates a directory in the /home/<OS_USER>/s9s_tmp or /root/s9s_tmp if you are using the root system user. The incident reports can be located, for example, by going to  /home/vagrant/s9s_tmp/60/galera/cmon-reports/incident_report_2020-01-09_085027.html where the format explains as,  /home/<OS_USER>/s9s_tmp/<CLUSTER_ID>/<CLUSTER_TYPE>/cmon-reports/<INCIDENT_FILE_NAME>.html. The full path of the file is also displayed when you hover your mouse in the item or file you want to check under the Operational Reports Tab just like below:

    Are There Any Dangers or Caveats When Using MySQL Freeze Frame?

    ClusterControl does not change nor modify anything in your MySQL nodes or cluster. MySQL Freeze Frame will just read SHOW GLOBAL STATUS (as of this time) at specific intervals to save records since we cannot predict the state of a MySQL node or cluster when it can crash or when it can have hardware or disk issues. It's not possible to predict this, so we save the values and therefore we can generate an incident report in case a particular node goes down. In that case, the danger of having this is close to none. It can theoretically add a series of client requests to the server(s) in case some locks are held within MySQL, but we have not noticed it yet.The series of tests doesn't show this so we would be glad if you can let us know or file a support ticket in case problems arise.

    There are certain situations where an incident report might not be able to gather global status variables if a network issue was the problem prior to ClusterControl freezing a specific frame to gather data. That's completely reasonable because there's no way ClusterControl can collect data for further diagnosis as there's no connection to the node in the first place.

    Lastly, you might wonder why not all variables are shown in the GLOBAL STATUS section? For the meantime, we set a filter where empty or 0 values are excluded in the incident report. The reason is that we want to save some disk space. Once these incident reports are no longer needed, you can delete it via Operational Reports Tab.

    Testing the MySQL Freeze Frame Feature

    We believe that you are eager to try this one and see how it works. But please, make sure you are not running or testing this in a live or production environment. We'll cover 2-phases of scenario in the MySQL/MariaDB, one for master-slave setup and one for Galera-type setup.

    Master-Slave Setup Test Scenario

    In a master-slave(s) setup, it's easy and simple to try. 

    Step One

    Make sure that you have disabled the Auto Recovery modes (Cluster and Node), like below:

    so it won't try or attempt to fix the test scenario.

    Step Two

    Go to your Master node and try setting to read-only:

    root@node1[mysql]> set @@global.read_only=1;
    
    Query OK, 0 rows affected (0.000 sec)

    Step Three

    This time, an alarm was raised and so a generated incident report. See below how does my cluster looks like:

    and the alarm was triggered:

    and the incident report was generated:

    Galera Cluster Setup Test Scenario

    For Galera-based setup, we need to make sure that the cluster will be no longer available, i.e., a cluster-wide failure. Unlike the Master-Slave test, you can let Auto Recovery enabled since we'll play around with network interfaces.

    Note: For this setup, ensure that you have multiple interfaces if you are testing the nodes in a remote instance since you cannot bring the interface up when you down that interface where you are connected.

    Step One

    Create a 3-node Galera cluster (for example using vagrant)

    Step Two

    Issue the command (just like below) to simulate network issue and do this to all the nodes

    [root@testnode10 ~]# ifdown eth1
    
    Device 'eth1' successfully disconnected.

    Step Three

    Now, it took my cluster down and have this state:

    raised an alarm,

    and it generates an incident report:

    For a sample incident report, you can use this raw file and save it as html.

    It's quite simple to try but again, please do this only in a non-live and non-prod environment.

    Conclusion

    MySQL Freeze Frame in ClusterControl can be helpful when diagnosing crashes. When troubleshooting, you need a wealth of information in order to determine cause and that is exactly what MySQL Freeze Frame provides.

    by Paul Namuag at January 16, 2020 10:45 AM

    January 15, 2020

    SeveralNines

    Database Management & Monitoring for PostgreSQL 12

    A few months ago we blogged about the release of PostgreSQL 12, with notable improvements to query performance (particularly over larger data sets and overall space utilization) among other important features. Now, with the ClusterControl 1.7.5 version, we’re glad to announce support for this new PostgreSQL version.

    This new ClusterControl 1.7.5 version comes with many new features for managing and monitoring your database cluster. In this blog, we’ll take a look at these features and see how to deploy PostgreSQL 12 easily.

    Easily Deploy PostgreSQL 12

    To perform a new installation of PostgreSQL 12 from ClusterControl, just select the “Deploy” option and follow the instructions that appear. Note that if you already have a PostgreSQL 12 instance running, then you need to select the “Import Existing Server/Database” instead.

    Deploy PostgreSQL 12

    When selecting PostgreSQL, you must specify User, Key or Password, and port to connect by SSH to your PostgreSQL hosts. You also need the name for your new cluster and if you want ClusterControl to install the corresponding software and configurations for you.

    Deploy PostgreSQL 12

    Please check the ClusterControl user requirement for this step here.

    Deploy PostgreSQL 12

    After setting up the SSH access information, you must define the database user, version, and datadir (optional). You can also specify which repository to use. In this case, we want to deploy PostgreSQL 12, so just select it and continue.

    In the next step, you need to add your servers to the cluster you’re going to create.

    When adding your servers, you can enter IP or hostname.

    In the last step, you can choose if your replication will be Synchronous or Asynchronous.

    Deploy Postgres 12

    You can monitor the status of the creation of your new cluster from the ClusterControl Activity Monitor.

    Once the task is finished, you can see your new PostgreSQL 12 cluster in the main ClusterControl screen.

    Once you have your cluster created, you can perform several tasks on it, like adding a load balancer (HAProxy, Keepalived) or a new replica, and also different management or monitoring tasks.

    PostgreSQL 12 Database Management

    As you probably know, using ClusterControl you can perform different management tasks like add/remove load balancers, add/remove slave nodes, automatic fail-over and recovery, backups, create/modify advisors, and even more.

    Schedule Maintenance Mode

    One of the new ClusterControl management features is the option to schedule maintenance mode for the database cluster. If you need to modify something in your environment or if for some reason you need to schedule a maintenance window, you can set it with ClusterControl.

    Go to ClusterControl -> Cluster Actions -> Schedule Maintenance Mode, to enable the maintenance window for all the cluster.

    After enabling it, you won’t receive alarms and notifications from this cluster during the specified period.

    In case you will work over one specific node, you can enable this maintenance mode just for that node and not for all the cluster by using the “Schedule Maintenance Mode” in the Node Actions section.

    PostgreSQL User Management

    Now, in the ClusterControl 1.7.5 version, you’ll be able to manage users/roles for your PostgreSQL cluster. Go to ClusterControl -> Select Cluster -> Manage -> User Management.

    PostgreSQL GUI User Management

    Here you can see all the accounts with the privileges assigned, and you can create a new one, or modify/edit an existing account.

    Now, let’s see how to monitor this new PostgreSQL version by using ClusterControl.

    PostgreSQL 12 Database Monitoring

    Monitoring is a must in all environments, and databases aren’t the exception. If you select your cluster in the ClusterControl main screen, you’ll see an overview of it with some basic metrics.

    PostgreSQL 12 Monitoring

    But probably this is not enough to see what is happening in your database cluster. So if you go to ClusterControl -> Select your Cluster -> Dashboards, you can enable this agent-based dashboard to monitor your database in more detail.

    Once it is enabled, you’ll have detailed information from both the database and the operating system side.

    Postgres 12 Monitoring

    This dashboard method is useful to see, in a friendly way,  if everything is going fine.

    You can also take advantage of the old monitoring features like query monitor, performance, advisors, and more features for PostgreSQL or different database technologies.

    Conclusion

    PostgreSQL 12 comes with many improvements to query performance and new features. If you’re looking for a quick way to give it a try, ClusterControl can help you to deploy, manage and monitor it in an easy way.

    by Sebastian Insausti at January 15, 2020 10:45 AM

    January 14, 2020

    Henrik Ingo

    Automatic retries in MongoDB

    At work we were discussing whether MongoDB will retry operations in some circumstances or whether the client needs to be prepared to do so. After a while we realized different participants in the discussion were discussing different retries.

    So I sat down to get to the bottom of all the retries that can happen in MongoDB, and write a blog post about them. But after googling a bit it turns out someone has already written that blog post, so this will be a short post for me linking to other posts.

    Retries by the driver

    If you set retryWrites=true in your MongoDB connection string, then the driver will automatically retry some write operations for some types of failures. Ok, can I be more specific? Yes I can...

    read more

    by hingo at January 14, 2020 11:10 AM

    SeveralNines

    Cluster-Wide Database Maintenance and Why You Need It

    Undoubtedly, there is a long list of maintenance tasks that have to be performed by system administrators, especially when it comes to critical systems. Some of the tasks have to be performed at regular intervals, like daily, weekly, monthly and yearly. Some have to be done right away, urgently. Nevertheless, any maintenance operation should not lead to another bigger problem, and any maintenance has to be handled with extra care to avoid any interruption to the business. Therefore, planning, scheduling and reporting are important aspects. 

    ClusterControl, as a cluster automation and management tool, is smart enough to plan and schedule maintenance windows in advance. This can help avoid unpleasant surprises during production operations, for instance unnecessary recovery procedure, failovers and alarms being triggered. This blog showcases some of the new maintenance mode features that come with ClusterControl 1.7.5.

    Maintenance Mode pre v1.7.5

    Maintenance mode has been in ClusterControl logic since v1.4.0, where one could set a maintenance duration to an individual node, which allows ClusterControl to disable recovery/failover and alarms on that node during a set period. The maintenance mode can be activated immediately or scheduled to run in the future. Alarms and notifications will be turned off when maintenance mode is active, which is expected in an environment where the corresponding node is undergoing maintenance.

    Some of the weaknesses that we found out and also reported by our users:

    • Maintenance mode was bound per node. This means if one would want to perform maintenance on all nodes in the cluster, one had to repeatedly configure the maintenance mode for every node in the cluster. For larger environments, scheduling a major maintenance window for all nodes on multiple clusters could be repetitive.
    • Activating maintenance mode did not deactivate the automatic recovery feature. This would cause an unhealthy node to be recovered automatically while maintenance is ongoing. False alarms might be raised.
    • Maintenance mode could not be activated periodically per schedule. Therefore, regular maintenance had to be defined manually for every approaching date. There was no way to schedule a cron-based (with iteration) maintenance mode.

    ClusterControl new maintenance mode and job implementations solve all of the key problems mentioned, which are shown in the next sections.

    Database Cluster-Wide Maintenance Mode

    Cluster-wide maintenance mode comes handy in an environment where you have multiple clusters, and multiple nodes per cluster managed by a single ClusterControl instance. For example, a common production setup of a MySQL Galera Cluster could have up to 7 nodes - A three-node Galera Cluster could have one additional host for asynchronous slave, with two ProxySQL/Keepalived nodes and one backup verification server. For older ClusterControl versions where only node maintenance was supported, if a major maintenance is required, for example upgrading OS kernel on all hosts, the scheduling had to be repeated 7 times for every monitored node. We have covered this issue in detail in this blog post, with some workarounds.

    Cluster-wide maintenance mode is the super-set of node maintenance mode as in the previous versions. An activated cluster-wide maintenance mode will activate maintenance mode on all nodes in the particular cluster. Simply click on the Cluster Actions > Schedule Maintenance Mode and you will be presented with the following dialog:

    The fields in this dialog are almost identical with scheduling maintenance dialog for single node, except its domain is the particular cluster, as highlighted in the red oval. You can activate the maintenance immediately, or schedule it to run in the future. Once scheduled, you should see the following notification under the summary bar with status "Scheduled" for all clusters:

    Once the maintenance mode is activated, you should see the blue maintenance icon on the summary bar of the cluster, together with the green 'Active' icon notification in the ClusterControl UI:

    All active maintenance mode can be deactivated at any time via the UI, just go to the Cluster Actions > Disable Maintenance Mode.

    Advanced Maintenance Management via ClusterControl CLI

    ClusterControl CLI a.k.a s9s, comes with an extended maintenance management functionality, allowing users to improve the existing maintenance operation flow as a whole. The CLI works by sending commands as JSON messages to ClusterControl Controller (CMON) RPC interface, via TLS encryption which requires the port 9501 to be opened on controller and the client host.

    With a bit of scripting knowledge, we can fully automate and synchronize the maintenance process flow especially if the exercise involves another layer/party/domain outside of ClusterControl. Note that we always incorporated our changes via the CLI first before making it to the UI. This is one of the ways to test out new functionality to find out if they would be useful to our users.

    The following sections will give you a walkthrough on advanced management for maintenance mode via command line.

    View Maintenance Mode

    To list out all maintenance that has been scheduled for all clusters and nodes:

    $ s9s maintenance --list --long
    ST UUID    OWNER          GROUP  START               END                 HOST/CLUSTER REASON
    Ah 460a97b dba            admins 02:31:32            04:31:32            192.168.0.22 Switching to different racks
    -h e3bf19f user@email.com        2020-01-17 02:35:00 2020-01-17 03:00:00 192.168.0.23 Change network cable - Clark Kent
    -c 8f55f76 user@email.com        2020-01-17 02:34:00 2020-01-17 03:59:00 PXC 57       Kernel upgrade and system reboot - John Doe
    Ac 4f4d73c dba            admins 02:30:01            02:31:01            MariaDB 10.3 Test maintenance job creation every 5 minutes

    Owner with email address means the maintenance mode was created by ClusterControl UI user. While for owners with groups, that user is coming from the CLI with our new user/group permission currently supported on CLI only. The leftmost column is the maintenance mode status:

    • The first character: 'A' stands for active and '-' stands for inactive.
    • The second character: 'h' stands for host-related maintenance and 'c' stands for cluster-related maintenance.

    To list out the current active maintenance mode:

    $ s9s maintenance --current --cluster-id=32
    Cluster 32 is under maintenance: Kernel upgrade and system reboot - John Doe

    Use the job command option to get the timestamp, and status of past maintenance mode:

    $ s9s job --list | grep -i maintenance
    5979  32 SCHEDULED dba            admins 2020-01-09 05:29:34   0% Registering Maintenance
    5980  32 FINISHED  dba            admins 2020-01-09 05:30:01   0% Registering Maintenance
    5981  32 FINISHED  dba            admins 2020-01-09 05:35:00   0% Registering Maintenance
    5982  32 FINISHED  dba            admins 2020-01-09 05:40:00   0% Registering Maintenance

    'Registering Maintenance' is the job name to schedule or activate the maintenance mode.

    Create a Maintenance Mode

    To create a new maintenance mode for a node, specify the host under --nodes parameter, with --begin and --end in ISO 8601 (with microsecond, UTC only thus the suffix 'Z') date format:

    $ s9s maintenance --create \
    --nodes="192.168.0.21" \
    --begin="2020-01-09T08:50:58.000Z" \
    --end="2020-01-09T09:50:58.000Z" \
    --reason="Upgrading RAM"

    However, the above will require an extra effort to figure out the correct start time and end time. We can use the "date" command to translate the date and time to the supported format relative to the current time, similar to below:

    $ s9s maintenance --create \
    --nodes="192.168.0.21" \
    --begin="$(date +%FT%T.000Z -d 'now')" \
    --end="$(date +%FT%T.000Z -d 'now + 2 hours')" \
    --reason="Upgrading RAM"
    b348f2ac-9daa-4481-9a95-e8cdf83e81fc

    The above will activate a maintenance mode for node 192.168.0.21 immediately and will end up in 2 hours from the moment it was created. An accepted command should receive a UUID, as in the above example, it was 'b348f2ac-9daa-4481-9a95-e8cdf83e81fc'. A wrong command will simply return a blank output.

    The following command will schedule a maintenance mode for cluster ID 32 on the next day:

    $ s9s maintenance --create \
    --cluster-id=32 \
    --begin="$(date +%FT%T.000Z -d 'now + 1 day')" \
    --end="$(date +%FT%T.000Z -d 'now + 1 day + 2 hours')" \
    --reason="Replacing old network cable"
    85128b1a-a1cd-450e-b381-2a92c03db7a0

    We can also see what is coming up next in the scheduled maintenance for a particular node or cluster:

    $ date -d 'now'
    Wed Jan  8 07:41:57 UTC 2020
    
    $ s9s maintenance --next --cluster-id=32 --nodes='192.168.0.22'
    Host 192.168.0.22 maintenance starts Jan 09 07:41:23: Replacing old network cable

    Omit --nodes if you just want to see the upcoming maintenance details for a particular cluster.

    Delete Maintenance Mode

    Firstly, retrieve the maintenance job UUID:

    $ s9s maintenance --list --long
    ST UUID    OWNER          GROUP START               END                 HOST/CLUSTER             REASON
    -h 7edeabb user@email.com       04:59:00            06:59:00            192.168.0.21             Changing network cable - John Doe
    -c 82b13d3 user@email.com       2020-01-10 05:02:00 2020-01-10 06:27:00 MariaDB 10.3 Replication Upgrading RAM
    Total: 2

    Use the --uuid and specify the corresponding maintenance mode to delete:

    $ s9s maintenance --delete --uuid=82b13d3
    Deleted.

    At this point the maintenance mode has been deleted for the corresponding node or cluster.

    Maintenance Mode Scheduling with Iteration

    In ClusterControl 1.7.5, maintenance mode can be scheduled and iterated just like a cron job. For example, you can now schedule a maintenance mode for daily, weekly, monthly or yearly. This iteration automates the maintenance mode job creation and simplifies the maintenance workflow, especially if you are running in a fully automated infrastructures, where maintenance happens automatically and at regular intervals.

    There is a special flag that we have to use called --create-with-job, where it registers the maintenance as a new job for the controller to execute. The following is a simple example where we activate maintenance mode by registering a new job:

    $ s9s maintenance \
    --create-with-job \
    --cluster-id=32 \
    --reason="testmainteannce" \
    --minutes=60 \
    --log
    
    Preparing to register maintenance.
    The owner of the maintenance will be 'dba'.
    The reason is: testmainteannce
    The maintenance starts NOW.
    Maintenance will be 60 minute(s) long.
    Registering maintenance for cluster 32.
    Maintenance registered.

    To schedule a periodic maintenance, use the --create-with-job flag, with --minutes for the maintenance duration and --recurrence flag in cron-style formatting. The following command schedules a maintenance job every Friday at 3 AM for cluster ID 32:

    $ s9s maintenance \
    --create-with-job \
    --cluster-id=32 \
    --reason="Weekly OS patch at 3 AM every Friday" \
    --minutes=120 \
    --recurrence="0 3 * * 5" \
    --job-tags="maintenance"
    
    Job with ID 5978 registered.

    You should get a job ID in the response. We can then verify if the job has been created correctly:

    $ s9s job --list --job-id=5978
    ID   CID STATE     OWNER GROUP  CREATED  RDY TITLE
    5978  32 SCHEDULED dba   admins 05:21:07 0%  Registering Maintenance

    We can also use the --show-scheduled flag together with --long flag to get extended information on the scheduled job:

    $ s9s job --show-scheduled --list --long
    --------------------------------------------------------------------------------------------------------------------------
    Registering Maintenance
    Scheduled
    
    Created   : 2020-01-09 05:21:07    ID : 5978      Status : SCHEDULED
    Started   :                      User : dba         Host : 127.0.0.1
    Ended     :                      Group: admins    Cluster: 32
    Tags      : #maintenance
    RPC       : 2.0
    --------------------------------------------------------------------------------------------------------------------------

    A recurring job created by the scheduled job will be tagged as "recurrence":

    --------------------------------------------------------------------------------------------------------------------------
    Registering Maintenance
    Job finished.                                                                                                [ ]
                                                                                                                     0.00%
    Created   : 2020-01-09 05:40:00    ID : 5982        Status : FINISHED
    Started   : 2020-01-09 05:40:01    User : dba         Host : 127.0.0.1
    Ended     : 2020-01-09 05:40:01    Group: admins    Cluster: 32
    Tags      : #recurrence
    RPC       : 2.0
    --------------------------------------------------------------------------------------------------------------------------

    Thus, to list out the recurring job, we can use the --job-tags flag. The following example shows executed recurring jobs scheduled to run every 5 minutes:

    $ s9s job --list --job-tags=recurrence
    ID   CID STATE    OWNER GROUP  CREATED  RDY TITLE
    5980  32 FINISHED dba   admins 05:30:01 0%  Registering Maintenance
    5981  32 FINISHED dba   admins 05:35:00 0%  Registering Maintenance
    5982  32 FINISHED dba   admins 05:40:00 0%  Registering Maintenance

    Automatic Recovery as a Job

    In the previous versions, automatic recovery feature can only be enabled or disabled at runtime via the UI, through a simple switch button in the cluster's summary bar, as shown in the following screenshot:

    In ClusterControl 1.7.5, automatic recovery is also part of an internal job, where the configuration can be controlled via CLI and persistent across restarts. This means the job can be scheduled, iterated and controlled with an expiration period via ClusterControl CLI and allows users to incorporate the automatic recovery management in the maintenance automation scripts when necessary. 

    When a cluster-wide maintenance is ongoing, it is pretty common to see some questionable states of database hosts, which is totally acceptable during this period. The common practice is to ignore these questionable states and make no interruption to the node while maintenance is happening. If ClusterControl automatic recovery is turned on, it will automatically attempt to recover the problematic host back to the good state, regardless of the maintenance mode state. Thus, disabling ClusterControl automatic recovery during the maintenance operation is highly recommended so ClusterControl will not interrupt the maintenance as it carries on.

    To disable cluster automatic recovery, simply use the --disable-recovery flag with respective cluster ID:

    $ s9s cluster --disable-recovery --log --cluster-id=32
    Cluster ID is 32.
    Cluster recovery is currently enabled.
    Node recovery is currently enabled.
    Disabling cluster auto recovery.
    Disabling node auto recovery.

    To reverse the above, use --enable-recovery flag to enable it again:

    $ s9s cluster --enable-recovery --log --cluster-id=32
    Cluster ID is 32.
    Cluster recovery is currently disabled.
    Node recovery is currently disabled.
    Enabling cluster auto recovery.
    Enabling node auto recovery.

    The CLI also supports disabling recovery together with activating maintenance mode in the same command. One has to use the --maintenance-minutes flag and optionally provide a reason:

    $ s9s cluster \
    --disable-recovery \
    --log \
    --cluster-id=29 \
    --maintenance-minutes=60 \
    --reason='Disabling recovery for 1 hour to update kernel'
    
    Registering maintenance for 60 minute(s) for cluster 32.
    Cluster ID is 29.
    Cluster recovery is currently enabled.
    Node recovery is currently enabled.
    Disabling cluster auto recovery.
    Disabling node auto recovery.

    From the above output, we can tell that ClusterControl has disabled automatic recovery for the node, and also registered a maintenance mode for the cluster. We can then verify with the list maintenance command:

    $ s9s maintenance --list --long
    ST UUID    OWNER     GROUP  START    END      HOST/CLUSTER             REASON
    Ac 687e255 system    admins 06:09:57 07:09:57 MariaDB 10.3 Replication Disabling recovery for 1 hour to update kernel

    Similarly, it will appear in the UI as shown in the following screenshot:

    You can enable the automatic recovery feature using the --enable-recovery flag if it is no longer necessary. The maintenance mode will still be active as defined in the --maintenance-minutes option, unless you explicitly delete or deactivate the maintenance mode via GUI or CLI.

    Conclusion

    ClusterControl allows you to manage your maintenance window efficiently, by discarding possible false alarms and controlling the automatic recovery behaviour while maintenance is ongoing. Maintenance mode is available for free in all ClusterControl editions, so give it a try.

    by ashraf at January 14, 2020 10:45 AM

    January 13, 2020

    SeveralNines

    Announcing ClusterControl 1.7.5: Advanced Cluster Maintenance & Support for PostgreSQL 12 and MongoDB 4.2

    We’re excited to announce the 1.7.5 release of ClusterControl - the only database management system you’ll ever need to take control of your open source database infrastructure. 

    This new version features support for the latest MongoDB & PostgreSQL general releases as well as new operating system support allowing you to install ClusterControl on Centos 8 and Debian 10.

    ClusterControl 1.7.4 provided the ability to place a node into Maintenance Mode. 1.7.5 now allows you to place (or schedule) the entire database cluster in Maintenance Mode, giving you more control over your database operations.

    In addition, we are excited to announce a brand new function in ClusterControl we call “Freeze Frame.” This new feature will take snapshots of your MySQL or MariaDB setups right before a detected failure, providing you with invaluable troubleshooting information about what caused the issue. 

    Release Highlights

    Database Cluster-Wide Maintenance

    • Perform tasks in Maintenance-Mode across the entire database cluster.
    • Enable/disable cluster-wide maintenance mode with a cron-based scheduler.
    • Enable/disable recurring jobs such as cluster or node recovery with automatic maintenance mode.

    MySQL Freeze Frame (BETA)

    • Snapshot MySQL status before cluster failure.
    • Snapshot MySQL process list before cluster failure (coming soon).
    • Inspect cluster incidents in operational reports or from the s9s command line tool.

    New Operating System & Database Support

    • Centos 8 and Debian 10 support.
    • PostgreSQL 12 support.
    • MongoDB 4.2 and Percona MongoDB v4.0 support.

    Additional Misc Improvements

    • Synchronize time range selection between the Overview and Node pages.
    • Improvements to the nodes status updates to be more accurate and with less delay.
    • Enable/Disable Cluster and Node recovery are now regular CMON jobs.
    • Topology view for Cluster-to-Cluster Replication.
     

    View Release Details and Resources

    Release Details

    Cluster-Wide Maintenance 

    The ability to place a database node into Maintenance Mode was implemented in the last version of ClusterControl (1.7.4). In this release we now offer the ability to place your entire database cluster into Maintenance Mode to allow you to perform updates, patches, and more.

    MySQL & MariaDB Freeze Frame

    This new ClusterControl feature allows you to get a snapshot of your MySQL statuses and related processes immediately before a failure is detected. This allows you to better understand what happened when troubleshooting, and provide you with actionable information on how you can prevent this type of failure from happening in the future. 

    This new feature is not part of the auto-recovery features in ClusterControl. Should your database cluster go down those functions will still perform to attempt to get you back online; it’s just that now you’ll have a better idea of what caused it. 

    Support for PostgreSQL 12

    Released in October 2019, PostgreSQL 12 featured major improvements to indexing, partitioning, new SQL & JSON functions, and improved security features, mainly around authentication. ClusterControl now allows you to deploy a preconfigured Postgres 12 database cluster with the ability to fully monitor and manage it.

    PostgreSQL GUI - ClusterControl

    Support for MongoDB 4.2

    MongoDB 4.2 offers unique improvements such as new ACID transaction guarantees, new query and analytics functions including new charts for rich data visualizations. ClusterControl now allows you to deploy a preconfigured MongoDB 4.2 or Percona Server for MongoDB 4.2 ReplicaSet with the ability to fully monitor and manage it.

    MongoDB GUI - ClusterControl
     

    by fwlymburner at January 13, 2020 03:59 PM

    January 12, 2020

    Valeriy Kravchuk

    Fun with Bugs #91 - On MySQL Bug Reports I am Subscribed to, Part XXV

    Not sure if it's still interesting to anybody else, but MySQL users keep finding and reporting new problems that may be caused by genuine bugs in the code. I keep checking these reports and subscribing to those I consider interesting. Let me start blogging in the New Year of 2020 with a review of some replication, InnoDB and (many!) optimizer bugs reported in September and October, 2019.

    As usual, I start from the oldest and care to mention bug reporters by names and add links to their other bug reports, if any. So, here is the new list:
    • Bug #96827 - "mysqlbinlog needs options to abort if invalid events are found on in-use binlogs". I had never checked myself, but I see no reasons not to trust Yoshinori Matsunobu in this case, based on code fragments shared. All current MySQL versions, from 5.6.x to 8.0.x, are affected. From what I see here, MariaDB is also affected.
    • Bug #96853 - "Inconsistent super_read_only status when changing the variable is blocked". Nice bug report by Przemyslaw Malkowski from Percona. For some reason I do not see clear statement if MySQL 8 is affected.
    • Bug #96874 - "The write_notifier_mutex in log_sys is useless". This bug was reported by Chen Zongzhi on MySQL 8.0.17 (see also hist another similar Bug #97358 - "The log.flush_notifier_mutex in log_sys is useless"), but "Version" filed is empty even though the bug is "Verified". This is NOT acceptable.
    • Bug #96946 - "Outer reference in join condition isn't allowed". This bug (that affects all MySQL versions) was reported by Laurents Meyer. See also older Bug #35242 (still "Verified" and affects MariaDB 10.3.x as well).
    • Bug #96950 - "CONCAT() can generate corrupted output". I wish we'd see the exact test case, but at least based on code review this bug (reported by Jay Edgar) was verified for MySQL 5.6 and 5.7. I see the same code in MariaDB, unfortunately.
    • Bug #97001 - "Dangerous optimization reconsidering_access_paths_for_index_ordering". The problems is with queries like this:
      SELECT ... WHERE [secondary key conditions] ORDER BY `id` ASC LIMIT n
      and bug reporter, Jeremy Cole, listed a lot of potentially related older bug reports. he had also suggested a patch. I'd be happy to see the fix in MySQL soon.
    • Bug #97113 - "BIT column serialized incorrectly in CASE expression". This bug report was created by Bradley Grainger. It is stated there that MySQL 5.7 (not only 8.0) is affected, but "Version:" field of a verified bug does NOT list 5.7.x. Absolutely wrong way of bugs processing. MariaDB also seems to be inconsistent, even though the result for one of the queries is different:
      MariaDB [test]> SELECT CASE WHEN name IS NOT NULL THEN value ELSE NULL END FROM query_bit;
      Field   1:  `CASE WHEN name IS NOT NULL THEN value ELSE NULL END`
      Catalog:    `def`
      Database:   ``
      Table:      ``
      Org_table:  ``
      Type:       NEWDECIMAL
      Collation:  binary (63)
      Length:     2
      Max_length: 1
      Decimals:   0
      Flags:      BINARY NUM


      +-----------------------------------------------------+
      | CASE WHEN name IS NOT NULL THEN value ELSE NULL END |
      +-----------------------------------------------------+
      | 1                                                   |
      +-----------------------------------------------------+
      1 row in set (0.021 sec)
    • Bug #97150 - "rwlock: refine lock->recursive with C11 atomics". Patch for MySQL 8.0.x was contributed by Cai Yibo. See also his another contribution, Bug #97228 - "rwlock: refine lock->lock_word with C11 atomics".
    • Bug #97299 - "Improve the explain informations for Hash Joins". Simple EXPLAIN (unlike the one with format=tree) does not give a hint that new MySQL 8.0.18+ features, hash join, was used. Simple and useful feature request from Tibor Korocz.
    • Bug #97345 - "IO Thread not detecting failed master with relay_log_space_limit." Nice bug report from Jean-François Gagné, but no documented attempt to check if MySQL 5.6.x and 8.0.x are also affected.
    • Bug #97347 - "In some cases queries with ST_CONTAINS do not return any results". Simple and easy to check bug report from Christian Koinig. Note that based on quick test MariaDB is NOT affected:
      MariaDB [test]> select version(), count(*) FROM test
          -> WHERE ST_CONTAINS(
          ->  geo_footprint,
          ->  ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[15.11333480819996
      6,48.1337532388],[15.113329984100005,48.1337371609],[15.113411697200036,48.13371
      66354],[15.113673777399981,48.1336819199],[15.114544787600039,48.1335464618],[15
      .115336574000025,48.1334415189],[15.116374992200008,48.1332937084],[15.117266346
      799966,48.1330924824],[15.11769786879995,48.1329803459],[15.118129375299986,48.1
      32868199],[15.118515258099933,48.1327388086],[15.118597296700045,48.1327141533],
      [15.118635348899943,48.132702717],[15.11867907729993,48.1327796282],[15.11876276
      890007,48.13290987],[15.118805112699988,48.1330357889],[15.118850101500016,48.13
      33486685],[15.118823191700017,48.1334777297],[15.118820984299987,48.1334784295],
      [15.118821076099948,48.1334779691],[15.113334808199966,48.1337532388],[15.113334
      808199966,48.1337532388]]]}'));
      +--------------------+----------+
      | version()          | count(*) |
      +--------------------+----------+
      | 10.3.7-MariaDB-log |        1 |
      +--------------------+----------+
      1 row in set (0.003 sec)
    • Bug #97372 - "Constructor Query_event must check enough space". Contribution to 5.7 and 8.0 by Pengbo Shi. Waiting for the OCI signed by the contributor...
    • Bug #97418 - "MySQL chooses different execution plan in 5.7". Interesting bug report from Vinodh Krish. I am again not sure if versions affected match the results of tests presented here.
    • Bug #97421 - "Replace into affected row count not match with events in binlog". Not sure if MySQL 8 was checked, but MariaDB 10.3.7 also uses single Update_rows event in the binary log. Thanks to Ke Lu for noticing and reporting this!
    Also, on a separate note, this claim of MySQL 8.0 performance regression from Mark CallaghanBug #86215, is still being analyzed it seems. No further comments for 2.5 years already!

    Autumn of 2019 was fruitful. A lot of interesting MySQL bug reports also, not just grapes on my balcony...
    To summarize:
    1. For some reason I often do not see explicit documented attempts by Oracle MySQL engineers from the bugs verification team to check bug on different MySQL versions. Sometimes obviously affected version (like MySQL 8.0.x) is not listed in the field. So "Version" field becomes useless This is absolutely wrong. Maybe I should submit yet another talk to some conference on how to process bugs properly?
    2. Some regression bugs are still not marked with "regression" tag when verified.
    3. MySQL optimizer still requires a lot of work to become decent.
    4. I see a lot of new interesting new bug reports both from well known old community members and users I had never noticed before by name. This is great and proves that MySQL is still alive and use all kinds of contributions from Community.
    Next time I'll review interesting bugs reported in November and December, 2019. Stay tuned!

    by Valerii Kravchuk (noreply@blogger.com) at January 12, 2020 06:51 PM

    January 10, 2020

    SeveralNines

    A SOx Compliance Checklist for PostgreSQL

    The United States SOx (Sarbanes-Oxley) Act, 2002, addresses a broad spectrum of fundamental information security principles for commercial enterprises, ensuring their functions are rooted and consistently applied, based on concepts of CIA (Confidentiality, Integrity, and Availability).

    Accomplishing these goals requires commitment from many individuals, all which must be aware of; their responsibilities maintaining the secure state of the enterprise assets, understanding policies, procedures, standards, guidelines, and the possibilities of losses involved with their duties.

    CIA aims at ensuring that the alignment of the business strategy, goals, mission, and objectives, are supported by security controls, approved in consideration with senior management's due diligence, and tolerance for risks and costs.

    PostgreSQL Database Clusters

    The PostgreSQL Server has a broad collection of features offered for free, making it one of the most popular DBMS (Database Management Systems), enabling its adoption on a wide range of projects in different social and economic spheres.

    The main advantage for its adoption, is the Open Source License, removing concerns around copyright infringement within an organization, possibly being caused by an IT administrator, inadvertently exceeding the number of permitted licenses.

    The implementation of information security for PostgreSQL (From an organizational context) will not be successful without carefully constructed and uniformly applied security policies and procedures which cover all aspects of business continuity planning.

    BCP (Business Continuity Planning)

    Leadership must agree prior to starting the BCP program to ensure they understand the expected deliverables, as well their personal liability (financially and even criminally) if it is determined that they did not use due care to adequately protect the organization and its resources.

    The senior management's expectations are communicated through policies, developed and maintained by security officers, responsible for establishing procedures and adherence to standards, baselines, and guidelines, and for discovering SPoFs (Single Points of Failure) that can compromise an entire system from working securely and reliably.

    The classification of these potential disruptive events, is done using BIA (Business Impact Analysis), which is a sequential approach of; identifying the assets and business processes, determine criticality of each one, estimate MTD (Maximum Tolerable Downtime) based on their time sensitivity for recovery, and finally, calculate the recovery objectives; RTO (Recovery Time Objective) and RPO (Recovery Point Objective), considering the cost of achieving the objective, versus, the benefit.

    Data Access Roles and Responsibilities

    Commercial businesses commonly hire outside firms who specialize in background checks in order to gather more information of prospective new employees, assisting the hiring manager with solid work records, validating education degrees and certifications, criminal history, and reference checks.

    Operational systems are being out-dated and poor or written down passwords, are just a couple of the many ways unauthorized individuals can find vulnerabilities and attack an organization's information systems, through the network or social engineering.

    Third-party services, hired by the organization, can represent a threat as well, especially if employees are not trained to use proper security procedures. Their interactions must be rooted in strong security foundations in order to prevent information disclosure.

    Least privilege refers to granting users only the access they need to do their jobs, nothing more. While some employees (based upon their job functions) have a higher “need-to-know” access. Consequently, their workstations must be continuously monitored, and up-to-date with security standards.

    Some Resources That Can Help

    Logos of frameworks and organizations, responsible for providing Cybersecurity guidelines.

    COSO (Committee of Sponsoring Organizations of the Treadway Commission)

    Formed in 1985, to sponsor the US (United States) National Commission on Fraudulent Financial Reporting, which studied causal factors that lead to fraudulent financial reporting, and produced recommendations for; public companies, their auditors, the SEC (Securities Exchange Commission), other regulators, and law enforcement bodies.

    ITIL (Information Technology Infrastructure Library)

    Built by the British government’s Stationary Office, ITIL is a framework composed of a set of books, which demonstrates best practices for specific needs for IT of an organization, such as management of core operational processes, incidents and availability, and financial considerations.

    COBIT (Control Objectives for Information and Related Technology)

    Published by the ITGI (IT Governance Institute), COBIT is a framework that provides an overall structure for IT controls, including examination of efficiency, effectiveness, CIA, reliability, and compliance, in alignment with the business needs. ISACA (Information Systems Audit and Control Association) provides deep instructions about COBIT, as well as certifications recognized globally, such as CISA (Certified Information Systems Auditor).

    ISO/IEC 27002:2013 (International Organization for Standardization/International Electrotechnical Commission)

    Previously known as ISO/IEC 17799:2005, the ISO/IEC 27002:2013 contains detailed instructions for organizations, covering information security controls, such as; policies, compliance, access controls, operations and HR (Human Resources) security, cryptography, management of incidents, risks, BC (Business Continuity), assets, and many more. There is also a preview of the document.

    VERIS (Vocabulary of Event Recording and Incident Sharing)

    Available on GitHub, VERIS is a project in continuous development, intended to help organizations collecting useful incident-related information, and sharing it anonymously and responsibly, expanding the VCDB (VERIS Community Database). The cooperation of users, resulting in an excellent reference for risk management, is then translated into an annual report, the VDBIR (Verizon Data Breach Investigation Report).

    OECD Guidelines (Organization for Economic Cooperation and Development)

    The OECD, in cooperation with partners around the globe, promotes RBCs (Responsible Business Conduct) for multinational enterprises, ensuring privacy to individuals upon their PII (Personally Identifiable Information), and establishing principles of how their data must be retained and maintained by enterprises.

    NIST SP 800 Series (National Institute of Standards and Technology Special Publication)

    The US NIST, provides on its CSRC (Computer Security Resource Center), a collection of publications for Cybersecurity, covering all kinds of topics, including databases. The most important one, from a database perspective, is the SP 800-53 Revision 4.

    Conclusion

    The Information Security Triad, versus its opposite.

    Achieving SOx goals is a daily concern for many organizations, even those not limited to accounting activities. Frameworks containing instructions for risk assessment and internal controls must be in place for enterprise's security practitioners, as well as software for preventing destruction, alteration, and disclosure of sensitive data.

     

    by thiagolopes at January 10, 2020 04:28 PM

    MariaDB Foundation

    MariaDB Day Brussels 0202 2020

    The first MariaDB Day will be held in Brussels at the Bedford Hotel and Congress Centre on Sunday February 2. This is a complementary event to the MySQL, MariaDB and Friends Day at FOSDEM, which is far-oversubscribed, and gives an opportunity for other speakers and more in-depth coverage of MariaDB-related topics. […]

    The post MariaDB Day Brussels 0202 2020 appeared first on MariaDB.org.

    by Ian Gilfillan at January 10, 2020 06:10 AM

    January 09, 2020

    SeveralNines

    Tips for Delivering MySQL Database Performance - Part Two

    The management of database performance is an area that businesses when administrators often find themselves contributing more time to than they expected.

    Monitoring and reacting to the production database performance issues is one of the most critical tasks within a database administrator job. It is an ongoing process that requires constant care. Application and underlying databases usually evolve with time; grow in size, number of users, workload, schema changes that come with code changes.

    Long-running queries are seldom inevitable in a MySQL database. In some circumstances, a long-running query may be a harmful event. If you care about your database, optimizing query performance, and detecting long-running queries must be performed regularly. 

    In this blog, we are going to take a more in-depth look at the actual database workload, especially on the running queries side. We will check how to track queries, what kind of information we can find in MySQL metadata, what tools to use to analyze such queries.

    Handling The Long-Running Queries

    Let’s start with checking Long-running queries. First of all, we have to know the nature of the query, whether it is expected to be a long-running or a short running query. Some analytic and batch operations are supposed to be long-running queries, so we can skip those for now. Also, depending on the table size, modifying table structure with ALTER command can be a long-running operation (especially in MySQL Galera Clusters).

    • Table lock - The table is locked by a global lock or explicit table lock when the query is trying to access it.
    • Inefficient query - Use non-indexed columns while lookup or joining, thus MySQL takes a longer time to match the condition.
    • Deadlock - A query is waiting to access the same rows that are locked by another request.
    • Dataset does not fit into RAM - If your working set data fits into that cache, then SELECT queries will usually be relatively fast.
    • Suboptimal hardware resources - This could be slow disks, RAID rebuilding, saturated network, etc.

    If you see a query takes longer than usual to execute, do investigate it.

    Using the MySQL Show Process List

    ​MYSQL> SHOW PROCESSLIST;

    This is usually the first thing you run in the case of performance issues. SHOW PROCESSLIST is an internal mysql command which shows you which threads are running. You can also see this information from the information_schema.PROCESSLIST table or the mysqladmin process list command. If you have the PROCESS privilege, you can see all threads. You can see information like Query Id, execution time, who runs it, the client host, etc. The information with slightly wary depending on the MySQL flavor and distribution (Oracle, MariaDB, Percona)

    SHOW PROCESSLIST;
    
    +----+-----------------+-----------+------+---------+------+------------------------+------------------+----------+
    
    | Id | User            | Host | db | Command | Time | State                  | Info | Progress |
    
    +----+-----------------+-----------+------+---------+------+------------------------+------------------+----------+
    
    |  2 | event_scheduler | localhost | NULL | Daemon  | 2693 | Waiting on empty queue | NULL   | 0.000 |
    
    |  4 | root            | localhost | NULL | Query   | 0 | Table lock   | SHOW PROCESSLIST | 0.000 |
    
    +----+-----------------+-----------+------+---------+------+------------------------+------------------+----------+

    we can immediately see the offensive query right away from the output. In the above example that could be a Table lock.  But how often do we stare at those processes? This is only useful if you are aware of the long-running transaction. Otherwise, you wouldn't know until something happens - like connections are piling up, or the server is getting slower than usual.

    Using MySQL Pt-query-digest

    If you would like to see more information about a particular workload use pt-query-digest.  The pt-query-digest is a Linux tool from Percona to analyze MySQL queries. It’s part of the Percona Toolkit which you can find here. It supports the most popular 64 bit Linux distributions like Debian, Ubuntu, and Redhat. 

    To install it you must configure Percona repositories and then install the perona-toolkit package.

    Install Percona Toolkit using your package manager:

    Debian or Ubuntu:

    sudo apt-get install percona-toolkit

    RHEL or CentOS:

    sudo yum install percona-toolkit

    Pt-query-digest accepts data from the process list, general log, binary log, slow log or tcpdump In addition to that, it’s possible to poll the MySQL process list at a defined interval - a process that can be resource-intensive and far from ideal, but can still be used as an alternative.

    The most common source for pt-query-digest is a slow query log. You can control how much data will go there with parameter log_slow_verbosity.  

    There are a number of things that may cause a query to take a longer time to execute:

    • microtime - queries with microsecond precision.
    • query_plan - information about the query’s execution plan.
    • innodb  - InnoDB statistics.
    • minimal - Equivalent to enabling just microtime.
    • standard - Equivalent to enabling microtime,innodb.
    • full - Equivalent to all other values OR’ed together without the profiling and profiling_use_getrusage options.
    • profiling - Enables profiling of all queries in all connections.
    • profiling_use_getrusage - Enables usage of the getrusage function.

    source: Percona documentation

    For completeness use log_slow_verbosity=full which is a common case.

    Slow Query Log

    The slow query log can be used to find queries that take a long time to execute and are therefore candidates for optimization. Slow query log captures slow queries (SQL statements that take more than long_query_time seconds to execute), or queries that do not use indexes for lookups (log_queries_not_using_indexes). This feature is not enabled by default and to enable it simply set the following lines and restart the MySQL server:

    [mysqld]
    slow_query_log=1
    log_queries_not_using_indexes=1
    long_query_time=0.1

    The slow query log can be used to find queries that take a long time to execute and are therefore candidates for optimization. However, examining a long slow query log can be a time-consuming task. There are tools to parse MySQL slow query log files and summarize their contents like mysqldumpslow, pt-query-digest.

    Performance Schema

    Performance Schema is a great tool available for monitoring MySQL Server internals and execution details at a lower level. It had a bad reputation in an early version (5.6) because enabling it often caused performance issues, however the recent versions do not harm performance. The following tables in Performance Schema can be used to find slow queries:

    • events_statements_current
    • events_statements_history
    • events_statements_history_long
    • events_statements_summary_by_digest
    • events_statements_summary_by_user_by_event_name
    • events_statements_summary_by_host_by_event_name

    MySQL 5.7.7 and higher include the sys schema, a set of objects that helps DBAs and developers interpret data collected by the Performance Schema into a more easily understandable form. Sys schema objects can be used for typical tuning and diagnosis use cases.

    Network tracking

    What if we don’t have access to the query log or direct application logs. In that case, we could use a combination of tcpdump and pt-query digest which could help to capture queries.

    $ tcpdump -s 65535 -x -nn -q -tttt -i any port 3306 > mysql.tcp.txt

    Once the capture process ends, we can proceed with processing the data:

    $ pt-query-digest --limit=100% --type tcpdump mysql.tcp.txt > ptqd_tcp.out

    ClusterControl Query Monitor

    ClusterControl Query Monitor is a module in a cluster control that provides combined information about database activity. It can gather information from multiple sources like show process list or slow query log and present it in a pre-aggregated way. 

    ClusterControl Top Queries

    The SQL Monitoring is divided into three sections.

    Top Queries

    presents the information about queries that take a significant chunk of resources.

    ClusterControl Top Queries

    Running Queries

    it’s a process list of information combined from all database cluster nodes into one view. You can use that to kill queries that affect your database operations.

    ClusterControl Running Queries

    Query Outliers

    present the list of queries with execution time longer than average.

    ClusterControl Query Outliners

    Conclusion

    This is all for part two. This blog is not intended to be an exhaustive guide to how to enhance database performance, but it hopefully gives a clearer picture of what things can become essential and some of the basic parameters that can be configured. Do not hesitate to let us know if we’ve missed any important ones in the comments below.

     

    by Bart Oles at January 09, 2020 08:02 PM

    January 08, 2020

    SeveralNines

    Database Performance Tuning for MariaDB

    Ever since MySQL was originally forked to form MariaDB it has been widely supported and adopted quickly by a large audience in the open source database community. Originally a drop-in replacement, MariaDB has started to create distinction against MySQL, especially with the release of MariaDB 10.2

    Despite this, however, there's still no real telltale difference between MariaDB and MySQL, as both have engines that are compatible and can run natively with one another. So don't be surprised if the tuning of your MariaDB setup has a similar approach to one tuning MySQL

    This blog will discuss the tuning of MariaDB, specifically those systems running in a Linux environment.

    MariaDB Hardware and System Optimization

    MariaDB recommends that you improve your hardware in the following priority order...

    Memory

    Memory is the most important factor for databases as it allows you to adjust the Server System Variables. More memory means larger key and table caches, which are stored in memory so that disks can access, an order of magnitude slower, is subsequently reduced.

    Keep in mind though, simply adding more memory may not result in drastic improvements if the server variables are not set to make use of the extra available memory.

    Using more RAM slots on the motherboard increases the bus frequency, and there will be more latency between the RAM and the CPU. This means that using the highest RAM size per slot is preferable.

    Disks

    Fast disk access is critical, as ultimately it's where the data resides. The key figure is the disk seek time (a measurement of how fast the physical disk can move to access the data) so choose disks with as low a seek time as possible. You can also add dedicated disks for temporary files and transaction logs.

    Fast Ethernet

    With the appropriate requirements for your internet bandwidth, fast ethernet means it can have faster response to clients requests, replication response time to read binary logs across the slaves, faster response times is also very important especially on Galera-based clusters.

    CPU

    Although hardware bottlenecks often fall elsewhere, faster processors allow calculations to be performed more quickly, and the results sent back to the client more quickly. Besides processor speed, the processor's bus speed and cache size are also important factors to consider.

    Setting Your Disk I/O Scheduler

    I/O schedulers exist as a way to optimize disk access requests. It merges I/O requests to similar locations on the disk. This means that the disk drive doesn’t need to seek as often and improves a huge overall response time and saves disk operations. The recommended values for I/O performance are noop and deadline

    noop is useful for checking whether complex I/O scheduling decisions of other schedulers are not causing I/O performance regressions. In some cases it can be helpful for devices that do I/O scheduling themselves, as intelligent storage, or devices that do not depend on mechanical movement, like SSDs. Usually, the DEADLINE I/O scheduler is a better choice for these devices, but due to less overhead NOOP may produce better performance on certain workloads.

    For deadline, it is a latency-oriented I/O scheduler. Each I/O request has got a deadline assigned. Usually, requests are stored in queues (read and write) sorted by sector numbers. The DEADLINE algorithm maintains two additional queues (read and write) where the requests are sorted by deadline. As long as no request has timed out, the “sector” queue is used. If timeouts occur, requests from the “deadline” queue are served until there are no more expired requests. Generally, the algorithm prefers reads over writes.

    For PCIe devices (NVMe SSD drives), they have their own large internal queues along with fast service and do not require or benefit from setting an I/O scheduler. It is recommended to have no explicit scheduler-mode configuration parameter.

    You can check your scheduler setting with:

    cat /sys/block/${DEVICE}/queue/scheduler

    For instance, it should look like this output:

    cat /sys/block/sda/queue/scheduler
    
    [noop] deadline cfq

    To make it permanent, edit /etc/default/grub configuration file, look for the variable GRUB_CMDLINE_LINUX and add elevator just like below:

    GRUB_CMDLINE_LINUX="elevator=noop"

    Increase Open Files Limit

    To ensure good server performance, the total number of client connections, database files, and log files must not exceed the maximum file descriptor limit on the operating system (ulimit -n). Linux systems limit the number of file descriptors that any one process may open to 1,024 per process. On active database servers (especially production ones) it can easily reach the default system limit.

    To increase this, edit /etc/security/limits.conf and specify or add the following:

    mysql soft nofile 65535
    
    mysql hard nofile 65535

    This requires a system restart. Afterwards, you can confirm by running the following:

    $ ulimit -Sn
    
    65535
    
    $ ulimit -Hn
    
    65535

    Optionally, you can set this via mysqld_safe if you are starting the mysqld process thru mysqld_safe,

    [mysqld_safe]
    
    open_files_limit=4294967295

    or if you are using systemd,

    sudo tee /etc/systemd/system/mariadb.service.d/limitnofile.conf <<EOF
    
    [Service]
    
    
    
    LimitNOFILE=infinity
    
    EOF
    
    sudo systemctl daemon-reload

    Setting Swappiness on Linux for MariaDB

    Linux Swap plays a big role in database systems. It acts like your spare tire in your vehicle, when nasty memory leaks interfere with your work, the machine will slow down... but in most cases will still be usable to finish its assigned task. 

    To apply changes to your swappiness, simply run,

    sysctl -w vm.swappiness=1

    This happens dynamically, with no need to reboot the server. To make it persistent, edit /etc/sysctl.conf and add the line,

    vm.swappiness=1

    It's pretty common to set swappiness=0, but since the release of new kernels (i.e. kernels > 2.6.32-303), changes have been made so you need to set vm.swappiness=1.

    Filesystem Optimizations for MariaDB

    The most common file systems used in Linux environments running MariaDB are ext4 and XFS. There are also certain setups available for implementing an architecture using ZFS and BRTFS (as referenced in the MariaDB documentation).

    In addition to this, most database setups do not need to record file access time. You might want to disable this when mounting the volume into the system. To do this, edit your file /etc/fstab. For example, on a volume named /dev/md2, this how it looks like:

    /dev/md2 / ext4 defaults,noatime 0 0

    Creating an Optimal MariaDB Instance

    Store Data On A Separate Volume

    It is always ideal to separate your database data on a separate volume. This volume is specifically for those types of fast storage volumes such as SSD, NVMe, or PCIe cards. For example, if your entire system volume will fail, you'll have your database volume safe and rest assured not affected in case your storage hardware will fail. 

    Tuneup MariaDB To Utilize Memory Efficiently

    innodb_buffer_pool_size

    The primary value to adjust on a database server with entirely/primarily XtraDB/InnoDB tables, can be set up to 80% of the total memory in these environments. If set to 2 GB or more, you will probably want to adjust innodb_buffer_pool_instances as well. You can set this dynamically if you are using MariaDB >= 10.2.2 version. Otherwise, it requires a server restart.

    tmp_memory_table_size/max_heap_table_size

    For tmp_memory_table_size (tmp_table_size), if you're dealing with large temporary tables, setting this higher provides performance gains as it will be stored in the memory. This is common on queries that are heavily using GROUP BY, UNION, or sub-queries. Although if max_heap_table_size is smaller, the lower limit will apply. If a table exceeds the limit, MariaDB converts it to a MyISAM or Aria table. You can see if it's necessary to increase by comparing the status variables Created_tmp_disk_tables and Created_tmp_tables to see how many temporary tables out of the total created needed to be converted to disk. Often complex GROUP BY queries are responsible for exceeding the limit.

    While max_heap_table_size,  this is the maximum size for user-created MEMORY tables. The value set on this variable is only applicable for the newly created or re-created tables and not the existing ones. The smaller of max_heap_table_size and tmp_table_size also limits internal in-memory tables. When the maximum size is reached, any further attempts to insert data will receive a "table ... is full" error. Temporary tables created with CREATE TEMPORARY will not be converted to Aria, as occurs with internal temporary tables, but will also receive a table full error.

    innodb_log_file_size

    Large memories with high-speed processing and fast I/O disk aren't new and has its reasonable price as it recommends. If you are preferring more performance gains especially during and handling your InnoDB transactions, setting the variable innodb_log_file_size to a larger value such as 5Gib or even 10GiB is reasonable. Increasing means that the larger transactions can run without needing to perform disk I/O before committing. 

    join_buffer_size

    In some cases, your queries tend to lack use of proper indexing or simply, there are instances that you need this query to run. Not unless it's going to be heavily called or invoked from the client perspective, setting this variable is best on a session level. Increase it to get faster full joins when adding indexes is not possible, although be aware of memory issues, since joins will always allocate the minimum size.

    Set Your max_allowed_packet

    MariaDB has the same nature as MySQL when handling packets. It splits data into packets and the client must be aware of the max_allowed_packet variable value. The server will have a buffer to store the body with a maximum size corresponding to this max_allowed_packet value. If the client sends more data than max_allowed_packet size, the socket will be closed. The max_allowed_packet directive defines the maximum size of packet that can be sent.

    Setting this value too low can cause a query to stop and close its client connection which is pretty common to receive errors like ER_NET_PACKET_TOO_LARGE or Lost connection to MySQL server during query. Ideally, especially on most application demands today, you can start setting this to 512MiB. If it's a low-demand type of application, just use the default value and set this variable only via session when needed if the data to be sent or received is too large than the default value (16MiB since MariaDB 10.2.4). In certain workloads that demand on large packets to be processed, then you need to adjust his higher according to your needs especially when on replication. If max_allowed_packet is too small on the slave, this also causes the slave to stop the I/O thread.

    Using Threadpool

    In some cases, this tuning might not be necessary or recommended for you. Threadpools are most efficient in situations where queries are relatively short and the load is CPU bound (OLTP workloads). If the workload is not CPU bound, you might still want to limit the number of threads to save memory for the database memory buffers.

    Using threadpool is an ideal solution especially if your system is experiencing context switching and you are finding ways to reduce this and maintain a lower number of threads than the number of clients. However, this number should also not be too low, since we also want to make maximum use of the available CPUs. Therefore there should be, ideally, a single active thread for each CPU on the machine.

    You can set the thread_pool_max_threads, thread_pool_min_threads for the maximum and the minimum number of threads. Unlike MySQL, this is only present in MariaDB.

    Set the variable thread_handling which determines how the server handles threads for client connections. In addition to threads for client connections, this also applies to certain internal server threads, such as Galera slave threads.

    Tune Your Table Cache + max_connections

    If you are facing occasional occurrences in the processlist about Opening tables and Closing tables statuses, it can signify that you need to increase your table cache. You can monitor this also via the mysql client prompt by running SHOW GLOBAL STATUS LIKE 'Open%table%'; and monitor the status variables. 

    For max_connections, if you are application requires a lot of concurrent connections, you can start setting this to 500. 

    For table_open_cache, it shall be the total number of your tables but it's best you add more depending on the type of queries you serve since temporary tables shall be cached as well. For example, if you have 500 tables, it would be reasonable you start with 1500. 

    While your table_open_cache_instances, start setting it to 8. This can improve scalability by reducing contention among sessions, the open tables cache can be partitioned into several smaller cache instances of size table_open_cache / table_open_cache_instances.

    For InnoDB, table_definition_cache acts as a soft limit for the number of open table instances in the InnoDB data dictionary cache. The value to be defined will set the number of table definitions that can be stored in the definition cache. If you use a large number of tables, you can create a large table definition cache to speed up opening of tables. The table definition cache takes less space and does not use file descriptors, unlike the normal table cache. The minimum value is 400. The default value is based on the following formula, capped to a limit of 2000:

    MIN(400 + table_open_cache / 2, 2000)

    If the number of open table instances exceeds the table_definition_cache setting, the LRU mechanism begins to mark table instances for eviction and eventually removes them from the data dictionary cache. The limit helps address situations in which significant amounts of memory would be used to cache rarely used table instances until the next server restart. The number of table instances with cached metadata could be higher than the limit defined by table_definition_cache, because parent and child table instances with foreign key relationships are not placed on the LRU list and are not subject to eviction from memory.

    Unlike the table_open_cache, the table_definition_cache doesn't use file descriptors, and is much smaller.

    Dealing with Query Cache

    Preferably, we recommend to disable query cache in all of your MariaDB setup. You need to ensure that query_cache_type=OFF and query_cache_size=0 to complete disable query cache. Unlike MySQL, MariaDB is still completely supporting query cache and do not have any plans on withdrawing its support to use query cache. There are some people claiming that query cache still provides performance benefits for them. However, this post from Percona The MySQL query cache: Worst enemy or best friend reveals that query cache, if enabled, results to have an overhead and shows to have a bad server performance.

    If you intend to use query cache, make sure that you monitor your query cache by running SHOW GLOBAL STATUS LIKE 'Qcache%';. Qcache_inserts contains the number of queries added to the query cache, Qcache_hits contains the number of queries that have made use of the query cache, while Qcache_lowmem_prunes contains the number of queries that were dropped from the cache due to lack of memory. While in due time, using and enabling query cache may become fragmented. A high Qcache_free_blocks relative to Qcache_total_blocks may indicate fragmentation. To defragment it, run FLUSH QUERY CACHE. This will defragment the query cache without dropping any queries.

    Always Monitor Your Servers

    It is highly important that you properly monitor your MariaDB nodes. Common monitoring tools out there (like Nagios, Zabbix, or PMM) are available if you tend to prefer free and open-source tools. For corporate and fully-packed tools we suggest you give ClusterControl a try, as it does not only provide monitoring, but it also offers performance advisors, alerts and alarms which helps you improve your system performance and stay up-to-date with the current trends as you engage with the Support team. Database monitoring with ClusterControl is free and part of the Community Edition.

    Conclusion

    Tuning your MariaDB setup is almost the same approach as MySQL, but with some disparities, as it differs in some of its approaches and versions that it does support. MariaDB is now a different entity in the database world and has quickly gained the trust by the community without any FUD. They have their own reasons why it has to be implemented this way so it's very important we know how to tune this and optimize your MariaDB server(s).

    by Paul Namuag at January 08, 2020 07:09 PM

    January 07, 2020

    SeveralNines

    Using OpenVPN to Secure Access to Your Database Cluster in the Cloud

    The internet is a dangerous place, especially if you’re leaving your data unencrypted or without proper security. There are several ways to secure your data; all at different levels. You should always have a strong firewall policy,  data encryption, and a strong password policy. Another way to secure your data is by accessing it using a VPN connection. 

    Virtual Private Network (or VPN) is a connection method used to add security and privacy to private and public networks, protecting your data.

    OpenVPN is a fully-featured, open source, SSL VPN solution to secure communications. It can be used for remote access or communication between different servers or data centers. It can be installed on-prem or in the cloud, in different operating systems, and can be configured with many security options.

    In this blog, we’ll create a VPN connection to access a database in the cloud. There are different ways to achieve this goal, depending on your infrastructure and how much hardware resources you want to use for this task. 

    For example, you can create two VM, one on-prem and another one in the cloud, and they could be a bridge to connect your local network to the database cloud network through a Peer-to-Peer VPN connection.

    Another simpler option could be connecting to a VPN server installed in the database node using a VPN client connection configured in your local machine. In this case, we’ll use this second option. You’ll see how to configure an OpenVPN server in the database node running in the cloud, and you’ll be able to access it using a VPN client.

    For the database node, we’ll use an Amazon EC2 instance with the following configuration:

    • OS: Ubuntu Server 18.04
    • Public IP Address: 18.224.138.210
    • Private IP Address: 172.31.30.248/20
    • Opened TCP ports: 22, 3306, 1194

    How to Install OpenVPN on Ubuntu Server 18.04

    The first task is to install the OpenVPN server in your database node. Actually, the database technology used doesn’t matter as we’re working on a networking layer, but for testing purposes after configuring the VPN connection, let’s say we’re running Percona Server 8.0.

    So let’s start by installing the OpenVPN packages.

    $ apt install openvpn easy-rsa

    As OpenVPN uses certificates to encrypt your traffic, you’ll need EasyRSA for this task. It’s a CLI utility to create a root certificate authority, and request and sign certificates, including sub-CAs and certificate revocation lists.

    Note: There is a new EasyRSA version available, but to keep the focus on the OpenVPN installation, let’s use the EasyRSA version available in the Ubuntu 18.04 repository atm (EasyRSA version 2.2.2-2).

    The previous command will create the directory /etc/openvpn/ for the OpenVPN configuration, and the directory /usr/share/easy-rsa/ with the EasyRSA scripts and configuration.

    To make this task easier, let’s create a symbolic link to the EasyRSA path in the OpenVPN directory (or you can just copy it):

    $ ln -s /usr/share/easy-rsa /etc/openvpn/

    Now, you need to configure EasyRSA and create your certificates. Go to the EasyRSA location and create a backup for the “vars” file:

    $ cd /etc/openvpn/easy-rsa
    
    $ cp vars vars.bak

    Edit this file, and change the following lines according to your information:

    $ vi vars
    
    export KEY_COUNTRY="US"
    
    export KEY_PROVINCE="CA"
    
    export KEY_CITY="SanFrancisco"
    
    export KEY_ORG="Fort-Funston"
    
    export KEY_EMAIL="me@myhost.mydomain"
    
    export KEY_OU="MyOrganizationalUnit"

    Then, create a new symbolic link to the openssl file:

    $ cd /etc/openvpn/easy-rsa
    
    $ ln -s openssl-1.0.0.cnf openssl.cnf

    Now, apply the vars file:

    $ cd /etc/openvpn/easy-rsa
    
    $ . vars

    NOTE: If you run ./clean-all, I will be doing a rm -rf on /etc/openvpn/easy-rsa/keys

    Run the clean-all script:

    $ ./clean-all

    And create the Diffie-Hellman key (DH):

    $ ./build-dh
    
    Generating DH parameters, 2048 bit long safe prime, generator 2
    
    This is going to take a long time
    
    .....................................................................................................................................................................+

    This last action could take some seconds, and when it’s finished, you will have a new DH file inside the “keys” directory in the EasyRSA directory.

    $ ls /etc/openvpn/easy-rsa/keys
    
    dh2048.pem

    Now, let’s create the CA certificates.

    $ ./build-ca
    
    Generating a RSA private key
    
    ..+++++
    
    ...+++++
    
    writing new private key to 'ca.key'
    
    -----
    
    You are about to be asked to enter information that will be incorporated
    
    into your certificate request.
    
    What you are about to enter is what is called a Distinguished Name or a DN.
    
    There are quite a few fields but you can leave some blank
    
    For some fields there will be a default value,
    
    If you enter '.', the field will be left blank.
    
    ...

    This will create the ca.crt (public certificate) and ca.key (private key). The public certificate will be required in all servers to connect to the VPN.

    $ ls /etc/openvpn/easy-rsa/keys
    
    ca.crt  ca.key

    Now you have your CA created, let’s create the server certificate. In this case, we’ll call it “openvpn-server”:

    $ ./build-key-server openvpn-server
    
    Generating a RSA private key
    
    .......................+++++
    
    ........................+++++
    
    writing new private key to 'openvpn-server.key'
    
    -----
    
    You are about to be asked to enter information that will be incorporated
    
    into your certificate request.
    
    What you are about to enter is what is called a Distinguished Name or a DN.
    
    There are quite a few fields but you can leave some blank
    
    For some fields there will be a default value,
    
    If you enter '.', the field will be left blank.
    
    ...
    
    Certificate is to be certified until Dec 23 22:44:02 2029 GMT (3650 days)
    
    Sign the certificate? [y/n]:y
    
    
    
    1 out of 1 certificate requests certified, commit? [y/n]y
    
    
    
    Write out database with 1 new entries
    
    Data Base Updated

    This will create the CRT, CSR, and Key files for the OpenVPN server:

    $ ls /etc/openvpn/easy-rsa/keys
    
    openvpn-server.crt  openvpn-server.csr openvpn-server.key

    Now, you need to create the client certificate, and the process is pretty similar:

    $ ./build-key openvpn-client-1
    
    Generating a RSA private key
    
    .........................................................................................+++++
    
    .....................+++++
    
    writing new private key to 'openvpn-client-1.key'
    
    -----
    
    You are about to be asked to enter information that will be incorporated
    
    into your certificate request.
    
    What you are about to enter is what is called a Distinguished Name or a DN.
    
    There are quite a few fields but you can leave some blank
    
    For some fields there will be a default value,
    
    If you enter '.', the field will be left blank.
    
    ...
    
    Certificate is to be certified until Dec 24 01:45:39 2029 GMT (3650 days)
    
    Sign the certificate? [y/n]:y
    
    
    
    1 out of 1 certificate requests certified, commit? [y/n]y
    
    
    
    Write out database with 1 new entries
    
    Data Base Updated

    This will create the CRT, CSR, and Key files for the OpenVPN client:

    $ ls /etc/openvpn/easy-rsa/keys
    
    openvpn-client-1.csr  openvpn-client-1.crt openvpn-client-1.key

    At this point, you have all the certificates ready. The next step will be to create both server and client OpenVPN configuration.

    Configuring the OpenVPN Server

    As we mentioned, the OpenVPN installation will create the /etc/openvpn directory, where you will add the configuration files for both server and client roles, and it has a sample configuration file for each one in /usr/share/doc/openvpn/examples/sample-config-files/, so you can copy the files in the mentioned location and modify them as you wish.

    In this case, we’ll only use the server configuration file, as it’s an OpenVPN server:

    $ cp /usr/share/doc/openvpn/examples/sample-config-files/server.conf.gz /etc/openvpn/
    
    $ gunzip /etc/openvpn/server.conf.gz

    Now, let’s see a basic server configuration file:

    $ cat /etc/openvpn/server.conf
    
    port 1194  
    
    # Which TCP/UDP port should OpenVPN listen on?
    
    proto tcp  
    
    # TCP or UDP server?
    
    dev tun  
    
    # "dev tun" will create a routed IP tunnel,"dev tap" will create an ethernet tunnel.
    
    ca /etc/openvpn/easy-rsa/keys/ca.crt  
    
    # SSL/TLS root certificate (ca).
    
    cert /etc/openvpn/easy-rsa/keys/openvpn-server.crt  
    
    # Certificate (cert).
    
    key /etc/openvpn/easy-rsa/keys/openvpn-server.key  
    
    # Private key (key). This file should be kept secret.
    
    dh /etc/openvpn/easy-rsa/keys/dh2048.pem  
    
    # Diffie hellman parameters.
    
    server 10.8.0.0 255.255.255.0  
    
    # Configure server mode and supply a VPN subnet.
    
    push "route 172.31.16.0 255.255.240.0"
    
    # Push routes to the client to allow it to reach other private subnets behind the server.
    
    keepalive 20 120  
    
    # The keepalive directive causes ping-like messages to be sent back and forth over the link so that each side knows when the other side has gone down.
    
    cipher AES-256-CBC  
    
    # Select a cryptographic cipher.
    
    persist-key  
    
    persist-tun
    
    # The persist options will try to avoid accessing certain resources on restart that may no longer be accessible because of the privilege downgrade.
    
    status /var/log/openvpn/openvpn-status.log  
    
    # Output a short status file.
    
    log /var/log/openvpn/openvpn.log  
    
    # Use log or log-append to override the default log location.
    
    verb 3  
    
    # Set the appropriate level of log file verbosity.

    Note: Change the certificate paths according to your environment. 

    And then, start the OpenVPN service using the created configuration file:

    $ systemctl start openvpn@server

    Check if the service is listening in the correct port:

    $ netstat -pltn |grep openvpn
    
    tcp        0 0 0.0.0.0:1194            0.0.0.0:* LISTEN   20002/openvpn

    Finally, in the OpenVPN server, you need to add the IP forwarding line in the sysctl.conf file to allow the VPN traffic:

    $ echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

    And run:

    $ sysctl -p
    
    net.ipv4.ip_forward = 1

    Now, let’s see how to configure an OpenVPN client to connect to this new VPN.

    Configuring the OpenVPN Client

    In the previous point, we mentioned the OpenVPN sample configuration files, and we used the server one, so now let’s do the same but using the client configuration file.

    Copy the file client.conf from /usr/share/doc/openvpn/examples/sample-config-files/ in the corresponding location and change it as you wish.

    $ cp /usr/share/doc/openvpn/examples/sample-config-files/client.conf /etc/openvpn/

    You’ll also need the following certificates created previously to configure the VPN client:

    ca.crt
    
    openvpn-client-1.crt
    
    openvpn-client-1.key

    So, copy these files to your local machine or VM. You’ll need to add this files location in the VPN client configuration file.

    Now, let’s see a basic client configuration file:

    $ cat /etc/openvpn/client.conf
    
    client  
    
    # Specify that we are a client
    
    dev tun  
    
    # Use the same setting as you are using on the server.
    
    proto tcp  
    
    # Use the same setting as you are using on the server.
    
    remote 18.224.138.210 1194  
    
    # The hostname/IP and port of the server.
    
    resolv-retry infinite  
    
    # Keep trying indefinitely to resolve the hostname of the OpenVPN server.
    
    nobind  
    
    # Most clients don't need to bind to a specific local port number.
    
    persist-key  
    
    persist-tun
    
    # Try to preserve some state across restarts.
    
    ca /Users/sinsausti/ca.crt  
    
    cert /Users/sinsausti/openvpn-client-1.crt
    
    key /Users/sinsausti/openvpn-client-1.key
    
    # SSL/TLS parms.
    
    remote-cert-tls server  
    
    # Verify server certificate.
    
    cipher AES-256-CBC  
    
    # Select a cryptographic cipher.
    
    verb 3  
    
    # Set log file verbosity.

    Note: Change the certificate paths according to your environment. 

    You can use this file to connect to the OpenVPN server from different Operating Systems like Linux, macOS, or Windows.

    In this example, we’ll use the application Tunnelblick to connect from a macOS client. Tunnelblick is a free, open source graphic user interface for OpenVPN on macOS. It provides easy control of OpenVPN clients. It comes with all the necessary packages like OpenVPN, EasyRSA, and tun/tap drivers.

    As the OpenVPN configuration files have extensions of .tblk, .ovpn, or .conf, Tunnelblick can read all of them.

    To install a configuration file, drag and drop it on the Tunnelblick icon in the menu bar or on the list of configurations in the 'Configurations' tab of the 'VPN Details' window.

    And then, press on “Connect”.

    Now, you should have some new routes in your client machine:

    $ netstat -rn # or route -n on Linux OS
    
    Destination        Gateway Flags        Netif Expire
    
    10.8.0.1/32        10.8.0.5 UGSc         utun5
    
    10.8.0.5           10.8.0.6 UH           utun5
    
    172.31.16/20       10.8.0.5 UGSc         utun5

    As you can see, there is a route to the local database network via the VPN interface, so you should be able to  access the database service using the Private Database IP Address.

    $ mysql -p -h172.31.30.248
    
    Enter password:
    
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    
    Your MySQL connection id is 13
    
    Server version: 8.0.18-9 Percona Server (GPL), Release '9', Revision '53e606f'
    
    
    
    Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
    
    
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    
    affiliates. Other names may be trademarks of their respective
    
    owners.
    
    
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    
    
    mysql>

    It’s working. Now you have your traffic secured using a VPN to connect to your database node.

    Conclusion

    Protecting your data is a must if you’re accessing it over the internet, on-prem, or on a mixed environment. You must know how to encrypt and secure your remote access. 

    As you could see, with OpenVPN you can reach the remote database using the local network through an encrypted connection using self-signed certificates. So, OpenVPN looks like a great option for this task. It’s an open source solution, and the installation/configuration is pretty easy. We used a basic OpenVPN server configuration, so you can look for more complex configuration in the OpenVPN official documentation to improve your OpenVPN server.

    by Sebastian Insausti at January 07, 2020 07:42 PM

    January 06, 2020

    SeveralNines

    How to Configure ClusterControl to Run on NGINX

    ClusterControl uses the Apache HTTP Server to serve its web interface, but it is also possible to use nginx. nginx + PHP fastcgi is well-known for its capabilities to run on a small memory footprint compared to standard Apache + PHP DSO.

    In this post, we will show you how to run ClusterControl 1.7.5 and later on nginx web server by swapping out the default Apache web server installed during the initial deployment. This blog post does not mean that we officially support nginx, it just an alternative way that a portion of our users have been interested in.

    Apache Configuration

    Before we jump into nginx configurations, let’s look at how ClusterControl web application is configured with Apache web server. ClusterControl consists of a number of components, and some of them require specific Apache module to run properly:

    • ClusterControl UI - Requires Apache rewrite module + PHP 5.4 and later
    • ClusterControl Controller
    • ClusterControl Notifications - Requires Apache rewrite module
    • ClusterControl SSH - Requires Apache 2.4 proxy module (wstunnel for web socket)
    • ClusterControl Cloud

    ClusterControl UI is located in the Apache’s document root which might vary depending on the operating system. For legacy OS distribution like Ubuntu 14.04 LTS and Debian 8, the Apache's document root is located at /var/www. For more recent OS distributions, most of them are now running with Apache 2.4 with /var/www/html as the default document root.

    Step One

    Make sure ClusterControl UI exists in the Apache document root. Document root for RedHat/CentOS and Ubuntu 14.04 LTS (Apache 2.4) is located at /var/www/html while Debian and Ubuntu 12.04 and lower is located at /var/www. ClusterControl UI will be installed under this document root directory and you should see something like this:

    $ ls -al /var/www/html
    
    total 16
    
    drwxr-xr-x 4 root   root 4096 Aug 8 11:42 .
    
    drwxr-xr-x 4 root   root 4096 Dec 19 03:32 ..
    
    dr-xr-xr-x 6 apache apache 4096 Dec 19 03:38 clustercontrol
    
    drwxrwx--- 3 apache apache 4096 Dec 19 03:29 cmon

    Step Two

    Apache must be able to read custom configuration file (.htaccess) under the document root directory. Thus the installer script will generate a configuration file and set the global AllowOverride option to All. Example in /etc/httpd/conf.d/s9s.conf:

        <Directory />
    
                Options +FollowSymLinks
    
                AllowOverride All
    
        </Directory>
    
        <Directory /var/www/html>
    
                Options +Indexes +FollowSymLinks +MultiViews
    
                AllowOverride All
    
                Require all granted
    
        </Directory>

    Step Three

    ClusterControl also requires the following rewrite rules:

        RewriteEngine On
    
        RewriteRule ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/ [R=301]
    
        RewriteRule ^/clustercontrol/ssh/term/ws/(.*)$ ws://127.0.0.1:9511/ws/$1 [P,L]
    
        RewriteRule ^/clustercontrol/ssh/term/(.*)$ http://127.0.0.1:9511/$1 [P]
    
        RewriteRule ^/clustercontrol/sse/events/(.*)$ http://127.0.0.1:9510/events/$1 [P,L]

    The first 3 URL rewrite rules indicate that ClusterControl SSH URL will be rewritten to use WebSocket tunneling on port 9511. This allows ClusterControl users to access the monitored nodes via SSH directly inside the ClusterControl UI.

    You may also notice another line with "sse/events" where the URL will be rewritten to port 9510 for cmon-events integration. Application cmon-events is a binary comes within ClusterControl Notifications package for notification integration with 3rd-party software like Slack, Telegram, Pagerduty and web hooks.

    Step Four

    Thus, ClusterControl suite requires the following PHP/Apache modules to be installed and enabled:

    • common
    • mysql
    • ldap
    • gd
    • curl
    • mod_proxy (websocket)

    The standard Apache installation via package manager will install PHP to run as dynamic shared object (DSO). Running on this mode will require you to restart Apache in case of PHP configuration changes.

    The following command should install all required packages for ClusterControl:

    $ yum install httpd php php-mysql php-ldap php-gd php-curl mod_ssl #RHEL/CentOS
    
    $ apt-get install apache2 php5-common php5-mysql php5-ldap php5-gd libapache2-mod-php5 php5-json php5-curl #Debian/Ubuntu

    Step Five

    The ClusterControl web components must be owned by Apache web server user ("apache" for RHEL/CentOS and "www-data" for Debian/Ubuntu).

    Switching from Apache to nginx

    We would need to configure nginx to behave similarly to our Apache configuration, as most of the Severalnines tools assume that ClusterControl is running on Apache. 

    Step One

    Install ClusterControl via the installer script:

    $ wget https://severalnines.com/downloads/cmon/install-cc
    
    $ chmod 755 install-cc
    
    $ ./install-cc

    The above will install ClusterControl and its components on top of Apache web server.

    Step Two

    Enable nginx repository. Depending on your operating system, please refer to this installation guide for details.

    Step Three

    Install nginx and PHP FPM:

    $ yum install nginx php-fpm -y #RHEL/CentOS
    
    $ sudo apt-get install nginx php5-fpm -y #Debian/Ubuntu

    Step Four

    Take note that removing Apache2 directly might cause dependent PHP packages to be uninstalled as well. So we take a safer approach by just turning it off and disabling it to start on boot:

    Systemd:

    $ systemctl stop httpd
    
    $ systemctl disable httpd

    Sysvinit RHEL/CentOS:

    $ chkconfig httpd off
    
    $ service httpd stop

    Sysvinit Debian/Ubuntu:

    $ sudo update-rc.d -f apache2 remove
    
    $ sudo service apache2 stop

    Step Five

    Open the nginx default virtual host configuration file (RHEL/CentOS: /etc/nginx/conf.d/default.conf, Debian/Ubuntu: /etc/nginx/sites-available/default) and make sure it contains the following lines:

    server {
    
            listen       0.0.0.0:80;
    
            server_name  localhost;
    
    
    
            access_log /var/log/nginx/localhost-access.log;
    
            error_log /var/log/nginx/localhost-error.log;
    
    
    
            root /var/www/html;
    
            index index.php;
    
    
    
            location ~ \.htaccess {
    
                    deny all;
    
            }
    
    
    
            location ~ \.php$ {
    
                    fastcgi_pass 127.0.0.1:9000;
    
                    fastcgi_index index.php;
    
                    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    
                    include /etc/nginx/fastcgi_params;
    
            }
    
    
    
            # Handle requests to /clustercontrol
    
            location /clustercontrol {
    
                    alias /var/www/html/clustercontrol/app/webroot;
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/index.php;
    
            }
    
    
    
            # Equivalent of $is_args but adds an & character
    
            set $is_args_amp "";
    
            if ($is_args != "") {
    
                    set $is_args_amp "&";
    
            }
    
    
    
            # Handle requests to /clustercontrol/access
    
            location ~ "^/clustercontrol/access/(.*)$" {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/access/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Handle requests to /clustercontrol/access2
    
            location ~ "^/clustercontrol/access2/(.*)$" {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/access2/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Pass to cmon-events module
    
            location /clustercontrol/sse/events/ {
    
                    proxy_pass http://127.0.0.1:9510/events/;
    
            }
    
    
    
            # Pass to cmon-ssh module
    
            location /clustercontrol/ssh/term/ {
    
                    proxy_pass http://127.0.0.1:9511/;
    
            }
    
    
    
            # Pass cmon-ssh module via websocket
    
            location /clustercontrol/ssh/term/ws/ {
    
                    proxy_set_header X-Forwarded-Host $host:$server_port;
    
                    proxy_set_header X-Forwarded-Server $host;
    
                    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    
                    proxy_http_version 1.1;
    
                    proxy_set_header Upgrade $http_upgrade;
    
                    proxy_set_header Connection "upgrade";
    
                    proxy_pass http://127.0.0.1:9511/ws/;
    
            }
    
    
    
            # Handle requests to /clustercontrol/ssh
    
            location /clustercontrol/ssh/ {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Redirect /clustercontrol/ssh/term to /term/
    
            rewrite ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/$1 permanent;
    
    
    
    }

    The above configuration example is specifically written to run ClusterControl UI on nginx in RHEL/CentOS. For other OS distributions, replace any occurrences of /var/www/html to its respective document root.

    Step Six

    Create a new virtual host configuration for HTTPS (optional):

    $ vim /etc/nginx/conf.d/s9s-ssl.conf #RHEL/CentOS
    
    $ vim /etc/nginx/sites-available/s9s-ssl #Debian/Ubuntu

    And make sure it contains the following lines:

    server {
    
            listen       443 ssl;
    
            server_name  localhost;
    
    
    
            access_log /var/log/nginx/localhost-access.log;
    
            error_log /var/log/nginx/localhost-error.log;
    
    
    
            # SSL cert and key path
    
            ssl_certificate      /etc/pki/tls/certs/s9server.crt;
    
            ssl_certificate_key  /etc/pki/tls/private/s9server.key;
    
    
    
            ssl_session_cache shared:SSL:1m;
    
            ssl_session_timeout  5m;
    
            ssl_ciphers  HIGH:!aNULL:!MD5;
    
            ssl_prefer_server_ciphers   on;
    
    
    
    
            root /var/www/html;
    
            index index.php;
    
    
    
            location ~ \.htaccess {
    
                    deny all;
    
            }
    
    
    
            location ~ \.php$ {
    
                    fastcgi_pass 127.0.0.1:9000;
    
                    fastcgi_index index.php;
    
                    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    
                    include /etc/nginx/fastcgi_params;
    
            }
    
    
    
            # Handle requests to /clustercontrol
    
            location /clustercontrol {
    
                    alias /var/www/html/clustercontrol/app/webroot;
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/index.php;
    
            }
    
    
    
            # Equivalent of $is_args but adds an & character
    
            set $is_args_amp "";
    
            if ($is_args != "") {
    
                    set $is_args_amp "&";
    
            }
    
    
    
            # Handle requests to /clustercontrol/access
    
            location ~ "^/clustercontrol/access/(.*)$" {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/access/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Handle requests to /clustercontrol/access2
    
            location ~ "^/clustercontrol/access2/(.*)$" {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/access2/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Pass to cmon-events module
    
            location /clustercontrol/sse/events/ {
    
                    proxy_pass http://127.0.0.1:9510/events/;
    
            }
    
    
    
            # Pass to cmon-ssh module
    
            location /clustercontrol/ssh/term/ {
    
                    proxy_pass http://127.0.0.1:9511/;
    
            }
    
    
    
            # Pass cmon-ssh module via websocket
    
            location /clustercontrol/ssh/term/ws/ {
    
                    proxy_set_header X-Forwarded-Host $host:$server_port;
    
                    proxy_set_header X-Forwarded-Server $host;
    
                    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    
                    proxy_http_version 1.1;
    
                    proxy_set_header Upgrade $http_upgrade;
    
                    proxy_set_header Connection "upgrade";
    
                    proxy_pass http://127.0.0.1:9511/ws/;
    
            }
    
    
    
            # Handle requests to /clustercontrol/ssh
    
            location /clustercontrol/ssh/ {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Redirect /clustercontrol/ssh/term to /term/
    
            rewrite ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/$1 permanent;
    
    
    
    }

    The above configuration example is specifically written to run ClusterControl UI on nginx in RHEL/CentOS. Replace any occurrences of the following:

    • /var/www/html to its respective document root for other OS distribution
    • /etc/pki/tls/certs/s9server.crt to /etc/ssl/certs/s9server.crt for Debian/Ubuntu
    • /etc/pki/tls/private/s9server.key to /etc/ssl/private/s9server.key for Debian/Ubuntu

    For Debian/Ubuntu, and extra step is needed to create a symlink for /etc/nginx/sites-enabled/default-ssl:
     

    $ sudo ln -sf /etc/nginx/sites-available/default-ssl /etc/nginx/sites-enabled/default-ssl

    Step Seven

    Enable and start nginx and php-fpm:

    Systemd:

    $ systemctl enable php-fpm
    
    $ systemctl enable nginx
    
    $ systemctl restart php-fpm
    
    $ systemctl restart nginx

    Sysvinit RHEL/CentOS:

    $ chkconfig php-fpm on
    
    $ chkconfig nginx on
    
    $ service php-fpm start
    
    $ service nginx start

    Sysvinit Debian/Ubuntu:

    $ sudo update-rc.d -f php-fpm defaults
    
    $ sudo update-rc.d -f nginx defaults
    
    $ sudo service php-fpm start
    
    $ sudo service nginx start

    Installation is now complete. At this point, PHP should run under fastcgi mode and nginx has taken over the web server role from Apache to serve ClusterControl UI. We can verify that with any web server detector extension on your preferred web browser:

    Caveats

    • Severalnines’s s9s_error_reporter might not get a complete error report on ClusterControl UI since it doesn’t collect any nginx related log files.
    • ClusterControl is built on a common Apache configuration. There might be some features that do not function well (although we have not encountered any malfunctions so far).
    • If you want to install ClusterControl manually on nginx (without using ClusterControl installer script), we recommend users to follow the Manual Installation documentation and install ClusterControl on Apache first. Then, follow the steps under "Switching from Apache to nginx" section to run on nginx.

    by ashraf at January 06, 2020 08:05 PM

    Federico Razzoli

    Understanding tables usage with User Statistics (Percona Server, MariaDB)

    Let's use Percona User Statistics to analyse our most used tables, and to look for problems where they mostly matter.

    by Federico Razzoli at January 06, 2020 01:32 PM

    January 03, 2020

    SeveralNines

    Tips for Delivering MySQL Database Performance - Part One

    The database backend affects the application, which can then impact organizational performance. When this happens, those in charge tend to want a quick fix. There are many different roads to improve performance in MySQL. As a very popular choice for many organizations, it's pretty common to find a MySQL installation with the default configuration. This might not, however, be appropriate for your workload and setup needs.

    In this blog, we will help you to better understand your database workload and the things that may cause harm to it. Knowledge of how to use limited resources is essential for anyone managing the database, especially if you run your production system on MySQL DB.

    To ensure that the database performs as expected, we will start with the free MySQL monitoring tools. We will then look at the related MySQL parameters you can tweak to improve the database instance. We will also take a look at indexing as a factor in database performance management. 

    To be able to achieve optimal usage of hardware resources, we’ll take a look into kernel optimization and other crucial OS settings. Finally, we will look into trendy setups based on MySQL Replication and how it can be examined in terms of performance lag. 

    Identifying MySQL Performance Issues

    This analysis helps you to understand the health and performance of your database better. The tools listed below can help to capture and understand every transaction, letting you stay on top of its performance and resource consumption.

    PMM (Percona Monitoring and Management)

    Percona Monitoring and Management tool is an open-source collection of tools dedicated to MySQL, MongoDB, and MariaDB databases (on-premise or in the cloud). PPM is free to use, and it's based on the well known Grafana and Prometheus time series DB. It Provides a thorough time-based analysis for MySQL.  It offers preconfigured dashboards that help to understand your database workload.

    PMM uses a client/server model. You'll have to download and install both the client and the server. For the server, you can use Docker Container. It's as easy as pulling the PMM server docker image, creating a container, and launching PMM.

    Pull PMM Server Image

    docker pull percona/pmm-server:2
    
    2: Pulling from percona/pmm-server
    
    ab5ef0e58194: Downloading  2.141MB/75.78MB
    
    cbbdeab9a179: Downloading  2.668MB/400.5MB

    Create PMM Container

    docker create \
    
       -v /srv \
    
       --name pmm-data \
    
       percona/pmm-server:2 /bin/true

    Run Container

    docker run -d \
    
       -p 80:80 \
    
       -p 443:443 \
    
       --volumes-from pmm-data \
    
       --name pmm-server \
    
       --restart always \
    
       percona/pmm-server:2

    You can also check how it looks without an installation. A demo of PMM is available here.

    Another tool that is part of PMM tools set is Query Analytics (QAN). QAN tool stays on top of the execution time of queries. You can even get details of SQL queries. It also gives a historical view of the different parameters that are critical for the optimal performance of a MySQL Database Server. This often helps to understand if any changes in the code could harm your performance. For example, a new code was introduced without your knowledge.  A simple use would be to display current SQL queries and highlight issues to help you improve the performance of your database.

    PMM offers point-in-time and historical visibility of MySQL database performance. Dashboards can be customized to meet your specific requirements. You can even expand a particular panel to find the information you want about a past event.

    Free Database Monitoring with ClusterControl

    ClusterControl provides real-time monitoring of the entire database infrastructure. It supports various database systems starting with MySQL, MariaDB, PerconaDB, MySQL NDB Cluster, Galera Cluster (both Percona and MariaDB), MongoDB, PostgreSQL and TimescaleDB. The monitoring and deployment modules are free to use.

    ClusterControl consists of several modules. In the free ClusterControl Community Edition we can use:

    Performance advisors offer specific advice on how to address database and server issues, such as performance, security, log management, configuration, and capacity planning. Operational reports can be used to ensure compliance across hundreds of instances. However, monitoring is not management. ClusterControl has features like backup management, automated recovery/failover, deployment/scaling, rolling upgrades, security/encryption, load balancer management, and so on.

    Monitoring & Advisors

    The ClusterControl Community Edition offers free database monitoring which provides a unified view of all of your deployments across data centers and lets you drill down into individual nodes. Similar to PMM we can find dashboards based on real-time data. It’s to know what is happening now, with high-resolution metrics for better accuracy, pre-configured dashboards, and a wide range of third-party notification services for alerting.

    On-premises and cloud systems can be monitored and managed from one single point. Intelligent health-checks are implemented for distributed topologies, for instance, detection of network partitioning by leveraging the load balancer’s view of the database nodes.

    ClusterControl Workload Analytics in one of the monitoring components which can easily help you to track your database activities. It provides clarity into transactions/queries from applications. Performance exceptions are never expected, but they do occur and are easy to miss in a sea of data. Outlier discovery will get any queries that suddenly start to execute much slower than usual. It tracks the moving average and standard deviation for query execution times and detects/alerts when the difference between the value exceeds the mean by two standard deviations. 

    As we can see from the below picture, we were able to catch some queries that in between one day tend to change execution time on a specific time. 

    To install ClusterControl click here and download the installation script. The install script will take care of the necessary installation steps. 

    You should also check out the ClusterControl Demo to see it in action.

    You can also get a docker image with ClusterControl.

    $ docker pull severalnines/clustercontrol

    For more information on this, follow this article.

    MySQL Database Indexing

    Without an index, running that same query results in a scan of every row for the needed data. Creating an index on a field in a table creates extra data structure, which is the field value, and a pointer to the record it relates to. In other words, indexing produces a shortcut, with much faster query times on expansive tables. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. 

    Generally speaking, indexing works best on those columns that are the subject of the WHERE clauses in your commonly executed queries.

    Tables can have multiple indexes. Managing indexes will inevitably require being able to list the existing indexes on a table. The syntax for viewing an index is below.

    To check indexes on MySQL table run:

    SHOW INDEX FROM table_name;

    Since indices are only used to speed up the searching for a matching field within the records, it stands to reason that indexing fields used only for output would be simply a waste of disk space. Another side effect is that indexes may extend insert or delete operations, and thus when not needed, should be avoided.

    MySQL Database Swappiness

    On servers where MySQL is the only service running, it’s a good practice to set vm.swapiness = 1. The default setting is set to 60 which is not appropriate for a database system.

    vi /etc/sysctl.conf
    vm.swappiness = 1

    Transparent Huge Pages

    If you are running your MySQL on RedHat, make sure that Transparent Huge Pages is disabled.

    This can be checked by command:

    cat /proc/sys/vm/nr_hugepages
    0

    (0 means that transparent huge pages are disabled.)

    MySQL I/O Scheduler 

    In most distributions noop or deadline I/O schedulers should be enabled by default. To check it run

    cat /sys/block/sdb/queue/scheduler 

    MySQL Filesystem Options

    It’s recommended to use journaled file systems like xfs, ext4 or btrfs. MySQL works fine with all that of them and the differences more likely will come with supported maximum file size.

    • XFS (maximum filesystem size 8EB, maximum file size 8EB)
    • XT4 (maximum filesystem size 8EB, maximum file size 16TB)
    • BTRFS (maximum filesystem size 16EB, maximum file size 16EB)

    The default file system settings should apply fine.

    NTP Deamon

    It’s a good best practice to install NTP time server demon on database servers. Use one of the following system commands.

    #Red Hat
    yum install ntp
    #Debian
    sudo apt-get install ntp

    Conclusion

    This is all for part one. In the next article, we will continue with MySQL variables operating systems settings and useful queries to gather database performance status. 

    by Bart Oles at January 03, 2020 07:24 PM

    January 02, 2020

    SeveralNines

    Full MariaDB Encryption At-Rest and In-Transit for Maximum Data Protection - Part Two

    In the first part of this series, we have covered in-transit encryption configuration for MariaDB replication servers, where we configured client-server and replication encryptions. Taken from the first post, where we had partially configured our full encryption (as indicated by the green arrows on the left in the diagram) and in this blog post, we are going to complete the encryption setup with at-rest encryption to create a fully encrypted MariaDB replication setup.

    The following diagram illustrates our current setup and the final setup that we are going to achieve:

    At-Rest Encryption

    At-rest encryption means the data-at-rest like data files and logs are encrypted on the disk, makes it almost impossible for someone to access or steal a hard disk and get access to the original data (provided that the key is secured and not stored locally). Data-at-Rest Encryption, also known as Transparent Data Encryption (TDE), is supported in MariaDB 10.1 and later. Note that using encryption has an overhead of roughly 5-10%, depending on the workload and cluster type.

    For MariaDB, the following MariaDB components can be encrypted at-rest:

    • InnoDB data file (shared tablespace or individual tablespace, e.g, *.ibd and ibdata1)
    • Aria data and index files.
    • Undo/redo logs (InnoDB log files, e.g, ib_logfile0 and ib_logfile1).
    • Binary/relay logs.
    • Temporary files and tables.

    The following files can not be encrypted at the moment:

    • Metadata file (for example .frm files).
    • File-based general log/slow query log. Table-based general log/slow query log can be encrypted.
    • Error log.

    MariaDB's data-at-rest encryption requires the use of a key management and encryption plugins. In this blog post, we are going to use File Key Management Encryption Plugin, which is provided by default since MariaDB 10.1.3. Note that there are a number of drawbacks using this plugin, e.g, the key can still be read by root and MySQL user, as explained in the MariaDB Data-at-Rest Encryption page.

    Generating Key File

    Let's create a dedicated directory to store our at-rest encryption stuff:

    $ mkdir -p /etc/mysql/rest
    $ cd /etc/mysql/rest

    Create a keyfile. This is the core of encryption:

    $ openssl rand -hex 32 > /etc/mysql/rest/keyfile

    Append a string "1;" as the key identifier into the keyfile:

    $ echo '1;' 
    sed -i '1s/^/1;/' /etc/mysql/rest/keyfile

    Thus, when reading the keyfile, it should look something like this:

    $ cat /etc/mysql/rest/keyfile
    1;4eb5770dcfa691bc634cbcd3c6bed9ed4ccd0111f3d3b1dae2c51a90fbf16ed7

    The above simply means for key identifier 1, the key is 4eb... The key file needs to contain two pieces of information for each encryption key. First, each encryption key needs to be identified with a 32-bit integer as the key identifier. Second, the encryption key itself needs to be provided in hex-encoded form. These two pieces of information need to be separated by a semicolon.

    Create a password to encrypt the above key. Here we are going to store the password inside a file called "keyfile.passwd":

    $ echo -n 'mySuperStrongPassword' > /etc/mysql/rest/keyfile.passwd

    You could skip the above step if you would like to specify the password directly in the configuration file using file_key_management_filekey option. For example: file_key_management_filekey=mySuperStrongPassword

    But in this example, we are going to read the password that is stored in a file, thus we have to define the following line in the configuration file later on: 

    file_key_management_filekey=FILE:/etc/mysql/encryption/keyfile.passwd

    We are going to encrypt the clear text keyfile into another file called keyfile.enc, using password inside the password file:

    $  openssl enc -aes-256-cbc -md sha1 -pass file:/etc/mysql/rest/keyfile.passwd -in /etc/mysql/rest/keyfile -out /etc/mysql/rest/keyfile.enc

    When listing out the directory, we should see these 3 files:

    $ ls -1 /etc/mysql/rest/
    keyfile
    keyfile.enc
    keyfile.passwd

    The content of the keyfile.enc is simply an encrypted version of keyfile:

    To test out, we can decrypt the encrypted file using OpenSSL by providing the password file (keyfile.passwd):

    $ openssl aes-256-cbc -d -md sha1 -pass file:/etc/mysql/rest/keyfile.passwd -in /etc/mysql/rest/keyfile.enc
    1;4eb5770dcfa691bc634cbcd3c6bed9ed4ccd0111f3d3b1dae2c51a90fbf16ed7

    We can then remove the plain key because we are going to use the encrypted one (.enc) together with the password file:

    $ rm -f /etc/mysql/encryption/keyfile

    We can now proceed to configure MariaDB at-rest encryption.

    Configuring At-Rest Encryption

    We have to move the encrypted key file and password to the slaves to be used by MariaDB to encrypt/decrypt the data. Otherwise, an encrypted table being backed up from the master using physical backup like MariaDB Backup would be having a problem to read by the slaves (due to different key/password combination). Logical backup like mysqldump should work with different keys and passwords.

    On the slaves, create a directory to store at-rest encryption stuff:

    (slave1)$ mkdir -p /etc/mysql/rest
    (slave2)$ mkdir -p /etc/mysql/rest

    On the master, copy the encrypted keyfile and password file to the other slaves:

    (master)$ cd /etc/mysql/rest
    (master)$ scp keyfile.enc keyfile.passwd root@slave1:/etc/mysql/rest/
    (master)$ scp keyfile.enc keyfile.passwd root@slave2:/etc/mysql/rest/

    Protect the files from global access and assign "mysql" user as the ownership:

    $ chown mysql:mysql /etc/mysql/rest/*
    $ chmod 600 /etc/mysql/rest/*

    Add the following into MariaDB configuration file under [mysqld] or [mariadb] section:

    # at-rest encryption
    plugin_load_add              = file_key_management
    file_key_management_filename = /etc/mysql/rest/keyfile.enc
    file_key_management_filekey  = FILE:/etc/mysql/rest/keyfile.passwd
    file_key_management_encryption_algorithm = AES_CBC
    
    innodb_encrypt_tables            = ON
    innodb_encrypt_temporary_tables  = ON
    innodb_encrypt_log               = ON
    innodb_encryption_threads        = 4
    innodb_encryption_rotate_key_age = 1
    encrypt-tmp-disk-tables          = 1
    encrypt-tmp-files                = 1
    encrypt-binlog                   = 1
    aria_encrypt_tables              = ON

    Take note on the file_key_management_filekey variable, if the password is in a file, you have to prefix the path with "FILE:". Alternatively, you could also specify the password string directly (not recommended due to its verbosity): 

    file_key_management_filekey=mySuperStrongPassword

    Restart MariaDB server one node at a time, starting with the slaves:

    (slave1)$ systemctl restart mariadb
    (slave2)$ systemctl restart mariadb
    (master)$ systemctl restart mariadb

    Observe the error log and make sure MariaDB encryption is activated during start up:

    $ tail -f /var/log/mysql/mysqld.log
    ...
    2019-12-17  6:44:47 0 [Note] InnoDB: Encrypting redo log: 2*67108864 bytes; LSN=143311
    2019-12-17  6:44:48 0 [Note] InnoDB: Starting to delete and rewrite log files.
    2019-12-17  6:44:48 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 67108864 bytes
    2019-12-17  6:44:48 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 67108864 bytes
    2019-12-17  6:44:48 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
    2019-12-17  6:44:48 0 [Note] InnoDB: New log files created, LSN=143311
    2019-12-17  6:44:48 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating shared tablespace for temporary tables
    2019-12-17  6:44:48 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
    2019-12-17  6:44:48 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
    2019-12-17  6:44:48 0 [Note] InnoDB: Waiting for purge to start
    2019-12-17  6:44:48 0 [Note] InnoDB: 10.4.11 started; log sequence number 143311; transaction id 222
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating #1 encryption thread id 139790011840256 total threads 4.
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating #2 encryption thread id 139790003447552 total threads 4.
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating #3 encryption thread id 139789995054848 total threads 4.
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating #4 encryption thread id 139789709866752 total threads 4.
    2019-12-17  6:44:48 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
    2019-12-17  6:44:48 0 [Note] Plugin 'FEEDBACK' is disabled.
    2019-12-17  6:44:48 0 [Note] Using encryption key id 1 for temporary files
    ...

    You should see lines indicating encryption initialization in the error log. At this point, the majority of the encryption configuration is now complete.

    Testing Your Encryption

    Create a test database to test on the master:

    (master)MariaDB> CREATE SCHEMA sbtest;
    (master)MariaDB> USE sbtest;

    Create a standard table without encryption and insert a row:

    MariaDB> CREATE TABLE tbl_plain (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255));
    MariaDB> INSERT INTO tbl_plain SET data = 'test data';

    We can see the stored data in clear text when browsing the InnoDB data file using a hexdump tool:

    $ xxd /var/lib/mysql/sbtest/tbl_plain.ibd | less
    000c060: 0200 1c69 6e66 696d 756d 0002 000b 0000  ...infimum......
    000c070: 7375 7072 656d 756d 0900 0000 10ff f180  supremum........
    000c080: 0000 0100 0000 0000 0080 0000 0000 0000  ................
    000c090: 7465 7374 2064 6174 6100 0000 0000 0000  test data.......
    000c0a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

    Create an encrypted table and insert a row:

    MariaDB> CREATE TABLE tbl_enc (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255)) ENCRYPTED=YES;
    MariaDB> INSERT INTO tbl_enc SET data = 'test data';

    We can't tell what is stored in InnoDB data file for encrypted tables:

    $ xxd /var/lib/mysql/sbtest/tbl_enc.ibd | less
    000c060: 0c2c 93e4 652e 9736 e68a 8b69 39cb 6157  .,..e..6...i9.aW
    000c070: 3cd1 581c 7eb9 84ca d792 7338 521f 0639  <.X.~.....s8R..9
    000c080: d279 9eb3 d3f5 f9b0 eccb ed05 de16 f3ac  .y..............
    000c090: 6d58 5519 f776 8577 03a4 fa88 c507 1b31  mXU..v.w.......1
    000c0a0: a06f 086f 28d9 ac17 8923 9412 d8a5 1215  .o.o(....#......

    Note that the metadata file tbl_enc.frm is not encrypted at-rest. Only the InnoDB data file (.ibd) is encrypted.

    When comparing the "plain" binary or relay logs, we can clearly see the content of it using hexdump tool:

    $ xxd binlog.000002 | less
    0000560: 0800 0800 0800 0b04 726f 6f74 096c 6f63  ........root.loc
    0000570: 616c 686f 7374 0047 5241 4e54 2052 454c  alhost.GRANT REL
    0000580: 4f41 442c 4c4f 434b 2054 4142 4c45 532c  OAD,LOCK TABLES,
    0000590: 5245 504c 4943 4154 494f 4e20 434c 4945  REPLICATION CLIE
    00005a0: 4e54 2c45 5645 4e54 2c43 5245 4154 4520  NT,EVENT,CREATE
    00005b0: 5441 424c 4553 5041 4345 2c50 524f 4345  TABLESPACE,PROCE
    00005c0: 5353 2c43 5245 4154 452c 494e 5345 5254  SS,CREATE,INSERT
    00005d0: 2c53 454c 4543 542c 5355 5045 522c 5348  ,SELECT,SUPER,SH
    00005e0: 4f57 2056 4945 5720 4f4e 202a 2e2a 2054  OW VIEW ON *.* T

    While for an encrypted binary log, the content looks gibberish:

    $ xxd binlog.000004 | less
    0000280: 4a1d 1ced 2f1b db50 016a e1e9 1351 84ba  J.../..P.j...Q..
    0000290: 38b6 72e7 8743 7713 afc3 eecb c36c 1b19  8.r..Cw......l..
    00002a0: 7b3f 6176 208f 0000 00dc 85bf 6768 e7c6  {?av .......gh..
    00002b0: 6107 5bea 241c db12 d50c 3573 48e5 3c3d  a.[.$.....5sH.<=
    00002c0: 3179 1653 2449 d408 1113 3e25 d165 c95b  1y.S$I....>%.e.[
    00002d0: afb0 6778 4b26 f672 1bc7 567e da96 13f5  ..gxK&.r..V~....
    00002e0: 2ac5 b026 3fb9 4b7a 3ef4 ab47 6c9f a686  *..&?.Kz>..Gl...

    Encrypting Aria Tables

    For Aria storage engine, it does not support the ENCRYPTED option in CREATE/ALTER statement since it follows the aria_encrypt_tables global option. Therefore, when creating an Aria table, simply create the table with ENGINE=Aria option:

    MariaDB> CREATE TABLE tbl_aria_enc (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255)) ENGINE=Aria;
    MariaDB> INSERT INTO tbl_aria_enc(data) VALUES ('test data');
    MariaDB> FLUSH TABLE tbl_aria_enc;

    We can then verify the content of the table's data file (tbl_aria_enc.MAD) or index file (tbl_aria_enc.MAI) with hexdump tool. To encrypt an existing Aria table, the table needs to be re-built:

    MariaDB> ALTER TABLE db.aria_table ENGINE=Aria ROW_FORMAT=PAGE;

    This statement causes Aria to rebuild the table using the ROW_FORMAT table option. In the process, with the new default setting, it encrypts the table when it writes to disk.

    Encrypting General Log/Slow Query Log

    To encrypt general and slow query logs, we can set MariaDB log_output option to 'TABLE' instead of the default 'FILE':

    MariaDB> SET GLOBAL log_ouput = 'TABLE';

    However, MariaDB will by default create the necessary tables using CSV storage engine, which is not encrypted by MariaDB. No engines other than CSV, MyISAM or Aria are legal for the log tables. The trick is to rebuild the default CSV table with Aria storage engine, provided that aria_encrypt_tables option is set to ON. However, the respective log option must be turned off for the table alteration to succeed.

    Thus, the steps to encrypt general log table is:

    MariaDB> SET GLOBAL general_log = OFF;
    MariaDB> ALTER TABLE mysql.general_log ENGINE=Aria;
    MariaDB> SET GLOBAL general_log = ON;

    Similarly, for slow query log:

    MariaDB> SET GLOBAL slow_query_log = OFF;
    MariaDB> ALTER TABLE mysql.slow_log ENGINE=Aria;
    MariaDB> SET GLOBAL slow_query_log = ON;

    Verify the output of general logs within the server:

    MariaDB> SELECT * FROM mysql.general_log;
    +----------------------------+---------------------------+-----------+-----------+--------------+------------------------------+
    | event_time                 | user_host                 | thread_id | server_id | command_type | argument                     |
    +----------------------------+---------------------------+-----------+-----------+--------------+------------------------------+
    | 2019-12-17 07:45:53.109558 | root[root] @ localhost [] |        19 |     28001 |        Query | select * from sbtest.tbl_enc |
    | 2019-12-17 07:45:55.504710 | root[root] @ localhost [] |        20 |     28001 |        Query | select * from general_log    |
    +----------------------------+---------------------------+-----------+-----------+--------------+------------------------------+

    As well as the encrypted content of the Aria data file inside data directory using hexdump tool:

    $ xxd /var/lib/mysql/mysql/general_log.MAD | less
    0002040: 1d45 820d 7c53 216c 3fc6 98a6 356e 1b9e  .E..|S!l?...5n..
    0002050: 6bfc e193 7509 1fa7 31e2 e22a 8f06 3c6f  k...u...1..*..<o
    0002060: ae71 bb63 e81b 0b08 7120 0c99 9f82 7c33  .q.c....q ....|3
    0002070: 1117 bc02 30c1 d9a7 c732 c75f 32a6 e238  ....0....2._2..8
    0002080: d1c8 5d6f 9a08 455a 8363 b4f4 5176 f8a1  ..]o..EZ.c..Qv..
    0002090: 1bf8 113c 9762 3504 737e 917b f260 f88c  ...<.b5.s~.{.`..
    00020a0: 368e 336f 9055 f645 b636 c5c1 debe fbe7  6.3o.U.E.6......
    00020b0: d01e 028f 8b75 b368 0ef0 8889 bb63 e032  .....u.h.....c.2

    MariaDB at-rest encryption is now complete. Combine this with in-transit encryption we have done in the first post, our final architecture is now looking like this:

    Conclusion

    It's now possible to totally secure your MariaDB databases via encryption for protection against physical and virtual breach or theft. ClusterControl can help you maintain this type of security as well and you can download it for free here.

     

    by ashraf at January 02, 2020 10:45 AM

    January 01, 2020

    SeveralNines

    Full MariaDB Encryption At-Rest and In-Transit for Maximum Data Protection - Part One

    In this blog series, we are going to give you a complete walkthrough on how to configure a fully encrypted MariaDB server for at-rest and in-transit encryption, to ensure maximum protection of the data from being stolen physically or while transferring and communicating with other hosts. The basic idea is we are going to turn our "plain" deployment into a fully encrypted MariaDB replication, as simplified in the following diagram:

    We are going to configure a number of encryption components:

    • In-transit encryption, which consists of:
      • Client-server encryption
      • Replication encryption
    • At-rest encryption, which consists of:
      • Data file encryption
      • Binary/relay log encryption.

    Note that this blog post only covers in-transit encryption. We are going to cover at-rest encryption in the second part of this blog series.

    This deployment walkthrough assumed that we already have an already running MariaDB replication server. If you don't have one, you can use ClusterControl to deploy a new MariaDB replication within minutes, with fewer than 5 clicks. All servers are running on MariaDB 10.4.11 on CentOS 7 system.

    In-Transit Encryption

    Data can be exposed to risks both in transit and at rest and requires protection in both states. In-transit encryption protects your data if communications are intercepted while data moves between hosts through network, either from your site and the cloud provider, between services or between clients and the server.

    For MySQL/MariaDB, data is in motion when a client connects to a database server, or when a slave node replicates data from a master node. MariaDB supports encrypted connections between clients and the server using the TLS (Transport Layer Security) protocol. TLS is sometimes referred to as SSL (Secure Sockets Layer) but MariaDB does not actually use the SSL protocol for encrypted connections because its encryption is weak. More details on this at MariaDB documentation page.

    Client-Server Encryption

    In this setup we are going to use self-signed certificates, which means we do not use external parties like Google, Comodo or any popular Certificate Authority provider out there to verify our identity. In SSL/TLS, identity verification is the first step that must be passed before the server and client exchange their certificates and keys.

    MySQL provides a very handy tool called mysql_ssl_rsa_setup which takes care of the key and certificate generation automatically. Unfortunately, there is no such tool for MariaDB server yet. Therefore, we have to manually prepare and generate the SSL-related files for our MariaDB TLS needs.

    The following is a list of the files that we will generate using OpenSSL tool:

    • CA key - RSA private key in PEM format. Must be kept secret.
    • CA certificate - X.509 certificate in PEM format. Contains public key and certificate metadata.
    • Server CSR - Certificate signing request. The Common Name (CN) when filling the form is important, for example CN=mariadb-server
    • Server key - RSA private key. Must be kept secret.
    • Server cert - X.509 certificate signed by CA key. Contains public key and certificate metadata.
    • Client CSR - Certificate signing request. Must use a different Common Name (CN) than Server's CSR, for example CN=client1 
    • Client key - RSA private key. Must be kept secret.
    • Client cert - X.509 certificate signed by CA key. Contains public key and certificate metadata.

    First and foremost, create a directory to store our certs and keys for in-transit encryption:

    $ mkdir -p /etc/mysql/transit/
    $ cd /etc/mysql/transit/

    Just to give you an idea why we name the directory as mentioned is because in the next part of this blog series, we will create another directory for at-rest encryption at /etc/mysql/rest.

    Certificate Authority

    Generate a key file for our own Certificate Authority (CA):

    $ openssl genrsa 2048 > ca-key.pem
    Generating RSA private key, 2048 bit long modulus
    .......................+++
    ...............................................................................................................................................................................................................................................+++
    e is 65537 (0x10001)

    Generate a certificate for our own Certificate Authority (CA) based on the ca-key.pem generated before with expiration of 3650 days:

    $ openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca.pem
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [XX]:SE
    State or Province Name (full name) []:Stockholm
    Locality Name (eg, city) [Default City]:Stockholm
    Organization Name (eg, company) [Default Company Ltd]:Severalnines
    Organizational Unit Name (eg, section) []:
    Common Name (eg, your name or your server's hostname) []:CA
    Email Address []:info@severalnines.com

    Now we should have ca-key.pem and ca.pem under this working directory.

    Key and Certificate for Server

    Next, generate private key for the MariaDB server:

    $ openssl genrsa 2048 > server-key.pem
    Generating RSA private key, 2048 bit long modulus
    .............................................................................................................+++
    ..................................................................................................................+++
    e is 65537 (0x10001)

    A trusted certificate must be a certificate signed by a Certificate Authority whereby here, we are going to use our own CA because we trust the hosts in the network. Before we can create a signed certificate, we need to generate a request certificate called Certificate Signing Request (CSR).

    Create a CSR for MariaDB server. We are going to call the certificate as server-req.pem. This is not the certificate that we are going to use for MariaDB server. The final certificate is the one that will be signed by our own CA private key (as shown in the next step):

    $ openssl req -new -key server-key.pem -out server-cert.pem
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [XX]:SE
    State or Province Name (full name) []:Stockholm
    Locality Name (eg, city) [Default City]:Stockholm
    Organization Name (eg, company) [Default Company Ltd]:Severalnines
    Organizational Unit Name (eg, section) []:
    Common Name (eg, your name or your server's hostname) []:MariaDBServer
    Email Address []:info@severalnines.com
    
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:

    Take note on the Common Name where we specified "MariaDBServer". This can be any name but the value must not be the same as the client certificate. Commonly, if the applications connect to the MariaDB server via FQDN or hostname (skip-name-resolve=OFF), you probably want to specify the MariaDB server's FQDN as the Common Name. Doing so allows you to connect with 

    We can then generate the final X.509 certificate (server-cert.pem) and sign the CSR (server-req.pem) with CA's certificate (ca.pem) and CA's private key (ca-key.pem):

    $ openssl x509 -req -in server-req.pem -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out server-cert.pem -days 3650 -sha256
    Signature ok
    subject=/C=SE/ST=Stockholm/L=Stockholm/O=Severalnines/CN=MariaDBServer/emailAddress=info@severalnines.com
    Getting CA Private Key

    At this point, this is what we have now:

    $ ls -1 /etc/mysql/transite
    ca-key.pem
    ca.pem
    server-cert.pem
    server-key.pem
    server-req.pem

    We only need the signed certificate (server-cert.pem) and the private key (server-key.pem) for the MariaDB server. The CSR (server-req.pem) is no longer required.

    Key and Certificate for the Client

    Next, we need to generate key and certificate files for the MariaDB client. The MariaDB server will only accept remote connection from the client who has these certificate files. 

    Start by generating a 2048-bit key for the client:

    $ openssl genrsa 2048 > client-key.pem
    Generating RSA private key, 2048 bit long modulus
    .............................................................................................................+++
    ..................................................................................................................+++
    e is 65537 (0x10001)

    Create CSR for the client called client-req.pem:

    $ openssl req -new -key client-key.pem -out client-req.pem
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [XX]:SE
    State or Province Name (full name) []:Stockholm
    Locality Name (eg, city) [Default City]:Stockholm
    Organization Name (eg, company) [Default Company Ltd]:Severalnines
    Organizational Unit Name (eg, section) []:
    Common Name (eg, your name or your server's hostname) []:Client1
    Email Address []:info@severalnines.com
    
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:

    Pay attention to the Common Name where we specify "Client1". Specify any name that represents the client. This value must be different from the server's Common Name. For advanced usage, you can use this Common Name to allow certain user with certificate matching this value, for example:

    MariaDB> GRANT SELECT ON schema1.* TO 'client1'@'192.168.0.93' IDENTIFIED BY 's' REQUIRE SUBJECT '/CN=Client2';

    We can then generate the final X.509 certificate (client-cert.pem) and sign the CSR (client-req.pem) with CA's certificate (ca.pem) and CA's private key (ca-key.pem):

    $ openssl x509 -req -in client-req.pem -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out client-cert.pem -days 3650 -sha256
    Signature ok
    subject=/C=SE/ST=Stockholm/L=Stockholm/O=Severalnines/CN=Client1/emailAddress=info@severalnines.com
    Getting CA Private Key

    All certificates that we need for in-transit encryption setup are generated. Verify both certificates are correctly signed by the CA:

    $ openssl verify -CAfile ca.pem server-cert.pem client-cert.pem
    server-cert.pem: OK
    client-cert.pem: OK

    Configuring SSL for MariaDB

    Create a new directory on the every slave:

    (slave1)$ mkdir -p /etc/mysql/transit/
    (slave2)$ mkdir -p /etc/mysql/transit/

    Copy the encryption files to all slaves:

    $ scp -r /etc/mysql/transit/* root@slave1:/etc/mysql/transit/
    $ scp -r /etc/mysql/transit/* root@slave2:/etc/mysql/transit/

    Make sure the owner of the certs directory to the "mysql" user and change the permissions of all key files so it won't be readable globally:

    $ cd /etc/mysql/transit
    $ chown -R mysql:mysql *
    $ chmod 600 client-key.pem server-key.pem ca-key.pem

    Here is what you should see when listing out files under "transit" directory:

    $ ls -al /etc/mysql/transit
    total 32
    drwxr-xr-x. 2 root  root 172 Dec 14 04:42 .
    drwxr-xr-x. 3 root  root 24 Dec 14 04:18 ..
    -rw-------. 1 mysql mysql 1675 Dec 14 04:19 ca-key.pem
    -rw-r--r--. 1 mysql mysql 1383 Dec 14 04:22 ca.pem
    -rw-r--r--. 1 mysql mysql 1383 Dec 14 04:42 client-cert.pem
    -rw-------. 1 mysql mysql 1675 Dec 14 04:42 client-key.pem
    -rw-r--r--. 1 mysql mysql 1399 Dec 14 04:42 client-req.pem
    -rw-r--r--. 1 mysql mysql 1391 Dec 14 04:34 server-cert.pem
    -rw-------. 1 mysql mysql 1679 Dec 14 04:28 server-key.pem
    -rw-r--r--. 1 mysql mysql 1415 Dec 14 04:31 server-req.pem

    Next, we will enable the SSL connection for MariaDB. On every MariaDB host (master and slaves) edit the configuration file and add the following lines under [mysqld] section:

    ssl-ca=/etc/mysql/transit/ca.pem
    ssl-cert=/etc/mysql/transit/server-cert.pem
    ssl-key=/etc/mysql/transit/server-key.pem

    Restart MariaDB server one node at a time, starting from slaves and finally on the master:

    (slave1)$ systemctl restart mariadb
    (slave2)$ systemctl restart mariadb
    (master)$ systemctl restart mariadb

    After restarted, MariaDB is now capable of accepting plain connections by connecting to it without any SSL-related parameters or with encrypted connections, when you specify SSL-related parameter in the connection string.

    For ClusterControl users, you can enable client-server encryption a matter of clicks. Just go to ClusterControl -> Security -> SSL Encryption -> Enable -> Create Certificate -> Certificate Expiration -> Enable SSL:

    ClusterControl will generate the required keys, X.509 certificate and CA certificate and set up SSL encryption for client-server connections for all the nodes in the cluster. For MySQL/MariaDB replication, the SSL files will be located under /etc/ssl/replication/cluster_X, where X is the cluster ID on every database node. The same certificates will be used on all nodes and the existing ones might be overwritten. The nodes must be restarted individually after this job completes. We recommend that you first restart a replication slave and verify that the SSL settings work.

    To restart every node, go to ClusterControl -> Nodes -> Node Actions -> Restart Node. Do restart one node at a time, starting with the slaves. The last node should be the master node with force stop flag enabled:

    You can tell if a node is able to handle client-server encryption by looking at the green lock icon right next to the database node in the Overview grid:

    At this point, our cluster is now ready to accept SSL connection from MySQL users.

    Connecting via Encrypted Connection

    The MariaDB client requires all client-related SSL files that we have generated inside the server. Copy the generated client certificate, CA certificate and client key to the client host:

    $ cd /etc/mysql/transit
    $ scp client-cert.pem client-key.pem ca.pem root@client-host:~

    **ClusterControl generates the client SSL files under /etc/ssl/replication/cluster_X/on every database node, where X is the cluster ID.

    Create a database user that requires SSL on the master:

    MariaDB> CREATE SCHEMA sbtest;
    MariaDB> CREATE USER sbtest@'%' IDENTIFIED BY 'mysecr3t' REQUIRE SSL;
    MariaDB> GRANT ALL PRIVILEGES ON sbtest.* to sbtest@'%';

    From the client host, connect to the MariaDB server with SSL-related parameters. We can verify the connection status by using "STATUS" statement:

    (client)$ mysql -usbtest -p -h192.168.0.91 -P3306 --ssl-cert client-cert.pem --ssl-key client-key.pem --ssl-ca ca.pem -e 'status'
    ...
    Current user: sbtest@192.168.0.19
    SSL: Cipher in use is DHE-RSA-AES256-GCM-SHA384
    ...

    Pay attention to the SSL line where the cipher is used for the encryption. This means the client is successfully connected to the MariaDB server via encrypted connection. 

    At this point, we have encrypted the client-server connection to the MariaDB server, as represented by the green two-headed arrow in the following diagram:

    In the next part, we are going to encrypt replication connections between nodes.

    Replication Encryption

    Setting up encrypted connections for replication is similar to doing so for client/server connections. We can use the same client certificates, key and CA certificate to let the replication user access the master's server via encryption channel. This will indirectly enable encryption between nodes when slave IO thread pulls replication events from the master. 

    Let's configure this on one slave at a time. For the first slave, 192.168.0.92, add the following line under [client] section inside MariaDB configuration file:

    [client]
    ssl-ca=/etc/mysql/transit/ca.pem
    ssl-cert=/etc/mysql/transit/client-cert.pem
    ssl-key=/etc/mysql/transit/client-key.pem

    Stop the replication thread on the slave:

    (slave)MariaDB> STOP SLAVE;

    On the master, alter the existing replication user to force it to connect using SSL:

    (master)MariaDB> ALTER USER rpl_user@192.168.0.92 REQUIRE SSL;

    On the slave, test the connectivity to the master, 192.168.0.91 via mysql command line with --ssl flag:

    (slave)MariaDB> mysql -urpl_user -p -h192.168.0.91 -P 3306 --ssl -e 'status'
    ...
    Current user: rpl_user@192.168.0.92
    SSL: Cipher in use is DHE-RSA-AES256-GCM-SHA384
    ...

    Make sure you can get connected to the master host without error. Then, on the slave, specify the CHANGE MASTER statement with SSL parameters as below:

    (slave)MariaDB> CHANGE MASTER TO MASTER_SSL = 1, MASTER_SSL_CA = '/etc/mysql/transit/ca.pem', MASTER_SSL_CERT = '/etc/mysql/transit/client-cert.pem', MASTER_SSL_KEY = '/etc/mysql/transit/client-key.pem';

    Start the replication slave:

    (slave)MariaDB> START SLAVE;

    Verify that the replication is running okay with related SSL parameters:

    MariaDB> SHOW SLAVE STATUS\G
    ...
                  Slave_IO_Running: Yes
                 Slave_SQL_Running: Yes
                Master_SSL_Allowed: Yes
                Master_SSL_CA_File: /etc/mysql/transit/ca.pem
                   Master_SSL_Cert: /etc/mysql/transit/client-cert.pem
                    Master_SSL_Key: /etc/mysql/transit/client-key.pem
    ...

    The slave is now replicating from the master securely via TLS encryption.

    Repeat all of the above steps on the remaining slave, 192.168.0.93. The only difference is the alter user statement to be executed on the master where we have to change to its respective host:

    (master)MariaDB> ALTER USER rpl_user@192.168.0.93 REQUIRE SSL;

    At this point we have completed in-transit encryption as illustrated by the green lines from master to slaves in the following diagram:

    You can verify the encryption connection by looking at the tcpdump output for interface eth1 on the slave. The following is an example of standard replication without encryption:

    (plain-slave)$ tcpdump -i eth1 -s 0 -l -w - 'src port 3306 or dst port 3306' | strings
    tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
    H"-'
    binlog.000008Ulw
    binlog.000008Ulw
    sbtest
    sbtest
    create table t1 (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255))
    binlog.000008
    sbtest
    BEGIN3
    sbtest
    test data3
    Ok*Z
    binlog.000008*Z
    
    ^C11 packets captured
    11 packets received by filter
    0 packets dropped by kernel

    We can clearly see the text as read by the slave from the master. While on an encrypted connection, you should see gibberish characters like below:

    (encrypted-slave)$ tcpdump -i eth1 -s 0 -l -w - 'src port 3306 or dst port 3306' | strings
    tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
    :|f^yb#
    O5~_
    @#PFh
    k)]O
    jtk3c
    @NjN9_a
    !\-@
    NrF
    ?7&Y
    
    ^C6 packets captured
    6 packets received by filter
    0 packets dropped by kernel

    Conclusion

    In the next part of this blog series we are going to look into completing our fully encrypted setup with MariaDB at-rest encryption. Stay tuned!

    by ashraf at January 01, 2020 10:45 AM

    December 31, 2019

    SeveralNines

    An Overview of Multi-Document ACID Transactions in MongoDB and How to Use Them

    Database systems have a mandate to guarantee data consistency and integrity especially when critical data is involved. These aspects are enforced through ACID transactions in MongoDB. An ACID transaction should meet some defined rules for data validity before making any updates to the database otherwise it should be aborted and no changes shall be made to the database. All database transactions are considered as a single logical operation and during the execution time the database is put in an inconsistent state until the changes have been committed. Operations that successfully change the state of the database are termed as write transactions whereas those that do not update the database but only retrieve data are referred to as read-only transactions. ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. 

    A database is a shared resource that can be accessed by different users at different or at the same time. For this reason, concurrent transactions may happen and if not well managed, they may result in system crashes, hardware failure, deadlock, slow database performance or repetition in the execution of the same transaction.

    What Are ACID Rules?

    All database systems must meet the ACID properties in order to guarantee data integrity.

    Atomicity

    A transaction is considered as a single unit of operation which can either succeed completely or fail completely. A transaction cannot be executed partially. If any condition consulting a transaction fails, the entire transaction will fail completely and the database will remain unchanged. For example, if you want to transfer funds from account X to Y, here there are two transactions, the first one is to remove funds from X and the second one is to record the funds in Y. If the first transaction fails, the whole transaction will be aborted

    Consistency

    When an operation is issued, before execution, the database is in a consistent state and it should remain so after every transaction. Even if there is an update, the transaction should always bring the database to a valid state, maintaining the database invariants. For instance, you cannot delete a primary key which has been referenced as a foreign key in another collection.  All data must meet the defined constraints to prevent data corruption from an illegal transaction.

    Isolation

    Multiple transactions running concurrently are executed without affecting each other and their result should be the same if they were to be executed sequentially. When two or more transactions modify the same documents in MongoDB, there may be a conflict. The database will detect a conflict immediately before it is committed. The first operation to acquire a lock on the document will continue whereas the other will fail and a conflict error message will be presented.

    Durability

    This dictates that, once the transaction has been committed, the changes should be upheld at all times even at an event of a system failure for example due to power outages or internet disconnection. 

    MongoDB ACID Transactions

    MongoDB is a document based  NoSQL database with a flexible schema. Transactions are not operations that should be executed for every write operation  since they incur a greater performance cost over a single document writes. With a document based structure and denormalized data model, there will be a minimized need for transactions. Since MongoDB allows document embedding, you don’t necessarily need to use a transaction to meet a write operation.

    MongoDB version 4.0 provides multi-document transaction support for replica set deployments only and probably the version 4.2 will extend support for sharded deployments (per their release notes). 

    Example of a transaction:

    Ensure you have a replica set in place first. Assuming you have a database called app and a collection users in the Mongo Shell run the following commands:

    $mongos and you should see something like username:PRIMARY>

    $use app
    
    $db.users.insert([{_id:1, name: ‘Brian’}, {_id:2, name: ‘Sheila’}, {_id:3, name: ‘James’}])

    We need to start a session for our transaction:

    $db.getMongo().startSession() and you should see something like 
    
    session { "id" : UUID("dcfa8de5-627d-3b1c-a890-63c9a355520c") }

    Using this session we can add more users using a transaction with the following commands 

    $session.startTransaction()
    
    session.getDatabase(‘app’).users.insert({_id:4, name:  ‘Hitler’})

    You will be presented with WriteResult({“nInsterted”: 2})

    The transaction has not yet been committed and the normal $db.users.find({}) will give us the previously saved users only. But if we run the 

    $session.getDatabase(“app”).users.find()

    the last added record will be available in the returned results. To commit this transaction, we run the command below

    $session.commitTransaction()

    The transaction modification is stored in memory that is why even after failure, the data will be available on recovery.

    Multi-Document ACID Transactions in MongoDB

    These are multi-statement operations that need to be executed sequentially without affecting each other. For the sample above we can create two transactions, one to add a user and another to update a user with a field of age. I.e.

    $session.startTransaction()
    
       db.users.insert({_id:6, name “Ibrahim”})
    
       db.users.updateOne({_id:3 , {$set:{age:50}}})
    
    session.commit_transaction()

    Transactions can be applied to operations against multiple documents contained in one or many collection/database. Any changes due to document transaction do not impact performance for workloads not related or do not require them. Until the transaction is committed, uncommitted writes are neither replicated to the secondary nodes nor are they readable outside the transactions.

    Best Practices for MongoDB Transactions

    The multi-document transactions are only supported in the WiredTiger storage engine. As mentioned before, very few applications would require transactions and if so, we should try to make them short. Otherwise, for a single ACID transaction, if you try performing an excessive number of operations, it can result in high pressure on the WiredTiger cache. The cache is always dictated to maintain state for all subsequent writes since the oldest snapshot was created. This means new writes will accumulate in the cache throughout the duration of the transaction and will be flushed only after transactions currently running on old snapshots are committed or aborted. For the best database performance on the transaction, developers should consider:

    1. Always modify a small number of documents in a transaction. Otherwise, you will need to break the transaction into different parts and process the documents in different batches. At most, process 1000 documents at a time.
    2. Temporary exceptions such as awaiting to elect primary and transient network hiccups may result in abortion of the transaction. Developers should establish a logic to retry the transaction if the defined errors are presented.
    3. Configure optimal duration for the execution of the transaction from the default 60 seconds provided by MongoDB. Besides, employ indexing so that it can allow fast data access within the transaction.  You also have the flexibility to fine-tune the transaction in addressing timeouts by breaking it into batches that allow its execution within the time limits.
    4. Decompose your transaction into a small set of operation so that it fits the 16MB size constraints. Otherwise, if the operation together with oplog description exceed this limit, the transaction will be aborted.
    5. All data relating to an entity should be stored in a single, rich document structure. This is to reduce the number of documents that are to be cached when different fields are going to be changed.

    Limitations of Transactions

    1. You cannot create or drop a collection inside a transaction.
    2. Transactions cannot make writes to a capped collection
    3. Transactions take plenty of time to execute and somehow they can slow the performance of the database.
    4. Transaction size is limited to 16MB requiring one to split any that tends to exceed this size into smaller transactions.
    5. Subjecting a large number of documents to a transaction may exert excessive pressure on the WiredTiger engine and since it relies on the snapshot capability, there will be a retention of large unflushed operations in memory. This renders some performance cost on the database.

    Conclusion

    MongoDB version 4.0 introduced the multi-document transaction support for replica sets as a feature of improving data integrity and consistency. However, there are very few applications that would require transactions when using MongoDB. There are limitations against this feature that make it considerably little bit immature as far as the transactions concept is concerned. For instance, transactions for a sharded cluster are not supported and they cannot be larger than 16MB size limit. Data modeling provides a better structure for reducing transactions in your database. Unless you are dealing with special cases, it will be a better practice to avoid transactions in MongoDB.

    by Onyancha Brian Henry at December 31, 2019 10:45 AM

    December 30, 2019

    SeveralNines

    Cloud Vendor Deep-Dive: PostgreSQL on DigitalOcean

    DigitalOcean is a cloud service provider, more of an IaaS (Infrastructure-as-a-Service) provider which is more suitable for small to medium scale businesses. You can get to know more about DigitalOcean here. What it does is a bit different to other cloud vendors like AWS or Azure and is not heavily global yet, take a look at this video which compares DigitalOcean with AWS. 

    They provide a geographically distributed computing platform in the form of virtual machines where-in businesses can deploy their applications on cloud infrastructure in an easy, fast and flexible manner. Their core focus is to provide cloud environments which are highly flexible, easy-to-set-up and can scale for various types of workloads. 

    What attracted me in DigitalOcean is the “droplets” service. Droplets are Linux based VMs which can be created as a standalone or can be part of a large cloud infrastructure with a chosen Linux flavoured operating systems like CentOS, Ubuntu, etc. 

    PostgreSQL on DigitalOcean

    With DigitalOcean, building PostgreSQL environments can be done in two ways, one way is to build manually from scratch using droplets (only Linux based VMs) or the other way is to use managed services.

    DigitalOcean started managed services for PostgreSQL with an intention to speed up the provisioning of database servers in the form of VMs on a large cloud infrastructure. Otherwise, the only way is to build PostgreSQL environments is manually by using droplets. The supported capabilities with managed services are high-availability, automatic failover, logging, and monitoring. Alerting capability does not exist yet. 

    The managed services more-or-less are similar to AWS RDS. The PostgreSQL instances can be only accessed using UI, there is no access to host running the database instance. Managing, Monitoring, parameter configuration, everything must be done from a UI.

    PostgreSQL Compatibility with DigitalOcean

    You can build PostgreSQL environments on Digital Ocean with the droplets or go for managed services (similar to AWS RDS) which can really save your time. The only supported versions on managed services are 10 and 11. This means, businesses willing to leverage DigitalOcean’s PostgreSQL managed services will need to use/upgrade-to either version 10 or 11. Also, note that there is no support for Windows operating system. 

    This blog will focus on managed services.

    Managed PostgreSQL Services

    DigitalOcean started providing managed PostgreSQL database services since February 2019. The intention was to introduce a faster way to provisioning infrastructure with PostgreSQL instances which can save valuable time for infrastructure database professionals. Provisioning a PostgreSQL instance is rather simple.

    This can be done by logging to the DO account → go to a create database cluster page → choose the PostgreSQL version → choose the specs based on pricing → choose the location → click create. You are all good. Watch this video here for a better understanding.

    High Availability

    High Availability is one of the critical requirements for databases to ensure business continuity. It is imperative to ensure that high-availability meets the SLAs defined for RTO and RPO. DigitalOcean provides high-availability services in a faster and reliable manner.

    Pricing

    The pricing model in DigitalOcean is not complex. The price of the instance is directly proportional to the capacity and architecture of the instance. Below is an example of pricing for a standalone instance -

    The capacity and pricing which suites the requirement can be chosen from the available options. Minimum is $15 per month for 10GB of disk and 1vCPU. If high-availability is a requirement, standby node can be configured as well. The limitation is that, a standby node can be added only if the primary database size is of minimum 25 GB. And, only a maximum of 5 standby nodes can be added. Below are the standby options available

    If you can observe above, standby pricing is pretty simple and does not depend on the capacity. Adding one standby node will cost $20 irrespective of any size.

    Access

    PostgreSQL instances build using managed services can be accessed using GUIs and remotely via CLI in SSL mode only. However, PostgreSQL instances manually installed on droplets can be accessed via ssh.

    Data Centres

    DigitalOcean is not heavily global yet. The data centres are located in a few countries as shown below. Which means, it is not possible to deploy/run services for businesses running their services in countries other than the ones shown below.

    Advantages of PostgreSQL Managed Services

    Managed services for PostgreSQL is advantageous for various reasons. In my experience as a DBA, the requirement often arises to build environments for developers in a faster manner possible to perform functional, regression, and performance testing for releases. Generally, the approach would be to use tools like chef or puppet to build automation modules for applications and database environments and then use those templates to build cloud VMs. DigitalOcean’s managed services can be a great, efficient, and cost-effective option for such requirements as it is bound to be time saving. Let us take a look at the advantageous in detail -

    • Opting for managed services can save a lot of time for DBAs and Developers in building PostgreSQL environments from scratch. This means, there is no database administration and maintenance overhead.
    • PostgreSQL environments can be equipped with High-availability with automatic failover capability. 
    • Managed instances are designed to sustain disaster. Daily backups can be configured with the PITR (point-in-time-recovery) capability. Importantly, backups are free.
    • Managed PostgreSQL instances are designed to be highly scalable. DigitalOcean’s customers were able to achieve higher scalability with PostgreSQL instances and TimescaleDB extensions.
    • Dashboard can be configured to monitor log files and query performance.
    • Cost model of DigitalOcean is pretty simple.
    • As it is a cloud infrastructure, vertical scaling can be seamless.
    • Managed database instances are highly secured and optimized. A big part of the data retrieval is only possible via SSL based connections.
    • Documentation is available in good detail.

    Limitations of Running PostgreSQL on DigitalOcean

    • PostgreSQL versions 10 and 11 are supported, no other versions can be used.
    • Data centres of DigitalOcean are only available at limited geographical locations.
    • The number of standby nodes cannot exceed 5.
    • PITR cannot go beyond 7 days.
    • Not all extensions for PostgreSQL are supported, only selected extensions can be used.
    • The instances can only be up-sized. They cannot be downsized.
    • Superuser access is not allowed.
    • Alerting on certain thresholds is not available yet.
    • Managed database instances can only be restored to a new node when restoring from backups.

    Conclusion

    Managed PostgreSQL services offered by DigitalOcean is a great option for businesses looking for devops type solutions for PostgreSQL environments which can really help reduce time, planning, administration, and maintenance overhead involved in building high-scale and secured PostgreSQL environments for various workloads. Their pricing model is very simple and it can be a cost-effective option. It cannot, however, really be compared to the massive cloud service providers like AWS or Azure. DigitalOcean can surely benefit businesses with its innovative cloud solutions.

    by Venkata Nagothi at December 30, 2019 10:45 AM

    December 24, 2019

    Oli Sennhauser

    FromDual Performance Monitor for MariaDB and MySQL 1.1.0 has been released

    FromDual has the pleasure to announce the release of the new version 1.1.0 of its popular Database Performance Monitor for MariaDB, MySQL and Galera Cluster fpmmm.

    The FromDual Performance Monitor for MariaDB and MySQL (fpmmm) enables DBAs and System Administrators to monitor what is going on inside their MariaDB and MySQL databases and on their machines where the databases reside.

    More detailed information your can find in the fpmmm Installation Guide.

    Download

    The new FromDual Performance Monitor for MariaDB and MySQL (fpmmm) can be downloaded from here. How to install and use fpmmm is documented in the fpmmm Installation Guide.

    In case you find a bug in the FromDual Performance Monitor for MariaDB and MySQL please report it to the FromDual Bugtracker or just send us an email.

    Any feedback, statements and testimonials are welcome as well! Please send them to us.

    Monitoring as a Service (MaaS)

    You do not want to set-up your Database monitoring yourself? No problem: Choose our MariaDB and MySQL Monitoring as a Service (Maas) program to safe time and costs!

    Installation of Performance Monitor 1.1.0

    A complete guide on how to install FromDual Performance Monitor you can find in the fpmmm Installation Guide.

    Upgrade from 1.0.x to 1.1.0

    shell> cd /opt
    shell> tar xf /download/fpmmm-1.1.0.tar.gz
    shell> rm -f fpmmm
    shell> ln -s fpmmm-1.1.0 fpmmm
    

    Changes in FromDual Performance Monitor for MariaDB and MySQL 1.1.0

    This release contains various bug fixes.

    You can verify your current FromDual Performance Monitor for MariaDB and MySQL version with the following command:

    shell> fpmmm --version
    

    General

    • fpmmm is now available for Cent OS with RPM packages and for Ubuntu with DEB packages.
    • MariaDB 10.4 seems to work and thus is officially declared as supported.
    • TimeZone made configurable.
    • Error printed to STDOUT changed to STDERR.
    • Return codes made unique.
    • De-support PHP versions older than 7.0.
    • All old PHP 5.5 stuff removed, we need now at least PHP 7.0.
    • Cosmetic fixes and error handling improved.

    fpmmm agent

    • Error message typo fixed.
    • All mpm remainings removed.
    • Upload: Error exit handling improved.

    fpmmm Templates

    • InnoDB Template: Links to mysql-forum replaced by links to fromdual.com.
    • Templates: Zabbix 4.0 templates added and tpl directory restructured.

    fpmmm Modules

    • Backup: Backup hook added to templates as example.
    • InnoDB: InnoDB buffer pool flushing data and graph added.
    • InnoDB: innodb_metrics replacing mostly SHOW ENGINE INNODB STATUS.
    • InnoDB: Started replacing SHOW ENGINE INNODB STATUS by I_S.innodb_metrics with Adaptive Hash Index (AHI).
    • InnoDB: innodb_file_format removed.
    • InnoDB: InnoDB files items and graph added.
    • InnoDB: Negative values of innodb_buffer_pool_pages_misc_b fixed.
    • InnoDB: Bug report of Wang Chao about InnoDB Adaptive Hash Index (AHI) size fixed.
    • Memcached: Memcached module fixed.
    • MySQL: MariaDB thread pool items and graph added.
    • MySQL: Slow Queries item fixed and graph added.
    • Server: Smartmon monitor added to monitor HDD/SSD.
    • Server: Server module made more robust and numactl replaced by cpuinfo.
    • Server: Server free function adapted according to Linux free command.
    • Server: Function getFsStatsLinux added for global file descriptor limits.
    • Aria: Aria cleaned-up, old mariadb_* variables removed, Aria transaction log graph added.
    • Aria: Aria pagecache blocks converted to bytes.

    fpmmm agent installer

    • No changes.

    For subscriptions of commercial use of fpmmm please get in contact with us.

    by Shinguz at December 24, 2019 11:34 AM

    December 20, 2019

    SeveralNines

    Maximizing Database Query Efficiency for MySQL - Part Two

    This is the second part of a two-part series blog for Maximizing Database Query Efficiency In MySQL. You can read part one here.

    Using Single-Column, Composite, Prefix, and Covering Index

    Tables that are frequently receiving high traffic must be properly indexed. It's not only important to index your table, but you also need to determine and analyze what are the types of queries or types of retrieval that you need for the specific table. It is strongly recommended that you analyze what type of queries or retrieval of data you need on a specific table before you decide what indexes are required for the table. Let's go over these types of indexes and how you can use them to maximize your query performance.

    Single-Column Index

    InnoD table can contain a maximum of 64 secondary indexes. A single-column index (or full-column index) is an index assigned only to a particular column. Creating an index to a particular column that contains distinct values is a good candidate. A good index must have a high cardinality and statistics so the optimizer can choose the right query plan. To view the distribution of indexes, you can check with SHOW INDEXES syntax just like below:

    root[test]#> SHOW INDEXES FROM users_account\G
    
    *************************** 1. row ***************************
    
            Table: users_account
    
       Non_unique: 0
    
         Key_name: PRIMARY
    
     Seq_in_index: 1
    
      Column_name: id
    
        Collation: A
    
      Cardinality: 131232
    
         Sub_part: NULL
    
           Packed: NULL
    
             Null: 
    
       Index_type: BTREE
    
          Comment: 
    
    Index_comment: 
    
    *************************** 2. row ***************************
    
            Table: users_account
    
       Non_unique: 1
    
         Key_name: name
    
     Seq_in_index: 1
    
      Column_name: last_name
    
        Collation: A
    
      Cardinality: 8995
    
         Sub_part: NULL
    
           Packed: NULL
    
             Null: 
    
       Index_type: BTREE
    
          Comment: 
    
    Index_comment: 
    
    *************************** 3. row ***************************
    
            Table: users_account
    
       Non_unique: 1
    
         Key_name: name
    
     Seq_in_index: 2
    
      Column_name: first_name
    
        Collation: A
    
      Cardinality: 131232
    
         Sub_part: NULL
    
           Packed: NULL
    
             Null: 
    
       Index_type: BTREE
    
          Comment: 
    
    Index_comment: 
    
    3 rows in set (0.00 sec)

    You can inspect as well with tables information_schema.index_statistics or mysql.innodb_index_stats.

    Compound (Composite) or Multi-Part Indexes

    A compound index (commonly called a composite index) is a multi-part index composed of multiple columns. MySQL allows up to 16 columns bounded for a specific composite index. Exceeding the limit returns an error like below:

    ERROR 1070 (42000): Too many key parts specified; max 16 parts allowed

    A composite index provides a boost to your queries, but it requires that you must have a pure understanding on how you are retrieving the data. For example, a table with a DDL of...

    CREATE TABLE `user_account` (
    
      `id` int(11) NOT NULL AUTO_INCREMENT,
    
      `last_name` char(30) NOT NULL,
    
      `first_name` char(30) NOT NULL,
    
      `dob` date DEFAULT NULL,
    
      `zip` varchar(10) DEFAULT NULL,
    
      `city` varchar(100) DEFAULT NULL,
    
      `state` varchar(100) DEFAULT NULL,
    
      `country` varchar(50) NOT NULL,
    
      `tel` varchar(16) DEFAULT NULL
    
      PRIMARY KEY (`id`),
    
      KEY `name` (`last_name`,`first_name`)
    
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

    ...which consists of composite index `name`. The composite index improves query performance once these keys are reference as used key parts. For example, see the following:

    root[test]#> explain format=json select * from users_account where last_name='Namuag' and first_name='Maximus'\G
    
    *************************** 1. row ***************************
    
    EXPLAIN: {
    
      "query_block": {
    
        "select_id": 1,
    
        "cost_info": {
    
          "query_cost": "1.20"
    
        },
    
        "table": {
    
          "table_name": "users_account",
    
          "access_type": "ref",
    
          "possible_keys": [
    
            "name"
    
          ],
    
          "key": "name",
    
          "used_key_parts": [
    
            "last_name",
    
            "first_name"
    
          ],
    
          "key_length": "60",
    
          "ref": [
    
            "const",
    
            "const"
    
          ],
    
          "rows_examined_per_scan": 1,
    
          "rows_produced_per_join": 1,
    
          "filtered": "100.00",
    
          "cost_info": {
    
            "read_cost": "1.00",
    
            "eval_cost": "0.20",
    
            "prefix_cost": "1.20",
    
            "data_read_per_join": "352"
    
          },
    
          "used_columns": [
    
            "id",
    
            "last_name",
    
            "first_name",
    
            "dob",
    
            "zip",
    
            "city",
    
            "state",
    
            "country",
    
            "tel"
    
          ]
    
        }
    
      }
    
    }
    
    1 row in set, 1 warning (0.00 sec

    The used_key_parts show that the query plan has perfectly selected our desired columns covered in our composite index.

    Composite indexing has its limitations as well. Certain conditions in the query cannot take all columns part of the key.

    The documentation says, "The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction…". Basically, this means that regardless you have composite index for two columns, a sample query below does not cover both fields:

    root[test]#> explain format=json select * from users_account where last_name>='Zu' and first_name='Maximus'\G
    
    *************************** 1. row ***************************
    
    EXPLAIN: {
    
      "query_block": {
    
        "select_id": 1,
    
        "cost_info": {
    
          "query_cost": "34.61"
    
        },
    
        "table": {
    
          "table_name": "users_account",
    
          "access_type": "range",
    
          "possible_keys": [
    
            "name"
    
          ],
    
          "key": "name",
    
          "used_key_parts": [
    
            "last_name"
    
          ],
    
          "key_length": "60",
    
          "rows_examined_per_scan": 24,
    
          "rows_produced_per_join": 2,
    
          "filtered": "10.00",
    
          "index_condition": "((`test`.`users_account`.`first_name` = 'Maximus') and (`test`.`users_account`.`last_name` >= 'Zu'))",
    
          "cost_info": {
    
            "read_cost": "34.13",
    
            "eval_cost": "0.48",
    
            "prefix_cost": "34.61",
    
            "data_read_per_join": "844"
    
          },
    
          "used_columns": [
    
            "id",
    
            "last_name",
    
            "first_name",
    
            "dob",
    
            "zip",
    
            "city",
    
            "state",
    
            "country",
    
            "tel"
    
          ]
    
        }
    
      }
    
    }
    
    1 row in set, 1 warning (0.00 sec)

    In this case (and if your query is more of ranges instead of constant or reference types) then avoid using composite indexes. It just wastes your memory and buffer and it increases the performance degradation of your queries.

    Prefix Indexes

    Prefix indexes are indexes which contain columns referenced as an index, but only takes the starting length defined to that column, and that portion (or prefix data) are the only part stored in the buffer. Prefix indexes can help lessen your buffer pool resources and also your disk space as it does not need to take the full-length of the column.What does this mean? Let's take an example. Let's compare the impact between full-length index versus the prefix index.

    root[test]#> create index name on users_account(last_name, first_name);
    
    Query OK, 0 rows affected (0.42 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    
    
    root[test]#> \! du -hs /var/lib/mysql/test/users_account.*
    
    12K     /var/lib/mysql/test/users_account.frm
    
    36M     /var/lib/mysql/test/users_account.ibd

    We created a full-length composite index which consumes a total of 36MiB tablespace for users_account table. Let's drop it and then add a prefix index.

    root[test]#> drop index name on users_account;
    
    Query OK, 0 rows affected (0.01 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    
    
    root[test]#> alter table users_account engine=innodb;
    
    Query OK, 0 rows affected (0.63 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    
    
    root[test]#> \! du -hs /var/lib/mysql/test/users_account.*
    
    12K     /var/lib/mysql/test/users_account.frm
    
    24M     /var/lib/mysql/test/users_account.ibd
    
    
    
    
    
    
    root[test]#> create index name on users_account(last_name(5), first_name(5));
    
    Query OK, 0 rows affected (0.42 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    
    
    root[test]#> \! du -hs /var/lib/mysql/test/users_account.*
    
    12K     /var/lib/mysql/test/users_account.frm
    
    28M     /var/lib/mysql/test/users_account.ibd

    Using the prefix index, it holds up only to 28MiB and that's less than 8MiB than using full-length index. That's great to hear, but it doesn't mean that is performant and serves what you need. 

    If you decide to add a prefix index, you must identify first what type of query for data retrieval you need. Creating a prefix index helps you utilize more efficiency with the buffer pool and so it does help with your query performance but you also need to know its limitation. For example, let's compare the performance when using a full-length index and a prefix index.

    Let's create a full-length index using a composite index,

    root[test]#> create index name on users_account(last_name, first_name);
    
    Query OK, 0 rows affected (0.45 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    
    
    root[test]#>  EXPLAIN format=json select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G
    
    *************************** 1. row ***************************
    
    EXPLAIN: {
    
      "query_block": {
    
        "select_id": 1,
    
        "cost_info": {
    
          "query_cost": "1.61"
    
        },
    
        "table": {
    
          "table_name": "users_account",
    
          "access_type": "ref",
    
          "possible_keys": [
    
            "name"
    
          ],
    
          "key": "name",
    
          "used_key_parts": [
    
            "last_name",
    
            "first_name"
    
          ],
    
          "key_length": "60",
    
          "ref": [
    
            "const",
    
            "const"
    
          ],
    
          "rows_examined_per_scan": 3,
    
          "rows_produced_per_join": 3,
    
          "filtered": "100.00",
    
          "using_index": true,
    
          "cost_info": {
    
            "read_cost": "1.02",
    
            "eval_cost": "0.60",
    
            "prefix_cost": "1.62",
    
            "data_read_per_join": "1K"
    
          },
    
          "used_columns": [
    
            "last_name",
    
            "first_name"
    
          ]
    
        }
    
      }
    
    }
    
    1 row in set, 1 warning (0.00 sec)
    
    
    
    root[test]#> flush status;
    
    Query OK, 0 rows affected (0.02 sec)
    
    
    
    root[test]#> pager cat -> /dev/null; select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G
    
    PAGER set to 'cat -> /dev/null'
    
    3 rows in set (0.00 sec)
    
    
    
    root[test]#> nopager; show status like 'Handler_read%';
    
    PAGER set to stdout
    
    +-----------------------+-------+
    
    | Variable_name         | Value |
    
    +-----------------------+-------+
    
    | Handler_read_first    | 0 |
    
    | Handler_read_key      | 1 |
    
    | Handler_read_last     | 0 |
    
    | Handler_read_next     | 3 |
    
    | Handler_read_prev     | 0 |
    
    | Handler_read_rnd      | 0 |
    
    | Handler_read_rnd_next | 0     |
    
    +-----------------------+-------+
    
    7 rows in set (0.00 sec)

    The result reveals that it's, in fact, using a covering index i.e "using_index": true and uses indexes properly, i.e. Handler_read_key is incremented and does an index scan as Handler_read_next is incremented.

    Now, let's try using prefix index of the same approach,

    root[test]#> create index name on users_account(last_name(5), first_name(5));
    
    Query OK, 0 rows affected (0.22 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    
    
    root[test]#>  EXPLAIN format=json select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G
    
    *************************** 1. row ***************************
    
    EXPLAIN: {
    
      "query_block": {
    
        "select_id": 1,
    
        "cost_info": {
    
          "query_cost": "3.60"
    
        },
    
        "table": {
    
          "table_name": "users_account",
    
          "access_type": "ref",
    
          "possible_keys": [
    
            "name"
    
          ],
    
          "key": "name",
    
          "used_key_parts": [
    
            "last_name",
    
            "first_name"
    
          ],
    
          "key_length": "10",
    
          "ref": [
    
            "const",
    
            "const"
    
          ],
    
          "rows_examined_per_scan": 3,
    
          "rows_produced_per_join": 3,
    
          "filtered": "100.00",
    
          "cost_info": {
    
            "read_cost": "3.00",
    
            "eval_cost": "0.60",
    
            "prefix_cost": "3.60",
    
            "data_read_per_join": "1K"
    
          },
    
          "used_columns": [
    
            "last_name",
    
            "first_name"
    
          ],
    
          "attached_condition": "((`test`.`users_account`.`first_name` = 'Maximus Aleksandre') and (`test`.`users_account`.`last_name` = 'Namuag'))"
    
        }
    
      }
    
    }
    
    1 row in set, 1 warning (0.00 sec)
    
    
    
    root[test]#> flush status;
    
    Query OK, 0 rows affected (0.01 sec)
    
    
    
    root[test]#> pager cat -> /dev/null; select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G
    
    PAGER set to 'cat -> /dev/null'
    
    3 rows in set (0.00 sec)
    
    
    
    root[test]#> nopager; show status like 'Handler_read%';
    
    PAGER set to stdout
    
    +-----------------------+-------+
    
    | Variable_name         | Value |
    
    +-----------------------+-------+
    
    | Handler_read_first    | 0 |
    
    | Handler_read_key      | 1 |
    
    | Handler_read_last     | 0 |
    
    | Handler_read_next     | 3 |
    
    | Handler_read_prev     | 0 |
    
    | Handler_read_rnd      | 0 |
    
    | Handler_read_rnd_next | 0     |
    
    +-----------------------+-------+
    
    7 rows in set (0.00 sec)

    MySQL reveals that it does use index properly but noticeably, there's a cost overhead compared to a full-length index. That's obvious and explainable, since the prefix index does not cover the whole length of the field values. Using a prefix index is not a replacement, nor an alternative, of full-length indexing. It can also create poor results when using the prefix index inappropriately. So you need to determine what type of query and data