Planet MariaDB

May 22, 2018

Shlomi Noach

MySQL master discovery methods, part 6: other methods

This is the sixth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which replication failure detection and recovery take place. I will share orchestrator-specific configuration/advice, and point out where a cross-DC orchestrator/raft setup plays a part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or others is applicable.

Hard coded configuration deployment

You may use your source/config repo as a master service discovery method of sorts.

The master's identity would be hard coded into your, say, git repo, to be updated and deployed to production upon failover.
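
As a minimal illustration (the file name and format here are hypothetical), the repo might carry an entry like the following, which the failover tool would rewrite and redeploy:

# db.cnf, deployed to all application hosts
[production]
master_host = db-master-1.example.com
master_port = 3306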

This method is simple, and I've seen it used by companies in production. Noteworthy:

  • This requires a dependency of production on source availability.
    • The failover tool would need to have access to your source environment.
  • This requires a dependency of production on build/deploy flow.
    • The failover tool would need to kick build, test, deploy process.
  • Code deployment time can be long.
  • Deployment must take place on all relevant hosts, causing a mass refresh/reload.
    • This will interrupt processes that cannot reload themselves, such as various commonly used scripts.

Synchronous replication

This series of posts is focused on asynchronous replication, but we will do well to point out a few relevant notes on synchronous replication (Galera, XtraDB Cluster, InnoDB Cluster).

  • Synchronous replication can act in single-writer mode or in multi-writer mode.
  • In single writer mode, apps should connect to a particular master.
    • The identity of that master can be determined by querying the MySQL members of the cluster.
  • In multi-writer mode, apps can connect to any healthy member of the cluster.
    • This still calls for a check: is the member healthy?
  • Synchronous replication is not intended to work well cross DC.

The last bullet should perhaps be highlighted. In a cross-DC setup, and for cross-DC failovers, we are back to the same requirements as with asynchronous replication, and the methods illustrated in this series of posts may apply.

  • VIPs make less sense.
  • Proxy-based solutions make a lot of sense.

All posts in this series

by shlomi at May 22, 2018 08:39 AM

Open Query Pty Ltd

How not to respect your users’ privacy

You just run the usual online frameworks, with their extensive plugin range, CDN, Google Analytics, NewRelic, Twitter, Facebook and LinkedIn widgets, and the rest. Then, you display a notice to your users that your site uses cookies and passes some data to third parties (such as Google Analytics and NewRelic) “to enhance the user experience”.

There. Easy, right? You probably didn’t need to change anything at all. Most companies, sites and applications do this. Now tell me: given that you probably agree with at least some of the above, how come you display a notice to your users explaining how you respect their privacy? They can’t both be true.

So yes, this was a test.  And most of us fail, including us.  Why is this?

  1. Are you asking for and storing more data than you actually require for delivering the product or service that you provide?  You can probably only test this by working out the minimum data requirements, questioning each item, and then comparing that list with what you currently actually collect.  There’s likely to be a (large) discrepancy.
  2. Are you using multiple analytics and trackers?  Why?  It does in fact affect the user experience of your site, both in terms of speed as well as privacy.  And you probably don’t actually use all that data.  So think about what you actually use, and get rid of the rest.  That’s a good exercise and an excellent step.
  3. Does your site deliver pixel images for Facebook and others?  If so, why?
  4. Does your site show a “site seal” advertising your SSL certificate’s vendor?  If so, why?
  5. Does your site set one or more cookies for every user, rather than only logged-in users?  If so, why?
  6. Most CMS and frameworks actually make it difficult to not flood users with cookies and third-party tracking. They have become the new bloat.  Example: you use a component that includes a piece of javascript or css off a vendor-provided CDN. Very convenient, but you’ve just provided site-usage data, as well as your users’ IP addresses, to that vendor.
  7. Respecting privacy is not “business as usual” + a notice. It’s just not.

So, privacy is actually really hard, and for a large part because our tools make it so.  They make it so not for your users’ convenience, or even your convenience, but for the vendors of said tools/components. You get some benefit, which in turn could benefit your users, but I think it’s worthwhile to really review what’s actually necessary and what’s not.

A marketing or sales person might easily say “more data is better”, but is it, really?  It affects site speed and user experience. And unless you’ve got your analytics tools really well organised, you’re actually going to find that all that extra data is overhead you don’t need in your company.  If you just collect and use what you really need, you’ll do well. Additionally, it’ll enable you to tell your users/clients honestly about what you do and why, rather than deliver a generic fudge-text as described in the first paragraph of this post.

A few quick hints to check your users’ privacy experience, without relying on third-party sites.

  • Install EFF’s Privacy Badger plugin.  It uses heuristics (rather than a fixed list) to identify suspected trackers and deal with them appropriately (allow, block cookies, block completely).  Privacy Badger provides you with an icon on the right of your location bar, showing a number indicating how many trackers the current page has.  If you click on the icon, you can see details and adjust.  And as a site-owner, you’ll want to adjust the site rather than badger!
  • If you click on the left hand side of your location bar, on the secure icon (because you are already offering https, right?), you can also see details on cookies: both how many and to which domains. If you see any domains which are not yours, they’re caused by components (images, javascript, css) on your page that retrieve bits from elsewhere. Prepare to be shocked.
  • To see in more detail what bits an individual page uses, you can right-click on a page and select “Inspect” then go to the “Sources” tab.  Again, prepare to be shocked.

Use that shock well, to genuinely improve privacy – and thereby respect your users.

Aside from the ethics, I expect that these indicators (cookies, third-party resource requests, trackers, etc.) will get used to rank sites and identify bad players. So there’ll be a business benefit in being ahead of this predictable trend.  And again, doing a clean-up will also make your site faster, as well as easier to use.

by Arjen Lentz at May 22, 2018 01:20 AM

May 21, 2018

MariaDB Foundation

MariaDB Foundation financial report for 2017

The 2017 accounting for the MariaDB Foundation has been completed and the key figures are: total income 476,952.38 USD; total expenses 476,952.38 USD; net income after adjustments 153,890.65 USD. Staff costs were about 292 000 USD. Travel costs were only about 30 000 USD. The remaining 23 000 USD is administration (accounting, finances, legal) and other expenses. As […]

The post MariaDB Foundation financial report for 2017 appeared first on MariaDB.org.

by Otto Kekäläinen at May 21, 2018 02:49 PM

Jean-Jerome Schmidt

Understanding Deadlocks in MySQL & PostgreSQL

When working with databases, concurrency control is the concept that ensures that database transactions are performed concurrently without violating data integrity.

There is a lot of theory and different approaches around this concept and how to accomplish it, but we will briefly refer to the way that PostgreSQL and MySQL (when using InnoDB) handle it, and a common problem that can arise in highly concurrent systems: deadlocks.

These engines implement concurrency control by using a method called MVCC (Multiversion Concurrency Control). In this method, when an item is being updated, the changes will not overwrite the original data, but instead a new version of the item (with the changes) will be created. Thus we will have several versions of the item stored.

One of the main advantages of this model is that locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.

But, if several versions of the same item are stored, which version of it will a transaction see? To answer that question we need to review the concept of transaction isolation. Transactions specify an isolation level, which defines the degree to which one transaction must be isolated from resource or data modifications made by other transactions. This degree is directly related to the locking generated by a transaction, and so, as it can be specified at the transaction level, it can determine the impact that a running transaction can have on other running transactions.

This is a very interesting and long topic, although we will not go into too much detail in this blog. We’d recommend the PostgreSQL and MySQL official documentation for further reading on this topic.

So, why are we going into the above topics when dealing with deadlocks? Because SQL commands will automatically acquire locks to ensure the MVCC behaviour, and the lock type acquired depends on the transaction isolation level defined.

There are several types of locks (again, another long and interesting topic to review for PostgreSQL and MySQL) but the important thing about them is how they interact (more exactly, how they conflict) with each other. Why is that? Because two transactions cannot hold locks of conflicting modes on the same object at the same time. And one non-minor detail: once acquired, a lock is normally held until the end of the transaction.

This is a PostgreSQL example of how locking types conflict with each other:

PostgreSQL Locking types conflict

And for MySQL:

MySQL Locking types conflict

X = exclusive lock         IX = intention exclusive lock
S = shared lock            IS = intention shared lock

So what happens when I have two running transactions that want to hold conflicting locks on the same object at the same time? One of them will get the lock and the other will have to wait.

So now we are in a position to truly understand what is happening during a deadlock.

What is a deadlock then? As you can imagine, there are several definitions for a database deadlock, but I like the following for its simplicity.

A database deadlock is a situation in which two or more transactions are waiting for one another to give up locks.

So for example, the following situation will lead us to a deadlock:

Deadlock example

Here, application A gets a lock on table 1 row 1 in order to make an update.

At the same time application B gets a lock on table 2 row 2.

Now application A needs to get a lock on table 2 row 2, in order to continue the execution and finish the transaction, but it cannot get the lock because it is held by application B. Application A needs to wait for application B to release it.

But application B needs to get a lock on table 1 row 1, in order to continue the execution and finish the transaction, but it cannot get the lock because it is held by application A.

So here we are in a deadlock situation. Application A is waiting for the resource held by application B in order to finish and application B is waiting for the resource held by application A. So, how to continue? The database engine will detect the deadlock and kill one of the transactions, unblocking the other one and raising a deadlock error on the killed one.

Let's check some PostgreSQL and MySQL deadlock examples:

PostgreSQL

Suppose we have a test database with information from the countries of the world.

world=# SELECT code,region,population FROM country WHERE code IN ('NLD','AUS');
 code |          region           | population
------+---------------------------+------------
 NLD  | Western Europe            |   15864000
 AUS  | Australia and New Zealand |   18886000
(2 rows)

We have two sessions that want to make changes to the database.

The first session will modify the region field for the NLD code, and the population field for the AUS code.

The second session will modify the region field for the AUS code, and the population field for the NLD code.

Table data:

code: NLD
region: Western Europe
population: 15864000
code: AUS
region: Australia and New Zealand
population: 18886000

Session 1:

world=# BEGIN;
BEGIN
world=# UPDATE country SET region='Europe' WHERE code='NLD';
UPDATE 1

Session 2:

world=# BEGIN;
BEGIN
world=# UPDATE country SET region='Oceania' WHERE code='AUS';
UPDATE 1
world=# UPDATE country SET population=15864001 WHERE code='NLD';

Session 2 will hang waiting for Session 1 to finish.

Session 1:

world=# UPDATE country SET population=18886001 WHERE code='AUS';

ERROR:  deadlock detected
DETAIL:  Process 1181 waits for ShareLock on transaction 579; blocked by process 1148.
Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (0,15) in relation "country"

Here we have our deadlock. The system detected the deadlock and killed session 1.

Session 2:

world=# BEGIN;
BEGIN
world=# UPDATE country SET region='Oceania' WHERE code='AUS';
UPDATE 1
world=# UPDATE country SET population=15864001 WHERE code='NLD';
UPDATE 1

And we can check that the second session finished correctly after the deadlock was detected and Session 1 was killed (thus, the lock was released).

For more details, we can check the log on our PostgreSQL server:

2018-05-16 12:56:38.520 -03 [1181] ERROR:  deadlock detected
2018-05-16 12:56:38.520 -03 [1181] DETAIL:  Process 1181 waits for ShareLock on transaction 579; blocked by process 1148.
       Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.
       Process 1181: UPDATE country SET population=18886001 WHERE code='AUS';
       Process 1148: UPDATE country SET population=15864001 WHERE code='NLD';
2018-05-16 12:56:38.520 -03 [1181] HINT:  See server log for query details.
2018-05-16 12:56:38.520 -03 [1181] CONTEXT:  while updating tuple (0,15) in relation "country"
2018-05-16 12:56:38.520 -03 [1181] STATEMENT:  UPDATE country SET population=18886001 WHERE code='AUS';
2018-05-16 12:59:50.568 -03 [1181] ERROR:  current transaction is aborted, commands ignored until end of transaction block

Here we can see the actual commands that were involved in the deadlock.

MySQL

To simulate a deadlock in MySQL we can do the following.

As with PostgreSQL, suppose we have a test database with information on actors and movies among other things.

mysql> SELECT first_name,last_name FROM actor WHERE actor_id IN (1,7);
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| PENELOPE   | GUINESS   |
| GRACE      | MOSTEL    |
+------------+-----------+
2 rows in set (0.00 sec)

We have two processes that want to make changes to the database.

The first process will modify the field first_name for actor_id 1, and the field last_name for actor_id 7.

The second process will modify the field first_name for actor_id 7, and the field last_name for actor_id 1.

Table data:

actor_id: 1
first_name: PENELOPE
last_name: GUINESS
actor_id: 7
first_name: GRACE
last_name: MOSTEL

Session 1:

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> UPDATE actor SET first_name='GUINESS' WHERE actor_id='1';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Session 2:

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> UPDATE actor SET first_name='MOSTEL' WHERE actor_id='7';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1';

Session 2 will hang waiting for Session 1 to finish.

Session 1:

mysql> UPDATE actor SET last_name='GRACE' WHERE actor_id='7';

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Here we have our deadlock. The system detected the deadlock and killed session 1.

Session 2:

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> UPDATE actor SET first_name='MOSTEL' WHERE actor_id='7';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1';
Query OK, 1 row affected (8.52 sec)
Rows matched: 1  Changed: 1  Warnings: 0

As the error shows, and as we saw for PostgreSQL, there is a deadlock between the two processes.

For more details we can use the command SHOW ENGINE INNODB STATUS\G:

mysql> SHOW ENGINE INNODB STATUS\G
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-05-16 18:55:46 0x7f4c34128700
*** (1) TRANSACTION:
TRANSACTION 1456, ACTIVE 33 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 54, OS thread handle 139965388506880, query id 15876 localhost root updating
UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1456 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 0000000005af; asc       ;;
2: len 7; hex 2d000001690110; asc -   i  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afca8b3; asc Z   ;;

*** (2) TRANSACTION:
TRANSACTION 1455, ACTIVE 47 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 53, OS thread handle 139965267871488, query id 16013 localhost root updating
UPDATE actor SET last_name='GRACE' WHERE actor_id='7'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1455 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 0000000005af; asc       ;;
2: len 7; hex 2d000001690110; asc -   i  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afca8b3; asc Z   ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1455 lock_mode X locks rec but not gap waiting
Record lock, heap no 202 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0007; asc   ;;
1: len 6; hex 0000000005b0; asc       ;;
2: len 7; hex 2e0000016a0110; asc .   j  ;;
3: len 6; hex 4d4f5354454c; asc MOSTEL;;
4: len 6; hex 4d4f5354454c; asc MOSTEL;;
5: len 4; hex 5afca8c1; asc Z   ;;

*** WE ROLL BACK TRANSACTION (2)

Under the title "LATEST DETECTED DEADLOCK", we can see details of our deadlock.

To see the details of a deadlock in the MySQL error log, we must enable the option innodb_print_all_deadlocks in our database.

mysql> set global innodb_print_all_deadlocks=1;
Query OK, 0 rows affected (0.00 sec)

MySQL Log Error:

2018-05-17T18:36:58.341835Z 12 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
2018-05-17T18:36:58.341869Z 12 [Note] InnoDB:
*** (1) TRANSACTION:
 
TRANSACTION 1812, ACTIVE 42 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140515492943616, query id 8467 localhost root updating
UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1'
2018-05-17T18:36:58.341945Z 12 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
 
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1812 lock_mode X locks rec but not gap waiting
Record lock, heap no 204 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 000000000713; asc       ;;
2: len 7; hex 330000016b0110; asc 3   k  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afdcb89; asc Z   ;;
 
2018-05-17T18:36:58.342347Z 12 [Note] InnoDB: *** (2) TRANSACTION:
 
TRANSACTION 1811, ACTIVE 65 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 12, OS thread handle 140515492677376, query id 9075 localhost root updating
UPDATE actor SET last_name='GRACE' WHERE actor_id='7'
2018-05-17T18:36:58.342409Z 12 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
 
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1811 lock_mode X locks rec but not gap
Record lock, heap no 204 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 000000000713; asc       ;;
2: len 7; hex 330000016b0110; asc 3   k  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afdcb89; asc Z   ;;
 
2018-05-17T18:36:58.342793Z 12 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
 
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1811 lock_mode X locks rec but not gap waiting
Record lock, heap no 205 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0007; asc   ;;
1: len 6; hex 000000000714; asc       ;;
2: len 7; hex 340000016c0110; asc 4   l  ;;
3: len 6; hex 4d4f5354454c; asc MOSTEL;;
4: len 6; hex 4d4f5354454c; asc MOSTEL;;
5: len 4; hex 5afdcba0; asc Z   ;;
 
2018-05-17T18:36:58.343105Z 12 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)

Taking into account what we have learned above about why deadlocks happen, you can see that there is not much we can do on the database side to avoid them. Still, as DBAs it is our duty to catch them, analyze them, and provide feedback to the developers.

The reality is that these errors are particular to each application, so you will need to check them one by one; there is no guide to tell you how to troubleshoot this. Keeping this in mind, there are some things you can look for.

Search for long-running transactions. As locks are usually held until the end of a transaction, the longer the transaction, the longer the locks are held over the resources. If possible, try to split long-running transactions into smaller/faster ones.

Sometimes it is not possible to actually split the transactions, so the work should focus on trying to execute those operations in a consistent order each time, so that transactions form well-defined queues and do not deadlock, as in the sketch below.
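
For example, reusing the actor table from above, if every transaction touches rows in ascending actor_id order, one session may have to wait for the other, but a lock cycle can never form:

-- Both sessions follow the same order: actor_id 1 first, then actor_id 7
BEGIN;
UPDATE actor SET first_name='PENELOPE' WHERE actor_id='1';
UPDATE actor SET last_name='MOSTEL'    WHERE actor_id='7';
COMMIT;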

One workaround that you can also propose is to add retry logic to the application (of course, try to solve the underlying issue first) so that, if a deadlock happens, the application will run the same commands again.
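
A minimal sketch of such retry logic, here as a bash wrapper around the mysql client (the statement and retry count are illustrative; a real application would do this in its data-access layer, retrying specifically on deadlock error 1213):

#!/bin/bash
# Re-run the whole transaction when it fails; a deadlock victim must
# replay every statement of the transaction, not just the last one.
for attempt in 1 2 3; do
  if mysql sakila -e "
      START TRANSACTION;
      UPDATE actor SET last_name='GRACE' WHERE actor_id='7';
      COMMIT;"; then
    exit 0
  fi
  echo "attempt $attempt failed, retrying..." >&2
  sleep 1
done
exit 1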

Check the isolation levels used; sometimes you can try changing them. Look for commands like SELECT FOR UPDATE and SELECT FOR SHARE, as they generate explicit locks, and evaluate whether they are really needed or whether you can work with an older snapshot of the data. If you cannot remove these commands, one thing you can try is using a lower isolation level such as READ COMMITTED.
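
For instance, lowering the isolation level for a session or for a single transaction is a one-liner in both engines:

-- MySQL: applies to subsequent transactions in this session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- PostgreSQL: applies to this transaction only
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;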

Of course, always add well-chosen indexes to your tables. Then your queries need to scan fewer index records and consequently set fewer locks.

On a higher level, as a DBA you can take some precautions to minimize locking in general. To name one example, in this case for PostgreSQL, you can avoid adding a default value in the same command that adds a column. Altering a table takes a really aggressive lock, and setting a default value for the new column will actually update the existing rows that have null values, making this operation take a really long time. If you split this operation into several commands (adding the column, adding the default, updating the null values) you will minimize the locking impact, as sketched below.
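
Here is a sketch of that split, using a hypothetical continent_code column on the country table from the earlier example (on the PostgreSQL versions current at the time of writing, the single-statement form rewrites the whole table while holding an exclusive lock):

-- Heavy: one statement, aggressive lock held for the full table rewrite
-- ALTER TABLE country ADD COLUMN continent_code char(2) DEFAULT 'XX';

-- Lighter: three short steps, each holding its lock briefly
ALTER TABLE country ADD COLUMN continent_code char(2);
ALTER TABLE country ALTER COLUMN continent_code SET DEFAULT 'XX';
UPDATE country SET continent_code = 'XX' WHERE continent_code IS NULL;  -- ideally in batches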

Of course, there are tons of tips like this that DBAs pick up with practice (creating indexes concurrently, creating the PK index separately before adding the PK, and so on; see the example below), but the important thing is to learn and understand this "way of thinking", and to always minimize the lock impact of the operations we are doing.
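
For instance, concurrent index creation in PostgreSQL (the index name is hypothetical):

-- Builds the index without blocking concurrent writes to the table
-- (note: cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_country_region ON country (region);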

by Sebastian Insausti at May 21, 2018 10:16 AM

May 18, 2018

Jean-Jerome Schmidt

Cloud Disaster Recovery for MariaDB and MySQL

MySQL has a long tradition in geographic replication. Distributing clusters to remote data centers reduces the effects of geographic latency by pushing data closer to the user. It also provides a capability for disaster recovery. Due to the significant cost of duplicating hardware in a separate site, not many companies were able to afford it in the past. Another cost is the skilled staff who are able to design, implement and maintain a sophisticated multiple-data-center environment.

With the cloud and DevOps automation revolution, having distributed data centers has never been more accessible to the masses. Cloud providers are increasing the range of services they offer for a better price. One can build cross-cloud, hybrid environments with data spread all over the world. One can make flexible and scalable DR plans to approach a broad range of disruption scenarios. In some cases, that can just be a backup stored offsite. In other cases, it can be a 1-to-1 copy of a production environment running somewhere else.

In this blog we will take a look at some of these cases, and address common scenarios.

Storing Backups in the Cloud

A DR plan is a general term that describes a process to recover disrupted IT systems and other critical assets an organization uses. Backup is the primary method to achieve this. When a backup is in the same data center as your production servers, you risk all data being wiped out if you lose that data center. To avoid that, you should have a policy of creating a copy in another physical location. It's still a good practice to keep a backup on disk to reduce the time needed to restore. In most cases, you will keep your primary backup in the same data center (to minimize restore time), but you should also have a backup that can be used to restore business procedures when the primary data center is down.

ClusterControl: Upload Backup to the cloud

ClusterControl allows seamless integration between your database environment and the cloud. It provides options for migrating data to the cloud. We offer a full combination of database backups for Amazon Web Services (AWS), Google Cloud Services or Microsoft Azure. Backups can now be executed, scheduled, downloaded and restored directly from your cloud provider of choice. This ability provides increased redundancy, better disaster recovery options, and benefits in both performance and cost savings.

ClusterControl: Managing Cloud Credentials

The first step in setting up a "data center failure-proof" backup is to provide credentials for your cloud operator. You can choose from multiple vendors here. Let's take a look at the setup process for the most popular cloud operator: AWS.

ClusterControl: adding cloud credentials

All you need is the AWS Key ID and the secret for the region where you want to store your backup. You can get these from the AWS console by following a few steps:

  1. Use your AWS account email address and password to sign in to the AWS Management Console as the AWS account root user.
  2. On the IAM Dashboard page, choose your account name in the navigation bar, and then select My Security Credentials.
  3. If you see a warning about accessing the security credentials for your AWS account, choose Continue to Security Credentials.
  4. Expand the Access keys (access key ID and secret access key) section.
  5. Choose Create New Access Key. Then choose Download Key File to save the access key ID and secret access key to a file on your computer. After you close the dialog box, you will not be able to retrieve this secret access key again.
ClusterControl: Hybrid cloud backup

When all is set, you can adjust your backup schedule and enable the backup-to-cloud option. To reduce network traffic, make sure to enable data compression. It makes backups smaller and minimizes the time needed for upload. Another good practice is to encrypt the backup. ClusterControl creates a key automatically and uses it if you decide to restore. Advanced backup policies should have different retention times for backups stored on servers in the same data center and for backups stored in another physical location. You should set a more extended retention period for cloud-based backups, and a shorter period for backups stored near the production environment, as the probability of a restore drops with the backup's age.

ClusterControl: backup retention policy

Extend your cluster with asynchronous replication

Galera with asynchronous replication can be an excellent solution to build an active DR node in a remote data center. There are a few good reasons to attach an asynchronous slave to a Galera Cluster. Long-running OLAP-type queries on a Galera node might slow down a whole cluster; they can be run on the slave instead. And with the delayed apply option, delayed replication can save you from human errors, so that all those golden Enter keystrokes are not immediately applied to your backup node.

ClusterControl: delayed replication

In ClusterControl, extending a Galera node group with asynchronous replication is done in a single page wizard. You need to provide the necessary information about your future or existing slave server. The slave will be set up from an existing backup, or a freshly streamed XtraBackup from the master to the slave.

Load balancers in multi-datacenter

Load balancers are a crucial component in MySQL and MariaDB database high availability. It’s not enough to have a cluster spanning across multiple data centers. You still need your services to access them. A failure of a load balancer that is available in one data center will make your entire environment unreachable.

Web proxies in cluster environment

One of the popular methods to hide the complexity of the database layer from an application is to use a proxy. Proxies act as an entry point to the databases; they track the state of the database nodes and should always direct traffic only to the nodes that are available. ClusterControl makes it easy to deploy and configure several different load balancing technologies for MySQL and MariaDB, including ProxySQL and HAProxy, with a point-and-click graphical interface.

ClusterControl: load balancer HA

It also allows you to make this component redundant by adding keepalived on top of it. To prevent your load balancers from being a single point of failure, one would set up two identical HAProxy, ProxySQL or MariaDB MaxScale instances (one active, and one as standby in a different DC) and use Keepalived to run the Virtual Router Redundancy Protocol (VRRP) between them. VRRP provides a Virtual IP address to the active load balancer and transfers the Virtual IP to the standby in case of failure. It is seamless because the two proxy instances need no shared state.
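
As a rough illustration, a minimal keepalived configuration for the active proxy node might look like the sketch below (the interface name, router ID, priority and VIP are all made up; the standby node would use state BACKUP and a lower priority):

vrrp_instance VI_1 {
    state MASTER            # the standby uses BACKUP
    interface eth0          # NIC carrying the VIP (assumption)
    virtual_router_id 51    # must match on both nodes
    priority 101            # standby gets a lower value, e.g. 100
    advert_int 1            # VRRP advertisement interval in seconds
    virtual_ipaddress {
        10.0.0.100/24       # the Virtual IP the application connects to
    }
}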

Of course, there are many things to consider to make your databases immune to data center failures.
Proper planning and automation will make it work! Happy Clustering!

by Bart Oles at May 18, 2018 12:17 PM

MariaDB Foundation

MariaDB 10.2.15 and MariaDB Connector/J 2.2.4 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.2.15, the latest stable release in the MariaDB 10.2 series, and MariaDB Connector/J 2.2.4, the latest stable release in the MariaDB Connector/J 2.2 series. See the release notes and changelogs for details. Download MariaDB 10.2.15 Release Notes Changelog What is MariaDB 10.2? MariaDB APT […]

The post MariaDB 10.2.15 and MariaDB Connector/J 2.2.4 now available appeared first on MariaDB.org.

by Ian Gilfillan at May 18, 2018 08:07 AM

MariaDB AB

MariaDB Server 10.2.15 and Connector/J 2.2.4 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.2.15 and MariaDB Connector/J 2.2.4. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.2.15

Release Notes Changelog What is MariaDB 10.2?


Download MariaDB Connector/J 2.2.4

Release Notes Changelog About MariaDB Connector/J

by dbart at May 18, 2018 05:08 AM

May 15, 2018

Peter Zaitsev

About ZFS Performance

If you are a regular reader of this blog, you likely know I like the ZFS filesystem a lot. ZFS has many very interesting features, but I am a bit tired of hearing negative statements on ZFS performance. It feels a bit like people are telling me “Why do you use InnoDB? I have read that MyISAM is faster.” I found the comparison of InnoDB vs. MyISAM quite interesting, and I’ll use it in this post.

To have some data to support my post, I started an AWS i3.large instance with a 1000GB gp2 EBS volume. A gp2 volume of this size is interesting because it is above the burst IOPS level, so it offers a constant 3000 IOPS performance level.

I used sysbench to create a table of 10M rows and then, using export/import tablespace, I copied it 329 times. I ended up with 330 tables for a total size of about 850GB. The dataset generated by sysbench is not very compressible, so I used lz4 compression in ZFS. For the other ZFS settings, I used what can be found in my earlier ZFS posts but with the ARC size limited to 1GB. I then used that plain configuration for the first benchmarks. Here are the results with the sysbench point-select benchmark, a uniform distribution and eight threads. The InnoDB buffer pool was set to 2.5GB.

In both cases, the load is IO bound. The disk is doing exactly the allowed 3000 IOPS. The above graph appears to be a clear demonstration that XFS is much faster than ZFS, right? But is that really the case? The way the dataset has been created is extremely favorable to XFS, since there is absolutely no file fragmentation. Once you have all the files opened, a read IOP is just a single fseek call to an offset, and XFS doesn’t need to access any intermediate inode. The above result is about as fair as saying MyISAM is faster than InnoDB based only on table scan performance results of unfragmented tables and a default configuration. ZFS is much less affected by file-level fragmentation, especially for point access patterns.

More on ZFS metadata

ZFS stores files in B-trees in a very similar fashion to how InnoDB stores data. To access a piece of data in a B-tree, you need to access the top level page (often called the root node) and then one block per level down to the leaf node containing the data. With no cache, reading something from a three-level B-tree thus requires 3 IOPS.

Simple three-level B-tree

The extra IOPS performed by ZFS are needed to access those internal blocks in the B-trees of the files. These internal blocks are labeled as metadata. Essentially, in the above benchmark, the ARC is too small to contain all the internal blocks of the table files’ B-trees. If we continue the comparison with InnoDB, it would be like running with a buffer pool too small to contain the non-leaf pages. The test dataset I used has about 600MB of non-leaf pages, about 0.1% of the total size, which was well cached by the 2.5GB buffer pool. So only one InnoDB page, a leaf page, needed to be read per point-select statement.

To correctly set the ARC size to cache the metadata, you have two choices. First, you can guess values for the ARC size and experiment. Second, you can try to evaluate it by looking at the ZFS internal data. Let’s review these two approaches.

You’ll often read/hear the ratio of 1GB of ARC for 1TB of data, which is about the same 0.1% ratio as for InnoDB. I wrote about that ratio a few times, having nothing better to propose. Actually, I found it depends a lot on the recordsize used. The 0.1% ratio implies a ZFS recordsize of 128KB. A ZFS filesystem with a recordsize of 128KB will use much less metadata than one using a recordsize of 16KB, because it has 8x fewer leaf pages. Fewer leaf pages require fewer B-tree internal nodes, hence less metadata. A filesystem with a recordsize of 128KB is excellent for sequential access, as it maximizes compression and reduces the IOPS, but it is poor for small random access operations like the ones MySQL/InnoDB does.

To determine the correct ARC size, you can slowly increase the ARC size and monitor the number of metadata cache-misses with the arcstat tool. Here’s an example:

# echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max
# arcstat -f time,arcsz,mm%,mhit,mread,dread,pread 10
    time  arcsz  mm%  mhit  mread  dread  pread
10:22:49   105M    0     0     0      0      0
10:22:59   113M  100     0    22     73      0
10:23:09   120M  100     0    20     68      0
10:23:19   127M  100     0    20     65      0
10:23:29   135M  100     0    22     74      0

You’ll want the ‘mm%’, the metadata missed percent, to reach 0. So when the ‘arcsz’ column is no longer growing and you still have high values for ‘mm%’, that means the ARC is too small. Increase the value of ‘zfs_arc_max’ and continue to monitor.

If the 1GB of ARC for 1TB of data ratio is good for a large ZFS recordsize, it is likely too small for a recordsize of 16KB. Do 8x more leaf pages automatically require 8x more ARC space for the non-leaf pages? It seems likely, but let's verify.

The second option we have is the zdb utility that comes with ZFS, which allows us to view many internal structures including the B-tree list of pages for a given file. The tool needs the inode of a file and the ZFS filesystem as inputs. Here’s an invocation for one of the tables of my dataset:

# cd /var/lib/mysql/data/sbtest
# ls -li | grep sbtest1.ibd
36493 -rw-r----- 1 mysql mysql 2441084928 avr 15 15:28 sbtest1.ibd
# zdb -ddddd mysqldata/data 36493 > zdb5d.out
# more zdb5d.out
Dataset mysqldata/data [ZPL], ID 90, cr_txg 168747, 4.45G, 26487 objects, rootbp DVA[0]=<0:1a50452800:200> DVA[1]=<0:5b289c1600:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=3004977L/3004977P fill=26487 cksum=13723d4400:5d1f47fb738:fbfb87e6e278:1f30c12b7fa1d1
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
     36493    4    16K    16K  1.75G  2.27G   97.62  ZFS plain file
                                        168   bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 148991
        path    /var/lib/mysql/data/sbtest/sbtest1.ibd
        uid     103
        gid     106
        atime   Sun Apr 15 15:04:13 2018
        mtime   Sun Apr 15 15:28:45 2018
        ctime   Sun Apr 15 15:28:45 2018
        crtime  Sun Apr 15 15:04:13 2018
        gen     3004484
        mode    100640
        size    2441084928
        parent  36480
        links   1
        pflags  40800000004
Indirect blocks:
               0 L3    0:1a4ea58800:400 4000L/400P F=145446 B=3004774/3004774
               0  L2   0:1c83454c00:1800 4000L/1800P F=16384 B=3004773/3004773
               0   L1  0:1eaa626400:1600 4000L/1600P F=128 B=3004773/3004773
               0    L0 0:1c6926ec00:c00 4000L/c00P F=1 B=3004773/3004773
            4000    L0 EMBEDDED et=0 4000L/6bP B=3004484
            8000    L0 0:1c69270c00:400 4000L/400P F=1 B=3004773/3004773
            c000    L0 0:1c7fbae400:800 4000L/800P F=1 B=3004736/3004736
           10000    L0 0:1ce3f53600:3200 4000L/3200P F=1 B=3004484/3004484
           14000    L0 0:1ce3f56800:3200 4000L/3200P F=1 B=3004484/3004484
           18000    L0 0:18176fa600:3200 4000L/3200P F=1 B=3004485/3004485
           1c000    L0 0:18176fd800:3200 4000L/3200P F=1 B=3004485/3004485
           ...
           [more than 140k lines truncated]

The last section of the above output is very interesting, as it shows the B-tree pages. The ZFS B-tree of the file sbtest1.ibd has four levels. L3 is the root page, L2 pages are the first level (from the top), L1 pages are the second level, and L0 pages are the leaves. The metadata is essentially L3 + L2 + L1. When you change the recordsize property of a ZFS filesystem, you affect only the size of the leaf pages.

The non-leaf page size is always 16KB (4000L) and these pages are always compressed on disk with lzop (if I read correctly). In the ARC, these pages are stored uncompressed, so they use 16KB of memory each. The fanout of a ZFS B-tree, the largest possible ratio of the number of pages between levels, is 128. With the above output, we can easily calculate the amount of metadata we would need to cache all the non-leaf pages in the ARC.

# grep -c L3 zdb5d.out
1
# grep -c L2 zdb5d.out
9
# grep -c L1 zdb5d.out
1150
# grep -c L0 zdb5d.out
145447

So, each of the 330 tables of the dataset has 1160 non-leaf pages (1 + 9 + 1150) and 145447 leaf pages, a ratio very close to the prediction of 0.8%. For the complete dataset of 749GB, we would need the ARC to be, at a minimum, 6GB to fully cache all the metadata pages. Of course, there is some overhead to add. In my experiments, I found I needed to add about 15% for ARC overhead in order to have no metadata reads at all. The real minimum for the ARC size I should have used is almost 7GB.
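
As a quick sanity check on that arithmetic (all figures come from the grep counts above, with 16KB per non-leaf page):

# 330 tables x 1160 non-leaf pages x 16KB, expressed in MB:
echo $(( 330 * 1160 * 16 / 1024 ))   # => 5981, roughly 6GB
# plus ~15% ARC overhead:
echo $(( 5981 * 115 / 100 ))         # => 6878, almost 7GB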

Of course, an ARC of 7GB on a server with 15GB of RAM is not small. Is there a way to do otherwise? The first option we have is to use a larger InnoDB page size, as allowed by MySQL 5.7. Instead of the regular InnoDB page size of 16KB, if you use a page size of 32KB with a matching ZFS recordsize, you will cut the ARC size requirement by half, to 0.4% of the uncompressed size.

Similarly, an InnoDB page size of 64KB with a similar ZFS recordsize would further reduce the ARC size requirement to 0.2%. That approach works best when the dataset is highly compressible. I’ll blog more about the use of larger InnoDB pages with ZFS in the near future. If the use of larger InnoDB page sizes is not a viable option for you, you still have the option of using the ZFS L2ARC feature to save on the required memory.

So, let’s propose a new rule of thumb for the required ARC/L2ARC size for a given dataset:

  • Recordsize of 128KB => 0.1% of the uncompressed dataset size
  • Recordsize of 64KB => 0.2% of the uncompressed dataset size
  • Recordsize of 32KB => 0.4% of the uncompressed dataset size
  • Recordsize of 16KB => 0.8% of the uncompressed dataset size

The ZFS revenge

In order to improve ZFS performance, I had 3 options:

  1. Increase the ARC size to 7GB
  2. Use a larger InnoDB page size like 64KB
  3. Add an L2ARC

I was reluctant to grow the ARC to 7GB, which was nearly half the overall system memory. At best, the ZFS performance would only match XFS. A larger InnoDB page size would increase the CPU load for decompression on an instance with only two vCPUs; not great either. The last option, the L2ARC, was the most promising.

The choice of an i3.large instance type is not accidental. The instance has a 475GB ephemeral NVMe storage device. Let’s try to use this storage for the ZFS L2ARC. The warming of an L2ARC device is not exactly trivial. In my case, with a 1GB ARC, I used:

echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max
echo 838860800 > /sys/module/zfs/parameters/zfs_arc_meta_limit
echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max
echo 134217728 > /sys/module/zfs/parameters/l2arc_write_boost
echo 4 > /sys/module/zfs/parameters/l2arc_headroom
echo 16 > /sys/module/zfs/parameters/l2arc_headroom_boost
echo 0 > /sys/module/zfs/parameters/l2arc_norw
echo 1 > /sys/module/zfs/parameters/l2arc_feed_again
echo 5 > /sys/module/zfs/parameters/l2arc_feed_min_ms
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch

I then ran ‘cat /var/lib/mysql/data/sbtest/* > /dev/null’ to force filesystem reads and caches on all of the tables. A key setting here to allow the L2ARC to cache data is zfs_arc_meta_limit. It needs to be slightly smaller than zfs_arc_max in order to allow some data to be cached in the ARC. Remember that the L2ARC is fed by the LRU of the ARC: you need data cached in the ARC in order to have data cached in the L2ARC. Using lz4 in ZFS on the sysbench dataset results in a compression ratio of only 1.28x. A more realistic dataset would compress by more than 2x, if not 3x. Nevertheless, since the content of the L2ARC is compressed, the 475GB device caches nearly 600GB of the dataset. The figure below shows the sysbench results with the L2ARC enabled:

Now, the comparison is very different. ZFS completely outperforms XFS, 5000 qps for ZFS versus 3000 for XFS. The ZFS results could have been even higher but the two vCPUs of the instance were clearly the bottleneck. Properly configured, ZFS can be pretty fast. Of course, I could use flashcache or bcache with XFS and improve the XFS results but these technologies are way more exotic than the ZFS L2ARC. Also, only the L2ARC stores data in a compressed form, maximizing the use of the NVMe device. Compression also lowers the size requirement and cost for the gp2 disk.

ZFS is much more complex than XFS and EXT4, but that also means it has more tunables/options. I used a simplistic setup and an unfair benchmark which initially led to poor ZFS results. With the same benchmark, very favorable to XFS, I added a ZFS L2ARC and that completely reversed the situation, more than tripling the ZFS results, now 66% above XFS.

Conclusion

We have seen in this post why the general perception is that ZFS under-performs compared to XFS or EXT4. The presence of B-trees for the files has a big impact on the amount of metadata ZFS needs to handle, especially when the recordsize is small. The metadata consists mostly of the non-leaf pages (or internal nodes) of the B-trees. When properly cached, the performance of ZFS is excellent. ZFS allows you to optimize the use of EBS volumes, both in terms of IOPS and size, when the instance has fast ephemeral storage devices. Using the ephemeral device of an i3.large instance for the ZFS L2ARC, ZFS outperformed XFS by 66%.

The post About ZFS Performance appeared first on Percona Database Performance Blog.

by Yves Trudeau at May 15, 2018 06:59 PM

Jean-Jerome Schmidt

Updated: Become a ClusterControl DBA: Safeguarding your Data

In the past four posts of the blog series, we covered deployment of clustering/replication (MySQL/Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance and health monitoring and, in the last post, how to make your setup highly available through HAProxy and ProxySQL.

So now that you have your databases up and running and highly available, how do you ensure that you have backups of your data?

You can use backups for multiple things: disaster recovery, to provide production data to test against in development, or even to provision a slave node. This last case is already covered by ClusterControl. When you add a new (replica) node to your replication setup, ClusterControl will make a backup/snapshot of the master node and use it to build the replica. It can also use an existing backup to stage the replica, in case you want to avoid that extra load on the master. After the backup has been extracted and prepared, and the database is up and running, ClusterControl will automatically set up replication.

Creating an Instant Backup

In essence, creating a backup is the same for Galera, MySQL replication, PostgreSQL and MongoDB. You can find the backup section under ClusterControl > Backup, and by default you would see a list of created backups for the cluster (if any). Otherwise, you would see a placeholder to create a backup:

From here you can click on the "Create Backup" button to make an instant backup or schedule a new backup:

All created backups can also be uploaded to the cloud by toggling "Upload Backup to the Cloud", provided you supply working cloud credentials. By default, all backups older than 31 days will be deleted (configurable via the Backup Retention settings) or you can choose to keep them forever or define a custom period.

"Create Backup" and "Schedule Backup" share similar options, except for the scheduling part and the incremental backup options for the latter. Therefore, we are going to look into the Create Backup feature (a.k.a. instant backup) in more depth.

As all these various databases have different backup tools, there is obviously some difference in the options you can choose. For instance, with MySQL you get to choose between mysqldump and xtrabackup (full and incremental). For MongoDB, ClusterControl supports mongodump and mongodb-consistent-backup (beta), while for PostgreSQL, pg_dump and pg_basebackup are supported. If in doubt which one to choose for MySQL, check out this blog about the differences and use cases for mysqldump and xtrabackup.

Backing up MySQL and Galera

As mentioned in the previous paragraph, you can make MySQL backups using either mysqldump or xtrabackup (full or incremental). In the "Create Backup" wizard, you can choose which host you want to run the backup on, the location and directory where you want to store the backup files, and specific schemas (xtrabackup) or schemas and tables (mysqldump).

If the node you are backing up is receiving (production) traffic, and you are afraid the extra disk writes will become intrusive, it is advised to send the backups to the ClusterControl host by choosing the "Store on Controller" option. This will cause the backup to stream the files over the network to the ClusterControl host, so you have to make sure there is enough space available on that node and that the streaming port is open on the ClusterControl host.

There are also several other options, such as whether you want to use compression, and the compression level. The higher the compression level, the smaller the backup size will be. However, it requires higher CPU usage for the compression and decompression process.

If you choose xtrabackup as the backup method, extra options open up: desync, backup locks, compression and xtrabackup parallel threads/gzip. The desync option is only applicable for desyncing a node from a Galera cluster. Backup locks use a new MDL lock type to block updates to non-transactional tables and DDL statements for all tables, which is more efficient for InnoDB-specific workloads. If you are running a Galera Cluster, enabling this option is recommended.

After scheduling an instant backup, you can keep track of the progress of the backup job under Activity > Jobs:

After it has finished, you should be able to see a new entry in the backup list.

Backing up PostgreSQL

Similar to the instant backups of MySQL, you can run a backup on your Postgres database. With Postgres backups there are two backup methods supported: pg_dumpall or pg_basebackup. Take note that ClusterControl will always perform a full backup regardless of the chosen backup method.

We have covered this aspect in detail in Become a PostgreSQL DBA - Logical & Physical PostgreSQL Backups.

Backing up MongoDB

For MongoDB, ClusterControl supports the standard mongodump and mongodb-consistent-backup developed by Percona. The latter is still in beta version which provides cluster-consistent point-in-time backups of MongoDB suitable for sharded cluster setups. As the sharded MongoDB cluster consists of multiple replica sets, a config replica set and shard servers, it is very difficult to make a consistent backup using only mongodump.

Note that in the wizard, you don't have to pick a database node to be backed up. ClusterControl will automatically pick the healthiest secondary replica as the backup node. Otherwise, the primary will be selected. When the backup is running, the selected backup node will be locked until the backup process completes.

Scheduling Backups

Now that we have played around with creating instant backups, we can extend that by scheduling backups.

The scheduling is very easy to do: you can select on which days the backup has to be made and at what time it needs to run.

For xtrabackup there is an additional feature: incremental backups. An incremental backup will only back up the data that changed since the last backup. Of course, incremental backups are useless without a full backup as a starting point. Between two full backups, you can have as many incremental backups as you like, but restoring them will take longer.

Once scheduled, the job(s) should become visible under the "Scheduled Backup" tab and you can edit them by clicking on the "Edit" button. Like the instant backups, these jobs will schedule the creation of a backup and you can keep track of the progress via the Activity tab.

Backup List

You can find the Backup List under ClusterControl > Backup; it gives you a cluster-level overview of all backups made. Clicking on an entry will expand the row and expose more information about the backup:

Each backup is accompanied by a backup log from when ClusterControl executed the job, available under the "More Actions" button.

Offsite Backup in Cloud

Since we now have a lot of backups stored on either the database hosts or the ClusterControl host, we also want to ensure they don’t get lost in case we face a total infrastructure outage (e.g., a DC on fire or flooded). Therefore ClusterControl allows you to store or copy your backups offsite in the cloud. The supported cloud platforms are Amazon S3, Google Cloud Storage and Azure Cloud Storage.

The upload process happens right after the backup is successfully created (if you toggle "Upload Backup to the Cloud") or you can manually click on the cloud icon button of the backup list:

Choose the cloud credential and specify the backup location accordingly:

Restore and/or Verify Backup

From the Backup List interface, you can directly restore a backup to a host in the cluster by clicking on the "Restore" button for the particular backup or click on the "Restore Backup" button:

One nice feature is that ClusterControl is able to restore a node or cluster using full and incremental backups, as it keeps track of the last full backup made and starts the incremental backups from there. It then groups a full backup together with all the incremental backups up to the next full backup. This allows you to restore starting from the full backup and apply the incremental backups on top of it.

ClusterControl supports restore on an existing database node or restore and verify on a new standalone host:

These two options are pretty similar, except that the verify option has extra fields for the new host information. If you follow the restoration wizard, you will need to specify a new host. If "Install Database Software" is enabled, ClusterControl will remove any existing MySQL installation on the target host and reinstall the database software with the same version as the existing MySQL server.

Once the backup is restored and verified, you will receive a notification on the restoration status and the node will be shut down automatically.

Point-in-Time Recovery

For MySQL, both xtrabackup and mysqldump can be used to perform point-in-time recovery, and also to provision a new replication slave for master-slave replication or Galera Cluster. A PITR-compatible mysqldump backup contains one single dump file with GTID info, binlog file and position. Thus, only database nodes that produce binary logs will have the "PITR compatible" option available:

When the PITR compatible option is toggled, the database and table fields are greyed out, since ClusterControl will always perform a full backup of all databases, events, triggers and routines of the target MySQL server.

Now for restoring the backup. If the backup is compatible with PITR, an option will be presented to perform a Point-In-Time Recovery. You will have two options for that - "Time Based" and "Position Based". For "Time Based", you can just pass the day and time. For "Position Based", you pass the exact binlog position to restore to. It is a more precise way to restore, although you might need to look up the binlog position using the mysqlbinlog utility. More details about point-in-time recovery can be found in this blog.
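
For example, to narrow down a position, you can dump the relevant time window of a binary log with mysqlbinlog (the file name and times here are examples); every event header includes an end_log_pos value that can be used as the restore position:

mysqlbinlog --start-datetime="2018-05-15 10:00:00" \
            --stop-datetime="2018-05-15 10:05:00" \
            /var/lib/mysql/binlog.000007 | less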

Backup Encryption

ClusterControl supports backup encryption for MySQL, MongoDB and PostgreSQL alike. Backups are encrypted at rest using the AES-256-CBC algorithm. An auto-generated key is stored in the cluster's configuration file under /etc/cmon.d/cmon_X.cnf (where X is the cluster ID):

$ sudo grep backup_encryption_key /etc/cmon.d/cmon_1.cnf
backup_encryption_key='JevKc23MUIsiWLf2gJWq/IQ1BssGSM9wdVLb+gRGUv0='

If the backup destination is not local, the backup files are transferred in encrypted format. This feature complements the offsite backups in the cloud, where we do not have full access to the underlying storage system.
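
Should you ever need to read an encrypted backup outside of ClusterControl, something along these lines should work. Treat this as a hypothetical sketch: it assumes the key from the cmon configuration is used as an OpenSSL passphrase and the file name is an example, so verify the details against the documentation for your version:

key=$(sudo grep backup_encryption_key /etc/cmon.d/cmon_1.cnf | cut -d"'" -f2)
openssl enc -d -aes-256-cbc -pass pass:"$key" \
    -in backup-full-1.xbstream.gz.aes -out backup-full-1.xbstream.gz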

Final Thoughts

We showed you how to get your data backed up and how to store it safely offsite. Recovery is always a different matter, but ClusterControl can automatically recover your databases from backups made in the past, whether stored on premises or copied back from the cloud.

Obviously there is more to securing your data, especially on the side of securing your connections. We will cover this in the next blog post!

by ashraf at May 15, 2018 05:33 AM

May 14, 2018

Peter Zaitsev

Installing MySQL 8.0 on Ubuntu 16.04 LTS in Five Minutes

Do you want to install MySQL 8.0 on Ubuntu 16.04 LTS? In this quick tutorial, I show you exactly how to do it in five minutes or less.

This tutorial assumes you don’t have MySQL or MariaDB installed. If you do, it’s necessary to uninstall them or follow a slightly more complicated upgrade process (not covered here).

Step 1: Install MySQL APT Repository

Ubuntu 16.04 LTS, also known as Xenial, comes with a choice of MySQL 5.7 and MariaDB 10.0.

If you want to use MySQL 8.0, you need to install the MySQL/Oracle Apt repository first:

wget https://dev.mysql.com/get/mysql-apt-config_0.8.10-1_all.deb
dpkg -i mysql-apt-config_0.8.10-1_all.deb

The MySQL APT repository installation package allows you to pick which MySQL version you want to install, as well as whether you want access to Preview Versions. Let's leave them all as default:

Installing MySQL 8.0 on Ubuntu

Step 2: Update repository configuration and install MySQL Server

apt-get update
apt-get install mysql-server

Note: Do not forget to run "apt-get update", otherwise you may get an old version of MySQL installed from the Ubuntu repository.

The installation process asks you to set a password for the root user:

Installing MySQL 8.0 on Ubuntu 1

I recommend you set a root password for increased security. If you do not set a password for the root account, “auth_socket” authentication is enabled. This ensures only the operating system’s “root” user can connect to MySQL Server without a password.

Next, the installation script asks you whether to use Strong Password Encryption or Legacy Authentication:

Installing MySQL 8.0 on Ubuntu 2

While using Strong Password Encryption is recommended for security purposes, not all applications and drivers support this new authentication method yet. Going with Legacy Authentication is therefore the safer choice.

All Done

You should have MySQL 8.0 Server running. You can test it by connecting to it with a command line client:

Installing MySQL 8.0 on Ubuntu 3
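
If you prefer the command line over a screenshot, a quick verification could look like this (using the root password you set during installation):

mysql -u root -p -e "SELECT VERSION();"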

As you can see, it takes just a few simple steps to install MySQL 8.0 on Ubuntu 16.04 LTS. Go ahead, give it a try!

The post Installing MySQL 8.0 on Ubuntu 16.04 LTS in Five Minutes appeared first on Percona Database Performance Blog.

by Peter Zaitsev at May 14, 2018 05:27 PM

Shlomi Noach

MySQL master discovery methods, part 5: Service discovery & Proxy

This is the fifth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via Service discovery and Proxy

Part 4 presented an anti-pattern setup, where a proxy would infer the identity of the master by drawing conclusions from backend server checks. This led to split brains and undesired scenarios. The problem was the loss of context.

We re-introduce a service discovery component (illustrated in part 3), such that:

  • The app does not own the discovery, and
  • The proxy behaves in an expected and consistent way.

In a failover/service discovery/proxy setup, there is clear ownership of duties:

  • The failover tool owns the failover itself and the notification of a master identity change.
  • The service discovery component is the source of truth as to the identity of the master of a cluster.
  • The proxy routes traffic but does not make routing decisions.
  • The app only ever connects to a single target, but should allow for a brief outage while failover takes place.

Depending on the technologies used, we can further achieve:

  • A hard cut for connections to the old, demoted master M.
  • Blocking/holding off incoming queries for the duration of the failover.

We explain the setup using the following assumptions and scenarios:

  • All clients connect to master via cluster1-writer.example.net, which resolves to a proxy box.
  • We fail over from master M to promoted replica R.

A non-planned failover illustration #1

Master M has died; the box had a power failure. R gets promoted in its place. Our recovery tool:

  • Updates service discovery component that R is the new master for cluster1.

The proxy:

  • Either actively or passively learns that R is the new master, rewires all writes to go to R.
  • If possible, kills existing connections to M.

The app:

  • Needs to know nothing. Its connections to M fail, it reconnects and gets through to R.

A non-planned failover illustration #2

Master M gets network isolated for 10 seconds, during which time we fail over. R gets promoted.

Everything is as before.

If the proxy kills existing connections to M, then the fact that M is back alive becomes meaningless: no one gets through to M. Clients were never aware of its identity anyhow, just as they are unaware of R's identity.

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R.

  • In the process of promotion, M turned read-only.
  • Immediately following promotion, our failover tool updates service discovery.
  • Proxy reloads having seen the changes in service discovery.
  • Our app connects to R.

Discussion

This is a setup we use at GitHub in production. Our components are:

  • orchestrator as the failover tool.
  • Consul for service discovery.
  • GLB (HAProxy) as the proxy.
  • Consul template running on the proxy hosts (see the sketch below), which:
    • Listens for changes to Consul's KV data
    • Regenerates the haproxy.cfg configuration file
    • Reloads haproxy
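
A minimal consul-template setup on a proxy host might look like the following. This is a sketch, not our exact production configuration: it assumes orchestrator's KVClusterMasterPrefix of mysql/master along with the hostname/port sub-keys orchestrator writes, and the file paths are examples.

# /etc/consul-template.d/haproxy.hcl
template {
  source      = "/etc/haproxy/haproxy.cfg.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command     = "systemctl reload haproxy"
}

# In haproxy.cfg.ctmpl, the writer backend resolves the master from Consul KV:
backend mysql_writer
    server master {{ key "mysql/master/cluster1/hostname" }}:{{ key "mysql/master/cluster1/port" }} check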

As mentioned earlier, the apps need not change anything. They connect to a name that always resolves to proxy boxes. There is never a DNS change.

At the time of failover, the service discovery component must be up and available to catch the change. Other than that, we do not strictly require it to be up at all times.

For high availability we will have multiple proxies, each of which must listen for changes to the K/V store. Ideally the name (cluster1-writer.example.net in our example) resolves to any available proxy box.

  • This, in itself, is a high availability issue. Thankfully, managing the HA of a proxy layer is simpler than that of a MySQL layer. Proxy servers tend to be stateless and equal to each other.
  • See GLB as one example for a highly available proxy layer. Cloud providers, Kubernetes, two level layered proxies, Linux Heartbeat, are all methods to similarly achieve HA.

Sample orchestrator configuration

An orchestrator configuration would look like this:

  "ApplyMySQLPromotionAfterMasterFailover": true,
  "KVClusterMasterPrefix": "mysql/master",
  "ConsulAddress": "127.0.0.1:8500",
  "ZkAddress": "srv-a,srv-b:12181,srv-c",
  "PostMasterFailoverProcesses": [
    "/just/let/me/know about failover on {failureCluster}",
  ],

In the above:

  • If ConsulAddress is specified, orchestrator will update given Consul setup with K/V changes.
  • As of orchestrator 3.0.10, ZooKeeper (via ZkAddress) is not yet supported.
  • PostMasterFailoverProcesses is here just to point out hooks are not strictly required for the operation to run.

See orchestrator configuration documentation.

All posts in this series

by shlomi at May 14, 2018 08:08 AM

May 12, 2018

MariaDB AB

Streaming Data From MariaDB Server Into MariaDB ColumnStore via MariaDB MaxScale

In this blog post, we look at how to configure Change Data Capture (CDC) from the MariaDB Server to
MariaDB ColumnStore via MariaDB MaxScale. Our goal in this blog post is to have our analytical
ColumnStore instance reflect the changes that happen on our operational MariaDB Server.

MariaDB MaxScale Configuration

We start by creating a MaxScale configuration with binlogrouter and avrorouter instances. The former acts as a replication slave and fetches binary logs, and the latter processes the binary logs into CDC records.

[replication-router]
type=service
router=binlogrouter
user=maxuser
passwd=maxpwd
server_id=2
master_id=1
binlogdir=/var/lib/maxscale
mariadb10-compatibility=1
filestem=mariadb-bin

[replication-listener]
type=listener
service=replication-router
protocol=MySQLClient
port=3306

[avro-router]
type=service
router=avrorouter
source=replication-router
avrodir=/var/lib/maxscale

[avro-listener]
type=listener
service=avro-router
protocol=cdc
port=4001

Copy the contents of this file into the `maxscale.cnf` file.

The docker-compose.yml File

The next step is to clone the MaxScale repository and to create the docker-compose file.

To clone the MaxScale repository, execute the following command.

git clone https://github.com/mariadb-corporation/MaxScale.git --branch=2.2 --depth=1

After the command completes, create the `docker-compose.yml` file with the following contents in the
same directory where you cloned MaxScale.
 

version: '2'
services:
    master:
        image: mariadb:10.2
        container_name: master
        environment:
            MYSQL_ALLOW_EMPTY_PASSWORD: Y
        command: mysqld --log-bin=mariadb-bin --binlog-format=ROW --server-id=1
        ports:
            - "3306:3306"

    maxscale:
        build: ./MaxScale/docker/
        container_name: maxscale
        volumes:
            - ./maxscale.cnf:/etc/maxscale.cnf.d/maxscale.cnf
        ports:
            - "3307:3306"
            - "4001:4001"

    mcs:
        image: mariadb/columnstore_singlenode:latest
        container_name: mcs
        ports:
            - "3308:3306"

    adapter:
        image: centos:7
        container_name: adapter
        command: /bin/sleep 0xffffffff

This file contains a MariaDB Server that acts as the master server, a MaxScale instance in a CDC
configuration and a single-node ColumnStore container. We also use a plain CentOS 7 container where
we install the adapter.

To start the cluster, run the following commands.

docker-compose build
docker-compose up -d

Configuring

The next step is to copy the ColumnStore configuration file from the `mcs` container and modify it
to use the container hostname instead of the loopback address. To do this, execute the following
commands.

docker cp mcs:/usr/local/mariadb/columnstore/etc/Columnstore.xml .
sed -i 's/127.0.0.1/mcs/' Columnstore.xml
docker cp Columnstore.xml adapter:/etc/Columnstore.xml

After we have copied the configuration file into the `adapter` container, we are ready to install the adapter.

Installing Adapter

To access the container, execute `docker-compose exec adapter bash`. This will launch a new shell
where the following commands will be executed.

yum -y install epel-release
yum -y install https://downloads.mariadb.com/Data-Adapters/mariadb-columnstore-api/1.1.3/centos/x86_64/7/mariadb-columnstore-api-1.1.3-1-x86_64-centos7.rpm
curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
yum -y install https://downloads.mariadb.com/Data-Adapters/mariadb-streaming-data-adapters/cdc-data-adapter/1.1.3/centos-7/mariadb-columnstore-maxscale-cdc-adapters-1.1.3-1-x86_64-centos7.rpm

After the adapter is installed, exit the shell.

Next we can start preparing the data on the master server and configure the replication between it
and MaxScale.

Preparing Data and Configuring Replication

We connect to the MariaDB Server running on the `master` container with the following command.

mysql -uroot -h 127.0.0.1 -P 3306

Once connected, execute the following SQL. This will prepare the server, create a table and insert some dummy data into the table. It also modifies the data to emulate changes in the database.

RESET MASTER;
CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxpwd';
GRANT ALL ON *.* TO 'maxuser'@'%';
CREATE DATABASE test;
USE test;
CREATE TABLE t1(id INT);
INSERT INTO t1 VALUES (1), (2), (3);
UPDATE t1 SET id = 4 WHERE id = 2;
DELETE FROM t1 WHERE id = 3;

Once we have created some data, we configure the replication between MaxScale and the master
server. To do this, execute the following command.

mysql -umaxuser -pmaxpwd -h 127.0.0.1 -P 3307 -e "CHANGE MASTER TO MASTER_HOST='master', MASTER_PORT=3306, MASTER_USER='maxuser', MASTER_PASSWORD='maxpwd', MASTER_LOG_FILE='mariadb-bin.000001', MASTER_LOG_POS=4; START SLAVE"

MaxScale will start to replicate events from the master server and process them into CDC records.
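
To verify that replication towards MaxScale is running, you can check the service state with maxctrl (a quick sanity check; the output format varies between versions):

docker-compose exec maxscale maxctrl list services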

Create CDC User

To use the CDC system in MaxScale, we have to create a user for it. Execute the following command to create a user.

docker-compose exec maxscale maxctrl call command cdc add_user avro-router cdcuser cdcpassword

Starting the Adapter

We again execute the commands inside the adapter container. To access the container, execute
`docker-compose exec adapter bash`.

Once inside the container, we can try to start the adapter. Given that the table `test.t1` does not
exist on ColumnStore, the adapter will give us an error when we try to start it:
 

[root@d444d5c5b820 /]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h maxscale -P 4001 test t1
Table not found, create with:

    CREATE TABLE test.t1 (domain int, event_number int, event_type varchar(50), id int, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;

To create the table on ColumnStore, we have to exit the container. Once out of the container, we
connect to the ColumnStore container and create the table described in the error message with the
following command.

mysql -uroot -h 127.0.0.1 -P 3308 -e "CREATE TABLE test.t1 (domain int, event_number int, event_type varchar(50), id int, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;"

Once the table is created, we go back into the adapter container with `docker-compose exec adapter
bash` and try to start it again.

[root@d444d5c5b820 /]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h maxscale -P 4001 test t1
4 rows and 1 transactions inserted in 0.210798 seconds. GTID = 0-1-6
2 rows and 1 transactions inserted in 0.164197 seconds. GTID = 0-1-7

This time we see that it processed a total of six rows of data. We can now connect to ColumnStore in another terminal and see what the table contains.

[markusjm@localhost blog]$ mysql -uroot -h 127.0.0.1 -P 3308 -e "SELECT * FROM test.t1"
+--------+--------------+---------------+------+----------+-----------+------------+
| domain | event_number | event_type    | id   | sequence | server_id | timestamp  |
+--------+--------------+---------------+------+----------+-----------+------------+
|      0 |            1 | insert        |    1 |        5 |         1 | 1523948280 |
|      0 |            2 | insert        |    2 |        5 |         1 | 1523948280 |
|      0 |            3 | insert        |    3 |        5 |         1 | 1523948280 |
|      0 |            1 | update_before |    2 |        6 |         1 | 1523948280 |
|      0 |            2 | update_after  |    4 |        6 |         1 | 1523948280 |
|      0 |            1 | delete        |    3 |        7 |         1 | 1523948281 |
+--------+--------------+---------------+------+----------+-----------+------------+

The changes we did on the master MariaDB Server have been propagated to ColumnStore. To understand
what the values are, we can map the SQL statements to the rows in the table.

The first SQL statement is `INSERT INTO t1 VALUES (1), (2), (3);` which inserts three values into
the table. We see that the first three rows in the resultset are of type `insert` and the values
match what we inserted.

The next SQL statement is `UPDATE t1 SET id = 4 WHERE id = 2;` which only touches one row. Although it modifies only one row in the database, it generates two rows in ColumnStore. This happens because the MaxScale CDC system stores both the before and after images of the modified row, which allows easy comparison between old and new values.

The final SQL statement was `DELETE FROM t1 WHERE id = 3;` which deleted one row. This statement was
converted to a delete entry with the data that was deleted (row with `id` of 3). This allows deleted
data to be retained for analytical and auditing purposes without actually storing it on the master
database.

by markusmakela at May 12, 2018 03:04 AM

May 11, 2018

Peter Zaitsev

This Week in Data with Colin Charles 39: a valuable time spent at rootconf.in

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

rootconf.in 2018 just ended, and it was very enjoyable to be in Bangalore for the conference. The audience was large, the conversations were great, and overall I think this is a rather important conference if you’re into the “DevOps” movement (or are a sysadmin!). From the data store world, Oracle MySQL was a sponsor, as was MyDBOPS (blog), and Elastic. There were plenty more, including Digital Ocean/GoJek/Walmart Labs — many MySQL users.

I took a handful of pictures with people, and here are some of the MyDBOPS team and myself.  They have over 20 employees, and serve the Indian market at rates that would be more palatable than straight up USD rates. Traveling through Asia, many businesses always do find local partners and offer local pricing; this really becomes more complex in the SaaS space (everyone pays the same rate generally) and also the services space.

Colin at Rootconf with Oracle
Some of the Oracle MySQL team who were exhibiting were very happy they got a good amount of traffic to the booth based on stuff discussed at the talk and BOF.

From a talk standpoint, I did a keynote for an hour and also a BoF session for another hour (great discussion, lots of blog post ideas from there), and we had a Q&A session for about 15 minutes. There were plenty of good conversations in the hallway track.

A quick observation that I notice happens everywhere: many people don't realize features have existed in MySQL since 5.6/5.7, so they are truly surprised by stuff in 8.0 as well. It is clear there is a huge market that would thrive around education. Not just around feature checklists, but also around how to use features. Sometimes this feels like the MySQL of the mid-2000s: getting apps to also use new features would be a great thing.

Releases

This seems to have been a quiet week on the releases front.

Are you a user of Amazon Aurora MySQL? There is now the Amazon Aurora Backtrack feature, which allows you to go back in time. It is described to work as:

Aurora uses a distributed, log-structured storage system (read Design Considerations for High Throughput Cloud-Native Relational Databases to learn a lot more); each change to your database generates a new log record, identified by a Log Sequence Number (LSN). Enabling the backtrack feature provisions a FIFO buffer in the cluster for storage of LSNs. This allows for quick access and recovery times measured in seconds.

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 

The post This Week in Data with Colin Charles 39: a valuable time spent at rootconf.in appeared first on Percona Database Performance Blog.

by Colin Charles at May 11, 2018 03:42 PM

Jean-Jerome Schmidt

My MySQL Database is Corrupted... What Do I Do Now?

How do MySQL tables get corrupted? There are many ways to spoil data files. Often, corruption is due to defects in the underlying platform which MySQL relies on to store and retrieve data: disk subsystems, controllers, communication channels, drivers, firmware or other hardware faults. Data corruption can also occur if the MySQL server daemon restarts suddenly, or your server reboots due to a crash of other OS components. If the database instance was in the middle of writing data to disk, it could write the data partially, ending up with a page checksum different from what was expected. There have also been bugs in MySQL, so even if the server hardware is ok, MySQL itself can cause corruption.

Usually when MySQL data gets corrupted the recommendation is to restore it from the last backup, switch to a DR server, or take down the affected node if you have a Galera cluster, to serve data immediately from other nodes. In some cases you can't - if the backup is not there, the cluster was never set up, your replication has been down for a very long time, or the DR procedure was never tested. Even if you have a backup, you may still want to take some actions to attempt recovery, as it may take less time to get back online.

MyISAM, the bad and ugly

InnoDB is more fault-tolerant than MyISAM. It has auto-recovery features and is much safer than the older MyISAM engine.

MyISAM tables can easily get corrupted when a table sees lots of writes and lots of locks. The storage engine "writes" data to the filesystem cache, which may take some time before it is flushed to disk. Therefore, if your server restarts suddenly, some unknown amount of data in the cache is lost. That's a usual way for MyISAM data to be corrupted. The recommendation is to migrate from MyISAM to InnoDB, but there may be cases where this is not possible.

Primum non nocere, the backup

Before you attempt to repair corrupted tables, you should back up your database files first. Yes, it's already broken, but this is to minimize the risk of possible further damage caused by a recovery operation. There is no guarantee that any action you take will not harm untouched data blocks. Forcing InnoDB recovery with values greater than 4 can corrupt data files, so make sure you do it with a prior backup and ideally on a separate physical copy of the database.

To back up all of the files from all of your databases, follow these steps:

Stop the MySQL server

service mysqld stop

Copy your datadir with the following command.

cp -r /var/lib/mysql /var/lib/mysql_bkp

After we have a backup copy of the data directory, we are ready to start troubleshooting.

Data corruption identification

The error log is your best friend. Usually, when data corruption happens, you will find relevant information (including links to documentation) in the error log. If you don't know where it's located, check my.cnf for the log_error variable, or see this article for more details: https://dev.mysql.com/doc/refman/8.0/en/error-log-destination-configuration.html. You should also know your storage engine type. You can find this information in the error log or in information_schema.

mysql> select table_name,engine from information_schema.tables where table_name = '<TABLE>' and table_schema = '<DATABASE>';

The main tools/commands to diagnose issues with data corruption are CHECK TABLE, REPAIR TABLE, and myisamchk. The mysqlcheck client performs table maintenance: It checks, repairs (MyISAM), optimizes or analyzes tables while MySQL is running.

mysqlcheck -uroot -p <DATABASE>

Replace DATABASE with the name of the database, and replace TABLE with the name of the table that you want to check:

mysqlcheck -uroot -p <DATABASE> <TABLE>

Mysqlcheck checks the specified database and tables. If a table passes the check, mysqlcheck displays OK for the table.

employees.departments                              OK
employees.dept_emp                                 OK
employees.dept_manager                             OK
employees.employees                                OK
Employees.salaries
Warning  : Tablespace is missing for table 'employees/salaries'
Error    : Table 'employees.salaries' doesn't exist in engine
status   : Operation failed
employees.titles                                   OK
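
The same checks can also be run per table from the MySQL client; for example (using the employees sample database from above):

CHECK TABLE employees.salaries;
-- For MyISAM (not InnoDB) tables, a repair can also be attempted in place:
REPAIR TABLE employees.salaries;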

Data corruption issues may also be related to permissions. In some cases, the OS can switch a mount point to read-only mode due to R/W issues, or ownership of the data files may have been accidentally changed by a user. In such cases, you will find relevant information in the error log.

[root@node1 employees]# ls -rtla
...
-rw-rw----. 1 mysql mysql  28311552 05-10 06:24 titles.ibd
-rw-r-----. 1 root  root  109051904 05-10 07:09 salaries.ibd
drwxr-xr-x. 7 mysql mysql      4096 05-10 07:12 ..
drwx------. 2 mysql mysql      4096 05-10 07:17 .

MySQL Client

MariaDB [employees]> select count(*) from salaries;
ERROR 1932 (42S02): Table 'employees.salaries' doesn't exist in engine

Error log entry

2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Failed to find tablespace for table `employees`.`salaries` in the cache. Attempting to load the tablespace with space id 9
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Operating system error number 13 in a file operation.
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: The error means mysqld does not have the access rights to the directory.
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Cannot open datafile for read-only: './employees/salaries.ibd' OS error: 81
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Operating system error number 13 in a file operation.
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: The error means mysqld does not have the access rights to the directory.
2018-05-10  9:15:38 140703666226944 [ERROR] InnoDB: Could not find a valid tablespace file for `employees/salaries`. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
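
In a case like the above, the typical fix is to restore ownership to the mysql user and restart (adjust the path to match your datadir and the affected files):

chown mysql:mysql /var/lib/mysql/employees/salaries.ibd
service mysqld restart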

Recovering InnoDB table

If you are using the InnoDB storage engine for a database table, you can run the InnoDB recovery process. To enable auto recovery, MySQL needs the innodb_force_recovery option to be set. innodb_force_recovery forces InnoDB to start up while preventing background operations from running, so that you can dump your tables.

To do this, open my.cnf and add the following line to the [mysqld] section:

[mysqld]
innodb_force_recovery=1

Start with innodb_force_recovery=1, save the changes to the my.cnf file, and then restart the MySQL server using the appropriate command for your operating system:

service mysql restart

If you are able to dump your tables with an innodb_force_recovery value of 3 or less, then you are relatively safe. In many cases you will have to go up to 4, and as you already know, that can corrupt data.

If needed, increase the value; six is the maximum and the most dangerous.

Once you are able to start your database, type the following command to export all of the databases to the dump.sql file:

mysqldump --all-databases --add-drop-database --add-drop-table > dump.sql

Start mysql, and then try to drop the affected database or databases using the DROP DATABASE command. If MySQL is unable to drop a database, you can delete it manually using the steps below after you stop the MySQL server.

service mysqld stop

If you were unable to drop a database, type the following commands to delete it manually.

cd /var/lib/mysql
rm -rf <DATABASE>

Make sure you do not delete the internal database directories. After you are done, comment out the following line in the [mysqld] section to disable InnoDB recovery mode.

#innodb_force_recovery=...

Save the changes to the my.cnf file, and then start the MySQL server

service mysqld start

Type the following commands to restore the databases from the dump file you created earlier:

mysql> tee import_database.log
mysql> source dump.sql

Repairing MyISAM

If mysqlcheck reports an error for a table, run the mysqlcheck command with the --repair (-r) flag to fix it. The mysqlcheck repair option works while the server is up and running.

mysqlcheck -uroot -p -r <DATABASE> <TABLE>

If the server is down, or if for any reason mysqlcheck cannot repair your table, you still have the option to perform recovery directly on the files using myisamchk. With myisamchk, you need to make sure that the server doesn't have the tables open.

Stop the MySQL server:

service mysqld stop

Change to the directory where the database is located, and check a single table:

cd /var/lib/mysql/employees
myisamchk <TABLE>

To check all of the tables in a database, type the following command:

myisamchk *.MYI

If the previous command does not work, you can try deleting temporary files that may be preventing myisamchk from running correctly. To do this, change back to the data directory and run the following command:

ls */*.TMD

If there are any .TMD files listed, delete them:

rm */*.TMD

Then re-run myisamchk.

To attempt to repair a table, execute the following command, replacing TABLE with the name of the table that you want to repair:

myisamchk --recover <TABLE>

Restart the MySQL server

service mysqld start

How to avoid data loss

There are several things you can do to minimize the risk of unrecoverable data. First of all: backups. The problem with backups is that they can sometimes be overlooked. For cron-scheduled backups, we usually write wrapper scripts that detect problems in the backup log, but that does not cover cases where the backup didn't start at all. Cron can sometimes hang, and often there is no monitoring set on it. Another potential issue is when the backup was never set up in the first place. Good practice is to run reports from a separate tool that analyzes the backup status and informs you about missed backup schedules. You can use ClusterControl for that, or write your own programs.

ClusterControl operational backup report

To reduce the impact of possible data corruption, you should always consider clustered systems. It's just a matter of time before the database crashes or gets corrupted, so it's good to have a copy you can switch to. It could be master/slave replication. The important aspect here is to have safe automatic recovery, to minimize the complexity of the switchover and the recovery time (RTO).

ClusterControl auto recovery features

by Bart Oles at May 11, 2018 09:58 AM

May 10, 2018

Peter Zaitsev

Why We’ve Deprecated MongoRocks in Percona Server for MongoDB 3.6

In this blog post, we’ll look at why we deprecated MongoRocks in Percona Server for MongoDB 3.6, and provide some guidance on switching from MongoRocks to WiredTiger.

On April 24, 2018, Percona announced the availability of Percona Server for MongoDB 3.6. If you read the blog post announcing the release, you probably saw the following note:

“MongoRocks is deprecated in Percona Server for MongoDB 3.6 and it will be fully removed in the next major version of Percona Server for MongoDB.”

Why did we make this decision, and what should you do if you’re using MongoRocks?

There are two significant factors that contributed to our decision to deprecate MongoRocks in Percona Server for MongoDB 3.6:

  1. We’ve seen little commercial interest in MongoRocks over the past two years, and
  2. MongoDB’s deep integration with WiredTiger makes supporting alternative storage engines increasingly difficult.

Little Commercial Interest

We haven’t seen strong demand for MongoRocks from our customers. Whenever we talk to Percona customers and Percona Server for MongoDB users, we receive pretty consistent feedback about some new features they would like to see, but nobody ever mentions storage engines. When we ask specifically which storage engine they use, it’s almost always WiredTiger. MongoRocks rarely comes up.

Deep WiredTiger Integration Makes Alternative Storage Engine Support Increasingly Difficult

MongoDB 3.6 introduced a number of exciting new features, including sessions, retryable writes and causal consistency (both of which are based on the sessions work). And, as was formally announced in February, MongoDB 4.0 will bring multi-document transactions for replica sets. All of these important new features work properly in large part because of significant changes MongoDB, Inc. made to the storage layer of WiredTiger. Additionally, given that the MongoDB server team is planning on deprecating MMAPv1 in MongoDB 4.0, we expect MongoDB, Inc. will continue making fundamental changes to WiredTiger to create new features in MongoDB.

Rearchitecting RocksDB (the storage layer of MongoRocks) to allow it to work properly with the new features in MongoDB 3.6 and with multi-document transactions in MongoDB 4.0 would be a massive undertaking, and we believe more users will be more positively affected if our engineering resources instead work on frequently-requested features for Percona Server for MongoDB.

For those of you who are using MongoRocks with Percona Server for MongoDB, we know this situation isn’t ideal; but we want to make sure you have a database that’s highly performant and reliable and using all of the latest and greatest features, including sessions (and soon, transactions). The best way to achieve that is to migrate from MongoRocks to WiredTiger and upgrade to Percona Server for MongoDB 3.6.

How to Switch to WiredTiger and then Upgrade

If you are using MongoRocks with an earlier version of Percona Server for MongoDB, and you wish to upgrade to Percona Server for MongoDB 3.6, we strongly encourage you to first switch to WiredTiger, then upgrade. For detailed instructions on how to change MongoDB storage engines without downtime, please see this blog post, appropriately titled, “How to Change MongoDB Storage Engines Without Downtime.” You can then follow the steps from the Upgrading to 3.6 section of the Percona Server for MongoDB 3.6 documentation.

We suspect sessions and transactions are just the tip of the iceberg of great new functionality that MongoDB will be able to implement by building deep integrations between the database and WiredTiger. We look forward to seeing what comes next!

The post Why We’ve Deprecated MongoRocks in Percona Server for MongoDB 3.6 appeared first on Percona Database Performance Blog.

by Jeff Sandstrom at May 10, 2018 03:00 PM

Jean-Jerome Schmidt

Deploying Cloud Databases with ClusterControl 1.6

ClusterControl 1.6 comes with tighter integration with AWS, Azure and Google Cloud, so it is now possible to launch new instances and deploy MySQL, MariaDB, MongoDB and PostgreSQL directly from the ClusterControl user interface. In this blog, we will show you how to deploy a cluster on Amazon Web Services.

Note that this new feature requires two modules called clustercontrol-cloud and clustercontrol-clud. The former is a helper daemon which extends CMON's capability to communicate with the cloud, while the latter is a file manager client to upload and download files on cloud instances. Both packages are dependencies of the clustercontrol UI package, which will be installed automatically if they do not exist. See the Components documentation page for details.

Cloud Credentials

ClusterControl allows you to store and manage your cloud credentials under Integrations (side menu) -> Cloud Providers:

The supported cloud platforms in this release are Amazon Web Services, Google Cloud Platform and Microsoft Azure. On this page, you can add new cloud credentials, manage existing ones and also connect to your cloud platform to manage resources.

The credentials that have been set up here can be used to:

  • Manage cloud resources
  • Deploy databases in the cloud
  • Upload backup to cloud storage

The following is what you would see if you clicked on "Manage AWS" button:

You can perform simple management tasks on your cloud instances. You can also check the VPC settings under "AWS VPC" tab, as shown in the following screenshot:

The above features are useful as reference, especially when preparing your cloud instances before you start the database deployments.

Database Deployment on Cloud

In previous versions of ClusterControl, database deployment on cloud would be treated similarly to deployment on standard hosts, where you had to create the cloud instances beforehand and then supply the instance details and credentials in the "Deploy Database Cluster" wizard. The deployment procedure was unaware of any extra functionality and flexibility in the cloud environment, like dynamic IP and hostname allocation, NAT-ed public IP address, storage elasticity, virtual private cloud network configuration and so on.

With version 1.6, you just need to supply the cloud credentials, which can be managed via the "Cloud Providers" interface and follow the "Deploy in the Cloud" deployment wizard. From ClusterControl UI, click Deploy and you will be presented with the following options:

At the moment, the supported cloud providers are the three big players - Amazon Web Service (AWS), Google Cloud and Microsoft Azure. We are going to integrate more providers in the future release.

In the first page, you will be presented with the Cluster Details options:

In this section, you would need to select the supported cluster type, MySQL Galera Cluster, MongoDB Replica Set or PostgreSQL Streaming Replication. The next step is to choose the supported vendor for the selected cluster type. At the moment, the following vendors and versions are supported:

  • MySQL Galera Cluster - Percona XtraDB Cluster 5.7, MariaDB 10.2
  • MongoDB Cluster - MongoDB 3.4 by MongoDB, Inc and Percona Server for MongoDB 3.4 by Percona (replica set only).
  • PostgreSQL Cluster - PostgreSQL 10.0 (streaming replication only).

In the next step, you will be presented with the following dialog:

Here you can configure the selected cluster type accordingly. Pick the number of nodes. The Cluster Name will be used as the instance tag, so you can easily recognize this deployment in your cloud provider dashboard. No space is allowed in the cluster name. My.cnf Template is the template configuration file that ClusterControl will use to deploy the cluster. It must be located under /usr/share/cmon/templates on the ClusterControl host. The rest of the fields are pretty self-explanatory.

The next dialog is to select the cloud credentials:

You can choose the existing cloud credentials or create a new one by clicking on the "Add New Credential" button. The next step is to choose the virtual machine configuration:

Most of the settings in this step are dynamically populated from the cloud provider by the chosen credentials. You can configure the operating system, instance size, VPC setting, storage type and size and also specify the SSH key location on the ClusterControl host. You can also let ClusterControl generate a new key specifically for these instances. When clicking on "Add New" button next to Virtual Private Cloud, you will be presented with a form to create a new VPC:

VPC is a logical network infrastructure you have within your cloud platform. You can configure your VPC by modifying its IP address range, create subnets, configure route tables, network gateways, and security settings. It's recommended to deploy your database infrastructure in this network for isolation, security and routing control.

When creating a new VPC, specify the VPC name and IPv4 address block with subnet. Then, choose whether IPv6 should be part of the network and the tenancy option. You can then use this virtual network for your database infrastructure.

The last step is the deployment summary:

In this stage, you need to choose which subnet under the chosen virtual network that you want the database to be running on. Take note that the chosen subnet MUST have auto-assign public IPv4 address enabled. You can also create a new subnet under this VPC by clicking on "Add New Subnet" button. Verify if everything is correct and hit the "Deploy Cluster" button to start the deployment.

You can then monitor the progress by clicking on the Activity -> Jobs -> Create Cluster -> Full Job Details:

Depending on the connections, it could take 10 to 20 minutes to complete. Once done, you will see a new database cluster listed under the ClusterControl dashboard. For PostgreSQL streaming replication cluster, you might need to know the master and slave IP addresses once the deployment completes. Simply go to Nodes tab and you would see the public and private IP addresses on the node list on the left:

Your database cluster is now deployed and running on AWS.

At the moment, scaling up works similarly to standard hosts: you need to create a cloud instance manually beforehand, then specify the host under ClusterControl -> pick the cluster -> Add Node.

Under the hood, the deployment process does the following:

  1. Create cloud instances
  2. Configure security groups and networking
  3. Verify the SSH connectivity from ClusterControl to all created instances
  4. Deploy database on every instance
  5. Configure the clustering or replication links
  6. Register the deployment into ClusterControl

Take note that this feature is still in beta. Nevertheless, you can use this feature to speed up your development and testing environment by controlling and managing the database cluster in different cloud providers from a single user interface.

Database Backup on Cloud

This feature has been around since ClusterControl 1.5.0, and now we added support for Azure Cloud Storage. This means that you can now upload and download the created backup on all three major cloud providers (AWS, GCP and Azure). The upload process happens right after the backup is successfully created (if you toggle "Upload Backup to the Cloud") or you can manually click on the cloud icon button of the backup list:

You can then download and restore backups from the cloud, in case you lost your local backup storage, or if you need to reduce local disk space usage for your backups.

Current Limitations

There are some known limitations for the cloud deployment feature, as stated below:

  • There is currently no 'accounting' in place for the cloud instances. You will need to manually remove the cloud instances if you remove a database cluster.
  • You cannot add or remove a node automatically with cloud instances.
  • You cannot deploy a load balancer automatically with a cloud instance.

We have extensively tested the feature in many environments and setups but there are always corner cases that we might have missed out upon. For more information, please take a look at the change log.

Happy clustering in the cloud!

by ashraf at May 10, 2018 09:58 AM

MariaDB AB

Moving a MariaDB Database to Encrypted and Unencrypted States

In this blog, we present a way to move an existing database to an encrypted state, and then how to move your database back to an unencrypted state.

In order to use encryption, you need to load a plugin to manage the encryption keys. See the currently supported encryption plugins. Each key consists of a 32-bit integer key identifier (key_id) and the actual key. Keys can be versioned, so that data is re-encrypted from an older key version to a newer one. In this blog, we will use the file key management plugin as an example (see encryption key management). We also assume that you are using the most recent version of MariaDB Server (this blog assumes that MDEV-15566 is fixed, i.e. MariaDB version 10.1.33, 10.2.15 or 10.3.6).

Moving a database to an encrypted or an unencrypted state is done using key rotation. Key rotation moves the database from one encryption state to another; the starting state may be unencrypted (the tablespace carries no encryption), or an encrypted tablespace may be rotated to an unencrypted state. Key rotation can happen periodically (based on the configuration variable innodb-encryption-rotate-key-age, i.e. how old a key can be before it is rotated), when requested by the database administrator (e.g. by issuing set global innodb_encrypt_tables=ON;), or when driven by the encryption key management system (see e.g. rotate keys).

Database administrators need to decide whether it is enough to encrypt only individual tables (see encrypting data for InnoDB) or the whole database, including the system tablespace. Note that table data is also written to the redo log and the undo log; thus, if the database contains tables with very sensitive data, innodb-encrypt-log should also be enabled. In this blog, we show how to encrypt the whole database.
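
For reference, encrypting an individual table explicitly uses table options like these (key id 2 is only an example and must exist in your key file):

CREATE TABLE sensitive_data (id INT PRIMARY KEY, payload TEXT)
  ENGINE=InnoDB ENCRYPTED=YES ENCRYPTION_KEY_ID=2;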

Moving database to encrypted state

Before the database can be moved to an encrypted state, we need to add the encryption plugin configuration to the config file (see the detailed description of the parameters):

# File Key Management
plugin-load-add = file_key_management
file-key-management-filename = /mnt/flash/keys.txt
file-key-management-encryption-algorithm = aes_ctr

# InnoDB encryption setup
innodb-encrypt-tables=ON
innodb-encrypt-log=ON
innodb-encryption-rotate-key-age=1024
innodb-encryption-threads=4
innodb-tablespaces-encryption
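
The key file referenced above contains one key per line in <key_id>;<hex_key> format. A 256-bit key can be generated with openssl; the value below is illustrative only and should never be reused:

$ openssl rand -hex 32
# /mnt/flash/keys.txt
1;a7addd9adea9978fda19f21e6be987880e68ac92632ca052e5db29a07840a1c1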

After the restart, the progress of the encryption operation can be monitored from the INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION table. In the following example, we query the tablespace name, the current page under key rotation, and the maximum page in the tablespace, for those tables that are not yet encrypted:

MariaDB [(none)]> select name, KEY_ROTATION_PAGE_NUMBER, KEY_ROTATION_MAX_PAGE_NUMBER from information_schema.innodb_tablespaces_encryption where min_key_version = 0 or ROTATING_OR_FLUSHING = 1;
+---------------+--------------------------+------------------------------+
| name          | KEY_ROTATION_PAGE_NUMBER | KEY_ROTATION_MAX_PAGE_NUMBER |
+---------------+--------------------------+------------------------------+
| innodb_system |                    17641 |                      1397504 |
+---------------+--------------------------+------------------------------+
1 row in set (0.000 sec)

Naturally, you may also query the status of all tables:

MariaDB [tpcc1000]> select * from information_schema.innodb_tablespaces_encryption;
+-------+-------------------+-------------------+--------------------+-----------------+---------------------+--------------------------+------------------------------+----------------+----------------------+
| SPACE | NAME              | ENCRYPTION_SCHEME | KEYSERVER_REQUESTS | MIN_KEY_VERSION | CURRENT_KEY_VERSION | KEY_ROTATION_PAGE_NUMBER | KEY_ROTATION_MAX_PAGE_NUMBER | CURRENT_KEY_ID | ROTATING_OR_FLUSHING |
+-------+-------------------+-------------------+--------------------+-----------------+---------------------+--------------------------+------------------------------+----------------+----------------------+
|     0 | innodb_system     |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     3 | tpcc1000/customer |                 1 |                  1 |               0 |                   1 |                     2401 |                      1317888 |              1 |                    1 |
+-------+-------------------+-------------------+--------------------+-----------------+---------------------+--------------------------+------------------------------+----------------+----------------------+
2 rows in set (0.000 sec)

From this we can see that the system tablespace is already encrypted, but the table customer from database tpcc1000 is currently being encrypted. If your system has the hardware resources and the encryption process seems slow, you may try the following parameters:

# Set close to number of cores
set global innodb_encryption_threads=16;
# For SSD increase number of I/O operations used for encryption in second
set global innodb_encryption_rotation_iops=40000;

Database encryption is finished when there are no tables in an unencrypted state:

MariaDB [tpcc1000]> select name, KEY_ROTATION_PAGE_NUMBER, KEY_ROTATION_MAX_PAGE_NUMBER from information_schema.innodb_tablespaces_encryption where min_key_version = 0 or ROTATING_OR_FLUSHING = 1;
Empty set (0.001 sec)

And to verify, list all tables that are encrypted:

MariaDB [tpcc1000]> select * from information_schema.innodb_tablespaces_encryption where min_key_version != 0;
+-------+---------------------+-------------------+--------------------+-----------------+---------------------+--------------------------+------------------------------+----------------+----------------------+
| SPACE | NAME                | ENCRYPTION_SCHEME | KEYSERVER_REQUESTS | MIN_KEY_VERSION | CURRENT_KEY_VERSION | KEY_ROTATION_PAGE_NUMBER | KEY_ROTATION_MAX_PAGE_NUMBER | CURRENT_KEY_ID | ROTATING_OR_FLUSHING |
+-------+---------------------+-------------------+--------------------+-----------------+---------------------+--------------------------+------------------------------+----------------+----------------------+
|     0 | innodb_system       |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     3 | tpcc1000/customer   |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     2 | tpcc1000/district   |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     4 | tpcc1000/history    |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     8 | tpcc1000/item       |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     5 | tpcc1000/new_orders |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     7 | tpcc1000/order_line |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     6 | tpcc1000/orders     |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     9 | tpcc1000/stock      |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     1 | tpcc1000/warehouse  |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
+-------+---------------------+-------------------+--------------------+-----------------+---------------------+--------------------------+------------------------------+----------------+----------------------+
10 rows in set (0.000 sec)

As can be seen, all tablespaces use ENCRYPTION_SCHEME=1 (encrypted) and MIN_KEY_VERSION=1. After this phase, the database administrator should consider decreasing the number of encryption threads and the rotation iops. Furthermore, the need for further key rotation should also be considered, as the file key management plugin does not support real key rotation. Key rotation can be disabled using innodb-encryption-rotate-key-age=0. Note that even with that setting, all newly created tables are considered for encryption.

Moving database to unencrypted state

Here we assume that you have an encrypted database and there is no longer a need to encrypt the data, or data protection is handled differently. We will use the same database as in the section on moving a database to an encrypted state. At this point there is no need to restart the server; moving the database to an unencrypted state can be done as an online operation. First, the database administrator should check that no tables use explicit encryption, i.e. that no table was created with the ENCRYPTED=YES table option. Moving the database to an unencrypted state is then simply done by issuing:

SET GLOBAL innodb_encrypt_tables=OFF;

This will start unencrypting all tablespaces, including the system tablespace. The progress of this operation can be monitored by:

MariaDB [tpcc1000]> select * from information_schema.innodb_tablespaces_encryption where min_key_version != 0;
+-------+---------------------+-------------------+--------------------+-----------------+---------------------+--------------------------+------------------------------+----------------+----------------------+
| SPACE | NAME                | ENCRYPTION_SCHEME | KEYSERVER_REQUESTS | MIN_KEY_VERSION | CURRENT_KEY_VERSION | KEY_ROTATION_PAGE_NUMBER | KEY_ROTATION_MAX_PAGE_NUMBER | CURRENT_KEY_ID | ROTATING_OR_FLUSHING |
+-------+---------------------+-------------------+--------------------+-----------------+---------------------+--------------------------+------------------------------+----------------+----------------------+
|     7 | tpcc1000/order_line |                 1 |                  1 |               1 |                   1 |                    76564 |                      1947904 |              1 |                    1 |
|     6 | tpcc1000/orders     |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     9 | tpcc1000/stock      |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|     1 | tpcc1000/warehouse  |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
|    10 | tpcc1000/t1         |                 1 |                  1 |               1 |                   1 |                     NULL |                         NULL |              1 |                    0 |
+-------+---------------------+-------------------+--------------------+-----------------+---------------------+--------------------------+------------------------------+----------------+----------------------+
5 rows in set (0.001 sec)

From this we can see that the table order_line from the database tpcc1000 is being rotated. The operation is finished when no tables use encryption or are still being rotated, i.e. no tables have min_key_version != 0:

MariaDB [tpcc1000]> select * from information_schema.innodb_tablespaces_encryption where min_key_version != 0 or rotating_or_flushing = 1;
Empty set (0.000 sec)

If the encryption setup needs to be removed from the configuration, now is the time to shut down the server. If the configuration uses redo log encryption, i.e. innodb-encrypt-log=ON, take a backup of your database, including the InnoDB log files, and after that remove the InnoDB log files, as they are unusable if they contain encrypted data.

rm -rf ib_logfile*

Remove the encryption setup from the configuration and restart the server. Now you have a database instance where no encryption is used.

Conclusion

Moving a database to an encrypted state, as seen above, requires the server to be restarted and requires a careful encryption plugin configuration. How long this operation takes depends on the number of tables and how big those tables are. We have presented a way to monitor this progress and how to speed it up if the hardware used has enough resources. Moving a database to an unencrypted state requires only setting one global variable. However, if encryption is no longer needed and there is a need to remove all references to it, one restart is required. We have shown how to monitor this transition and how to fully remove the encryption setup from both the database and the configuration.


by janlindstrom at May 10, 2018 06:28 AM

Shlomi Noach

MySQL master discovery methods, part 4: Proxy heuristics

Note: the method described here is an anti-pattern

This is the fourth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via Proxy Heuristics

In Proxy Heuristics all clients connect to the master through a proxy. The proxy observes the backend MySQL servers and determines who the master is.

This setup is simple and easy, but it is an anti-pattern. I recommend against using this method, as explained shortly.

Clients are all configured to connect to, say, cluster1-writer.proxy.example.net:3306. The proxy will intercept incoming requests either based on hostname or by port. It is aware of all/some MySQL backend servers in that cluster, and will route traffic to the master M.

A common heuristic I've seen in use: pick the server that has read_only=0. It is a very simple check, sketched below.
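In SQL terms, the per-backend probe behind that heuristic is essentially the following (a sketch; real proxies wrap this in their own health-check machinery):

select @@global.read_only;
-- 0: treat this backend as the master, route writes to it
-- 1: treat this backend as a replica (or a demoted master)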

Let's take a look at how this works and what can go wrong.

A non planned failover illustration #1

Master M has died, the box had a power failure. R gets promoted in its place. Our recovery tool:

  • Fails over, but doesn't need to run any hooks.

The proxy:

  • Knows both about M and R.
  • Notices M fails health checks (select @@global.read_only returns error since the box is down).
  • Notices R reports healthy and with read_only=0.
  • Routes all traffic to R.

Success, we're happy.

Configuration tip

With an automated failover solution, use read_only=1 in my.cnf at all times. Only the failover solution will set a server to read_only=0.

With this configuration, when M restarts, MySQL starts up as read_only=1.
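A minimal my.cnf sketch of this tip:

[mysqld]
# always start read-only; only the failover tool flips the promoted master to read_only=0
read_only = 1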

A non planned failover illustration #2

Master M gets network isolated for 10 seconds, during which time we failover. R gets promoted. Our tool:

  • Fails over, but doesn't need to run any hooks.

The proxy:

  • Knows both about M and R.
  • Notices M fails health checks (select @@global.read_only returns an error since the box is network isolated).
  • Notices R reports healthy and with read_only=0.
  • Routes all traffic to R.
  • 10 seconds later M comes back to life, claiming read_only=0.
  • The proxy now sees two servers reporting as healthy and with read_only=0.
  • The proxy has no context. It does not know why both are reporting the same. It is unaware of failovers. All it sees is what the backend MySQL servers report.

Therein lies the problem: you cannot trust multiple servers (MySQL backends) to deterministically pick a leader (the master) without them collaborating over some elaborate consensus communication.

A non planned failover illustration #3

Master M's box is overloaded, rejecting incoming connections with too many connections errors.

Our tool decides to failover.

  • And doesn't need to run any hooks.

The proxy:

  • Notices M fails health checks (select @@global.read_only does not respond because of the load).
  • Notices R reports healthy and with read_only=0.
  • Routes all traffic to R.
  • Shortly followed by M recovering (since no more writes are sent its way), claiming read_only=0.
  • The proxy now sees two servers reporting as healthy and with read_only=0.

Again, the proxy has no context, and neither do M and R, for that matter. The context (the fact we failed over from M to R) was known to our failover tool, but was lost along the way.

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R.

  • M is available and responsive, we set it to read_only=1.
  • We set R to read_only=0.
  • All new connections route to R.
  • We should also instruct our Proxy to kill all previous connections to M.

This works very nicely.

Discussion

There is a substantial risk to this method. A correlation between failover and network partitioning or load (illustrations #2 and #3) is to be expected: the conditions that trigger a failover are often the very ones that confuse the proxy.

The root of the problem is that we expect individual servers to resolve conflicts without speaking to each other: we expect the MySQL servers to correctly claim "I'm the master" without context.

We then add to that problem by using the proxy to "pick a side" without giving it any context, either.

Sample orchestrator configuration

By way of discouraging use of this method I do not present an orchestrator configuration file.

All posts in this series

by shlomi at May 10, 2018 06:10 AM

May 09, 2018

MariaDB Foundation

MariaDB 10.1.33 and MariaDB Galera Cluster 10.0.35 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.1.33, the latest stable release in the MariaDB 10.1 series. See the release notes and changelogs for details. Download MariaDB 10.1.33 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Galera Cluster 10.0.35 Release Notes Changelog What […]

The post MariaDB 10.1.33 and MariaDB Galera Cluster 10.0.35 now available appeared first on MariaDB.org.

by Ian Gilfillan at May 09, 2018 09:41 PM

MariaDB AB

MariaDB Server 10.1.33 and MariaDB Cluster 10.0.35 now available


The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.1.33, and MariaDB Cluster 10.0.35. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.1.33

Release Notes Changelog What is MariaDB 10.1?


Download MariaDB Cluster 10.0.35

Release Notes Changelog What is MariaDB Cluster?


by dbart at May 09, 2018 09:18 PM

Peter Zaitsev

Does the Version Number Matter?

Yes, it does! In this blog post, I am going to share my recent experiences with ProxySQL and how important the database software version number can be.

Migration

I was working on a migration to Percona XtraDB Cluster (PXC) with ProxySQL, fortunately on a staging environment first so we could catch any issues (like this one).

We installed Percona XtraDB Cluster and ProxySQL on the staging environment and repointed the staging application to ProxySQL. At first, everything looked great. We were able to do some application tests and everything looked good. I advised the customer to do more testing to make sure everything works well.

Something is wrong, but what?

A few days later the customer noticed that their application was not working properly.

We started investigating. Everything seemed well-configured, and the only thing we could see in the application log was the following:

2018-04-20 11:28:31,169 [ default-threads - 42] ERROR Error in lifecycle management : org.hibernate.StaleStateException : Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1 {it.tasgroup.monetica.gt.lifecycle.LifeCycle:line 103} (method: error)
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:85)
at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:70)

Based on this error, I still did not know what was wrong. Were some of the queries failing because of PXC, ProxySQL or some other setting?

We redirected the application to one of the nodes from PXC, and everything worked fine. We tried HAProxy as well, and everything worked again. We knew something happening around ProxySQL was causing the problem, but we still could not find it. Every query went through ProxySQL without any issue.

Debug log is our savior

The customer finally enabled the application debug logging so we could see which query was failing:

delete from TABLENAME where ID='11' and Timestamp ='2018-04-20 16:15:03';

I was confused at first: this is a fairly simple query, what could be wrong? Let's investigate it on the cluster. When I tried to select the data on the cluster, it gave me back zero results. That's OK, maybe the row was already deleted?

For this investigation, slow query logging was enabled and long_query_time was set to 0 to log all the queries, roughly like this:
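SET GLOBAL slow_query_log=ON;
SET GLOBAL long_query_time=0; -- log every statement, regardless of duration

I checked the slow query log looking for queries like this. What I found helped me realize what the problem was: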

delete from TABLENAME where ID=10 and Timestamp ='2018-04-20 11:17:22.35';
delete from TABLENAME where ID=24 and Timestamp ='2018-04-20 11:17:31.602';
delete from TABLENAME where ID=43 and Timestamp ='2018-04-20 11:18:13.2';
delete from TABLENAME where ID=22 and Timestamp ='2018-04-20 11:11:02.854';
delete from TABLENAME where ID=11 and Timestamp ='2018-04-20 11:21:57';
delete from TABLENAME where ID=64 and Timestamp ='2018-04-20 11:18:34';
delete from TABLENAME where ID=47 and Timestamp ='2018-04-20 10:38:35';
delete from TABLENAME where ID=23 and Timestamp ='2018-04-20 11:30:03';

I hope you see the difference! The first four lines have fractional seconds! At that time, the application was pointed to the cluster directly. So ProxySQL cut off the fractional seconds? That would be a nasty bug.

I checked the application log again with the debug information, and I could see the application does not even use fractional seconds in the queries when it points to ProxySQL. This is why the query was failing (it does not delete any rows): in the table all the rows had fractional seconds, but the queries were not using them.

So why does the application not use fractional seconds with ProxySQL?

First of all, fractional seconds were introduced in MySQL 5.6.4. The application is a Java-based application with JBoss and Hibernate. I knew ProxySQL reports itself as MySQL 5.5. Maybe the application/connector reads the version number and makes decisions based on that?

It was quite easy to test this theory by just changing the version number in ProxySQL like this:

update global_variables set variable_value="5.7.21" where variable_name='mysql-server_version';
load mysql variables to runtime; save mysql variables to disk;

The application had to be restarted (probably it was caching the previous settings) but after that everything was working as expected.

But be careful, now it will report 5.7.21 for all the hostgroups. What if you have multiple hostgroups with different MySQL versions? It would be nice if you could define this for every hostgroup.

Conclusion

The solution was very easy, but finding the source of the problem took a long time. If you are planning to use ProxySQL, I would always recommend changing mysql-server_version to match the underlying MySQL server version number, because who knows which connector or application checks the version and makes a decision based on that.

There is another example here where Marco Tusa had a very similar problem with a Java connector.

The post Does the Version Number Matter? appeared first on Percona Database Performance Blog.

by Tibor Korocz at May 09, 2018 05:55 PM

Deploying PMM on Linode: Your $5-Per-Month Monitoring Solution


In this blog, I will show you how to install PMM on Linode as a low-cost database monitoring solution.

Many of my friends use Linode to run their personal sites, as well as small projects. While Linode is no match for Big Cloud providers in features, it is really wonderful when it comes to cost and simplicity: a Linode “nanode” instance offers 1GB of memory, 1 core, 20GB of storage and 1TB of traffic for just $5 a month.

A single Linode instance is powerful enough to use with Percona Monitoring and Management (PMM) to monitor several systems, so I use Linode a lot when I want to demonstrate PMM deployment through Docker, rather than Amazon Marketplace.

Here are step-by-step instructions to get you started with Percona Monitoring and Management (PMM) on Linode in five minutes (or less):

Step 1:  Pick the Linode Type, Location and launch it.

PMM on Linode

Step 2: Name your Linode

This step is optional and is not PMM-related, but you may want to give your Linode an easy-to-remember name instead of something like “linode7796908”. Click on Linode Name and then on “Settings” and enter a name in “Linode Label”.

PMM on Linode 2

Step 3:  Deploy the Image

Click on Linode Name and then on “Deploy an Image”.

PMM on Linode 3

I suggest choosing the latest Ubuntu LTS version and allocating 512MB for the swap file, especially on a Linode with a small amount of memory. Remember to set a strong root password, as Linode allows root password login by default from any IP.

Step 4: Boot Linode

Now that the image is prepared, you need to boot your Linode. Click on the Boot button for that:

PMM on Linode 4

Step 5: Login to the system and install Docker

Use your favorite SSH client to log in to the Linode you created, using the “root” user and the password you set at Step 3, and install Docker:

apt install docker.io

Step 6: Run PMM Server

Here are detailed instructions to install the PMM Server on Docker. Below are the commands to do a basic installation:

docker pull percona/pmm-server:latest
docker create \
  -v /opt/prometheus/data \
  -v /opt/consul-data \
  -v /var/lib/mysql \
  -v /var/lib/grafana \
  --name pmm-data \
  percona/pmm-server:latest /bin/true
docker run -d \
  -p 80:80 \
  --volumes-from pmm-data \
  --name pmm-server \
  --restart always \
  percona/pmm-server:latest

Note: This deploys PMM Server without authentication. For anything but test usage, you should set a password by following instructions on this page.

You’re done!

You’ve now installed PMM Server and you can see it monitoring itself by going to the server IP with a browser.

PMM on Linode 5

Now you can go ahead and install the PMM Client on the nodes you want to monitor!

The post Deploying PMM on Linode: Your $5-Per-Month Monitoring Solution appeared first on Percona Database Performance Blog.

by Peter Zaitsev at May 09, 2018 12:13 AM

May 08, 2018

Peter Zaitsev

How to Enable Amazon RDS Remote Access


It’s easy to enable Amazon RDS remote access when launching an Amazon RDS instance, but there can be many issues. I created this blog as a guide describing the various issues/configurations we might encounter.

As the first step, we need to select a VPC where we will launch our Amazon RDS instance. The default VPC has all the required settings to make the instance remotely available; we just have to enable it by selecting “Yes” at Public accessibility.

For this example, we used the Default VPC and asked AWS to create a new security group.

Once the instance is created, we can connect to the “Endpoint” address:

[root@server1 ~]# mysql -h publicdb.cbnuzwwzlcf1.eu-west-3.rds.amazonaws.com -u dbuser -p
Enter password: XXXXXX
mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.7.19-17, for Linux (x86_64) using  6.2
Connection id: 14
Current database:
Current user: dbuser@server1.hostname.com
SSL: Cipher in use is AES256-SHA
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.6.37 MySQL Community Server (GPL)
Protocol version: 10
Connection: publicdb.cbnuzwwzlcf1.eu-west-3.rds.amazonaws.com via TCP/IP
Server characterset: latin1
Db     characterset: latin1
Client characterset: utf8
Conn.  characterset: utf8
TCP port: 3306
Uptime: 1 min 56 sec
Threads: 2  Questions: 9986  Slow queries: 0 Opens: 319  Flush tables: 1 Open tables: 80  Queries per second avg: 86.086
--------------
mysql>

When AWS creates the security group after we select the option to make it publicly accessible, it appears that AWS takes care of everything. But what if we check the created security groups?

It created a rule to enable incoming traffic, as security groups work as a whitelist (everything is denied except what matches a rule).

As we can see here, AWS only created the inbound rule for my current IP address, which means once we change IPs or try to connect from another server, it will fail. To get around that, we need to add another rule:

Adding the 0.0.0.0/0 rule opens the port for the world. This is dangerous! Since anyone can try connecting, it’s much better if we can supply a list of IPs or ranges we want enabled for remote access, even from outside of AWS.
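If you prefer scripting this instead of clicking through the console, an equivalent rule can be added with the AWS CLI; the group ID and CIDR below are placeholders:

aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 3306 \
    --cidr 203.0.113.0/24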

Running remotely accessible RDS in custom VPC

To run RDS in a new VPC or in an existing VPC, we need to ensure a couple of things. 

The VPC needs to have at least two subnets. We believe this is something Amazon asks for so that the VPC is ready if you choose to move to a Multi-AZ master, or simply to spread the read-only instances across multiple AZs for higher availability.

If you want to make the RDS cluster remotely available, we need to attach an IGW (Internet Gateway) to the VPC. If you don’t, it isn’t able to communicate with the outside world. To do that, go to VPC -> Internet gateways and hit “Create Internet Gateway”:

Once it’s created, select “Attach to VPC” and select your VPC. 

Still, you won't be able to reach the internet, as we need to add a route towards the newly attached internet gateway.

To do that, go to “Route Tables” and select our VPC, and add the following route (0.0.0.0/0 means it’s going to be the default gateway, and all non-internal traffic needs to be routed towards it):


Hit Save. Now the VPC has Internet access, just like AWS’s Default VPC.
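The gateway and routing steps can also be scripted with the AWS CLI; a sketch, with placeholder resource IDs:

aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway \
    --internet-gateway-id igw-0123456789abcdef0 \
    --vpc-id vpc-0123456789abcdef0
aws ec2 create-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 0.0.0.0/0 \
    --gateway-id igw-0123456789abcdef0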

The post How to Enable Amazon RDS Remote Access appeared first on Percona Database Performance Blog.

by Janos Ruszo at May 08, 2018 09:27 PM

Jean-Jerome Schmidt

New Webinar on How to Migrate to Galera Cluster for MySQL & MariaDB

Join us on Tuesday May 29th for this new webinar with Severalnines Support Engineer Bart Oles, who will walk you through what you need to know in order to migrate from standalone or a master-slave MySQL/MariaDB setup to Galera Cluster.

When considering such a migration, plenty of questions typically come up, such as: how do we migrate? Does the schema or application change? What are the limitations? Can a migration be done online, without service interruption? What are the potential risks?

Galera Cluster has become a mainstream option for high availability MySQL and MariaDB. And though it is now known as a credible replacement for traditional MySQL master-slave architectures, it is not a drop-in replacement.

It has some characteristics that make it unsuitable for certain use cases, however, most applications can still be adapted to run on it.

The benefits are clear: multi-master InnoDB setup with built-in failover and read scalability.

Join us on May 29th for this walk-through on how to migrate to Galera Cluster for MySQL and MariaDB.

Sign up below!

Date, Time & Registration

Europe/MEA/APAC

Tuesday, May 29th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, May 29th at 09:00 PDT (US) / 12:00 EDT (US)

Register Now

Agenda

  • Application use cases for Galera
  • Schema design
  • Events and Triggers
  • Query design
  • Migrating the schema
  • Load balancer and VIP
  • Loading initial data into the cluster
  • Limitations:
    • Cluster technology
    • Application vendor support
  • Performing Online Migration to Galera
  • Operational management checklist
  • Belts and suspenders: Plan B
  • Demo

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA, with over 15 years' experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

We look forward to “seeing” you there and to insightful discussions!

by jj at May 08, 2018 02:11 PM

Shlomi Noach

MySQL master discovery methods, part 3: app & service discovery

This is the third in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

App & service discovery

Part 1 and part 2 presented solutions where the app remained ignorant of the master's identity. This part takes the complete opposite direction and gives the app ownership of master access.

We introduce a service discovery component. Commonly known examples are Consul, ZooKeeper and etcd: highly available stores offering key/value (K/V) access, leader election, or full-blown service discovery & health checks.

We satisfy ourselves with K/V functionality. A key would be mysql/master/cluster1 and a value would be the master's hostname/port.
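With Consul, for example, the flow could look like this (cluster name and address are illustrative):

# written by the failover tool upon promotion
consul kv put mysql/master/cluster1 mysql-host-r.example.net:3306
# read (or watched) by the app
consul kv get mysql/master/cluster1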

It is the app's responsibility at all times to fetch the identity of the master of a given cluster by querying the service discovery component, thereby opening connections to the indicated master.

The service discovery component is expected to be up at all times and to contain the identity of the master for any given cluster.

A non planned failover illustration #1

Master M has died. R gets promoted in its place. Our recovery tool:

  • Updates the service discovery component, key is mysql/master/cluster1, value is R's hostname.

Clients:

  • Listen on K/V changes, recognize that master's value has changed.
  • Reconfigure/refresh/reload/do what it takes to speak to new master and to drop connections to old master.

A non planned failover illustration #2

Master M gets network isolated for 10 seconds, during which time we failover. R gets promoted. Our tool (as before):

  • Updates the service discovery component, key is mysql/master/cluster1, value is R's hostname.

Clients (as before):

  • Listen on K/V changes, recognize that master's value has changed.
  • Reconfigure/refresh/reload/do what it takes to speak to new master and to drop connections to old master.
  • Any changes not taking place in a timely manner imply some connections still use old master M.

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R.

  • App should start connecting to R.

Discussion

The app is the complete owner. This calls for a few concerns:

  • How does a given app refresh and apply the change of master such that no stale connections are kept?
    • Highly concurrent apps may be more difficult to manage.
  • In a polyglot app setup, you will need all clients to use the same setup. Implement same listen/refresh logic for Ruby, golang, Java, Python, Perl and notably shell scripts.
    • The latter do not play well with such changes.
  • How can you validate that the change of master has been detected by all app nodes?

As for the service discovery:

  • What load will you be placing on your service discovery component?
    • I was familiar with a setup where there were so many apps, app nodes and app instances that the amount of connections was too much for the service discovery component. In that setup, caching layers were created, which introduced their own consistency problems.
  • How do you handle service discovery outage?
    • A reasonable approach is to keep using the last known master identity should service discovery be down. This, again, plays better with higher-level applications, but less so with scripts.

It is worth noting that this setup does not suffer from geographical limitations to the master's identity. The master can be anywhere; the service discovery component merely points out where the master is.

Sample orchestrator configuration

An orchestrator configuration would look like this:

  "ApplyMySQLPromotionAfterMasterFailover": true,
  "KVClusterMasterPrefix": "mysql/master",
  "ConsulAddress": "127.0.0.1:8500",
  "ZkAddress": "srv-a,srv-b:12181,srv-c",
  "PostMasterFailoverProcesses": [
    "/just/let/me/know about failover on {failureCluster}",
  ],

In the above:

  • If ConsulAddress is specified, orchestrator will update given Consul setup with K/V changes.
  • At 3.0.10, ZooKeeper, via ZkAddress, is still not supported by orchestrator.
  • PostMasterFailoverProcesses is here just to point out hooks are not strictly required for the operation to run.

See orchestrator configuration documentation.

All posts in this series

by shlomi at May 08, 2018 08:02 AM

May 07, 2018

Open Query Pty Ltd

SSL certificates – not optional

We made a stuff-up over the weekend.  Historically we have different SSL certificates for different services in our realm, and last Saturday the certificate for the main website expired.  Of course we noticed at that point, but we should have had an internal notification earlier and somehow that had failed.  Fixed, but it would have been much better if the front-end hadn’t been temporarily inaccessible.  It was, because of HTTP Strict Transport Security (HSTS). Any browser that had previously talked with our website (rightfully) refuses to talk to it if it doesn’t see a valid certificate.  Going back to non-HTTPS is not an option either, for this reason as well as others mentioned below. However, we do have different certificates for different services, so it was only our frontend website that was affected (bad enough); the various other services for our clients fortunately were unaffected.

Let's Encrypt logoGoing forward though, keeping up-to-date with the certificates and automatically renewing them is much easier now than it used to be. Let’s Encrypt® has been around for a while, but a few months ago they started supporting wildcard certificates.  With non-wildcard, one of the ways Let’s Encrypt can verify that you own the site is by doing a challenge/response on port 443 of the website address; Certbot will temporarily listen there and give the appropriate answers.  For a wildcard, that doesn’t work, because you can have an infinite number of subdomains and Let’s Encrypt needs to be certain that you actually control these.  So in v2 of the API there’s support for DNS based validation.  Through special TXT records for which Let’s Encrypt provides the token on every domain request, you can prove that you are in control of the DNS servers for the domain. That’s pretty slick!

There are integrations for many hosting providers as well as Cloudflare, which through a secure mechanism allow Let’s Encrypt to update those records in your DNS, and then validate. As Let’s Encrypt certificates are only valid for 3 months, this is important for the automation.  If you run your own DNS servers, you can still automate the DNS based verification process by setting up RFC-2136 remote updates in your DNS server (Bind9 can do it, it’s been around for many years – that said, being an older system, it can be rather finicky to set up).

Let’s Encrypt’s Certbot can take care of the entire updating process, including reloading your webserver’s or reverse proxy’s config.  Do make sure you use a recent Certbot, as all the appropriate support is quite recent. We had to grab Certbot from Github the first time as the Debian release hadn’t updated quite far enough yet – it has now.

We think that the EFF has done brilliantly with setting up Let’s Encrypt, and helping everyone move towards a fully encrypted web.  Particularly with the cost-factor removed, there’s no reason to not offer HTTPS to users – whether for a website, or an API.  Respecting one’s users and their online privacy is really a must-do.  Companies that don’t, increasingly look bad.  See it this way: going fully HTTPS is an opportunity to make a good first impression.  And did you know it also affects your ranking in search engines?  Now there’s a good incentive for the PHB.

Do you need an EV certificate?  Probably not, as they actually have very little meaning – and even less so with various CAs having distinctly flawed verification processes.

Do you need a site seal from your CA (Certificate Authority)?  Really not.  It just advertises the CA, and actually enables them to track your users – if you get the seal served from the CA’s URL, that’s every single user. Not cool. So just don’t.

Final hint: if you do get a wildcard certificate from Let’s Encrypt, make sure you include both the wildcard and non-wildcard in the certificate domain names, otherwise it won’t work. So, *.example.com as well as example.com. You may not have noticed that your wildcard certificate always contains these, as many CAs automatically include the appropriate extra item.  Certbot just does exactly what you tell it to, so it’s something to be aware of.
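As a sketch, requesting such a certificate with the RFC-2136 plugin mentioned above might look like this (the credentials path and domain are examples):

certbot certonly --dns-rfc2136 \
  --dns-rfc2136-credentials /etc/letsencrypt/rfc2136.ini \
  -d example.com -d '*.example.com'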

by Arjen Lentz at May 07, 2018 11:37 PM

Peter Zaitsev

Webinar Wednesday, May 9, 2018: MySQL Troubleshooting and Performance Optimization with Percona Monitoring and Management (PMM)

MySQL Troubleshooting

MySQL TroubleshootingPlease join Percona’s CEO, Peter Zaitsev as he presents MySQL Troubleshooting and Performance Optimization with PMM on Wednesday, May 9, 2018, at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

Optimizing MySQL performance and troubleshooting MySQL problems are two of the most critical and challenging tasks for MySQL DBAs. The databases powering your applications must handle heavy traffic loads while remaining responsive and stable so that you can deliver an excellent user experience. Further, DBAs’ bosses expect solutions that are cost-efficient.

In this webinar, Peter discusses how you can optimize and troubleshoot MySQL performance and demonstrate how Percona Monitoring and Management (PMM) enables you to solve these challenges using free and open source software. We will look at specific, common MySQL problems and review the essential components in PMM that allow you to diagnose and resolve them.

Register for the webinar now.

Peter ZaitsevPeter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. The Inc. 5000 recognized Percona in 2013, 2014, 2015 and 2016. Peter was an early employee at MySQL AB, eventually leading the company’s High-Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High-Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He was also tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization is one of percona.com’s most popular downloads.

The post Webinar Wednesday, May 9, 2018: MySQL Troubleshooting and Performance Optimization with Percona Monitoring and Management (PMM) appeared first on Percona Database Performance Blog.

by Peter Zaitsev at May 07, 2018 07:35 PM

Shlomi Noach

MySQL master discovery methods, part 2: VIP & DNS

This is the second in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via VIP

In part 1 we saw that one of the main drawbacks of DNS discovery is the time it takes for the apps to connect to the promoted master. This is the result of both DNS deployment time and the client's TTL.

A quicker method is offered: use of VIPs (Virtual IPs). As before, apps would connect to cluster1-writer.example.net, cluster2-writer.example.net, etc. However, these would resolve to specific VIPs.

Say cluster1-writer.example.net resolves to 10.10.0.1. We let this address float between servers. Each server has its own IP (say 10.20.0.XXX) but could also potentially claim the VIP 10.10.0.1.

VIPs can be assigned by switches and I will not dwell on the internals, because I'm not a network expert. However, the following holds:

  • Acquiring a VIP is a very quick operation.
  • Acquiring a VIP must take place on the acquiring host (see the sketch after this list).
  • A host may be unable to acquire a VIP should another host hold the same VIP.
  • A VIP can only be assigned within a bounded space: hosts connected to the same switch; hosts in the same Data Center or availability zone.
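To make that concrete, grabbing and releasing a VIP on a Linux host often looks roughly like the following; the interface name and addresses are illustrative, and your environment may use different tooling altogether:

# on the host acquiring the VIP
sudo ip addr add 10.10.0.1/32 dev eth0
sudo arping -c 3 -A -I eth0 10.10.0.1   # gratuitous ARP so the network learns the new owner
# on the host releasing the VIP
sudo ip addr del 10.10.0.1/32 dev eth0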

A non planned failover illustration #1

Master M has died, the box had a power failure. R gets promoted in its place. Our recovery tool:

  • Attempts to connect to M so that it can give up the VIP. The attempt fails because M is dead.
  • Connects to R and instructs it to acquire the VIP. Since M is dead there is no objection, and R successfully grabs the VIP.
  • Any new connections immediately route to the new master R.
  • Clients with connections to M cannot connect, issue retries, immediately route to R.

A non planned failover illustration #2

Master M gets network isolated for 30 seconds, during which time we failover. R gets promoted. Our tool:

  • Attempts to connect to M so that it can give up the VIP. The attempt fails because M is network isolated.
  • Connects to R and instructs it to acquire the VIP. Since M is network isolated there is no objection, and R successfully grabs the VIP.
  • Any new connections immediately route to the new master R.
  • Clients with connections to M cannot connect, issue retries, immediately route to R.
  • 30 seconds later M reappears, but no one pays any attention.

A non planned failover illustration #3

Master M's box is overloaded. It is not responsive to new connections but may slowly serve existing connections. Our tool decides to failover:

  • Attempts to connect to M so that it can give up the VIP. The attempt fails because M is very loaded.
  • Connects to R and instructs it to acquire the VIP. Unfortunately, M hasn't given up the VIP and still shows up as owning it.
  • All existing and new connections keep on routing to M, even as R is the new master.
  • This continues until some time has passed and we are able to manually grab the VIP on R, or until we forcibly network isolate M or forcibly shut it down.

We suffer outage.

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R.

  • M is available and responsive; we ask it to give up the VIP, which it does.
  • We ask R to grab the VIP, which it does.
  • All new connections route to R.
  • We may still see old connections routing to M. We can forcibly network isolate M to break those connections so as to cause reconnects, or restart apps.

Discussion

As with DNS discovery, the apps are never told of the change. They may be forcibly restarted though.

Grabbing a VIP is a quick operation. However, consider:

  • It is not guaranteed to succeed. I have seen it fail in various situations.
  • Since releasing/acquiring of VIP can only take place on the demoted/promoted servers, respectively, our failover tool will need to:
    • Remote SSH onto both boxes, or
    • Remote exec a command on those boxes
  • Moreover, the tool will do so sequentially. First we must connect to demoted master to give up the VIP, then to promoted master to acquire it.
  • This means the time at which the new master grabs the VIP depends on how long it takes to connect to the old master to give up the VIP. Seeing that trouble on the old master is what caused the failover in the first place, we can expect a correlation with not being able to connect to the old master, or with slow connect times.
  • An alternative exists, in the form of Pacemaker. Consider Percona's Replication Manager guide for more insights. Pacemaker provides a single point of access from where the VIP can be moved, and behind the scenes it will communicate to relevant nodes. This makes it simpler on the failover solution configuration.
  • We are constrained by physical location.
  • It is still possible for existing connection to keep on communicating to the demoted master, even while the VIP has been moved.

VIP & DNS combined

Per physical location, we could choose to use VIP. But should we need to failover to a server in another DC, we could choose to combine the DNS discovery, discussed in part 1.

We can expect to see faster failover time on a local physical location, and longer failover time on remote location.

Sample orchestrator configuration

What kind of remote exec method will you have? In this sample we will use remote (passwordless) SSH.

An orchestrator configuration would look like this:

  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PostMasterFailoverProcesses": [
    "ssh {failedHost} 'sudo ifconfig the-vip-interface down'",
    "ssh {successorHost} 'sudo ifconfig the-vip-interface up'",
    "/do/what/you/gotta/do to apply dns change for {failureClusterAlias}-writer.example.net to {successorHost}"
  ],  

In the above:

  • Replace SSH with any remote exec method you may use.
    • But you will need to set up the access/credentials for orchestrator to run those operations.
  • Replace ifconfig with service quagga stop/start or any method you use to release/grab VIPs.

See orchestrator configuration documentation.

All posts in this series

by shlomi at May 07, 2018 06:46 AM

May 04, 2018

Peter Zaitsev

How Binary Logs Affect MySQL 8.0 Performance

As part of my benchmarks of binary logs, I've decided to check how the recently released MySQL 8.0 is affected in similar scenarios, especially as binary logs are enabled by default. It is also interesting to check MySQL 8.0 against the claimed performance improvements in the redo log subsystem.

I will use a similar setup as in my last blog with MySQL 8.0, using the utf8mb4 charset.

I have a few words about MySQL 8.0 tuning. Dimitri recommends in his blog posts using innodb_undo_log_truncate=off and innodb_doublewrite=0. However, in my opinion, using these settings is like participating in a car race without working brakes: you will drive very fast, but it will end badly. So, contrary to Dimitri's recommendations, I used innodb_undo_log_truncate=on and innodb_doublewrite=1.
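For reference, the safer configuration used here boils down to this minimal my.cnf fragment:

[mysqld]
innodb_undo_log_truncate = ON
innodb_doublewrite = 1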

Servers Comparison

For the first run, let’s check the results without binary logs vs. with binary logs enabled, but with sync_binlog=1 for Percona Server for MySQL 5.7 vs. MySQL 8.0.


MySQL 8.0 Performance

In tabular form:

Binary log   Buffer pool, GB   MYSQL8      PS57        Ratio PS57/MySQL8
binlog       5                 768.0375    771.5532    1.00
binlog       10                1224.535    1245.496    1.02
binlog       20                1597.48     1625.153    1.02
binlog       30                1859.603    1979.328    1.06
binlog       40                2164.329    2388.804    1.10
binlog       50                2572.827    2942.082    1.14
binlog       60                3158.408    3528.791    1.12
binlog       70                3883.275    4535.281    1.17
binlog       80                4390.69     5246.567    1.19
nobinlog     5                 788.9388    783.155     0.99
nobinlog     10                1290.035    1294.098    1.00
nobinlog     20                1745.464    1743.759    1.00
nobinlog     30                2109.301    2158.267    1.02
nobinlog     40                2508.28     2649.695    1.06
nobinlog     50                3061.196    3334.766    1.09
nobinlog     60                3841.92     4168.089    1.08
nobinlog     70                4772.747    5140.316    1.08
nobinlog     80                5727.795    5947.848    1.04

Binary Log Effect

MySQL 8.0 Performance 2

In tabular form:

Buffer pool, GB   Server   binlog      nobinlog    Ratio nobinlog/binlog
5                 MYSQL8   768.0375    788.9388    1.03
5                 PS57     771.5532    783.155     1.02
10                MYSQL8   1224.535    1290.0352   1.05
10                PS57     1245.496    1294.0983   1.04
20                MYSQL8   1597.48     1745.4637   1.09
20                PS57     1625.153    1743.7586   1.07
30                MYSQL8   1859.603    2109.3005   1.13
30                PS57     1979.328    2158.2668   1.09
40                MYSQL8   2164.329    2508.2799   1.16
40                PS57     2388.804    2649.6945   1.11
50                MYSQL8   2572.827    3061.1956   1.19
50                PS57     2942.082    3334.7656   1.13
60                MYSQL8   3158.408    3841.9203   1.22
60                PS57     3528.791    4168.0886   1.18
70                MYSQL8   3883.275    4772.7466   1.23
70                PS57     4535.281    5140.316    1.13
80                MYSQL8   4390.69     5727.795    1.30
80                PS57     5246.567    5947.8477   1.13

Conclusions

It seems that binary logs have quite an effect on MySQL 8.0: we see up to a 30% performance penalty, as opposed to 13% for Percona Server for MySQL 5.7.

In general, for in-memory workloads, Percona Server for MySQL 5.7 outperforms MySQL 8.0 by 10-20% with binary logs enabled, and 4-9% without binary logs enabled.

For io-bound workloads (buffer pool size <= 30GB), the performance numbers for Percona Server for MySQL and MySQL are practically identical.

Hardware spec

Supermicro server:

  • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
  • 2 sockets / 28 cores / 56 threads
  • Memory: 256GB of RAM
  • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
  • Filesystem: ext4/xfs
  • Percona-Server-5.7.21-20
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic

Extra Raw Results, Scripts and Config

My goal is to provide fully repeatable benchmarks. I have shared all scripts and settings I used in the following GitHub repo:

https://github.com/Percona-Lab-results/201805-sysbench-tpcc-mysql8


The post How Binary Logs Affect MySQL 8.0 Performance appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at May 04, 2018 10:56 PM

How Binary Logs (and Filesystems) Affect MySQL Performance

I want to take a closer look at MySQL performance with binary logs enabled on different filesystems, especially as MySQL 8.0 comes with binary logs enabled by default.

As part of my benchmarks of the MyRocks storage engine, I’ve noticed an unusual variance in throughput for the InnoDB storage engine, even though we spent a lot of time making it as stable as possible in Percona Server for MySQL. In the end, the culprit was enabled binary logs. There is also always the question, “If there is a problem with EXT4, does XFS perform differently?” To answer that, I will repeat the same benchmark on the EXT4 and XFS filesystems.

You can find our previous experiments with binary logs here: https://www.percona.com/blog/2016/06/03/binary-logs-make-mysql-5-7-slower-than-5-6/.

Benchmark Setup

A short overview of the benchmark setup:

  • Percona Server for MySQL 5.7.21
  • InnoDB storage engine
  • In contrast to the previous benchmark, I enabled foreign keys, used REPEATABLE-READ isolation level, and I used UTF8 character sets. Because of these changes, the results are not really comparable with the previous results.
  • The dataset is the same: sysbench-tpcc with ten tables and 100 warehouses, resulting in a total of 1000 warehouses, and about a 90GB dataset size.
  • I will use innodb_buffer_pool_size 80GB, 70GB, and 60GB to emulate different IO loads and evaluate how that affects binary logs writes.

Initial Results

For the first run, let’s check the results without binary logs vs. with binary log enabled, but with sync_binlog=0:

Binary Log Performance

We can see that results without binary logs are generally better, but we can also see that with binary logs enabled and sync_binlog=0, there are regular drops to 0 for 1-2 seconds. This basically results in stalls for any connected application.

So, enabling binary logs may result in regular application stalls. The reason for this is that there is a limit on the size of the binary log file (max_binlog_size), which is 1GB. When the limit is reached, MySQL has to perform a binary log rotation. With sync_binlog=0, all previous writes to the binary log are cached in the OS cache, and during rotation, MySQL forces synchronous flushing of all changes to disk. This results in complete stalls every ~40 seconds (the amount of time it takes to fill 1GB of binary log in the above tests).

How can we deal with this? The obvious solution is to enable more frequent sync writes of binary logs. This can be achieved by setting sync_binlog > 0. The popular choice is the most strict, sync_binlog=1, providing the most guarantees. The strict setting also comes with noted performance penalties. I will also test sync_binlog=1000 and sync_binlog=10000, which means performing a synchronous write of the binary log every 1000 and 10000 transactions, respectively.
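As a my.cnf sketch of the compromise discussed above (the value is illustrative and workload-dependent):

[mysqld]
sync_binlog = 1000   # fsync the binary log every 1000 transactions; 1 is strictest, 0 leaves it to the OS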

The Results

Binary Log Performance 1

The same results in a tabular format, with median throughput (tps, more is better):

Buffer pool   sync_binlog=0   sync_binlog=1   sync_binlog=1000   sync_binlog=10000   No binlog
60 GB         4174.945        3598.12         3950.19            4205.165            4277.955
70 GB         5053.11         4541.985        4714               4997.875            5328.96
80 GB         5701.985        5263.375        5303.145           5664.155            6087.925

Some conclusions we can make:

  • sync_binlog=1 comes with the biggest performance penalty, but with minimal variance. This is comparable to running without binary logs.
  • sync_binlog=0 provides the best performance (for enabled binary logs), but the variance is huge.
  • sync_binlog=1000 is a good compromise, providing better performance than sync_binlog=1 with minimal variance.
  • sync_binlog=10000 might not be good: it shows less variance than with 0, but the variance is still big.

So what value should we use? This is probably a choice between sync_binlog=1 or some value like 1000. It depends on your use case and your storage solution. In the case of slow storage, sync_binlog=1 may show a bigger penalty compared to what I can see on my enterprise SATA SSD SAMSUNG SM863.

Filesystems

All of the above results were on an EXT4 filesystem. Let’s compare to XFS. Will it show different throughput and variance?

Binary Log Performance 2

The median throughput in tabular format:

sync_binlog   Buffer pool (GB)   EXT4       XFS
0             60                 4174.945   3902.055
0             70                 5053.11    4884.075
0             80                 5701.985   5596.025
1             60                 3598.12    3526.545
1             70                 4541.985   4538.455
1             80                 5263.375   5255.38
1000          60                 3950.19    3620.05
1000          70                 4714       4526.49
1000          80                 5303.145   5150.11
10000         60                 4205.165   3874.03
10000         70                 4997.875   4845.85
10000         80                 5664.155   5557.61
No binlog     60                 4277.955   4169.215
No binlog     70                 5328.96    5139.625
No binlog     80                 6087.925   5957.015

We can observe the general trend that median throughput on XFS is a little worse than with EXT4, with practically identical variance.

The difference in throughput is minimal. You can use either XFS or EXT4.

Hardware Spec

Supermicro server:

  • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
  • 2 sockets / 28 cores / 56 threads
  • Memory: 256GB of RAM
  • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
  • Filesystem: ext4/xfs
  • Percona-Server-5.7.21-20
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic

Extra Raw Results, Scripts and Config

My goal is to provide fully repeatable benchmarks. To that effect, I’ve shared all the scripts and settings I used in the following GitHub repo:

https://github.com/Percona-Lab-results/201805-sysbench-tpcc-binlog-fs

The post How Binary Logs (and Filesystems) Affect MySQL Performance appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at May 04, 2018 10:50 PM

This Week in Data with Colin Charles 38: Percona Live Europe 2018 and PostgreSQL

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

The week after Percona Live Santa Clara 2018 tends to be much quieter, aided by the fact that I took a few days away during Labor Day. The next thing to look out for is Percona Live Europe 2018, which at this stage is really a note to let you save the dates: November 5-7 2018, at the Radisson Blu, in Frankfurt. There is no call for papers yet, there is no committee, and it is not listed yet at the Percona Live Conferences page. Hang in there! We’ll open the call for papers soon!

Now that Percona is in the PostgreSQL space, it seems prudent that there will also be more PostgreSQL content here. A great resource naturally is Planet PostgreSQL. There also seems to be another resource on The internals of PostgreSQL, and as books go, Mastering PostgreSQL in Application Development sure looks very interesting. Do you have recommended resources?

Releases

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 38: Percona Live Europe 2018 and PostgreSQL appeared first on Percona Database Performance Blog.

by Colin Charles at May 04, 2018 05:35 PM

Percona Live 2018 Community Report

So, after a whirlwind few days, Percona Live 2018 has been and gone. There was a great energy about the conference, and it was fantastic to meet so many open source database enthusiasts and supporters. A few things that I experienced:

  • Your great willingness to share knowledge. It was a fantastic place to learn for those who have experience from a different field of technology. Almost everyone seemed to be very open and generous with their time.
  • The “superstars” from our industry are not so scary. They are as willing to be open and generous with their experience and views as any of the other attendees, and equally as interested in making new discoveries.
  • There aren’t many times you can sit down to a (community) dinner, to share food and anecdotes with people from USA, UK, Germany and Armenia at the same time. I thoroughly enjoyed the company, and wish there were more opportunities for similar encounters. Thanks to Pythian for setting that up.
  • My Percona colleagues are wonderful, committed human beings with more than a passing interest in music – the Percona Sessions have got to happen…
  • That you can run a very long way in a day between the Santa Clara Convention Center and the Hyatt Regency Hotel.

I had very many positive conversations with delegates. You offered any criticisms along with a suggestion of how we should tweak things for the better. Our community is a creative, generous, problem-solving machine, though I shouldn’t be surprised at that.

So, with only a few more duties to complete, I’d like to thank you for your company. For those that did not make it to this year’s event, I hope that you might be persuaded to join us in the future — either at Percona Live Europe 2018 or at Percona Live 2019.

Packt Prizes

Our media sponsor, Packt, generously provided us with three free ebooks and two free instruction videos as prizes for delegates:

  1. Mastering MongoDB 3.x
  2. MySQL 8 Cookbook
  3. MongoDB Administrator’s Guide
  4. Elastic Databases and Data Processing with AWS [Video]
  5. AWS Administration – Database, Networking, and Beyond [Video]

There are another 10 titles for which we can offer delegates a 50% discount: you should have received your emails. Thanks are due again to Packt.

Community Blog

While I have your attention, I’d like to let you know about the forthcoming Percona community blog. Having been some time in the planning, this is starting really soon, and is like a year-round, online, Percona Live. We already have some keen writers for this, but if you would be interested in creating content (whether written, podcast or webcast) for the community blog, then please get in touch. The brief is very wide — as long as your submission is relevant to the open source database community then it would be welcome.

Finally, I would like to invite feedback on how to make the event shine even brighter — please drop me an email if you have suggestions or ideas. Meanwhile, I hope you enjoy these photographs of the MySQL Community Awards Winners, presented at PL18. You can read more about this community initiative.

Perhaps you’ll be able to join us in Frankfurt in November? Time to start thinking about those submissions for the call for papers!

Or perhaps next year at Percona Live Open Source Database Conference in 2019 – wherever it may be!

Photographs: Randy Tunnell Photography

The post Percona Live 2018 Community Report appeared first on Percona Database Performance Blog.

by Lorraine Pocklington at May 04, 2018 04:41 PM

May 03, 2018

Peter Zaitsev

Q&A: “Percona XtraDB Cluster 5.7 and ProxySQL for Your High Availability Needs” Webinar

On March 22, 2018, we held a webinar on how Percona XtraDB Cluster 5.7 (PXC) and ProxySQL can help achieve your database clustering high availability needs. Firstly, thanks to all the attendees for taking the time to attend the webinar; we hope you had a good webinar experience. We tried to answer your high availability questions during the call, but due to time restrictions we may have missed some, so this blog post will help clarify them.

Q. You say the replication to servers is virtually synchronous, if there is any latency, does ProxySQL detect this and select a node accordingly?

A. PXC nodes are virtually synchronous, which effectively means while the apply/commit of a transaction may be in progress on one node, other nodes may have completed applying it. There is no direct way for ProxySQL to know about this, but it could be traced by looking at wsrep_last_applied and wsrep_last_committed. Also, if a user expects to always fetch updated data, then a wsrep_sync_wait configuration can be used.
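
For illustration, a session that must always read its own writes could enforce causal reads like this (a minimal sketch; the table and values are hypothetical):

mysql> -- value 1 makes READ statements wait until all prior write-sets are applied
mysql> SET SESSION wsrep_sync_wait = 1;
mysql> SELECT balance FROM accounts WHERE id = 42;
mysql> -- compare this status variable across nodes to trace apply progress
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';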

Q. Hello, do you suggest geo-replication / WAN clustering for an e-commerce website? Let’s say www.domain.it served by an Italian PXC cluster and www.domain.us served by a US PXC cluster?

A. Geo-distributed PXC is already in use by a lot of customers, and is meant to serve exactly the use case you have pointed out. An important aspect of geo-distributed clustering (that often gets missed) is to configure timeout and window settings to accommodate network latency, along with segment settings. There is also a separate webinar on this topic, and you can get in touch with us to find out more details on how to configure it correctly.
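
As a rough sketch of the kind of tuning involved - the values below are placeholders only and must be derived from your measured WAN latency - the relevant knobs live in wsrep_provider_options:

# my.cnf fragment - placeholder values, tune to your actual network
# gmcast.segment groups nodes per DC; evs.* timeouts are relaxed for WAN links
wsrep_provider_options="gmcast.segment=1; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.install_timeout=PT1M"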

Q. Can we add a read-only node with PXC?

A. You can simply mark the selected nodes as super_read_only (or read_only). Replication continues as normal but direct traffic is blocked.
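
For example, marking a node read-only at runtime is a single statement (assuming a user with sufficient privileges):

mysql> -- super_read_only also blocks writes from accounts holding the SUPER privilege
mysql> SET GLOBAL super_read_only = ON;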

Q. Does the ProxySQL impact performance?

A. Using ProxySQL’s features properly can give you a significant performance improvement rather than a penalty. Here is a sample use case.

Q. I have not had “excellent” results with Drupal. (e.g., clearing cache sometimes causes corruption, although i ensure all tables do have a primary key). Any advice on its suitability? I am currently using proxySQL with a single percona (non-cluster) 5.7 but would like to try again with PXC if advisable.

A. Not sure what exact problem you faced, but you may want to check the wsrep_drupal_282555_workaround variable and the articles around it.

Q. Another question (to queue up as you are able to answer if possible): do you recommend SSL between ProxySQL — and specifically, what are the performance impacts, especially if there’s some latency between proxySQL and master percona db for writes?

A. We recommend SSL for security reasons, but it depends on the individual setup. At the time of writing, ProxySQL does not support SSL from frontends; this feature is only available from version 2.0 onward. https://github.com/sysown/proxysql/wiki/SSL-Support

Q1. Can i put two PXC clusters in master-slave replication mode with automatic failover?

Q2. How can i setup two PXC clusters in master-slave replication model with automatic failover?

A. You can have an async master-slave replication link between two PXC clusters, but automatic failover of the node (if the acting master from cluster-1 fails, then another active node of the cluster takes over as master) is currently not supported.

Q. Can the replication be done from ProxySQL level, so that if one node goes down in slave PXC, another node in PXC will take over the slave role?

A. This feature is not supported through ProxySQL. You can monitor replication lag through ProxySQL.

Q. Suppose if i have 5 node cluster in DC1 & DC2, how can we make transaction successful as soon as nodes in DC1 are committed rather than waiting for certification from nodes in DC2?

A. The transaction is executed on the local node, and during commit (as a pre-commit stage) it is replicated (the replication action doesn’t include certification and commit) to the other nodes of the cluster. Once replicated, each node can certify, apply and commit the transaction in parallel. This effectively means a transaction doesn’t need to be certified on all the nodes of the cluster before commit success is communicated to the end user. Once the transaction is replicated, the originating node can complete the local commit and communicate success to the application.

Q. Hi, thank you for the webinar. Does ProxySQL support HA, or is it a single point of failure?

A. ProxySQL supports Native Clustering, thereby forming a ProxySQL cluster (vs. a single ProxySQL node), which in turn helps avoid a single point of failure.
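
As a minimal sketch of how ProxySQL instances can be registered with each other through the admin interface (the address and comment are hypothetical):

mysql> INSERT INTO proxysql_servers (hostname, port, comment) VALUES ('10.0.0.201', 6032, 'proxysql-2');
mysql> LOAD PROXYSQL SERVERS TO RUNTIME;
mysql> SAVE PROXYSQL SERVERS TO DISK;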

Q. What is a good setup you would recommend: install ProxySQL on some other server/VM, or on the same host as one of the PXC nodes?

A. We would recommend installing ProxySQL on an independent node (or share with other applications). We don’t recommend installing ProxySQL on a PXC node. If the node hosting PXC and ProxySQL goes down (network or power failure), even though the cluster is working, the application will still lose connectivity as the ProxySQL gateway goes down as well.

Q. Let’s say we have three nodes and a good quorum. When one node goes down for maintenance, what happens to the quorum, since there are only two nodes now?

A. Two nodes can still form the quorum and continue servicing the workload.

Q. If the transaction is not committed to all the nodes, will the cluster remain locked for reads too?

A. No. The transaction commit is independent of a read action. The “transaction commit” can continue in the background and the user can continue to read from the cluster node. If a user has configured wsrep_sync_wait - which effectively means waiting for transactions to be committed before fetching updated data - then the read may wait for the transaction commit to complete.

Q. Is there a way to do partitioning over data? To not have 100% replicate in each master?

A. PXC/Galera, being a multi-master solution, doesn’t recommend having unsynchronized data across nodes. As an end user you can still achieve it by setting wsrep_on=off -> executing a workload (this will not be replicated to the cluster) -> wsrep_on=on (all actions after this point will again follow replication). This can lead to data inconsistency, though, and a shutdown of the cluster if the workload or actions are not properly segregated - so it is not recommended.
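
For completeness, this is roughly what the (not recommended) sequence described above looks like; the UPDATE in the middle is a placeholder for your segregated workload:

mysql> SET SESSION wsrep_on = OFF;
mysql> -- anything executed here is applied locally only and is NOT replicated
mysql> UPDATE local_stats SET counter = counter + 1;
mysql> SET SESSION wsrep_on = ON;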

Q. Are changes done by triggers rollbackable?

A. Yes, they are.

Q. Does ProxySQL prevent “mysql server gone away” in mostly idle daemons?

A. ProxySQL Monitor Module regularly probes the backend nodes and marks the node as OFFLINE in the ProxySQL database if MySQL server is down.

Q. Can proxysql cache rules use regexes?

A. We can use regex with ProxySQL query rules. Go here for more info.

Q. Can PMM be used in Digitalocean droplets?

A. Yes.

Q. In regards to PXC, how much delay is introduced when data is written since it has to appear on all nodes?

A. When a user initiates a transaction on a given node (let’s call it the originating node), it is first applied (not committed) on that node and a binary write-set is created. This write-set is then replicated to the other nodes of the cluster. Once the replication is successful, each node can independently certify, apply and commit the transaction. Since the originating node has already applied the transaction, it just needs to certify and commit it. It is interesting to note that the apply stage on the other nodes is fast too, given that the transaction is now packed in a database-optimized apply format. In short, there would be no delay (or only a marginal delay). The delay could be higher for a huge transaction, as the apply stage could take time. That is one of the reasons Galera doesn’t recommend huge transactions.
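
If you want to check whether a node is building up an apply backlog, the receive-queue counters are one place to look (a minimal sketch):

mysql> -- a persistently non-zero recv queue suggests the node cannot apply write-sets fast enough
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';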

Q. How does PXC (Percona XtraDB Cluster) allow DDL (schema changes) on one server with DML on the same table on another server? (This can break MySQL Master-Master replication)?

A. PXC executes DDL using the TOI (Total Order Isolation) protocol. In short, while DDL is executing it takes complete control of the node (no other parallel DML or DDL is allowed). DDL executes at the same position on all the nodes.

Q. Can ProxySQL split read-write queries based on stored procedure names (patterns)? e.g. sp_write vs sp_read?

A. ProxySQL read/write split is based on mysql_query_rules and hostgroups; see the documentation for more info.

Q. Can we use ProxySQL with a single node for the query caching feature? Especially since query cache will be discontinued in MySQL 8?

A. If you configure Query Cache properly, you can cache queries for a single node. 
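
A minimal sketch of such a rule - the digest pattern is hypothetical, and cache_ttl is expressed in milliseconds:

mysql> INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
    -> VALUES (30, 1, '^SELECT .* FROM sbtest1', 5000, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;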

Q. Must binary logging be enabled for ProxySQL / PXC to work?

A. PXC replicates write-sets. These write-sets are generated by the binary logging module, so while persistent binary logs are not strictly needed, PXC enables emulation-based binary logging internally to generate the write-sets (persistence to disk is not needed). If disk space is not a constraint, we recommend you enable binary logging.

Q. Please define Galera and Percona, as well as the relationship between the two?

A. Galera is a replication technology owned and developed by Codership and distributed under the GPL license. Percona has adopted this technology, combined it with Percona Server for MySQL, and built PXC. Percona continues to merge updates made to Galera and the related wsrep plugin on a regular basis. At the same time, Percona also continues to merge related enhancements and bug fixes from Percona Server for MySQL.

Q. Is the ProxySQL Admin tool the script/tool that you mentioned would autodetect your existing PXC, or is there a different script? Trying to work out whether you need to have PXC and ProxySQL installed at the same time.

With ProxySQL, do we need to wait for active threads on the PXC to drain before shutting down the PXC?

A. The ProxySQL Admin (proxysql-admin) script helps you configure your PXC nodes in the ProxySQL database. PXC and ProxySQL should be up and running before initiating the proxysql-admin script; see the documentation for more info.

If you trigger a PXC node shutdown, the proxysql_galera_checker script marks the node as offline in the ProxySQL DB, and new connections aren’t redirected to the offline node.
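
You can verify how ProxySQL currently sees each node from the admin interface; nodes in an OFFLINE state no longer receive new connections:

mysql> SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;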

Q. 1) HAProxy and ProxySQL: which one has better performance when the number of clusters is large - up to 30 clusters?
2) What’s the best tool for monitoring a large number of clusters and nodes?

A. For PXC, we strongly recommend ProxySQL as it is closely integrated with PXC. HAProxy works with PXC as well, and before ProxySQL we had customers using it. For a quick comparison, you can take a look at the following article.

Q. When the PXC settings have a maximum connection limit, how does ProxySQL allow for many more than the standard connections?

A. ProxySQL terminates the connection with a connection timeout error.

FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_insert.lua:61: SQL error, errno = 9001, state = 'HY000': Max connect timeout reached while reaching hostgroup 10 after 10012ms

__________________________________________________________________________________

Once again, thanks for your questions and queries. If you still have more questions or need clarification, you can log them at the percona-xtradb-cluster forum. We would also like to know what else you expect from Percona XtraDB Cluster in upcoming releases.

The post Q&A: “Percona XtraDB Cluster 5.7 and ProxySQL for Your High Availability Needs” Webinar appeared first on Percona Database Performance Blog.

by Krunal Bauskar at May 03, 2018 11:48 PM

Causes and Workarounds for Slave Performance Too Slow with Row-Based Events

Recently I worked on one customer issue that I would describe as “slave performance too slow”. During a quick analysis, I found that the replication slave SQL thread could not keep up while processing row-based events from the master’s binary log.

For example:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                          ...
              Master_Log_File: binlog.0000185
          Read_Master_Log_Pos: 86698585
                          ...
        Relay_Master_Log_File: binlog.0000185
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                          ...
          Exec_Master_Log_Pos: 380
              Relay_Log_Space: 85699128
                          ...
                  Master_UUID: 98974e7f-2fbc-18e9-72cd-07003817585c
                          ...
           Retrieved_Gtid_Set: 98974e7f-2fbc-18e9-72cd-07003817585c:1055-1057
            Executed_Gtid_Set: 7f42e2c5-3fbc-16e7-7fb8-05003715789a:1-2,
98974e7f-2fbc-18e9-72cd-07003817585c:1-1056
                          ...

The processlist state for the SQL thread can be one of the following: Reading event from the relay log, or System lock, or potentially some other state. In my case:

mysql> SHOW PROCESSLIST;
+----+-----------------+-----------------+------+---------+------+----------------------------------+------------------+
| Id | User            | Host            | db   | Command | Time | State                            | Info             |
+----+-----------------+-----------------+------+---------+------+----------------------------------+------------------+
...
|  4 | system user     |                 | NULL | Connect |  268 | Reading event from the relay log | NULL             |
...
+----+-----------------+-----------------+------+---------+------+----------------------------------+------------------+

What causes that?

Let’s take a look at what could potentially cause such behavior and what we need to pay attention to. When the SQL thread applies the change from a row-based event, it has to locate the exact row that was updated. With a primary key, this is trivial as only one row can possibly have the same value for the primary key.

However, if there is no primary key on the table on the replication slave side, the SQL thread must search the entire table to locate the row to update or delete. It repeats the search for each updated row. This search is both very resource intensive (CPU usage can be up to 100%) and slow, causing the slave to fall behind.

For InnoDB tables, the “hidden” key used for the clustered index for tables without a primary key cannot be used to avoid searching the entire table for the rows to update or delete. We need to keep in mind that the “hidden” key is unique only to each MySQL instance, so the replication master and replication slave generally don’t have the same values for the “hidden” key for the same row.

What can we do to solve that?

The best solution is to ensure that all tables have a primary key. This not only ensures the SQL thread can easily locate rows to update or delete, but it is also considered a best practice since it ensures all rows are unique.

If there is no way to logically add a natural primary key for the table, a potential solution is to add an auto-increment unsigned integer column as the primary key.
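
A minimal sketch of such a change, assuming a hypothetical table name (on large tables, consider an online schema change tool rather than a plain ALTER):

mysql> ALTER TABLE mydb.mytable
    ->   ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;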

The query below helps you to locate tables without a primary key:

SELECT tables.table_schema, tables.table_name, tables.table_rows
      FROM information_schema.tables
      LEFT JOIN (
        SELECT table_schema, table_name
        FROM information_schema.statistics
        GROUP BY table_schema, table_name, index_name
        HAVING
          SUM(
            CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 END
          ) = COUNT(*)
      ) puks
      ON tables.table_schema = puks.table_schema AND tables.table_name = puks.table_name
      WHERE puks.table_name IS NULL
        AND tables.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
        AND tables.table_type = 'BASE TABLE' AND engine='InnoDB';

Please note that for InnoDB, there must always be a unique NOT NULL key for all tables. It is required for the clustered index. So adding an explicit “dummy” column as suggested above will not add to the overall storage requirements as it will merely replace the hidden key.

It’s not always possible to add a primary key to the table immediately if, for example, there are many relations on the application side/legacy system, lack of resources, unknown application behavior after the change which required testing, etc.

In this case, a short-term solution is to change the search algorithm used by the replication slave to locate the rows changed by row-based events.

The search algorithm is set using the slave_rows_search_algorithms option which is available in MySQL 5.6 and later. The default value is to use an index scan if possible, otherwise a table scan.

https://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#option_mysqld_slave-rows-search-algorithms

However, for tables without a primary key, you can use a hash scan, which causes the SQL thread to temporarily cache hashes to reduce the overhead of searching the whole table. The value of slave_rows_search_algorithms can be changed dynamically using:

mysql> SET GLOBAL slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN';

Just to note: INDEX_SCAN,HASH_SCAN is the default value in MySQL 8.0.

One thing to be aware of when using hash scans is that the hashes are only reused within one row-based event. (Each row-based event may have changes to several rows in the same table originating from the same SQL statement.)

The binlog_row_event_max_size option on the replication master controls the maximum size of a row-based event. The default max event size is 8kB. This means that switching to hash scans only improves the performance of the SQL thread when:

  1. Several rows fit into one row-based event. It may help to increase the value of binlog_row_event_max_size on the replication master, if you perform updates or deletes on large rows (e.g., with blob or text data). You can only set binlog_row_event_max_size in the MySQL configuration file, and changing this value requires a restart - see the sketch after this list.
  2. One statement changes several rows.
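
A minimal my.cnf sketch for the replication master; the value here is a placeholder, and the setting takes a size in bytes:

# my.cnf on the replication master - requires a restart to take effect
[mysqld]
binlog_row_event_max_size = 32768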

https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#option_mysqld_binlog-row-event-max-size

Conclusion

Even if enabling hash scans improves the performance enough for the replication slave to keep up, the permanent solution is to add an explicit primary key to each table. This should be the general rule of thumb in schema design, in order to avoid and/or minimize issues like the slave performance being too slow (as described in this post).

Next, I am going to investigate how we can find out the exact thread state using Performance Schema in order to make issue identification less of a guessing game.

The post Causes and Workarounds for Slave Performance Too Slow with Row-Based Events appeared first on Percona Database Performance Blog.

by Alex Poritskiy at May 03, 2018 07:09 PM

MariaDB Foundation

MariaDB 10.0.35, MariaDB Galera Cluster 5.5.60 and MariaDB Connector C 3.0.4 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.0.35, MariaDB Galera Cluster 5.5.60 as well as MariaDB Connector/C 3.0.4, all stable releases. See the release notes and changelogs for details. Download MariaDB 10.0.35 Release Notes Changelog What is MariaDB 10.0? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Galera Cluster 5.5.60 […]

The post MariaDB 10.0.35, MariaDB Galera Cluster 5.5.60 and MariaDB Connector C 3.0.4 now available appeared first on MariaDB.org.

by Ian Gilfillan at May 03, 2018 03:23 PM

MariaDB AB

MariaDB Server 10.0.35 and MariaDB Cluster 5.5.60 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.0.35, and MariaDB Cluster 5.5.60. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.0.35

Release Notes Changelog What is MariaDB 10.0?


Download MariaDB Cluster 5.5.60

Release Notes Changelog What is MariaDB Cluster?

by dbart at May 03, 2018 02:58 PM

Shlomi Noach

MySQL master discovery methods, part 1: DNS

This is the first in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via DNS

In DNS master discovery, applications connect to the master via a name that gets resolved to the master's box. By way of example, apps would target the masters of different clusters by connecting to cluster1-writer.example.net, cluster2-writer.example.net, etc. It is up to the DNS to resolve those names to IPs.

Issues for concern are:

  • You will likely have multiple DNS servers. How many? In which data centers / availability zones?
  • What is your method for distributing/deploying a name change to all your DNS servers?
  • DNS will indicate a TTL (Time To Live) such that clients can cache the IP associated with a name for a given number of seconds. What is that TTL? (The dig sketch below shows how to check it.)
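
Checking the TTL your clients actually see is a one-liner; the hostname and address below are hypothetical:

root@vagrant:~# dig +noall +answer cluster1-writer.example.net
cluster1-writer.example.net. 60 IN A 10.0.0.101

The second field of the answer (60 here) is the remaining TTL in seconds.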

As long as things are stable and going well, discovery via DNS makes sense. Trouble begins when the master fails over. Assume M used to be the master, but got demoted. Assume R used to be a replica, that got promoted and is now effectively the master of the topology.

Our failover solution has promoted R, and now needs to somehow apply the change, such that the apps connect to R instead of M. Some notes:

  • The apps need not change configuration. They should still connect to cluster1-writer.example.net, cluster2-writer.example.net, etc.
  • Our tool instructs DNS servers to make the change.
  • Clients will still resolve to old IP based on TTL.

A non-planned failover illustration #1

Master M dies. R gets promoted. Our tool instructs all DNS servers on all DCs to update the IP address.

Say TTL is 60 seconds. Say update to all DNS servers takes 10 seconds. We will have between 10 and 70 seconds until all clients connect to the new master R.

During that time they will continue to attempt connecting to M. Since M is dead, those attempts will fail (thankfully).

A non-planned failover illustration #2

Master M gets network isolated for 30 seconds, during which time we failover. R gets promoted. Our tool instructs all DNS servers on all DCs to update the IP address.

Again, assume TTL is 60 seconds. As before, it will take between 10 and 70 seconds for clients to learn of the new IP.

Clients who will require between 40 and 70 seconds to learn of the new IP will, however, hit an unfortunate scenario: the old master M reappears on the grid. Those clients will successfully reconnect to M and issue writes, leading to data loss (writes to M no longer replicate anywhere).

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R. We need to change DNS records. Since this is a planned failover, we set the old master to read_only=1 or, even better, network isolate it.

And still our clients take 10 to 70 seconds to recognize the new master.

Discussion

The above numbers are just illustrative. Perhaps DNS deployment is quicker than 10 seconds. You should do your own math.

TTL is a compromise which you can tune. Setting lower TTL will mitigate the problem, but will cause more hits on the DNS servers.

For a planned takeover we can first deploy a change to the TTL to, say, 2 seconds, wait 60 seconds, then deploy the IP change, then restore the TTL to 60.
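
How the record change itself is deployed depends entirely on your DNS software. As one hedged example, a BIND-style dynamic update could look like this (server, key file and addresses are placeholders):

root@vagrant:~# nsupdate -k /etc/bind/failover.key <<'EOF'
server ns1.example.net
update delete cluster1-writer.example.net A
update add cluster1-writer.example.net 60 A 10.0.0.102
send
EOF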

You may choose to restart apps upon DNS deployment. This emulates apps' awareness of the change.

Sample orchestrator configuration

orchestrator configuration would look like this:

  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PostMasterFailoverProcesses": [
    "/do/what/you/gotta/do to apply dns change for {failureClusterAlias}-writer.example.net to {successorHost}"
  ],  

In the above:

  • ApplyMySQLPromotionAfterMasterFailover instructs orchestrator to set read_only=0; reset slave all on promoted server.
  • PostMasterFailoverProcesses really depends on your setup. But orchestrator will supply your scripts with hints: the identity of the cluster and the identity of the successor.

See orchestrator configuration documentation.

All posts in this series

by shlomi at May 03, 2018 10:56 AM

May 02, 2018

Peter Zaitsev

ProxySQL Query Rewrite Use Case

In this blog post, I’m going to revisit the ProxySQL Query Rewrite feature. You may have seen me talking about possible use case scenarios at the past few conferences, but the reason I’m starting with this is that query rewriting was the original intention for building ProxySQL.

Why would you need to rewrite a query?

  • You’ve identified a query that’s causing bottleneck or slowness
  • A special operation requires query routing
  • You cannot modify application code

So here we have a case of a bad query hitting the backend database. You as a DBA have identified the query as causing severe slowdown, which could lead to a site-wide outage. This query needs to be optimized, and you have asked the developer to correct this bad query. Their answer isn’t really what you expected. You can rewrite some queries to return the same data by choosing a different optimizer path. In cases where an application was written using an ORM – such as Hibernate or similar – it is not easy to quickly make a code change.

The query rewrite feature of ProxySQL makes this possible (until the application can be modified).

How do we rewrite a query? There are two ways to accomplish this with ProxySQL.

Query rewrite is just a match_pattern + replace_pattern activity, whereas match_digest is only used for matching a query, not rewriting it. Logically, match_digest serves the same purpose as username, schemaname, proxy_addr, etc. It only matches the query.

These two different mechanisms offer ways to perform the query matching operation efficiently, depending on the query type (such as a DML operation versus a SELECT query). Please note that if your intention is to rewrite queries, the rule must match the original query by using match_pattern. Query rules are processed using the rule_id field and are only applied if active = 1.

Here’s how we can demonstrate match_digest in our test lab:

mysql> SELECT hostgroup hg, sum_time, count_star, digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC limit 10;
+----+-----------+------------+-----------------------------------+
| hg | sum_time  | count_star | digest_text                       |
+----+-----------+------------+-----------------------------------+
| 0  | 243549572 | 85710      | SELECT c FROM sbtest10 WHERE id=? |
| 0  | 146324255 | 42856      | COMMIT                            |
| 0  | 126643488 | 44310      | SELECT c FROM sbtest7 WHERE id=?  |
| 0  | 126517140 | 42927      | BEGIN                             |
| 0  | 123797307 | 43820      | SELECT c FROM sbtest1 WHERE id=?  |
| 0  | 123345775 | 43460      | SELECT c FROM sbtest6 WHERE id=?  |
| 0  | 122121030 | 43010      | SELECT c FROM sbtest9 WHERE id=?  |
| 0  | 121245265 | 42400      | SELECT c FROM sbtest8 WHERE id=?  |
| 0  | 120554811 | 42520      | SELECT c FROM sbtest3 WHERE id=?  |
| 0  | 119244143 | 42070      | SELECT c FROM sbtest5 WHERE id=?  |
+----+-----------+------------+-----------------------------------+
10 rows in set (0.00 sec)
mysql> INSERT INTO mysql_query_rules (rule_id,active,username,match_digest, match_pattern,replace_pattern,apply) VALUES (10,1,'root','SELECT.*WHERE id=?','sbtest2','sbtest10',1);
Query OK, 1 row affected (0.00 sec)
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest       | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| 0    | 10      | NULL   | 1      | root     | SELECT.*WHERE id=? | sbtest2       | sbtest10        | NULL      | 1     |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
1 row in set (0.00 sec)
mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest       | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| 593  | 10      | NULL   | 1      | root     | SELECT.*WHERE id=? | sbtest2       | sbtest10        | NULL      | 1     |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
1 row in set (0.00 sec)

We can also monitor Query Rules activity live using the ProxyTop utility.

To reset ProxySQL’s statistics for query rules, use the following steps:

mysql> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

Here’s a match_pattern example:

mysql> SELECT hostgroup hg, sum_time, count_star, digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC limit 5;
+----+----------+------------+----------------------------------+
| hg | sum_time | count_star | digest_text                      |
+----+----------+------------+----------------------------------+
| 0  | 98753983 | 16292      | BEGIN                            |
| 0  | 84613532 | 16232      | COMMIT                           |
| 1  | 49327292 | 16556      | SELECT c FROM sbtest3 WHERE id=? |
| 1  | 49027118 | 16706      | SELECT c FROM sbtest2 WHERE id=? |
| 1  | 48095847 | 16396      | SELECT c FROM sbtest4 WHERE id=? |
+----+----------+------------+----------------------------------+
5 rows in set (0.01 sec)
mysql> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (20,1,'root','DISTINCT(.*)ORDER BY c','DISTINCT1',1);
Query OK, 1 row affected (0.00 sec)
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest       | match_pattern          | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| 0    | 10      | NULL   | 1      | root     | SELECT.*WHERE id=? | sbtest2                | sbtest10        | NULL      | 1     |
| 0    | 20      | NULL   | 1      | root     | NULL               | DISTINCT(.*)ORDER BY c | DISTINCT1      | NULL      | 1     |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
2 rows in set (0.01 sec)
mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest       | match_pattern          | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| 9994 | 10      | NULL   | 1      | root     | SELECT.*WHERE id=? | sbtest2                | sbtest10        | NULL      | 1     |
| 6487 | 20      | NULL   | 1      | root     | NULL               | DISTINCT(.*)ORDER BY c | DISTINCT1      | NULL      | 1     |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
2 rows in set (0.00 sec)
mysql> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
mysql>  LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

The key to query rule processing for a rewrite is the apply field:

  • apply = 1 means don’t evaluate any other rules if there’s a match already.
  • apply = 0 means evaluate the next rules in the chain.

As we can see in the test below, all queries matching rule_id = 10 or rule_id = 20 have hits. In reality, all rules in runtime_mysql_query_rules are active. If we want to disable a rule that is in the mysql_query_rules table, we set active = 0:

mysql> update mysql_query_rules set apply = 1 where rule_id in (10);
Query OK, 1 row affected (0.00 sec)
mysql> update mysql_query_rules set apply = 0 where rule_id in (20);
Query OK, 1 row affected (0.00 sec)
mysql>  LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest       | match_pattern          | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
| 0    | 10      | NULL   | 1      | root     | SELECT.*WHERE id=? | sbtest2                | sbtest10        | NULL      | 1     |
| 0    | 20      | NULL   | 1      | root     | NULL               | DISTINCT(.*)ORDER BY c | DISTINCT1      | NULL      | 0     |
+------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+
2 rows in set (0.00 sec)
mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, flagIN, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
| hits  | rule_id | digest | active | username | match_digest       | match_pattern          | replace_pattern | flagIN | apply |
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
| 10195 | 10      | NULL   | 1      | root     | SELECT.*WHERE id=? | sbtest2                | sbtest10        | 0      | 1     |
| 6599  | 20      | NULL   | 1      | root     | NULL               | DISTINCT(.*)ORDER BY c | DISTINCT1      | 0      | 0     |
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
2 rows in set (0.00 sec)
mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, flagIN, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
| hits  | rule_id | digest | active | username | match_digest       | match_pattern          | replace_pattern | flagIN | apply |
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
| 20217 | 5       | NULL   | 1      | root     | NULL               | DISTINCT(.*)ORDER BY c | DISTINCT1      | 0      | 1     |
| 27020 | 10      | NULL   | 1      | root     | SELECT.*WHERE id=? | sbtest2                | sbtest10        | 0      | 0     |
+-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+
2 rows in set (0.00 sec)
mysql> update mysql_query_rules set active = 0 where rule_id = 5;
Query OK, 1 row affected (0.00 sec)
mysql>  LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.02 sec)
mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest       | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| 0    | 10      | NULL   | 1      | root     | SELECT.*WHERE id=? | sbtest2       | sbtest10        | NULL      | 0     |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
1 row in set (0.00 sec)
mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| hits | rule_id | digest | active | username | match_digest       | match_pattern | replace_pattern | cache_ttl | apply |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
| 4224 | 10      | NULL   | 1      | root     | SELECT.*WHERE id=? | sbtest2       | sbtest10        | NULL      | 0     |
+------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+
1 row in set (0.01 sec)

Additionally, ProxySQL can help to identify bad queries. Log in to the admin module and follow these steps:

Find the most time-consuming queries:

mysql> SELECT SUM(sum_time), SUM(count_star), digest_text FROM stats_mysql_query_digest GROUP BY digest ORDER BY SUM(sum_time) DESC LIMIT 3\G
*************************** 1. row ***************************
  SUM(sum_time): 95053795
SUM(count_star): 13164
    digest_text: BEGIN
*************************** 2. row ***************************
  SUM(sum_time): 85094367
SUM(count_star): 13130
    digest_text: COMMIT
*************************** 3. row ***************************
  SUM(sum_time): 52110099
SUM(count_star): 13806
    digest_text: SELECT c FROM sbtest3 WHERE id=?
3 rows in set (0.00 sec)

Find highest average execution time:

mysql> SELECT SUM(sum_time), SUM(count_star), SUM(sum_time)/SUM(count_star) avg,  digest_text FROM stats_mysql_query_digest GROUP BY digest ORDER BY SUM(sum_time)/SUM(count_star) DESC limit 1;
+---------------+-----------------+--------+--------------------------------+
| SUM(sum_time) | SUM(count_star) | avg    | digest_text                    |
+---------------+-----------------+--------+--------------------------------+
| 972162        | 1               | 972162 | CREATE INDEX k_5 ON sbtest5(k) |
+---------------+-----------------+--------+--------------------------------+
1 row in set (0.00 sec)

The above information can also be gathered from performance_schema.events_statements_summary_by_digest, but I prefer the ProxySQL admin interface. Also, you can run a slow query log analysis by running a detailed pt-query-digest on your system to identify slow queries. You can also use PMM’s QAN.
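
As a hedged sketch of the slow log route (the log path is hypothetical):

root@vagrant:~# pt-query-digest --limit 5 /var/lib/mysql/slow.log > digest.out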

Conclusion

I’ve found the best documentation on ProxySQL query rewrite is at IBM’s site, where they explain query rewrite fundamentals with examples. It’s worth a read. I’m not going to get into the details of these techniques here, but if you find more relevant resources, please post them in the comments section.

A few of the possible query optimization techniques:

  • Operation merging
  • Operation movement
  • Predicate translation

At the time of this blog post, ProxySQL has also announced a new fast schema routing algorithm to support thousands of shards.

There may be other cases where you want to divert traffic to another table. Think of a table hitting the maximum integer value, and you want to keep inserts going into a new table while you alter the old one to correct the issue. In the meantime, all selects can still point to the old table to continue operation.
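
A minimal sketch of such a diversion, with hypothetical table names - inserts are rewritten to the new table while reads keep hitting the old one:

mysql> INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
    -> VALUES (40, 1, 'INSERT INTO orders', 'INSERT INTO orders_new', 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;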

As of MySQL 5.7.6, Oracle also offers query rewrite as a plugin, and you can find the documentation here. The biggest disadvantage of using Oracle’s built-in solution is the rewrite rule sits with the server it is implemented on. That’s where ProxySQL has a bigger advantage: it sits between the application and database server, so the rule applies to the entire topology, not just for a single host.

As you can see, ProxySQL query rewrite is a great way to solve some real operational issues and make you a hero to the team and project. To become a rock star, you might want to consider Percona Training on ProxySQL. The training will provide the knowledge to set up a ProxySQL environment with best practices, understand when and how to change the configuration, and maintain it to ensure increasing your uptime SLAs. Contact us for more details at info@percona.com.

References:

https://www.percona.com/blog/2017/04/10/proxysql-rules-do-i-have-too-many/

http://www.proxysql.com/blog/query-rewrite-with-proxysql-use-case-scenario

https://github.com/sysown/proxysql/wiki/ProxySQL-Configuration#query-rewrite

https://dev.mysql.com/doc/refman/5.7/en/rewriter-query-rewrite-plugin.html

https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.admin.perf.doc/doc/c0005293.html

The post ProxySQL Query Rewrite Use Case appeared first on Percona Database Performance Blog.

by Alkin Tezuysal at May 02, 2018 11:02 PM

Jean-Jerome Schmidt

How to Overcome Accidental Data Deletion in MySQL & MariaDB

Someone accidentally deleted part of the database. Someone forgot to include a WHERE clause in a DELETE query, or they dropped the wrong table. Things like that may and will happen; it is inevitable and human. But the impact can be disastrous. What can you do to guard yourself against such situations, and how can you recover your data? In this blog post, we will cover some of the most typical cases of data loss, and how you can prepare yourself so you can recover from them.

Preparations

There are things you should do in order to ensure a smooth recovery. Let’s go through them. Please keep in mind that it’s not a “pick one” situation - ideally you will implement all of the measures we are going to discuss below.

Backup

You have to have a backup, there is no getting away from it. You should have your backup files tested - unless you test your backups, you cannot be sure if they are any good and if you will ever be able to restore them. For disaster recovery you should keep a copy of your backup somewhere outside of your datacenter - just in case the whole datacenter becomes unavailable. To speed up the recovery, it’s very useful to keep a copy of the backup also on the database nodes. If your dataset is large, copying it over the network from a backup server to the database node which you want to restore may take significant time. Keeping the latest backup locally may significantly improve recovery times.

Logical Backup

Your first backup, most likely, will be a physical backup. For MySQL or MariaDB, it will be either something like xtrabackup or some sort of filesystem snapshot. Such backups are great for restoring a whole dataset or for provisioning new nodes. However, in case of deletion of a subset of data, they suffer from significant overhead. First of all, you cannot simply restore the whole backup in place, or you will overwrite all changes that happened after the backup was created. What you are looking for is the ability to restore just a subset of data, only the rows which were accidentally removed. To do that with a physical backup, you would have to restore it on a separate host, locate the removed rows, dump them and then restore them on the production cluster. Copying and restoring hundreds of gigabytes of data just to recover a handful of rows is something we would definitely call a significant overhead.

To avoid it you can use logical backups - instead of storing physical data, such backups store data in a text format. This makes it easier to locate the exact data which was removed, which can then be restored directly on the production cluster. To make it even easier, you can also split such a logical backup in parts and back up each and every table to a separate file. If your dataset is large, it will make sense to split one huge text file as much as possible. This will make the backup inconsistent, but for the majority of cases this is no issue - if you need to restore the whole dataset to a consistent state, you will use a physical backup, which is much faster in this regard. If you need to restore just a subset of data, the requirements for consistency are less stringent.

Point-In-Time Recovery

Backup is just a beginning - you will be able to restore your data to the point at which the backup was taken but, most likely, data was removed after that time. Just by restoring missing data from the latest backup, you may lose any data that was changed after the backup. To avoid that you should implement Point-In-Time Recovery. For MySQL it basically means you will have to use binary logs to replay all the changes which happened between the moment of the backup and the data loss event. The below screenshot shows how ClusterControl can help with that.

What you will have to do is to restore this backup up to the moment just before the data loss. You will have to restore it on a separate host in order not to make changes on the production cluster. Once you have the backup restored, you can log into that host, find the missing data, dump it and restore on the production cluster.

Delayed Slave

All of the methods we discussed above have one common pain point - it takes time to restore the data. It may take longer, when you restore all of the data and then try to dump only the interesting part. It may take less time if you have logical backup and you can quickly drill down to the data you want to restore, but it is by no means a quick task. You still have to find a couple of rows in a large text file. The larger it is, the more complicated the task gets - sometimes the sheer size of the file slows down all actions. One method to avoid those problems is to have a delayed slave. Slaves typically try to stay up to date with the master but it is also possible to configure them so that they will keep a distance from their master. In the below screenshot, you can see how to use ClusterControl to deploy such a slave:

In short, we have here an option to add a replication slave to the database setup and configure it to be delayed. In the screenshot above, the slave will be delayed by 3600 seconds, which is one hour. This lets you use that slave to recover the removed data up to one hour from the data deletion. You will not have to restore a backup; it will be enough to run mysqldump or SELECT ... INTO OUTFILE for the missing data, and you will get the data to restore on your production cluster.

Restoring Data

In this section, we will go through a couple of examples of accidental data deletion and how you can recover from them. We will walk through recovery from a full data loss, we will also show how to recover from a partial data loss when using physical and logical backups. We will finally show you how to restore accidentally deleted rows if you have a delayed slave in your setup.

Full Data Loss

Accidental “rm -rf” or “DROP SCHEMA myonlyschema;” has been executed and you ended up with no data at all. If you happened to also remove files other than from the MySQL data directory, you may need to reprovision the host. To keep things simpler we will assume that only MySQL has been impacted. Let’s consider two cases, with a delayed slave and without one.

No Delayed Slave

In this case the only thing we can do is to restore the last physical backup. As all of our data has been removed, we don’t need to be worried about activity which happened after the data loss because, with no data, there is no activity. We should be worried about the activity which happened after the backup took place. This means we have to do a point-in-time restore. Of course, it will take longer than just restoring data from the backup. If bringing your database up quickly is more crucial than having all of the data restored, you can just restore a backup and be fine with it.

First of all, if you still have access to binary logs on the server you want to restore, you can use them for PITR. First, we want to convert the relevant part of the binary logs to a text file for further investigation. We know that data loss happened after 13:00:00. First, let’s check which binlog file we should investigate:

root@vagrant:~# ls -alh /var/lib/mysql/binlog.*
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:32 /var/lib/mysql/binlog.000001
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:33 /var/lib/mysql/binlog.000002
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:35 /var/lib/mysql/binlog.000003
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:38 /var/lib/mysql/binlog.000004
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:39 /var/lib/mysql/binlog.000005
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:41 /var/lib/mysql/binlog.000006
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:43 /var/lib/mysql/binlog.000007
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:45 /var/lib/mysql/binlog.000008
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:47 /var/lib/mysql/binlog.000009
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:49 /var/lib/mysql/binlog.000010
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:51 /var/lib/mysql/binlog.000011
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:53 /var/lib/mysql/binlog.000012
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:55 /var/lib/mysql/binlog.000013
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:57 /var/lib/mysql/binlog.000014
-rw-r----- 1 mysql mysql 1.1G Apr 23 10:59 /var/lib/mysql/binlog.000015
-rw-r----- 1 mysql mysql 306M Apr 23 13:18 /var/lib/mysql/binlog.000016

As can be seen, we are interested in the last binlog file.

root@vagrant:~# mysqlbinlog --start-datetime='2018-04-23 13:00:00' --verbose /var/lib/mysql/binlog.000016 > sql.out

Once done, let’s take a look at the contents of this file. We will search for ‘drop schema’ in vim. Here’s a relevant part of the file:

# at 320358785
#180423 13:18:58 server id 1  end_log_pos 320358850 CRC32 0x0893ac86    GTID    last_committed=307804   sequence_number=307805  rbr_only=no
SET @@SESSION.GTID_NEXT= '52d08e9d-46d2-11e8-aa17-080027e8bf1b:443415'/*!*/;
# at 320358850
#180423 13:18:58 server id 1  end_log_pos 320358946 CRC32 0x487ab38e    Query   thread_id=55    exec_time=1     error_code=0
SET TIMESTAMP=1524489538/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
drop schema sbtest
/*!*/;

As we can see, we want to restore up to position 320358785. We can pass this data to the ClusterControl UI:

Delayed Slave

If we have a delayed slave and that host is enough to handle all of the traffic, we can use it and promote it to master. First though, we have to make sure it caught up with the old master up to the point of the data loss. We will use some CLI here to make it happen. First, we need to figure out on which position the data loss happened. Then we will stop the slave and let it run up to the data loss event. We showed how to get the correct position in the previous section - by examining binary logs. We can either use that position (binlog.000016, position 320358785) or, if we use a multithreaded slave, we should use GTID of the data loss event (52d08e9d-46d2-11e8-aa17-080027e8bf1b:443415) and replay queries up to that GTID.

First, let’s stop the slave and disable delay:

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.01 sec)
mysql> CHANGE MASTER TO MASTER_DELAY = 0;
Query OK, 0 rows affected (0.02 sec)

Then we can start it up to a given binary log position.

mysql> START SLAVE UNTIL MASTER_LOG_FILE='binlog.000016', MASTER_LOG_POS=320358785;
Query OK, 0 rows affected (0.01 sec)

If we’d like to use GTID, the command will look different:

mysql> START SLAVE UNTIL SQL_BEFORE_GTIDS = '52d08e9d-46d2-11e8-aa17-080027e8bf1b:443415';
Query OK, 0 rows affected (0.01 sec)

Once the replication has stopped (meaning all of the events we asked for have been executed), we should verify that the host contains the missing data. If so, you can promote it to master and then rebuild the other hosts using the new master as the source of data.
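
A quick, hedged way to verify where the slave stopped, reusing the GTID from our example:

mysql> -- the data-loss GTID (52d08e9d-46d2-11e8-aa17-080027e8bf1b:443415) should NOT be in the executed set
mysql> SELECT @@GLOBAL.gtid_executed;
mysql> -- spot-check that the soon-to-be-promoted host still contains the data
mysql> SELECT COUNT(*) FROM sbtest.sbtest1;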

This is not always the best option. All depends on how delayed your slave is - if it is delayed by a couple of hours, it may not make sense to wait for it to catch up, especially if write traffic is heavy in your environment. In such case, it’s most likely faster to rebuild hosts using physical backup. On the other hand, if you have a rather small volume of traffic, this could be a nice way to actually quickly fix the issue, promote new master and get on with serving traffic, while the rest of the nodes are being rebuilt in the background.

Partial Data Loss - Physical Backup

In case of partial data loss, physical backups can be inefficient but, as those are the most common type of backup, it’s very important to know how to use them for a partial restore. The first step will always be to restore the backup up to a point in time before the data loss event. It’s also very important to restore it on a separate host. ClusterControl uses xtrabackup for physical backups, so we will show how to use it. Let’s assume we ran the following incorrect query:

DELETE FROM sbtest1 WHERE id < 23146;

We wanted to delete just a single row (‘=’ in the WHERE clause); instead, we deleted a bunch of them (‘<’ in the WHERE clause). Let’s take a look at the binary logs to find at which position the issue happened. We will use that position to restore the backup to.

mysqlbinlog --verbose /var/lib/mysql/binlog.000003 > bin.out

Now, let’s look at the output file and see what we can find there. We are using row-based replication, therefore we will not see the exact SQL that was executed. Instead (as long as we use the --verbose flag with mysqlbinlog) we will see events like the ones below:

### DELETE FROM `sbtest`.`sbtest1`
### WHERE
###   @1=999296
###   @2=1009782
###   @3='96260841950-70557543083-97211136584-70982238821-52320653831-03705501677-77169427072-31113899105-45148058587-70555151875'
###   @4='84527471555-75554439500-82168020167-12926542460-82869925404'

As can be seen, MySQL identifies rows to delete using a very precise WHERE condition. The mysterious signs in the human-readable comment, “@1”, “@2”, mean “first column”, “second column”. We know that the first column is ‘id’, which is something we are interested in. We need to find a large DELETE event on the ‘sbtest1’ table. The comments which follow should mention an id of ‘1’, then ‘2’, then ‘3’ and so on - all the way up to an id of ‘23145’. All should be executed in a single transaction (a single event in the binary log). After analysing the output using ‘less’, we found:

### DELETE FROM `sbtest`.`sbtest1`
### WHERE
###   @1=1
###   @2=1006036
###   @3='123'
###   @4='43683718329-48150560094-43449649167-51455516141-06448225399'
### DELETE FROM `sbtest`.`sbtest1`
### WHERE
###   @1=2
###   @2=1008980
###   @3='123'
###   @4='05603373460-16140454933-50476449060-04937808333-32421752305'

The event to which those comments are attached started at:

#180427  8:09:21 server id 1  end_log_pos 29600687 CRC32 0x8cfdd6ae     Xid = 307686
COMMIT/*!*/;
# at 29600687
#180427  8:09:21 server id 1  end_log_pos 29600752 CRC32 0xb5aa18ba     GTID    last_committed=42844    sequence_number=42845   rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '0c695e13-4931-11e8-9f2f-080027e8bf1b:55893'/*!*/;
# at 29600752
#180427  8:09:21 server id 1  end_log_pos 29600826 CRC32 0xc7b71da5     Query   thread_id=44    exec_time=0     error_code=0
SET TIMESTAMP=1524816561/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
BEGIN
/*!*/;
# at 29600826

So, we want to restore the backup up to the previous commit, at position 29600687. Let's do that now. We'll use an external server for that. We will restore the backup up to that position and keep the restore server up and running so that we can later extract the missing data.
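
A minimal sketch of that restore, assuming the backup lives in /root/backup on the external server and the master's binary log has been copied over (the paths are assumptions, and the start position has to be taken from the backup's xtrabackup_binlog_info file):

xtrabackup --prepare --target-dir=/root/backup
xtrabackup --copy-back --target-dir=/root/backup
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql
# replay the binary log from the backup's position up to the commit
# just before the bad DELETE
mysqlbinlog --start-position=<position from xtrabackup_binlog_info> \
    --stop-position=29600687 binlog.000003 | mysql -uroot -p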

Once restore is completed, let’s make sure our data has been recovered:

mysql> SELECT COUNT(*) FROM sbtest.sbtest1 WHERE id < 23146;
+----------+
| COUNT(*) |
+----------+
|    23145 |
+----------+
1 row in set (0.03 sec)

Looks good. Now we can extract this data into a file which we will load back on the master.

mysql> SELECT * FROM sbtest.sbtest1 WHERE id < 23146 INTO OUTFILE 'missing.sql';
ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement

Something is not right: the server is configured to write files only to one particular location. This is about security; we don't want to let users save contents anywhere they like. Let's check where we can save our file:

mysql> SHOW VARIABLES LIKE "secure_file_priv";
+------------------+-----------------------+
| Variable_name    | Value                 |
+------------------+-----------------------+
| secure_file_priv | /var/lib/mysql-files/ |
+------------------+-----------------------+
1 row in set (0.13 sec)

Ok, let’s try one more time:

mysql> SELECT * FROM sbtest.sbtest1 WHERE id < 23146 INTO OUTFILE '/var/lib/mysql-files/missing.sql';
Query OK, 23145 rows affected (0.05 sec)

Now it looks much better. Let’s copy the data to the master:

root@vagrant:~# scp /var/lib/mysql-files/missing.sql 10.0.0.101:/var/lib/mysql-files/
missing.sql                                                                                                                                                                      100% 1744KB   1.7MB/s   00:00

Now it’s time to load the missing rows on the master and test if it succeeded:

mysql> LOAD DATA INFILE '/var/lib/mysql-files/missing.sql' INTO TABLE sbtest.sbtest1;
Query OK, 23145 rows affected (2.22 sec)
Records: 23145  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT COUNT(*) FROM sbtest.sbtest1 WHERE id < 23146;
+----------+
| COUNT(*) |
+----------+
|    23145 |
+----------+
1 row in set (0.00 sec)

That’s all, we restored our missing data.

Partial Data Loss - Logical Backup

In the previous section, we restored the lost data using a physical backup and an external server. What if we had a logical backup instead? Let's take a look. First, let's verify that we do have one:

root@vagrant:~# ls -alh /root/backups/BACKUP-13/
total 5.8G
drwx------ 2 root root 4.0K Apr 27 07:35 .
drwxr-x--- 5 root root 4.0K Apr 27 07:14 ..
-rw-r--r-- 1 root root 2.4K Apr 27 07:35 cmon_backup.metadata
-rw------- 1 root root 5.8G Apr 27 07:35 mysqldump_2018-04-27_071434_complete.sql.gz

Yes, it’s there. Now, it’s time to decompress it.

root@vagrant:~# mkdir /root/restore
root@vagrant:~# zcat /root/backups/BACKUP-13/mysqldump_2018-04-27_071434_complete.sql.gz > /root/restore/backup.sql

When you look into it, you will see that the data is stored in multi-value INSERT format. For example:

INSERT INTO `sbtest1` VALUES (1,1006036,'18034632456-32298647298-82351096178-60420120042-90070228681-93395382793-96740777141-18710455882-88896678134-41810932745','43683718329-48150560094-43449649167-51455516141-06448225399'),(2,1008980,'69708345057-48265944193-91002879830-11554672482-35576538285-03657113365-90301319612-18462263634-56608104414-27254248188','05603373460-16140454933-50476449060-04937808333-32421752305')

All we need to do now is to pinpoint where our table is located, and then where the rows of interest to us are stored. First, knowing the mysqldump pattern (drop table, create a new one, disable indexes, insert data), let's figure out which line contains the CREATE TABLE statement for the 'sbtest1' table:

root@vagrant:~/restore# grep -n "CREATE TABLE \`sbtest1\`" backup.sql > out
root@vagrant:~/restore# cat out
971:CREATE TABLE `sbtest1` (

Now, using a bit of trial and error, we need to figure out where to look for our rows. We'll show you the final command we came up with. The whole trick is to print different ranges of lines using sed and then check if the last line contains rows close to, but later than, what we are searching for. In the command below we look for lines between 971 (CREATE TABLE) and 993. We also ask sed to quit once it reaches line 994, as the rest of the file is of no interest to us:

root@vagrant:~/restore# sed -n '971,993p; 994q' backup.sql > 1.sql
root@vagrant:~/restore# tail -n 1 1.sql  | less

The output looks like below:

INSERT INTO `sbtest1` VALUES (31351,1007187,'23938390896-69688180281-37975364313-05234865797-89299459691-74476188805-03642252162-40036598389-45190639324-97494758464','60596247401-06173974673-08009930825-94560626453-54686757363'),

This means that our row range (up to the row with an id of 23145) is close. Next, it's all about manually cleaning up the file. We want it to start with the first row we need to restore:

INSERT INTO `sbtest1` VALUES (1,1006036,'18034632456-32298647298-82351096178-60420120042-90070228681-93395382793-96740777141-18710455882-88896678134-41810932745','43683718329-48150560094-43449649167-51455516141-06448225399')

And end up with the last row to restore:

(23145,1001595,'37250617862-83193638873-99290491872-89366212365-12327992016-32030298805-08821519929-92162259650-88126148247-75122945670','60801103752-29862888956-47063830789-71811451101-27773551230');

We had to trim some of the unneeded data (it is a multi-value insert), but after all of this we have a file which we can load back on the master.

root@vagrant:~/restore# cat 1.sql | mysql -usbtest -psbtest -h10.0.0.101 sbtest
mysql: [Warning] Using a password on the command line interface can be insecure.

Finally, last check:

mysql> SELECT COUNT(*) FROM sbtest.sbtest1 WHERE id < 23146;
+----------+
| COUNT(*) |
+----------+
|    23145 |
+----------+
1 row in set (0.00 sec)

All is good, data has been restored.

Partial Data Loss, Delayed Slave

In this case, we will not go through the whole process again. We already described how to identify the position of a data loss event in the binary logs, how to stop a delayed slave and start replication again up to a point before the data loss event, and how to use SELECT INTO OUTFILE and LOAD DATA INFILE to export data from an external server and load it on the master. That's all you need. As long as the data is still on the delayed slave, stop it, locate the position just before the data loss event, start the slave up to that point, and then use the delayed slave to extract the deleted data, copy the file to the master and load it to restore the data.
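
Putting the pieces together, a sketch of the whole flow on the delayed slave, reusing the positions and file locations from the earlier examples:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_DELAY = 0;
mysql> START SLAVE UNTIL MASTER_LOG_FILE='binlog.000003', MASTER_LOG_POS=29600687;
-- once the SQL thread has stopped, extract the deleted rows
mysql> SELECT * FROM sbtest.sbtest1 WHERE id < 23146 INTO OUTFILE '/var/lib/mysql-files/missing.sql';

Then scp the file to the master and run LOAD DATA INFILE, exactly as shown in the physical backup section.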

Conclusion

Restoring lost data is not fun, but if you follow the steps we went through in this blog, you will have a good chance of recovering what you lost.

by krzysztof at May 02, 2018 02:04 PM

May 01, 2018

Peter Zaitsev

MongoDB Rollback in replicaset

In this blog post, we'll look at how MongoDB rollback works during replicaset failovers.

In recent versions, MongoDB has provided lots of features related to replicaset and automatic failover. When it comes to failover, the next question that arises is “How does MongoDB ROLLBACK work during replicaset failover?”

If a PRIMARY member (say node A) steps down with some writes that were executed but not yet replicated to the SECONDARY members, then a ROLLBACK occurs on the former PRIMARY A when it rejoins the replicaset. I'll explain below how the ROLLBACK works!

ROLLBACK Scenario:

ROLLBACK is rare in a replicaset as MongoDB tries to avoid it by replicating the operations from PRIMARY to SECONDARY without delay, under normal conditions. Most of the time ROLLBACK occurs in the event of network partitioning, or if SECONDARY members can’t keep up with the throughput of operations on the former PRIMARY.

ROLLBACK Process:

We will see the process with a test. I used Docker for this test with the MongoDB 3.2 Jessie version to set up a replicaset with members mongo1 (A), mongo2 (B), mongo3 (C), and set priority 10 on A. Now A is PRIMARY, as expected. We need to write some data into A and create a network partition from B and C at the same time. For that, I inserted 25000 documents into A and took it off the network at the same time.

Terminal 1 (A’s mongo prompt):

my-mongo-set:PRIMARY> for (var i = 1; i <= 25000; i++) {
...    db.testData.insert( { x : i } )
... }
WriteResult({ "nInserted" : 1 })
my-mongo-set:PRIMARY> db.testD2018-03-30T17:34:51.455+0530 I NETWORK  [thread1] trying reconnect to 127.0.0.1:30001 (127.0.0.1) failed
2018-03-30T17:34:51.464+0530 I NETWORK  [thread1] reconnect 127.0.0.1:30001 (127.0.0.1) ok
                      db.testD
admin.testD
my-mongo-set:SECONDARY> rs.slaveOk()
my-mongo-set:SECONDARY> db.testData.count()
25000

Terminal2:

Vinodhs-MBP:~ vinodhkrish$ docker ps
CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS              PORTS                           NAMES
b27d82ac2439        mongo:3.2.19-jessie         "docker-entrypoint.s…"   2 days ago          Up 1 days           0.0.0.0:30003->27017/tcp        mongo3
2b39f9e41973        mongo:3.2.19-jessie         "docker-entrypoint.s…"   2 days ago          Up 1 days           0.0.0.0:30002->27017/tcp        mongo2
105b6df757d7        mongo:3.2.19-jessie         "docker-entrypoint.s…"   2 days ago          Up 1 days           0.0.0.0:30001->27017/tcp        mongo1
Vinodhs-MBP:~ vinodhkrish$ docker network disconnect my-mongo-cluster mongo1

Member A has now become SECONDARY, because it couldn't reach the other members in the replicaset. On the other side, members B and C see that A is not reachable, and B is elected as PRIMARY. We can see that some inserts from former PRIMARY A replicated to B before the network split happened.

(B node)

my-mongo-set:PRIMARY> db.testData.count()
15003

Now do some write operations on the current PRIMARY, B, and then let node A rejoin the network by reconnecting the container to the bridge network. You can observe below that node A's member states change in the mongo prompt. (I just connected to A and pressed the ENTER/RETURN button many times to see the member states; you can also see them in the log file):
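
Reconnecting the container is simply the reverse of the earlier disconnect command:

Vinodhs-MBP:~ vinodhkrish$ docker network connect my-mongo-cluster mongo1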

(A node)

Vinodhs-MacBook-Pro:mongodb-osx-x86_64-3.2.19 vinodhkrish$ ./bin/mongo 127.0.0.1:30001/admin
MongoDB shell version: 3.2.19
connecting to: 127.0.0.1:30001/admin
my-mongo-set:ROLLBACK> 
my-mongo-set:RECOVERING> 
my-mongo-set:SECONDARY> 
my-mongo-set:SECONDARY> 
my-mongo-set:PRIMARY>

ROLLBACK Internal

From the MongoDB point of view, let's walk through the replicaset process to understand what happened above. Normally a SECONDARY member syncs the oplog entries from its syncSource (the member from which the data is replicated) by using the OplogFetcher. The OplogFetcher first sends a find() command to the syncSource's oplog, and then follows with a series of getMores on the cursor. When node A rejoins the replicaset, node A's OplogFetcher first sends a find() command to syncSource node B with a greater-than-or-equal predicate on the timestamp of the last oplog entry it has fetched. The find() command should return at least one document because of that predicate. If it does not, it means that the syncSource is behind, so node A will not replicate from it and will look for another syncSource.

In this case, A’s oplogFetcher sees that the first document returned from node B does not match the last entry in its oplog. That means node A’s oplog has diverged from node B’s and it should go into ROLLBACK.

Node A first finds the common point between its oplog and its syncSource B's oplog. It then goes through all of the operations in its oplog back to the common point and figures out how to undo them. Here, 9997 inserts are missing from the B and C nodes, so these operations will be undone on A and the affected documents recovered from A's oplog.

2018-03-30T12:08:37.160+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: 4, timestamp: Mar 30 12:03:52:139). source's GTE: (term: 5, timestamp: Mar 30 12:05:37:1) hashes: (3789163619674410187/3226093795606474294)
2018-03-30T12:08:37.160+0000 I REPL     [rsBackgroundSync] rollback 0
2018-03-30T12:08:37.160+0000 I REPL     [ReplicationExecutor] transition to ROLLBACK
2018-03-30T12:08:37.163+0000 I REPL     [rsBackgroundSync] beginning rollback
2018-03-30T12:08:37.163+0000 I REPL     [rsBackgroundSync] rollback 1
2018-03-30T12:08:37.164+0000 I REPL     [rsBackgroundSync] rollback 2 FindCommonPoint
2018-03-30T12:08:37.166+0000 I REPL     [rsBackgroundSync] rollback our last optime:   Mar 30 12:03:52:139
2018-03-30T12:08:37.166+0000 I REPL     [rsBackgroundSync] rollback their last optime: Mar 30 12:08:17:1c5
2018-03-30T12:08:37.166+0000 I REPL     [rsBackgroundSync] rollback diff in end of log times: -265 seconds
2018-03-30T12:08:37.269+0000 I REPL     [rsBackgroundSync] rollback common point is (term: 4, timestamp: Mar 30 12:03:46:d2)
2018-03-30T12:08:37.269+0000 I REPL     [rsBackgroundSync] rollback 3 fixup
2018-03-30T12:08:38.240+0000 I REPL     [rsBackgroundSync] rollback 3.5
2018-03-30T12:08:38.240+0000 I REPL     [rsBackgroundSync] Setting minvalid to (term: 5, timestamp: Mar 30 12:08:17:1c5)
2018-03-30T12:08:38.241+0000 I REPL     [rsBackgroundSync] rollback 4 n:1
2018-03-30T12:08:38.241+0000 I REPL     [rsBackgroundSync] rollback 4.6
2018-03-30T12:08:38.241+0000 I REPL     [rsBackgroundSync] rollback 4.7
2018-03-30T12:08:38.391+0000 I REPL     [rsBackgroundSync] rollback 5 d:9997 u:0
2018-03-30T12:08:38.391+0000 I REPL     [rsBackgroundSync] rollback 6
2018-03-30T12:08:38.394+0000 I REPL     [rsBackgroundSync] rollback done
2018-03-30T12:08:38.396+0000 I REPL     [rsBackgroundSync] rollback finished

ROLLBACK data

Where do these 9997 recovered documents go? MongoDB writes these ROLLBACK documents under the rollback directory in the dbpath. The recovered collections are named with the namespace as the prefix and the date and time as the suffix. The files are in BSON format, and we need to convert them into JSON to analyze them, so that we can plan the next course of action. In our case, the testData collection's rollback data is as follows:

root@105b6df757d7:/# cd /data/db
root@105b6df757d7:/data/db# ls -l rollback/
total 324K
-rw-r--r-- 1 mongodb mongodb 323K Mar 30 12:08 admin.testData.2018-03-30T12-08-38.0.bson

root@105b6df757d7:/data/db/rollback# bsondump admin.testData.2018-03-30T12-08-38.0.bson > rollback.json
2018-03-30T12:13:00.033+0000 9997 objects found
root@105b6df757d7:/data/db/rollback# head rollback.json
{"_id":{"$oid":"5abe279f97044083811b5975"},"x":15004.0}
{"_id":{"$oid":"5abe279f97044083811b5976"},"x":15005.0}
{"_id":{"$oid":"5abe279f97044083811b5977"},"x":15006.0}
{"_id":{"$oid":"5abe279f97044083811b5978"},"x":15007.0}
{"_id":{"$oid":"5abe279f97044083811b5979"},"x":15008.0}
{"_id":{"$oid":"5abe279f97044083811b5980"},"x":15009.0}
{"_id":{"$oid":"5abe279f97044083811b5981"},"x":15010.0}

Is that it? Now check the count of the testData collection on node A:

my-mongo-set:PRIMARY> db.testData.count()
15003

So the 9997 records which were rolled back into the rollback directory have also been dropped from the collection. This ensures data consistency throughout the replicaset.
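
If, after reviewing rollback.json, you decide those documents should go back into the collection, one possible sketch is to load them with mongoimport (host, port and namespace are taken from our test setup, and the file is assumed to have been copied out of the container first):

Vinodhs-MBP:~ vinodhkrish$ mongoimport --host 127.0.0.1 --port 30001 --db admin --collection testData --file rollback.json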

How to avoid ROLLBACK – writeConcern

The default writeConcern in a replicaSet is w:1, i.e., when a client writes into a replicaSet, it receives an acknowledgment from the PRIMARY alone and won't wait for the SECONDARY members' acknowledgment. If you want to avoid the ROLLBACK scenario in your environment, then you have to use {w:"majority"} or {w:n}, where 1 < n <= (no. of members in your replica set). This ensures that the writes are propagated to that many members of the replica set before sending the acknowledgment to the client. This solves the problem of ROLLBACK.
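
In the mongo shell that looks like the following (the collection name is reused from our test; the wtimeout value is an arbitrary example):

my-mongo-set:PRIMARY> db.testData.insert(
...    { x : 25001 },
...    { writeConcern: { w: "majority", wtimeout: 5000 } }
... )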

But please be careful not to set writeConcern too high, because it also affects write performance: the acknowledgment needs to be received from the number of members specified in the value. The value {w:"majority"} provides acknowledgment that write operations have propagated to the majority of voting nodes, including the primary, and is suitable for most environments.

ROLLBACK – Limitation

The main thing to note here is that mongod will not roll back more than 300MB of data. In such cases, we need to manually check the instance to recover the data. You will see a message like the one below in mongod.log in such cases:

[replica set sync] replSet syncThread: 13410 replSet too much data to roll back

Understanding this simple ROLLBACK background helps us to decide what needs to be done with the rolled-back data. It also helps us avoid such scenarios, because data is data and is very important!

The post MongoDB Rollback in replicaset appeared first on Percona Database Performance Blog.

by Vinodh Krishnaswamy at May 01, 2018 10:30 PM

Webinar Thursday May 3, 2018: Running MongoDB in Production (Part 3)

Please join Percona's Senior Technical Operations Architect, Tim Vaillancourt as he presents Running MongoDB in Production (Part 3) on Thursday, May 3, 2018, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

Are you a seasoned MySQL DBA that needs to add MongoDB to your skills? Are you used to managing a small environment that runs well, but want to know what you might not know yet?

MongoDB works well, but when it has issues the number one question is “where should I go to solve a problem?”

This webinar on running MongoDB covers:

  • Troubleshooting
    • Log File
    • Slow Query
    • Operations
  • Schema Design
    • Data Types
    • Indexes
    • Workflows
  • Data Integrity
    • Replica Sets
    • Write Concerns
    • Data Recovery
  • Scaling (Read/Writes)

Register for the webinar now.

Missed Part 1 and Part 2 of our Running MongoDB in Production series? You can watch and download the slides of Part 1 here and watch or download the slides of Part 2 here.

Timothy Vaillancourt, Senior Technical Operations Architect

Tim joined Percona in 2016 as Sr. Technical Operations Architect for MongoDB with the goal of making MongoDB operations as smooth as possible. With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming, combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Prior to Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS. Prior to moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

The post Webinar Thursday May 3, 2018: Running MongoDB in Production (Part 3) appeared first on Percona Database Performance Blog.

by Tim Vaillancourt at May 01, 2018 08:29 PM

ClickHouse Meetup in Salt Lake City

Join Percona CTO Vadim Tkachenko at the Cloud Native Utah meetup in Salt Lake City on Tuesday, May 8, 2018, for an Intro to ClickHouse.

Next week, I'll be switching gears from MyRocks performance testing to presenting an introduction to ClickHouse at the Cloud Native Utah meetup.

Interestingly enough, even though it is totally different from OLTP engines, ClickHouse uses a MergeTree engine. MergeTree engines have a lot in common with the Log Structured Merge Tree (which is what MyRocks / RocksDB uses). This structure is optimized for huge datasets and low-memory scenarios.

PingCAP TiDB and CockroachDB – the new databases on the block – are using RocksDB as the main storage engine. So is Log Structured Merge Tree the future of databases?

We can talk about this and other questions next week in Salt Lake City. If you are in town please join us at the Cloud Native Utah meetup.


The post ClickHouse Meetup in Salt Lake City appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at May 01, 2018 07:07 PM

April 30, 2018

Peter Zaitsev

A Look at MyRocks Performance


In this blog post, I’ll look at MyRocks performance through some benchmark testing.

As the MyRocks storage engine (based on the RocksDB key-value store http://rocksdb.org ) is now available as part of Percona Server for MySQL 5.7, I wanted to take a look at how it performs on a relatively high-end server and SSD storage. I wanted to check how it performs for different amounts of available memory for the given database size. This is similar to the benchmark I published a while ago for InnoDB (https://www.percona.com/blog/2010/04/08/fast-ssd-or-more-memory/).

In this case, I plan to use a sysbench-tpcc benchmark (https://www.percona.com/blog/2018/03/05/tpcc-like-workload-sysbench-1-0/) and I will execute it for both MyRocks and InnoDB. We’ll use InnoDB as a baseline.

For the benchmark, I will use 100 TPC-C warehouses, with a set of 10 tables (to shift the bottleneck away from row contention). This should give roughly 90GB of data (when loaded into InnoDB), which is roughly equivalent to the data size of 1000 warehouses.

To vary the memory size, I will change innodb_buffer_pool_size from 5GB to 100GB for InnoDB, and rocksdb_block_cache_size for MyRocks.

For MyRocks we will use LZ4 as the default compression on disk. The data size in the MyRocks storage engine is 21GB. It is interesting to note that the uncompressed size in MyRocks is 70GB on storage.

For both engines, I did not use FOREIGN KEYS, as MyRocks does not support them at the moment.

MyRocks does not support SELECT .. FOR UPDATE statements in REPEATABLE-READ mode in the Percona Server for MySQL implementation. However, “SELECT .. FOR UPDATE” is used in this benchmark. So I had to use READ-COMMITTED mode, which is supported.

The most important setting I used was to enable binary logs, for the following reasons:

  1. Any serious production environment uses binary logs
  2. With binary logs disabled, MyRocks is affected by a suboptimal transaction coordinator

I used the following settings for binary logs:

  • binlog_format = ‘ROW’
  • binlog_row_image=minimal
  • sync_binlog=10000 (I am not using 0, as this causes serious stalls during binary log rotations, when the content of the binary log is flushed to storage all at once)

While I am not a full expert in MyRocks tuning yet, I used recommendations from this page: https://github.com/facebook/mysql-5.6/wiki/my.cnf-tuning. The Facebook-MyRocks engineering team also provided me input on the best settings for MyRocks.

Let’s review the results for different memory sizes.

This first chart shows throughput jitter. This helps to understand the distribution of throughput results. Throughput is measured every 1 second, and on the chart I show all measurements after 2000 seconds of a run (the total length of each run is 3600 seconds). So I show the last 1600 seconds of each run (to remove warm-up phases):

MyRocks Performance

To better quantify results, let’s take a look at them on a boxplot. The quickest way to understand boxplots is to take a look at the middle line. It represents a median of measurements (see more at https://www.percona.com/blog/2012/02/23/some-fun-with-r-visualization/):

MyRocks Performance 2

Before we jump to the summary of results, let’s take a look at a variation of the throughput for both InnoDB and MyRocks. We will zoom to a 1-second resolution chart for 100 GB of allocated memory:

MyRocks Performance 3

We can see that there is a lot of variation with periodical 1-second performance drops with MyRocks. At this moment, I do not know what causes these drops.

So let’s take a look at the average throughput for each engine for different memory settings (the results are in tps, and more is better):

Memory, GB InnoDB MyRocks
5 849.0664 4205.714
10 1321.9 4298.217
20 1808.236 4333.424
30 2275.403 4394.413
40 2968.101 4459.578
50 3867.625 4503.215
60 4756.551 4571.163
70 5527.853 4576.867
80 5984.642 4616.538
90 5949.249 4620.87
100 5961.2 4599.143


This is where MyRocks behaves differently from InnoDB. InnoDB benefits greatly from additional memory, up to the size of the working dataset. After that, there is no reason to add more memory.

At the same time, interestingly, MyRocks does not benefit much from additional memory.

Basically, MyRocks performs as expected for a write-optimized engine. You can refer to my article How Three Fundamental Data Structures Impact Storage and Retrieval for more details. 

In conclusion, InnoDB performs better (compared to itself) when the working dataset fits (or almost fits) into available memory, while MyRocks can operate (and outperform InnoDB) on small memory sizes.

IO and CPU usage

It is worth looking at resource utilization for each engine. I took vmstat measurements for each run so that we can analyze IO and CPU usage.

First, let’s review writes per second (in KB/sec). Please keep in mind that these writes include binary log writes too, not just writes from the storage engine.

Memory, GB InnoDB MyRocks
5 244754.4 87401.54
10 290602.5 89874.55
20 311726 93387.05
30 313851.7 93429.92
40 316890.6 94044.94
50 318404.5 96602.42
60 276341.5 94898.08
70 217726.9 97015.82
80 184805.3 96231.51
90 187185.1 96193.6
100 184867.5 97998.26


We can also calculate how many writes per transaction each storage engine performs:

MyRocks Performance 4

This chart shows the essential difference between InnoDB and MyRocks. MyRocks, being a write-optimized engine, uses a constant amount of writes per transaction.

For InnoDB, the amount of writes greatly depends on the memory size. The less memory we have, the more writes it has to perform.

What about reads?

The following table shows reads in KB per second.

Memory, GB InnoDB MyRocks
5 218343.1 171957.77
10 171634.7 146229.82
20 148395.3 125007.81
30 146829.1 110106.87
40 144707 97887.6
50 132858.1 87035.38
60 98371.2 77562.45
70 42532.15 71830.09
80 3479.852 66702.02
90 3811.371 64240.41
100 1998.137 62894.54


We can translate this to the number of reads per transaction:

MyRocks Performance 5

This shows MyRocks’ read-amplification. The allocation of more memory helps to decrease IO reads, but not as much as for InnoDB.

CPU usage

Let’s also review CPU usage for each storage engine. Let’s start with InnoDB:

MyRocks Performance 6

The chart shows that for 5GB memory size, InnoDB spends most of its time in IO waits (green area), and the CPU usage (blue area) increases with more memory.

This is the same chart for MyRocks:

MyRocks Performance 7

In tabular form:

Memory, GB engine us sys wa id
5 InnoDB 8 2 57 33
5 MyRocks 56 11 18 15
10 InnoDB 12 3 57 28
10 MyRocks 57 11 18 13
20 InnoDB 16 4 55 25
20 MyRocks 58 11 19 11
30 InnoDB 20 5 50 25
30 MyRocks 59 11 19 10
40 InnoDB 26 7 44 24
40 MyRocks 60 11 20 9
50 InnoDB 35 8 38 19
50 MyRocks 60 11 21 7
60 InnoDB 43 10 36 10
60 MyRocks 61 11 22 6
70 InnoDB 51 12 34 4
70 MyRocks 61 11 23 5
80 InnoDB 55 12 31 1
80 MyRocks 61 11 23 5
90 InnoDB 55 12 32 1
90 MyRocks 61 11 23 4
100 InnoDB 55 12 32 1
100 MyRocks 61 11 24 4


We can see that MyRocks uses a lot of CPU (in us+sys state) no matter how much memory is allocated. This leads to the conclusion that MyRocks performance is limited more by CPU performance than by available memory.

MyRocks directory size

As MyRocks writes all changes and compacts SST files down the road, it would be interesting to see how the data directory size changes during the benchmark so we can estimate our storage needs. Here is a chart of the data directory size:

MyRocks Performance 8

We can see that the data directory grows from 20GB at the start to 31GB during the benchmark. It is interesting to observe the data growing until compaction shrinks it.

Conclusion

In conclusion, I can say that MyRocks performance increases as the ratio of dataset size to memory increases, outperforming InnoDB by almost five times in the case of 5GB memory allocation. Throughput variation is something to be concerned about, but I hope this gets improved in the future.

MyRocks does not require a lot of memory and shows constant write IO, while using most of the CPU resources.

I think this potentially makes MyRocks a great choice for cloud database instances, where both memory and IO can cost a lot; MyRocks would make it cheaper to deploy in the cloud.

I will follow up with further cloud-oriented benchmarks.

Extras

Raw results, scripts and config

My goal is to provide fully repeatable benchmarks. To this end, I'm sharing all the scripts and settings I used in the following GitHub repo:

https://github.com/Percona-Lab-results/201803-sysbench-tpcc-myrocks

MyRocks settings

rocksdb_max_open_files=-1
rocksdb_max_background_jobs=8
rocksdb_max_total_wal_size=4G
rocksdb_block_size=16384
rocksdb_table_cache_numshardbits=6
# rate limiter
rocksdb_bytes_per_sync=16777216
rocksdb_wal_bytes_per_sync=4194304
rocksdb_compaction_sequential_deletes_count_sd=1
rocksdb_compaction_sequential_deletes=199999
rocksdb_compaction_sequential_deletes_window=200000
rocksdb_default_cf_options="write_buffer_size=256m;target_file_size_base=32m;max_bytes_for_level_base=512m;max_write_buffer_number=4;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=20;level0_stop_writes_trigger=30;max_write_buffer_number=4;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;compaction_pri=kMinOverlappingRatio;compression=kLZ4Compression;bottommost_compression=kLZ4Compression;compression_opts=-14:4:0"
rocksdb_max_subcompactions=4
rocksdb_compaction_readahead_size=16m
rocksdb_use_direct_reads=ON
rocksdb_use_direct_io_for_flush_and_compaction=ON

InnoDB settings

# files
 innodb_file_per_table
 innodb_log_file_size=15G
 innodb_log_files_in_group=2
 innodb_open_files=4000
# buffers
 innodb_buffer_pool_size= 200G
 innodb_buffer_pool_instances=8
 innodb_log_buffer_size=64M
# tune
 innodb_doublewrite= 1
 innodb_support_xa=0
 innodb_thread_concurrency=0
 innodb_flush_log_at_trx_commit= 1
 innodb_flush_method=O_DIRECT_NO_FSYNC
 innodb_max_dirty_pages_pct=90
 innodb_max_dirty_pages_pct_lwm=10
 innodb_lru_scan_depth=1024
 innodb_page_cleaners=4
 join_buffer_size=256K
 sort_buffer_size=256K
 innodb_use_native_aio=1
 innodb_stats_persistent = 1
 #innodb_spin_wait_delay=96
# perf special
 innodb_adaptive_flushing = 1
 innodb_flush_neighbors = 0
 innodb_read_io_threads = 4
 innodb_write_io_threads = 2
 innodb_io_capacity=2000
 innodb_io_capacity_max=4000
 innodb_purge_threads=4
 innodb_adaptive_hash_index=1

Hardware spec

Supermicro server:

  • CPU:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
  • Memory: 256GB of RAM
  • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
  • Filesystem: ext4
  • Percona-Server-5.7.21-20
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic

The post A Look at MyRocks Performance appeared first on Percona Database Performance Blog.

by Vadim Tkachenko at April 30, 2018 11:35 PM

Keep Sensitive Data Secure in a Replication Setup

This blog post describes how to keep sensitive data secure on slave servers in a MySQL async replication setup.

Almost every web application has sensitive data: passwords, SSNs, credit cards, emails, etc. Splitting the database into a secure part and a "public" part allows you to restrict which users and application components have access to the sensitive data.

Field encryption

Field encryption is based on MySQL encryption functions, or on client-side encryption where only the authorized user knows a secret; either way, the encrypted data is still distributed to all slaves.

  • If possible, use hashes with a big enough salt, and do not store real sensitive data in the database. A good example is passwords. An end-user sends the login and password, application/SQL code calculates the hash with a salt value unique for each end-user and compares the hash with the value stored in the database. Even if the attacker gets the hashes, it’s still hard or even impossible to extract real passwords for all users. Make sure that you are using a good random number generator for the salt, application-side secret, and a good hash function (not MD5).
  • Encryption is not suitable if you are going to provide public access to your database (via slave dumps in sql/csv/xml/json format).
  • Encryption is a complex topic. Check here for a good blog post explaining hashing usage, and try to find a security consultant if you are inventing some “new” method of storing and encrypting data.

Field encryption example

I’m using a single server setup, because the most important part of data separation should be done on the application side. The secure part of the application has a secret passphrase. For example, you can place the code working with authentication, full profile and payments on a separate server and use a dedicated MySQL account.

create database encrypted;
use encrypted;
-- c2 holds binary ciphertext, so use varbinary rather than varchar
create table t(c1 int, c2 varbinary(255), rnd_pad varbinary(16), primary key(c1));
SET block_encryption_mode = 'aes-256-cbc';
SET @key_str = SHA2('My secret passphrase',512);
SET @init_vector = RANDOM_BYTES(16);
-- store the random init vector next to the ciphertext
insert into t (c1,c2, rnd_pad) values (1, AES_ENCRYPT('Secret', @key_str, @init_vector), @init_vector);
-- decrypt data, using the stored value as the init vector
select c1, AES_DECRYPT(c2, @key_str, rnd_pad) from t;

Summary

  • GOOD: Master and slave servers have exactly the same data and no problems with replication.
  • GOOD: Even if two different end-users have exactly the same password, the stored values are different due to random bytes in the init vector for AES encryption.
  • GOOD: Both the encryption and random number generation uses an external library (openssl).
  • CONF: It’s important to have binlog_format=ROW to avoid sending the secret to slave servers.
  • CONF: Do not allow end-users to change data without changing the init_vector, especially for small strings without random padding. Each update should cause init_vector re-generation.
  • BAD: Encrypted data is still sent to slave servers. If the encryption algorithm or protocol is broken, it is possible to get access to data from an insecure part of the application.
  • BAD: The described protocol still could be insecure.

Replication filters

There are two types of replication filters: master-side (binlog-*-db) and slave-side (replicate-*).

Both could cause replication breakage. Replication filters were created for STATEMENT-based replication and are problematic with a modern binlog_format=ROW + gtid_mode=on setup. You can find several cases related to database-level slave-side filters in this blog post. If you still need slave-side filtering, use the per-table replicate-wild-*-table options, as in the example below.
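
For instance, a slave-side per-table filter in my.cnf could look like this (the schema name is just an illustration):

replicate-wild-ignore-table=secure.%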

Master-side

Even if binary logging is disabled for a specific database, a statement could still be stored in the binary log if it is a DDL statement, or if binlog_format is STATEMENT or MIXED and the default database is not used by the statement. For details, see the reference manual for the binlog-do-db option. In order to avoid replication issues, you should use ROW-based replication and run SET SESSION sql_log_bin=0; before each DDL statement executed against the ignored database. It's not a good idea to use binlog-do-db, because you lose control of what should be replicated.

Why is binary log filtering useful at all? Because changing the sql_log_bin variable is prohibited inside transactions, which makes consistent updates of multiple entities inside the database problematic. The sql_log_bin variable is DANGEROUS; please do not use it instead of binlog-ignore-db in production on the application side. If you need it for database administration, make sure that you always type the "session" word before sql_log_bin.

We still need the ability to hide just one column of a table. And if we are ignoring the whole database, we should provide a method of reading the non-secure data on slaves / by restricted MySQL accounts. This is possible with triggers and views:

create database test;
set session sql_log_bin=0;
create table test.t(c1 int, c2 int, primary key(c1));
set session sql_log_bin=1;
create database test_insecure;
create table test_insecure.t(c1 int, c2 int default NULL, primary key(c1));
use test
delimiter //
create trigger t_aft_ins
after insert
 on test.t FOR EACH ROW
BEGIN
  INSERT test_insecure.t (c1) values (NEW.c1);
END //
create trigger t_aft_upd
after update
 on test.t FOR EACH ROW
BEGIN
  UPDATE test_insecure.t SET c1 = NEW.c1 WHERE c1 = OLD.c1;
END //
create trigger t_aft_del
after delete
 on test.t FOR EACH ROW
BEGIN
  DELETE FROM test_insecure.t WHERE c1 = OLD.c1;
END //
delimiter ;
-- just on slave:
create database test;
create view test.t as select * from test_insecure.t;
-- typical usage
INSERT INTO test.t values(1,1234);
SELECT * from test.t; -- works on both master and slave, c2 field will have NULL value on slave.

Summary

  • BAD: The data is not the same on the master and slaves. It potentially breaks replication. It’s not possible to use a slave’s backup to restore the master or promote the slave as a new master.
  • BAD: Triggers could reduce DML statement performance.
  • GOOD: The sensitive data is not sent to slaves at all (and not written to binary log).
  • GOOD: It works with GTID
  • GOOD: It requires no application changes (or almost no application changes).
  • GOOD: binlog-ignore-db allows us to not use the dangerous sql_log_bin variable after initial table creation.

The post Keep Sensitive Data Secure in a Replication Setup appeared first on Percona Database Performance Blog.

by Nickolay Ihalainen at April 30, 2018 09:47 PM

Jean-Jerome Schmidt

Cloud Database Features Comparison - Amazon RDS vs Google Cloud SQL

As more companies run their workloads in the cloud, cloud database services are increasingly being used to manage data. One of the advantages of using a cloud database service instead of maintaining your database is that it reduces the management overhead. Database services from the leading cloud vendors share many similarities, but they have individual characteristics that may make them well-, or ill-suited to your workload. Developers are always looking for convenient ways of running their databases, whether it is to obtain more profound insight into database performance, to perform a migration efficiently, to simplify backup and restore processes, or to do many other "day to day" tasks. Among the number of available cloud services, it may not be easy to figure out which is the best one for our use case. In this article, we’ll compare two of the most popular cloud database services on the market - Google Cloud SQL and Amazon RDS.

Amazon RDS provides a web interface through which you can deploy MySQL. The RDS service manages the provisioning and configuration of the instance. Additionally, it provides a console to monitor and perform basic database administration tasks. Google Cloud SQL similarly provides a predefined MySQL setup that is automatically managed. Predefined services can be a comfortable way to manage your databases; at the same time, however, they can limit functionality. Let's take a closer look at these management features.

Database logs and metrics monitoring

Amazon RDS and Google Cloud don't provide access to the shell. Your primary concern here may be access to essential log files. Amazon CloudWatch is a monitoring service for cloud resources which you can use to solve this problem. It collects metrics, collects and monitors log files, and can automatically react to changes in your AWS resources. Using CloudWatch, you can gather and process the error log, audit log and other logs from RDS into metrics presented in the web console. These statistics are recorded for 15 months, so you can maintain a history. CloudWatch can take actions such as sending a notification to a notification recipient or, if needed, triggering autoscaling policies, which in turn may automatically handle an increase in load by adding more resources.

Amazon CloudWatch

Google Cloud also provides log processing functionality. You can view the Google Cloud SQL logs in the operations panel or through the Google console. The operations panel logs every operation performed on the instance, with pretty basic information. It can be extended with manually added metrics based on data from a file source. Unfortunately, the operations log does not include activities performed using external management tools, such as the mysql client. To extend the basic functionality, Google has another service: Stackdriver. The Stackdriver service can be used to create alerts for metrics defined in the operations panel. Stackdriver covers not only the Google Cloud Platform (GCP) but also AWS and local services. You can use it for cross-cloud-platform monitoring without additional agents, although accessing non-cloud metrics requires the installation of an open source, collectd-based agent.

Google Cloud SQL logging

There are various ways in which you can monitor MySQL instance metrics: by constantly querying the server for metric values, or by using predefined services. You can get more in-depth, real-time visibility into the health of your Amazon RDS instances with Enhanced Monitoring for Amazon RDS. It provides metrics so that you can monitor the health of your DB instances and DB clusters, covering both DB instance metrics and operating system (OS) metrics.

It provides a set of over 50 database instance metrics and aggregated process information for your instances, at the granularity of 1 second. You can visualize the metrics on the RDS console.

Both CloudWatch and Stackdriver provide functionality to create alarms based on metrics. Amazon does it through the Amazon Simple Notification Service (SNS); in Stackdriver, it's done directly within the service.

Google Stackdriver monitoring dashboard

Data Migration into Cloud

At the moment, backup-based migration to Google Cloud SQL is quite limited. You can only use a logical dump, which may be a problem for bigger databases. The SQL dump file must not include any triggers, views, or stored procedures. If your database needs these elements, you should recreate them after shipping the data. If you have already created a dump file that holds these components, you need to edit the file manually. The database you are importing into must exist up front. There is no option to migrate to Google Cloud from another RDBMS. All of this makes the process quite limited, not to mention that there is no option for cross-platform migration in real time (as AWS RDS offers).

AWS Database Migration Service

Amazon Database Migration Service (DMS) supports homogeneous migrations such as MySQL to MySQL, as well as heterogeneous migrations between different database platforms. AWS DMS can help you plan and migrate on-premises relational data stored in Oracle, SQL Server, MySQL, MariaDB, or PostgreSQL databases. DMS works by setting up and then managing a replication instance on AWS. This instance dumps data from the source database and loads it into the target database.

Achieving High Availability

Google uses semisynchronous replicas to make your database highly available. Cloud SQL provides the ability to replicate a master instance to one or more read replicas. If the zone where the master is located experiences an outage and a failover replica is configured, Cloud SQL fails over to it.

Google Cloud SQL create read replica

The setup is straightforward, and with a couple of clicks, you can achieve a working slave node. Nevertheless, configuration options are limited and may not fit your system requirements. You can choose from the following replica scenarios:

  • read replica - a read replica is a one to one copy of the master. This is the base model where you create a replica to offload read requests or analytics traffic from the master,
  • external read replica - this option is to configure an instance that replicates to one or more replicas external to Cloud SQL,
  • external master - setup replication to migrate to Google Cloud SQL.

Amazon RDS provides read replica services. Cross-region read replicas give you the ability to scale, as AWS has its services in many regions of the world. RDS asynchronous replication is highly scalable. All read replicas are accessible and can be used for reading, in up to five regions. These nodes are independent and can be used in your upgrade path, or can be promoted to standalone databases.

In addition to that, Amazon offers Multi-AZ deployments based on DRBD, synchronous disk replication. How is it different from Read Replicas? The main difference is that only the database engine on the primary instance is active, which leads to other architectural variations.

Automated backups are taken from standby. That significantly reduces the possibility of performance degradation during a backup.

As opposed to read replicas, database engine version upgrades happen on the primary. Another difference is that AWS RDS will failover automatically while read replicas will require manual operations from you.

Multi-AZ failover on RDS uses a DNS change to point to the standby instance; according to Amazon, this should result in 60-120 seconds of unavailability during the failover. Because the standby uses the same storage data as the primary, there will probably be transaction/log recovery. Bigger databases may spend a significant amount of time on InnoDB recovery, so please consider that in your DR plan.

Encryption

Security compliance is one of the critical concerns for enterprises whose data is in the cloud. When dealing with production databases that hold sensitive and vital data, it is highly recommended to implement encryption to protect the data from unauthorized access.

In Google Cloud SQL, customer data is encrypted when stored in database tables, temporary files, and backups. Outside connections can be encrypted by SSL certificates (especially for intra-zone connections to Cloud SQL), or by using the Cloud SQL Proxy. Google encrypts and authenticates all data in transit and data at rest with AES-256.

With RDS encryption enabled, the data stored on the instance's underlying storage, the automated backups, read replicas, and snapshots all become encrypted. RDS encryption keys implement the AES-256 algorithm. Keys are managed and protected by the AWS key management infrastructure through AWS Key Management Service (AWS KMS). You do not need to make any modifications to your code or operating model to benefit from this critical data protection feature. AWS CloudHSM is a service that helps meet stringent compliance requirements for cryptographic operations and storage of encryption keys, by using single-tenant Hardware Security Module (HSM) appliances within the AWS cloud.

Pricing

Instance pricing for Google Cloud SQL is charged for every minute that the instance is running. The cost depends on the machine type you choose for the instance, and the region where it's placed. Read replicas and failover replicas are charged at the same rate as stand-alone instances. Pricing starts from $0.0126 per hour for a micro instance and goes up to around $8k for a db-n1-highmem-64 with 64 vCPUs, 416 GB RAM, 10,230 GB of disk and a limit of 4,000 connections.

Like other AWS products, users pay for what they use with RDS. But this pay-as-you-go model has a specific billing construct that can, if left unchecked, yield questions or surprise billing elements if no one is aware of what's actually in the bill. Your database bill may range from $0.175 per hour to thousands of dollars paid upfront. Both platforms are quite flexible, but you will see more configuration options in AWS.

Infrastructure

As mentioned in the pricing section, Google Cloud SQL can be scaled up to 64 processor cores and more than 400GB of RAM. The maximum size of the disk is 10TB per instance. You can configure your instance settings to increase it automatically. That should be plenty for many projects' requirements. Nevertheless, if we take a look at what Amazon offers, Google still has a long way to go. RDS not only offers powerful instances but also a long list of other services around them.

RDS supports storage volume snapshots, which you can use for point-in-time recovery or share with other AWS accounts. You can also take advantage of its provisioned IOPS feature to increase I/O. RDS can also be launched in Amazon VPC; Cloud SQL doesn't yet support a virtual private network.

Backup

RDS generates automated backups of your DB instance. RDS establishes a storage volume snapshot of your DB instance, backing up the entire DB instance and not individual databases. Automated backups occur daily during the preferred backup window. If the backup requires more time than allotted to the backup window, the backup continues after the window ends, until it finishes. Read replicas don't have backups enabled by default.

When you want to do a restore, the only option is to create a new instance. It can be restored to the last backup or to a point in time. Binary logs are applied automatically; there is no way to get access to them. The RDS PITR option is quite limited, as it does not allow you to choose an exact time or transaction: you are limited to a 5-minute interval. In most scenarios these settings may be sufficient; however, if you need to recover your database to a single transaction or an exact time, you need to be ready for manual actions.

Google Cloud SQL backup data is stored in separate regions for redundancy. With the automatic backup function enabled, a database copy is created every 4 hours. If needed, you can create on-demand backups (for any Second Generation instance), whether the instance has automatic backups enabled or not. Google's and Amazon's approaches to backups are quite similar; however, with Cloud SQL it is possible to perform point-in-time recovery to a specific binary log and position.

by Bart Oles at April 30, 2018 01:10 PM

April 28, 2018

Federico Razzoli

Hidden caches catch your data

This article is different from my usual posts. It explains things that may be obvious to many database professionals – not all of them though.

The idea came indirectly from my friend Francesco Allertsen. He has a weekly mailing list he uses to share links to interesting articles he reads on the web. One of them was The hidden components of Web caching. Its purpose is to list all caches that play some role when we interact with a web site. An interesting idea, even if I find it incomplete. So I thought it was a good idea to talk about caches that we hit whenever we interact with a database.

Why should we care?

But first, a note on why we should care:

  • Caches increase the speed of IO by orders of magnitude. Latency numbers that everyone should know gives you an idea of that. The first comment suggests using a solar system image to visualise the scale; curiously I had the same idea, and I used this image for some slides I made in the past.
  • When reliability is important, caches can get in the way in a dangerous fashion, because caches are volatile: they won't survive a crash or other types of hardware/software failure (aka bugs and limitations). So when we write data, for certain use cases (financial transactions, etc.) at no time should data be cached but not written to disk. Or it can happen only temporarily, before the database says "Ok Mr. Application, I got your data, you can resume your job".

What guarantees this requirement? Well, caches can be write-through, write-around or write-back. Write-through and write-around caches are reliable in this respect, because data is always written to disk before control is returned to the writer. Write-back caches are not reliable, because data is made persistent asynchronously, after control has been returned. But of course they are much faster, because they allow grouping more IO operations together (higher throughput), because latency is very low, and because there is no stall if the IO capacity is currently saturated.

It’s all about tradeoffs

Depending on the use case, we will have to choose the proper tradeoff between reliability and performance. For example, InnoDB allows you to (the settings behind these options are sketched after the list):

  • Flush changes to disks at every commit: even if mysqld crashes, no data loss is possible if you don’t hit any software/hardware bug;
  • Flush changes to the filesystem cache at every commit: a filesystem crash is the only event that can cause data loss, and it is not likely;
  • Flush data to disk once a second, or even longer intervals.
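
These options map to innodb_flush_log_at_trx_commit values; a minimal sketch:

-- flush to disk at every commit (safest, slowest)
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
-- write to the filesystem cache at commit, flush to disk about once a second
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- write and flush about once a second (fastest, least safe)
SET GLOBAL innodb_flush_log_at_trx_commit = 0;
-- longer intervals can be configured with innodb_flush_log_at_timeout (5.6+)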

Also, when we make the choice, we should take data redundancy into account. For example, if we run a Galera Cluster, we have at least 3 copies of the data on different servers. Or we could store data on a RAID array, which also guarantees that we have multiple copies of the data. Failover guarantees that our services don't break if one copy gets damaged, and we can restore it from another copy. In such cases, even if we deal with critical data, we don't necessarily need InnoDB to store data in the most reliable way – which is the slowest.

Types of caches

Enough blah blah. Here is the list of caches that could get in the way when we try to persist our data, starting from the lowest levels:

Disk caches – Both spinning disks and SSD can have caches. In your laptop, this cache is most probably write-back. This can usually be changed with something like hdparm.
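
On Linux, for example, hdparm can show and change a SATA drive's write cache setting (the device name below is an assumption):

# show the current write-caching setting
hdparm -W /dev/sda
# disable write caching, i.e. force write-through behaviour
hdparm -W0 /dev/sda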

Drive controllers and RAID caches – These can be write-through or write-back, and usually they are configurable. Notice that they can also be battery-backed, which means that they will survive a crash (unless the device has no power for a long time). Battery-backed caches can safely be write-back, but for other caches a write-through strategy could be necessary. Battery-backed RAIDs need a periodic learning cycle. A learning cycle slows down all operations noticeably, but it is necessary to be sure that the battery is fully charged. This operation should be scheduled carefully.

Filesystem cache – You can use it in a write-through or write-back fashion. This topic is amazingly complex (just like the previous ones), so I’ll just give you a link to a wonderful article: Files are Hard.

Virtual machines – Virtual machines have a disk interface cache. Its write strategy depends on the cache mode. There are several cache modes, but here we’ll only mention the reliable ones: none, which means that the VM doesn’t cache data (but the host system can), and writethrough, whose meaning should now be clear. Virtual machines also have a filesystem cache, of course. Note that having reliable settings on the VM guarantees that data changes will survive if the VM or anything running in it will crash; but if the host doesn’t have reliable settings and it crashes, most recent changes could be lost. Still, in production, typically a hypervisor runs many VMs. If many VMs bypass the cache, hypervisor’s IO capacity can be easily saturated. It could be better to build a database cluster with VMs running on separate physical hosts, to reduce the risk of data loss in case of crashes – in other words, often it’s better to rely on redundancy and failover, rather than reliability of individual systems.
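
As an illustration, this is how a reliable cache mode can be requested when launching a VM with QEMU directly (the disk file name and memory size are arbitrary examples):

qemu-system-x86_64 -m 2048 \
    -drive file=vm-disk.qcow2,format=qcow2,cache=writethrough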

Transaction log buffer – Terminology differs from DBMS to DBMS (WAL, transaction logs…), but the idea is that changes are persistent when they hit these logs. They will also need to be written to data files, but if they are in the logs they are safe. These logs have a buffer, which contains data not yet flushed. This idea can be confusing for some, so I'll make it clear: this speeds things up, but doesn't cause any danger. If your flush strategy is a write-through one, the buffer will contain not-yet-committed changes, which are flushed on commit – and only after the flush does the DBMS report success. Regardless of your flush strategy, some changes are flushed if the buffer gets full.

Binary log buffer – There is not necessarily a binary log separate from the transaction logs. MySQL has one because its architecture requires it – the binary log contains all changes to data and is handled by the server; the transaction logs contain the information necessary to replay or roll back transactions and are handled by InnoDB (actually, even non-transactional storage engines can have logs, but I'm not going to discuss that here). Considerations about the transaction logs apply to the binary log as well, keeping in mind that its purpose is different (incremental backups and replication, not crash recovery). In Postgres you have WAL files, which are used both for incremental backups/replication and for crash recovery.
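
In MySQL/MariaDB the binary log has its own durability knob, analogous to the InnoDB one sketched earlier (a minimal my.cnf sketch):

[mysqld]
# 1 = fsync the binary log at every commit (group) - the safest setting
# 0 = leave flushing to the operating system
sync_binlog = 1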

Buffer pool – Most databases (Postgres is a famous exception) have a buffer pool to cache frequently accessed data and indexes. It can even contain dirty pages: changes that are not yet written to data files. This makes things much faster. And again: changes are persistent when they are written to transaction logs. Even after a crash, data files can be repaired using transaction logs.

Session buffers, work mem – These buffers speed up parts of query execution, like joins and sorting. However they have nothing to do with writes.

Query cache – Older MySQL versions, MariaDB, and maybe other DBMSs (not sure, sorry) have a query cache. This can speed up reads when the very same query is run often. "Very same" means that hashes of the queries are compared, so any difference is relevant, including whitespace. Every time a table is written to, all cached queries mentioning that table are invalidated. This, together with its well-known scalability problems, usually makes it a bad idea, at least in MariaDB/MySQL (there are exceptions – for example, if you have low concurrency, a reasonable number of very slow queries and not many writes).
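
To check whether the query cache is enabled and whether it is doing anything useful, you can run (MySQL/MariaDB):

mysql> SHOW VARIABLES LIKE 'query_cache%';  -- is it enabled, and how big is it?
mysql> SHOW STATUS LIKE 'Qcache%';          -- hits, inserts, low-memory prunes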

Proxy caches – Proxies, like ProxySQL, can also have a query cache. It can have problems and is not necessarily useful, but at least it is built with scalability in mind (because proxies are about scalability).
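
For example, in ProxySQL caching is opt-in, per query pattern, via a rule with a TTL (a sketch on the admin interface; the digest regex is hypothetical, cache_ttl is in milliseconds):

Admin> INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
       VALUES (1, 1, '^SELECT .* FROM product', 5000, 1);
Admin> LOAD MYSQL QUERY RULES TO RUNTIME;
Admin> SAVE MYSQL QUERY RULES TO DISK;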

Redis & friends – This should be obvious: retrieving data from a caching system (like Redis or Memcached) is much faster than retrieving it from MySQL. Usually this data has a TTL (time to live), which determines when it will expire, and it can also be invalidated manually. Keep in mind that this makes response times unpredictable: if the data is cached, response time is X; if it has expired, response time is Y – where X and Y can be very different. It is even more unpredictable if this cache is not big enough to contain all your hot data. So you should be careful about what to cache, unless your dataset is small. Note that these caches can also use the disks: for example, older Redis versions had Virtual Memory (now deprecated). But we will not dig into this, as our focus is the persistent database. The point is: these caches can avoid database queries, but not always.
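
A minimal illustration with redis-cli (key name and TTL are hypothetical):

> redis-cli SET user:42:profile '{"name":"Ada"}' EX 300   # cache with a 5-minute TTL
> redis-cli GET user:42:profile                           # fast path, while not expired
> redis-cli DEL user:42:profile                           # manual invalidation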

Application – No matter how fast the proxy's query cache, Redis and Memcached are: retrieving data from local RAM is much faster. No network round trip, no other server's response time involved. Of course you shouldn't cache a large amount of data locally, or your memory will not be enough and your application could suffer. And cache invalidation can be a very complex problem. But still, for the hottest small data, local memory is the fastest option. To avoid making the response time unpredictable, it's better to keep application-level caches updated, instead of running queries when an entry expires. Writes to the database are still necessary, and they can be synchronous or asynchronous, depending on how critical the data is.

Trust no one

A famous fairy said that some lies have short legs and others have a long nose. If hard disks, controllers and even filesystems had noses, some of them would have a long nose.

I will not dig into this complex topic myself, but the takeaway of this paragraph is: don't trust them. They sometimes lie about consistency, so benchmarks are more impressive and marketing people are happy. Instead, try diskchecker.pl. It will tell you if something in your system is lying. It will not tell you whether it is the hard disk, the controller, or something in the OS. But it will tell you whether the data it writes is actually persisted immediately.
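
Its usage is roughly the following (the listener host name is hypothetical; the test requires cutting power to the machine under test):

# on a second machine, start the listener:
> diskchecker.pl -l
# on the machine being tested, write test data while reporting to the listener:
> diskchecker.pl -s listener.example.com create test_file 500
# cut the power, reboot, then check that everything acknowledged as synced survived:
> diskchecker.pl -s listener.example.com verify test_file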

If your data are on the cloud, you cannot use this tool – because it involves shutting down the physical server suddenly while a file is being written. I am not aware of any tool or procedure to check if your cloud provider is lying about persistence. If you know one, please write a comment to this post. That would be much appreciated.

Databases don’t lie – at least, I am not aware of any DBMS or object store lying about persistence. But they have bugs, just like any piece of software, so you should check them periodically. Here is a PostgreSQL example.

Federico

by Federico at April 28, 2018 10:06 AM

April 27, 2018

Peter Zaitsev

This Week In Data with Colin Charles 37: Percona Live 2018 Wrap Up

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Santa Clara 2018 is now over! All things considered, I think it went off quite well; if you have any comments/complaints/etc., please don’t hesitate to drop me a line. I believe a survey will be going out as to where you’d like to see the conference in 2019 – yes, it is no longer going to be at the Santa Clara Convention Centre.

I was pleasantly surprised that several people came up to me saying they read this column and enjoy it. Thank you!

The whole conference was abuzz with MySQL 8.0 GA chatter. Many seemed to enjoy the PostgreSQL focus too, now that Percona announced PostgreSQL support.

Congratulations as well to the MySQL Community Awards 2018 winners.

Releases

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week In Data with Colin Charles 37: Percona Live 2018 Wrap Up appeared first on Percona Database Performance Blog.

by Colin Charles at April 27, 2018 10:17 PM

MySQL 8.0 GA: Quality or Not?

What does Anton Ego – a fictional restaurant critic from the Pixar movie Ratatouille – have to do with MySQL 8.0 GA?

When it comes to being a software critic, a lot.

In many ways, the work of a software critic is easy. We risk very little and thrive on negative criticism, which is fun to read and write.

But what about those who give their many hours of code development, and those who have tested such code before release? How about the many people behind the scenes who brought together packaging, documentation, multiple hours of design, marketing, online resources and more?

And all of that, I might add, is open source! Free for the world to take, copy, adapt and even incorporate in full or in part into their own open development.

It is in exactly that area that the team at MySQL shines once again – from humble beginnings, they have built up colossally powerful database software that handles much of the world's data, fast.

Used in every area of life – aerospace, defense, education, finances, government, healthcare, pharma, manufacturing, media, retail, telecoms, hospitality, and finally the web – it truly is a community effort.

My little contribution to this effort is first and foremost to say: well done! Well done for such an all-in-all huge endeavor. When I tested MySQL 8.0, I experienced something new: an extraordinarily clean bug report screen when I unleashed our bug hunting rats, ahem, I mean tools. This was somewhat unexpected. Usually, new releases are a fun playground even for seasoned QA engineers who look for the latest toy to break.

I have a suspicion that the team at Oracle either uses newly-improved bug-finding tools or perhaps they included some of our methods and tools in their setup. In either case, it is, was and will be welcome.

When the unexpected occurs, a fight or flight syndrome happens. I tend to be a fighter, so I upped the battle and managed to find about 30 bugs, with 21 bugs logged already. Quite a few of them are Sig 11’s in release builds. Signal 11 exceptions are unexpected crashes, and release builds are the exact same build you would download at dev.mysql.com.

The debug build also had a number of issues, but fewer than expected, leading me to the conclusions drawn above. Since Oracle engineers marked many of the issues logged as security bugs, I didn't list them here. I'll give Oracle some time to fix them, but I might add them later.

In summary, my personal recommendation is this: unless you are a funky new web company thriving on the latest technology, give Oracle the opportunity to make a few small point bugfix releases before adopting MySQL 8.0 GA. After that, provided the upgrade prerequisites are met and your software application is compatible, go for it and upgrade.

Before that, this is a great time to start checking out the latest and greatest that MySQL 8.0 GA has to offer!

All in all, I like what I saw, and I expect MySQL 8.0 GA to have a bright future.

Signed, a seasoned software critic.

The post MySQL 8.0 GA: Quality or Not? appeared first on Percona Database Performance Blog.

by Roel Van de Paar at April 27, 2018 07:36 PM

Jean-Jerome Schmidt

Watch the Webinar Replay: How to Measure Database Availability

Watch the replay of Part 2 of our database high availability webinar special!

Thanks to everyone who participated in this week's webinar on how to measure database availability. The replay and slides are now available to view online.

Database availability is notoriously hard to measure and report on, although it is an important KPI in any SLA between you and your customer.

It is common enough to define availability in terms of 9s (e.g. 99.9% or 99.999%) - especially here at Severalnines - although there are often different opinions as to what these numbers actually mean, or how they are measured. For reference, 99.9% allows roughly 8 hours 46 minutes of downtime per year, 99.99% about 52 minutes, and 99.999% only about 5 minutes 15 seconds.

Is the database available if an instance is up and running, but it is unable to serve any requests? Or if response times are excessively long, so that users consider the service unusable? Is the impact of one longer outage the same as multiple shorter outages? How do partial outages affect database availability, where some users are unable to use the service while others are completely unaffected?

Not agreeing on precise definitions with your customers might lead to dissatisfaction. The database team might be reporting that they have met their availability goals, while the customer is dissatisfied with the service.

During the webinar, we discussed the different factors that affect database availability and saw how to measure it in a realistic way.

Watch the replay

Agenda

  • Defining availability targets
    • Critical business functions
    • Customer needs
    • Duration and frequency of downtime
    • Planned vs unplanned downtime
    • SLA
  • Measuring the database availability
    • Failover/Switchover time
    • Recovery time
    • Upgrade time
    • Query latency
    • Restoration time from backup
    • Service outage time
  • Instrumentation and tools to measure database availability:
    • Free & open-source tools
    • ClusterControl's Operational Report
    • Paid tools

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA with over 15 years' experience managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

by jj at April 27, 2018 08:33 AM

April 26, 2018

Peter Zaitsev

The Evolution of the DBA in an “As-A-Service” World

The requirements for managing and running a database in a modern enterprise have evolved over the past ten years. Those in charge of running enterprise databases have seen their focus shift from ensuring access and availability, to architecture, design and scalability responsibilities. Web-first companies pioneered the change by charging site reliability engineers (SRE’s) or multi-faceted DBAs with the task of ensuring that the company’s main revenue engine not only stayed up, but could scale to wherever the business needed to go. This is a far cry from the classic enterprise DBA’s top responsibilities: keep it up, keep it backed up, and react to issues as they present themselves.

Today, enterprises look for new revenue models to keep up with a shifting technology paradigm driven by the cloud. The requirements and needs for managing their database environments are changing along with this shift. In the SaaS world, application outages mean lost revenue. Worse, it leads to customer churn and gives your competitors an opening. To keep revenue flowing, every part of a SaaS company’s critical infrastructure needs to be planned out: redundancy should be built-in, and a future-proof architecture should be built to accommodate scale.

The more issues you can design out before launch, the less chance of a catastrophic outage later on. This means as a SaaS provider you want your DBAs and database engineers architecting a database that avoids problems at scale, and you want them working with your developers to write better, more efficient database calls. The database infrastructure is designed and automated to work at scale, while taking into account efficient use of resources for meeting today’s requirements.

When companies move to the cloud, the cloud provider takes care of much of the operational automation and many of the mundane day-to-day tasks (for example, using database as a service (DBaaS) options such as Amazon RDS and Aurora). But this does not eliminate the need for database expertise: it moves the function closer to the design and development side of the application. Someone needs to not only design and tune the database to support the application, but also has to understand how to build the modular pieces available in the cloud into a cohesive scalable unit that meets the needs of the application and the company. This means there are much higher impacts and clearer ROIs realized from efficient database expertise.

Cloud DBA vs. Classic DBA

 

Over the years at Percona, we have seen this shift as well. Currently, more than 50% of the support tickets our customers open are related to application design issues, query performance or database infrastructure design. This is a far cry from five years ago when these represented less than 20% of our overall caseload. This makes sense, however, when you think about the maturity of our database products and the technological advances that impact the database. A more stable MySQL and MongoDB, coupled with advances in either homegrown automation or cloud-based infrastructure, reduce the likelihood of common crashing bugs and “Core Database Software” related bugs. Instead, outages and issues are increasingly caused by design decisions, bad code or unplanned-for edge cases. In order to keep up, DBAs need to evolve to move upstream to have the greatest impact.

At Percona, we recognize the changing requirements of modern database deployments. In fact, we have been providing database expertise since the beginning of the SaaS and cloud era. We recognize the needs of clients that choose to run on a DBaaS platform are slightly different than those managing their own full-stack database deployments.

That’s why we created a brand new tier of support focused on DBaaS platforms. These services allow you to rely on your cloud provider for operational break-fix support, while augmenting that with proven world-class expertise focused on the design, development, and tuning of the database itself (which cloud providers typically don’t address).

We also launched a DBaaS-focused version of our Percona DBA service. The Percona DBA service focuses on designing, setting up, and proactively improving your DBaaS cloud environment to ensure you get the most out of your investment. 

Contact us for more details on our new support and managed service options that can help optimize your cloud database environments, and make them run as efficiently as possible with the applications that drive your business.

The post The Evolution of the DBA in an “As-A-Service” World appeared first on Percona Database Performance Blog.

by Matt Yonkovit at April 26, 2018 07:15 PM

Monty Says

Congratulations to Oracle on MySQL 8.0


Last week, Oracle announced the general availability of MySQL 8.0. This is good news for database users, as it means Oracle is still developing MySQL.


I decided to celebrate the event by doing a quick test of MySQL 8.0. Here follows a step-by-step description of my first experience with MySQL 8.0.
Note that I did the following without reading the release notes, as I have done with every MySQL / MariaDB release to date; in this case it was not the right thing to do.

I pulled MySQL 8.0 from git@github.com:mysql/mysql-server.git
I was pleasantly surprised that 'cmake . ; make' worked without any compiler warnings! I even checked the compiler options used and noticed that MySQL was compiled with -Wall + several other warning flags. Good job MySQL team!

I did have a little trouble finding the mysqld binary, as Oracle had moved it to 'runtime_output_directory'; unexpected, but no big deal.

Now it was time to install MySQL 8.0.

I did know that MySQL 8.0 has removed mysql_install_db, so I had to use the mysqld binary directly to install the default databases:
(I have specified datadir=/my/data3 in the /tmp/my.cnf file)

> cd runtime_output_directory
> mkdir /my/data3
> ./mysqld --defaults-file=/tmp/my.cnf --install

2018-04-22T12:38:18.332967Z 1 [ERROR] [MY-011011] [Server] Failed to find valid data directory.
2018-04-22T12:38:18.333109Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2018-04-22T12:38:18.333135Z 0 [ERROR] [MY-010119] [Server] Aborting

A quick look at the mysqld --help --verbose output showed that the right option is --initialize. My bad, let's try again:

> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:39:31.910509Z 0 [ERROR] [MY-010457] [Server] --initialize specified but the data directory has files in it. Aborting.
2018-04-22T12:39:31.910578Z 0 [ERROR] [MY-010119] [Server] Aborting

Now I had used the right option, but it still didn't work.
I took a quick look around:

> ls /my/data3/
binlog.index

So even though mysqld noticed that the data3 directory was wrong, it still wrote things into it. This even though I didn't have --log-bin enabled in the my.cnf file. Strange, but easy to fix:

> rm /my/data3/binlog.index
> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:40:45.633637Z 0 [ERROR] [MY-011071] [Server] unknown variable 'max-tmp-tables=100'
2018-04-22T12:40:45.633657Z 0 [Warning] [MY-010952] [Server] The privilege system failed to initialize correctly. If you have upgraded your server, make sure you're executing mysql_upgrade to correct the issue.
2018-04-22T12:40:45.633663Z 0 [ERROR] [MY-010119] [Server] Aborting

The warning about the privilege system confused me a bit, but I ignored it for the time being and removed from my configuration files the variables that MySQL 8.0 no longer supports. I couldn't find a list of the removed variables anywhere, so this was done by trial and error.

> ./mysqld --defaults-file=/tmp/my.cnf

2018-04-22T12:42:56.626583Z 0 [ERROR] [MY-010735] [Server] Can't open the mysql.plugin table. Please run mysql_upgrade to create it.
2018-04-22T12:42:56.827685Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2018-04-22T12:42:56.838501Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2018-04-22T12:42:56.848375Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2018-04-22T12:42:56.848863Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we're sending the information to the error-log instead: MY-001146 - Table 'mysql.component' doesn't exist
2018-04-22T12:42:56.848916Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we're sending the information to the error-log instead: MY-003543 - The mysql.component table is missing or has an incorrect definition.
....
2018-04-22T12:42:56.854141Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: '8.0.11' socket: '/tmp/mysql.sock' port: 3306 Source distribution.

I figured out that if there is a single wrong variable in the configuration file, running mysqld --initialize will leave the database in an inconsistent state. NOT GOOD! I am happy I didn't try this on a production system!

Time to start over from the beginning:

> rm -r /my/data3/*
> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:44:45.548960Z 5 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: px)NaaSp?6um
2018-04-22T12:44:51.221751Z 0 [System] [MY-013170] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld (mysqld 8.0.11) initializing of server has completed

Success!

I wonder why the temporary password is so complex; it could easily have been something that one could remember without decreasing security, it's temporary after all. No big deal, one can always paste it from the logs. (Side note: MariaDB uses socket authentication on many systems and thus doesn't need temporary installation passwords.)

Now let's start the MySQL server for real to do some testing:

> ./mysqld --defaults-file=/tmp/my.cnf

2018-04-22T12:45:43.683484Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: '8.0.11' socket: '/tmp/mysql.sock' port: 3306 Source distribution.

And then let's start the client:

> ./client/mysql --socket=/tmp/mysql.sock --user=root --password="px)NaaSp?6um"
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Apparently MySQL 8.0 doesn't work with old MySQL / MariaDB clients by default :(

I was testing this on a system with MariaDB installed, like most modern Linux systems today, and didn't want to use the MySQL clients or libraries.

I decided to try to fix this by changing the authentication to the native (original) MySQL authentication method.

> mysqld --skip-grant-tables

> ./client/mysql --socket=/tmp/mysql.sock --user=root
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

Apparently --skip-grant-tables is not good enough anymore. Let's try again with:

> mysqld --skip-grant-tables --default_authentication_plugin=mysql_native_password

> ./client/mysql --socket=/tmp/mysql.sock --user=root mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.11 Source distribution

Great, we are getting somewhere. Now let's fix "root" to work with the old authentication:

MySQL [mysql]> update mysql.user set plugin="mysql_native_password",authentication_string=password("test") where user="root";
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '("test") where user="root"' at line 1

A quick look at the MySQL 8.0 release notes told me that the PASSWORD() function is removed in 8.0. Why???? I don't know how one is supposed to generate passwords in MySQL 8.0 that are compatible with old installations of MySQL. One could of course start an old MySQL or MariaDB version, execute the password() function and copy the result.
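
Another possible workaround, relying on the fact that the mysql_native_password hash is just a double SHA1, is to compute it with standard SQL functions (a sketch):

MySQL [mysql]> SELECT CONCAT('*', UPPER(SHA1(UNHEX(SHA1('test')))));

This returns *94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29, the same value PASSWORD('test') used to produce.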

I decided to fix this the easy way and use an empty password:

(Update: I later discovered that the right way would have been to use: FLUSH PRIVILEGES; ALTER USER 'root'@'localhost' IDENTIFIED BY 'test'; I however dislike this syntax, as it has the password in clear text, which is easy to grab, and the command can't be used to easily update the mysql.user table. One must also disable --skip-grant-tables mode to use this.)

MySQL [mysql]> update mysql.user set plugin="mysql_native_password",authentication_string="" where user="root";
Query OK, 1 row affected (0.077 sec)
Rows matched: 1 Changed: 1 Warnings: 0
 
I restarted mysqld:
> mysqld --default_authentication_plugin=mysql_native_password

> ./client/mysql --user=root --password="" mysql
ERROR 1862 (HY000): Your password has expired. To log in you must change it using a client that supports expired passwords.

Ouch, I forgot about that. Let's try again:

> mysqld --skip-grant-tables --default_authentication_plugin=mysql_native_password

> ./client/mysql --user=root --password="" mysql
MySQL [mysql]> update mysql.user set password_expired="N" where user="root";

Now the restart and test worked:

> ./mysqld --default_authentication_plugin=mysql_native_password

>./client/mysql --user=root --password="" mysql

Finally I had a working account that I can use to create other users!

When looking at mysqld --help --verbose again, I noticed the option:

--initialize-insecure
Create the default database and exit. Create a super user
with empty password.

I decided to check if this would have made things easier:

> rm -r /my/data3/*
> ./mysqld --defaults-file=/tmp/my.cnf --initialize-insecure


2018-04-22T13:18:06.629548Z 5 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.

Hm. I don't understand the warning, as --initialize-insecure is not an option that one would use more than once and thus not something one would 'switch off'.

> ./mysqld --defaults-file=/tmp/my.cnf

> ./client/mysql --user=root --password="" mysql
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Back to the beginning :(

To get things to work with old clients, one has to initialize the database with:
> ./mysqld --defaults-file=/tmp/my.cnf --initialize-insecure --default_authentication_plugin=mysql_native_password

Now I finally had MySQL 8.0 up and running and thought I would take it for a spin by running the "standard" MySQL/MariaDB sql-bench test suite. This was removed in MySQL 5.7, but as I happened to have MariaDB 10.3 installed, I decided to run it from there.

sql-bench is a single-threaded benchmark that measures the "raw" speed of some common operations. It gives you the 'maximum' performance for a single query. It's different from other benchmarks, which measure the maximum throughput when you have a lot of users, but sql-bench still tells you a lot about what kind of performance to expect from the database.

I first tried to be clever and create the "test" database, which I needed for sql-bench, with
> mkdir /my/data3/test

but when I tried to run the benchmark, MySQL 8.0 complained that the test database didn't exist.

MySQL 8.0 has gone away from the original concept of MySQL where the user can easily create directories and copy databases into the database directory. This may have serious implications for anyone doing backups of databases and/or trying to restore a backup with normal OS commands.

I created the 'test' database with mysqladmin and then tried to run sql-bench:

> ./run-all-tests --user=root

The first run failed in test-ATIS:

Can't execute command 'create table class_of_service (class_code char(2) NOT NULL,rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_code))'
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_' at line 1

This happened because 'rank' is now a reserved word in MySQL 8.0. This is also reserved in ANSI SQL, but I don't know of any other database that has failed to run test-ATIS before. I have in the past run it against Oracle, PostgreSQL, Mimer, MSSQL etc without any problems.

MariaDB also has 'rank' as a keyword in 10.2 and 10.3 but one can still use it as an identifier.

I fixed test-ATIS and then managed to run all tests on MySQL 8.0.

I ran the tests on both MySQL 8.0 and MariaDB 10.3 with the InnoDB storage engine, with identical values for all InnoDB variables, table-definition-cache and table-open-cache. I turned off the performance schema for both databases. All tests were run with a user with an empty password (to keep things comparable, and because it was too complex to generate a password in MySQL 8.0).

The results are as follows, per test, in seconds:

Operation         |MariaDB|MySQL-8|
-----------------------------------
ATIS              | 153.00| 228.00|
alter-table       |  92.00| 792.00|
big-tables        | 990.00|2079.00|
connect           | 186.00| 227.00|
create            | 575.00|4465.00|
insert            |4552.00|8458.00|
select            | 333.00| 412.00|
table-elimination |1900.00|3916.00|
wisconsin         | 272.00| 590.00|
-----------------------------------

This is of course just a first view of the performance of MySQL 8.0 in a single user environment. Some reflections about the results:

  • The alter-table test is slower (as expected) in 8.0, as some of the alter tests benefit from the instant add column in MariaDB 10.3.
  • The connect test is also better for MariaDB, as we put a lot of effort into speeding this up in MariaDB 10.2.
  • table-elimination shows an optimization in MariaDB for the Anchor table model, which MySQL doesn't have.
  • CREATE and DROP TABLE are almost 8 times slower in MySQL 8.0 than in MariaDB 10.3. I assume this is the cost of 'atomic DDL'. This may also cause performance problems for any thread using the data dictionary while another thread is creating/dropping tables.
  • Looking at the individual test results, MySQL 8.0 was slower in almost every test, in many cases significantly slower.
  • The only test where MySQL was faster was "update_with_key_prefix". I checked this and noticed that there was a bug in the test: the columns were updated to their original values (which should be instant with any storage engine). This is an old bug that MySQL has found and fixed and that we had not been aware of in the test or in MariaDB.
  • While writing this, I noticed that MySQL 8.0 now uses utf8mb4 as the default character set instead of latin1. This may affect some of the benchmarks slightly (not much, as most tests work with numbers, and Oracle claims that utf8mb4 is only 20% slower than latin1), but it needs to be verified.
  • Oracle claims that MySQL 8.0 is much faster on multi-user benchmarks. The above test indicates that they may have achieved this by sacrificing single-user performance.
  • We need to do more and many different benchmarks to better understand exactly what is going on. Stay tuned!

Short summary of my first run with MySQL 8.0:
  • Using the new caching_sha2_password authentication as the default for new installations is likely to cause a lot of problems for users. No old application will be able to use MySQL 8.0, installed with default options, without moving to MySQL's client libraries. While working on this blog I saw MySQL users complain on IRC that not even MySQL Workbench can authenticate with MySQL 8.0. This is the first time in MySQL's history that such an incompatible change has been done!
  • Atomic DDL is a good thing (we plan to have this in MariaDB 10.4), but it should not have such a drastic impact on performance. I am also a bit skeptical of MySQL 8.0 having just one copy of the data dictionary: if this gets corrupted, you will lose all your data (a single point of failure).
  • MySQL 8.0 has several new reserved words and has removed a lot of variables, which makes upgrades hard. Before upgrading to MySQL 8.0, one has to check all one's databases and applications to ensure that there are no conflicts.
  • As my test above shows, if you have a single deprecated variable in your configuration files, the installation of MySQL will abort and can leave the database in an inconsistent state. I did of course run my tests by installing into an empty data directory, but one can assume that some of the problems may also happen when upgrading an old installation.


Conclusions:
In many ways, MySQL 8.0 has caught up with some earlier versions of MariaDB. For instance, in MariaDB 10.0, we introduced roles (four years ago). In MariaDB 10.1, we introduced encrypted redo/undo logs (three years ago). In MariaDB 10.2, we introduced window functions and CTEs (a year ago). However, some catch-up of MariaDB Server 10.2 features still remains for MySQL (such as check constraints, binlog compression, and log-based rollback).

MySQL 8.0 has a few new interesting features (mostly Atomic DDL and the JSON_TABLE function), but at the same time MySQL has strayed away from some of the fundamental cornerstone principles of MySQL:

From the start of the first version of MySQL in 1995, all development has been focused around 3 core principles:
  • Ease of use
  • Performance
  • Stability

With MySQL 8.0, Oracle has sacrificed two of these three.

In addition (as part of ease of use), while I was working on MySQL, we did our best to ensure that the following would hold:

  • Upgrades should be trivial
  • Things should be kept compatible, if possible (don't remove features/options/functions that are used)
  • Minimize reserved words, don't remove server variables
  • One should be able to use normal OS commands to create and drop databases, and to copy and move tables around within the same system or between different systems. With 8.0 and the data dictionary, taking backups of specific tables will be hard, even if the server is not running.
  • mysqldump should always be usable for backups and for moving to new releases
  • Old clients and applications should be able to use 'any' MySQL server version unchanged. (Some Oracle client libraries, like the C++ one, by default only support the new X protocol and thus can't be used with older MySQL or any MariaDB version.)

We plan to add a data dictionary to MariaDB 10.4 or MariaDB 10.5, but in a way that does not sacrifice any of the above principles!

The competition between MySQL and MariaDB is not just about a tactical arms race on features. It’s about design philosophy, or strategic vision, if you will.

This shows in two main ways: our respective view of the Storage Engine structure, and of the top-level direction of the roadmap.

On the Storage Engine side, MySQL is converging on InnoDB, even for clustering and partitioning. In doing so, they are abandoning the advantages of multiple ways of storing data. By contrast, MariaDB sees lots of value in the Storage Engine architecture: MariaDB Server 10.3 will see the general availability of MyRocks (for write-intensive workloads) and Spider (for scalable workloads). On top of that, we have ColumnStore for analytical workloads. One can use the CONNECT engine to join with other databases. The use of different storage engines for different workloads and different hardware is a competitive differentiator, now more than ever.

On the roadmap side, MySQL is carefully steering clear of features that close the gap between MySQL and Oracle. MariaDB has no such constraints. With MariaDB 10.3, we are introducing PL/SQL compatibility (Oracle's stored procedures) and AS OF (built-in system versioned tables with point-in-time querying). For both of these features, MariaDB is the first open source database to do so. I don't expect Oracle to provide any of the above features in MySQL!

Also on the roadmap side, MySQL is not working with the ecosystem to extend its functionality. In 2017, MariaDB accepted more code contributions in one year than MySQL has during its entire lifetime, and the rate is increasing!

I am sure that the experience I had testing MySQL 8.0 would have been significantly better if MySQL had an open development model, where the community could easily participate in developing and testing MySQL continuously. Most of the confusing error messages and strange behavior would have been found and fixed long before the GA release.


Before upgrading to MySQL 8.0, please read https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html to see what problems you can run into! Don't expect old installations or applications to work out of the box without testing, as a lot of features and options have been removed (the query cache, partitioning of MyISAM tables, etc.)! You probably also have to revise your backup methods, especially if you ever want to restore just a few tables. (With 8.0, I don't know how this can easily be done.)

According to the MySQL 8.0 release notes, one can't use mysqldump to copy a database to MySQL 8.0. One first has to move to a MySQL 5.7 GA version (with mysqldump, as recommended by Oracle) and then to MySQL 8.0 with an in-place update. I assume this means that all old mysqldump backups are useless for MySQL 8.0?

MySQL 8.0 seems to be a one-way street to an unknown future. Up to MySQL 5.7 it has been trivial to move to MariaDB, and one could always move back to MySQL with mysqldump. All MySQL client libraries have worked with MariaDB, and all MariaDB client libraries have worked with MySQL. With MySQL 8.0 this has changed in the wrong direction.

As long as you are using MySQL 5.7 and below, you have choices for your future; after MySQL 8.0 you have very little choice. But don't despair, as MariaDB will always be able to load a mysqldump file, and it's very easy to upgrade an old MySQL installation to MariaDB :)

I wish you good luck to try MySQL 8.0 (and also the upcoming MariaDB 10.3)!

by Michael "Monty" Widenius (noreply@blogger.com) at April 26, 2018 10:44 AM

Jean-Jerome Schmidt

How to do Point-in-Time Recovery of MySQL & MariaDB Data using ClusterControl

Backups are crucial when it comes to the safety of data. They are the ultimate disaster recovery solution: you may have no database nodes reachable and your datacenter could literally have gone up in smoke, but as long as you have a backup of your data, you can still recover from such a situation.

Typically, you will use backups to recover from different types of cases:

  • accidental DROP TABLE or DELETE without a WHERE clause, or with a WHERE clause that was not specific enough.
  • a database upgrade that fails and corrupts the data
  • storage media failure/corruption

Is restoring from a backup not enough? Why does it have to be point-in-time? We have to keep in mind that a backup is a snapshot of the data taken at a given point in time. If you take a backup at 1:00 am and a table is removed accidentally at 11:00 am, you can restore your data up to 1:00 am, but what about the changes which happened between 1:00 am and 11:00 am? Those changes would be lost unless you can replay the modifications that happened in between. Luckily, MySQL has such a mechanism for storing changes - binary logs. You may know those logs are used for replication - MySQL uses them to store all of the changes which happened on the master, and a slave uses them to replay those changes and apply them to its dataset. As the binlogs store all of the changes, you can also use them to replay traffic. In this blog post, we will take a look at how ClusterControl can help you perform Point-In-Time Recovery (PITR).

Creating backup compatible with Point-In-Time Recovery

First of all, let's talk about prerequisites. The host you take backups from has to have binary logs enabled - without them, PITR is not possible. The second requirement: the host you take backups from should have all the binary logs required to restore to a given point in time. If you use too aggressive a binary log rotation scheme, this could become a problem. A minimal my.cnf sketch of the settings involved follows (the server_id value is an assumption):
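
[mysqld]
log_bin          = binlog   # enable the binary log
server_id        = 1        # required when the binary log is enabled
binlog_format    = ROW
expire_logs_days = 7        # keep enough history; don't rotate too aggressively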

So, let's see how to use this feature in ClusterControl. First of all, you have to take a backup which is compatible with PITR. Such a backup has to be full, complete and consistent. For xtrabackup, as long as it contains the full dataset (you didn't include just a subset of schemas), it will be PITR-compatible.

For mysqldump, there is an option to make it PITR-compatible. When you enable this option, all necessary options will be configured (for example, you won’t be able to pick separate schemas to include in the dump) and backup will be marked as available for point-in-time recovery.

Point-In-Time Recovery from a backup

First, you have to pick a backup to restore.

If the backup is compatible with PITR, an option will be presented to perform a Point-In-Time Recovery. You will have two options for that - “Time Based” and “Position Based”. Let’s discuss the difference between those two options.

“Time Based” PITR

With this option you can pass a date and time up to which the backup should be restored. It can be defined with one-second resolution. It does not guarantee that all of the data will be restored because, even if you are very precise in defining the time, multiple events could be recorded in the binary log during that one second. Let's say that you know that the data loss happened on the 18th of April, at 10:00:01. You pass the following date and time to the form: '2018-04-18 10:00:00'. Please keep in mind that you should be using a time based on the timezone settings of the database server on which the backup was created.

It still may happen that the data loss event wasn't the first one that happened at 10:00:01, so some of the events will be lost in the process. Let's look at what that means.

During one second, multiple events may be logged in the binlogs. Let's consider such a case:
10:00:00 - events A,B,C,D,E,F
10:00:01 - events V,W,X,Y,Z
where X is the data loss event. With a granularity of one second, you can either restore up to everything that happened at 10:00:00 (so up to F) or up to 10:00:01 (up to Z). The latter case is of no use, as X would be re-executed. In the former case, we miss V and W.

That's why position-based restore is more precise. You can say: "I want to restore up to W".

Time-based restore is the most precise you can get without going to the binary logs and defining the exact position to restore to. This leads us to the second method of doing PITR.
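
Under the hood, a time-based restore boils down to replaying the binary logs up to the given timestamp, roughly equivalent to (a sketch):

mysqlbinlog --stop-datetime="2018-04-18 10:00:00" /var/lib/mysql/binlog.000008 | mysql -u root -p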

“Position Based” PITR

Here, some experience with command line tools for MySQL, namely the mysqlbinlog utility, is required. On the other hand, you will have the best control over how the recovery is made.

Let's go through a simple example. As you can see in the screenshot above, you will have to pass a binary log name and a binary log position up to which the backup should be restored. Most of the time, this should be the last position before the data loss event.

Someone executed a SQL command which resulted in a serious data loss:

mysql> DROP TABLE sbtest1;
Query OK, 0 rows affected (0.02 sec)

Our application immediately started to complain:

sysbench 1.1.0-ecf1191 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 2
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

FATAL: mysql_drv_query() returned error 1146 (Table 'sbtest.sbtest1' doesn't exist) for query 'DELETE FROM sbtest1 WHERE id=5038'
FATAL: `thread_run' function failed: /usr/local/share/sysbench/oltp_common.lua:490: SQL error, errno = 1146, state = '42S02': Table 'sbtest.sbtest1' doesn't exist

We have a backup, but we want to restore all of the data up to that fatal moment. First of all, we assume that the application does not work, so we can discard all of the writes which happened after the DROP TABLE as unimportant. If your application still works to some extent, you would have to merge the remaining changes later on. Ok, let's examine the binary logs to find the position of the DROP TABLE statement. As we want to avoid parsing all of the binary logs, let's find out what position our latest backup covered. You can check that by examining the logs for the latest backup set and looking for a line similar to this one:

So, we are talking about filename 'binlog.000008' and position '16184120'. Let’s use this as our starting point. Let’s check what binary log files we have:

root@vagrant:~# ls -alh /var/lib/mysql/binlog.*
-rw-r----- 1 mysql mysql  58M Apr 17 08:31 /var/lib/mysql/binlog.000001
-rw-r----- 1 mysql mysql 116M Apr 17 08:59 /var/lib/mysql/binlog.000002
-rw-r----- 1 mysql mysql 379M Apr 17 09:30 /var/lib/mysql/binlog.000003
-rw-r----- 1 mysql mysql 344M Apr 17 10:54 /var/lib/mysql/binlog.000004
-rw-r----- 1 mysql mysql 892K Apr 17 10:56 /var/lib/mysql/binlog.000005
-rw-r----- 1 mysql mysql  74M Apr 17 11:03 /var/lib/mysql/binlog.000006
-rw-r----- 1 mysql mysql 5.2M Apr 17 11:06 /var/lib/mysql/binlog.000007
-rw-r----- 1 mysql mysql  21M Apr 18 11:35 /var/lib/mysql/binlog.000008
-rw-r----- 1 mysql mysql  59K Apr 18 11:35 /var/lib/mysql/binlog.000009
-rw-r----- 1 mysql mysql  144 Apr 18 11:35 /var/lib/mysql/binlog.index

So, in addition to 'binlog.000008' we also have 'binlog.000009' to examine. Let’s run the command which will convert binary logs into SQL format starting from the position we found in the backup log:

root@vagrant:~# mysqlbinlog --start-position='16184120' --verbose /var/lib/mysql/binlog.000008 /var/lib/mysql/binlog.000009 > binlog.out

Please note that '--verbose' is required to decode row-based events. It is not strictly required for the DROP TABLE we are looking for, but for other types of events it may be needed.

Let’s search our output for the DROP TABLE query:

root@vagrant:~# grep -B 7 -A 1 "DROP TABLE" binlog.out
# at 20885489
#180418 11:24:32 server id 1  end_log_pos 20885554 CRC32 0xb89f2e66     GTID    last_committed=38168    sequence_number=38170    rbr_only=no
SET @@SESSION.GTID_NEXT= '7fe29cb7-422f-11e8-b48d-0800274b240e:38170'/*!*/;
# at 20885554
#180418 11:24:32 server id 1  end_log_pos 20885678 CRC32 0xb38a427b     Query    thread_id=54    exec_time=0    error_code=0
use `sbtest`/*!*/;
SET TIMESTAMP=1524050672/*!*/;
DROP TABLE `sbtest1` /* generated by server */
/*!*/;

In this sample we can see two events. The first, at position 20885489, sets the GTID_NEXT variable.

# at 20885489
#180418 11:24:32 server id 1  end_log_pos 20885554 CRC32 0xb89f2e66     GTID    last_committed=38168    sequence_number=38170    rbr_only=no
SET @@SESSION.GTID_NEXT= '7fe29cb7-422f-11e8-b48d-0800274b240e:38170'/*!*/;

The second, at position 20885554, is our DROP TABLE event. This leads to the conclusion that we should perform the PITR up to position 20885489. The only question to answer is which binary log we are talking about. We can check that by searching for binlog rotation entries:

root@vagrant:~# grep  "Rotate to binlog" binlog.out
#180418 11:35:46 server id 1  end_log_pos 21013114 CRC32 0x2772cc18     Rotate to binlog.000009  pos: 4

As can be clearly seen by comparing the dates, the rotation to binlog.000009 happened later; therefore, we want to pass binlog.000008 as the binlog file in the form.
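
For reference, the manual equivalent of what ClusterControl automates here would be to replay the binary log from the position covered by the backup up to the position just before the DROP TABLE (a sketch using the positions found above):

mysqlbinlog --start-position=16184120 --stop-position=20885489 /var/lib/mysql/binlog.000008 | mysql -u root -p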

Next, we have to decide whether we are going to restore the backup on the cluster, or use an external server to restore it. The second option could be useful if you want to restore just a subset of the data. You can restore the full physical backup on a separate host and then use mysqldump to dump the missing data and load it on the production server.

Keep in mind that when you restore the backup on your cluster, you will have to rebuild the nodes other than the one you recovered. In a master - slave scenario, you will typically want to restore the backup on the master and then rebuild the slaves from it.

As a last step, you will see a summary of actions ClusterControl will take.

Finally, after the backup was restored, we will test if the missing table has been restored or not:

mysql> show tables from sbtest like 'sbtest1'\G
*************************** 1. row ***************************
Tables_in_sbtest (sbtest1): sbtest1
1 row in set (0.00 sec)

Everything seems ok; we managed to restore the missing data.

The last step we have to take is to rebuild our slave. Please note that there is an option to use a PITR backup. In the example here, this is not possible, as the slave would replicate the DROP TABLE event and would end up inconsistent with the master.

by krzysztof at April 26, 2018 09:59 AM

Peter Zaitsev

MySQL Community Awards Winners 2018


One of the highlights of Percona Live is that the winners of the annual MySQL Community Awards are announced. A 100% community effort, the awards were created to recognize community contribution. This year saw six very deserving winners in three categories:

MySQL Community Awards: Community Contributor of the year 2018

Two individuals received these awards:

  • Jean-François Gagné
    Jean-François was nominated for his many blog posts, bug reports, and experiment results that make MySQL much better. His blog: https://jfg-mysql.blogspot.com/
  • Sveta Smirnova
    Sveta spreads knowledge and good practice on all things MySQL as a frequent speaker and blogger. Her years of experience in testing, support, and consulting are shared in webinars, technical posts, conferences around the world and in her book “MySQL Troubleshooting”. While we’re proud to say that Sveta works for Percona, this award is for her outstanding individual contribution irrespective of that. Kudos and respect, Sveta!

MySQL Community Awards: Application of the year 2018

Three applications were honoured:

  • MyRocks
    MyRocks is now in MariaDB, Percona Server and PolarDB (Alibaba). Intel, MariaDB and Rockset are optimizing it for cloud native storage.
  • ProxySQL
    ProxySQL solves serious, real-world problems experienced by DBAs in an elegant way.
  • Vitess
    Vitess is a database clustering system for horizontal scaling of MySQL. Originally developed at YouTube/Google and now under CNCF, Vitess is free and open source and aims to solve scaling problems for MySQL installations.

MySQL Community Awards: Corporate Contributor of the year 2018

The awards were presented by Agustín Gallego and Emily Slocombe.

In the spirit of open source, much of the content of this post has been sourced from the MySQL Community Awards website and the full information can be read there. Please do take the time to read the full details and you can also read about past winners and initiatives on that site.

Congratulations to all!

The post MySQL Community Awards Winners 2018 appeared first on Percona Database Performance Blog.

by Lorraine Pocklington at April 26, 2018 04:45 AM

Percona Live 2018: Closing Ceremony with Lightning Talks

Percona Live 2018 is officially done! The closing ceremony ends the conference with a last few lightning talks and a prize giveaway.

We ended the Percona Live 2018 conference the same way we started: with talks! This year, we had our lightning talks as part of our closing ceremony. They included:

Amazon Aurora MySQL and RDS MySQL: Lessons learned

Mariella Di Giacomo – ViaSat

Determining the best and most suitable relational database management system (RDBMS) for a given project isn't an easy task, and it can be rather challenging at times: it is like benchmarking fast cars created by different racing teams. The presentation compared, using a large body of experimental results, two highly available, closed-source cloud products, Amazon Aurora MySQL and RDS MySQL, both based on the open source MySQL edition.

Both use cases demonstrated that MySQL is a great solution for concurrent write, read, and mixed read/write traffic. Additionally, both scenarios proved to be successful, satisfying data integrity, reliability and scalability, with different outcomes.

JSON_TABLE – The Best of Both Worlds

Øystein Grøvlen – Oracle

One of the most popular new features of MySQL 5.7 was JSON support. Now you can use SQL to search, extract information from, and change JSON documents. MySQL 8.0 takes this a step further. Using the JSON_TABLE function, you will be able to construct relational tables based on the contents of JSON documents. This way, you will be able to use the power of SQL to process JSON. For example, you can use SQL aggregate functions on your JSON data, or use the WHERE clause to find interesting objects within a JSON array.

This lightning talk gave a short introduction to how JSON_TABLE provides the missing link when processing JSON documents in MySQL.
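
To give a flavor of the feature, here is a minimal JSON_TABLE sketch (table and column names are hypothetical):

SELECT jt.name, jt.price
FROM orders,
     JSON_TABLE(orders.items, '$[*]'
                COLUMNS (name  VARCHAR(50)  PATH '$.name',
                         price DECIMAL(8,2) PATH '$.price')) AS jt
WHERE jt.price > 10;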

Scaling MySQL with HaProxy

Laurent Kolakofsky – Indeed.com

Scaling MySQL infrastructure is challenging: traditional setups don't scale horizontally and require manual configuration management.

This talk was about how Indeed scaled their MySQL infrastructure. Using HaProxy, they dynamically take backends in and out of rotation based on replication lag. Through this proxy, they load-balance reads across a pool of replicas, ensure replication lag stays below a threshold, and easily take replicas out of rotation for maintenance, removing the work of manually updating the application's configuration.

The audience also learned about the different routing strategies Indeed uses, such as fail-to-primary vs fail-open, and about surprising application connection pool behaviors discovered along the way.
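
The general pattern looks roughly like this in haproxy.cfg (a generic sketch, not Indeed's actual configuration; note that option mysql-check only verifies connectivity – lag-aware health checks typically need an external agent queried via option httpchk):

listen mysql-replicas
    bind *:3307
    mode tcp
    balance leastconn
    option mysql-check user haproxy_check
    server replica1 10.0.0.11:3306 check
    server replica2 10.0.0.12:3306 check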

After these talks, we had our drawings for the Passport Prizes and the Rate My Talks participants.

And that is it! Keep your ears open for when we announce the location and dates of next year's conference, and save the date for Percona Live Europe 2018. Until next year!

The post Percona Live 2018: Closing Ceremony with Lightning Talks appeared first on Percona Database Performance Blog.

by Dave Avery at April 26, 2018 12:04 AM

April 25, 2018

Peter Zaitsev

Percona Live 2018: POLARDB, an InnoDB Based Shared-Everything Storage Solution

We’re heading into the home stretch at Percona Live 2018, but the sessions are continuing. I was able to attend a talk this afternoon given by Inaam Rana, a Database Developer at Alibaba Cloud, on POLARDB, an InnoDB-based shared-everything storage solution.

POLARDB provides read scale-out on a shared-everything architecture. It features 100% backward compatibility with MySQL 5.6 and the ability to expand the capacity of a single database to over 100TB. Users can expand the computing engine and storage capability in a matter of seconds. POLARDB offers a 6x performance improvement over MySQL 5.6, and a significant drop in costs compared to other commercial databases.

POLARDB leverages InnoDB's redo logs for physical replication. InnoDB stores physical page-level operations in the redo logs for crash recovery. POLARDB extends this functionality to deploy multiple read replicas for read load sharing.

In this talk, we took a deep dive into InnoDB internals, and Inaam explained the changes made to the core InnoDB code. He touched upon design issues around logging, crash recovery, buffer pool management, MVCC, DDL synchronization, etc.

I spoke briefly with Inaam after his Percona Live 2018 talk, and asked a few questions about PolarDB. Check it out.

The post Percona Live 2018: POLARDB, an InnoDB Based Shared-Everything Storage Solution appeared first on Percona Database Performance Blog.

by Dave Avery at April 25, 2018 10:52 PM

Percona Live 2018: Securing Access to Facebook’s Databases


We’re moving along at Percona Live 2018, and there are still packed and energetic talks after lunch.

My next session was with Andrew Regner, Production Engineer at Facebook. His talk was on securing access to Facebook’s databases.

Since the beginning, Facebook has used a conventional username/password to secure access to production MySQL instances. Over the last few years, they’ve been working on moving to x509 TLS client certificate authenticated connections. Given the many types of languages and systems at Facebook that use MySQL in some way, this required a massive amount of changes for a lot of teams.

The talk was both a technical overview of how their new solution works and a collection of hard-learned tricks for getting an entire company to change its underlying MySQL client libraries.
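For reference, stock MySQL supports this pattern natively. A minimal sketch, with hypothetical paths and account name rather than Facebook’s actual setup:

-- server side: only accept this account with a valid client certificate
CREATE USER 'app'@'%' IDENTIFIED BY 'example-password' REQUIRE X509;

# client side: present the certificate and key when connecting
mysql --ssl-ca=/etc/mysql/ssl/ca.pem --ssl-cert=/etc/mysql/ssl/client-cert.pem \
    --ssl-key=/etc/mysql/ssl/client-key.pem -h db.example.com -u app -p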

After his talk, I had a chance to quickly talk with Andrew about his efforts to move the security process for Facebook’s databases. Check it out below.

The post Percona Live 2018: Securing Access to Facebook’s Databases appeared first on Percona Database Performance Blog.

by Dave Avery at April 25, 2018 10:30 PM

Percona Live 2018: Migrating to Vitess at (Slack) Scale


Percona Live 2018 is moving along, and the first person I got a chance to talk with is Michael Demmer, Senior Staff Engineer at Slack. His talk was on Migrating to Vitess at (Slack) Scale.

MySQL is the backbone of Slack’s data storage infrastructure. It handles billions of queries per day across thousands of sharded database hosts. Slack is migrating this system to use Vitess’ flexible sharding and topology management instead of simple application-based shard routing and manual administration. This effort aims to provide an architecture that scales to meet the growing demands of their largest customers and features, while under pressure to maintain a stable and performant service.

This talk presented the core motivations behind the decision, why Vitess won out as the best option, and how Slack laid the groundwork for the migration within its development teams. Michael then presented some challenges and surprises (both good and bad) found during their transition, and the contributions to the Vitess project that mitigated them. Finally, he discussed the future plans for their migration, and suggested improvements to the Vitess ecosystem to aid other adoption efforts.

I spoke briefly with Michael after his talk. Check it out below:

The post Percona Live 2018: Migrating to Vitess at (Slack) Scale appeared first on Percona Database Performance Blog.

by Dave Avery at April 25, 2018 08:22 PM

Percona Live 2018 Keynotes, Day Two


Welcome to Percona Live 2018 keynotes, day two!

Day two of Percona Live 2018 looks to be as filled with great talks as day one. Today we started with keynote presentations from Percona, a panel on the evolution of databases and talks from VividCortex and Upwork. You can view the recording of today’s keynotes here.

Percona Welcome Back

Laurie Coffin (Percona)

Laurie Coffin, Percona CMO, welcomed everyone back to Percona Live Open Source Database Conference 2018 day two. She reminded everybody to rate the talks they attend using the Percona Live App. People who rate five or more talks are eligible for a raffle prize! Winners of the raffle and the passport prize giveaway need to be present at the end-of-day closing remarks to claim their prizes.

We haven’t selected the site for Percona Live 2019, but we will announce it as soon as we have (it will not be in Santa Clara next year). Save the date for Percona Live Europe 2018: November 5-7, 2018 in Frankfurt, Germany.

Database Evolution in the Cloud Panel

Lixun Peng (Alibaba Cloud), Sunil Kamath (Microsoft), Baron Schwartz (VividCortex), Shawn Briscoe (Percona)

How companies build applications and deploy databases has changed drastically over the last five years. Enterprises are moving applications and workloads to the cloud in order to take advantage of flexibility, match resource consumption to actual needs and reduce hardware and software expenses. This panel discussed the rapid changes occurring with databases deployed in the cloud, and what that means for the future of databases, database management and monitoring and the role of the DBA and developer. Will microservices displace traditional databases? (probably not). Will self-driving databases displace DBAs? (probably not). How will new cloud database paradigms help customers with their problems? (polyglot persistence provides the right tool for the job at hand).

Future Perfect: The New Shape Of The Data Tier

Baron Schwartz (VividCortex)

It’s obvious that macro trends such as cloud computing, microservices, containerization, and serverless applications are fundamentally changing how we architect, build, deploy, and operate modern applications. We’ve already seen how these changes have affected our data platforms dramatically over the past few years. Where is this going? VividCortex’s CEO Baron Schwartz gave a thoughtful talk about how our environments shape our cultures and how our cultures shape our technology. Along with that, he provided some perspective on the ways he thinks the various open source database cultures are similar, and how we can work together to shape the technologies that are taking us into the future.

MongoDB at Upwork

Scott Simpson (Upwork)

Upwork is the largest freelancing website for connecting clients and freelancers. In this keynote, Scott Simpson, Lead Software Engineer at Upwork, discussed what MongoDB is used for at Upwork, how they chose the database and how Percona helps make them successful. Upwork needed to set up a messaging service for communication between clients and freelancers that could accommodate millions of messages a day while maintaining performance.

That’s it for this year’s talks. You can find the day two talks over on our YouTube channel.

The post Percona Live 2018 Keynotes, Day Two appeared first on Percona Database Performance Blog.

by Dave Avery at April 25, 2018 06:26 PM

MariaDB AB

Try MariaDB Server 10.3 in Docker


There are times when you may want to test specific software or a specific version of software. In my case, I wanted to play with MariaDB Server 10.3.6 Release Candidate and some of the new, upcoming features. I didn’t want to have a permanent installation of it on my laptop so I chose to put it in a Docker container that I can easily copy to another place or remove. These are the steps I had to take to get it done.

I won’t go through how to install Docker itself. There is good documentation for it, which can be found here: https://docs.docker.com/install/

After the installation is completed, make sure Docker is up and running by opening a terminal and typing:

docker info

There are a lot of other alternatives to see that Docker is up and running, but “info” provides useful information about your Docker environment.

After Docker is set up, it’s time to create a container running MariaDB Server. The easy way to do it is to use the official MariaDB images available on Docker Hub. These images are updated fairly quickly when a new release of MariaDB is made. It’s this easy to get MariaDB Server 10.3 RC up and running:

docker pull mariadb:10.3

docker run --name mariadbtest -e MYSQL_ROOT_PASSWORD=mypass -d mariadb:10.3

Check that MariaDB started correctly by looking at the logs:

docker logs mariadbtest

The last row in the log will also tell you what version of MariaDB is running.
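You can then connect with the client bundled in the container, using the root password set via MYSQL_ROOT_PASSWORD above:

docker exec -it mariadbtest mysql -uroot -pmypass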

For documentation on this, refer to Installing and using MariaDB via Docker in the MariaDB documentation.

In my case, I wanted to test out the latest version of MariaDB, which at the time of writing wasn’t yet available as an image on Docker Hub. I will next go through the steps to create and populate a container without using a prebuilt image.

To get going we’ll need a new container. We need the container to be based on an operating system that is supported by MariaDB. I’ll base it off Ubuntu Xenial (16.04).

docker run -i -t ubuntu:xenial /bin/bash

When running that command, Docker will download the Ubuntu Xenial Docker image and use it as the base for the container. The /bin/bash at the end will take us into the shell of the container.

Inside the container I want to install MariaDB 10.3. I used the repository configuration tool for MariaDB to get the right configuration to add to the clean Xenial installation I now have. The tool gave me the following three commands to run. (On a minimal Xenial image, add-apt-repository is not installed by default; you may first need to run apt update and apt install software-properties-common, and import the MariaDB signing key as instructed by the tool.)

add-apt-repository 'deb [arch=amd64,i386,ppc64el] http://mirror.netinch.com/pub/mariadb/repo/10.3/ubuntu xenial main'

apt update

apt install mariadb-server

The last command will start installing MariaDB and will ask for a MariaDB root password to be defined. Once that is done and the installation finishes, we can exit the container and save the configuration so far. The container id, which is needed as an argument for the commit command, is easily fetched from the shell prompt, root@[container id].

exit

docker commit [container id] rasmus/mariadb103

It’s pretty useful to be able to have the database data stored outside the container. This is easily done by first defining a place for the data on the host machine. In my case, I chose to put it in /dbdata in my home directory. We want to expose it as the /data directory inside the container. We start the container with this command.

docker run -v="$HOME/dbdata":"/data" -i -t -p 3306 rasmus/mariadb103 /bin/bash

Inside the container, let’s start the MariaDB server and run the normal installation and configuration scripts.

/usr/bin/mysqld_safe &

mysql_install_db

mysql_secure_installation

After this we can test connecting to MariaDB 10.3 and hopefully everything works.

mysql -p

Welcome to the MariaDB monitor.  Commands end with ; or \g.

Your MariaDB connection id is 16

Server version: 10.3.6-MariaDB-1:10.3.6+maria~xenial-log mariadb.org binary distribution

Now I want to save the configuration so far, to easily be able to start from this state whenever needed. First, I exit the MariaDB monitor and then shut down MariaDB.

exit

mysqladmin -p shutdown

Then another exit will get us out of the container and then we can save the new version of the container by running the below docker commit command in the host terminal. Again, take the container id from the shell prompt of the container.

exit

docker commit -m "mariadb 10.3.6" --author="Rasmus" [container id] rasmus/mariadb103:basic_configuration

Tadaa, done! MariaDB 10.3.6 is now available in a Docker container and I can start playing with the cool new features of MariaDB Server 10.3 like System Versioned Tables. To start the container, I just run:

docker run -v="$HOME/dbdata":"/data" -i -t -p 3306 rasmus/mariadb103:basic_configuration /bin/bash

 


by rasmusjohansson at April 25, 2018 05:07 PM

Serge Frezefond

a critical piece is missing for Oracle MySQL 8 (GA) …

Oracle MySQL 8.0 has been declared GA but a critical piece is missing … MySQL 8 is a fantastic release embedding the work of brilliant Oracle engineering. I will not detail all the great features of MySQL 8 as there are a lot of great presentations around it. https://mysqlserverteam.com/whats-new-in-mysql-8-0-generally-available/

One of my main concerns regarding [...]

by Serge at April 25, 2018 11:57 AM

Peter Zaitsev

Percona Live 2018 Sessions: Ghostferry – the Swiss Army Knife of Live Data Migrations with Minimum Downtime


In this blog post on Percona Live 2018 sessions, we’ll talk with Shuhao Wu, Software Developer at Shopify, Inc., about how Ghostferry is the Swiss Army knife of live data migrations.

Existing tools like mysqldump and replication cannot migrate data between GTID-enabled MySQL and non-GTID-enabled MySQL – a common configuration across multiple cloud providers that cannot be changed. These tools are also cumbersome to operate and error-prone, thus requiring a DBA’s attention for each data migration. Shopify’s team introduced a tool that allows for easy migration of data between MySQL databases with constant downtime on the order of seconds.

Inspired by gh-ost, their tool is named Ghostferry and allows application developers at Shopify to migrate data without assistance from DBAs. It has been used to rebalance sharded data across databases. They open sourced Ghostferry at the Percona Live 2018 conference so that anyone can migrate their own data with minimal hassle and downtime. Since Shopify wrote Ghostferry as a library, you can use it to build specialized data movers that move arbitrary subsets of data from one database to another.

Shuhao walked through what data migration is, how it works, and how Ghostferry works to make this process simpler and standard across platforms – especially in systems (like cloud providers such as AWS or Google) where you don’t have control of the instances. Ghostferry also simplifies the replication process and allows someone to copy across instances with a single Ghostferry command, rather than having to understand both the source and target instances.

After the Percona Live 2018 sessions talk, I had a chance to speak with Shuhao about Ghostferry. Check it out below.

The post Percona Live 2018 Sessions: Ghostferry – the Swiss Army Knife of Live Data Migrations with Minimum Downtime appeared first on Percona Database Performance Blog.

by Dave Avery at April 25, 2018 01:11 AM

Percona Live 2018 Sessions: Microsoft Built MySQL, PostgreSQL and MariaDB for the Cloud


In this blog post on Percona Live 2018 sessions, we’ll talk with Jun Su, Principal Engineering Manager at Microsoft, about how Microsoft built MySQL, PostgreSQL and MariaDB for the cloud.

Offering MySQL, PostgreSQL and MariaDB database services in the cloud is different than doing so on-premise. Latency, connection redirection and optimal performance configuration are just a few of the challenges. In this session, Jun Su walked us through Microsoft’s journey to not only offer these popular OSS RDBMS in Microsoft Azure, but how they are implemented in Azure as a true DBaaS. We learned about Microsoft’s Azure Database Services platform architecture, and how these services are built to scale.

In Azure, database engine instances are services managed by the Azure Service Fabric, which is a platform for reliable, hyperscale, microservice-based applications. So each database engine gets treated as a microservice. When coupled with Azure’s clustering — a set of machines that the Service Fabric stitches together — you can scale up to 1000+ machines. This provides some pretty impressive scaling opportunities. Jun also walked through some of the issues with multi-tenancy, and how different levels of multi-tenancy have different trade-offs in cost, capacity and density.

After the talk, I spoke briefly with Jun about Microsoft’s efforts to provide the different open source databases on the Azure platform.

The post Percona Live 2018 Sessions: Microsoft Built MySQL, PostgreSQL and MariaDB for the Cloud appeared first on Percona Database Performance Blog.

by Dave Avery at April 25, 2018 12:53 AM

April 24, 2018

Peter Zaitsev

Percona Live 2018 Sessions: Query Optimizer – MySQL vs. PostgreSQL


In this blog post on Percona Live 2018 sessions, we’ll talk with Christian Antognini, Senior Principal Consultant at Trivadis, about the differences between MySQL and PostgreSQL query optimizers.

MySQL and PostgreSQL are two of the most popular open-source relational databases. Why would you pick one over the other to support your applications? Of course, it depends on the use case, environment and workload. To help with choosing between them, the people at Trivadis ran a comparison of their query optimizers. The aim of this session was to summarize the outcome of the comparison. Specifically, to point out optimizer-related strengths and weaknesses.

Christian spent a lot of time looking at the differences in indexing with regard to sorts, keys and partitioning, as well as joins and merges.

Both engines have good configuration capabilities, metadata use, and indexing capabilities. MySQL has better hints, while PostgreSQL wins on object statistics and joins.

After the lecture, I had a chance to speak with Christian about the differences between the query optimizers for MySQL and PostgreSQL. Check it out below.

The post Percona Live 2018 Sessions: Query Optimizer – MySQL vs. PostgreSQL appeared first on Percona Database Performance Blog.

by Dave Avery at April 24, 2018 09:51 PM

Percona Live 2018 Sessions: MySQL at Twitter


In this Percona Live 2018 blog, we’ll talk with Ronald Francisco, SRE of Database Infrastructure at Twitter, about why they moved from a fork of MySQL to MySQL 5.7.

We already started today with a great set of keynote sessions, and now the breakout sessions have begun in earnest. I’ve been looking in on the talks and stopping to talk with some of the presenters.

In this session, Ronald Ramon Francisco (Twitter Inc), SRE, Database Infrastructure, presented the motivation for moving from Twitter’s fork of MySQL to MySQL proper, and why they decided to do it. Twitter had been using its own fork of MySQL for many years. Last year the team decided to migrate to the community version of MySQL 5.7 and abandon their own version. The road to the community version was full of challenges.

He also discussed the challenges and surprises encountered and how they overcame them. Finally, he looked at lessons learned, recommendations and their future plans.

I got a chance to speak with Ronald after his talk, and ask a few questions.

Check it out below.

The post Percona Live 2018 Sessions: MySQL at Twitter appeared first on Percona Database Performance Blog.

by Dave Avery at April 24, 2018 08:21 PM

Percona Live 2018 Keynotes, Day One


Welcome to Percona Live 2018 keynotes, day one!

Percona Live 2018 is up and running! We call this day one, but in reality, yesterday was filled with tutorials that provided excellent and practical information on how to get your MySQL, MongoDB, MariaDB and PostgreSQL environments up, running and optimized.

Today we started with keynote presentations from Percona, a technology panel, Oracle and Netflix. You can view the recording of today’s keynotes here.

Percona Welcome

Laurie Coffin (Percona)

Laurie Coffin is Percona’s CMO, and she welcomed everyone to Percona Live Open Source Database Conference 2018 and announced some important items, like downloading the Percona app.

Make sure you download the app to use for the rest of the conference.

Open Source for the Modern Business

Peter Zaitsev (Percona)

As open source database adoption continues to grow in enterprise organizations, the expectations and definitions of what constitutes success continue to change. In today’s environment, it’s no longer a question of which database to use, but which databases do you need, what platforms will you deploy them on, and how do you get them to work together. A single technology for everything is no longer an option; welcome to the polyglot world.

Percona sees a lot of compelling open source projects and trends that interest the community. Peter also announced the beginning of Percona’s PostgreSQL support plans, as well as partnerships with Mesosphere and Microsoft. This makes Percona the only company that supports three of the major cloud providers and four of the major open source database platforms.

Cool Technologies Showcase

Nikolay Samokhvalov (Nombox), Sugu Sougoumarane (PlanetScale Data), Shuhao Wu (Shopify Inc.), Andy Pavlo (Carnegie Mellon University)

In this series of quick talks, we were treated to different perspectives on emerging database technologies:

  • Automatization of Postgres Administration. Cloud services like Amazon RDS or Google Cloud SQL help to automate half of DBA tasks: launching database instances, provisioning replicas, creating backups. But for the most part, database tuning and query optimization aren’t automated. The purpose of automation should be to detect, predict and ultimately prevent database issues.
  • High Performance, Scalable, and Available MySQL Clustering System for the Cloud. Multiple companies now use Vitess in production. Vitess shines in this area by providing query logs, transaction logs, information URLs, and status variables that can feed into a monitoring system like Prometheus. Vitess won a MySQL community award last night at the Community Reception.
  • Ghostferry: the Swiss Army Knife of Live Data Migrations with Minimum Downtime. Inspired by gh-ost, Ghostferry allows application developers at Shopify to migrate data without assistance from DBAs. They plan to open source Ghostferry at the conference so that anyone can migrate their own data with minimal hassle and downtime.
  • What is a Self-Driving Database Management System? People are touting the rise of “self-driving” database management systems (DBMSs), but nobody has clearly defined what it means for a DBMS to be self-driving. Thus, in this keynote, Andy provided the history of autonomous databases and what is needed to make a true self-driving DBMS, along with a new way to measure how close you are to academic tenure.

State of the Dolphin 8.0

Tomas Ulin (Oracle)

Oracle just announced the availability of MySQL 8.0 GA. Today, Tomas Ulin talked about the focus, strategy, investments and innovations that allow MySQL to provide next-generation Web, mobile, cloud and embedded applications. He also discussed features, fixes and changes in the latest and the most significant MySQL database release ever in its history: MySQL 8.0.

Linux Performance 2018

Brendan Gregg (Netflix)

At over one thousand code commits per week, it’s hard to keep up with Linux developments. This keynote summarized recent Linux performance features for a wide audience: the KPTI patches for Meltdown, eBPF for performance observability, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more. This is about exposure: knowing what exists, so you can learn and use it later when needed. Get the most out of your systems, whether they are databases or application servers, with the latest Linux kernels and exciting features.

All the keynotes today highlighted the many different aspects of the open source database community that come together to solve database challenges. Percona Live runs through Wednesday 4/25. Check out tomorrow’s keynotes here, as well as the numerous breakout sessions with top open source database experts.

The post Percona Live 2018 Keynotes, Day One appeared first on Percona Database Performance Blog.

by Dave Avery at April 24, 2018 06:49 PM

Jean-Jerome Schmidt

Announcing ClusterControl 1.6 - automation and management of open source databases in the cloud

Today we are excited to announce the 1.6 release of ClusterControl - the all-inclusive database management system that lets you easily deploy, monitor, manage and scale highly available open source databases - and load balancers - in any environment: on-premise or in the cloud.

ClusterControl 1.6 introduces a new set of cloud features in BETA status that allow users to deploy and manage their open source database clusters on public clouds such as AWS, Google Cloud and Azure. The release also provides a Point In Time Recovery functionality for MySQL/MariaDB systems, as well as new topology views for PostgreSQL Replication clusters, MongoDB ReplicaSets and Sharded clusters.

Release Highlights

Deploy and manage clusters on public Clouds (BETA)

  • Supported cloud providers: Amazon Web Services (VPC), Google Cloud, and Azure
  • Supported databases: MySQL/MariaDB Galera, Percona XtraDB Cluster, PostgreSQL, MongoDB ReplicaSet

Point In Time Recovery - PITR (MySQL)

  • Position and time-based recovery for MySQL based clusters

Enhanced Topology View

  • Support added for PostgreSQL Replication clusters; MongoDB ReplicaSets and Sharded clusters

Additional Highlights

  • Deploy multiple clusters in parallel and increase deployment speed
  • Enhanced Database User Management for MySQL/MariaDB based systems
  • Support for MongoDB 3.6

View Release Details and Resources

Release Details

Deploy and manage open source database clusters on public Clouds (BETA)

With this latest release, we continue to add deeper cloud functionality to ClusterControl. Users can now launch cloud instances and deploy database clusters on AWS, Google Cloud and Azure right from their ClusterControl console; and they can now also upload/download backups to Azure cloud storage. Supported cloud providers currently include Amazon Web Services (VPC), Google Cloud, and Azure as well as the following databases: MySQL/MariaDB Galera, PostgreSQL, MongoDB ReplicaSet.

Point In Time Recovery - PITR (MySQL)

Point-in-Time recovery of MySQL & MariaDB involves restoring the database from backups taken prior to the target time, then using incremental backups and binary logs to roll the database forward to the target time. Typically, database administrators use backups to recover from different types of cases, such as a database upgrade that fails and corrupts the data, or storage media failure/corruption. But what happens when an incident occurs at a time in between two backups? This is where binary logs come in: as they store all of the changes, users can also use them to replay traffic. ClusterControl automates that process for you and helps you minimize data loss after an outage.
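For readers who want to see the moving parts, the binlog replay step of a hand-rolled PITR looks roughly like this; the start position, binlog file names and stop time are hypothetical:

# after restoring the last full backup, roll forward to just before the incident
mysqlbinlog --start-position=154 --stop-datetime="2018-04-23 09:59:00" \
    binlog.000042 binlog.000043 | mysql -u root -p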

New Topology View

The ClusterControl Topology View provides a visual representation of your database nodes and load balancers in real time, in a simple and friendly interface without the need to install any additional tools. Distributed databases or clusters typically consist of multiple nodes and node types, and it can be a challenge to understand how these work together. If you also have load balancers in the mix, hosts with multiple IP addresses and more, then the setup can quickly become too complex to visualise. That’s where the new ClusterControl Topology View comes in: it shows all the different nodes that form part of your database cluster (whether database nodes, load balancers or arbitrators), as well as the connections between them in an easy to view visual. With this release, we have added support for PostgreSQL Replication clusters as well as MongoDB ReplicaSets and Sharded clusters.

Enhanced Database User Management for MySQL based clusters

One important aspect of being a database administrator is to protect access to the company’s data. We have redesigned our DB User Management for MySQL based clusters with a more modern user interface, which makes it easier to view and manage the database accounts and privileges directly from ClusterControl.

Additional New Functionalities

  • Improved cluster deployment speed by utilizing parallel jobs. Deploy multiple clusters in parallel.
  • Support to deploy and manage MongoDB cluster on v3.6

Download ClusterControl today!

Happy Clustering!

by jj at April 24, 2018 02:59 PM

Peter Zaitsev

Percona Partners with Microsoft and Mesosphere To Help Enterprises Optimize and Maintain DBaaS Environments


This week, Percona partners with Microsoft and Mesosphere to make it easier for organizations to take advantage of cloud and container environments and run their open source databases, ensuring optimal performance while shifting their focus to improving applications and better supporting the business.

Percona’s new partnership with Microsoft to support Microsoft Azure customers, along with the availability of Percona Server for MongoDB in the Mesosphere DC/OS community, reflect Percona’s commitment to providing comprehensive support for open source databases in cloud deployments. 

With open source databases now standard in the enterprise, organizations are increasingly looking to deploy those databases in public cloud environments to benefit from the flexibility, scalability and economics of the cloud. Percona previously announced support partnerships with Google to offer Google Cloud SQL services and Amazon Web Services to support the use of Amazon Elastic Compute Cloud (Amazon EC2), Amazon Relational Database Service (Amazon RDS), and Amazon Aurora on AWS.

Partnership with Microsoft for Microsoft Azure Customers

Percona has entered into a partnership with Microsoft, a Platinum Sponsor of the Percona Live Open Source Database Conference, to make it easier for clients to take advantage of the Microsoft Azure cloud platform. Percona software will now be available via the Azure Marketplace, simplifying deployment into Azure Virtual Machines. Additionally, the Percona DBA Service for Microsoft Azure, as well as other support services, will help Azure customers derive the most benefit from their open source database environments.

Percona Server for MongoDB Certified on Mesosphere

Percona has partnered with Mesosphere to deliver Percona Server for MongoDB on Mesosphere DC/OS (data center operating system). As reliance on hybrid, on-premises, and cloud environments continues to grow, enterprises are increasingly turning to Mesosphere DC/OS to run large-scale applications. Percona Server for MongoDB offers these organizations the best of both worlds: a free and open source, enterprise-grade version of MongoDB backed by the renowned experts at Percona, plus the confidence that the solution meets Mesosphere’s rigorous standards for compatibility. Percona Server for MongoDB is available to all users of DC/OS 1.10 and newer as a beta solution through the DC/OS Universe package catalog, with full Mesosphere certification coming soon.

The post Percona Partners with Microsoft and Mesosphere To Help Enterprises Optimize and Maintain DBaaS Environments appeared first on Percona Database Performance Blog.

by Laurie Coffin at April 24, 2018 12:10 PM

Percona Expands Services Offerings with PostgreSQL Support


Percona is extending its world renowned open source database support expertise to provide PostgreSQL support. As a result, organizations can, for the first time, work with a single trusted vendor to meet their support needs for MySQL, MongoDB, MariaDB, PostgreSQL, or any hybrid combination of these database technologies, whether deployed on-premises, in the cloud, or in a Database as a Service (DBaaS) environment.

As the unbiased champion of open source database solutions, Percona has helped thousands of customers around the world succeed using the right solution for their business and technical needs, whether they deploy MySQL, MongoDB, or MariaDB, alone or in combination, on-premise or in the cloud.

The right solution for more and more companies today also involves PostgreSQL, which is why we are proud to announce that Percona Support for PostgreSQL will be available July 2018, making our world-renowned open source database expertise available to help these businesses reduce costs by avoiding problems before they arise, resolve problems that do arise as quickly as possible, and focus less on the database and more on strategic initiatives that create new business value.

Companies who deploy PostgreSQL in conjunction with other open source databases will have the added benefit of a single, trusted vendor for all their database support needs, streamlining the support process and driving faster issue resolution across their entire database infrastructure.

Open source databases are now mainstream, solving critical business problems for enterprises while delivering excellent value. Each major open source database has compelling features that make it attractive for specific workloads. Earlier this year, PostgreSQL won the DBMS of the Year 2017 award from DB-Engines, with a 17 percent increase in popularity. An RDBMS with a reputation for reliability, data integrity, and correctness, PostgreSQL has been adopted by a wide range of organizations, including Etsy.com, Fujitsu, Greenpeace, IMDB.com, MobyGames, Safeway, the University of California at Berkeley and many more.

What is PostgreSQL?

PostgreSQL is a general purpose, object-relational database management system. PostgreSQL was developed at the Berkeley Computer Science Department, University of California. PostgreSQL is free and open source software. Its source code is available under the PostgreSQL license.

PostgreSQL is very stable. PostgreSQL was the first database management system using multi-version concurrency control (MVCC), known as snapshot isolation in Oracle.

It has many advanced features, such as:

  • User-defined types
  • Sophisticated locking mechanism
  • Table inheritance
  • Multi-version concurrency control (MVCC)
  • Foreign key referential integrity
  • Nested transactions (savepoints; see the sketch after this list)
  • Views, rules, subquery
  • Asynchronous replication
  • Native Microsoft Windows Server version
  • Tablespaces
  • Point-in-time recovery
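As a small taste of one of these features, here is how nested transactions via savepoints look in practice; the accounts table is hypothetical:

BEGIN;
INSERT INTO accounts (id, balance) VALUES (1, 100);
SAVEPOINT before_risky_update;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
-- undo only the update; the insert above survives
ROLLBACK TO SAVEPOINT before_risky_update;
COMMIT;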

By adding PostgreSQL to its portfolio of services, Percona has made it faster and easier for organizations to get the PostgreSQL support they need. With Percona’s expert advice, these organizations can also reduce costs by avoiding problems before they arise, allowing their teams to shift their focus from maintenance activities to more strategic initiatives. In addition, organizations using PostgreSQL in combination with other open source databases can get all the support and advice they need from a single, trusted vendor, resulting in a streamlined support process and faster issue resolution across their infrastructure.

Customers can access Percona Support for PostgreSQL via telephone, chat, email, and the web. 

The post Percona Expands Services Offerings with PostgreSQL Support appeared first on Percona Database Performance Blog.

by Dean Ellis at April 24, 2018 12:03 PM

Percona’s New Services and Software Products Help Organizations Achieve Performant, Durable Database Environments


Percona’s new services and software products, including a new Percona DBA service, new support tiers, and the general availability of Percona Server for MongoDB 3.6, the latest version of Percona’s free, enhanced, drop-in replacement for the MongoDB Community Edition, are designed to provide enterprises with the performant and durable database environments needed to support critical applications and websites.

As open source databases are now standard in the enterprise, on-premises and cloud mission-critical data workloads require expert DBAs for architectural decisions, performance tuning, security compliance, query optimization, as well as monitoring and alert response. Percona has helped thousands of customers achieve better performance, better cost savings and better ROI by ensuring optimal performance while letting DBAs shift their focus to improving applications and better supporting their business.

Organizations need to deploy these open source databases in public cloud environments to benefit from the flexibility, scalability and economics of cloud deployments. Percona is committed to helping organizations take advantage of these benefits by providing comprehensive support for migrating open source databases to the cloud and providing support and DBA services specifically designed for the evolving role of database administrators.

Percona DBA Service

Percona DBA service is a flexible managed database service that offers a guaranteed set of scheduled, proactive deliverables. With the collective experience of a global team of experts to manage high-performance relational database environments in on-premises and cloud environments, Percona DBA Service provides performance tuning, growth planning, security assessments, health monitoring and reporting, 24x7x365 responsive services, business continuity reviews and more.

Cloud services like Amazon Aurora, Amazon RDS, Google Cloud and Microsoft Azure are easy to set up, operate and scale, and provide cost-efficient and resizable capacity while automating time-consuming administrative tasks. However, mission-critical databases deployed in DBaaS environments still require important architectural decisions, performance tuning, security compliance and query optimization, as well as monitoring and alert response. Percona DBA Service for Amazon Aurora, Amazon RDS, Google Cloud and Microsoft Azure offers peace of mind that the database architecture is being proactively managed and improved.

New Support Tiers

Percona’s new Essential, Standard and Premium Support tiers provide unbiased, comprehensive, responsive and cost-effective database subscriptions for MySQL, MariaDB, MongoDB and PostgreSQL open source databases. Available 24x7x365, Percona Support helps organizations navigate complexity and mitigate risks using best-of-breed open source software, avoiding vendor lock-in.

In addition, Percona is launching new DBaaS-focused support tiers that align with the changing role of database administrators who manage cloud environments. These administrators require less support for the routine management of operational tasks that are automated in the cloud, allowing Percona to reduce costs while still providing unbiased consultative support services to help customers properly configure their instances, improve schemas, tune queries, diagnose problems, benefit from additional database features, and ultimately get more out of their deployments.

The new support tiers will be available starting July 1, 2018.

Percona Server for MongoDB 3.6

Percona has announced the general availability of Percona Server for MongoDB 3.6, the latest version of the company’s free, enhanced, drop-in replacement for MongoDB Community Edition. Percona Server for MongoDB 3.6 contains all the new features introduced in MongoDB Community Edition 3.6, including:

  • Retryable writes, which ensures data is written to the database even after a network error occurs
  • Causal consistency, which allows reliable reads from secondary nodes
  • Security enhancements, including improved network listening and more restrictive access controls
  • Aggregation and array improvements, for greater querying flexibility

With more than 300,000 downloads, Percona Server for MongoDB provides all the cost and agility benefits of free, proven open source software, along with practical enterprise features. The greater security, reliability and flexibility of Percona Server for MongoDB 3.6 makes it ideal for running the document-based NoSQL MongoDB database in production environments to support product catalogs, online shopping carts, Internet of Things (IoT) applications, mobile/social apps and more.

The post Percona’s New Services and Software Products Help Organizations Achieve Performant, Durable Database Environments appeared first on Percona Database Performance Blog.

by Matt Yonkovit at April 24, 2018 12:02 PM

Percona Server for MongoDB 3.6.3-1.1 Is Now Available


Percona announces the release of Percona Server for MongoDB 3.6.3-1.1 on April 24, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.6 Community Edition. It supports MongoDB 3.6 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.6.3 and includes the following additional changes:

  •  MongoRocks is deprecated in Percona Server for MongoDB 3.6 and it will be fully removed in the next major version of Percona Server for MongoDB. Feature compatibility version is set to 3.4 when using Percona Server for MongoDB 3.6 with MongoRocks, so 3.6 features, such as retryable writes and causal consistency, cannot be used. Additionally, read concern majority may produce unreliable results.
  • #PSMDB-191: Fixed a bug in MongoRocks engine initialization code which caused wrong initialization of _maxPrefix value. This could lead to reuse of dropped prefix and accidental removal of data from the collection using a reused prefix.

    In some specific conditions data records could disappear at an arbitrary moment of time from the collections or indexes created after server restart.

    This could happen as the result of the following sequence of events:
    1. User deletes one or more indexes or collections. These should be the ones using maximum existing prefixes values.
    2. User shuts down the server before MongoRocks compaction thread executes compactions of deleted ranges.
    3. User restarts the server and creates new collections. Due to the bug those new collections and their indexes may get the same prefix values which were deleted and not yet compacted. The user inserts some data into the new collections.
    4. After the server restart MongoRocks compaction thread continues executing compactions of the deleted ranges and this process may eventually delete data from the collections sharing prefixes with deleted ranges.

     

  • #PSMDB-178: RocksSnapshotManager was reworked to match the new model of interaction between MongoDB and storage engine’s snapshot manager.

The Percona Server for MongoDB 3.6.3-1.1 release notes are available in the official documentation.

The post Percona Server for MongoDB 3.6.3-1.1 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at April 24, 2018 12:01 PM

Percona Server for MySQL 5.7.21-21 Is Now Available with Increased Built-In Security Enhancements


Percona announces the GA release of Percona Server for MySQL 5.7.21-21 on April 24, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

This version of Percona Server for MySQL 5.7.21 includes three new encryption features – Vault keyring plug-in, encryption for InnoDB general tablespaces, and encryption for binary log files.

These new capabilities, which allow companies to immediately increase security for their existing databases, are also part of a larger project to build complete, robust, enterprise-grade encryption capabilities into Percona Server for MySQL, allowing customers and the community to satisfy their most rigorous security compliance requirements. Percona also announced the release of a new version of Percona XtraBackup that supports backing up Percona Server for MySQL instances that have these encryption features enabled.

Based on MySQL 5.7.21, including all the bug fixes in it, Percona Server for MySQL 5.7.21-21 is the current GA release in the Percona Server for MySQL 5.7 series. Percona provides completely open-source and free software.

New Features:
  • A new variable innodb_temp_tablespace_encrypt is introduced to turn encryption of temporary tablespace and temporary InnoDB file-per-table tablespaces on/off. Bug fixed #3821.
  • A new variable innodb_encrypt_online_alter_logs simultaneously turns on encryption of files used by InnoDB for merge sort, online DDL logs, and temporary tables created by InnoDB for online DDL. Bug fixed #3819.
  • A new variable innodb_encrypt_tables can be set to ON, making InnoDB tables encrypted by default, to FORCE, disabling creation of unencrypted tables, or OFF, restoring the previous behavior. Bug fixed #1525. (A combined my.cnf sketch for these encryption variables follows this list.)
  • The query response time plugin can now be disabled at session level using the new variable query_response_time_session_stats.
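Taken together, the three new encryption variables can be combined in a my.cnf fragment like this sketch; the values are illustrative, and a configured keyring (for example the Vault keyring plug-in mentioned above) is assumed:

[mysqld]
# encrypt new InnoDB tables by default and refuse to create unencrypted ones
innodb_encrypt_tables = FORCE
# also encrypt temporary tablespaces and online DDL/merge-sort files
innodb_temp_tablespace_encrypt = ON
innodb_encrypt_online_alter_logs = ON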
Bugs Fixed:
  • Attempting to use a partially-installed query response time plugin could have caused a server crash. Bug fixed #3959.
  • There was a server crash caused by a materialized temporary table from semi-join optimization with key length larger than 1000 bytes. Bug fixed #296.
  • A regression in the original 5.7 port was causing integer overflow with thread_pool_stall_limit variable values bigger than 2 seconds. Bug fixed #1095.
  • A memory leak took place in Percona Server when performance schema is used in conjunction with thread pooling. Bug fixed #1096.
  • A code clean-up was done to fix compilation with clang, both general warnings (bug fixed #3814, upstream #89646) and clang 6 specific warnings and errors (bug fixed #3893, upstream #90111).
  • Compilation warning was fixed for -DWITH_QUERY_RESPONSE_TIME=ON CMake compilation option, which makes QRT to be linked statically. Bug fixed #3841.
  • Percona Server returned empty result for SELECT query if number of connections exceeded 65535. Bug fixed #314 (upstream #89313).
  • A clean-up in Percona Server binlog-related code was made to avoid uninitialized memory comparison. Bug fixed #3925 (upstream #90238).
  • mysqldump utility with --innodb-optimize-keys option was incorrectly working with foreign keys on the same table, producing invalid SQL statements. Bugs fixed #1125 and #3863.
  • A fix for the mysqld startup script failing to detect the jemalloc library location for preloading (and thus not starting on systemd-based machines), introduced in Percona Server 5.7.21-20, was improved to take into account a previously created configuration file. Bug fixed #3850.
  • The possibility of a truncated bitmap file name was fixed in InnoDB logging subsystem. Bug fixed #3926.
  • Temporary file I/O was not instrumented for Performance Schema. Bug fixed #3937 (upstream #90264).
  • A crash in the unsafe query warning checks with views took place for UPDATE statement in case of statement binlogging format. Bug fixed #290.
MyRocks Changes:
  • A re-implemented variable rpl_skip_tx_api allows to turn on simple RocksDB write batches functionality, increasing replication performance by the transaction api skip. Bug fixed MYR-47.
  • Decoding value-less padded varchar fields could under some circumstances cause assertion and/or data corruption. Bug fixed MYR-232.
TokuDB Changes:
  • Two new variables introduced for the TokuDB fast updates feature, tokudb_enable_fast_update and tokudb_enable_fast_upsert should be now used instead of the NOAR keyword, which is now optional at compile time and off by default. Bugs fixed #63 and #148.
  • A set of compilation fixes was introduced to make TokuDB successfully build in MySQL / Percona Server 8.0. Bugs fixed #84, #85, #114, #115, #118, #128, #139, #141, and #172.
  • Conditional compilation code dependent on version ID in the TokuDB tree was separated and arranged to specific version branches. Bugs fixed #133, #134, #135, and #136.
  • ALTER TABLE ... COMMENT = ... statement caused TokuDB to rebuild the whole table, which is not needed, as only FRM metadata should be changed. Bug fixed #130, and #137.
  • Data race on the cache table pair attributes was fixed.

Other bugs fixed: #3793, #3812, #3813, #3815, #3818, #3835, #3875 (upstream #89916), #3843 (upstream #89822), #3848, #3856, #3887, MYR-160, MYR-245, #109, #111,#180, #181, #182, and #188.

The release notes for Percona Server for MySQL 5.7.21-21 are available in the online documentation. Please report any bugs on the project bug tracking system.

[2018-04-24 – UPDATE:] CentOS 7 packages were affected by PS-3971 where my.cnf configuration file would be replaced by a symlink. These packages (5.7.21-21.1) were removed from the repos, and new packages (5.7.21-21.2) with the fix have been deployed later today.

The post Percona Server for MySQL 5.7.21-21 Is Now Available with Increased Built-In Security Enhancements appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at April 24, 2018 11:59 AM

Percona XtraBackup 2.4.11 Is Now Available


Percona announces the GA release of Percona XtraBackup 2.4.11 on April 24, 2018. This release is based on MySQL 5.7.19. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

New features and improvements:

Release notes with all the improvements for version 2.4.11 are available in our online documentation. Please report any bugs to the issue tracker.

The post Percona XtraBackup 2.4.11 Is Now Available appeared first on Percona Database Performance Blog.

by Dmitriy Kostiuk at April 24, 2018 11:48 AM

Open Query Pty Ltd

PURGE BINARY LOGS with a relative time

Sometimes you want to reduce disk usage on certain servers by adjusting the time that binary logs are kept.  Also, some installations of MySQL and MariaDB have suffered from a very-hard-to-catch bug where the binary logs end up not getting automatically expired (basically, the expire_logs_days option doesn’t always work effectively).

A workaround can be scripted, but typically the script would specify the exact datetime to which the logs need to be kept.  The reference manual and examples all do this too, quite explicitly, noting:

The datetime expression is in the format 'YYYY-MM-DD hh:mm:ss'.

However, the actual command syntax is phrased as follows:

PURGE { BINARY | MASTER } LOGS { TO 'log_name' | BEFORE datetime_expr }

and that indicates much more flexibility in the parser: "datetime_expr" means that you can put in an arbitrary temporal expression!

So let’s test that, with a functional equivalent of expire_logs_days=14:

FLUSH BINARY LOGS;
PURGE BINARY LOGS BEFORE (NOW() - INTERVAL 14 DAY);

And yep, that works (and the extra parentheses around the expression are not required; I just did that to show clearly what the expression is).
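This also makes scripted workarounds simpler, since the script no longer has to compute the cutoff datetime itself. For example, a nightly cron entry along these lines would do, assuming credentials are supplied via ~/.my.cnf:

# purge binary logs older than 14 days, every night at 03:00
0 3 * * * mysql -e "PURGE BINARY LOGS BEFORE (NOW() - INTERVAL 14 DAY);"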

Now, I’m not the first person to use this construct; there are several posts online from recent years that use an expression with PURGE BINARY LOGS. I’m not sure whether allowing datetime_expr is a modification that was made in the parser at some point, or whether it was always possible. Fact is, the reference manual text (MariaDB as well as MySQL) only provides examples with an absolute ISO datetime: 'YYYY-MM-DD HH:MM:SS'.

by Arjen Lentz at April 24, 2018 03:54 AM

April 23, 2018

Peter Zaitsev

Percona Live 2018 Featured Talk: Data Integrity at Scale with Alexis Guajardo


Welcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Alexis Guajardo, Senior Software Engineer at Google.com. His session talk is titled Data Integrity at Scale. Keeping data safe is the top responsibility of anyone running a database. In this session, he dives into Cloud SQL’s storage architecture to demonstrate how they check data down to the disk level:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Alexis: I am a Software Engineer on the Cloud SQL team with Google Cloud. I got into databases by using FileMaker. However, the world of database technology has changed many times over since then.

Percona: Your session is titled “Data Integrity at Scale”. Has the importance of data integrity increased over time? Why?

Alexis: Data integrity has always been vital to databases and data in general. The most common method is using checksum validation to ensure data integrity. The challenge that we faced at Cloud SQL on Google Cloud was how to do this for two very popular open source database solutions, and how to do it at scale. The story for MySQL was a bit more straightforward, because of innochecksum. PostgreSQL required our team to create a utility, which is open sourced. The complicated aspect of data corruption is that sometimes it is dormant and discovered at a most inopportune time. What we have instituted are frequent checks for corruption of the entire data set, so if there is a software bug or other issue, we can mitigate it as soon as possible.
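For the curious, innochecksum ships with MySQL and verifies InnoDB page checksums in data files offline; a typical invocation, with a hypothetical file path, looks like this:

# run against a tablespace file that the server does not have open
innochecksum --verbose /var/lib/mysql/mydb/mytable.ibd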

Percona: How does scaling affect the ability to maintain data integrity?

Alexis: There is a benefit to working on a team that provides a public cloud. Since Google Cloud is not bound by most restrictions that an individual or company would be, we can allocate resources to do data integrity verifications without restriction. If I were to implement a similar system at a smaller company, most likely there would be cost and resource restrictions. However, data integrity is a feature that Google Cloud provides.

Percona: What are three things a DBA should know about ensuring data integrity?

Alexis: I think that the three things can be simplified down to three words: verify your backups.

Even if someone does not use Cloud SQL, it is vital to take backups, maintain them and verify them. Having terabytes of backups, but without verification, leaves open the possibility that a software bug or hardware issue somehow corrupted a backup.
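One concrete way to act on that advice is to actually prepare and test-restore every backup rather than just store it. A hedged sketch using Percona XtraBackup, with a hypothetical backup directory:

# applying the redo log will fail loudly if the backup is corrupt
xtrabackup --prepare --target-dir=/backups/2018-04-23
# then restore to a scratch instance and run sanity checks against it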

Percona: Why should people attend your talk? What do you hope people will take away from it? 

Alexis: I would say the main reason to attend my talk is to discover more about Cloud SQL. As a DBA or developer, having a managed database as a service solution takes away a lot of the minutia. But there are still the tasks of improving queries and creating applications.  However, having reliable and verified backups is vital. With the addition of high availability and the ability to scale up easily, Cloud SQL’s managed database solution makes life much easier.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Alexis: The many talks about Vitess look very interesting. It is also an open source Google technology, and it will be interesting to see its adoption by many companies and how they have benefited from its use.

Want to find out more about this Percona Live 2018 featured talk, and data integrity at scale? Register for Percona Live 2018, and see Alexis’ session talk Data Integrity at Scale. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

The post Percona Live 2018 Featured Talk: Data Integrity at Scale with Alexis Guajardo appeared first on Percona Database Performance Blog.

by Dave Avery at April 23, 2018 08:27 PM

Check Out the Percona Live 2018 Live Stream!


Announcing the Percona Live 2018 live stream.

This year at Percona Live Open Source Database Conference 2018 we are live streaming the Keynote Talks on Day 1 and 2.

Percona is streaming the keynotes on Tuesday, April 24, 2018, and Wednesday, April 25, 2018 beginning at 9 AM PDT (both days). The keynote speakers include people from VividCortex, Upwork, Oracle, Netflix and many more. The keynote panels feature a cloud discussion and a cool technologies showcase.

Use the live stream link if you don’t want to miss a keynote, but can’t be at the main stage. The link for the live stream is:

The list of keynote talks and speakers for each day is:

Day 1

Day 2

The post Check Out the Percona Live 2018 Live Stream! appeared first on Percona Database Performance Blog.

by Dave Avery at April 23, 2018 08:24 PM

This Week In Data with Colin Charles 36: Percona Live 2018


Percona Live Santa Clara 2018! Last week’s column may have somehow not made it to Planet MySQL, so please don’t miss the good links at: This Week in Data with Colin Charles 35: Percona Live 18 final countdown and a roundup of recent news.

Back to Percona Live – I expect people will still be registering, right down to the wire! I highly recommend you also register for the community dinner. It routinely sells out and people tend to complain about not being able to join in the fun, so reserve your spot early. Please also be present on Monday, which is not just tutorial day: during the welcoming reception, the most excellent community awards will be presented. In addition, if none of the tutorials interest you (or your ticket doesn’t include tutorials!), why not check out the China Track, something new and unique that showcases the technology coming out of China.

The biggest news this week? On Thursday, April 19, 2018, MySQL 8.0 became Generally Available with the 8.0.11 release. The release notes are a must read, as is the upgrade guide (this time around, you really want to read it!). Some more digestible links: What’s New in MySQL 8.0? (Generally Available), MySQL 8.0: New Features in Replication, MySQL 8.0 – Announcing GA of the MySQL Document Store. As a bonus, the Hacker News thread is also well worth a read. Don’t forget that all the connectors also got a nice version bump.

The PostgreSQL website has been redesigned – check out PostgreSQL.org.

More open source databases are always a good thing, and it’s great to see Apple open sourcing FoundationDB. As corporate-backed open source, the project gives me great hopes for what it can become. The requisite Hacker News thread is also well worth a read.

Releases

  • PostgreSQL 10.3, 9.6.8, 9.5.12, 9.4.17, and 9.3.22 released
  • MariaDB 10.3.6 is another release candidate, with more changes for sql_mode=oracle, changes to the INFORMATION_SCHEMA tables around system versioning, and more. Particularly interesting is the contributor list, which names a total of 34 contributors. Five come from the MariaDB Foundation (including Monty), which is about 14%; 17 come from the MariaDB Corporation (including Monty again), which is 50%; two from Tempesta; one from IBM; six from Codership (over 17%!); and four are independent. So, counting Monty only once, 21 of the 34 contributors (nearly 62%) come from the Corporation/Foundation in total.
  • SysbenchRocks, a repository of Sysbench benchmarks, libraries and extensions.

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week In Data with Colin Charles 36: Percona Live 2018 appeared first on Percona Database Performance Blog.

by Colin Charles at April 23, 2018 07:09 PM

MariaDB Foundation

MariaDB 5.5.60 now available

The MariaDB Foundation is pleased to announce the immediate availability of MariaDB 5.5.60. This is a stable (GA) release. See the release notes and changelog for details.

Download MariaDB 5.5.60 | Release Notes | Changelog | What is MariaDB 5.5? | MariaDB APT and YUM Repository Configuration Generator

Contributors to MariaDB 5.5.60: Alexander Barkov (MariaDB Corporation), Alexey Botchkov (MariaDB […]

The post MariaDB 5.5.60 now available appeared first on MariaDB.org.

by Ian Gilfillan at April 23, 2018 05:38 PM

MariaDB AB

MariaDB Server 5.5.60 now available


The MariaDB project is pleased to announce the immediate availability of MariaDB Server 5.5.60. See the release notes and changelog for details and visit mariadb.com/downloads to download.

Download MariaDB Server 5.5.60

Release Notes | Changelog | What is MariaDB 5.5?



by dbart at April 23, 2018 04:35 PM

Federico Razzoli

MySQL vs. MariaDB: WAIT, NOWAIT, SKIP LOCKED

NOWAIT, WAIT and SKIP LOCKED are syntaxes added in MySQL 8.0 and MariaDB 10.3. The idea came from AliSQL (a MySQL fork by Alibaba). The implementation was revised in MySQL, and I do not know whether MariaDB used the original implementation. EDIT: As Morgan Tocker points out in a comment, Alibaba originally filed a feature request with MySQL.

While the MySQL and MariaDB syntaxes are similar, there are important differences, and the compatibility is only superficial. This article discusses these differences.

WAIT

This syntax is only available in MariaDB. It means that, if a row or table we want to read is write-locked, we can wait up to the specified number of seconds for the lock to be released. If it is not released before the timeout expires, the query fails.
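A minimal sketch of the syntax, with a hypothetical table name (MariaDB 10.3 only):

    -- Wait up to 5 seconds for a conflicting lock to be released;
    -- if it is not, the statement fails with a lock wait timeout error.
    SELECT * FROM orders WHERE id = 1 FOR UPDATE WAIT 5;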

NOWAIT

If a table or row we need to read is write-locked, the query is not queued; instead, it fails immediately.
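For example (hypothetical table name; note the MySQL-specific restrictions listed below):

    -- Fail immediately if the row is already write-locked by
    -- another transaction, instead of queueing behind the lock.
    SELECT * FROM orders WHERE id = 1 FOR UPDATE NOWAIT;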

Incompatibilities:

  • MariaDB supports this syntax for some DDL statements (ALTER TABLE and its shortcuts), LOCK TABLES, and SELECT. MySQL only supports it for SELECT.
  • MySQL only supports this syntax in combination with FOR UPDATE or FOR SHARE. In order to introduce an incompatibility, they sacrificed support for this feature in SELECTs in SERIALIZABLE mode, which have an implicit LOCK IN SHARE MODE clause. Fortunately this is an edge case, but it is another case where Oracle’s marketing strategy affects users in a bad way.
  • MySQL implements FOR UPDATE OF and FOR SHARE OF. This is interesting, and not only for the NOWAIT feature, because it allows us to JOIN multiple tables without locking them all (see the sketch after this list). Thanks, Oracle engineers.
  • MySQL and MariaDB report different error codes and messages.
    MySQL says: ERROR 3572 (HY000): Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set
    MariaDB says: ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
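To illustrate the FOR UPDATE OF point above, here is a minimal sketch with hypothetical table names (MySQL 8.0 only):

    -- Lock only the matching rows of orders; the joined customers
    -- rows stay unlocked and remain available to other transactions.
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.id = 1
    FOR UPDATE OF o NOWAIT;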

SKIP LOCKED

This is only in MySQL. It excludes locked rows from the resultset, making our queries faster. The documentation warns us that the resultset will be inconsistent. This is implicit in the feature, but it is worth emphasizing. However, consistency is not always so important, and skipping rows seems to me a great way to solve some performance problems.
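The classic use case is a job queue, where any unlocked row will do. A minimal sketch, with hypothetical table and column names (MySQL 8.0 only):

    -- Each worker grabs one pending job that no other transaction
    -- has locked, skipping locked rows instead of waiting on them.
    SELECT id, payload
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;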

EDIT: Morgan’s comment points out that SKIP LOCKED is also non-deterministic. Again, I believe this is clear if you understand what the feature does, but still, maybe I should point it out. I think it can be compared to READ UNCOMMITTED: they are different optimizations, but in both cases the results you get depend on what other connections are doing. So the results are inconsistent and non-deterministic by nature.

My conclusions

MariaDB’s implementation of NOWAIT is clearly more complete. They have the WAIT syntax to set a timeout; they implemented the syntax in more SQL statements; and the syntax is compatible with the implicit LOCK IN SHARE MODE.

Despite this, IMHO, MySQL wins. It has SKIP LOCKED, which is very interesting, and the above-mentioned FOR UPDATE OF syntax is also a nice feature.

Once again, Oracle spent resources on adding incompatibilities. This does not affect the quality of MySQL, but it still damages the community, which should be able to use both MySQL and MariaDB in the same environments, with the same tools and the same libraries. Yet the distance between these DBMSs is growing constantly.

References

Federico

by Federico at April 23, 2018 12:56 PM

April 20, 2018

Peter Zaitsev

The Final Countdown: Are You Ready for Percona Live 2018?


It’s hard to believe Percona Live 2018 starts on Monday! We’re looking forward to seeing everyone in Santa Clara next week! Here are some quick highlights to remember:

  • In addition to all the amazing sessions and keynotes we’ve announced, we’ll be hosting the MySQL Community Awards and the Lightning Talks on Monday during the Opening Reception.
  • We’ve also got a great lineup of demos in the exhibit hall all day Tuesday and Wednesday – be sure to stop by and learn more about open source database products and tools.
  • On Monday, we have a special China Track now available from Alibaba Cloud, PingCAP and Shannon Systems. We’ve just put a $20.00 ticket on sale for that track, and if you have already purchased any of our other tickets, you are also welcome to attend those four sessions.
  • Don’t forget to make your reservation at the Community Dinner. It’s a great opportunity to socialize with everyone and Pythian is always a wonderful host!

Thanks to everyone who is sponsoring, presenting and attending! The community is what makes this event successful and so much fun to be a part of!

The post The Final Countdown: Are You Ready for Percona Live 2018? appeared first on Percona Database Performance Blog.

by Laurie Coffin at April 20, 2018 09:07 PM