Shlomi NoachTip: faster than TRUNCATE (9.3.2010, 11:37 UTC)

TRUNCATE is usually a fast operation (much faster than DELETE FROM). But sometimes it just hangs; I’ve has several such uncheerful events with InnoDB (Plugin) tables which were extensively written to. The TRUNCATE hanged; nothing else would work; minutes pass.

TRUNCATE on tables with no FOREIGN KEYs should act fast: it translate to dropping the table and creating a new one (and it all depends on the MySQL version, see the manual).

What’s faster than TRUNCATE, then? If you don’t have triggers nor FOREIGN KEYs, a RENAME TABLE can come to the rescue. Instead of:

TRUNCATE log_table

Do:

CREATE TABLE log_table_new LIKE log_table;
RENAME TABLE log_table TO log_table_old, log_table_new TO log_table;
DROP TABLE log_table_old;

I found this to work well for me. Do note that AUTO_INCREMENT values can be tricky here: the “new” table is created with an AUTO_INCREMENT value which is immediately taken in the “working” table. If you care about not using same AUTO_INCREMENT values, you can:

ALTER TABLE log_table_new AUTO_INCREMENT=some high enough value;

Just before renaming.

I do not have a good explanation as for why the RENAME TABLE succeeds to respond faster than TRUNCATE.

Link
Venu AnugantiWhen indexes are created in internal temporary tables (8.3.2010, 11:40 UTC)

During my previous post on how to improve derived tables performance, I patched the code to add indexes forcefully on internal derived table results, which made a huge difference in the performance. It was just an experiment and a thought to see if it really works without re-writing the queries, so that the logic can be pushed towards the engine rather than query re-write. \

But I got few emails in my inbox today asking whether MySQL really create any keys on internal temporary tables.

The answer is YES; and MySQL does create two keys on internal temporary tables namely ‘group_key‘ and ‘distinct_key‘ on the following conditions:

  • If there is any aggregate function and/or group-by (group_key)
  • Distinct column name(group_key)
  • Distinct in combination with group-by/aggregation functions (distinct_key)

Provided the query results are yielded in temporary table (Using temporary from the explain), else they get optimized away by the existing indexes from the regular table itself. These keys are added to both memory and disk based (MyISAM) internal temporary tables; so it does not matter if the internal temporary table is in memory or disk.

Here is a simple dump of internal temporary table index stats for some of the basic queries related to Information schema [Warning: these queries are really bad, and can't be used for any production use as they are meant for demonstration of different internal keys ]. This is a patch that I might be using for SHOW TEMPORARY TABLES when internal tables are included in the second version. The first version of the patch is already pushed to Maria branch, hoping that it gets pushed to 5.1.

-----------------------------
 TMP TABLE STATS, SESSION: 1
   temp file  : /tmp/#sqlf90_1_1f
   temp type  : MEMORY
   index count: 1
    key 1-1   : distinct_key
    field     : (null)
    key 1-2   : distinct_key
    field     : ENGINE
 query: select count(distinct engine) from information_schema.tables
-----------------------------
 
-----------------------------
 TMP TABLE STATS, SESSION: 1
   temp file  : /tmp/#sqlf90_1_21
   temp type  : MEMORY
   index count: 1
    key 1

Truncated by Planet PHP, read more at the original (another 9165 bytes)

Link
Kentoku SHIBA[MySQL][Spider]Spider-2.15 released (7.3.2010, 18:30 UTC)
I'm pleased to announce the release of Spider storage engine version 2.15(beta).
Spider is a Storage Engine for database sharding.
http://spiderformysql.com/

The main changes in this version are following.
This release is bug fix release.

Please see "99_change_logs.txt" in the download documents for more detail.

Enjoy!
Link
Colin CharlesAbusing MySQL (& thoughts on NoSQL) (7.3.2010, 16:15 UTC)
The NoSQL/relational database debate has been going on for quite some time. MariaDB, like MySQL is relational. And if you read these series of blog posts, you’ll realise that if you use MySQL correctly, you can achieve quite a lot. It all starts with Kellan Elliott-McCrea with his introductory post on Using, Abusing and Scaling MySQL [...] Related posts:
  1. MySQL Ecosystem – complementary talks at the conference?
  2. MariaDB 5.1.42 released!
  3. MySQL 5.1.26-rc released, and developer resources thoughts
Link
Venu AnugantiHow to improve subqueries derived tables performance (6.3.2010, 21:38 UTC)

Last week I was working on one of the issue where the sub-query related to OLAP staging was running for about 2+ hours in the production server and finally nailed down to get the query to run in < 7 secs. It was bit interesting and kind of known issue in MySQL sub-queries world and one of the limitation from MySQL on giving more control over derived table results.

Sometimes we can re-write the sub-queries so that there is no derived tables complexity involved; but the bad part is; this particular sub-query is part of an UPDATE statement; so not all sub-queries can be re-written especially when they are part of UPDATE or DELETE statements due to its own limitations.

PROBLEM:

Here is the subset of the problem query and as you can see it runs for about 6 minutes in this small subset of data that I used for testing on Mac. All tables are InnoDB based.

--------------
SELECT
    SUM(aggrpt.imps) as imps,
    SUM(aggrpt.clicks) as clicks,
    SUM(aggrpt.pos) as pos
 
FROM aggrpt
LEFT JOIN
(
    SELECT
    DISTINCT ext_group_id, group_id
    FROM sub
) sub2  ON(sub2.ext_group_id=aggrpt.adgroupid)
 
GROUP BY
aggrpt.report_date,
aggrpt.campaignid,
aggrpt.adgroupid,
aggrpt.keywordid
ORDER BY NULL
INTO OUTFILE '/tmp/test-sub.txt'
-------------- 
 
Query OK, 47827 rows affected (6 min 47.48 sec)

HOW TO GET AROUND – SOLUTIONS:

Moving the derived table (in the above case sub2) to a view did not help; and timings are more or less the same. Here is the two alternative ways, which made the query run in < 10 secs.

  1. By creating external table and adding an index instead of using derived table
  2. Added an index within the mysql code on the derived table temporary results table by adding FORCE INDEX syntax (changed the syntax to support this, so that engine will create an index on temporary derived table results, which in this case happens to fit within heap engine instead of disk based)

CASE 1:

This is a known alternative and lot of people use this in production by avoiding the derived tables and/or sub-queries completely by creating tables for derived tables. The only thing that made the big difference is adding an index on this; without index it takes more or less the same 6 minutes times.

--------------
CREATE TEMPORARY TABLE sub_tem

Truncated by Planet PHP, read more at the original (another 8058 bytes)

Link
Colin CharlesMySQL Conference: Radar interview, “mini-tutorials” (6.3.2010, 19:15 UTC)
A few notes about the MySQL Conference & Expo 2010. Check out the schedule. Its more packed than you can imagine. You’re going to want to be in many places at once, by the looks of it. O’Reilly Radar has an interview with Ronald Bradford: MySQL migration and risk management. You’ll get a teaser as to Ronald’s [...] Related posts:
  1. MySQL Ecosystem – complementary talks at the conference?
  2. MySQL Conference Update: Grid is up, go promote and register!
  3. o’reilly mysql conference & expo 2010
Link
Peter ZaitsevHow many fsync / sec FusionIO can handle (4.3.2010, 16:09 UTC)

I recently was asked how many fsync / sec ( and therefore durable transactions / sec) we can get on FusionIO card.

It should be easy to test, let's take sysbench fileio benchmark and run, the next command should make it:


./sysbench --test=fileio --file-num=1 --file-total-size=50G --file-fsync-all=on --file-test-mode=seqrewr --max-time=100 --file-block-size=4096 --max-requests=0 run

CODE:
  1. Operations performed:  0 Read, 922938 Write, 922938 Other = 1845876 Total
  2. Read 0b  Written 3.5207Gb  Total transferred 3.5207Gb  (36.052Mb/sec)
  3.  9229.35 Requests/sec executed

So that's 9229.35 req/sec, which is pretty impressive.

For comparison the same run on PERC 6i RAID10 with BBU:

CODE:
  1. Operations performed:  0 Read, 4832661 Write, 4832661 Other = 9665322 Total
  2. Read 0b  Written 18.435Gb  Total transferred 18.435Gb  (20.975Mb/sec)
  3.  5369.62 Requests/sec executed

which gives us 5369.62 req/sec.

Note this is for single thread, and in MySQL/InnoDB multi-thread load you may get more transactions per second with group commit ( which is back to live in InnoDB-plugin / XtraDB )


Entry posted by Vadim | No comment

Add to: delicious | digg |

Truncated by Planet PHP, read more at the original (another 1033 bytes)

Link
Colin CharlesMySQL Ecosystem – complementary talks at the conference? (3.3.2010, 15:15 UTC)
Its times like this, I want to hear from the greater community – the ones that are reading say, Planet MySQL or Planet MariaDB. MySQL to me, and many others is an ecosystem. We’ve had for the longest time, complementary technology talks, like for memcached (which have been popular, filled rooms). NoSQL is becoming quite popular, [...] Related posts:
  1. MySQL Conference Update: Grid is up, go promote and register!
  2. MySQL Conference: Radar interview, “mini-tutorials”
  3. Last chances to submit your MySQL Conference talk!
Link
Shlomi NoachQuick reminder: avoid using binlog-do-db (2.3.2010, 19:03 UTC)

Nothing new about this warning; but it’s worth repeating:

Using binlog-do-db is dangerous to your replication. It means the master will not write to binary logs any statement not in the given database.

Ahem. Not exactly. It will not write to binary logs any statement which did not originate from the given database.

Which is why a customer, who was using Toad for MySQL as client interface to MySQL, and by default connected to the mysql schema, did not see his queries being replicated. In fact, he later on got replication errors. If you do:

USE test;
INSERT INTO world.City VALUES (...)

Then the statement is assumed to be in the test database, not in the world database.

Slightly better is using replicate-do-db on the slave machines. At least we allow the master to write everything. But still, for the same reasons, slaves may fail to repeat a perfectly valid query, just because it has been issued in the context of the wrong database. replicate-ignore-db is somewhat safer yet, but the trap is still there.

My advice is that replication should replicate everything. Make sure you and everyone else you work with understand the implications of binlog-do-db and replicate-do-db before implementing it.

Link
Shlomi NoachStatic charts vs. interactive charts (2.3.2010, 13:28 UTC)

I’m having my usual fun with charts. Working on mycheckpoint, I’ve generated monitoring charts using the Google Chars API. But I’ve also had chance to experiment and deploy interactive charts, JavaScript based. In particular, I used and tweaked dygraphs.

I’d like to note some differences in using charts of both kinds. And I think it makes a very big difference.

Static charts

I’ll call any image-based chart by “static chart”. It’s just a static image. Example of such charts are those generated by Google Image Charts (they now also have new, interactive charts), or RRDtool. Show below is an example of a static chart; in this example, generated by Google:


Pros and cons of static charts

Pros

  • Images can be viewed on any graphical platform. Browsers, email clients, cell phones, whatever.
  • Self contained: chart image, legend, scales: all in one image.
  • As such, easy to move around.
  • Are safe to use.

Cons

  • Images are fuzzy. Is the com_replace_psec really 0? Maybe it’s 0.1? A larger value can make lower values hard to tell.
  • Images are inaccurate: the colors can lie. The red and green lines showing are hard to tell apart. The red is painted above the green. Data gets “lost”.
  • They do not zoom (one needs to regenerate larger image)
  • Unless encoded with base64, HTML pages which include images need to link outside.
  • In the particular case of Google Charts, one is limited to 2K length URL. Trust me, it’s a big limitation! (PS, Google now support POST method to allow for up to 16K. But… it’s a POST method…)
  • In the particular case of Google Charts, one must have an internet connection.
  • In the particular case of Google Charts, one must submit data to Google.

Interactive charts

Interactive charts are those which react to your commands. These are either JavaScript or Flash based, mostly. They allow for really nice features. Take the following chart as an example: try and move over with your mouse; or select sections to zoom in.

<script src="http://code.openark.org/blog/wp-content/uploads/2010/03/dygraph-combined.js" type="text/javascript"/>

DML


[graphDiv]

[labelsDiv]

<script type="text/javascript">// < ![CDATA[ // < ![CDATA[ g_DML = new Dygraph( document.getElementById("graphDiv_DML"), "Date,com_select_psec,com_insert_psec,com_delete_psec,com_update_psec,com_replace_psec\n2009-12-04 15:00:00,113.28,76.54,26.14,29.54,0.00\n2009-12-04 15:10:00,91.51,55.74,30.07,18.13,0.00\n2009-12-04 15:20:00,104.98,57.75,28.26,18.24,0.00\n2009-12-04 15:30:00,110.64,72.17,30.17,27.58,0.00\n2009-12-04 15:40:00,97.79,46.27,29.27,15.91,0.00\n2009-12-04 15:50:00,114.35,61.85,29.55,22.45,0.00\n2009-12-04 16:00:00,110.82,72.3

Truncated by Planet PHP, read more at the original (another 8878 bytes)

Link
LinksRSS 0.92   RDF 1.
Atom Feed