Peter Zaitsev: Percona XtraDB Cluster 5.5.41-25.11 is now available (30.3.2015, 16:22 UTC)

Percona is glad to announce the new release of Percona XtraDB Cluster 5.5 on March 30th, 2015. Binaries are available from the downloads area or from our software repositories.

Based on Percona Server 5.5.41-37.0, including all the bug fixes in it, Galera Replicator 2.11, and Codership wsrep API 25.11, Percona XtraDB Cluster 5.5.41-25.11 is now the current 5.5 General Availability release. All of Percona's software is open-source and free, and all the details of the release can be found in the 5.5.41-25.11 milestone at Launchpad.

Bugs Fixed:

  • XtraBackup SST wouldn’t stop when MySQL was SIGKILLed. This would prevent MySQL from initiating a new transfer, as port 4444 was still in use. Bug fixed #1380697.
  • garbd was returning an incorrect return code, i.e., when garbd was already started, the return code was 0. Bugs fixed #1308103 and #1422863.
  • The wsrep_sst_xtrabackup-v2 script was causing innobackupex to print a false positive stack trace into the log. Bug fixed #1407599.
  • MyISAM DDL (CREATE TABLE only) is no longer replicated when wsrep_replicate_myisam is OFF. Note that for older nodes in the cluster, wsrep_replicate_myisam should still work, since the TOI decision (for MyISAM DDL) is made on the origin node. Mixing MyISAM and non-MyISAM tables in the same DDL statement is not recommended with wsrep_replicate_myisam OFF, since if any table in the list is MyISAM, the whole DDL statement is not put under TOI (total order isolation). This also doesn’t work if the default_storage_engine variable is set to MyISAM (which is not recommended for Percona XtraDB Cluster) and a table is created without the ENGINE option. Bug fixed #1402338.
  • Percona XtraDB Cluster now shows a warning when additional utilities, like pv, which are not on the critical path of SST, are not installed. Bug fixed #1248688.
  • The wsrep_causal_reads variable was not honored when declared as global. Bug fixed #1361859.
  • garbd would not work when the cluster address was specified without the port. Bug fixed #1365193.
  • garbd was running as the root user on Debian. Bug fixed #1392388.
  • Errors in the garbd init script stop/start functions have been fixed. Bug fixed #1367956.
  • If mysqld gets killed during the SST, it will leave an unclean data directory behind. This would cause Percona

Truncated by Planet PHP, read more at the original (another 6158 bytes)

Link
Jean-Jerome Schmidt: How to Deploy and Configure MaxScale for SQL Load Balancing with Read-Write Split (30.3.2015, 02:24 UTC)

There are two models of load balancing: transport and application layer. HAProxy is a great TCP load balancer, but its lack of SQL awareness effectively limits its ability to address certain scaling issues in distributed database environments. In the open source world, there have been a few SQL-aware load balancers, namely MySQL Proxy, ProxySQL and MaxScale, but they all seemed to be in beta status and unfit for production use. So we were pretty excited when the MariaDB team released a GA version of MaxScale earlier this year. In this blog, we’ll have a look at MaxScale and see how it compares with HAProxy.

Deployment

Installation is easy, at least on the latest LTS version of Ubuntu (Trusty, 14.04), which we used for our tests.

Add a public key:

$ apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 8167EE24

Add the MaxScale repository to one of the *.list files for apt-get (e.g., /etc/apt/sources.list.d/maxscale.list):

deb [arch=amd64] http://downloads.mariadb.com/software/MaxScale/maxscale/DEB trusty main

Run:

$ apt-get update && apt-get install maxscale

and you can enjoy your new software - the proxy is installed into the /usr/local/skysql/maxscale directory.

Configuration

Once installed, we need to configure it. The installation ships with an example configuration file, located at /usr/local/skysql/maxscale/etc/MaxScale_template.cnf. It gives a nice introduction to the available options and helps to set up the environment.
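As a concrete sketch (assuming the paths from this install; the config file name, the MAXSCALE_HOME handling and the startup flags all differ between MaxScale versions), you would copy the template into place, edit it, and then start the proxy - the -d flag keeps it in the foreground, which is handy for a first test:

$ cp /usr/local/skysql/maxscale/etc/MaxScale_template.cnf /usr/local/skysql/maxscale/etc/MaxScale.cnf
$ export MAXSCALE_HOME=/usr/local/skysql/maxscale
$ /usr/local/skysql/maxscale/bin/maxscale -d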

MaxScale uses a pluggable architecture, with different plugins providing different features. In this post, we will concentrate on the routing part and, for now, leave out other interesting possibilities like query rewriting. A MaxScale configuration is built from several types of objects: monitors, services, listeners and filters.

For our tests we defined two types of routing services: 

  • ‘router=readwritesplit’, which provides read/write (RW) splitting,
  • ‘router=readconnroute’, which provides round-robin-like (RR) kind of access.

Each service was accompanied by a listener: port 3307 for the RW split service and port 3308 for the RR service. For the RR service, we relied on MaxScale’s monitoring of the Galera nodes to route connections only to nodes in the ‘Synced’ state.

[maxscale]
threads=4

# Monitor that tracks the state of the Galera nodes (monitor_interval is in milliseconds)
[Galera Monitor]
type=monitor
module=galeramon
servers=server1,server2,server3
user=maxmon
passwd=maxpwd
monitor_interval=10000
disable_master_failback=1

# Example filters: query logging and 'fetch' -> 'select' rewriting (not attached to any service here)
[qla]
type=filter
module=qlafilter
options=/tmp/QueryLog

[fetch]
type=filter
module=regexfilter
match=fetch
replace=select

# Read/write splitting service, exposed on port 3307 by its listener below
[RW]
type=service
router=readwritesplit
servers=server1,server2,server3
user=root
passwd=secretpass
max_slave_connections=100%
router_options=slave_selection_criteria=LEAST_CURRENT_OPERATIONS

# Round-robin service, exposed on port 3308; router_options=synced restricts it to synced Galera nodes
[RR]
type=service
router=readconnroute
router_options=synced
servers=server1,server2,server3
user=root
passwd=secretpass

[Debug Interface]
type=service
router=debugcli

[CLI]
type=service
router=cli

# Listeners bind each service to an address and port
[RWlistener]
type=listener
service=RW
protocol=MySQLClient
address=10.69.179.54
port=3307

[RRlistener]
type=listener
service=RR
protocol=MySQLClient
address=10.69.179.54
port=3308

[Debug Listener]
type=listener
service=Debug Interface
protocol=telnetd
address=127.0.0.1
port=4442

[CLI Listener]
type=listener
service=CLI
protocol=maxscaled
address=127.0.0.1
port=6603


# Backend MySQL/Galera nodes
[server1]
type=server
address=10.138.103.93
port=3306
protocol=MySQLBackend

[server2]
type=server
address=10.139.81.25
port=3306
protocol=MySQLBackend

[server3]
type=server
address=10.81.192.219
port=3306
protocol=MySQLBackend

There are a couple of interesting bits in the configuration file. As you can see, we had to define user/password pairs several times. Those users are used to check the health of the MySQL nodes and to get access to the list of users defined in the system. For the sake of simplicity we used plain text passwords, but it is possible to use hashed passwords for better security.
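For reference, here is roughly how the hashed-password route works (a sketch; binary locations and exact arguments differ between MaxScale versions): maxkeys generates the .secrets file holding the encryption key, and maxpasswd prints the encrypted form of a password, which you can then paste into the passwd= lines:

$ /usr/local/skysql/maxscale/bin/maxkeys /usr/local/skysql/maxscale/etc/.secrets
$ /usr/local/skysql/maxscale/bin/maxpasswd secretpass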

Truncated by Planet PHP, read more at the original (another 7984 bytes)

Link
Peter Zaitsev: ‘Woz on your mind?’ Share your questions for Steve Wozniak during his Percona Live keynote! (27.3.2015, 21:34 UTC)

Here’s your chance to get on stage with Woz! Sort of. Apple co-founder and Silicon Valley icon and philanthropist Steve Wozniak will participate in a moderated Q&A on creativity and innovation April 14 during the Percona Live MySQL Conference and Expo in Santa Clara, California.

Woz once said that he never intended to change the world. That was the other Steve, Steve Jobs.

“I didn’t want to start this company,” Woz told the Seattle Times of Apple’s beginnings in a 2006 interview. “My goal wasn’t to make a ton of money. It was to build good computers. I only started the company when I realized I could be an engineer forever.”

What would you ask Woz if given the opportunity?

“Woz, what first sparked your interest in engineering?”
“Hey Woz, how did you come up with the design for the first Apple?”
“Woz, what do you see as the next big thing in personal computers?”
“Hi Woz, what’s the deal with your giant vacuum tube watch?”

Now it’s your turn! Ask a question in the comments below and be sure to include your Twitter handle – or your Facebook page or LinkedIn profile. If we use your question, then your profile and question will be displayed on the giant screen behind Woz on stage as it’s being asked during his big keynote! How cool is that?

Want to be there in person? See Woz speak for just $5! That’s $70 off the regular admission price! Just use the promo code “KEY” at registration under the “Expo Hall and Keynote Pass” selection. Following Woz’s keynote, be sure to stop by the Percona booth, say “hello, Tom,” and I’ll give you a limited-edition Percona t-shirt. :)

In the meantime, help spread the word! Please share this tweet:

“Woz on your mind?” Tweet @Percona your questions for Apple’s Steve Wozniak who speaks April 14 at #PerconaLive! http://ow.ly/KTmES

Do that, then follow @Percona and I’ll send a DM for your address and will ship a t-shirt right to your door. See you at the conference!

The post ‘Woz on your mind?’ Share your questions for Steve Wozniak during his Percona Live keynote! appeared first on MySQL Performance Blog.

Link
Peter Zaitsev: FoundationDB is acquired by Apple: My thoughts (27.3.2015, 13:00 UTC)

TechCrunch reported yesterday that Apple has acquired FoundationDB. And while I didn’t see any mention of this news on the FoundationDB website, they do have an announcement saying: “We have made the decision to evolve our company mission and, as of today, we will no longer offer downloads.”

This is an unfortunate development – I have been watching FoundationDB technology for years and was always impressed by its performance and features. I was particularly impressed by their demo at last year’s Percona Live MySQL Conference and Expo. Using their Intel NUC-based cluster, I remember Ori Herrnstadt showing me how FoundationDB handles single-node failure as well as recovery from a complete power-down – very quickly and seamlessly. We have borrowed a lot of ideas from this setup for our Percona XtraDB Cluster demos.

I think it was a great design to build a distributed, shared-nothing, transaction-aware key-value store, and then have an SQL layer built on top of it. I did not have a chance to test it hands-on, though. Such a test would have revealed the capabilities of the SQL optimizer – the biggest challenge for distributed relational database systems.

My hope was to see, over time, this technology become available as open source (fully or partially), which would have dramatically increased adoption by the masses. It will be interesting to see Apple’s long-term plans for this technology.

In any case, it looks like FoundationDB software is off limits. If you are an existing FoundationDB customer looking for alternatives, we here at Percona would be happy to help evaluate options and develop a migration strategy if necessary.

The post FoundationDB is acquired by Apple: My thoughts appeared first on MySQL Performance Blog.

Link
Jean-Jerome Schmidt: How to Manage the World’s Top Open Source Databases: ClusterControl 1.2.9 Features Webinar Replay (26.3.2015, 14:35 UTC)

Thanks to everyone who attended and participated in this week’s webinar on 'New Features - ClusterControl 1.2.9 Release'. If you missed the sessions or would like to watch the webinar again & browse through the slides, they are now available online.

Our speaker this time was Johan Andersson, CTO, Severalnines.

Watch the replay

Read the slides

With over 7,000 users to date, ClusterControl is the leading, platform-independent automation and management solution for MySQL, MongoDB and now Postgres databases - support for Postgres being its latest major feature.


Highlights in ClusterControl 1.2.9 include:

  • Support for PostgreSQL Servers
  • Advanced HAProxy Configurations and Built-in Stats
  • Hybrid Replication with Galera Clusters
  • Galera Replication Traffic Encryption
  • Encrypted Communication between ClusterControl and MySQL-based systems
  • Query Deadlock Detection in MySQL-based systems
  • Bootstrap Galera Cluster
  • Restore of Backups
  • New UI theme
  • RPC interface to ClusterControl
  • Chef Recipe and Puppet Manifest for ClusterControl
  • Zabbix Plugin for ClusterControl
ABOUT CLUSTERCONTROL

Setting up, maintaining and operating a database cluster can be tricky. ClusterControl gives you the power to deploy, manage, monitor and scale entire clusters efficiently and reliably. ClusterControl supports a variety of MySQL-based clusters (Galera, NDB, 5.6 Replication), MariaDB as well as MongoDB/TokuMX-based clusters - and now Postgres.

Truncated by Planet PHP, read more at the original (another 4015 bytes)

Link
Jean-Jerome Schmidt: How to Cluster Atlassian Server Apps in your Data Center - JIRA, Confluence, Stash with MySQL Galera Cluster (25.3.2015, 13:14 UTC)

Atlassian apps are widely used by development teams to facilitate software development and collaboration. For larger teams who require enterprise-grade tools, you might want to host the applications in your own data center and cluster them for high availability and performance. 

In this blog post, we are going to deploy a redundant active-passive setup for JIRA with clustered MySQL and NFS shared storage. Note that some Atlassian applications (JIRA, Confluence and Stash) already have built-in clustering support, but that requires a Data Center license. In this blog, we will show you how to cluster the regular/standalone JIRA server product by deploying an active JIRA server with an additional passive stand-by instance for failover purposes.

The following is our architecture diagram:

All hosts are running Debian Wheezy 64-bit, and we will use Galera Cluster for MySQL as the database backend. Iptables is disabled to simplify the deployment. The JIRA and MySQL services are reached via a virtual IP to provide a single point of access.

Our setup consists of 4 servers:

  • lb1: HAProxy + keepalived (master)
  • lb2: HAProxy + keepalived (backup) + ClusterControl + garbd
  • jira1: JIRA (active) + database server
  • jira2: JIRA (passive) + database server

 

Prerequisite

1. Ensure all hosts have the following host definition inside /etc/hosts:

192.168.50.100     jira virtual-ip
192.168.50.101     lb1.local lb1
192.168.50.102     lb2.local lb2 clustercontrol
192.168.50.111     jira1.local jira1 mysql1
192.168.50.112     jira2.local jira2 mysql2

2. Ensure each host has a proper FQDN set up as per the host definition above. For example, on lb1:

$ hostname -f
lb1.local

Deploying Galera Cluster for MySQL

1. To set up Galera Cluster, go to the Galera Configurator to generate a deployment package. In the wizard, we used the following values when configuring our database cluster (note that we specified one of the DB nodes twice under the Database Servers text field):

Vendor                 : Codership
MySQL Version          : MySQL 5.6.x
Infrastructure         : none/on-premises 
Operating System       : Debian 7.1.x
Number of Galera Servers : 3 
Max connections        : 200 
OS user                : root 
ClusterControl Server  : 192.168.50.102
Database Servers       : 192.168.50.111 192.168.50.112 192.168.50.112

At the end of the wizard, a deployment package will be generated and emailed to you.

2. Download and extract the deployment package:

$ wget http://www.severalnines.com/galera-configurator3/tmp/wb06357009915302877/s9s-galera-codership-3.5.0.tar.gz
$ tar -xzf s9s-galera-codership-3.5.0.tar.gz

3. Before we proceed with the deployment, we need to perform some customization when deploying a two-node Galera cluster. Go to ~/s9s-galera-codership-3.5.0/mysql/config/cmon.cnf.controller and remove the repeated node IP next to mysql_server_addresses so it becomes as below:

mysql_server_addresses=192.168.50.111,192.168.50.112

4. Now we are ready to start the deployment:

$ cd ~/s9s-galera-codership-3.5.0/mysql/scripts/install/ 
$ bash ./deploy.sh 2>&1 | tee cc.log

5. The database cluster deployment will take about 15 minutes, and once completed, the ClusterControl UI is accessible at https://192.168.50.102/clustercontrol. Enter the de

Truncated by Planet PHP, read more at the original (another 18191 bytes)

Link
Peter Zaitsev: Yelp IT! A talk with 3 Yelp MySQL DBAs on Percona Live & more (25.3.2015, 10:00 UTC)

Founded in 2004 to help people find great local businesses, Yelp has some 135 million monthly unique visitors. With those traffic volumes Yelp’s 300+ engineers are constantly working to keep things moving smoothly – and when you move that fast you learn many things.

Fortunately for the global MySQL community, three Yelp DBAs will be sharing what they’ve learned at the annual Percona Live MySQL Conference and Expo this April 13-16 in Santa Clara, California.

Say “hello” to Susanne Lehmann, Jenni Snyder and Josh Snyder! I chatted with them over email about their presentations, on how MySQL is used at Yelp, and about the shortage of women in MySQL.

***

Tom: Jenni, you and Josh will be co-presenting “Next generation monitoring: moving beyond Nagios” on April 14.

You mentioned that Yelp’s databases scale dynamically, and so does your monitoring of those databases. And to minimize human intervention, you’ve created a Puppet and Sensu monitoring ensemble… because “if it’s not monitored, it’s not in production.” Talk to me more about Yelp’s philosophy of “opt-out monitoring.” What does that entail? How does that help Yelp?

Jenni: Before we moved to Sensu, our Nagios dashboards were a sea of red, muted, acknowledged, or disabled service checks. In fact, we even had a cluster check to make sure that we never accidentally put a host into use that was muted or marked for downtime. It was possible for a well-meaning operator to acknowledge checks on a host and forget about it, and I certainly perpetrated a couple of instances of disks filling up after acknowledging a 3am “warning” page that I’d rather forget about. With Sensu, hosts and services come out of the downtime/acknowledgement state automatically after a number of days, ensuring that we’re kept honest and stay on top of issues that need to be addressed.

Also, monitoring is deployed with a node, not separate monitoring configuration. Outside of a grace period we employ when a host is first provisioned or rebooted, if a host is up, it’s being monitored and alerting. Also, alerting doesn’t always mean paging. We also use IRC and file tickets directly into our tracking system when we don’t need eyes on a problem right away.


Tom: Susanne, in your presentation, titled “insert cassandra into prod where use_case=?;” you’ll discuss the situations you’ve encountered where MySQL just wasn’t the right tool for the job.

What led up to that discovery and how did you come up with finding the right tools (and what were they) to run alongside and support MySQL?

Susanne: Our main force behind exploring other datastores alongside MySQL was that Yelp is growing outside the US market a lot. Therefore we wanted the data to be nearer to the customer and needed multi-master writes.

Also, we saw use cases where our application data was organized very key-value like and not relational, which made them a better fit for a NoSQL solution.

We decided to use Cassandra as a datastore and I plan to go more into detail why during my talk. Now we offer developers more choices on how to store our application data, but we also believe in the “right tool for the job” philosophy and might add more solutions to the mix in the future.


Tom: Jenni, you’ll also be presenting “Schema changes multiple times a day? OK!” I know that you and your fellow MySQL DBAs are always improving and also finding better ways of supporting new and existing features for Yelp users like me. Delivering on such a scale must entail some unique processes and tools. Does this involve a particular mindset among your fellow DBAs? Also, what are some of those key tools – and processes and how are they used?

Jenni: Yelp prizes the productivity of our developers and our ability to iterate and develop new features quickly. In order to do that, we need to be able to not only create new database tables, but also modify existing ones, many of which are larger than MySQL

Truncated by Planet PHP, read more at the original (another 7195 bytes)

Link
Peter Zaitsev: Calling all MySQL DBAs: How do you use Percona Toolkit? (24.3.2015, 07:00 UTC)

Percona Toolkit is one of our most mature open source applications. Derived from Maatkit and Aspersa, Percona Toolkit has evolved significantly over the years. The software now contains 32 tools, over 4,000 tests, and has been downloaded over 250,000 times. Anyone who manages a database – from DBAs to system administrators to even software developers – benefits from Percona Toolkit’s ability to perform a variety of MySQL server and system tasks that are too difficult or complex to perform manually.

We continue to make Percona Toolkit better each month. Over the last 9 months alone Percona has had 6 releases and resolved nearly 50 issues.


While Percona team members in Support, Consulting, and Managed Services are big drivers of identifying bugs and new features (driven mostly by Percona customer needs), the community of Percona Toolkit users plays a significant role in making the open source software what it is today.

We’d like to learn how we can make Percona Toolkit even better for your needs. Please take a brief survey so we can learn how you actually use the software. As a thank you for taking the survey, we are randomly giving away five $50 Amazon.com gift cards to participants. It’s a small token but one that we hope you’ll appreciate.

Recent additions to Percona Toolkit have included better Percona XtraDB Cluster support, multiple fixes and improvements to pt-online-schema-change, pt-kill, pt-query-digest and pt-stalk, as well as preparation for the MySQL 5.7 GA. Help us continue to improve Percona Toolkit by taking part in our survey. If you use Percona Toolkit and are attending Percona Live next month, please keep a look out for me. I’d like to hear about your experiences.
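If you have never tried the toolkit, a one-liner is enough to get a taste of it. As a minimal example (the slow log path here is illustrative), pt-query-digest produces a ranked report of your worst queries:

$ pt-query-digest /var/log/mysql/mysql-slow.log > slow-report.txt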

The post Calling all MySQL DBAs: How do you use Percona Toolkit? appeared first on MySQL Performance Blog.

Link
Jean-Jerome Schmidt: Monitoring Galera Cluster for MySQL or MariaDB - Understanding and Optimizing CPU-related InnoDB metrics (23.3.2015, 13:05 UTC)

The performance of a Galera cluster is strongly connected to the performance of MySQL. Galera only supports the InnoDB storage engine, so it is important to have an insight into how InnoDB operates - what metrics we can rely on and what kind of issues we might face. In this blog post, we will give you a tour of some of the InnoDB internals. Note that we covered Galera monitoring and host/OS monitoring in previous blogs.

For starters, we’d like to emphasize that this is by no means a comprehensive guide. Some details will not be covered in this blog post - it’s a topic you could write a book about.

This post assumes MySQL 5.6, which has been GA since February 2013. In MySQL 5.6, InnoDB does a pretty good job in terms of scaling on a number of CPUs. Depending on your workload, it should not come as a surprise to see 32 cores fully utilized by the database - something that was not really the case in older MySQL versions. The database is not lock-free internally and, under certain workloads, it may suffer from internal contentions. By internal contentions we mean mutexes and latches - mechanisms which protect parts of the code that cannot be executed in parallel by multiple threads, or which protect data structures that cannot be accessed by multiple threads at once.

Let’s take a look at some examples. The most popular latch contention that was haunting DBAs prior to version 5.6 is the infamous btr_search_latch - a latch that protects the InnoDB Adaptive Hash Index. The AHI is a data structure where InnoDB builds its own internal hash index and stores data about some of the most frequently used index pages. It’s all automatic and, well, adaptive - the main point is to speed up certain types of workload (index-based lookups involving a small number of rows). The main problem with the Adaptive Hash Index is that access to it has to be serialized to maintain data consistency. Under a small to medium workload this should not be a problem, but under a highly concurrent workload, the latch that protects the AHI was known to limit the server’s performance. Pretty often, users disabled this feature altogether to get more scalability. This was actually fixed in MySQL 5.6 with the introduction of multiple Adaptive Hash Indexes, where the AHI is split across several partitions, each protected by its own latch. Thanks to that, contention is split across multiple partitions and, as a result, InnoDB’s ability to execute concurrent queries increased significantly.
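To illustrate, this is how you could check whether btr_search_latch shows up as a hot spot, and disable the AHI at runtime if it does (a sketch; the SEMAPHORES output format varies between versions):

$ # btr_search_latch waits are listed in the SEMAPHORES section
$ mysql -e "SHOW ENGINE INNODB STATUS\G" | grep -A10 SEMAPHORES
$ # if AHI contention dominates, the feature can be switched off dynamically
$ mysql -e "SET GLOBAL innodb_adaptive_hash_index = OFF"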

As for mutexes, the best known was most probably InnoDB’s kernel_mutex, which was a nightmare for DBAs managing MySQL up to 5.5. This global kernel mutex was used to protect the main points in the code, like getting transaction lists, locking rows, etc. The problem was that it was also used in some less important parts of the code. Therefore, under higher concurrency, it severely limited MySQL performance. In MySQL 5.6, kernel_mutex was replaced by new mutexes and rw-locks, which allowed this version to scale properly.

Even though some of the largest latch or mutex-related issues were fixed in MySQL 5.6, it is still possible to be affected by InnoDB’s internal contention. Let

Truncated by Planet PHP, read more at the original (another 14031 bytes)

Link
Peter Zaitsev: Choosing a good sharding key in MongoDB (and MySQL) (19.3.2015, 07:00 UTC)

MongoDB 3.0 was recently released. Instead of focusing on what’s new – that is so easy to find – let’s rather talk about something that has not changed much since the early MongoDB days: sharding, and more specifically, how to choose a good sharding key. Note that most of the discussion also applies to MySQL, so if you are more interested in sharding than in MongoDB, it could still be worth reading.

When do you want to shard?

In general sharding is recommended with MongoDB as soon as any of these conditions is met:

  • #1: A single server can no longer handle the write workload.
  • #2: The working set no longer fits in memory.
  • #3: The dataset is too large to easily fit in a single server.

Note that #1 and #2 are by far the most common reasons why people need sharding. Also note that in the MySQL world, #2 does not imply that you need sharding.

What are the properties of a good sharding key?

The starting point is that a cross-shard query is very expensive in a sharded environment. It is easy to understand why: the query has to be executed independently on several shards, and then results have to be merged.

With MongoDB, mongos will transparently route queries to the right shards and will automatically merge the results: this is very handy but the hidden complexity can also make you forget that you have executed a very poorly optimized query.

This is where the choice of a sharding key is so critical: choose the right key and most queries will be simple and efficient, choose a wrong one and you’ll have ugly and slow queries.

Actually a good sharding key does not need to have tens of properties, but only two:

  • Insertions should be as balanced as possible across all shards.
  • Each query should be able to be executed by retrieving data from as few shards as possible (ideally 1 shard).

Sounds quite easy, right? However, depending on your use case, it may be quite difficult to find a good sharding key. Let’s look at a few examples.

Social Network

Say we have users who can be connected to other users, who can read or write posts and who have their own wall.

All 3 collections can become very large, so sharding all will be necessary over time.

For the user and wall collections, the user_id field is an obvious choice, and it actually meets both criteria for a good sharding key.

For the post collection, user_id also looks like an obvious choice, but if you think about how the collection is accessed for reads, you will realize that you will need to fetch a post by its post_id, not by its user_id (simply because a user can have multiple posts). If you shard by user_id, any read of a single post will be broadcast to all shards: this is clearly not a good option.

So using post_id is a better choice. However, it only meets criterion #2: most posts are never updated, so all the writes are insertions, and because post_id (like an ObjectId) is typically ever-increasing, they will all go to a single shard. That said, the traffic on the post collection is strongly in favor of reads, so being able to speed up reads while not slowing down writes is probably an acceptable tradeoff.
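As a sketch, here is what declaring those sharding keys could look like against a mongos (database and collection names are made up for the example):

$ mongo --eval 'sh.enableSharding("social")'
$ mongo --eval 'sh.shardCollection("social.users", { user_id: 1 })'
$ mongo --eval 'sh.shardCollection("social.walls", { user_id: 1 })'
$ mongo --eval 'sh.shardCollection("social.posts", { post_id: 1 })'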

Access Logs

The workload here is very specific: write-intensive and append-only.

Sharding by ObjectId is definitely a bad idea: while data can easily be spread across all shards, all writes will go to a single shard (ObjectIds are ever-increasing), so you will have no benefit compared to a non-sharded setup when it comes to scaling the writes.

A better solution is to use a hash of the ObjectId: that way data AND writes will be spread across all shards.
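MongoDB supports this natively with hashed shard keys, so the declaration is a one-liner (the namespace is illustrative; run it against a mongos):

$ mongo --eval 'sh.shardCollection("logs.access", { _id: "hashed" })'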

Another good option would be to use some other field in your documents that you know is evenly distributed across the whole dataset. Such a field may not exist, though, which is why hashing the ObjectId is the more generic solution.

Ecommerce

MongoDB can be a good option to store a product catalog: being schemaless, it can easily store products with very different attributes.

To be usable, such a catalog must be searchable. This means that many indexes need to be added, and the working set will probably grow very quickly. In this case your main concern is probably not scaling the writes, but making reads as efficient as possible.

Sharding can be an option because, if done properly, each shard will act as a coarse-grained index. The issue is then to find which field(s) will evenly distribute the dataset. Most likely a single field will not be enough; you will have to use a compound sharding key.

Here I would say that there is no generic solution, but if the products are for instance targeted at either kid, woman or man and if you have several categories of products, a potential sharding

Truncated by Planet PHP, read more at the original (another 1081 bytes)

Link