Database scaling methods comparison

It’s a beautiful summer afternoon, we’ve finished our last project and we’re going on a well-deserved vacation. Suddenly we hear about our application in the media, which makes us even more proud. After a while, however, the client calls and it turns out that our application has stopped working. The administrator analyzes the situation and makes a diagnosis, that the database bandwidth is to blame, or more specifically the lack of it.

In this article, I present some ways of database scaling methods as well as suggest what tricks you can use with smaller funds to control the increased traffic, and not end up on bivouac vacation in the office.

In the context of database scaling, the administrators consider 2 main approaches to the problem:

Vertical database scaling

Vertical scaling is simply connected with improving the processing power of the server, by increasing its hardware capacity (CPU, RAM, SSD etcetera). This may sound like a common and natural option because it’s relatively easy to have a relational database running on a single machine and scale it up as necessary by upgrading its computing resources. Anyway, this may be deceptive as the current technological solutions can effectively lower the maximum possibilities of such scaling, and thus it is impossible to scale indefinitely. Also, the ratio of the ‘performance to cost’ effect is less favorable and we pay an incomparably high price for small benefits in terms of hardware parameters, the more we scale our base bandwidth.

vertical scalling database method

 

Cheats & Tricks

Although vertical database scaling has its limits (or our wallet has), we can still try to overextend performance by:

  • Implementing cache (storing data that has already been requested in memory, which allows for faster access to them in the future.)
  • Moving database to separate machine to have all resources on its own, without need to share them with HTTP server and other services.
  • If the above does not help consider database indexes check, and database engine upgrade if possible.
  • Ask developers to optimize queries.

Horizontal database scaling

The horizontal solution, on the other hand, achieves performance by the increased number of servers. Theoretically, we could scale infinitely and have as many servers in parallel as we want, which is why horizontal scaling is the most preferred option when databases have to scale. As servers are distributed, we gain the benefits of being able to store more data. Awesome right?

horizontal scalling database graph

Horizontal solutions

Sharding

One of the horizontal database scaling methods is sharding. This method works effectively, regardless of whether we decided to use SQL or NoSQL solutions. We’re slicing up the database into multiple pieces (so called shards). Each one has a unique index that corresponds to the type of data it stores. 

To illustrate it we will make a comparison to file cabinets in a police station. For example, one cabinet will store missing people cases with surnames from A-G, the other one H-N, and the last one O-Z. If someone with a surname for example Kovalsky disappears, his case will go to cabinet H-N. Surname is our ‘shard key’. Of course, nothing prevents us from dividing it for example by range of dates on which the disappearance took place, or any other criteria. This solution is called ‘range-based sharding’ because as the name says it uses ranges. It is effective for higher graduality of data, for example, prices.

scharding graph

For less granularity, for example, if our shard key is the ‘country’ parameter, we would use ‘directory based sharding’. The main rule is the same but the sharding key is not a range so it will move to shard corresponding to shardkey. For example data with shard key-value, ‘USA’ will go to shard no.1 while ‘Poland’ to shard no.2. So depending on Kovalsky’s nationality he will go to a different corresponding shard.

sharding graph 3

There is also sharding based on keys “Hash sharding”. The difference here is that the shard key is a result of a hash function (for example of md5(username) as we are trying to use data that does not change over time to avoid the need for the next redistribution over shards). The mapping table works as in the previous example but uses different (‘hashed’) values.

It all depends on how we planned our database to spread the data more or less evenly. It is a very logical concept.

So where’s the catch?

It’s essential to note that database sharding comes at a cost, especially for relational databases (SQL solutions). It isn’t natively supported by every database engine. PostgreSQL for example requires manual sharding as it does not provide any automated features of this type, which forces updating sharding logic every update or forces administrators to keep outdated database engines.

The high cost of maintainability is also a case here. Imagine that we need for example to change the schema and sharding method because of changes during the development process. Unfortunately, it may be painful.

And finally, complexity may cause a headache and corruption of data if something will be done incorrectly, so backups are a must here.

Replication

Replication is a solution where we maintain multiple copies of a source database by having it copied automatically from a master database server to a slave database server (also called ‘replica’). We will save all sent data to the master database, but to read data we will use all the servers (especially slave ones) which will effect with better data read performance.

Replication database graph

Conclusion – Pros & Cons of database scaling

VERTICAL SCALING

[table id=11 /]

HORIZONTAL SCALING

[table id=12 /]

*Only if we maintain and use our own server room.

Final Words

Now you know about horizontal and vertical scaling techniques, you have a starting point. It is worth noting that in most cases it is not a choice between horizontal and vertical scaling, but rather a mix of both approaches to the subject. Rather, it is certain that the industry will increasingly shift to a horizontally distributed approach to scaling architecture. This tendency results from the demand for greater reliability, as well as the development of services and technologies that simplifies horizontal scaling.

The selection of relational or non-relational databases depending on the project is also worth attention, anyway, this is a subject for another article.

Share with: