Database communication methods comparison: SQL & NoSQL
Everything that is or can be processed by our mind or system is called data. This “processability” (i.e. the possibility of further use of information) in the case of the system, is possible when we record these data in a form that it can use. One of such forms is a data set called a database. It is its structure, method of storage, and communication with it that will be the main subject of this article.
Database communication methods: SQL database
It is a database based on relations between tables that can be managed by the SQL language. The name SQL database is actually a simplification because SQL is a structured query language. To access it we can use this language or direct database connection or ORM (object-relational mappers).
Structure
The data in relational databases are stored in the main objects that contain a set of database tables, also called schemas. It is thanks to them that all records in the database table are consistent. The schema also acts as a security boundary where you can limit database user permissions to only a specific schema level, similar to operating system file and folder permissions such as Windows or Linux where folders have owner and group permissions.
Arrays are composed of columns containing attributes and rows (records of these tables). In the columns, we define the attributes included in them. Information whether they are or what they have:
Primary key – that is nothing more than the unique identifier of each record in the table. The unique word says that the values in this column cannot be repeated. The primary key column cannot also store NULL values, so the record must always have some value of this attribute.
Foreign key – The foreign key is used to define relationships between tables. The column (or columns) that we define as a foreign key in one table is linked to the column (s) that is the primary key in the other table. This means that the values stored in the column that is defined as the foreign key in the first table will always have a counterpart in the one that is defined as the primary key in the second table.
Constraints – Containing the data type constraints that the column can accept. (E.g. VARCHAR / Date / INT etcetera)
Indexing and relationships
Imagine you come home after a day’s work and have to prepare dinner. Today’s dish will be the fish, fried with rosemary, salt, pepper, and lemon. When we take out the necessary ingredients, we do not know exactly where the fish is in the freezer or where the fish knife is in the cupboard. We know, however, that these elements are in the freezer on the meat shelf and in the cupboard in the compartment that holds ‘large knives’. The same logic guides the relative database, more specifically the indexing mechanics. When indexing the knife, we do not have to pick up the spoons and forks one by one, searching the entire cabinet, but only check the partition with the appropriate type of knives. When looking for a knife, we do not know where it is exactly, but we do know to look for it in the cupboard drawer, from the first from the top, in the compartment for large knives. We will also never find a spoon in the knife compartment.
ACID (Atomicity, Consistency, Isolation, Durability)
By speaking of relational databases we also need to mention ACID. It is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other f-ups.
Atomicity shines when several operations are performed in one transaction, they will not be performed if one of the operations fails. It is like the one for all principle, all for one. The best examples are banking operations, in one transaction we have operations to change the account balance of the sending person and the account of the receiving person. It cannot be that the sending money operation fails, but the balance on the other person’s account will increase. In the event of an error, the transaction is aborted.
Consistency is related to the principles of data integrity. It ensures data consistency through the use of mechanisms that prevent data from being changed in an unauthorized way. Returning to the banking example, this could be a huge financial loss.
Isolation is very important when several people use our database at the same time. Users must be able to use the same data and have the illusion that he is the only user of the database.
Durability It is related to the durability of the data in the transaction, e.g. in the event of a failure, we are able to restore the data.
“NewSQL” and “Distributed SQL”
A couple of years ago several organizations began building relational/SQL-based systems that needed to abandon some traits in favor of horizontal scalability. The effect is down two largely distinct approaches:
NewSQL: A play on NoSQL, these systems typically take existing relational databases and layer distributed logic on top with varying degrees of transparency to the user. Citus and Vitess are two notable examples.
Distributed SQL: These systems take a “ground-up” approach in building horizontally scalable relational engines. CockroachDB and Google’s Spanner are good examples here.
[table id=13 /]
Examples
Some of popular Relational Database Management Systems (RDBMS):
Database communication methods: NoSQL databases
Probably the easier definition to remember. NoSQL database, originally “non SQL” or “non-relational” is a database that provides a mechanism for storing and retrieving data modeled in any way other than tabular relationships used in SQL database relationships. In short, every database that is not SQL is NoSQL. To access such databases regarding the chosen solution we need to use REST APIs or use vendor-specific language.
Structure
Depending on the chosen solution in NoSQL we distinguish the following main 4 data models:
Key-Value storage
Let’s begin with the easiest model which is actually used within most other models which extends it with additional features.
This type does not impose a specific data schema on us. Traditional relational databases have a predefined data structure in the database as a collection of tables containing fields with well-defined data types. The disclosure of data types in a database program enables it to apply a number of optimizations. In contrast, key-value systems treat data as a single opaque file that can have different fields for each record. In each key-value pair, the key is represented by any string, such as filename, file path, or hash. The value can be anything. It can be an image or a document. The value is stored as a blob that requires no prior data modeling or schema definition.
This is what gives us a great deal of flexibility. Optional values are not represented by placeholders as in most relative bases. Key-value stores often use much less memory to store the same database, which can lead to large performance gains under some workloads.
Storing the value as a blob eliminates the need to index our data to improve performance. Nothing for free, unfortunately, we lose the ability to filter and control what is returned from the request based on the value, because it is unpredictable for us.
In general, key-value stores do not have a query language. They enable us to store, retrieve and update data with simple commands. We retrieve data by referencing directly to an object in memory or on the server’s disk. The simplicity of this model makes the key-value model fast, easy to use, scalable, portable, and flexible.
[table id=14 /]
Examples
There are many DBMS that supports key-value storage, for example, these 3:
Document storage
Data and metadata are stored hierarchically in collections of JSON-based documents in the database. Theoretically, we store data in different formats, even without a consistent schema, in a document-based database. In practice, however, the file format is usually used for documents and the information is organized in a specific structure. This makes it easier for us to work with the database. For example, by using data structures, our database searches can be processed much more efficiently. You can do the same things in a document-based database as you do in SQL databases: information can be added, changed, deleted, and searched. To enable the above-mentioned activities to be performed, each document is assigned a unique identifier. It is not particularly important how we build a given identifier.
To address a document, we can use both a simple series of numbers, a string (for example, its full path). When searching for information, the documents themselves are checked. In other words, data is taken directly from the documents, not from the columns in the database as is the case with relational databases.
[table id=15 /]
Examples
DBMS used for document storage databases are for example:
Graph storage
Quite simply, a chart database is a database designed to treat relationships between data as important as the data itself. The main task is to store data without being limited to a predetermined model. Instead, the data is stored as we receive it first – showing how each entity connects to or is related to each other.
Of course like in most technologies, there are several different approaches to what constitutes the key components of a graph database. The one we going to describe is the property graph model, where data is organized as nodes, relationships, and properties (data stored in nodes or relationships).
Nodes are the entities in the graphs. They can contain any number of attributes (key-value pairs) called properties. They can be labeled with for example metadata (such as index or constraint information).
Relationships provide targeted, named, semantically relevant connections between two nodes. For example, Simon is the son of James. A relationship always has a direction, type, start node, and end node. Like nodes, relationships can also have properties. In most cases, relationships have quantitative properties such as weights, ratings, time frames, or strengths. Due to the efficient way in which relationships are stored, two nodes can share any amount or type of relationship without sacrificing performance. Although they are stored in a certain direction, you can always navigate efficiently in both directions.
[table id=16 /]
Examples
Some of DBMS used for graph storage databases:
Wide column storage
A store with wide columns (or extensible record stores) is a type of NoSQL database that uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can differ from row to row in the same table. The wide-column magazine can be interpreted as 2D Key-Value storage.
Columnar databases store each column in a separate file. One file stores only the key column, the other only the name, the other the type, and so on. Each column in the row is managed by automatic indexing, which means that the read / polled column offset corresponds to the other column offsets in that row in the corresponding files.
DBMS like Cassandra or Bigtable uses column families, database objects that contain columns of related data that consist of a key-value pair where the key is mapped to a value that is a set of columns. In analogy with relational databases, a column family is as a “table”, each key-value pair being a “row”. Each column is a set of 3 items consisting of a column name, a value, and a timestamp. In a relational database table, this data would be grouped together within a table with other non-related data.
[table id=17 /]
Examples
Some of DBMS that are used for wide column databases:
CAP rule
Have you ever heard about Price, Time, Quality schema, where clients need to select 2 by the cost of the third one? The same applies for NoSQL, but the difference is a selection between consistency, availability and partition tolerance. NoSQL DBMS can support up to 2 traits never all at once (at least at the time this article was published).
Consistency – means that all clients can see the same data at the same time no matter which node they are connected to. To do this, every time data is written to one node, it is immediately forwarded or replicated to all other nodes in the system before success response is received.
Availability – means that every client making a request gets a response to that request, even if one or more nodes are working. So there can not occur situations that one node action will not be executed while it would be on another node.
Partition tolerance – is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes. Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.
So if we are looking for a NoSQL solution, we at first shall search which engine supports the 2 traits we need.
Beauty or Beast?
SQL relations are order and beauty but they come at the high cost of performance. NoSQL is literally a ‘beast’ in case of performance but can generate a very painful headache if used improperly or in an inconsiderate manner. So can we finally answer the question: ‘Beauty or Beast’?
[table id=18 /]
Database communication methods: SQL or NoSQL?
It may be surprising but after all, we learned today it is still not a simple question at all. Many on the internet will say: “..if you got a big database you should always go with NoSQL…” which is NOT TRUE. Probably it comes from the fact that people do not understand the concept of horizontal scaling of NoSQL/SQL solutions or, worse, believe that SQL databases cannot be horizontally scaled. Even some giants had a dilemma in the past and needed time-consuming R&D before the implementation of their solution to their scaling problem. For some noSQL surprisingly was not a good solution and went for SQL sharding/replication [Link to Pinterest article], [Link to Etsy solution], [Link to slack article] but for some, it was the beast they were looking for [Link to article about Apple noSQL]. NoSQL cannot be worse or better than its semantic counterpart SQL, because of the fact that both of these database types are suited for different data(structured or unstructured) and thus will never replace or outshine each other.