Databases (SQL and NoSQL) – overview, comparison, examples

Choosing the right database may be obvious for you if you don’t have complex requirements and just need to store data in any way. In such cases it’s as simple as choosing the database you already know or one you want to learn. However, it’s often not that easy, especially for more complex projects with some important requirements related to performance, security, scalability etc.

Database types

For quite a long time I was aware of SQL and NoSQL databases. When it comes to NoSQL – I recognized in-memory, key-value and document databases but had no idea that there are also other types. Most of my colleagues had also no idea there are so many database types, some of them built for very specific use cases so there is a chance that there is a better database for your use case than most popular ones.

The below table presents the types with a short description so you can get an overview:

Database typeData organization
RelationalTables consisted of rows and columns. Each row represents a single record and each column is an attribute of the record. The data structure is enforced using table schemas.
In-memoryData is stored in memory instead of the disk (as in the case of other databases).
Key-valueA set of key-value pairs (like in a dictionary) with no enforcement of data structure.
DocumentAn extension of the key-value concept where each key is mapped to a document instead of a value. Documents consist of attributes, i.e. document is a set of key-value pairs.
Wide columnData is organized in columns instead of rows. Each column is stored separately so generally, it is more efficient to read the whole column than to read the whole row compared to relational databases. It also means that each row may have a different set of columns within the same table.
Compared to the key-value type – it can be defined as yet another extension of the key-value type where two keys are required to get a single value – the first key identifies a row (returning multiple columns), the second key is a column identifier making it possible to select a single column from a given row.
GraphGraph structure – instead of tables or documents there are nodes and edges (relationships between nodes). Structure focused on relationships among data.
Time seriesTimestamp is a key and data is always associated with one of the timestamps. Data is organized in such a way that makes analysis in the context of time very efficient.
LedgerAn immutable journal is used to store data where – the data can only be appended to the journal but it cannot be modified. All changes in the database are recorded and cannot be altered. Data deletion or update is performed by appending proper information to the journal. Journal’s immutability is guaranteed by organizing its data as blocks of data chained in a sequence using cryptographic hashing – similar to Blockchain.
Database typeData organization

Some databases can be assigned to more than one type, e.g. AWS DynamoDB combines key-value and document database types while Hazelcast incorporates in-memory and key-value types. Please remember that many databases also add some extra features and the descriptions are generic ones so in some cases there may be differences.

Relational (SQL database)

Most commonly used type of database. Data is stored in tables. A single row in a table represents a single piece of data. Rows consist of columns where attributes of this data piece are stored.

For each table, there must be a schema defined. It means that each row in a table must have the same set of attributes and each of these attributes has a type specified along with additional data (like nullable/non-nullable or max length).
Schemas must include also information about the primary key used to organize and query data where the key is defined by single or multiple columns. Optionally, there may be also other constraints like unique (values in the specific column must be unique) or foreign keys (attribute in one table refers primary key of another table). Primary key and not-null attributes are also part of the constraints.

SQL language is used to read and write data in relational databases.

Advantages:

  • Easy to learn – simple model makes it easy to understand how it works + SQL language is easy to learn too,
  • Most implementations are compliant with ACID by default,
  • Enforced expected data structure – it isn’t possible to insert data not compatible with table schema,
  • SQL language is supported enabling users to perform really complex queries,
  • Many possibilities to set up security, most well-known databases provide built-in tools to handle it, e.g. by granting restricted privileges to the users so it’s easy to set up access to a limited subset of data (e.g. only to some tables),
  • Versatility – a good choice for many different use cases,
  • Various options to improve performance like indexes or optimize SQL queries (e.g. using SQL query profilers) etc.,
  • Equally easy access to all tables so there is no need for any hierarchical traversing to access some data,
  • Multiple users can access the same database and the same data at the same time,
  • There are often additional features built-in into this type of databases like triggers or stored procedures.
  • A lot of tools make it easier to manage the database or to improve queries (like query profilers).

Disadvantages:

  • Performance may be an issue without proper knowledge of how to deal with it and it isn’t always so easy to create proper indexes or optimize queries, e.g. when multiple JOIN operations in queries are required, it may take a lot of time to perform the queries,
  • Relatively high requirements regarding physical memory and disk space,
  • Usually hard to scale – in most cases it’s not that easy to split data into multiple physical servers,
  • A significant cost of best enterprise relational databases,
  • Basically, they are unable to handle unstructured or semi-structured data. It’s somehow possible by using text types, some relational databases even support JSON types but it isn’t as effective as in some other databases (e.g. key-value or document types).

Sample use cases:

  • Most relational databases are the default choice and good (or best) ones for most use cases

In-memory

In-memory databases keep data only using memory (RAM) so they are often chosen for their performance which is much higher than in the case of other databases where permanent data storage is used to read and write data.

While nothing is saved permanently in this type of database, it isn’t always a disadvantage, e.g. when using it for caching purposes. Besides, there are ways to mitigate the risk of losing data using redundancy (multiple instances of the same database allowing to easily switch to another instance one previous one is destroyed or damaged), by snapshots (dumping database to permanent storage from time to time) or non-volatile RAM (memory which doesn’t lose its data when power is off).

This type of database is usually applied to real-time applications or parts of applications that require real-time responses.

Advantages:

  • Blazingly fast, performance is much better than in the case of the other databases where data is stored permanently,
  • Good choice for unit tests to speed up the tests and avoid the necessity to write any files on disk during the tests.

Disadvantages:

  • Data may be lost if the database is damaged or destroyed. To mitigate this risk this type of database may be either used to store only such type of data which can be safely lost (e.g. cache) or additional mechanisms must be used to secure the data like replication, non-volatile memory or creating snapshots on a regular basis,
  • Higher RAM usage compared to most other databases,
  • Memory is much more expensive than persistent storage so either database’s capacity is limited or costs can reach really high level. That’s why this database shouldn’t be used to store a large amount of data or you must be prepared to pay a lot of money,
  • Limited amount of memory which can be installed in a single server.

Sample use cases:

  • Cache,
  • Analytics,
  • Session store,
  • Gaming leaderboards,
  • Finance,
  • Defence,
  • Telecom,
  • Streaming apps,
  • Reservations apps.

Key-value

The simplest NoSQL database type which is simply a map (or dictionary) so unique keys are mapped to values. While keys are strings (nearly always), values can be usually of the following types: string, integer, boolean, array or JSON (the exact set of supported data types depend on concrete implementation). Hence, it is easy to retrieve the value for a particular key from such a database but it’s much harder to perform more complex operations like searching for values matching a given pattern or grouping the data.

Some databases provide extra features to support additional operations but it’s usually not as efficient as in the case of the other database types. That’s why key-value databases should be chosen rather for simple cases when it’s enough to retrieve data by key.

Advantages:

  • Good performance,
  • High scalability,
  • Easy access to any data by key,
  • Simplicity – simple way of retrieving data and storing it in a map or dictionary structure in code,
  • Flexibility – no schema (or very limited schema in some implementations) makes it easy to process unstructured and semi-structured data.

Disadvantages:

  • No efficient way to access data without a key,
  • Lack of support for complex queries, only simple data retrieval is available so there is no query language or option to filter data, although some databases introduced such features but they are not as efficient as in the case of the other database types.

Sample use cases:

  • Shopping carts,
  • Session store (slower than in-memory but still fast enough in most cases),
  • Real-time recommendations and advertising,
  • Embedded systems,
  • Application configurations (perfect database to keep most configurations),
  • Cache (especially efficient when a database is of both types: in-memory and key-value).

Document

Subclass of key-value databases where each key is mapped to a document. The document itself is a set of key-value pairs stored in JSON, XML or BSON (binary JSON) format. This type of database can also group documents into collections, contrary to key-value databases where data is not grouped – it is similar to tables (collections) in relational database grouping rows (documents) so in document databases, we can access multiple tables while in key-value ones only single table.

Data can be partitioned based on collections. Partitioned collections can even be split over multiple servers to improve performance.

Yet another feature available here is indexes working similarly as in relational databases so they speed up reading operations but slow down write operations. That’s why they may be really useful if reads happen more frequently than writes and when the data is often filtered by a subset of fields so too many indexes could decrease performance.

Document databases provide query languages or APIs that make it possible to retrieve data in various ways so it is much more flexible than in the case of key-value databases. Documents usually contain additional metadata to expand search possibilities.

Advantages:

  • Good performance and scalability with options to improve it (indexes, partitioning),
  • Flexible querying – data can be retrieved in many ways,
  • No schema enforcement so unstructured or semi-structured data is handled really well,
  • Optional schemas make it possible to set types of fields which can be used when querying the database.

Disadvantages:

  • More complicated than key-value databases which may be a disadvantage if additional features are not used,
  • No possibility to define relationships between documents,
  • Limited querying possibilities compared to relational databases.

Sample use cases:

  • Catalogs (e.g. product catalog),
  • Content management,
  • Inventory management,
  • User profiles,
  • Serialized documents (e.g. PDF or Excel).

Wide column

Similarly to relational databases – this kind of database has tables, rows and columns. However, each row may have a different set of columns as columns are basically stored separately. Many wide-column databases apply the column family concept which allows for grouping columns. This way multiple columns may be stored together to reuse them across rows so all the affected rows have the same columns defined by the family. There are also columnar databases which are similar to wide-column ones but they don’t support the column family features.

Adding a new column to existing data is as easy as adding a new file with no need to modify existing files so it is really fast. When reading the data – it is much more efficient than in the case of relational databases but only when a small subset of attributes are queried (meaning a small number of files from disk must be read). On the other hand, read operations are really slow when multiple columns (from different column families) are queried and/or data is not queried by key. Insert, update and delete operations are also inefficient as many files must be changed to perform these operations. Insert operations performance issues may be partially solved using batch operations.

Wide-column databases can be a really good choice for OLAP applications where operations are usually performed on a small number of columns, e.g. to retrieve the average price of products. In the case of relational databases, all rows with all columns would have to be scanned (unless indexes are used but would be hard to define indexes for all columns separately due to a drop in write performance), in this case, it’s enough to read one file for this particular column. In OLAP inserts are usually done by infrequent bulk operations instead of regular single ones like in OLTP so that’s yet another reason.

Advantages:

  • Single-column operations performance (read, add, update or delete a single column),
  • High scalability,
  • Great data compression.

Disadvantages:

  • Slow operations on whole rows or more generally – when multiple columns are affected,
  • Lack of joins or they are very slow.

Sample use cases:

  • Analytical data,
  • Logs,
  • IoT sensors data,
  • Time-series data, e.g. financial transactions history (but a time-series database may be even better choice for this kind of data).

Graph

Graph databases store data in nodes. But those nodes can have different roles depending on graph’s model used in the concrete implementation. There are basically two main types:

  • Labelled-property graph – nodes (counterpart of rows in a relational database) are connected to each other by relationships (like foreign keys in a relational database). Each node or relationship can have a set of key-value properties so nodes and relationships can have internal structures. There are also labels that group similar nodes together.
  • Resources Description Framework (RDF) – data are described as a set of statements being triples called subject-predicate-object. RDF isn’t used only in graph databases but in this context subject is a node, a predicate is a relationship and an object is another node or a literal. Sample statement could be John likes Frank (where John is the subject, likes is the predicate and Frank is the object being another node in this case) and yet another one: John age 25 (where John is the subject, age is the predicate and 25 is the object being literal in this case).

More differences between these two types are explained really well here: RDF Triple Stores vs. Labeled Property Graphs: What’s the Difference?

Advantages:

  • Amazing performance of complex queries focused on relationships between data,
  • Flexible structure – easy to add or change attributes of a single node without affecting the others.

Disadvantages:

  • Slow scans of the whole graph (slower than in some other database types) so queries like “scan all nodes to search by property’s value” are inefficient,
  • Inefficient queries not focused on relationships, e.g. queries to retrieve data of node with given ID or name,
  • Bad performance when storing large values (e.g. BLOBs or long texts),
  • No standardized query language is used across many graph databases (like SQL for relational databases).

Sample use cases:

  • Real-time recommendation engines (e.g. for products in e-commerce, videos on streaming platforms etc.),
  • Social networks,
  • Digital assets management (e.g. documents, contracts etc.),
  • Fraud detection in transactions,
  • Semantic search,
  • Computer network management.

Time series

Time series databases have their internal structure optimized for time-based data where a timestamp is a key and additional attributes are associated with the timestamp. Concrete data organization varies among time-based database implementations – sometimes they are based on relational databases, in other cases on NoSQL ones. That’s why advantages and disadvantages are described very generally as they are different depending on the database implementation.
Queries with aggregations performed on data over time are really efficient here. Data downsampling works also much better than in other databases. In case you don’t know this term: downsampling is a process of keeping all data points only for recent data and deleting some of the older ones leaving e.g. only every tenth of the older points so data analysis is still possible (more exact for the short recent period and less exact for older periods) while reducing database size.

Advantages:

  • Great performance of aggregations computed on data in time context, e.g. average value of metric over the large time period,
  • Efficient downsampling,
  • Good data compression is designed to provide better compression for time-based data than more generic compression algorithms.

Disadvantages:

  • Not suitable for data where relationships between data are important,
  • Bad choice if data is not based on time (making it hard or impossible to assign timestamps to each data piece).

Sample use cases:

  • IoT sensors data,
  • Server metrics,
  • Application performance monitoring data,
  • Market trades,
  • Network traffic data,
  • Financial transactions history.

Ledger

The purpose of a ledger database is to provide the following features:

  • Immutability,
  • Transparency,
  • Cryptographically verifiable data changes history.

Hence, this type of database is a perfect place for audit logs or other types of data in which changes must be tracked with no possibility of tampering with the history. Technically, its implementation is dependent on exact implementation but there are usually two important components:

  • An immutable journal (or append-only table) where data can only be inserted with no possibility to modify or delete existing data,
  • An updateable table storing the current state where all CRUD operations may be performed, however, each operation performed on the table is also recorded in the immutable journal so all performed changes are trackable.

Journal leverages from Blockchain concept by organizing its data in the same way – using blocks chained together. In order to guarantee their immutability database digests are used, i.e. whenever a new block is added, a hash of the latest block is calculated and stored in the new block. In other words – each transaction (block) is hashed using data from the current transaction and the hash of the previous transaction (block). This way data change history integrity is easily verifiable. This solution is essentially a blockchain but a centralized one.

Advantages:

  • An immutable ledger of data changes provides a secure way to store and track them,
  • Verifiable integrity of data history changes.

Disadvantages:

  • Not suitable for cases when audit trails are not necessary – tracking data changes involves additional operations and disk space,
  • Ledger databases are quite new so they have some limitations and lack some features compared to more mature databases.

Sample use cases:

  • Record of financial transactions,
  • History of transactions in supply chain systems,
  • Some HR systems like a system to maintain all employees’ data,
  • Insurance – claim transactions history.

How to choose a proper database?

The database must be chosen depending on the purpose of the data and how it will be processed. There are many factors to take into account but I hope that the presented features, advantages, disadvantages and sample use cases for described database types will make this process easier.

Some of the factors you should consider:

  • ACID – fully supported by relational databases while most NoSQL databases support it only partially,
  • Schema – rigid in SQL databases, flexible in NoSQL ones,
  • Scalability – relational databases are typically scaled vertically (upgrading server by adding more CPUs, RAM etc.) while NoSQL databases can be scaled horizontally (adding more servers and distributing the workload across servers),
  • Queries performance and disk space – data is usually normalized in SQL databases so it takes less space but queries can be slower as they often require JOIN operations, on the other hand, NoSQL databases have more data redundancy resulting in more disk space required to store the data but also better query performance as all the necessary data is stored in a single place,
  • Data usage – some databases are versatile (mainly relational database but document database is also quite universal) while others are better suited for processing data of a given type, e.g. graph (relations among data), time-series (time-based data), ledger databases (data changes), in-memory (temporary data which must be queried really fast) or wide-column (if just a few attributes are usually accessed).

Generally speaking – if you are not sure how the data will be processed or what are the requirements or if they don’t match any particular type – I recommend choosing a relational database as it is well-suited for most use cases so it’s hard to consider it a bad choice. But always try to analyse exact requirements and consider all the details to choose the best database type. You also need to choose concrete implementation as many databases have significant differences from each other even if they are of the same type. Some of them also combine multiple types.

Take your time and try to think thoroughly before making this decision. It can significantly impact the performance, security, scalability and development time of your application.

Let me know if you would like to read more about some database examples or their comparison. In this article, I’m just providing names of the most popular ones to give you an idea about the possible options.

Database examples

There are really many databases available in the market. They are usually free but few of them are paid, mainly enterprise relational databases. Here I’m presenting some of them so you can easily check possibilities when using particular cloud providers or on-premise.

Database typeAWSGCPAzureOn-premise
RelationalAmazon Aurora

Amazon RDS

Amazon Redshift
Bare Metal Solution For Oracle

Cloud SQL: for PostgreSQL, MySQL & SQL Server

Cloud Spanner
SQL Database – Managed Cloud Database Service

SQL Server on Azure Virtual Machines

Azure Database for PostgreSQL

Azure Database for MySQL – Managed MySQL Database

Azure Database for MariaDB – Community Services
Database Services | Oracle

MySQL

SQL Server 2019 | Microsoft

PostgreSQL

SQLite

MariaDB
In-memoryAmazon ElastiCache

Amazon MemoryDB for Redis
Memorystore: in-memory data storeAzure Cache for RedisRedis

Tarantool

Aerospike Database 5

memcached
Key-valueAmazon DynamoDBNo native solution but the closest one is Firestore: NoSQL document databaseAzure Cosmos DB – NoSQL DatabaseRedis (in-memory and key-value)

NoSQL Database | Oracle (can be key-value, document or wide column)

Consul by HashiCorp (key-value database is a part of the wider solution)

Aerospike CacheDB (can be key-value or document)

memcached (in-memory and key-value)
DocumentAmazon DocumentDB (with MongoDB compatibility)

Amazon DynamoDB
Firestore: NoSQL document database (classic document database)

Firebase Realtime Database (all data is stored in a single giant JSON tree)

The Best Way to Run MongoDB on Google Cloud Platform
Azure Cosmos DB – NoSQL DatabaseNoSQL Database | Oracle

Aerospike CacheDB

Elasticsearch

MongoDB
Wide columnAmazon KeyspacesCloud Bigtable: NoSQL database serviceAzure Managed Instance for Apache Cassandra

Azure Cosmos DB – NoSQL Database
NoSQL Database | Oracle

Apache Cassandra
GraphAmazon NeptuneGoogle Cloud Graph Database | Neo4j AuraDB on Google Cloud

Azure Cosmos DB – NoSQL DatabaseJanusGraph

InfiniteGraph

Neo4j
Time seriesAmazon TimestreamNo native solution but external ones can be deployed on GCPTime Series Insights (not strictly a database but it stores and processes time series data)InfluxDB

QuestDB

Prometheus (monitoring and alerting toolkit but part of it is a time-series database)
LedgerAmazon Ledger Database Services (QLDB)Not availableAzure Confidential LedgerNot available
Database typeAWSGCPAzureOn-premise
Database examples

More details for databases in the cloud:

Summary

I hope that this post will help you to make better decisions when deciding which database should be chosen for your software. Some additional time to consider various types may bring you a lot of benefits or even determine your project’s success (or failure) as the database is often a very important part of the system.

Databases classification is not always the same so they can certainly be divided into other categories as well, e.g. search engines (like Elasticsearch) could be treated separately. Many databases have so many related tools around them that it’s hard to classify them. The purpose of this article was to increase your awareness in this matter so you can make the decision about database choice more consciously.

Additional knowledge

Leave a Reply

Your email address will not be published. Required fields are marked *