Event Sourcing

Do you need a full history of changes in your data for audit purposes or for recovery from failures? Time for Event Sourcing – a pattern that changes the traditional approach to data persistence: instead of storing only a simple, always up-to-date current state in the database, we store a sequence of events in a so-called event store. The Event Store is an append-only registry where events can only be added. It enables us to go back in history to any point in time by replaying all events (changes), and it brings other benefits I discuss in this article.

What is it?

We can consider an example to explain the concept better. Let’s assume we are creating software for a transport and logistics company to manage their vehicles. One of the business requirements is to track their locations and statuses so the company can allocate them properly, i.e. decide when and where they should go.

A simplified version of the database design consists of two tables, Vehicle and Contract:

So each vehicle has its location (latitude and longitude attributes) assigned as well as its current status defined by the current contract. If current_contract_id is null, the vehicle is ready to be assigned to transport new goods; otherwise we can check which contract it is currently performing. Moreover, each contract has its own data, like the goods receiver or the final date by which the goods must be delivered.
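To make the starting point concrete, here is a rough TypeScript sketch of these two entities (the exact field names are my assumptions based on the description above):

```typescript
// Traditional current-state model: each row is simply overwritten in place
// whenever something changes, so the history is lost.
type Vehicle = {
  id: number;
  latitude: number;
  longitude: number;
  currentContractId: number | null; // null = ready to be assigned to new goods
};

type Contract = {
  id: number;
  receiver: string;       // who receives the goods
  expirationDate: string; // the final date by which the goods must be delivered
};
```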

Whenever any data is changed, it must be reflected in the database. While this solution works, we cannot track how the data changed over time. We know nothing about the past – which contracts a vehicle handled and in which locations, or how the contracts changed.

This problem can be solved with the Event Sourcing pattern by introducing an Event Store, which can be implemented in various ways. One of them is to store events in a database table instead of storing the current state. So let’s replace both tables (Vehicle and Contract) with a single new one:

where the attributes have the following meaning:

  • data – the event’s details, i.e. what data changed and how (e.g. which attributes were updated or what data was added),
  • type – which entity is affected by the event, in this example: vehicle or contract,
  • timestamp – the event’s timestamp, needed to determine the order in which events occurred, which is really important,
  • version – event versioning is necessary to handle possible changes in the event structure. It tells the application which algorithm should be used to handle a given event, so we can change event data structures in the future while still being able to support old event versions. This is necessary to keep all events replayable. It can also be used when the event’s format (at the root level, not its data) changes, which usually applies when the Event Store lives in a NoSQL database, although it can sometimes apply to SQL databases too (e.g. when a new nullable column is added to the Events table and only new events have a value set for it). A sketch of such an event record is shown below.
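To make the record structure concrete, here is a minimal TypeScript sketch of a single row in the Events table (the type name and exact field types are my own assumptions, not something the pattern prescribes):

```typescript
// Shape of a single row in the Events table (illustrative, not prescriptive).
type StoredEvent = {
  id: number;                      // auto-incremented primary key
  data: Record<string, unknown>;   // event details stored as JSON – what changed and how
  type: "vehicle" | "contract";    // which entity the event affects
  timestamp: number;               // Unix timestamp used to order events
  version: number;                 // structure version, so old events stay replayable
};
```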

Sample rows in the table can then look like this:

| id | data | type | timestamp | version |
| --- | --- | --- | --- | --- |
| 1 | { latitude: 10.12, longitude: 15.13, operation: "ADD" } | vehicle | 1641660000 | 1 |
| 2 | { latitude: 15.12, longitude: 13.14, operation: "ADD" } | vehicle | 1641670000 | 1 |
| 3 | { receiver: "Sample company", expirationDate: "2022-01-20T13:16:01.915Z", operation: "ADD" } | contract | 1641680000 | 1 |
| 4 | { id: 1, latitude: 10.15, longitude: 15.20, currentContractId: 1, operation: "UPDATE" } | vehicle | 1641690000 | 1 |
| 5 | { id: 1, latitude: 55.15, longitude: 34.68, currentContractId: null, operationType: "UPDATE" } | vehicle | 1641700000 | 2 |
| 6 | { id: 2, operationType: "REMOVE" } | vehicle | 1641710000 | 2 |

Here is the data flow that led to these events:

  1. Two vehicles have been added (events 1 and 2). They are added with ids 1 and 2 and presented to the user.
  2. A new contract has been signed with “Sample company”, so it is added (event 3). No vehicle is assigned to this contract at this point – it’s just information that such a contract has been signed, without any additional details.
  3. The vehicle with id 1 has been updated, i.e. it provided its location and it has been assigned to the contract (event 4).
  4. The contract has been fulfilled, so the vehicle has been updated again (event 5) – its contract has been cleared to make it ready for a new one. Additionally, its position has been changed to reflect its new location (the contract is fulfilled when the goods are delivered, so the vehicle ends up in a different location than at the beginning).
  5. Finally, the vehicle with id 2 has been removed (event 6). It was no longer needed, so there was no point in keeping it.

All these events mean that currently there is a vehicle with id 1 without any contract, located at (55.15, 34.68). There is also a contract with id 1 (which should probably be extended with some state marking it as completed). That’s the current state, created dynamically by applying all the events one by one, starting from nothing.

This way, all necessary data can be inferred from the single table – we can build the current state of vehicles and contracts by replaying events ordered by their timestamps. We can also analyse changes in any way we want.
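To illustrate the replay, here is a minimal TypeScript sketch, reusing the StoredEvent type from above; the VehicleState shape and the id-assignment logic are my own simplifications, not part of the pattern itself:

```typescript
type VehicleState = {
  latitude: number;
  longitude: number;
  currentContractId: number | null;
};

// Rebuild the state of all vehicles by applying events one by one in
// timestamp order, starting from a given (by default empty) state.
function replayVehicles(
  events: StoredEvent[],
  initial: Map<number, VehicleState> = new Map()
): Map<number, VehicleState> {
  const vehicles = new Map(initial);
  // Ids are assigned when an ADD event is applied (a simplification for this sketch).
  let nextId = Math.max(0, ...Array.from(vehicles.keys())) + 1;

  const ordered = events
    .filter((e) => e.type === "vehicle")
    .sort((a, b) => a.timestamp - b.timestamp);

  for (const event of ordered) {
    const data = event.data as any;
    // Version 1 events call this attribute "operation", version 2 – "operationType".
    const operation = event.version === 1 ? data.operation : data.operationType;

    if (operation === "ADD") {
      vehicles.set(nextId++, {
        latitude: data.latitude,
        longitude: data.longitude,
        currentContractId: null,
      });
    } else if (operation === "UPDATE") {
      vehicles.set(data.id, {
        latitude: data.latitude,
        longitude: data.longitude,
        currentContractId: data.currentContractId ?? null,
      });
    } else if (operation === "REMOVE") {
      vehicles.delete(data.id);
    }
  }
  return vehicles;
}
```

Running this sketch over the six sample events above would end with a single vehicle (id 1) at (55.15, 34.68) with no contract assigned – exactly the current state described earlier.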

Remember that the Event Store is an append-only registry. Once an event is added there, it should never be modified. That’s why new events were added in order to update or remove existing data, instead of modifying the existing ones.

Another important thing to notice is the usage of the version attribute – there is a structural change in the data between versions 1 and 2: operation was renamed to operationType. Thanks to the event’s version, the application knows how to handle each event – in this case, it uses the proper attribute name to recognize the operation’s type.
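An alternative to branching inside the replay code (as the sketch above does) is to “upcast” old events into the newest structure in one place. Again, this is just a sketch under the same assumptions:

```typescript
// Bring a stored event up to the latest structure (version 2 in this example),
// so the rest of the code only ever deals with the newest format.
function upcast(event: StoredEvent): StoredEvent {
  if (event.version === 1) {
    const { operation, ...rest } = event.data as any;
    return { ...event, version: 2, data: { ...rest, operationType: operation } };
  }
  return event; // already in the latest format
}
```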

Event data could also be stored in a more structured way – instead of generic JSON, there could be a table for each possible event type containing its data. For example, a table with all vehicle attributes plus an operation (or operationType) column and the event’s id could hold the data of all vehicle-related events. This would have its own advantages (e.g. data validation and more possibilities to run SQL queries on the data, though some databases support JSON natively and allow querying its content) and disadvantages (more work and more queries to retrieve event data).

Event Store can also be implemented in completely different ways, e.g. using NoSQL databases or Kafka. I just presented one of the ways to implement it, not claiming that it’s always the best one.

Retrieving current state

While events are certainly useful for many purposes, most applications still need to retrieve the current state. The simplest way to get it is to calculate it from scratch – we start with no data and apply all events one by one to arrive at the final state. While this approach may be enough when there are not too many events, it is too slow when a lot of events are stored. How can we deal with this issue?

One of the possible solutions is to store the current state in the application’s memory and update it whenever a new event is received. This update should be an asynchronous task, independent of adding the event to the Event Store, as these two things should be separated. In addition, snapshots of the state should be persisted permanently on a periodic basis, e.g. once per day during the night. Thanks to this, when the application crashes, it can quickly recalculate the current state by reading the latest snapshot and applying only those events that occurred later (ones not included in the snapshot).
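Continuing the earlier sketch, the recovery could look roughly like this (the Snapshot shape and the takenAt field are assumptions I introduce for illustration):

```typescript
type Snapshot = {
  takenAt: number;                      // timestamp of the last event included
  vehicles: Map<number, VehicleState>;  // materialized state at that moment
};

// Recover the current state after a crash: load the latest snapshot and
// replay only the events that occurred after it was taken.
function recover(snapshot: Snapshot, allEvents: StoredEvent[]): Map<number, VehicleState> {
  const newer = allEvents.filter((e) => e.timestamp > snapshot.takenAt);
  return replayVehicles(newer, snapshot.vehicles);
}
```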

Benefits

Here are the reasons why it may be worth applying this pattern:

  • Audit logging – each data change is recorded in the event store, so we can review all the changes any time we want. Besides the attributes presented above, additional ones can be added, e.g. the change’s source (which system or user triggered the event).
  • Recovering – if there is a failure in the system and the current state is corrupted, we can easily recover and rebuild the correct state by replaying all the events.
  • Better debugging and more effective bug fixing – we can get back to any point in the past so the system can be set to the state when an error occurred, enabling us to observe its behaviour in such a state more thoroughly and debug it.
  • Performance – transactions can be processed immediately because any data change is as simple as appending a new immutable event to the event store. There is no need to modify existing data, which also means no issues with synchronizing changes. Events can be processed asynchronously by tasks running in the background.
  • Scalability – it’s easy to scale the Event Store when necessary, as data is only ever appended to it.
  • Fewer issues with concurrent updates – only events are added, so there is no need to resolve conflicts related to concurrent updates of the same state. However, this doesn’t protect the system from incorrect states by itself – that depends on the data in the events and on how the system handles inconsistencies there.

Issues

Besides benefits, there are obviously also some potential issues that may occur when using this pattern. Luckily, it is possible to mitigate them:

  • Eventual consistency for reads – as events are added to the Event Store, it may take some time to also update the current state (or other projections used when reading the data). Depending on how the retrieval of the current state is implemented, the delay may be smaller or larger, but generally speaking it will always occur. However, when properly implemented, the delay should be small enough to be acceptable in most cases. Moreover, this eventual consistency affects only reads, as writes (adding the events) remain strongly consistent.
  • Versioning – as the event structure may change over time, it is necessary to maintain code handling all event formats. Traditional CRUD solutions don’t have such a requirement; sometimes we need to handle the format of old requests, but only until all clients switch to the new version – there is no need to support them forever. In the case of Event Sourcing, we need to be able to handle all historical event formats.
    Sometimes there is no version attribute in events, and when the structure changes, all past events are modified to align with the new structure. However, this is against the rules, as events should never be modified, and it carries a lot of risks, e.g. potential event corruption leading to the inability to recover the system (or recovering wrong data) or a damaged audit trail.
  • Events order – in multi-threaded systems, events may arrive at the Event Store in a different order than they were created. This issue can be mitigated using the timestamp attribute, just as I did above.
  • Data analysis – based only on the Event Store, it is quite hard to analyse the data contained in the events, e.g. to get statistics (like the current number of vehicles without contracts or how many contracts were fulfilled by the vehicles during the last month). That’s why additional data extraction must be performed, e.g. by background tasks running periodically that build the current state (the snapshots mentioned above) or other data sets if various formats are required, so that the needed statistics can be retrieved.
  • Disk space usage – storing all the events requires more space on disk. Storage is relatively cheap compared to other resources, but if necessary, clean-ups may be scheduled to remove old events, leaving just some recent snapshots and newer events. Just make sure to consider all requirements, e.g. data may need to be stored for several years to meet audit trailing requirements.

Pattern usage cases

Apply this pattern when:

  • It’s important to keep a history of changes – for audit trailing or system recovery purposes. You can also add custom metadata to each change, e.g. the change reason (like “new contract signed” when adding a contract, or “vehicle damaged” or “goods delivered” when detaching the contract from the vehicle, etc.).
  • There is a need to minimize concurrent update conflicts.
  • It makes sense to separate data changes from their processing for any reason, e.g. to improve performance or to notify multiple consumers about events so they can process the same data in various ways. In this example, the same data could be processed by two different systems – payroll (responsible for the financial aspects) and management (responsible for assigning contracts to vehicles).
  • Data flexibility matters, as it’s possible to easily create multiple different views/tables or replace old ones with new ones – always based on the same data (events from the Event Store) and just producing diverse outputs.

Avoid this pattern when:

  • Strong consistency is required so even small delays when providing the current state aren’t acceptable (near real-time updates are must-haves).
  • Features provided by Event Store – like audit trailing and system recovery – aren’t necessary.
  • Concurrent updates are not a problem or they don’t occur.
  • The application is small, so the pattern would simply be overkill, i.e. it would bring too much overhead compared to the benefits resulting from its application, while the traditional approach would be easier and sufficient.

Examples of software that follows this pattern:

  • Version control systems, e.g. Git, SVN,
  • Database migration tools, e.g. Flyway, Liquibase,
  • Some fintech applications, e.g. Stashaway or solarisBank.

Additions

The application of this pattern is often accompanied by:

  • CQRS pattern usage – separating reads from writes, which may help to improve performance even further in some cases (although not always); a CQRS implementation is usually based on Event Sourcing in such setups.
  • Events reversal capability – while it’s possible to get back to any point in time by reading the last snapshot before that point and applying the necessary events, it can sometimes be better in terms of performance to make it possible to roll back events (a sketch follows this list). Which is faster depends on the number of events we would need to roll back vs the number of events we would need to apply after the snapshot is loaded.
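As an illustration of the rollback idea (purely a sketch – the pattern doesn’t prescribe this interface, and the names are mine), a reversible event can expose both an apply and a revert operation:

```typescript
// A reversible event can both apply itself to the state and undo itself,
// so the system can roll back from the current state instead of replaying
// everything from the last snapshot.
interface ReversibleEvent<S> {
  apply(state: S): S;
  revert(state: S): S;
}

// Example: assigning a contract to a vehicle, and undoing that assignment.
function contractAssigned(
  vehicleId: number,
  contractId: number
): ReversibleEvent<Map<number, VehicleState>> {
  const setContract = (state: Map<number, VehicleState>, value: number | null) => {
    const vehicle = state.get(vehicleId);
    if (vehicle) state.set(vehicleId, { ...vehicle, currentContractId: value });
    return state;
  };
  return {
    apply: (state) => setContract(state, contractId),
    revert: (state) => setContract(state, null),
  };
}
```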

More knowledge

I definitely recommend reading more about the implementation methods, advantages, and disadvantages of this approach.
