@lmcardle
Last active August 4, 2021 18:49

Microservice Patterns for Sharing Data

Database Ownership

As a general pattern, it is well established that access to any given database should be limited to a single service, meaning that no two services should both be able to reach directly into the database to read, insert, update, or delete data. Instead, a single service should have ownership of that database, and if another service needs to access that same data, it must go through the owning service.

<diagram 1>

Patterns for Providing Shared Data to Customers

While data ownership and database access are well established, what is less established is how a service that depends on shared data goes about obtaining that data at runtime to fulfill a synchronous API request.

Before going into the available patterns, we first want to understand the environment. In this case, we have three independent services: Service A, B, and C. As seen in the diagram below, all three services have their own databases and their own data models. In addition to properties that are very specific to their own services, the data models of Service A and B share a common property, quox, which technically belongs to the third service (Service C). Service C's data model is simply quox.

<diagram 2>

A real world DSCC use case for the above would be something like tags. With tags, we are creating a new service (Tags), which would be represented as Service C above. Other services, say Volumes and Virtual Machines, may subscribe to the Tags service and want to make tags a first class property of their object models.

For example, a user could make a GET request to /tags, which would return `{ items: [ { id, name } ] }`.

Likewise, a user might make a GET request directly against a volume, which would return tags as a property on the volume object: `{ vol-id, vol-name, tags: [ { id, name } ] }`. The tags returned with the volume are all of the tags that have been applied to that given volume. While the properties `vol-id` and `vol-name` clearly belong to the volumes service, the volumes service does not own the tags data. Instead it must interact with the tags service to get the id and name of a tag when it is applied, changed in some way, or deleted.
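To make the two shapes concrete, the sketch below builds both responses as plain Python dicts. The function names and the example values (volume 123, tag 456) are assumptions for illustration, not a real API.

```python
# Hypothetical response shapes for the tags and volumes services.

def tags_response():
    # GET /tags -> every tag owned by the tags service.
    return {"items": [{"id": 456, "name": "chicago"}]}

def volume_response():
    # GET against a volume -> volume object with tags embedded
    # as a first class property, even though tags are owned elsewhere.
    return {
        "vol-id": 123,
        "vol-name": "Blaz",
        "tags": [{"id": 456, "name": "chicago"}],
    }
```

Note that the tag object embedded in the volume response carries the same id and name as the one returned by /tags; keeping those copies in sync is exactly the problem the patterns below address.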

Pattern 1 - Real Time Access

One pattern for Service A and B to obtain this data and return it to customers is via a synchronous API call to Service C. When an API call comes into Service A, GET /service-a/123, Service A must pause, make a gRPC call to Service C, and once that call succeeds, populate the response with the newly obtained data and finally return the Service A object to the client.

<diagram 3>
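The request flow above can be sketched as follows. This is a minimal sketch: `fetch_quox_from_service_c` is a hypothetical stand-in for the real gRPC call, and the database is just a dict.

```python
def fetch_quox_from_service_c(object_id):
    # Placeholder for the synchronous gRPC call to Service C.
    return {"quox": "value-from-service-c"}

def handle_get_service_a(object_id, database):
    # 1. Load Service A's own record from its own database.
    record = database[object_id]
    # 2. Pause and call Service C for the shared property.
    shared = fetch_quox_from_service_c(object_id)
    # 3. Merge the two before responding to the client.
    return {**record, **shared}

service_a_db = {"123": {"id": "123", "a_only_prop": "x"}}
```

The key point is that step 2 happens inside the request path: Service A cannot respond until Service C has.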

The pattern is fairly clean and ensures that, always and everywhere, there is only one representation of the truth about what Service C's data looks like.

Pattern 2 - Caching Data Between Services

Another pattern for Service A and B to obtain this data and return it to customers is to get it asynchronously and cache it within Service A's and Service B's own databases. As above, when a CRUD action is performed against Service C, Service C owns the change and updates its database to reflect it. However, in this case, Service C does not stop there. After making the change in its own database, Service C puts a message on the Kafka message bus describing the change that was just made. Because Service A and B have subscribed to Service C, those two services see the new message in Kafka and pick it up. Each of them looks at the message and decides whether it is relevant, in which case it updates its own database with the change.

<diagram 4>
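The publish/subscribe flow can be sketched in memory as follows. This is a minimal sketch, assuming a plain list in place of a Kafka topic and an assumed message shape; a real deployment would use a Kafka producer and consumer group per service.

```python
def publish_change(topic, change):
    # Service C appends a change event *after* committing to its own DB.
    topic.append(change)

def consume(topic, local_db, is_relevant):
    # Service A/B drain the topic and cache only the relevant changes.
    for msg in topic:
        if is_relevant(msg):
            local_db[msg["id"]] = msg["data"]

topic = []
publish_change(topic, {"id": 456, "data": {"name": "chicago"}})

service_a_cache = {}
consume(topic, service_a_cache, is_relevant=lambda m: True)
```

The relevance check matters: Service A may only care about Service C objects that are attached to its own records, and can ignore the rest.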

Back to the real world, and using the tags service as the example again. The tags service database currently looks like the following:

| id  | name      | associated resources       |
| --- | --------- | -------------------------- |
| 456 | 'chicago' | { type: volume, id: 123 }  |

The volumes service database looks like the following:

| id  | name   | tags                         |
| --- | ------ | ---------------------------- |
| 123 | 'Blaz' | { id: 456, name: 'chicago' } |

Now, when a client makes a PATCH request to /tags changing the name from 'chicago' to 'milwaukee', the volumes service somehow needs to pick this up and update volume 123 with the change. This happens by the tags service putting a message on Kafka describing the change that was made. The volumes service picks the message up and ultimately updates volume 123. While the change does not happen in real time, the representation of the data is eventually consistent, with the two databases looking like the following:

| id  | name        | associated resources       |
| --- | ----------- | -------------------------- |
| 456 | 'milwaukee' | { type: volume, id: 123 }  |

| id  | name   | tags                           |
| --- | ------ | ------------------------------ |
| 123 | 'Blaz' | { id: 456, name: 'milwaukee' } |
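A sketch of how the volumes service might apply the rename event it receives from Kafka; the event shape and in-memory table are assumptions for illustration.

```python
# Cached copy of tag 456 as stored by the volumes service.
volumes_db = {
    123: {"id": 123, "name": "Blaz",
          "tags": [{"id": 456, "name": "chicago"}]},
}

def apply_tag_renamed(event, db):
    # Update every cached copy of the tag on every volume carrying it.
    for volume in db.values():
        for tag in volume["tags"]:
            if tag["id"] == event["tag_id"]:
                tag["name"] = event["new_name"]

apply_tag_renamed({"tag_id": 456, "new_name": "milwaukee"}, volumes_db)
```

Between the PATCH landing in the tags service and this handler running, a GET on the volume would still show 'chicago'; that window is the eventual-consistency tradeoff.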

Tradeoffs Between Patterns

Ultimately both patterns achieve the same objective: independent services are able to return data that is meaningful to them but actually owned by another service. However, while they achieve the same outcome, the two patterns optimize for different things.

In the first pattern, the client requesting the data always sees the latest and greatest state of the world. If a change is made to Service C and a moment later a GET is made to Service A, the data that Service A shows (which is owned by Service C) will be exactly the same as if the user had made a GET request directly to Service C. This is a great property, but it comes with tradeoffs.

Because Service A now depends on Service C in real time, Service A's performance and the availability of its own API are negatively affected. Regarding performance, when an API call comes into Service A, it must pause and make a gRPC call to Service C. That call alone adds extra latency before Service A is able to respond.
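One common way to bound that latency is a timeout with a degraded fallback, sketched below. The helper `call_service_c` is hypothetical, and whether omitting the shared property is acceptable depends on the API contract.

```python
def call_service_c(object_id, timeout_s=0.5):
    # Placeholder for the real gRPC call; may raise TimeoutError
    # if Service C is slow or unavailable.
    return {"quox": "value-from-service-c"}

def handle_get_with_fallback(object_id):
    record = {"id": object_id, "a_only_prop": "x"}
    try:
        record.update(call_service_c(object_id))
    except TimeoutError:
        # Service A must choose: fail the whole request, or degrade
        # by returning its own data without the shared property.
        record["quox"] = None
    return record
```

Even with a fallback, Service A's best-case response time is its own work plus a full round trip to Service C, which is the cost Pattern 2 avoids.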
