How to Use Amazon S3 as an Event Bus in Your Next Architecture

📅 04 June, 2020 – Kyle Galbraith

Amazon S3 is a powerhouse service inside of the AWS ecosystem. You can use it for storing petabytes of data, hosting static websites, building data lakes, and many more use cases.

Today, we are going to take a look at one of the unsung heroes of Amazon S3, event notifications. We are going to take a look at how we can use event notifications as an event bus for an event-driven architecture.

Event what a what???

We could spend hours talking about event-driven architectures. In fact, there are many books out there that focus on this topic alone. But, we’re not going to dive into that level of detail here.

Instead, we're going to simplify event-driven architectures. For our purposes, an event-driven architecture is a system design that uses events to communicate. An event is a change in state. It can carry information about that state with it or simply be a notification.

Services within the architecture that produce events are called producers. Services that consume events are called consumers. There is nothing that says a service can't produce and consume events simultaneously.

Now that we have an idea of what an event-driven architecture is, let’s build one!

The Event Bus

What we are going to build out is a sample architecture that leverages Amazon S3 as the event bus. But what exactly is an event bus, anyway?

In general, an event bus is like the central hub to an event-driven architecture. It is where all producer services write their events and the consumer services pick them up. There are a couple of design choices when it comes to how your event bus publishes messages.

Approach #1: Publish everything to everyone

This is the simplest implementation for an event bus. It receives an event and publishes that event to all subscribers.

This has some unique advantages and disadvantages compared to other approaches. One advantage is that it makes the event bus very easy to reason about. Any event producer can publish a message to the bus, and it will notify all subscribers of the bus. The responsibility and logic of the bus are clear and concise.

The downside is that all subscribers receive every event, even the ones they can't operate on or don't care about. Consumers have to build in logic to filter out the events they don't care about.

Approach #2: Filter events to consumers

Producers continue to publish all their events to the bus, but the events get filtered on their way to consumers. Producers keep publishing everything, while each consumer only receives the events it cares about.

This removes the disadvantage of Approach #1. Now the consumers can expect to receive only the events they care about.

There is a disadvantage to this approach: consumers have to register a filter in addition to subscribing to the event bus. They subscribe to events coming from the bus and register a filter for only the particular events they want.

Publish Everything to Everyone Architecture

Now that we understand a little bit more about event buses and how they can publish events, it’s time we look at one within AWS. For the purposes of this post, we are going to focus on Amazon S3 as the event bus. Let’s look at the first approach - publish everything to everyone from the perspective of S3.

S3 is often viewed as a general object storage service. But it actually has many features beyond storing copious amounts of data. We are going to leverage one of these features, event notifications, to turn an S3 bucket into an event bus.

We are going to publish everything to everyone, so we can add one event notification that will push new events to all consumers. A high-level architecture diagram of this first approach looks like this:

Initial S3 event bus

Producers will write their events to the S3 bucket. We will configure an event notification on the bucket for any new objects created in it. When a producer writes a new event to the bucket, a notification is pushed to all consumers.

But how does the pushing actually happen? When an object gets created in the bucket, an event notification gets raised. But we must tell the bucket where to send the notification. Out of the box, Amazon S3 allows us to send events to any of the following destinations:

  • Amazon Simple Notification Service (Amazon SNS) topic
  • Amazon Simple Queue Service (Amazon SQS) queue (not FIFO)
  • AWS Lambda

So we need to refine our proposed architecture a little bit. With our current approach, consumers have no way of receiving events because they aren’t subscribed to anything. We need to configure the bucket to send event notifications somewhere. For now, let’s configure our S3 event notifications to go to an SNS topic.
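As a sketch, here is roughly what wiring a bucket's object-created events to a single SNS topic could look like. The bucket name and topic ARN are hypothetical, and the live boto3 call is shown as a comment since it needs real AWS credentials:

```python
import json

def build_notification_config(topic_arn):
    """Notification configuration that sends every object-created
    event in the bucket to one SNS topic."""
    return {
        "TopicConfigurations": [
            {
                "TopicArn": topic_arn,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }

# Hypothetical ARN; the topic's access policy must allow S3 to publish to it.
config = build_notification_config(
    "arn:aws:sns:us-east-1:123456789012:event-bus-topic")
print(json.dumps(config, indent=2))

# Applying it with boto3 would look like:
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-event-bus-bucket", NotificationConfiguration=config)
```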

S3 event bus with SNS

Now producers will write their events to the S3 bucket. The events will push notifications to an SNS topic. Consumers of those events can subscribe to the SNS topic to receive them. Once a consumer receives a notification, it can grab the key from S3 that triggered the notification. The object stored at this key is the event the producer wrote.
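To make the consumer side concrete, here is a minimal sketch. The JSON below is a trimmed-down version of the record structure S3 includes in its event notifications; the bucket and key are made up:

```python
import json

def extract_event_locations(sns_message_body):
    """Given the JSON body of an S3 event notification (as relayed
    through SNS), return the (bucket, key) pairs that triggered it."""
    event = json.loads(sns_message_body)
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
    ]

# A trimmed-down sample of the payload S3 sends
# (note: real notifications deliver the key URL-encoded).
sample = json.dumps({
    "Records": [
        {"s3": {"bucket": {"name": "my-event-bus-bucket"},
                "object": {"key": "foo/my-event-xyz"}}}
    ]
})
print(extract_event_locations(sample))  # [('my-event-bus-bucket', 'foo/my-event-xyz')]

# Each consumer would then fetch the event itself, e.g.:
# body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
```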

Filter Events to Consumers Architecture

With the architecture proposed above, all producer events are sent to all subscribers. Any event producer that writes an event to the S3 bucket will notify the SNS topic, and thus all consumers subscribed to that topic.

This is fine if all the consumers need to see every event. But, this likely isn’t the case. It’s more likely that a given consumer only cares about one type of event rather than the entire corpus of events. So with the architecture from above, the consumer service must have some kind of filter to ignore events that they don’t care about.

This is a valid approach. But what if we could push the filtering out of a given service and down onto the event bus? Then each consumer service would only ever receive the events it truly cares about.

It turns out that there are multiple ways we could accomplish this.

Event Notification & SNS Topic per Producer

One approach would be to have event producers write to their own key prefix in the S3 bucket. Then we could have an event notification for each key prefix in the bucket that pushes to its own SNS topic.

S3 event prefixes

This proposed architecture solves our filtering problem. Each type of event producer can write to its own prefix; for example, FooProducer writes to /foo/ in the S3 bucket. Then we can configure an event notification for each prefix in the bucket to go to a separate SNS topic. So when FooProducer writes an event to /foo/, an event notification gets pushed to the SNS topic for that specific prefix.

A key prefix has exactly one event notification and SNS topic. Consumers that care about those events can subscribe to the SNS topic for this type of producer.
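A sketch of that per-prefix wiring, with hypothetical prefixes and topic ARNs. S3 notification configurations accept a key prefix filter rule, which is what scopes each rule to one producer:

```python
def prefix_topic_config(prefix, topic_arn):
    """One notification rule scoped to a single key prefix,
    pointing at that producer's own SNS topic."""
    return {
        "TopicArn": topic_arn,
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        },
    }

# Hypothetical producers, each with its own prefix and topic.
config = {
    "TopicConfigurations": [
        prefix_topic_config("foo/", "arn:aws:sns:us-east-1:123456789012:foo-events"),
        prefix_topic_config("bar/", "arn:aws:sns:us-east-1:123456789012:bar-events"),
    ]
}

# Applying it would again be:
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-event-bus-bucket", NotificationConfiguration=config)
```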

This works, but it does have some complexity and downsides.

First, there is a lot more infrastructure and moving pieces. Each type of event producer must have a key prefix in the bucket, an S3 event notification, and an SNS topic.

Second, there is a limit on the number of event notification configurations you can have on a single S3 bucket. Say this limit is 100; then you can only have 100 different event producers writing to the bucket. Maybe that's a problem, maybe it's not, but it's a limitation.

Finally, it shifts the responsibility from consumers to producers. Now instead of consumers filtering the events they receive, producers must remember to write to a special prefix. It’s not complicated but it’s possible to write to the wrong prefix and push the wrong event to the wrong consumer.

Event Notification & SNS Topic with Filtering

There is another way we can filter events to consumers. What if we extended the functionality of our original Publish Everything to Everyone architecture?

S3 event bus with SNS

This architecture in its current form will push every event to every consumer. Meaning that the consumers have to have logic inside of them to skip events they don’t care about.

What if we could have the SNS topic do this filtering on our behalf? We would remove the need to filter inside of the consumer and remove multiple SNS topics.

So how can we get to that model?

One approach, which S3 does not currently support, would be to define SNS Message Attributes on the S3 event notification.

Message Attributes are key-value pairs you can include with an SNS notification; they act as metadata on the message. With SNS message filtering, your subscribers can subscribe to receive only the messages that contain a message attribute they care about.
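To illustrate the filtering semantics, here is a tiny local model of SNS string-equality matching, plus (commented out) roughly what registering a real filter policy with boto3 looks like. All names and ARNs are hypothetical:

```python
import json

def matches(filter_policy, message_attributes):
    """Local sketch of SNS string filtering: every key in the policy
    must appear on the message with one of the allowed values."""
    return all(
        message_attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )

policy = {"service": ["FooProducer"]}
print(matches(policy, {"service": "FooProducer"}))  # True
print(matches(policy, {"service": "BarProducer"}))  # False

# Attaching the policy to a real subscription would look roughly like:
# boto3.client("sns").subscribe(
#     TopicArn="arn:aws:sns:us-east-1:123456789012:event-bus-topic",
#     Protocol="sqs",
#     Endpoint="arn:aws:sqs:us-east-1:123456789012:foo-consumer-queue",
#     Attributes={"FilterPolicy": json.dumps(policy)})
```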

Combined with S3 event notifications, we want to define the S3 event notification with a Message Attribute. This could then get passed to the SNS topic when the event gets triggered. Here is an example of what this idea would look like:

S3 event with SNS filtering

This does still have the limitation around event notifications and producers still write to their own prefix. But now those notifications could push to one SNS topic.

With this idea, all producers continue to write their events to S3 as they did before. The bucket has event notifications configured to push events to a single SNS topic. But each notification configuration is set up to pass along a Message Attribute to the SNS topic. This Message Attribute could be something like service: FooProducer, allowing the FooConsumer to receive only events from the FooProducer via SNS message filtering.

Sounds great right? Except, it’s not currently possible with S3 events. You can’t configure S3 events that push to an SNS topic to pass along any Message Attributes.

So with that idea off the table, what else can we do?

It turns out we can create similar behavior by adding an extra step to our producers. Currently, a producer writes its events to the event bus, which in our case is S3. The producer then relies on the event bus to deliver the event to consumers. We saw this in earlier examples with event notifications pushing to our SNS topic.

If we add an extra step to our producers we can get SNS message filtering. A producer writes an event to the S3 bucket. It then sends a message to the SNS topic of the form below:

{
  "eventStore": "<s3-bucket-name>",
  "eventPath": "/foo/my-event-xyz"
}

When the producer sends the message to SNS it also sets a Message Attribute to something like service: FooProducer. Then consumers can subscribe and state they only want messages that have that attribute.
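Sketching that producer flow, with the bucket, key, and producer name as hypothetical placeholders and the live S3/SNS calls shown as comments:

```python
import json

def build_event_message(bucket, key):
    """The message body from above, pointing consumers at the event."""
    return json.dumps({"eventStore": bucket, "eventPath": key})

def build_message_attributes(producer_name):
    """SNS Message Attributes in the shape boto3's publish expects;
    subscriber filter policies match against these."""
    return {"service": {"DataType": "String", "StringValue": producer_name}}

body = build_event_message("my-event-bus-bucket", "/foo/my-event-xyz")
attrs = build_message_attributes("FooProducer")

# Step 1: write the event itself to S3.
# s3.put_object(Bucket="my-event-bus-bucket",
#               Key="foo/my-event-xyz", Body=event_bytes)
# Step 2: tell subscribers where to find it.
# sns.publish(TopicArn=topic_arn, Message=body, MessageAttributes=attrs)
```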

Here is what our architecture would look like for this approach:

Producers publish to SNS

As we can see, producers still write their events to S3 (bubble 1). But now, after writing them, they send a message to the common SNS topic that consumers are listening on (bubble 2).

With this approach, we have removed S3 event notifications altogether. S3 is no longer our event bus but rather our event storage. In this approach, the SNS topic is the actual event bus that is delivering events to consumers.

Consumers now receive only the producer events they care about. Once they receive an SNS notification, they can grab the event that the producer wrote to S3. All the information about where to find the event is in the SNS message.
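On the consumer side, unpacking such a message and locating the event is only a few lines. This sketch uses the sample message shape from earlier (bucket and key are hypothetical):

```python
import json

def locate_event(message_body):
    """Pull the bucket and key out of a producer's SNS message.
    S3 keys have no leading slash, so we strip it from eventPath."""
    msg = json.loads(message_body)
    return msg["eventStore"], msg["eventPath"].lstrip("/")

bucket, key = locate_event(
    '{"eventStore": "my-event-bus-bucket", "eventPath": "/foo/my-event-xyz"}')
print(bucket, key)  # my-event-bus-bucket foo/my-event-xyz

# The consumer would then fetch the actual event:
# event = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
```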

What have we learned?

We took a look at the introductory concepts surrounding event-driven architectures. We needed to ground ourselves in the concept of an event bus before looking at how Amazon S3 can act as one.

We took a look at two different approaches for getting events from producers to consumers in an event-driven architecture. Specifically, we looked at how an event bus can choose to publish every event to every consumer, or how we can filter events to consumers.

We dove into how both of these concepts can be used inside of AWS. We explored how S3 with event notifications can publish all events to a single SNS topic that consumers subscribe to.

Jumping off that idea, we took a look at how we can filter events published to S3 down to consumers. In particular, we could establish prefixes in the S3 bucket for each type of event producer. Each one of those prefixes could then have its own SNS topic. Consumers could then choose which SNS topics to subscribe to in order to receive events only from the producers they care about.

This was a viable approach but required quite a bit of extra infrastructure. So we explored the idea of one event notification going to one topic while appending SNS Message Attributes to the event. If S3 event notifications supported this, we could filter events to consumers via those Message Attributes. Unfortunately, this isn't supported by S3 event notifications yet.

So that led us to an approach that gave us something similar. Instead of relying on S3 event notifications, we have producers send a message to a central SNS topic. These producers still write their events to S3, but they also publish a message to the SNS topic with a Message Attribute such as service. This allows consumers to subscribe to the topic and select the message attributes they want to receive messages for. This changed the role of S3 from event bus to event storage, but it is a viable workaround.

Conclusion

Amazon S3 is a powerhouse service within the AWS ecosystem. Besides its high durability and availability, it offers many core features that fit into a variety of architectures.

In this post, we focused on S3 event notifications and how they are leveraged to turn a bucket into an event bus. We also explored limitations to this idea.

The hope is that you can take the concepts we covered here and apply them to your own projects. Or, at the very least, recognize the limitations they may introduce and determine whether they will work for you.

Want to check out my other projects?

I am a huge fan of the DEV community. If you have any questions or want to chat about different ideas relating to refactoring, reach out on Twitter.

Outside of blogging, I created a Learn AWS By Using It course. In the course, we focus on learning Amazon Web Services by actually using it to host, secure, and deliver static websites. It’s a simple problem, with many solutions, but it’s perfect for ramping up your understanding of AWS. I recently added two new bonus chapters to the course that focus on Infrastructure as Code and Continuous Deployment.

I also curate my own weekly newsletter. The Learn By Doing newsletter is packed full of awesome cloud, coding, and DevOps articles each week. Sign up to get it in your inbox.