
The S3 Consistency Model Got an Upgrade

Written by
Kyle Galbraith
Published on
12 January 2021

Over two years ago I wrote a blog post on Medium that explained the S3 consistency model. Since then a lot has changed. I quit writing on Medium, as you can see. But also, the S3 consistency model has received a major update.

One of the longest running AWS services, Simple Storage Service (S3), continues to get new and exciting features every year. As I stated in my original blog post, it’s a powerhouse of a service. It solves a breadth of use cases from data lakes to event-driven architectures. But before re:Invent 2020, there was always a quirk you had to be careful of: the consistency model.

Before we jump to the ending, let’s remind ourselves of what the consistency model used to be.

There was a happy path scenario. This is where we are writing a brand new object to an S3 bucket.

PUT /key-prefix/cool-file.jpg 200
GET /key-prefix/cool-file.jpg 200

When we GET the file right after we PUT it, we get a status code of 200 and we know the file is the most up-to-date copy. This is known as read-after-write consistency.

But then there was this caveat scenario with overwriting. We write an object to the bucket. Another process writes that object again (with new content), and then we try to read the object.

PUT /key-prefix/cool-file.jpg 200
PUT /key-prefix/cool-file.jpg 200 (new content)
GET /key-prefix/cool-file.jpg 200

Here we ended up with eventual consistency. When we call GET we may receive the file contents of the first PUT or we may receive the second. This was because the overwrite to the same object would have to be propagated behind the scenes.
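To make the old overwrite behavior concrete, here is a toy model in Python. It is only a sketch: `ToyReplicatedStore`, its two replicas, and the explicit `propagate()` step are all made up for illustration and say nothing about how S3 is actually built. The only thing it borrows from the description above is that a brand new object is readable right away, while an overwrite has to propagate before every read sees it.

```python
class ToyReplicatedStore:
    """Toy model of overwrite propagation. Hypothetical, not S3's real design."""

    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # overwrites not yet copied to every replica

    def put(self, key, body):
        is_new = all(key not in replica for replica in self.replicas)
        if is_new:
            # New object: visible on every replica at once (read-after-write).
            for replica in self.replicas:
                replica[key] = body
        else:
            # Overwrite: lands on the first replica now, the others later.
            self.replicas[0][key] = body
            self.pending.append((key, body))

    def get(self, key, replica_index):
        # The caller picks the replica here; a real client has no such control.
        return self.replicas[replica_index][key]

    def propagate(self):
        # Copy every pending overwrite to the remaining replicas, in order.
        for key, body in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = body
        self.pending.clear()


store = ToyReplicatedStore()
store.put("key-prefix/cool-file.jpg", b"first")
store.put("key-prefix/cool-file.jpg", b"second")
print(store.get("key-prefix/cool-file.jpg", 0))  # b'second'
print(store.get("key-prefix/cool-file.jpg", 1))  # b'first' (stale)
store.propagate()
print(store.get("key-prefix/cool-file.jpg", 1))  # b'second'
```

Which replica answered a GET was outside our control, which is why the same request could return either version.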

Then there was the fussy 404 caveat. This occurred when you issued a GET for a key before the PUT that created the object had finished.

GET /key-prefix/cool-file.jpg 404
PUT /key-prefix/cool-file.jpg 200
GET /key-prefix/cool-file.jpg 404

Here, because the first GET happened before the PUT was complete, we got a 404. Because of eventual consistency, a GET issued after the PUT could return a 404 again, since the new object may have still been propagating.

The world is simpler now

In December 2020, AWS announced strong read-after-write consistency for all GET, PUT, and LIST operations in S3 🎉

So what does that mean for the scenarios we talked about up above? Well, it means they are mostly irrelevant now.

If a PUT call of an object to S3 is successful you can assume that any subsequent GET or LIST call for that object will return the latest version of the object. Meaning the happy path still works as expected.
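As a rough sketch of that happy path in Python: the helper below writes an object and reads it straight back. The names `put_then_get` and `InMemoryS3` are hypothetical. `InMemoryS3` is a stand-in I made up so the example runs without AWS credentials; it only mimics the `put_object` and `get_object` keyword arguments of the boto3 S3 client.

```python
import io


class InMemoryS3:
    """Hypothetical stand-in for an S3 client so the sketch runs without AWS."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body
        return {}

    def get_object(self, Bucket, Key):
        # boto3 returns the payload as a readable stream under "Body".
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


def put_then_get(s3_client, bucket, key, body):
    """PUT an object, then GET it straight away and return what was read."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


data = put_then_get(InMemoryS3(), "example-bucket", "key-prefix/cool-file.jpg", b"v1")
print(data == b"v1")  # True
```

Against a real bucket you would pass `boto3.client("s3")` instead of the stand-in. With strong consistency, and no other writer touching the key, the bytes read back should match the bytes just written.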

The overwrite scenario will return the latest data as well. But the first GET request must happen after all PUT requests have finished to guarantee the latest object.

This last bit is important. You can have concurrent processes writing the same object, with different data, to the same bucket. The first process finishes writing the object and then the next process starts writing the object with new data. Meanwhile, before this second write finishes, we start a GET request on the object. In this scenario, our GET request may return either the data from the first write or the data from the second. This is because the second write hasn’t yet completed.

Like the above, we can have a scenario where there are simultaneous writes. Meaning that before process one finishes writing the object, process two starts writing to that object as well. This is what we call concurrent writes. In this scenario, S3 uses last-writer-wins semantics to decide which write takes precedence, and because of factors like network latency we can’t predict the order in which S3 receives the requests. Our GET request may return either version until the final write finishes.
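One way to reason about these edge cases is to write the timing rules down as a small Python model. This is a sketch under my own assumptions, not anything AWS publishes: `Write` and `possible_reads` are hypothetical names, the timestamps are whatever clock your application sees, and the read is treated as happening at a single instant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    start: float  # when the PUT request began
    end: float    # when the PUT was acknowledged
    value: str


def possible_reads(writes, read_start):
    """Return the set of values a GET starting at read_start might return."""
    finished = [w for w in writes if w.end <= read_start]
    in_flight = [w for w in writes if w.start <= read_start < w.end]
    # A finished write is superseded when another finished write only began
    # after it was acknowledged. Overlapping writes cannot be ordered this way.
    latest = [
        w for w in finished
        if not any(o.start >= w.end for o in finished if o is not w)
    ]
    candidates = {w.value for w in latest} | {w.value for w in in_flight}
    if not finished:
        candidates.add(None)  # nothing acknowledged yet: the key may not exist
    return candidates


sequential = [Write(0, 1, "first"), Write(2, 3, "second")]
print(possible_reads(sequential, 4) == {"second"})             # True
print(possible_reads(sequential, 2.5) == {"first", "second"})  # True

concurrent = [Write(0, 2, "first"), Write(1, 3, "second")]
print(possible_reads(concurrent, 4) == {"first", "second"})    # True
```

In the concurrent case the model returns both values even after both writes finish. S3 settles on a single winner, but from the application's side there is no way to tell in advance which write it treated as the last one.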

The key here is that for most scenarios S3 now has strong read-after-write consistency. But there are still edge cases.

Conclusion

S3 gaining strong read-after-write consistency wipes out a lot of challenges. A lot of the caveats we were once stuck with go away. But that doesn’t mean all our problems are solved. As we saw, there are still things you have to consider when using S3 in an asynchronous environment. It’s unrealistic to think S3 can solve those for us as well, because AWS can’t control how our applications write to and read from a given bucket. Keep those special edge cases in mind as you build out an architecture that incorporates S3.

© 2024 Kyle Galbraith