Over two years ago I wrote a blog post on Medium that explained the S3 consistency model. Since then a lot has changed. I quit writing on Medium, as you can see. But also, the S3 consistency model has received a major update.
The longest running AWS service, Simple Storage Service (S3), continues to get new and exciting features every year. As I stated in my original blog post, it's a powerhouse of a service. It solves a breadth of use cases, from data lakes to event-driven architectures. But before this year's re:Invent, there was always a quirk you had to be careful of: the consistency model.
Before we jump to the ending, let's remind ourselves of what the consistency model used to be.
There was a happy path scenario. This is where we are writing a brand new object to an S3 bucket.
```
PUT /key-prefix/cool-file.jpg 200
GET /key-prefix/cool-file.jpg 200
```

When we `GET` the file right after we `PUT`, we get a status code of 200 and we know the file is the most up-to-date copy. This is otherwise known as read-after-write consistency.
But then there was this caveat scenario with overwriting. We write an object to the bucket. Another process writes that object again (with new content), and then we try to read the object.
```
PUT /key-prefix/cool-file.jpg 200
PUT /key-prefix/cool-file.jpg 200 (new content)
GET /key-prefix/cool-file.jpg 200
```

Here we ended up with eventual consistency. When we call `GET`, we may receive the file contents of the first `PUT` or we may receive the second. This was because the overwrite to the same object had to be propagated behind the scenes.
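To make the overwrite behavior concrete, here is a toy in-memory model of eventual consistency. This is purely illustrative (it is not how S3 is implemented): until "propagation" completes, a read may return either copy of the object.

```python
import random

class EventuallyConsistentStore:
    """Toy model of pre-2020 S3 overwrites (illustrative, not real S3)."""
    def __init__(self):
        self._visible = {}   # key -> list of values a read might return

    def put(self, key, body):
        # An overwrite coexists with the old copy until it propagates.
        self._visible.setdefault(key, []).append(body)

    def get(self, key):
        # Before propagation finishes, a read may see any copy.
        return random.choice(self._visible[key])

    def propagate(self, key):
        # Once propagation completes, only the newest write survives.
        self._visible[key] = [self._visible[key][-1]]

store = EventuallyConsistentStore()
store.put("key-prefix/cool-file.jpg", b"v1")
store.put("key-prefix/cool-file.jpg", b"v2")  # overwrite with new content
assert store.get("key-prefix/cool-file.jpg") in (b"v1", b"v2")  # either copy
store.propagate("key-prefix/cool-file.jpg")
assert store.get("key-prefix/cool-file.jpg") == b"v2"           # now settled
```

The window between the second `put` and `propagate` is the eventual-consistency window the scenario above describes.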
Then there was the fussy `404` caveat. This occurred when you issued a `GET` before the `PUT` had finished.
```
GET /key-prefix/cool-file.jpg 404
PUT /key-prefix/cool-file.jpg 200
GET /key-prefix/cool-file.jpg 404
```

Here, because the `GET` happened on the object before the `PUT` was complete, we got a `404`. And because of eventual consistency, it was possible to get a `404` again, because the `PUT` may have still been propagating.
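The `404` caveat behaved a lot like a cached "not found" result. Here is a toy model of that behavior, again illustrative rather than a description of S3 internals: the early miss lingers until it eventually clears.

```python
class NegativeCachingStore:
    """Toy model of the old S3 404 caveat (illustrative, not real S3)."""
    def __init__(self):
        self._objects = {}
        self._cached_404 = set()   # keys whose "not found" result lingers

    def get(self, key):
        if key in self._cached_404 or key not in self._objects:
            self._cached_404.add(key)   # the miss itself sticks around
            return 404
        return 200

    def put(self, key, body):
        self._objects[key] = body       # stored, but an old 404 may linger

    def propagate(self, key):
        self._cached_404.discard(key)   # the stale miss eventually clears

store = NegativeCachingStore()
assert store.get("key-prefix/cool-file.jpg") == 404   # GET before the PUT
store.put("key-prefix/cool-file.jpg", b"data")
assert store.get("key-prefix/cool-file.jpg") == 404   # still 404!
store.propagate("key-prefix/cool-file.jpg")
assert store.get("key-prefix/cool-file.jpg") == 200   # finally visible
```

This is exactly why polling S3 for an object you had just written was unreliable before the change.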
The world is simpler now
In December, AWS announced strong read-after-write consistency for all `GET`, `PUT`, and `LIST` operations in S3 🎉
So what does that mean for the scenarios we talked about up above? Well, it means they are mostly irrelevant now.
Once a `PUT` call of an object to S3 is successful, you can assume that any subsequent `GET` or `LIST` call for that object will return the latest version of the object. Meaning the happy path still works as expected.
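The new guarantee can be modeled with an equally simple toy: once a write completes, every read sees it. Again, this is a sketch of the semantics, not of S3 itself.

```python
class StronglyConsistentStore:
    """Toy model of post-December-2020 S3 semantics (not real S3)."""
    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body        # visible as soon as put() returns

    def get(self, key):
        try:
            return 200, self._objects[key]
        except KeyError:
            return 404, None

s3 = StronglyConsistentStore()
s3.put("key-prefix/cool-file.jpg", b"first")
assert s3.get("key-prefix/cool-file.jpg") == (200, b"first")
s3.put("key-prefix/cool-file.jpg", b"second")  # overwrite
assert s3.get("key-prefix/cool-file.jpg") == (200, b"second")  # no stale read
```

Compare this with the eventually consistent model above: there is no propagation step, and therefore no window where a read can return old data.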
The overwrite scenario will return the latest data as well. But the first `GET` request must happen after all `PUT` requests have finished to guarantee the latest object.
This last bit is important. You can have concurrent processes writing the same object, with different data, to the same bucket. The first process finishes writing the object, and then the next process starts writing the object with new data. Meanwhile, before this second write finishes, we start a `GET` request on the object. In this scenario, our `GET` request can still return the older data, because the second write hasn't yet completed.
Like the above, we can have a scenario where there are simultaneous writes. Meaning that before process one finishes writing the object, process two starts writing to that object as well. This is what we call concurrent writes. In this scenario, S3 uses last-writer-wins semantics, but our `GET` request may return either version until the final write finishes.
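Last-writer-wins can be counterintuitive: the write that *completes* last is the one that sticks, regardless of which one started first. A minimal sketch of that rule (illustrative only, with writes modeled as completion events):

```python
class LastWriteWinsStore:
    """Toy model of last-writer-wins for one object (not real S3)."""
    def __init__(self):
        self._value = None

    def finish_write(self, body):
        # Called when a writer's PUT completes; the latest completion wins.
        self._value = body

    def get(self):
        return self._value

store = LastWriteWinsStore()
# Process one starts writing b"v1" first, process two starts b"v2" second,
# but process two's PUT happens to *complete* before process one's:
store.finish_write(b"v2")
store.finish_write(b"v1")
assert store.get() == b"v1"  # the last write to complete wins
```

If the order you care about matters, you need to coordinate the writers yourself; S3 only promises that, once the dust settles, every reader sees the same (last-completed) object.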
The key here is that for most scenarios S3 now has strong read-after-write consistency. But, there are still edge cases.
S3 gaining strong read-after-write consistency wipes out a lot of challenges. The caveats we were once stuck with go away. But that doesn't mean all our problems are solved. As we saw, there are still things you have to consider when using S3 in an asynchronous environment. It's unrealistic to think S3 can solve those for us as well, because it can't control how our applications write to and read from a given bucket. Keep those special edge cases in mind as you build out an architecture that incorporates S3.