Over two years ago I wrote a blog post on Medium that explained the S3 consistency model. Since then a lot has changed. I quit writing on Medium, as you can see. But also, the S3 consistency model has received a major update.
One of the longest-running AWS services, Simple Storage Service (S3), continues to get new and exciting features every year. As I stated in my original blog post, it’s a powerhouse of a service. It solves a breadth of use cases from data lakes to event-driven architectures. But before this year’s re:Invent, there was always a quirk you had to be careful of: the consistency model.
Before we jump to the ending, let’s remind ourselves of what the consistency model used to be.
There was a happy path scenario. This is where we write a brand-new object to an S3 bucket.
PUT /key-prefix/cool-file.jpg 200
GET /key-prefix/cool-file.jpg 200
When we GET the file right after we PUT, we get a status code of 200 and we know the file is the most up-to-date copy. This is otherwise known as read-after-write consistency.
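For concreteness, here’s a minimal sketch of that happy path using boto3 (the bucket name is made up):

```python
import boto3

s3 = boto3.client("s3")

# Write a brand-new object...
s3.put_object(
    Bucket="my-example-bucket",
    Key="key-prefix/cool-file.jpg",
    Body=b"original bytes",
)

# ...and read it right back. For a brand-new key this was always safe.
response = s3.get_object(Bucket="my-example-bucket", Key="key-prefix/cool-file.jpg")
assert response["Body"].read() == b"original bytes"
```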
But then there was this caveat scenario with overwriting. We write an object to the bucket. Another process writes that object again (with new content), and then we try to read the object.
PUT /key-prefix/cool-file.jpg 200
PUT /key-prefix/cool-file.jpg 200 (new content)
GET /key-prefix/cool-file.jpg 200
Here we ended up with eventual consistency. When we call GET we may receive the file contents of the first PUT or we may receive the second. This was because the overwrite to the same object would have to be propagated behind the scenes.
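In code, the old overwrite race looked roughly like this (a sketch of the historical behavior with a hypothetical bucket; you can’t reproduce it today):

```python
import boto3

s3 = boto3.client("s3")
key = "key-prefix/cool-file.jpg"

s3.put_object(Bucket="my-example-bucket", Key=key, Body=b"v1")  # first write
s3.put_object(Bucket="my-example-bucket", Key=key, Body=b"v2")  # overwrite

# Under the old model, this could legitimately return b"v1" or b"v2"
# until the overwrite finished propagating behind the scenes.
body = s3.get_object(Bucket="my-example-bucket", Key=key)["Body"].read()
```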
Then there was the fussy 404 caveat. This occurred when you would issue a GET before the PUT had finished.
GET /key-prefix/cool-file.jpg 404
PUT /key-prefix/cool-file.jpg 200
GET /key-prefix/cool-file.jpg 404
Here, because the GET happened on the object before the PUT was complete, we got a 404. Because of eventual consistency, it was possible to get a 404 again because the PUT may have still been propagating.
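The usual workaround was a retry loop. Here’s a hypothetical helper along those lines, using boto3’s NoSuchKey exception:

```python
import time

import boto3

s3 = boto3.client("s3")

def get_when_visible(bucket: str, key: str, attempts: int = 5, delay: float = 0.5) -> bytes:
    """Poll until the object shows up -- the kind of retry loop
    the old 404 caveat forced on us."""
    for _ in range(attempts):
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except s3.exceptions.NoSuchKey:
            time.sleep(delay)  # the PUT may still be propagating
    raise TimeoutError(f"{key} never became visible in {bucket}")
```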
The world is simpler now
In December, AWS announced strong read-after-write consistency for all GET, PUT, and LIST operations in S3 🎉
So what does that mean for the scenarios we talked about up above? Well, it means they are mostly irrelevant now.
If a PUT of an object to S3 succeeds, you can assume that any subsequent GET or LIST call for that object will return the latest version of the object. Meaning the happy path still works as expected.
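In other words, something like this now holds for both GET and LIST (hypothetical bucket again):

```python
import boto3

s3 = boto3.client("s3")

s3.put_object(Bucket="my-example-bucket", Key="key-prefix/cool-file.jpg", Body=b"latest")

# Immediately visible to GET...
assert s3.get_object(
    Bucket="my-example-bucket", Key="key-prefix/cool-file.jpg"
)["Body"].read() == b"latest"

# ...and to LIST.
listed = s3.list_objects_v2(Bucket="my-example-bucket", Prefix="key-prefix/")
assert any(obj["Key"] == "key-prefix/cool-file.jpg" for obj in listed["Contents"])
```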
The overwrite scenario will return the latest data as well. But the first GET request must happen after all PUT requests have finished to guarantee the latest object.
This last bit is important. You can have multiple processes writing the same object, with different data, to the same bucket. The first process finishes writing the object, and then the next process starts writing the object with new data. Meanwhile, before this second write finishes, we start a GET request on the object. In this scenario, our GET request can still return the first write’s data, because the second write hasn’t yet completed.
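Here’s a sketch of that timeline, using a background thread to stand in for the second process (names are made up):

```python
import threading

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-example-bucket", "key-prefix/cool-file.jpg"

s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"first")  # write one completes

# Write two starts in the background and is still in flight...
second = threading.Thread(
    target=lambda: s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"second")
)
second.start()

# ...so this GET may return b"first" or b"second": strong consistency
# only guarantees the latest *completed* write.
print(s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read())

second.join()
# Once the second write completes, every GET returns b"second".
assert s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read() == b"second"
```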
Like the above, we can have a scenario with simultaneous writes. Meaning that before process one finishes writing the object, process two starts writing to that object as well. This is what we call concurrent writes. In this scenario, S3 uses last-writer-wins semantics. But our GET request may return either write’s data until the final write finishes.
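And a sketch of the fully concurrent case, where neither write finishes before the other starts (hypothetical names again):

```python
import threading

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-example-bucket", "key-prefix/cool-file.jpg"

writers = [
    threading.Thread(target=lambda b=body: s3.put_object(Bucket=BUCKET, Key=KEY, Body=b))
    for body in (b"process-one", b"process-two")
]
for w in writers:
    w.start()
for w in writers:
    w.join()

# Both writes overlapped, so S3 keeps whichever request it processed
# last (last-writer-wins). From the client side we can't predict which.
final = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
print(final)  # b"process-one" or b"process-two"
```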
The key here is that for most scenarios S3 now has strong read-after-write consistency. But there are still edge cases.
Conclusion
S3 gaining strong read-after-write consistency wipes out a lot of challenges. A lot of the caveats we were once stuck with go away. But that doesn’t mean all our problems are solved. As we saw, there are still things you have to consider when using S3 in an asynchronous environment. It’s unrealistic to think S3 can solve those for us as well, because it can’t control how our applications write to and read from a given bucket. Keep those edge cases in mind as you build out an architecture that incorporates S3.