Cache Invalidation
- Random guy on the internet explains cache invalidation
- What is a cache, and why do we need it?
Reading data from storage can be slow. Take an HDD: it is a literal disk that has to spin to the right position before your data can be read and returned, and several intermediate components have to cooperate for that to work. Imagine you are selling chips, and the chips are kept in a storage room three floors up. Every time a customer comes in with an order, you have to go all the way up and back to get a packet of chips for them. What is the obvious fix here? Keep some of the packets near the front door! That is what a cache is: you keep the stuff from storage that is accessed most frequently somewhere that is faster to fetch from, which in tech words usually means memory. Memory is much faster to read from, it loses its data on restart (most of the time), and it improves the customer experience when it comes to performance!
- Cache is great! Where is the invalidation part?
Now, a cache is great for reading stuff, but you cannot easily use it as a replacement for writes, since your changes need to flow into your storage as well as into any other database replicas you might have across the world. Loading storage data into a cache also raises an obvious question: how long do you keep it there? Cache memory is much more expensive than storage (although this is changing with time, and maybe someday everything will stay in memory), so you cannot just hoard your data in it and hope for the best. You need a programmatic way to remove old data and pull the updated data from the DBs.
Another point is taking care of updates. If you fetched data from storage and your user then updates the storage, your cache needs to refetch the new data to stay relevant. Otherwise, why would the user want data fast if it is outdated? But we cannot make the refresh interval too short either, since refreshing too often can overload your systems and eat up your bandwidth. There have been cases where cache updates became the more expensive operation because the refresh interval was too short, and that will make your bills skyrocket!
One of the most common ways to handle this is a periodic refresh. Take the chips example again: you know the chips at the front door expire in 5 days, so every 4th or 5th day you dump all the packets at the front door (the cache) and reload from storage (assume a hypothetical situation where storage keeps the chips fresh and new chips are also being delivered there). So far so good? Cool. The same thing can be done with a database: you set a TTL (time-to-live) for your cache, which decides when the cache needs to be refreshed, and when that time is up your components update the cache. Easy peasy, we all win!
Not really. This is where the real crux comes in. Imagine your database depends on some external party and your cache recently updated itself.
The customers will be accessing your data directly from the cache at very high speed, as expected. But then someone tells you that one of the entries in your DB is incorrect, inappropriate, or simply corrupt! Going back to the chips example, this is like being told that some of the chips you brought down from storage are expired! The customers will not know this, and they will keep consuming what is loaded in the cache for them until the next refresh. This is where you need a method to invalidate your cache!
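Here is a rough sketch of both ideas together: a TTL-based refresh, and the stale-data problem that a TTL alone does not solve. The `TTLCache` class, its `invalidate` method, the fake clock, and the `db` dict are all hypothetical names made up for this post, not the API of any real caching library.

```python
import time


class TTLCache:
    """Tiny TTL cache: entries expire `ttl_seconds` after being loaded."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader          # function that reads from storage
        self.ttl_seconds = ttl_seconds
        self.clock = clock            # injectable so the example is testable
        self.entries = {}             # key -> (value, expires_at)

    def get(self, key):
        entry = self.entries.get(key)
        now = self.clock()
        if entry is None or now >= entry[1]:
            value = self.loader(key)  # miss or expired: reload from storage
            self.entries[key] = (value, now + self.ttl_seconds)
            return value
        return entry[0]

    def invalidate(self, key):
        """Drop one entry so the next get() reloads it from storage."""
        self.entries.pop(key, None)


# Hypothetical storage and a fake clock we can move forward by hand.
db = {"chips": "expired batch"}
fake_now = [0.0]
cache = TTLCache(loader=db.__getitem__, ttl_seconds=5, clock=lambda: fake_now[0])

cache.get("chips")           # loads the bad entry into the cache
db["chips"] = "fresh batch"  # storage gets fixed after the bad entry is reported
stale = cache.get("chips")   # still "expired batch": the TTL has not run out
cache.invalidate("chips")    # someone forces the entry out
fixed = cache.get("chips")   # reloaded from storage: "fresh batch"
```

Without the `invalidate` call, customers would keep getting the bad entry until the 5 seconds ran out. With a TTL of hours or days, that is a long time to be serving expired chips.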
Manually or automatically, your systems need to check whether the cache contains data that should not be there, and they should allow system admins to override a value or refresh the cache at will! There are many ways of doing this, but having such a mechanism in place is crucial when serving lots of clients with read-heavy systems, since they depend on the cache a lot! Hope this was an okay explanation. I liked covering this topic in this casual manner, so be sure to tag me or comment down below if you would like me to cover other such topics!