Databases grow alongside with business of our clients. It’s a widely known fact that load on systems emerged in production increases with time. Therefore, it could cause performance degradation. Another cause of performance degradation over time is change in business rules. It causes steep leap in load. SSD manufacturers are constantly inventing new technologies, and Microsoft offers for us memory-optimized tables since version 2014. Scalability of both solutions is limited in comparison with caching. It’s important to keep in mind that speeding up drives and increasing memory size doesn’t always provide desirable performance. Ever-growing load/business could be speeded up dramatically using caching of query results.
However, considering the most popular OLTP scenarios, a big obstacle emerges – lack in correct and accurate cache invalidation.
To tell the truth, high-level cache invalidation for MS SQL using SqlDependency is already presented. Unfortunately, this technology has not become widespread due to a plenty of limitations. If you are interested in such limitations here is an official list of them:
Let’s start to conquer the problem
Having been involved into solving performance issues caused by both reasons – growing of database and growing to complexity of queries according to changes in business rules, I would like to share my knowledge and experience in this field.
So, let’s take an example of a similar, but pretty simplified database structure.
In order not to weaken the performance you need to have a proper understanding of the customer’s business. Also, an access to the profiling and monitoring of the production systems would be mandatory.
Description of a real world problem
Before we meet db structure, let’s consider staff turnover. This index is usually the same for a particular industry. For instance, when it comes to retail industry, turnover might be about 1% a day. E.g. the client/organization with the 1000 employees on the payroll usually hires and fires 10 people a day. So, the tables on the diagram below change just 20 times a day in the scope of the organization above. Relatively, a cached query result should be refreshed from DB just 20 times a day. It is a breakthrough opportunity for us to implement invalidation of cache. Diagram is pretty simplified and contains only navigational and reverse properties required for reading comprehension of the query below.
On one hand, a stream of updates is usually significant just on few tables. On the other hand, a bulk of tables/sets is often queried on each operation.
Let me tell a little bit more about analysis in a vacuum. The invalidation of countries is much easier than the staff – the countries are usually configured during launch of a production system once and updated rarely later. Finally, the four tables on the diagram above will be referenced by all the subsystems that are outside of the scope today:
Interviews management subsystem
Business trips management subsystem
Project management subsystem
Vacations management subsystem
Time tracking subsystem, etc.
Therefore, let’s write a query, which will be accelerated up to 10x:
This query returns structure of departments of a specified organization. Result includes a director (navigation property “Director”) of the organization and heads of departments (navigation property “Department.Head”).
And here comes the Solution
First of all, let’s declare conditions on which cache of this query above follows accurate and correct invalidation. It is enough to flush cached query result in 3 cases – if an employee, a department or an organization has been updated during some SaveChanges() call. Obviously, we can not control all of the SaveChanges, so we need to intercept each SaveChanges() call and do our logic in ObjectContext.SavingChanges event handler. All we need is to mark corresponding OrganizationId properties with custom attribute, namely CacheInvalidation. For example:
By the way, a CacheInvalidation attribute is also suitable for database-first behavior. In this case, attribute should target entity class. Besides, optional Property should specify which property refers to an entity id.
In the same way, we will mark OrganizationId and Id properties of Department and Organization classes. This is it. With the help of this metadata, Entity Framework will flush up to two cached query results. Why two? Because we have to flush both the original organization’s info and the current one.
Flushing queries cache according to the country changes is easier. As we don’t have queries with id of a country as a parameter, we will invalidate all the cached query results on each update of any country entity. It is not a problem, since such changes are expected to occur very rarely.
This way, we have already implemented and configured correct and accurate invalidation. It is a transparent process – we do not need to modify both written and not written yet calls to SaveChanges().
Here is the updated query with caching. All we need is to wrap our query into DependenciesExtentions.LazyLoad:
ImageHere are two highlights about this method. First of all, dependencies variable declares invalidation criteria – whose timestamps should be equal to valid cached results. Second of all, old method body has just moved to lambda.
Eventually, we have come to the end – we have added 4 instances of CacheInvalidation attribute to our code-first model. Moreover, we have wrapped our query into dependencies extension method, namely LazyLoad. In addition, we have made decisions of invalidation strategy based on the analysis. Finally, we have injected CacheInvalidator into our DbContext class. That is it.
This solution also includes drawback measurements as well as performance gain measurements.
By the way, the source code of the CacheInvalidation library is an open source, MIT licensed public git repo. We are looking forward to the feedback. Here is a link to bitbucket repo:
To make internals of CacheInvalidation.EF library easier to understand, the flow diagram “How does cache invalidation work” shows exactly what is in its title. It shows interactions of two apps with the SQL Server DB, the Invalidation Storage and the Cache. Diagram illustrates four cases:
1. Query in case of empty cache.
2. Query in case of valid cached query results.
3. Some call to SaveChanges(), which flushes cached query results.
4. Query in case of incorrect cached query results.
To sum up
So, this article covers only performance of the main responsibility of EF – query with caching. We have discussed a solution, which resolves the problem of correct and accurate invalidation of cached query results in distributed environment. Other areas, including performance issues of Entity Framework (relatively slow processing of SaveChanges() and long initialization of Entity Framework on first query), are out of scope.
By the way, principles of cache invalidation used here, are suitable for any ORM, not just Entity Framework. When using Entity Framework, implementation of those principles becomes almost transparent.