Databases grow alongside our clients’ businesses. As the load on a system increases over time, Entity Framework queries can become slow. Another cause of performance degradation is a change in business rules, which tends to make queries more complex.
SSD manufacturers are constantly inventing new technologies, and Microsoft has offered memory-optimized tables since SQL Server 2014. However, faster drives and more memory don’t always solve the problem. In most cases, performance can be improved dramatically by caching query results. But in the most popular OLTP scenarios, a big obstacle is the lack of correct and accurate cache invalidation.
Although high-level cache invalidation for MS SQL Server with the help of SqlDependency has already been presented, this technology has not become widespread due to its numerous limitations.
Before we start
Having been involved in solving performance issues caused by both factors (growing databases and increasingly complex queries driven by changing business rules), I want to share some insights in this field. They should help you improve Entity Framework performance several times over.
So, let’s take a fairly simplified database structure as an example and explore an approach that can help us solve Entity Framework performance issues.
Importantly, to avoid making performance worse, you need a proper understanding of the client’s business. Access to profiling and monitoring of the production systems is also mandatory.
Description of a real-world problem
Before we get to the Entity Framework performance improvements and the DB structure, let’s look at the system. First of all, we need to consider staff turnover. This rate is usually similar across an entire industry. For instance, in the retail industry, turnover might be about 1% a day. It means that a client organization with 1,000 employees on the payroll typically hires about 10 people and lets about 10 go each day.
So, the tables on the diagram below change only about 20 times a day for the organization above. For this reason, a cached query result needs to be refreshed from the DB only about 20 times a day. It is a great opportunity for us to implement cache invalidation. The diagram is heavily simplified and contains only the navigation and reverse properties needed to follow the query below.
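The arithmetic behind this estimate can be checked in a few lines. This is only a back-of-the-envelope sketch in Python. The daily query volume is a made-up figure (the article gives none), used only to show how rare invalidation translates into a high cache hit ratio for a single cached query.

```python
# Back-of-the-envelope estimate of how often a cached result must be refreshed.
employees = 1_000
daily_turnover_percent = 1          # from the retail example in the text

hires_per_day = employees * daily_turnover_percent // 100
departures_per_day = hires_per_day  # turnover replaces the people who leave
changes_per_day = hires_per_day + departures_per_day

# ASSUMPTION: hypothetical read volume, not a figure from the article.
queries_per_day = 50_000

# Each change forces at most one reload; every other query can hit the cache.
hit_ratio = 1 - changes_per_day / queries_per_day
print(changes_per_day, f"{hit_ratio:.2%}")
```

The fewer writes there are relative to reads, the closer the hit ratio gets to 100%, which is why tables like these are good caching candidates.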
On the one hand, only a few tables usually receive a significant stream of updates. On the other hand, many tables/sets are often queried on each operation.
Before we move on, a few more words about analyzing these tables in isolation. The invalidation of countries is much easier than the invalidation of staff: countries are usually configured during the launch of a production system and rarely updated afterward. The four tables on the diagram above will be referenced by all the subsystems that are out of scope today:
- Interviews management subsystem
- Business trips management subsystem
- Project management subsystem
- Vacations management subsystem
- Time tracking subsystem, etc.
So, let’s write a query that will be accelerated up to 10x:
This query returns a structure of departments of a specified organization. The result includes a director (navigation property “Director”) of the organization and heads of departments (navigation property “Department.Head”).
And here comes the solution
First of all, let’s look at the conditions under which the cached result of the query above has to be invalidated for the cache to stay accurate and correct. It is enough to flush cached query results in three cases: if an employee, a department, or an organization has been updated during some SaveChanges() call.
Obviously, we cannot control every call to SaveChanges(), so we need to intercept each one and run our logic in the ObjectContext.SavingChanges event handler. All we need is to mark the corresponding OrganizationId properties with a custom attribute, CacheInvalidation. For example:
By the way, the CacheInvalidation attribute is also suitable for the database-first approach. In this case, the attribute should target the entity class, and the optional Property should specify which property refers to an entity id.
In the same way, we will mark the OrganizationId and Id properties of the Department and Organization classes. That’s it. With the help of this metadata, up to two cached query results will be flushed for a changed entity. Why two? Because an entity can move from one organization to another, we have to flush the cached results of both the original organization and the current one.
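The rule of flushing both the original and the current organization can be illustrated with a small sketch. It is written in Python purely as an illustration of the principle and is not the library’s C# API: `ChangedEntity`, `invalidation_keys`, and the `Organization:<id>` key format are hypothetical names invented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangedEntity:
    """Hypothetical stand-in for one tracked change seen during a save."""
    entity_type: str
    original_org_id: Optional[int]   # None for a newly added entity
    current_org_id: Optional[int]    # None for a deleted entity


def invalidation_keys(changes):
    """Collect the cache keys to flush for a batch of changed entities.

    Both the original and the current organization id are included, so an
    employee moved from one organization to another invalidates both.
    """
    keys = set()
    for change in changes:
        for org_id in (change.original_org_id, change.current_org_id):
            if org_id is not None:
                keys.add(f"Organization:{org_id}")
    return keys


# An employee transferred from organization 1 to organization 2: two keys.
moved = ChangedEntity("Employee", original_org_id=1, current_org_id=2)
# A department renamed inside organization 1: a single key.
renamed = ChangedEntity("Department", original_org_id=1, current_org_id=1)

print(sorted(invalidation_keys([moved])))
print(sorted(invalidation_keys([renamed])))
```

Using a set means that an entity which stays in the same organization produces one key rather than two, which matches the "up to two" wording.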
Flushing the query cache on country changes is easier. As we don’t have queries that take the id of a country as a parameter, we will invalidate all the cached query results on each update of any country entity. It is not a problem since such changes are expected to occur very rarely.
So, we have implemented and configured correct and accurate invalidation. It is a transparent process: neither existing nor future calls to SaveChanges() need to be modified.
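The difference between the two invalidation scopes can be sketched as follows. This is a Python illustration under assumptions, not the library’s code: the in-memory `stamps` dictionary stands in for an invalidation storage, and `touch`, `on_country_saved`, and `on_employee_saved` are hypothetical helpers.

```python
import itertools

# Hypothetical in-memory "invalidation storage": dependency key -> version stamp.
_clock = itertools.count(1)
stamps = {}


def touch(key):
    """Record that everything depending on `key` is now stale."""
    stamps[key] = next(_clock)


def on_country_saved(country_id):
    # No query is parameterised by country id, so the id is ignored and one
    # type-wide key is bumped: every result depending on "Country" is flushed.
    touch("Country")


def on_employee_saved(organization_id):
    # Contrast: employee changes are scoped to a single organization.
    touch(f"Organization:{organization_id}")


on_country_saved(country_id=7)
first = stamps["Country"]
on_country_saved(country_id=8)
on_employee_saved(organization_id=1)
print(stamps)
```

A single type-wide key is coarse, but it keeps the bookkeeping trivial, which is a reasonable trade when the data changes as rarely as countries do.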
Here is the updated query with caching. All we need is to wrap our query into DependenciesExtentions.LazyLoad:
There are two important aspects of this method. First, the dependencies variable declares the invalidation criteria: the entities whose timestamps must match the ones recorded with a cached result for that result to be considered valid. Second, the old method body has simply been moved into a lambda.
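The two aspects can be shown in a minimal Python sketch of the same idea. It is not the library’s C# implementation of LazyLoad. The dictionaries, `invalidate`, `lazy_load`, `get_structure`, and the placeholder department names are all assumptions made for the illustration.

```python
import itertools

_clock = itertools.count(1)
stamps = {}   # hypothetical invalidation storage: dependency key -> stamp
cache = {}    # hypothetical cache: query key -> (stamps seen at load, result)


def invalidate(dependency):
    """Called from the save hook when a dependency has changed."""
    stamps[dependency] = next(_clock)


def lazy_load(query_key, dependencies, load):
    """Return the cached result while every dependency stamp is unchanged."""
    current = {dep: stamps.get(dep, 0) for dep in dependencies}
    entry = cache.get(query_key)
    if entry is not None and entry[0] == current:
        return entry[1]                      # stamps match: still valid
    result = load()                          # the original query body
    cache[query_key] = (current, result)
    return result


calls = []   # records every time the "database" is actually queried


def get_structure(org_id):
    # Aspect 1: declare what the cached result depends on.
    dependencies = [f"Organization:{org_id}", "Country"]

    # Aspect 2: the old method body, moved into a function (lambda in C#).
    def query():
        calls.append(org_id)
        return {"organization": org_id, "departments": ["Sales", "IT"]}  # placeholder data

    return lazy_load(f"structure:{org_id}", dependencies, query)


get_structure(1)                 # miss: runs the query
get_structure(1)                 # hit: served from the cache
invalidate("Organization:1")     # e.g. an employee of organization 1 changed
get_structure(1)                 # stamps differ: runs the query again
print(len(calls))
```

The stamps are read before the query runs. If an invalidation arrives while the query is executing, the stored stamps are already outdated, so the next call reloads instead of serving a result that may be stale.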
In the end, we have added four instances of the CacheInvalidation attribute to our code-first model. Besides, we have wrapped our query in a dependencies extension method called LazyLoad. In addition, we have chosen an invalidation strategy based on the analysis. Finally, we have injected CacheInvalidator into our DbContext class. That is it.
The solution also includes measurements of both its overhead and its performance gain.
By the way, the CacheInvalidation library is open source: it is MIT-licensed and lives in a public Git repository.
Here is a link to the Bitbucket repo.
To make the internals of the CacheInvalidation.EF library easier to understand, take a look at the flow diagram “How does cache invalidation work”. It shows the interactions of two apps with the SQL Server DB, the Invalidation Storage, and the Cache.
The diagram illustrates four cases:
- Query in case of empty cache.
- Query in case of valid cached query results.
- Some call to SaveChanges(), which flushes cached query results.
- Query in case of incorrect cached query results.
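The four cases can be walked through with a toy model. The Python below is only a sketch under assumptions: plain dictionaries stand in for the SQL Server DB, the Invalidation Storage, and the Cache, and the `App` and `SharedState` classes and the employee names are hypothetical.

```python
class SharedState:
    """Hypothetical stand-ins for the Invalidation Storage and the Cache."""
    def __init__(self):
        self.stamps = {}    # dependency key -> version counter
        self.cache = {}     # query key -> (stamp at load time, rows)


class App:
    def __init__(self, name, shared, database):
        self.name, self.shared, self.db = name, shared, database

    def query(self, org_id):
        key = f"Organization:{org_id}"
        stamp = self.shared.stamps.get(key, 0)
        entry = self.shared.cache.get(key)
        if entry is None:
            outcome = "empty cache"
        elif entry[0] == stamp:
            return "valid cache", entry[1]
        else:
            outcome = "stale cache"
        rows = list(self.db[org_id])                 # read from the database
        self.shared.cache[key] = (stamp, rows)
        return outcome, rows

    def save_changes(self, org_id, employee):
        self.db[org_id].append(employee)             # write to the database
        key = f"Organization:{org_id}"
        self.shared.stamps[key] = self.shared.stamps.get(key, 0) + 1


database = {1: ["Ann"]}
shared = SharedState()
app_a = App("A", shared, database)
app_b = App("B", shared, database)

log = [
    app_a.query(1),              # case 1: empty cache
    app_b.query(1),              # case 2: valid cached result
]
app_b.save_changes(1, "Bob")     # case 3: the save flushes the cached result
log.append(app_a.query(1))       # case 4: cached result is no longer valid
for outcome, rows in log:
    print(outcome, rows)
```

Because both apps consult the same stamps, a save made by one app is noticed by the other on its next query, which is what makes the approach work in a distributed environment.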
In this article, we have discussed an Entity Framework best practice that solves the problem of correct and accurate invalidation of cached query results in a distributed environment. Other areas, including performance issues of Entity Framework itself (relatively slow processing of SaveChanges() and long initialization of Entity Framework on the first query), are out of scope.
However, the principles of cache invalidation and the Entity Framework performance tips used here are suitable for any ORM, not just Entity Framework. When using Entity Framework, the implementation of those principles becomes almost transparent. We hope this helps you make your Entity Framework applications much faster.