Only get the row you need

And the key reason for your slow query is it outputs too much data. Consider adding: Take(200), Skip() to take only the data you need or the current page requires. Use a pager to generate the report. This might helps a lot.

Consider the following code (BAD Example)

var myArticles = await _dbContext
    .Articles
    .Where(t => t.OwnerId == me.Id)
    .ToListAsync();

var theFirstArticle = myArticles[0];
var theSecondArticle = myArticles[1];

The system will load all data to get only your second article.

var myArticles = await _dbContext
    .Articles
    .Where(t => t.OwnerId == me.Id)
    .Take(2) // Only get two items.
    .ToListAsync();

var theFirstArticle = myArticles[0];
var theSecondArticle = myArticles[1];
	

Only get the column you need

Imagine you are only querying the hits count of a post, you may do:

var hits = await (dbContext.Posts.FirstOrDefaultAsync(p => p.Id == postId)).Hits;

But that query will query ALL properties of that post, may including the content, which might be very slow.

Since you only need the hits data, you can do:

var hits = await dbContext
    .Posts
    .Where(p => p.Id == postId)
    .Select(p => p.Hits)
    .FirstOrDefaultAsync();

But what if you need more than hits data, but two properties: CreationTime and Hits?

Solution is very simple. You can manually select all properties and map to a new instance like this:

var postWithHitsAndCreationTime = await dbContext
    .Posts
    .Where(p => p.Id == postId)
    .Select(p => new Post
    {
        Hits = p.Hits,
        CreationTime = p.CreationTime // <- Only select the properties you need.
    })
    .FirstOrDefaultAsync();

While that solution is efficient, the code might be ugly. The returned postWithHitsAndCreationTime may have a lot of null values for those you didn't select.

To solve that, you may need third-party libraries like AutoMapper which helps you do ProjectTo<PostDto> on the entity, to convert that as a dto, which only has the properties you need.

For more about AutoMapper:

https://docs.automapper.org/en/stable/Queryable-Extensions.html

Consider about IQueryable and IEnumerable

When we are writing a query to our database, we need to be careful about what we are currently working with. For most LINQ operations supports both IQueryable and IEnumerable.

IQueryable is a query definition. It will not be loaded if not necessary. Extend this object only changes how EF generates the SQL.

IEnumerable is a collection of data that can be enumerated. It will be loaded once you access it. For example, if you add a filter to it and iterate the collection, the system will try to get all data before running the filter.

Consider the following code (BAD Example):

        public IEnumerable<Topic> GetTopics()
        {
            // Return an IQueryable.
            return _dbContext.Topics;
        }

        public object SomeAction()
        {
            var myTopics = GetTopics()
                .Where(t => t.SummitId == 5); // This `Where` method comes from `System.Linq`. Not from `System.Linq.Queryable`

            return myTopics;
        }

The code calls GetTopics. While the code defines an EF query and returns it, the upper level can't understand that this is a query definition. But be treated as a collection of data.

Change that to:

        public IQueryable<Topic> GetAllTopicsQuery()
        {
            var myTopics = await _dbContext
                .Topics // Topics is an IQueryable. So the query will only happed when necessary.
                .Where(t => t.SummitId == 5); // This `Where` method comes from `System.Linq.Queryable`. It defines the query.

            return myTopics;
        }

Call ToList immediately when your query can load all you need

Calling the ToList method from an IQueryable object will do the query job and loads all data in the memory immediately. So when you can make sure that your current query loads everything you need, just call ToList to prevent system lazy load.

If you are not trying to get a collection of data, but only one item, call SingleOrDefault or FirstOrDefault instead.

Consider the following code (BAD Example):

            var allTopics = _dbContext
                .Topics
                .Where(t => t.SummitId == summitId);
            foreach (var topic in allTopics) // Not called ToList(). So we are iterating an IQueryable.
            {
                // do something.
                bigName += topic.Title;
            }

The system will load the topic one by one while executing your logic. This gonna be slow.

Change that to:

            var allTopicTitles = await _dbContext
                .Topics
                .Where(t => t.SummitId == summitId)
                .Select(t => t.Title)
                .ToListAsync();
            foreach (var topicTitle in allTopicTitles) // allTopicTitles is a List<string>. This loop happens in the memory.
            {
                // do something.
                bigName += topicTitle;
            }

The first two tips indicate that:

  • If you are building a query and not everything in the query is needed, write code with IQueryable.
  • If your query is built fine and everything you gonna query is necessary for the logic, make it a list immediately to prevent further IO operations.

Use FirstOrDefault to replace SingleOrDefault() for primary key queries

In C#, SingleOrDefault() will throw an exception if the results are more than 1. But FirstOrDefault() will return the first item.

For Entity Framework, SingleOrDefaultAsync() will be translated to top 2 while FirstOrDefaultAsync() will be translated to top 1.

Consider the following query (Bad example):

var appId = 12345;
var myApple = await dbContext
    .Apples
    .Where(a => a.Id == 12345) // Id is the primary key.
    .SingleOrDefaultAsync();

Since it works, and can correctly get the apple you need. The problem is that it will be translated to SELECT TOP 2 column_name(s) FROM Apples.

Since you already know that Id is the primary key. And it is impossible to get 2 apples with the same ID. You don't need that top 2, which will scan the entire table even after SQL already found the correct result.

Change that to:

var appId = 12345;
var myApple = await dbContext
    .Apples
    .Where(a => a.Id == 12345) // Id is the primary key.
    .FirstOrDefaultAsync(); // <- This doesn't harm the consistency.

Prevent tracking

Consider adding AsNoTracking for this makes query performance better if you don't need to save the changes.

Reference: https://docs.microsoft.com/en-us/ef/core/querying/tracking#no-tracking-queries

Prevent Include

Include generates SQL to select multiple tables. Which greatly increased complexity. You can only select the data you need and prevent writing the Include function.

For example, if you only want to get the last ball in the box, consider writing like this:

public class Box
{
    public int Id { get; set; }
    public IEnumerable<Ball> Balls { get; set; }
}

public class Ball
{
    public int Id { get; set; }

    public int BoxId { get; set; }
    public Box Box { get; set; }
}


    var boxes = await Boxes
                // DO NOT Call Include(t => t.Balls) here!
                .Where(somecondition)
                .Select(t => new Box(){
                  Id = t.Id,
                  Balls = t.Balls.OrderByDescending(x => x.CreationTime)
                             .Take(1) // Only get what you need
                })               
                .ToListAsync()

Also when we use Select we can remove .Include because it won’t have any effect here.

Will the projection property be null if I dont use include above? No, It won’t! Select will take care of it. Give it a try!

Prevent client-side evaluation

In the old version of EF, client-side evaluation is supported. So if you are trying to do some complicated calculation with DateTime or TimeSpan, EF will load all the data and do the evaluation locally. (Fixed in .NET Core 3.0)

Client-side evaluation means that EF will query all data in the database and typically very slow. Consider the following code(BAD Example):

            var viewers = await _dbContext
                .Events
                .Where(t => t.CreationTime + TimeSpan.FromMinutes(1) > DateTime.UtcNow)
                .Select(t => t.Fingerprint)
                .Distinct()
                .CountAsync();

EF can't translate the timespan calculation to SQL. So it has to load all data from the database and query from the client-side.

To fix this, evaluate the time first in code and build the query then.

            var startCountTime = DateTime.UtcNow - TimeSpan.FromMinutes(1);
            var viewers = await _dbContext
                .Events
                .Where(t => t.CreationTime > startCountTime)
                .Select(t => t.Fingerprint)
                .Distinct()
                .CountAsync();