Message ID: 20180810080106.10714-6-stewart@linux.ibm.com
State: Accepted
Series: Performance for ALL THE THINGS!
On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
> In constructing the list of patches for a project, there are two
> main queries that are executed:
> 1) get a count() of how many patches there are
> 2) Get the page of results being displayed
>
> In a test dataset of ~11500 LKML patches and ~4000 others, the existing
> code would take around 585ms and 858ms with a cold cache and 28ms and
> 198ms for a warm cache.
>
> By adding a covering index, we get down to 4ms and 255ms for a cold
> cache, and 4ms and 143ms for a warm cache!
>
> Additionally, when there's a lot of archived or accepted patches
> (I used ~11000 archived out of the 15000 total in my test set)
> the query time goes from 28ms and 72ms down to 2ms and 33-40ms!
>
> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>

As before, I'm trusting your leet skillz here in all DB-related things.
I don't have a large enough dataset to validate this properly but I am
seeing a performance improvement in the smaller set I do have.
Reviewed-by: Stephen Finucane <stephen@that.guru>

Stephen

> ---
>  .../0028_add_list_covering_index.py | 19 +++++++++++++++++++
>  patchwork/models.py                 |  6 ++++++
>  2 files changed, 25 insertions(+)
>  create mode 100644 patchwork/migrations/0028_add_list_covering_index.py
>
> diff --git a/patchwork/migrations/0028_add_list_covering_index.py b/patchwork/migrations/0028_add_list_covering_index.py
> new file mode 100644
> index 000000000000..65ebaefbead7
> --- /dev/null
> +++ b/patchwork/migrations/0028_add_list_covering_index.py
> @@ -0,0 +1,19 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.11.15 on 2018-08-09 17:24
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +
> +
> +class Migration(migrations.Migration):
> +
> +    dependencies = [
> +        ('patchwork', '0027_add_comment_date_index'),
> +    ]
> +
> +    operations = [
> +        migrations.AddIndex(
> +            model_name='patch',
> +            index=models.Index(fields=['archived', 'patch_project', 'state', 'delegate'], name='patch_list_covering_idx'),
> +        ),
> +    ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index d2389cfdad29..15224ad69cfa 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -598,6 +598,12 @@ class Patch(SeriesMixin, Submission):
>      if django.VERSION >= (1, 10):
>          base_manager_name = 'objects'
>
> +    indexes = [
> +        # This is a covering index for the /list/ query
> +        models.Index(fields=['archived','patch_project','state','delegate'],
> +                     name='patch_list_covering_idx'),
> +    ]
> +
>
>  class Comment(EmailMixin, models.Model):
>      # parent
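For context, the two list queries named in the quoted commit message boil down to a COUNT plus a paginated SELECT. A hedged sketch using Python's built-in sqlite3 module (the schema, filter, and page size here are illustrative assumptions, not Patchwork's actual SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for Patchwork's patch table (assumed schema).
conn.execute("CREATE TABLE patch (id INTEGER PRIMARY KEY, "
             "patch_project_id INTEGER, archived INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO patch (patch_project_id, archived, name) VALUES (?, ?, ?)",
    [(1, 0, "patch %d" % i) for i in range(250)])

PAGE_SIZE = 100  # illustrative; not Patchwork's real page size

# Query 1: count() of matching patches, used to build the paginator.
total = conn.execute(
    "SELECT COUNT(*) FROM patch WHERE patch_project_id = ? AND archived = 0",
    (1,)).fetchone()[0]

# Query 2: fetch one page of results (here, the second page).
page = conn.execute(
    "SELECT id, name FROM patch WHERE patch_project_id = ? AND archived = 0 "
    "ORDER BY id LIMIT ? OFFSET ?",
    (1, PAGE_SIZE, PAGE_SIZE)).fetchall()

print(total, len(page))
```

Both queries filter on the same columns, which is why a single well-ordered index can serve them together.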
Stephen Finucane <stephen@that.guru> writes:
> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> In constructing the list of patches for a project, there are two
>> main queries that are executed:
>> 1) get a count() of how many patches there are
>> 2) Get the page of results being displayed
>>
>> In a test dataset of ~11500 LKML patches and ~4000 others, the existing
>> code would take around 585ms and 858ms with a cold cache and 28ms and
>> 198ms for a warm cache.
>>
>> By adding a covering index, we get down to 4ms and 255ms for a cold
>> cache, and 4ms and 143ms for a warm cache!
>>
>> Additionally, when there's a lot of archived or accepted patches
>> (I used ~11000 archived out of the 15000 total in my test set)
>> the query time goes from 28ms and 72ms down to 2ms and 33-40ms!
>>
>> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
>
> As before, I'm trusting your leet skillz here in all DB-related things.
> I don't have a large enough dataset to validate this properly but I am
> seeing a performance improvement in the smaller set I do have.

Simply put, an index is just a tree with a key of something like this:

|IndexField1|IndexField2|PrimaryKeyOrPointerToHowToGetFullRow|

So, if all the fields you need to answer a query are in the index, the
database engine doesn't *need* to go and find the real row: it can
answer the query from the index alone, saving potentially a *lot* of
disk seeks as well as being a lot more cache friendly. The index also
groups entries on disk in index order, so if we order its fields the
right way for us, we get a cache-friendly structure that keeps
information on a project's patches close together on disk, in an order
we're likely to request.
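The effect Stewart describes can be observed with SQLite's query planner. A minimal sketch using Python's built-in sqlite3 module (the simplified table schema is an assumption standing in for Patchwork's real patch table): when every column a query touches lives in the index, the plan reports a covering index and never visits the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema mirroring the fields named in the patch.
conn.execute("""
    CREATE TABLE patch (
        id INTEGER PRIMARY KEY,
        archived INTEGER,
        patch_project_id INTEGER,
        state_id INTEGER,
        delegate_id INTEGER,
        content TEXT
    )
""")
conn.execute("""
    CREATE INDEX patch_list_covering_idx
    ON patch (archived, patch_project_id, state_id, delegate_id)
""")

# Every column this query touches is in the index, so the engine never
# has to fetch a full table row to answer it.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(*) FROM patch
    WHERE archived = 0 AND patch_project_id = 1 AND state_id = 2
""").fetchall()
detail = plan[0][-1]
print(detail)
```

SQLite flags this in the plan detail as a "COVERING INDEX" search; dropping the index, or selecting a column outside it (such as `content`), makes the planner fall back to fetching table rows.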