[05/11] Add covering index for /list/ query

Message ID 20180810080106.10714-6-stewart@linux.ibm.com
State Accepted
Headers show
Series
  • Performance for ALL THE THINGS!
Related show

Commit Message

Stewart Smith Aug. 10, 2018, 8:01 a.m.
In constructing the list of patches for a project, there are two
main queries that are executed:
1) get a count() of how many patches there are
2) Get the page of results being displayed

In a test dataset of ~11500 LKML patches and ~4000 others, the existing
code would take around 585ms and 858ms with a cold cache and 28ms and
198ms for a warm cache.

By adding a covering index, we get down to 4ms and 255ms for a cold
cache, and 4ms and 143ms for a warm cache!

Additionally, when there's a lot of archived or accepted patches
(I used ~11000 archived out of the 15000 total in my test set)
the query time goes from 28ms and 72ms down to 2ms and 33-40ms!

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
---
 .../0028_add_list_covering_index.py           | 19 +++++++++++++++++++
 patchwork/models.py                           |  6 ++++++
 2 files changed, 25 insertions(+)
 create mode 100644 patchwork/migrations/0028_add_list_covering_index.py

Comments

Stephen Finucane Aug. 31, 2018, 2:10 p.m. | #1
On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
> In constructing the list of patches for a project, there are two
> main queries that are executed:
> 1) get a count() of how many patches there are
> 2) Get the page of results being displayed
> 
> In a test dataset of ~11500 LKML patches and ~4000 others, the existing
> code would take around 585ms and 858ms with a cold cache and 28ms and
> 198ms for a warm cache.
> 
> By adding a covering index, we get down to 4ms and 255ms for a cold
> cache, and 4ms and 143ms for a warm cache!
> 
> Additionally, when there's a lot of archived or accepted patches
> (I used ~11000 archived out of the 15000 total in my test set)
> the query time goes from 28ms and 72ms down to 2ms and 33-40ms!
> 
> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>

As before, I'm trusting your leet skillz here in all DB-related things.
I don't have a large enough dataset to validate this properly but I am
seeing a performance improvement in the smaller set I do have.

Reviewed-by: Stephen Finucane <stephen@that.guru>

Stephen

> ---
>  .../0028_add_list_covering_index.py           | 19 +++++++++++++++++++
>  patchwork/models.py                           |  6 ++++++
>  2 files changed, 25 insertions(+)
>  create mode 100644 patchwork/migrations/0028_add_list_covering_index.py
> 
> diff --git a/patchwork/migrations/0028_add_list_covering_index.py b/patchwork/migrations/0028_add_list_covering_index.py
> new file mode 100644
> index 000000000000..65ebaefbead7
> --- /dev/null
> +++ b/patchwork/migrations/0028_add_list_covering_index.py
> @@ -0,0 +1,19 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.11.15 on 2018-08-09 17:24
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +
> +
> +class Migration(migrations.Migration):
> +
> +    dependencies = [
> +        ('patchwork', '0027_add_comment_date_index'),
> +    ]
> +
> +    operations = [
> +        migrations.AddIndex(
> +            model_name='patch',
> +            index=models.Index(fields=['archived', 'patch_project', 'state', 'delegate'], name='patch_list_covering_idx'),
> +        ),
> +    ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index d2389cfdad29..15224ad69cfa 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -598,6 +598,12 @@ class Patch(SeriesMixin, Submission):
>          if django.VERSION >= (1, 10):
>              base_manager_name = 'objects'
>  
> +        indexes = [
> +            # This is a covering index for the /list/ query
> +            models.Index(fields=['archived','patch_project','state','delegate'],
> +                         name='patch_list_covering_idx'),
> +            ]
> +
>  
>  class Comment(EmailMixin, models.Model):
>      # parent
Stewart Smith Sept. 10, 2018, 2:50 a.m. | #2
Stephen Finucane <stephen@that.guru> writes:
> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> In constructing the list of patches for a project, there are two
>> main queries that are executed:
>> 1) get a count() of how many patches there are
>> 2) Get the page of results being displayed
>> 
>> In a test dataset of ~11500 LKML patches and ~4000 others, the existing
>> code would take around 585ms and 858ms with a cold cache and 28ms and
>> 198ms for a warm cache.
>> 
>> By adding a covering index, we get down to 4ms and 255ms for a cold
>> cache, and 4ms and 143ms for a warm cache!
>> 
>> Additionally, when there's a lot of archived or accepted patches
>> (I used ~11000 archived out of the 15000 total in my test set)
>> the query time goes from 28ms and 72ms down to 2ms and 33-40ms!
>> 
>> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
>
> As before, I'm trusting your leet skillz here in all DB-related things.
> I don't have a large enough dataset to validate this properly but I am
> seeing a performance improvement in the smaller set I do have.

Simply put, an index is just a tree with a key of
something like this: |IndexField1|IndexField|PrimaryKeyOrPointerToHowtoGetFullRow|
So, if you have all the fields you need to answer a query in the index,
the database engine doesn't *need* to go and find the real row, it can
answer the query just from the index - saving potentially *lot* of disk
seeks as well as being a lot more cache friendly - especially as this
will group things on disk based on the order of the index, so if we
include things the right way for us, we get a cache friendly structure
that groups information on patches in a project close together on disk
and in an order that we're likely to request.

Patch

diff --git a/patchwork/migrations/0028_add_list_covering_index.py b/patchwork/migrations/0028_add_list_covering_index.py
new file mode 100644
index 000000000000..65ebaefbead7
--- /dev/null
+++ b/patchwork/migrations/0028_add_list_covering_index.py
@@ -0,0 +1,19 @@ 
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-09 17:24
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('patchwork', '0027_add_comment_date_index'),
+    ]
+
+    operations = [
+        migrations.AddIndex(
+            model_name='patch',
+            index=models.Index(fields=['archived', 'patch_project', 'state', 'delegate'], name='patch_list_covering_idx'),
+        ),
+    ]
diff --git a/patchwork/models.py b/patchwork/models.py
index d2389cfdad29..15224ad69cfa 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -598,6 +598,12 @@  class Patch(SeriesMixin, Submission):
         if django.VERSION >= (1, 10):
             base_manager_name = 'objects'
 
+        indexes = [
+            # This is a covering index for the /list/ query
+            models.Index(fields=['archived','patch_project','state','delegate'],
+                         name='patch_list_covering_idx'),
+            ]
+
 
 class Comment(EmailMixin, models.Model):
     # parent