Message ID: 20180810080106.10714-6-stewart@linux.ibm.com
State: Accepted
Series: Performance for ALL THE THINGS!
On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
> In constructing the list of patches for a project, there are two
> main queries that are executed:
> 1) get a count() of how many patches there are
> 2) Get the page of results being displayed
>
> In a test dataset of ~11500 LKML patches and ~4000 others, the existing
> code would take around 585ms and 858ms with a cold cache and 28ms and
> 198ms for a warm cache.
>
> By adding a covering index, we get down to 4ms and 255ms for a cold
> cache, and 4ms and 143ms for a warm cache!
>
> Additionally, when there's a lot of archived or accepted patches
> (I used ~11000 archived out of the 15000 total in my test set)
> the query time goes from 28ms and 72ms down to 2ms and 33-40ms!
>
> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>

As before, I'm trusting your leet skillz here in all DB-related things.
I don't have a large enough dataset to validate this properly but I am
seeing a performance improvement in the smaller set I do have.
Reviewed-by: Stephen Finucane <stephen@that.guru>

Stephen

> ---
>  .../0028_add_list_covering_index.py | 19 +++++++++++++++++++
>  patchwork/models.py                 |  6 ++++++
>  2 files changed, 25 insertions(+)
>  create mode 100644 patchwork/migrations/0028_add_list_covering_index.py
>
> diff --git a/patchwork/migrations/0028_add_list_covering_index.py b/patchwork/migrations/0028_add_list_covering_index.py
> new file mode 100644
> index 000000000000..65ebaefbead7
> --- /dev/null
> +++ b/patchwork/migrations/0028_add_list_covering_index.py
> @@ -0,0 +1,19 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.11.15 on 2018-08-09 17:24
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +
> +
> +class Migration(migrations.Migration):
> +
> +    dependencies = [
> +        ('patchwork', '0027_add_comment_date_index'),
> +    ]
> +
> +    operations = [
> +        migrations.AddIndex(
> +            model_name='patch',
> +            index=models.Index(fields=['archived', 'patch_project', 'state', 'delegate'], name='patch_list_covering_idx'),
> +        ),
> +    ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index d2389cfdad29..15224ad69cfa 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -598,6 +598,12 @@ class Patch(SeriesMixin, Submission):
>      if django.VERSION >= (1, 10):
>          base_manager_name = 'objects'
>
> +    indexes = [
> +        # This is a covering index for the /list/ query
> +        models.Index(fields=['archived','patch_project','state','delegate'],
> +                     name='patch_list_covering_idx'),
> +    ]
> +
>
>  class Comment(EmailMixin, models.Model):
>      # parent
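For context, the two list queries named in the quoted commit message boil down to a COUNT plus a paginated SELECT. A hedged sketch using Python's built-in sqlite3 module (the schema, filter, and page size here are illustrative assumptions, not Patchwork's actual SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for Patchwork's patch table (assumed schema).
conn.execute("CREATE TABLE patch (id INTEGER PRIMARY KEY, "
             "patch_project_id INTEGER, archived INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO patch (patch_project_id, archived, name) VALUES (?, ?, ?)",
    [(1, 0, "patch %d" % i) for i in range(250)])

PAGE_SIZE = 100  # illustrative; not Patchwork's real page size

# Query 1: count() of matching patches, used to build the paginator.
total = conn.execute(
    "SELECT COUNT(*) FROM patch WHERE patch_project_id = ? AND archived = 0",
    (1,)).fetchone()[0]

# Query 2: fetch one page of results (here, the second page).
page = conn.execute(
    "SELECT id, name FROM patch WHERE patch_project_id = ? AND archived = 0 "
    "ORDER BY id LIMIT ? OFFSET ?",
    (1, PAGE_SIZE, PAGE_SIZE)).fetchall()

print(total, len(page))
```

Both queries filter on the same columns, which is why a single well-ordered index can serve them together.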
Stephen Finucane <stephen@that.guru> writes:
> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> In constructing the list of patches for a project, there are two
>> main queries that are executed:
>> 1) get a count() of how many patches there are
>> 2) Get the page of results being displayed
>>
>> In a test dataset of ~11500 LKML patches and ~4000 others, the existing
>> code would take around 585ms and 858ms with a cold cache and 28ms and
>> 198ms for a warm cache.
>>
>> By adding a covering index, we get down to 4ms and 255ms for a cold
>> cache, and 4ms and 143ms for a warm cache!
>>
>> Additionally, when there's a lot of archived or accepted patches
>> (I used ~11000 archived out of the 15000 total in my test set)
>> the query time goes from 28ms and 72ms down to 2ms and 33-40ms!
>>
>> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
>
> As before, I'm trusting your leet skillz here in all DB-related things.
> I don't have a large enough dataset to validate this properly but I am
> seeing a performance improvement in the smaller set I do have.

Simply put, an index is just a tree with a key of something like this:

|IndexField1|IndexField2|PrimaryKeyOrPointerToHowToGetFullRow|

So, if all the fields you need to answer a query are in the index, the
database engine doesn't *need* to go and find the real row: it can
answer the query from the index alone, saving potentially a *lot* of
disk seeks as well as being a lot more cache friendly. The index also
groups entries on disk in index order, so if we order its fields the
right way for us, we get a cache-friendly structure that keeps
information on a project's patches close together on disk, in an order
we're likely to request.
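The effect Stewart describes can be observed with SQLite's query planner. A minimal sketch using Python's built-in sqlite3 module (the simplified table schema is an assumption standing in for Patchwork's real patch table): when every column a query touches lives in the index, the plan reports a covering index and never visits the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema mirroring the fields named in the patch.
conn.execute("""
    CREATE TABLE patch (
        id INTEGER PRIMARY KEY,
        archived INTEGER,
        patch_project_id INTEGER,
        state_id INTEGER,
        delegate_id INTEGER,
        content TEXT
    )
""")
conn.execute("""
    CREATE INDEX patch_list_covering_idx
    ON patch (archived, patch_project_id, state_id, delegate_id)
""")

# Every column this query touches is in the index, so the engine never
# has to fetch a full table row to answer it.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(*) FROM patch
    WHERE archived = 0 AND patch_project_id = 1 AND state_id = 2
""").fetchall()
detail = plan[0][-1]
print(detail)
```

SQLite flags this in the plan detail as a "COVERING INDEX" search; dropping the index, or selecting a column outside it (such as `content`), makes the planner fall back to fetching table rows.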