Message ID | 20180308155437.8082-1-dja@axtens.net |
---|---|
State | Accepted |
Headers | show |
Series | Fix slow Patch counting query | expand |
> (It turns out all of this is all for the "| NN patches" counter we > added to the filter bar!!) Ah, this is wrong, the paginator needs it too. > I think we drop the patch counter for stable and try to get this and So - AFAICT - we can't fix this in stable. > the event stuff merged to master promptly, and just tag 2.1. (To that So we should tag 2.1 sooner rather than later! Regards, Daniel > end, I will re-read and finish reviewing the event stuff soon.) > --- > patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++ > patchwork/models.py | 4 +++ > patchwork/parser.py | 1 + > patchwork/views/__init__.py | 2 +- > 4 files changed, 45 insertions(+), 1 deletion(-) > create mode 100644 patchwork/migrations/0024_patch_patch_project.py > > diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py > new file mode 100644 > index 000000000000..76d8f144c9dd > --- /dev/null > +++ b/patchwork/migrations/0024_patch_patch_project.py > @@ -0,0 +1,39 @@ > +# -*- coding: utf-8 -*- > +# Generated by Django 1.11.10 on 2018-03-08 01:51 > +from __future__ import unicode_literals > + > +from django.db import migrations, models > +import django.db.models.deletion > + > + > +class Migration(migrations.Migration): > + # per migration 16, but note this seems to be going away > + # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113) > + atomic = False > + > + dependencies = [ > + ('patchwork', '0023_timezone_unify'), > + ] > + > + operations = [ > + migrations.AddField( > + model_name='patch', > + name='patch_project', > + field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), > + preserve_default=False, > + ), > + > + # as with 10, this will break if you use non-default table names > + migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id = > + (SELECT project_id FROM patchwork_submission > + WHERE patchwork_submission.id = > + patchwork_patch.submission_ptr_id);''' > + ), > + > + migrations.AlterField( > + model_name='patch', > + name='patch_project', > + field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), > + ), > + > + ] > diff --git a/patchwork/models.py b/patchwork/models.py > index b2491752f04a..3b905c4cd75b 100644 > --- a/patchwork/models.py > +++ b/patchwork/models.py > @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission): > archived = models.BooleanField(default=False) > hash = HashField(null=True, blank=True) > > + # duplicate project from submission in subclass so we can count the > + # patches in a project without needing to do a JOIN. > + patch_project = models.ForeignKey(Project, on_delete=models.CASCADE) > + > objects = PatchManager() > > @staticmethod > diff --git a/patchwork/parser.py b/patchwork/parser.py > index 803e98592fa8..805037c72d73 100644 > --- a/patchwork/parser.py > +++ b/patchwork/parser.py > @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None): > patch = Patch.objects.create( > msgid=msgid, > project=project, > + patch_project=project, > name=name[:255], > date=date, > headers=headers, > diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py > index 3baf2999a836..f8d23a388ac7 100644 > --- a/patchwork/views/__init__.py > +++ b/patchwork/views/__init__.py > @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, > context['filters'].set_status(filterclass, setting) > > if patches is None: > - patches = Patch.objects.filter(project=project) > + patches = Patch.objects.filter(patch_project=project) > > # annotate with tag counts > patches = patches.with_tag_counts(project) > -- > 2.14.1
I have fixed up a small error where I didn't update a test and applied this to master. I'm just waiting for Travis CI to run on my repository, and then I will push it to the main repo. As this is very important to OzLabs, I will spin a 2.1-rc1 soon. Regards, Daniel Daniel Axtens <dja@axtens.net> writes: > Stephen Rothwell noticed (way back in September - sorry Stephen!) that > the following query is really slow on OzLabs: > > SELECT COUNT(*) AS "__count" FROM "patchwork_patch" > INNER JOIN "patchwork_submission" ON > ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id") > WHERE ("patchwork_submission"."project_id" = 14 AND > "patchwork_patch"."state_id" IN > (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 > WHERE U0."action_required" = true > ORDER BY U0."ordering" ASC)); > > I think this is really slow because we have to join the patch and > submission table to get the project id, which we need to filter the > patches. > > Duplicate the project id in the patch table itself, which allows us to > avoid the JOIN. > > The new query reads as: > SELECT COUNT(*) AS "__count" FROM "patchwork_patch" > WHERE ("patchwork_patch"."patch_project_id" = 1 AND > "patchwork_patch"."state_id" IN > (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 > WHERE U0."action_required" = true > ORDER BY U0."ordering" ASC)); > > Very simple testing on a small, artifical Postgres instance (3 > projects, 102711 patches), shows speed gains of ~1.5-5x for this > query. Looking at Postgres' cost estimates (EXPLAIN) of the first > query vs the second query, we see a ~1.75x improvement there too. > > I suspect the gains will be bigger on OzLabs. > > (It turns out all of this is all for the "| NN patches" counter we > added to the filter bar!!) > > Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> > Signed-off-by: Daniel Axtens <dja@axtens.net> > > --- > > This requires a migration, so I don't think we can feasibly do it as a > stable update. > > I think we drop the patch counter for stable and try to get this and > the event stuff merged to master promptly, and just tag 2.1. (To that > end, I will re-read and finish reviewing the event stuff soon.) > --- > patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++ > patchwork/models.py | 4 +++ > patchwork/parser.py | 1 + > patchwork/views/__init__.py | 2 +- > 4 files changed, 45 insertions(+), 1 deletion(-) > create mode 100644 patchwork/migrations/0024_patch_patch_project.py > > diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py > new file mode 100644 > index 000000000000..76d8f144c9dd > --- /dev/null > +++ b/patchwork/migrations/0024_patch_patch_project.py > @@ -0,0 +1,39 @@ > +# -*- coding: utf-8 -*- > +# Generated by Django 1.11.10 on 2018-03-08 01:51 > +from __future__ import unicode_literals > + > +from django.db import migrations, models > +import django.db.models.deletion > + > + > +class Migration(migrations.Migration): > + # per migration 16, but note this seems to be going away > + # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113) > + atomic = False > + > + dependencies = [ > + ('patchwork', '0023_timezone_unify'), > + ] > + > + operations = [ > + migrations.AddField( > + model_name='patch', > + name='patch_project', > + field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), > + preserve_default=False, > + ), > + > + # as with 10, this will break if you use non-default table names > + migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id = > + (SELECT project_id FROM patchwork_submission > + WHERE patchwork_submission.id = > + patchwork_patch.submission_ptr_id);''' > + ), > + > + migrations.AlterField( > + model_name='patch', > + name='patch_project', > + field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), > + ), > + > + ] > diff --git a/patchwork/models.py b/patchwork/models.py > index b2491752f04a..3b905c4cd75b 100644 > --- a/patchwork/models.py > +++ b/patchwork/models.py > @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission): > archived = models.BooleanField(default=False) > hash = HashField(null=True, blank=True) > > + # duplicate project from submission in subclass so we can count the > + # patches in a project without needing to do a JOIN. > + patch_project = models.ForeignKey(Project, on_delete=models.CASCADE) > + > objects = PatchManager() > > @staticmethod > diff --git a/patchwork/parser.py b/patchwork/parser.py > index 803e98592fa8..805037c72d73 100644 > --- a/patchwork/parser.py > +++ b/patchwork/parser.py > @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None): > patch = Patch.objects.create( > msgid=msgid, > project=project, > + patch_project=project, > name=name[:255], > date=date, > headers=headers, > diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py > index 3baf2999a836..f8d23a388ac7 100644 > --- a/patchwork/views/__init__.py > +++ b/patchwork/views/__init__.py > @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, > context['filters'].set_status(filterclass, setting) > > if patches is None: > - patches = Patch.objects.filter(project=project) > + patches = Patch.objects.filter(patch_project=project) > > # annotate with tag counts > patches = patches.with_tag_counts(project) > -- > 2.14.1
On Fri, 2018-03-09 at 02:54 +1100, Daniel Axtens wrote: > Stephen Rothwell noticed (way back in September - sorry Stephen!) that > the following query is really slow on OzLabs: > > SELECT COUNT(*) AS "__count" FROM "patchwork_patch" > INNER JOIN "patchwork_submission" ON > ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id") > WHERE ("patchwork_submission"."project_id" = 14 AND > "patchwork_patch"."state_id" IN > (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 > WHERE U0."action_required" = true > ORDER BY U0."ordering" ASC)); > > I think this is really slow because we have to join the patch and > submission table to get the project id, which we need to filter the > patches. > > Duplicate the project id in the patch table itself, which allows us to > avoid the JOIN. > > The new query reads as: > SELECT COUNT(*) AS "__count" FROM "patchwork_patch" > WHERE ("patchwork_patch"."patch_project_id" = 1 AND > "patchwork_patch"."state_id" IN > (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 > WHERE U0."action_required" = true > ORDER BY U0."ordering" ASC)); > > Very simple testing on a small, artifical Postgres instance (3 > projects, 102711 patches), shows speed gains of ~1.5-5x for this > query. Looking at Postgres' cost estimates (EXPLAIN) of the first > query vs the second query, we see a ~1.75x improvement there too. > > I suspect the gains will be bigger on OzLabs. > > (It turns out all of this is all for the "| NN patches" counter we > added to the filter bar!!) > > Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> > Signed-off-by: Daniel Axtens <dja@axtens.net> It's unfortunate that this has already merged. While it works, it defeats the whole point of the multi-table inheritance introduced in commit 86172ccc1 (normalization). To be honest, given the performance impacts that particular change introduced (which we're only seeing at scale), I'd rather denormalize the whole thing and fold the 'Patch' and 'CoverLetter' models back into 'Submission' and just use a 'type' field (or similar) to control behavior. Is there any reason not to do this? If not, I'd like to wait on 2.1 until we've done both this and the event fixes. Stephen > --- > > This requires a migration, so I don't think we can feasibly do it as a > stable update. > > I think we drop the patch counter for stable and try to get this and > the event stuff merged to master promptly, and just tag 2.1. (To that > end, I will re-read and finish reviewing the event stuff soon.) > --- > patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++ > patchwork/models.py | 4 +++ > patchwork/parser.py | 1 + > patchwork/views/__init__.py | 2 +- > 4 files changed, 45 insertions(+), 1 deletion(-) > create mode 100644 patchwork/migrations/0024_patch_patch_project.py > > diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py > new file mode 100644 > index 000000000000..76d8f144c9dd > --- /dev/null > +++ b/patchwork/migrations/0024_patch_patch_project.py > @@ -0,0 +1,39 @@ > +# -*- coding: utf-8 -*- > +# Generated by Django 1.11.10 on 2018-03-08 01:51 > +from __future__ import unicode_literals > + > +from django.db import migrations, models > +import django.db.models.deletion > + > + > +class Migration(migrations.Migration): > + # per migration 16, but note this seems to be going away > + # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113) > + atomic = False > + > + dependencies = [ > + ('patchwork', '0023_timezone_unify'), > + ] > + > + operations = [ > + migrations.AddField( > + model_name='patch', > + name='patch_project', > + field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), > + preserve_default=False, > + ), > + > + # as with 10, this will break if you use non-default table names > + migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id = > + (SELECT project_id FROM patchwork_submission > + WHERE patchwork_submission.id = > + patchwork_patch.submission_ptr_id);''' > + ), > + > + migrations.AlterField( > + model_name='patch', > + name='patch_project', > + field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), > + ), > + > + ] > diff --git a/patchwork/models.py b/patchwork/models.py > index b2491752f04a..3b905c4cd75b 100644 > --- a/patchwork/models.py > +++ b/patchwork/models.py > @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission): > archived = models.BooleanField(default=False) > hash = HashField(null=True, blank=True) > > + # duplicate project from submission in subclass so we can count the > + # patches in a project without needing to do a JOIN. > + patch_project = models.ForeignKey(Project, on_delete=models.CASCADE) > + > objects = PatchManager() > > @staticmethod > diff --git a/patchwork/parser.py b/patchwork/parser.py > index 803e98592fa8..805037c72d73 100644 > --- a/patchwork/parser.py > +++ b/patchwork/parser.py > @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None): > patch = Patch.objects.create( > msgid=msgid, > project=project, > + patch_project=project, > name=name[:255], > date=date, > headers=headers, > diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py > index 3baf2999a836..f8d23a388ac7 100644 > --- a/patchwork/views/__init__.py > +++ b/patchwork/views/__init__.py > @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, > context['filters'].set_status(filterclass, setting) > > if patches is None: > - patches = Patch.objects.filter(project=project) > + patches = Patch.objects.filter(patch_project=project) > > # annotate with tag counts > patches = patches.with_tag_counts(project)
Stephen Finucane <stephen@that.guru> writes: > On Fri, 2018-03-09 at 02:54 +1100, Daniel Axtens wrote: >> Stephen Rothwell noticed (way back in September - sorry Stephen!) that >> the following query is really slow on OzLabs: >> >> SELECT COUNT(*) AS "__count" FROM "patchwork_patch" >> INNER JOIN "patchwork_submission" ON >> ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id") >> WHERE ("patchwork_submission"."project_id" = 14 AND >> "patchwork_patch"."state_id" IN >> (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 >> WHERE U0."action_required" = true >> ORDER BY U0."ordering" ASC)); >> >> I think this is really slow because we have to join the patch and >> submission table to get the project id, which we need to filter the >> patches. >> >> Duplicate the project id in the patch table itself, which allows us to >> avoid the JOIN. >> >> The new query reads as: >> SELECT COUNT(*) AS "__count" FROM "patchwork_patch" >> WHERE ("patchwork_patch"."patch_project_id" = 1 AND >> "patchwork_patch"."state_id" IN >> (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 >> WHERE U0."action_required" = true >> ORDER BY U0."ordering" ASC)); >> >> Very simple testing on a small, artifical Postgres instance (3 >> projects, 102711 patches), shows speed gains of ~1.5-5x for this >> query. Looking at Postgres' cost estimates (EXPLAIN) of the first >> query vs the second query, we see a ~1.75x improvement there too. >> >> I suspect the gains will be bigger on OzLabs. >> >> (It turns out all of this is all for the "| NN patches" counter we >> added to the filter bar!!) >> >> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> >> Signed-off-by: Daniel Axtens <dja@axtens.net> > > It's unfortunate that this has already merged. While it works, it > defeats the whole point of the multi-table inheritance introduced in > commit 86172ccc1 (normalization). To be honest, given the performance > impacts that particular change introduced (which we're only seeing at > scale), I'd rather denormalize the whole thing and fold the 'Patch' and > 'CoverLetter' models back into 'Submission' and just use a 'type' field > (or similar) to control behavior. Is there any reason not to do this? I agree that would be a better conceptual solution. The reason I didn't do it is because I tried a couple of times and couldn't get it all to work, and I haven't had the time to really sit down and re-engineer it. If you can get it to work I am happy to revert this and apply that; the changes to the database schema aren't at all difficult to undo so any brave souls running master won't be completely hosed. Regards, Daniel > If not, I'd like to wait on 2.1 until we've done both this and the > event fixes. > > Stephen > >> --- >> >> This requires a migration, so I don't think we can feasibly do it as a >> stable update. >> >> I think we drop the patch counter for stable and try to get this and >> the event stuff merged to master promptly, and just tag 2.1. (To that >> end, I will re-read and finish reviewing the event stuff soon.) >> --- >> patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++ >> patchwork/models.py | 4 +++ >> patchwork/parser.py | 1 + >> patchwork/views/__init__.py | 2 +- >> 4 files changed, 45 insertions(+), 1 deletion(-) >> create mode 100644 patchwork/migrations/0024_patch_patch_project.py >> >> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py >> new file mode 100644 >> index 000000000000..76d8f144c9dd >> --- /dev/null >> +++ b/patchwork/migrations/0024_patch_patch_project.py >> @@ -0,0 +1,39 @@ >> +# -*- coding: utf-8 -*- >> +# Generated by Django 1.11.10 on 2018-03-08 01:51 >> +from __future__ import unicode_literals >> + >> +from django.db import migrations, models >> +import django.db.models.deletion >> + >> + >> +class Migration(migrations.Migration): >> + # per migration 16, but note this seems to be going away >> + # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113) >> + atomic = False >> + >> + dependencies = [ >> + ('patchwork', '0023_timezone_unify'), >> + ] >> + >> + operations = [ >> + migrations.AddField( >> + model_name='patch', >> + name='patch_project', >> + field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), >> + preserve_default=False, >> + ), >> + >> + # as with 10, this will break if you use non-default table names >> + migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id = >> + (SELECT project_id FROM patchwork_submission >> + WHERE patchwork_submission.id = >> + patchwork_patch.submission_ptr_id);''' >> + ), >> + >> + migrations.AlterField( >> + model_name='patch', >> + name='patch_project', >> + field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), >> + ), >> + >> + ] >> diff --git a/patchwork/models.py b/patchwork/models.py >> index b2491752f04a..3b905c4cd75b 100644 >> --- a/patchwork/models.py >> +++ b/patchwork/models.py >> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission): >> archived = models.BooleanField(default=False) >> hash = HashField(null=True, blank=True) >> >> + # duplicate project from submission in subclass so we can count the >> + # patches in a project without needing to do a JOIN. >> + patch_project = models.ForeignKey(Project, on_delete=models.CASCADE) >> + >> objects = PatchManager() >> >> @staticmethod >> diff --git a/patchwork/parser.py b/patchwork/parser.py >> index 803e98592fa8..805037c72d73 100644 >> --- a/patchwork/parser.py >> +++ b/patchwork/parser.py >> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None): >> patch = Patch.objects.create( >> msgid=msgid, >> project=project, >> + patch_project=project, >> name=name[:255], >> date=date, >> headers=headers, >> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py >> index 3baf2999a836..f8d23a388ac7 100644 >> --- a/patchwork/views/__init__.py >> +++ b/patchwork/views/__init__.py >> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, >> context['filters'].set_status(filterclass, setting) >> >> if patches is None: >> - patches = Patch.objects.filter(project=project) >> + patches = Patch.objects.filter(patch_project=project) >> >> # annotate with tag counts >> patches = patches.with_tag_counts(project)
> > It's unfortunate that this has already merged. While it works, it > defeats the whole point of the multi-table inheritance introduced in > commit 86172ccc1 (normalization). To be honest, given the performance > impacts that particular change introduced (which we're only seeing at > scale), I'd rather denormalize the whole thing and fold the 'Patch' and > 'CoverLetter' models back into 'Submission' and just use a 'type' field > (or similar) to control behavior. Is there any reason not to do this? > If not, I'd like to wait on 2.1 until we've done both this and the > event fixes. The other thing worth noting for this is that this issue is impacting OzLabs pretty badly *now* - one of the queries took over 200s! - and so they'd really like a fix sooner rather than later. So that was weighing on my mind when I went with the half-fix rather than full denomalisation. Regards, Daniel > > Stephen > >> --- >> >> This requires a migration, so I don't think we can feasibly do it as a >> stable update. >> >> I think we drop the patch counter for stable and try to get this and >> the event stuff merged to master promptly, and just tag 2.1. (To that >> end, I will re-read and finish reviewing the event stuff soon.) >> --- >> patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++ >> patchwork/models.py | 4 +++ >> patchwork/parser.py | 1 + >> patchwork/views/__init__.py | 2 +- >> 4 files changed, 45 insertions(+), 1 deletion(-) >> create mode 100644 patchwork/migrations/0024_patch_patch_project.py >> >> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py >> new file mode 100644 >> index 000000000000..76d8f144c9dd >> --- /dev/null >> +++ b/patchwork/migrations/0024_patch_patch_project.py >> @@ -0,0 +1,39 @@ >> +# -*- coding: utf-8 -*- >> +# Generated by Django 1.11.10 on 2018-03-08 01:51 >> +from __future__ import unicode_literals >> + >> +from django.db import migrations, models >> +import django.db.models.deletion >> + >> + >> +class Migration(migrations.Migration): >> + # per migration 16, but note this seems to be going away >> + # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113) >> + atomic = False >> + >> + dependencies = [ >> + ('patchwork', '0023_timezone_unify'), >> + ] >> + >> + operations = [ >> + migrations.AddField( >> + model_name='patch', >> + name='patch_project', >> + field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), >> + preserve_default=False, >> + ), >> + >> + # as with 10, this will break if you use non-default table names >> + migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id = >> + (SELECT project_id FROM patchwork_submission >> + WHERE patchwork_submission.id = >> + patchwork_patch.submission_ptr_id);''' >> + ), >> + >> + migrations.AlterField( >> + model_name='patch', >> + name='patch_project', >> + field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), >> + ), >> + >> + ] >> diff --git a/patchwork/models.py b/patchwork/models.py >> index b2491752f04a..3b905c4cd75b 100644 >> --- a/patchwork/models.py >> +++ b/patchwork/models.py >> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission): >> archived = models.BooleanField(default=False) >> hash = HashField(null=True, blank=True) >> >> + # duplicate project from submission in subclass so we can count the >> + # patches in a project without needing to do a JOIN. >> + patch_project = models.ForeignKey(Project, on_delete=models.CASCADE) >> + >> objects = PatchManager() >> >> @staticmethod >> diff --git a/patchwork/parser.py b/patchwork/parser.py >> index 803e98592fa8..805037c72d73 100644 >> --- a/patchwork/parser.py >> +++ b/patchwork/parser.py >> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None): >> patch = Patch.objects.create( >> msgid=msgid, >> project=project, >> + patch_project=project, >> name=name[:255], >> date=date, >> headers=headers, >> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py >> index 3baf2999a836..f8d23a388ac7 100644 >> --- a/patchwork/views/__init__.py >> +++ b/patchwork/views/__init__.py >> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, >> context['filters'].set_status(filterclass, setting) >> >> if patches is None: >> - patches = Patch.objects.filter(project=project) >> + patches = Patch.objects.filter(patch_project=project) >> >> # annotate with tag counts >> patches = patches.with_tag_counts(project)
On Wed, 2018-03-28 at 09:42 +1100, Daniel Axtens wrote: > Stephen Finucane <stephen@that.guru> writes: > > > On Fri, 2018-03-09 at 02:54 +1100, Daniel Axtens wrote: > > > Stephen Rothwell noticed (way back in September - sorry Stephen!) that > > > the following query is really slow on OzLabs: > > > > > > SELECT COUNT(*) AS "__count" FROM "patchwork_patch" > > > INNER JOIN "patchwork_submission" ON > > > ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id") > > > WHERE ("patchwork_submission"."project_id" = 14 AND > > > "patchwork_patch"."state_id" IN > > > (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 > > > WHERE U0."action_required" = true > > > ORDER BY U0."ordering" ASC)); > > > > > > I think this is really slow because we have to join the patch and > > > submission table to get the project id, which we need to filter the > > > patches. > > > > > > Duplicate the project id in the patch table itself, which allows us to > > > avoid the JOIN. > > > > > > The new query reads as: > > > SELECT COUNT(*) AS "__count" FROM "patchwork_patch" > > > WHERE ("patchwork_patch"."patch_project_id" = 1 AND > > > "patchwork_patch"."state_id" IN > > > (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 > > > WHERE U0."action_required" = true > > > ORDER BY U0."ordering" ASC)); > > > > > > Very simple testing on a small, artifical Postgres instance (3 > > > projects, 102711 patches), shows speed gains of ~1.5-5x for this > > > query. Looking at Postgres' cost estimates (EXPLAIN) of the first > > > query vs the second query, we see a ~1.75x improvement there too. > > > > > > I suspect the gains will be bigger on OzLabs. > > > > > > (It turns out all of this is all for the "| NN patches" counter we > > > added to the filter bar!!) > > > > > > Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> > > > Signed-off-by: Daniel Axtens <dja@axtens.net> > > > > It's unfortunate that this has already merged. While it works, it > > defeats the whole point of the multi-table inheritance introduced in > > commit 86172ccc1 (normalization). To be honest, given the performance > > impacts that particular change introduced (which we're only seeing at > > scale), I'd rather denormalize the whole thing and fold the 'Patch' and > > 'CoverLetter' models back into 'Submission' and just use a 'type' field > > (or similar) to control behavior. Is there any reason not to do this? > > I agree that would be a better conceptual solution. The reason I didn't > do it is because I tried a couple of times and couldn't get it all to > work, and I haven't had the time to really sit down and re-engineer it. > > If you can get it to work I am happy to revert this and apply that; the > changes to the database schema aren't at all difficult to undo so any > brave souls running master won't be completely hosed. OK, that's fair. Let me see how I fare with this. The biggest issues I see are the custom managers that we use for the Patch class. I'm hoping Veronika's tag infrastructure rework will help with this and I can build upon that. I'll review said RFC this week to see if that's the case. Stephen > Regards, > Daniel > > > If not, I'd like to wait on 2.1 until we've done both this and the > > event fixes. > > > > Stephen > > > > > --- > > > > > > This requires a migration, so I don't think we can feasibly do it as a > > > stable update. > > > > > > I think we drop the patch counter for stable and try to get this and > > > the event stuff merged to master promptly, and just tag 2.1. (To that > > > end, I will re-read and finish reviewing the event stuff soon.) > > > --- > > > patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++ > > > patchwork/models.py | 4 +++ > > > patchwork/parser.py | 1 + > > > patchwork/views/__init__.py | 2 +- > > > 4 files changed, 45 insertions(+), 1 deletion(-) > > > create mode 100644 patchwork/migrations/0024_patch_patch_project.py > > > > > > diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py > > > new file mode 100644 > > > index 000000000000..76d8f144c9dd > > > --- /dev/null > > > +++ b/patchwork/migrations/0024_patch_patch_project.py > > > @@ -0,0 +1,39 @@ > > > +# -*- coding: utf-8 -*- > > > +# Generated by Django 1.11.10 on 2018-03-08 01:51 > > > +from __future__ import unicode_literals > > > + > > > +from django.db import migrations, models > > > +import django.db.models.deletion > > > + > > > + > > > +class Migration(migrations.Migration): > > > + # per migration 16, but note this seems to be going away > > > + # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113) > > > + atomic = False > > > + > > > + dependencies = [ > > > + ('patchwork', '0023_timezone_unify'), > > > + ] > > > + > > > + operations = [ > > > + migrations.AddField( > > > + model_name='patch', > > > + name='patch_project', > > > + field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), > > > + preserve_default=False, > > > + ), > > > + > > > + # as with 10, this will break if you use non-default table names > > > + migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id = > > > + (SELECT project_id FROM patchwork_submission > > > + WHERE patchwork_submission.id = > > > + patchwork_patch.submission_ptr_id);''' > > > + ), > > > + > > > + migrations.AlterField( > > > + model_name='patch', > > > + name='patch_project', > > > + field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), > > > + ), > > > + > > > + ] > > > diff --git a/patchwork/models.py b/patchwork/models.py > > > index b2491752f04a..3b905c4cd75b 100644 > > > --- a/patchwork/models.py > > > +++ b/patchwork/models.py > > > @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission): > > > archived = models.BooleanField(default=False) > > > hash = HashField(null=True, blank=True) > > > > > > + # duplicate project from submission in subclass so we can count the > > > + # patches in a project without needing to do a JOIN. > > > + patch_project = models.ForeignKey(Project, on_delete=models.CASCADE) > > > + > > > objects = PatchManager() > > > > > > @staticmethod > > > diff --git a/patchwork/parser.py b/patchwork/parser.py > > > index 803e98592fa8..805037c72d73 100644 > > > --- a/patchwork/parser.py > > > +++ b/patchwork/parser.py > > > @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None): > > > patch = Patch.objects.create( > > > msgid=msgid, > > > project=project, > > > + patch_project=project, > > > name=name[:255], > > > date=date, > > > headers=headers, > > > diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py > > > index 3baf2999a836..f8d23a388ac7 100644 > > > --- a/patchwork/views/__init__.py > > > +++ b/patchwork/views/__init__.py > > > @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, > > > context['filters'].set_status(filterclass, setting) > > > > > > if patches is None: > > > - patches = Patch.objects.filter(project=project) > > > + patches = Patch.objects.filter(patch_project=project) > > > > > > # annotate with tag counts > > > patches = patches.with_tag_counts(project)
diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py new file mode 100644 index 000000000000..76d8f144c9dd --- /dev/null +++ b/patchwork/migrations/0024_patch_patch_project.py @@ -0,0 +1,39 @@ +# -*- coding: utf-8 -*- +# Generated by Django 1.11.10 on 2018-03-08 01:51 +from __future__ import unicode_literals + +from django.db import migrations, models +import django.db.models.deletion + + +class Migration(migrations.Migration): + # per migration 16, but note this seems to be going away + # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113) + atomic = False + + dependencies = [ + ('patchwork', '0023_timezone_unify'), + ] + + operations = [ + migrations.AddField( + model_name='patch', + name='patch_project', + field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), + preserve_default=False, + ), + + # as with 10, this will break if you use non-default table names + migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id = + (SELECT project_id FROM patchwork_submission + WHERE patchwork_submission.id = + patchwork_patch.submission_ptr_id);''' + ), + + migrations.AlterField( + model_name='patch', + name='patch_project', + field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'), + ), + + ] diff --git a/patchwork/models.py b/patchwork/models.py index b2491752f04a..3b905c4cd75b 100644 --- a/patchwork/models.py +++ b/patchwork/models.py @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission): archived = models.BooleanField(default=False) hash = HashField(null=True, blank=True) + # duplicate project from submission in subclass so we can count the + # patches in a project without needing to do a JOIN. + patch_project = models.ForeignKey(Project, on_delete=models.CASCADE) + objects = PatchManager() @staticmethod diff --git a/patchwork/parser.py b/patchwork/parser.py index 803e98592fa8..805037c72d73 100644 --- a/patchwork/parser.py +++ b/patchwork/parser.py @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None): patch = Patch.objects.create( msgid=msgid, project=project, + patch_project=project, name=name[:255], date=date, headers=headers, diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py index 3baf2999a836..f8d23a388ac7 100644 --- a/patchwork/views/__init__.py +++ b/patchwork/views/__init__.py @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, context['filters'].set_status(filterclass, setting) if patches is None: - patches = Patch.objects.filter(project=project) + patches = Patch.objects.filter(patch_project=project) # annotate with tag counts patches = patches.with_tag_counts(project)
Stephen Rothwell noticed (way back in September - sorry Stephen!) that the following query is really slow on OzLabs: SELECT COUNT(*) AS "__count" FROM "patchwork_patch" INNER JOIN "patchwork_submission" ON ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id") WHERE ("patchwork_submission"."project_id" = 14 AND "patchwork_patch"."state_id" IN (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 WHERE U0."action_required" = true ORDER BY U0."ordering" ASC)); I think this is really slow because we have to join the patch and submission table to get the project id, which we need to filter the patches. Duplicate the project id in the patch table itself, which allows us to avoid the JOIN. The new query reads as: SELECT COUNT(*) AS "__count" FROM "patchwork_patch" WHERE ("patchwork_patch"."patch_project_id" = 1 AND "patchwork_patch"."state_id" IN (SELECT U0."id" AS Col1 FROM "patchwork_state" U0 WHERE U0."action_required" = true ORDER BY U0."ordering" ASC)); Very simple testing on a small, artifical Postgres instance (3 projects, 102711 patches), shows speed gains of ~1.5-5x for this query. Looking at Postgres' cost estimates (EXPLAIN) of the first query vs the second query, we see a ~1.75x improvement there too. I suspect the gains will be bigger on OzLabs. (It turns out all of this is all for the "| NN patches" counter we added to the filter bar!!) Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Daniel Axtens <dja@axtens.net> --- This requires a migration, so I don't think we can feasibly do it as a stable update. I think we drop the patch counter for stable and try to get this and the event stuff merged to master promptly, and just tag 2.1. (To that end, I will re-read and finish reviewing the event stuff soon.) --- patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++ patchwork/models.py | 4 +++ patchwork/parser.py | 1 + patchwork/views/__init__.py | 2 +- 4 files changed, 45 insertions(+), 1 deletion(-) create mode 100644 patchwork/migrations/0024_patch_patch_project.py