diff mbox series

Fix slow Patch counting query

Message ID 20180308155437.8082-1-dja@axtens.net
State Accepted
Headers show
Series Fix slow Patch counting query | expand

Commit Message

Daniel Axtens March 8, 2018, 3:54 p.m. UTC
Stephen Rothwell noticed (way back in September - sorry Stephen!) that
the following query is really slow on OzLabs:

SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
    INNER JOIN "patchwork_submission" ON
        ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
    WHERE ("patchwork_submission"."project_id" = 14 AND
           "patchwork_patch"."state_id" IN
	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
                WHERE U0."action_required" = true
		ORDER BY U0."ordering" ASC));

I think this is really slow because we have to join the patch and
submission table to get the project id, which we need to filter the
patches.

Duplicate the project id in the patch table itself, which allows us to
avoid the JOIN.

The new query reads as:
SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
    WHERE ("patchwork_patch"."patch_project_id" = 1 AND
           "patchwork_patch"."state_id" IN
	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
	        WHERE U0."action_required" = true
		ORDER BY U0."ordering" ASC));

Very simple testing on a small, artifical Postgres instance (3
projects, 102711 patches), shows speed gains of ~1.5-5x for this
query. Looking at Postgres' cost estimates (EXPLAIN) of the first
query vs the second query, we see a ~1.75x improvement there too.

I suspect the gains will be bigger on OzLabs.

(It turns out all of this is all for the "| NN patches" counter we
added to the filter bar!!)

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Daniel Axtens <dja@axtens.net>

---

This requires a migration, so I don't think we can feasibly do it as a
stable update.

I think we drop the patch counter for stable and try to get this and
the event stuff merged to master promptly, and just tag 2.1. (To that
end, I will re-read and finish reviewing the event stuff soon.)
---
 patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
 patchwork/models.py                              |  4 +++
 patchwork/parser.py                              |  1 +
 patchwork/views/__init__.py                      |  2 +-
 4 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 patchwork/migrations/0024_patch_patch_project.py

Comments

Daniel Axtens March 8, 2018, 4:08 p.m. UTC | #1
> (It turns out all of this is all for the "| NN patches" counter we
> added to the filter bar!!)

Ah, this is wrong, the paginator needs it too.

> I think we drop the patch counter for stable and try to get this and

So - AFAICT - we can't fix this in stable.

> the event stuff merged to master promptly, and just tag 2.1. (To that

So we should tag 2.1 sooner rather than later!

Regards,
Daniel

> end, I will re-read and finish reviewing the event stuff soon.)
> ---
>  patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
>  patchwork/models.py                              |  4 +++
>  patchwork/parser.py                              |  1 +
>  patchwork/views/__init__.py                      |  2 +-
>  4 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 patchwork/migrations/0024_patch_patch_project.py
>
> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
> new file mode 100644
> index 000000000000..76d8f144c9dd
> --- /dev/null
> +++ b/patchwork/migrations/0024_patch_patch_project.py
> @@ -0,0 +1,39 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.11.10 on 2018-03-08 01:51
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +import django.db.models.deletion
> +
> +
> +class Migration(migrations.Migration):
> +    # per migration 16, but note this seems to be going away
> +    # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
> +    atomic = False
> +
> +    dependencies = [
> +        ('patchwork', '0023_timezone_unify'),
> +    ]
> +
> +    operations = [
> +        migrations.AddField(
> +            model_name='patch',
> +            name='patch_project',
> +            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> +            preserve_default=False,
> +        ),
> +
> +        # as with 10, this will break if you use non-default table names
> +        migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
> +                               (SELECT project_id FROM patchwork_submission
> +                                WHERE patchwork_submission.id =
> +                                        patchwork_patch.submission_ptr_id);'''
> +        ),
> +
> +        migrations.AlterField(
> +            model_name='patch',
> +            name='patch_project',
> +            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> +        ),
> +
> +    ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index b2491752f04a..3b905c4cd75b 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
>      archived = models.BooleanField(default=False)
>      hash = HashField(null=True, blank=True)
>  
> +    # duplicate project from submission in subclass so we can count the
> +    # patches in a project without needing to do a JOIN.
> +    patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
> +
>      objects = PatchManager()
>  
>      @staticmethod
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 803e98592fa8..805037c72d73 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
>          patch = Patch.objects.create(
>              msgid=msgid,
>              project=project,
> +            patch_project=project,
>              name=name[:255],
>              date=date,
>              headers=headers,
> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
> index 3baf2999a836..f8d23a388ac7 100644
> --- a/patchwork/views/__init__.py
> +++ b/patchwork/views/__init__.py
> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
>              context['filters'].set_status(filterclass, setting)
>  
>      if patches is None:
> -        patches = Patch.objects.filter(project=project)
> +        patches = Patch.objects.filter(patch_project=project)
>  
>      # annotate with tag counts
>      patches = patches.with_tag_counts(project)
> -- 
> 2.14.1
Daniel Axtens March 20, 2018, 11:04 p.m. UTC | #2
I have fixed up a small error where I didn't update a test and applied
this to master. I'm just waiting for Travis CI to run on my repository,
and then I will push it to the main repo.

As this is very important to OzLabs, I will spin a 2.1-rc1 soon.

Regards,
Daniel

Daniel Axtens <dja@axtens.net> writes:

> Stephen Rothwell noticed (way back in September - sorry Stephen!) that
> the following query is really slow on OzLabs:
>
> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
>     INNER JOIN "patchwork_submission" ON
>         ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
>     WHERE ("patchwork_submission"."project_id" = 14 AND
>            "patchwork_patch"."state_id" IN
> 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
>                 WHERE U0."action_required" = true
> 		ORDER BY U0."ordering" ASC));
>
> I think this is really slow because we have to join the patch and
> submission table to get the project id, which we need to filter the
> patches.
>
> Duplicate the project id in the patch table itself, which allows us to
> avoid the JOIN.
>
> The new query reads as:
> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
>     WHERE ("patchwork_patch"."patch_project_id" = 1 AND
>            "patchwork_patch"."state_id" IN
> 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> 	        WHERE U0."action_required" = true
> 		ORDER BY U0."ordering" ASC)); 
>
> Very simple testing on a small, artifical Postgres instance (3
> projects, 102711 patches), shows speed gains of ~1.5-5x for this
> query. Looking at Postgres' cost estimates (EXPLAIN) of the first
> query vs the second query, we see a ~1.75x improvement there too.
>
> I suspect the gains will be bigger on OzLabs.
>
> (It turns out all of this is all for the "| NN patches" counter we
> added to the filter bar!!)
>
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> ---
>
> This requires a migration, so I don't think we can feasibly do it as a
> stable update.
>
> I think we drop the patch counter for stable and try to get this and
> the event stuff merged to master promptly, and just tag 2.1. (To that
> end, I will re-read and finish reviewing the event stuff soon.)
> ---
>  patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
>  patchwork/models.py                              |  4 +++
>  patchwork/parser.py                              |  1 +
>  patchwork/views/__init__.py                      |  2 +-
>  4 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 patchwork/migrations/0024_patch_patch_project.py
>
> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
> new file mode 100644
> index 000000000000..76d8f144c9dd
> --- /dev/null
> +++ b/patchwork/migrations/0024_patch_patch_project.py
> @@ -0,0 +1,39 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.11.10 on 2018-03-08 01:51
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +import django.db.models.deletion
> +
> +
> +class Migration(migrations.Migration):
> +    # per migration 16, but note this seems to be going away
> +    # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
> +    atomic = False
> +
> +    dependencies = [
> +        ('patchwork', '0023_timezone_unify'),
> +    ]
> +
> +    operations = [
> +        migrations.AddField(
> +            model_name='patch',
> +            name='patch_project',
> +            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> +            preserve_default=False,
> +        ),
> +
> +        # as with 10, this will break if you use non-default table names
> +        migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
> +                               (SELECT project_id FROM patchwork_submission
> +                                WHERE patchwork_submission.id =
> +                                        patchwork_patch.submission_ptr_id);'''
> +        ),
> +
> +        migrations.AlterField(
> +            model_name='patch',
> +            name='patch_project',
> +            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> +        ),
> +
> +    ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index b2491752f04a..3b905c4cd75b 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
>      archived = models.BooleanField(default=False)
>      hash = HashField(null=True, blank=True)
>  
> +    # duplicate project from submission in subclass so we can count the
> +    # patches in a project without needing to do a JOIN.
> +    patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
> +
>      objects = PatchManager()
>  
>      @staticmethod
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 803e98592fa8..805037c72d73 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
>          patch = Patch.objects.create(
>              msgid=msgid,
>              project=project,
> +            patch_project=project,
>              name=name[:255],
>              date=date,
>              headers=headers,
> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
> index 3baf2999a836..f8d23a388ac7 100644
> --- a/patchwork/views/__init__.py
> +++ b/patchwork/views/__init__.py
> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
>              context['filters'].set_status(filterclass, setting)
>  
>      if patches is None:
> -        patches = Patch.objects.filter(project=project)
> +        patches = Patch.objects.filter(patch_project=project)
>  
>      # annotate with tag counts
>      patches = patches.with_tag_counts(project)
> -- 
> 2.14.1
Stephen Finucane March 25, 2018, 3:03 p.m. UTC | #3
On Fri, 2018-03-09 at 02:54 +1100, Daniel Axtens wrote:
> Stephen Rothwell noticed (way back in September - sorry Stephen!) that
> the following query is really slow on OzLabs:
> 
> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
>     INNER JOIN "patchwork_submission" ON
>         ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
>     WHERE ("patchwork_submission"."project_id" = 14 AND
>            "patchwork_patch"."state_id" IN
> 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
>                 WHERE U0."action_required" = true
> 		ORDER BY U0."ordering" ASC));
> 
> I think this is really slow because we have to join the patch and
> submission table to get the project id, which we need to filter the
> patches.
> 
> Duplicate the project id in the patch table itself, which allows us to
> avoid the JOIN.
> 
> The new query reads as:
> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
>     WHERE ("patchwork_patch"."patch_project_id" = 1 AND
>            "patchwork_patch"."state_id" IN
> 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> 	        WHERE U0."action_required" = true
> 		ORDER BY U0."ordering" ASC));
> 
> Very simple testing on a small, artifical Postgres instance (3
> projects, 102711 patches), shows speed gains of ~1.5-5x for this
> query. Looking at Postgres' cost estimates (EXPLAIN) of the first
> query vs the second query, we see a ~1.75x improvement there too.
> 
> I suspect the gains will be bigger on OzLabs.
> 
> (It turns out all of this is all for the "| NN patches" counter we
> added to the filter bar!!)
> 
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Daniel Axtens <dja@axtens.net>

It's unfortunate that this has already merged. While it works, it
defeats the whole point of the multi-table inheritance introduced in
commit 86172ccc1 (normalization). To be honest, given the performance
impacts that particular change introduced (which we're only seeing at
scale), I'd rather denormalize the whole thing and fold the 'Patch' and
'CoverLetter' models back into 'Submission' and just use a 'type' field
(or similar) to control behavior. Is there any reason not to do this?
If not, I'd like to wait on 2.1 until we've done both this and the
event fixes.

Stephen

> ---
> 
> This requires a migration, so I don't think we can feasibly do it as a
> stable update.
> 
> I think we drop the patch counter for stable and try to get this and
> the event stuff merged to master promptly, and just tag 2.1. (To that
> end, I will re-read and finish reviewing the event stuff soon.)
> ---
>  patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
>  patchwork/models.py                              |  4 +++
>  patchwork/parser.py                              |  1 +
>  patchwork/views/__init__.py                      |  2 +-
>  4 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 patchwork/migrations/0024_patch_patch_project.py
> 
> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
> new file mode 100644
> index 000000000000..76d8f144c9dd
> --- /dev/null
> +++ b/patchwork/migrations/0024_patch_patch_project.py
> @@ -0,0 +1,39 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.11.10 on 2018-03-08 01:51
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +import django.db.models.deletion
> +
> +
> +class Migration(migrations.Migration):
> +    # per migration 16, but note this seems to be going away
> +    # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
> +    atomic = False
> +
> +    dependencies = [
> +        ('patchwork', '0023_timezone_unify'),
> +    ]
> +
> +    operations = [
> +        migrations.AddField(
> +            model_name='patch',
> +            name='patch_project',
> +            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> +            preserve_default=False,
> +        ),
> +
> +        # as with 10, this will break if you use non-default table names
> +        migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
> +                               (SELECT project_id FROM patchwork_submission
> +                                WHERE patchwork_submission.id =
> +                                        patchwork_patch.submission_ptr_id);'''
> +        ),
> +
> +        migrations.AlterField(
> +            model_name='patch',
> +            name='patch_project',
> +            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> +        ),
> +
> +    ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index b2491752f04a..3b905c4cd75b 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
>      archived = models.BooleanField(default=False)
>      hash = HashField(null=True, blank=True)
>  
> +    # duplicate project from submission in subclass so we can count the
> +    # patches in a project without needing to do a JOIN.
> +    patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
> +
>      objects = PatchManager()
>  
>      @staticmethod
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 803e98592fa8..805037c72d73 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
>          patch = Patch.objects.create(
>              msgid=msgid,
>              project=project,
> +            patch_project=project,
>              name=name[:255],
>              date=date,
>              headers=headers,
> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
> index 3baf2999a836..f8d23a388ac7 100644
> --- a/patchwork/views/__init__.py
> +++ b/patchwork/views/__init__.py
> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
>              context['filters'].set_status(filterclass, setting)
>  
>      if patches is None:
> -        patches = Patch.objects.filter(project=project)
> +        patches = Patch.objects.filter(patch_project=project)
>  
>      # annotate with tag counts
>      patches = patches.with_tag_counts(project)
Daniel Axtens March 27, 2018, 10:42 p.m. UTC | #4
Stephen Finucane <stephen@that.guru> writes:

> On Fri, 2018-03-09 at 02:54 +1100, Daniel Axtens wrote:
>> Stephen Rothwell noticed (way back in September - sorry Stephen!) that
>> the following query is really slow on OzLabs:
>> 
>> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
>>     INNER JOIN "patchwork_submission" ON
>>         ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
>>     WHERE ("patchwork_submission"."project_id" = 14 AND
>>            "patchwork_patch"."state_id" IN
>> 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
>>                 WHERE U0."action_required" = true
>> 		ORDER BY U0."ordering" ASC));
>> 
>> I think this is really slow because we have to join the patch and
>> submission table to get the project id, which we need to filter the
>> patches.
>> 
>> Duplicate the project id in the patch table itself, which allows us to
>> avoid the JOIN.
>> 
>> The new query reads as:
>> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
>>     WHERE ("patchwork_patch"."patch_project_id" = 1 AND
>>            "patchwork_patch"."state_id" IN
>> 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
>> 	        WHERE U0."action_required" = true
>> 		ORDER BY U0."ordering" ASC));
>> 
>> Very simple testing on a small, artifical Postgres instance (3
>> projects, 102711 patches), shows speed gains of ~1.5-5x for this
>> query. Looking at Postgres' cost estimates (EXPLAIN) of the first
>> query vs the second query, we see a ~1.75x improvement there too.
>> 
>> I suspect the gains will be bigger on OzLabs.
>> 
>> (It turns out all of this is all for the "| NN patches" counter we
>> added to the filter bar!!)
>> 
>> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
>> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> It's unfortunate that this has already merged. While it works, it
> defeats the whole point of the multi-table inheritance introduced in
> commit 86172ccc1 (normalization). To be honest, given the performance
> impacts that particular change introduced (which we're only seeing at
> scale), I'd rather denormalize the whole thing and fold the 'Patch' and
> 'CoverLetter' models back into 'Submission' and just use a 'type' field
> (or similar) to control behavior. Is there any reason not to do this?

I agree that would be a better conceptual solution. The reason I didn't
do it is because I tried a couple of times and couldn't get it all to
work, and I haven't had the time to really sit down and re-engineer it.

If you can get it to work I am happy to revert this and apply that; the
changes to the database schema aren't at all difficult to undo so any
brave souls running master won't be completely hosed.

Regards,
Daniel

> If not, I'd like to wait on 2.1 until we've done both this and the
> event fixes.
>
> Stephen
>
>> ---
>> 
>> This requires a migration, so I don't think we can feasibly do it as a
>> stable update.
>> 
>> I think we drop the patch counter for stable and try to get this and
>> the event stuff merged to master promptly, and just tag 2.1. (To that
>> end, I will re-read and finish reviewing the event stuff soon.)
>> ---
>>  patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
>>  patchwork/models.py                              |  4 +++
>>  patchwork/parser.py                              |  1 +
>>  patchwork/views/__init__.py                      |  2 +-
>>  4 files changed, 45 insertions(+), 1 deletion(-)
>>  create mode 100644 patchwork/migrations/0024_patch_patch_project.py
>> 
>> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
>> new file mode 100644
>> index 000000000000..76d8f144c9dd
>> --- /dev/null
>> +++ b/patchwork/migrations/0024_patch_patch_project.py
>> @@ -0,0 +1,39 @@
>> +# -*- coding: utf-8 -*-
>> +# Generated by Django 1.11.10 on 2018-03-08 01:51
>> +from __future__ import unicode_literals
>> +
>> +from django.db import migrations, models
>> +import django.db.models.deletion
>> +
>> +
>> +class Migration(migrations.Migration):
>> +    # per migration 16, but note this seems to be going away
>> +    # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
>> +    atomic = False
>> +
>> +    dependencies = [
>> +        ('patchwork', '0023_timezone_unify'),
>> +    ]
>> +
>> +    operations = [
>> +        migrations.AddField(
>> +            model_name='patch',
>> +            name='patch_project',
>> +            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
>> +            preserve_default=False,
>> +        ),
>> +
>> +        # as with 10, this will break if you use non-default table names
>> +        migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
>> +                               (SELECT project_id FROM patchwork_submission
>> +                                WHERE patchwork_submission.id =
>> +                                        patchwork_patch.submission_ptr_id);'''
>> +        ),
>> +
>> +        migrations.AlterField(
>> +            model_name='patch',
>> +            name='patch_project',
>> +            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
>> +        ),
>> +
>> +    ]
>> diff --git a/patchwork/models.py b/patchwork/models.py
>> index b2491752f04a..3b905c4cd75b 100644
>> --- a/patchwork/models.py
>> +++ b/patchwork/models.py
>> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
>>      archived = models.BooleanField(default=False)
>>      hash = HashField(null=True, blank=True)
>>  
>> +    # duplicate project from submission in subclass so we can count the
>> +    # patches in a project without needing to do a JOIN.
>> +    patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
>> +
>>      objects = PatchManager()
>>  
>>      @staticmethod
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 803e98592fa8..805037c72d73 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
>>          patch = Patch.objects.create(
>>              msgid=msgid,
>>              project=project,
>> +            patch_project=project,
>>              name=name[:255],
>>              date=date,
>>              headers=headers,
>> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
>> index 3baf2999a836..f8d23a388ac7 100644
>> --- a/patchwork/views/__init__.py
>> +++ b/patchwork/views/__init__.py
>> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
>>              context['filters'].set_status(filterclass, setting)
>>  
>>      if patches is None:
>> -        patches = Patch.objects.filter(project=project)
>> +        patches = Patch.objects.filter(patch_project=project)
>>  
>>      # annotate with tag counts
>>      patches = patches.with_tag_counts(project)
Daniel Axtens March 27, 2018, 10:47 p.m. UTC | #5
>
> It's unfortunate that this has already merged. While it works, it
> defeats the whole point of the multi-table inheritance introduced in
> commit 86172ccc1 (normalization). To be honest, given the performance
> impacts that particular change introduced (which we're only seeing at
> scale), I'd rather denormalize the whole thing and fold the 'Patch' and
> 'CoverLetter' models back into 'Submission' and just use a 'type' field
> (or similar) to control behavior. Is there any reason not to do this?
> If not, I'd like to wait on 2.1 until we've done both this and the
> event fixes.

The other thing worth noting for this is that this issue is impacting
OzLabs pretty badly *now* - one of the queries took over 200s! - and so
they'd really like a fix sooner rather than later. So that was weighing
on my mind when I went with the half-fix rather than full denomalisation.

Regards,
Daniel

>
> Stephen
>
>> ---
>> 
>> This requires a migration, so I don't think we can feasibly do it as a
>> stable update.
>> 
>> I think we drop the patch counter for stable and try to get this and
>> the event stuff merged to master promptly, and just tag 2.1. (To that
>> end, I will re-read and finish reviewing the event stuff soon.)
>> ---
>>  patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
>>  patchwork/models.py                              |  4 +++
>>  patchwork/parser.py                              |  1 +
>>  patchwork/views/__init__.py                      |  2 +-
>>  4 files changed, 45 insertions(+), 1 deletion(-)
>>  create mode 100644 patchwork/migrations/0024_patch_patch_project.py
>> 
>> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
>> new file mode 100644
>> index 000000000000..76d8f144c9dd
>> --- /dev/null
>> +++ b/patchwork/migrations/0024_patch_patch_project.py
>> @@ -0,0 +1,39 @@
>> +# -*- coding: utf-8 -*-
>> +# Generated by Django 1.11.10 on 2018-03-08 01:51
>> +from __future__ import unicode_literals
>> +
>> +from django.db import migrations, models
>> +import django.db.models.deletion
>> +
>> +
>> +class Migration(migrations.Migration):
>> +    # per migration 16, but note this seems to be going away
>> +    # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
>> +    atomic = False
>> +
>> +    dependencies = [
>> +        ('patchwork', '0023_timezone_unify'),
>> +    ]
>> +
>> +    operations = [
>> +        migrations.AddField(
>> +            model_name='patch',
>> +            name='patch_project',
>> +            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
>> +            preserve_default=False,
>> +        ),
>> +
>> +        # as with 10, this will break if you use non-default table names
>> +        migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
>> +                               (SELECT project_id FROM patchwork_submission
>> +                                WHERE patchwork_submission.id =
>> +                                        patchwork_patch.submission_ptr_id);'''
>> +        ),
>> +
>> +        migrations.AlterField(
>> +            model_name='patch',
>> +            name='patch_project',
>> +            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
>> +        ),
>> +
>> +    ]
>> diff --git a/patchwork/models.py b/patchwork/models.py
>> index b2491752f04a..3b905c4cd75b 100644
>> --- a/patchwork/models.py
>> +++ b/patchwork/models.py
>> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
>>      archived = models.BooleanField(default=False)
>>      hash = HashField(null=True, blank=True)
>>  
>> +    # duplicate project from submission in subclass so we can count the
>> +    # patches in a project without needing to do a JOIN.
>> +    patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
>> +
>>      objects = PatchManager()
>>  
>>      @staticmethod
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 803e98592fa8..805037c72d73 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
>>          patch = Patch.objects.create(
>>              msgid=msgid,
>>              project=project,
>> +            patch_project=project,
>>              name=name[:255],
>>              date=date,
>>              headers=headers,
>> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
>> index 3baf2999a836..f8d23a388ac7 100644
>> --- a/patchwork/views/__init__.py
>> +++ b/patchwork/views/__init__.py
>> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
>>              context['filters'].set_status(filterclass, setting)
>>  
>>      if patches is None:
>> -        patches = Patch.objects.filter(project=project)
>> +        patches = Patch.objects.filter(patch_project=project)
>>  
>>      # annotate with tag counts
>>      patches = patches.with_tag_counts(project)
Stephen Finucane March 28, 2018, 11:38 a.m. UTC | #6
On Wed, 2018-03-28 at 09:42 +1100, Daniel Axtens wrote:
> Stephen Finucane <stephen@that.guru> writes:
> 
> > On Fri, 2018-03-09 at 02:54 +1100, Daniel Axtens wrote:
> > > Stephen Rothwell noticed (way back in September - sorry Stephen!) that
> > > the following query is really slow on OzLabs:
> > > 
> > > SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
> > >     INNER JOIN "patchwork_submission" ON
> > >         ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
> > >     WHERE ("patchwork_submission"."project_id" = 14 AND
> > >            "patchwork_patch"."state_id" IN
> > > 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> > >                 WHERE U0."action_required" = true
> > > 		ORDER BY U0."ordering" ASC));
> > > 
> > > I think this is really slow because we have to join the patch and
> > > submission table to get the project id, which we need to filter the
> > > patches.
> > > 
> > > Duplicate the project id in the patch table itself, which allows us to
> > > avoid the JOIN.
> > > 
> > > The new query reads as:
> > > SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
> > >     WHERE ("patchwork_patch"."patch_project_id" = 1 AND
> > >            "patchwork_patch"."state_id" IN
> > > 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> > > 	        WHERE U0."action_required" = true
> > > 		ORDER BY U0."ordering" ASC));
> > > 
> > > Very simple testing on a small, artifical Postgres instance (3
> > > projects, 102711 patches), shows speed gains of ~1.5-5x for this
> > > query. Looking at Postgres' cost estimates (EXPLAIN) of the first
> > > query vs the second query, we see a ~1.75x improvement there too.
> > > 
> > > I suspect the gains will be bigger on OzLabs.
> > > 
> > > (It turns out all of this is all for the "| NN patches" counter we
> > > added to the filter bar!!)
> > > 
> > > Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> > > Signed-off-by: Daniel Axtens <dja@axtens.net>
> > 
> > It's unfortunate that this has already merged. While it works, it
> > defeats the whole point of the multi-table inheritance introduced in
> > commit 86172ccc1 (normalization). To be honest, given the performance
> > impacts that particular change introduced (which we're only seeing at
> > scale), I'd rather denormalize the whole thing and fold the 'Patch' and
> > 'CoverLetter' models back into 'Submission' and just use a 'type' field
> > (or similar) to control behavior. Is there any reason not to do this?
> 
> I agree that would be a better conceptual solution. The reason I didn't
> do it is because I tried a couple of times and couldn't get it all to
> work, and I haven't had the time to really sit down and re-engineer it.
> 
> If you can get it to work I am happy to revert this and apply that; the
> changes to the database schema aren't at all difficult to undo so any
> brave souls running master won't be completely hosed.

OK, that's fair. Let me see how I fare with this. The biggest issues I
see are the custom managers that we use for the Patch class. I'm hoping
Veronika's tag infrastructure rework will help with this and I can
build upon that. I'll review said RFC this week to see if that's the
case.

Stephen

> Regards,
> Daniel
> 
> > If not, I'd like to wait on 2.1 until we've done both this and the
> > event fixes.
> > 
> > Stephen
> > 
> > > ---
> > > 
> > > This requires a migration, so I don't think we can feasibly do it as a
> > > stable update.
> > > 
> > > I think we drop the patch counter for stable and try to get this and
> > > the event stuff merged to master promptly, and just tag 2.1. (To that
> > > end, I will re-read and finish reviewing the event stuff soon.)
> > > ---
> > >  patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
> > >  patchwork/models.py                              |  4 +++
> > >  patchwork/parser.py                              |  1 +
> > >  patchwork/views/__init__.py                      |  2 +-
> > >  4 files changed, 45 insertions(+), 1 deletion(-)
> > >  create mode 100644 patchwork/migrations/0024_patch_patch_project.py
> > > 
> > > diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
> > > new file mode 100644
> > > index 000000000000..76d8f144c9dd
> > > --- /dev/null
> > > +++ b/patchwork/migrations/0024_patch_patch_project.py
> > > @@ -0,0 +1,39 @@
> > > +# -*- coding: utf-8 -*-
> > > +# Generated by Django 1.11.10 on 2018-03-08 01:51
> > > +from __future__ import unicode_literals
> > > +
> > > +from django.db import migrations, models
> > > +import django.db.models.deletion
> > > +
> > > +
> > > +class Migration(migrations.Migration):
> > > +    # per migration 16, but note this seems to be going away
> > > +    # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
> > > +    atomic = False
> > > +
> > > +    dependencies = [
> > > +        ('patchwork', '0023_timezone_unify'),
> > > +    ]
> > > +
> > > +    operations = [
> > > +        migrations.AddField(
> > > +            model_name='patch',
> > > +            name='patch_project',
> > > +            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> > > +            preserve_default=False,
> > > +        ),
> > > +
> > > +        # as with 10, this will break if you use non-default table names
> > > +        migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
> > > +                               (SELECT project_id FROM patchwork_submission
> > > +                                WHERE patchwork_submission.id =
> > > +                                        patchwork_patch.submission_ptr_id);'''
> > > +        ),
> > > +
> > > +        migrations.AlterField(
> > > +            model_name='patch',
> > > +            name='patch_project',
> > > +            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> > > +        ),
> > > +
> > > +    ]
> > > diff --git a/patchwork/models.py b/patchwork/models.py
> > > index b2491752f04a..3b905c4cd75b 100644
> > > --- a/patchwork/models.py
> > > +++ b/patchwork/models.py
> > > @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
> > >      archived = models.BooleanField(default=False)
> > >      hash = HashField(null=True, blank=True)
> > >  
> > > +    # duplicate project from submission in subclass so we can count the
> > > +    # patches in a project without needing to do a JOIN.
> > > +    patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
> > > +
> > >      objects = PatchManager()
> > >  
> > >      @staticmethod
> > > diff --git a/patchwork/parser.py b/patchwork/parser.py
> > > index 803e98592fa8..805037c72d73 100644
> > > --- a/patchwork/parser.py
> > > +++ b/patchwork/parser.py
> > > @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
> > >          patch = Patch.objects.create(
> > >              msgid=msgid,
> > >              project=project,
> > > +            patch_project=project,
> > >              name=name[:255],
> > >              date=date,
> > >              headers=headers,
> > > diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
> > > index 3baf2999a836..f8d23a388ac7 100644
> > > --- a/patchwork/views/__init__.py
> > > +++ b/patchwork/views/__init__.py
> > > @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
> > >              context['filters'].set_status(filterclass, setting)
> > >  
> > >      if patches is None:
> > > -        patches = Patch.objects.filter(project=project)
> > > +        patches = Patch.objects.filter(patch_project=project)
> > >  
> > >      # annotate with tag counts
> > >      patches = patches.with_tag_counts(project)
diff mbox series

Patch

diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
new file mode 100644
index 000000000000..76d8f144c9dd
--- /dev/null
+++ b/patchwork/migrations/0024_patch_patch_project.py
@@ -0,0 +1,39 @@ 
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.10 on 2018-03-08 01:51
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+import django.db.models.deletion
+
+
+class Migration(migrations.Migration):
+    # per migration 16, but note this seems to be going away
+    # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
+    atomic = False
+
+    dependencies = [
+        ('patchwork', '0023_timezone_unify'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='patch',
+            name='patch_project',
+            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
+            preserve_default=False,
+        ),
+
+        # as with 10, this will break if you use non-default table names
+        migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
+                               (SELECT project_id FROM patchwork_submission
+                                WHERE patchwork_submission.id =
+                                        patchwork_patch.submission_ptr_id);'''
+        ),
+
+        migrations.AlterField(
+            model_name='patch',
+            name='patch_project',
+            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
+        ),
+
+    ]
diff --git a/patchwork/models.py b/patchwork/models.py
index b2491752f04a..3b905c4cd75b 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -423,6 +423,10 @@  class Patch(SeriesMixin, Submission):
     archived = models.BooleanField(default=False)
     hash = HashField(null=True, blank=True)
 
+    # duplicate project from submission in subclass so we can count the
+    # patches in a project without needing to do a JOIN.
+    patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
+
     objects = PatchManager()
 
     @staticmethod
diff --git a/patchwork/parser.py b/patchwork/parser.py
index 803e98592fa8..805037c72d73 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -1004,6 +1004,7 @@  def parse_mail(mail, list_id=None):
         patch = Patch.objects.create(
             msgid=msgid,
             project=project,
+            patch_project=project,
             name=name[:255],
             date=date,
             headers=headers,
diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
index 3baf2999a836..f8d23a388ac7 100644
--- a/patchwork/views/__init__.py
+++ b/patchwork/views/__init__.py
@@ -270,7 +270,7 @@  def generic_list(request, project, view, view_args=None, filter_settings=None,
             context['filters'].set_status(filterclass, setting)
 
     if patches is None:
-        patches = Patch.objects.filter(project=project)
+        patches = Patch.objects.filter(patch_project=project)
 
     # annotate with tag counts
     patches = patches.with_tag_counts(project)