xmlrpc/patch_list: only fetch required fields

Message ID 20170928141935.23204-1-dja@axtens.net
State New
Headers show
Series
  • xmlrpc/patch_list: only fetch required fields
Related show

Commit Message

Daniel Axtens Sept. 28, 2017, 2:19 p.m.
OzLabs noticed *massive* slowdowns in queries like this one:

SELECT "patchwork_submission"."id", "patchwork_submission"."msgid",
"patchwork_submission"."date", "patchwork_submission"."headers",
"patchwork_submission"."submitter_id",
"patchwork_submission"."content", "patchwork_submission"."project_id",
"patchwork_submission"."name", "patchwork_patch"."submission_ptr_id",
"patchwork_patch"."diff", "patchwork_patch"."commit_ref",
"patchwork_patch"."pull_url", "patchwork_patch"."delegate_id",
"patchwork_patch"."state_id", "patchwork_patch"."archived",
"patchwork_patch"."hash" FROM "patchwork_patch" INNER JOIN
"patchwork_submission" ON ("patchwork_patch"."submission_ptr_id" =
"patchwork_submission"."id") WHERE
("patchwork_submission"."project_id" = 2 AND
"patchwork_patch"."state_id" = 1) ORDER BY
"patchwork_submission"."date" ASC

These appear to be a result of pwclient list operations. We *do not*
need content/headers/diff in this case - so do not fetch them as it
is incredibly expensive - queries in excess of 50s have been observed.

This should go to stable/2.0.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Daniel Axtens <dja@axtens.net>

---
We're seeing this much more than in v1, I think because of the join
that is now required. More on this later.
---
 patchwork/views/xmlrpc.py | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Daniel Axtens Sept. 28, 2017, 2:36 p.m. | #1
Daniel Axtens <dja@axtens.net> writes:

> OzLabs noticed *massive* slowdowns in queries like this one:
>
> SELECT "patchwork_submission"."id", "patchwork_submission"."msgid",
> "patchwork_submission"."date", "patchwork_submission"."headers",
> "patchwork_submission"."submitter_id",
> "patchwork_submission"."content", "patchwork_submission"."project_id",
> "patchwork_submission"."name", "patchwork_patch"."submission_ptr_id",
> "patchwork_patch"."diff", "patchwork_patch"."commit_ref",
> "patchwork_patch"."pull_url", "patchwork_patch"."delegate_id",
> "patchwork_patch"."state_id", "patchwork_patch"."archived",
> "patchwork_patch"."hash" FROM "patchwork_patch" INNER JOIN
> "patchwork_submission" ON ("patchwork_patch"."submission_ptr_id" =
> "patchwork_submission"."id") WHERE
> ("patchwork_submission"."project_id" = 2 AND
> "patchwork_patch"."state_id" = 1) ORDER BY
> "patchwork_submission"."date" ASC
>
> These appear to be a result of pwclient list operations. We *do not*
> need content/headers/diff in this case - so do not fetch them as it
> is incredibly expensive - queries in excess of 50s have been observed.
>
> This should go to stable/2.0.
>
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Jeremy Kerr <jk@ozlabs.org>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> ---
> We're seeing this much more than in v1, I think because of the join
> that is now required. More on this later.
> ---
>  patchwork/views/xmlrpc.py | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/patchwork/views/xmlrpc.py b/patchwork/views/xmlrpc.py
> index 8de9d10e1d50..186f54cac684 100644
> --- a/patchwork/views/xmlrpc.py
> +++ b/patchwork/views/xmlrpc.py
> @@ -585,6 +585,14 @@ def patch_list(filt=None):
>  
>      patches = Patch.objects.filter(**dfilter)
>  
> +    # Only extract the relevant fields. This saves a big db load as we
> +    # no longer fetch content/headers/etc for potentially every patch
> +    # in a project.
> +    patches = patches.only('id', 'date', 'msgid', 'name',
> +                           'project_id', 'state_id', 'archived',
> +                           'submitter_id', 'delegate_id', 'commit_ref',
> +                           'hash')
> +
We probably also want to chuck a select/prefetch_related in here but for
some reason xmlrpc doesn't want to test locally so I haven't been able
to verify it just yet. This should still be a big improvement though!

Regards,
Daniel


>      return _get_objects(patch_to_dict, patches, max_count)
>  
>  
> -- 
> 2.11.0

Patch

diff --git a/patchwork/views/xmlrpc.py b/patchwork/views/xmlrpc.py
index 8de9d10e1d50..186f54cac684 100644
--- a/patchwork/views/xmlrpc.py
+++ b/patchwork/views/xmlrpc.py
@@ -585,6 +585,14 @@  def patch_list(filt=None):
 
     patches = Patch.objects.filter(**dfilter)
 
+    # Only extract the relevant fields. This saves a big db load as we
+    # no longer fetch content/headers/etc for potentially every patch
+    # in a project.
+    patches = patches.only('id', 'date', 'msgid', 'name',
+                           'project_id', 'state_id', 'archived',
+                           'submitter_id', 'delegate_id', 'commit_ref',
+                           'hash')
+
     return _get_objects(patch_to_dict, patches, max_count)