diff mbox

[v2,05/13] REST: Resolve performance issues with tags

Message ID 1479574288-24171-6-git-send-email-stephen@that.guru
State Superseded
Headers show

Commit Message

Stephen Finucane Nov. 19, 2016, 4:51 p.m. UTC
The list comprehension used to generate a list of tags resulted in a
significant number of duplicated queries. Resolve this by copying the
approach taken to minimize patch queries in the UI.

This changes the output of the response in two ways. First, counts for
all tag patches are now shown, even if the count is 0. Secondly, a
dictionary is returned, with the tag name as the key, rather than a
list. As such, the output for a typical patch is transformed from:

    [
      {
        'name': 'Reviewed-by',
        'count': 1
      }
    ]

to

    {
      'Reviewed-by': 1
      'Acked-by': 0,
      'Tested-by': 0
    }

How this is achieved is a little hacky, but reworked tags are on the
post-v2.0 which will allow this to be resolved.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Cc: Andy Doan <andy.doan@linaro.org>
---
 patchwork/api/patch.py           | 12 +++++-------
 patchwork/models.py              | 13 ++++++++++---
 patchwork/tests/test_rest_api.py |  5 ++---
 3 files changed, 17 insertions(+), 13 deletions(-)

Comments

Daniel Axtens Nov. 21, 2016, 3:18 a.m. UTC | #1
Hi Stephen,

Always keen on performance improvements! I think the new data format is
better too.

I'm not 100% across the DRF view of the world, so apologies if there are
any dumb questions.

>      def get_queryset(self):
> -        qs = super(PatchViewSet, self).get_queryset(
> -        ).prefetch_related(
> -            'check_set', 'patchtag_set'
> -        ).select_related('state', 'submitter', 'delegate')
> +        qs = super(PatchViewSet, self).get_queryset().with_tag_counts()\
> +            .prefetch_related('check_set')\
> +            .select_related('state', 'submitter', 'delegate')

I notice you've added the with_tag_counts. Is that required every time
we get the query_set? What's the motivation for doing it here?

>          if 'pk' not in self.kwargs:
>              # we are doing a listing, we don't need these fields
>              qs = qs.defer('content', 'diff', 'headers')
> diff --git a/patchwork/models.py b/patchwork/models.py
> index a27dda6..d1d0857 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -225,8 +225,8 @@ def get_default_initial_patch_state():
>  
>  class PatchQuerySet(models.query.QuerySet):
>  
> -    def with_tag_counts(self, project):
> -        if not project.use_tags:
> +    def with_tag_counts(self, project=None):
> +        if project and not project.use_tags:
>              return self

I must admit some confusion here. Previously project was not
nullabe. What's the motivation for making it nullable?

And one final question - given our recent history with broken parsing -
is this all going to behave fine in the presence of malformed tags? I
don't think this particular patch (or patch set) really touch it, but
probably worth thinking about while we're in this area.

Regards,
Daniel
Stephen Finucane Nov. 24, 2016, 7:55 p.m. UTC | #2
On Mon, 2016-11-21 at 14:18 +1100, Daniel Axtens wrote:
> Hi Stephen,
> 
> Always keen on performance improvements! I think the new data format
> is
> better too.
> 
> I'm not 100% across the DRF view of the world, so apologies if there
> are
> any dumb questions.
> 
> > 
> >      def get_queryset(self):
> > -        qs = super(PatchViewSet, self).get_queryset(
> > -        ).prefetch_related(
> > -            'check_set', 'patchtag_set'
> > -        ).select_related('state', 'submitter', 'delegate')
> > +        qs = super(PatchViewSet,
> > self).get_queryset().with_tag_counts()\
> > +            .prefetch_related('check_set')\
> > +            .select_related('state', 'submitter', 'delegate')
> 
> I notice you've added the with_tag_counts. Is that required every
> time
> we get the query_set? What's the motivation for doing it here?

If we want to access the tags, then yes. If you don't do this, you need
to calculate the tag counts in Python which results in a cascade of
queries. It's annoying but necessary.

> > 
> >          if 'pk' not in self.kwargs:
> >              # we are doing a listing, we don't need these fields
> >              qs = qs.defer('content', 'diff', 'headers')
> > diff --git a/patchwork/models.py b/patchwork/models.py
> > index a27dda6..d1d0857 100644
> > --- a/patchwork/models.py
> > +++ b/patchwork/models.py
> > @@ -225,8 +225,8 @@ def get_default_initial_patch_state():
> >  
> >  class PatchQuerySet(models.query.QuerySet):
> >  
> > -    def with_tag_counts(self, project):
> > -        if not project.use_tags:
> > +    def with_tag_counts(self, project=None):
> > +        if project and not project.use_tags:
> >              return self
> 
> I must admit some confusion here. Previously project was not
> nullabe. What's the motivation for making it nullable?

In the 'get_queryset' function above, we don't actually have a project
instance available to pass to 'with_tag_counts'. Seeing as the only
reason we even pass 'with_tag_counts' is to validate whether tags
should be displayed (project.tags == Tag.objects.all() for all
projects), it's not really necessary. It's a bit hacky, but I'll fix
post-2.0. 

> And one final question - given our recent history with broken parsing
> -
> is this all going to behave fine in the presence of malformed tags? I
> don't think this particular patch (or patch set) really touch it, but
> probably worth thinking about while we're in this area.

There aren't really such as thing as malformed tags. We don't store the
body of the tags, for example. I think we're good here, but we can
revisit if we see funky stuff in the wild.

Stephen
diff mbox

Patch

diff --git a/patchwork/api/patch.py b/patchwork/api/patch.py
index bfd8fe3..bc4cb5c 100644
--- a/patchwork/api/patch.py
+++ b/patchwork/api/patch.py
@@ -52,9 +52,8 @@  class PatchSerializer(HyperlinkedModelSerializer):
         return request.build_absolute_uri(instance.get_mbox_url())
 
     def get_tags(self, instance):
-        # TODO(stephenfin): I don't think this is correct - too many queries
-        return [{'name': x.tag.name, 'count': x.count}
-                for x in instance.patchtag_set.all()]
+        return {x.name: getattr(instance, x.attr_name)
+                for x in instance.project.tags}
 
     def get_headers(self, instance):
         if instance.headers:
@@ -85,10 +84,9 @@  class PatchViewSet(PatchworkViewSet):
     serializer_class = PatchSerializer
 
     def get_queryset(self):
-        qs = super(PatchViewSet, self).get_queryset(
-        ).prefetch_related(
-            'check_set', 'patchtag_set'
-        ).select_related('state', 'submitter', 'delegate')
+        qs = super(PatchViewSet, self).get_queryset().with_tag_counts()\
+            .prefetch_related('check_set')\
+            .select_related('state', 'submitter', 'delegate')
         if 'pk' not in self.kwargs:
             # we are doing a listing, we don't need these fields
             qs = qs.defer('content', 'diff', 'headers')
diff --git a/patchwork/models.py b/patchwork/models.py
index a27dda6..d1d0857 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -225,8 +225,8 @@  def get_default_initial_patch_state():
 
 class PatchQuerySet(models.query.QuerySet):
 
-    def with_tag_counts(self, project):
-        if not project.use_tags:
+    def with_tag_counts(self, project=None):
+        if project and not project.use_tags:
             return self
 
         # We need the project's use_tags field loaded for Project.tags().
@@ -236,7 +236,14 @@  class PatchQuerySet(models.query.QuerySet):
         qs = self.prefetch_related('project')
         select = OrderedDict()
         select_params = []
-        for tag in project.tags:
+
+        # All projects have the same tags, so we're good to go here
+        if project:
+            tags = project.tags
+        else:
+            tags = Tag.objects.all()
+
+        for tag in tags:
             select[tag.attr_name] = (
                 "coalesce("
                 "(SELECT count FROM patchwork_patchtag"
diff --git a/patchwork/tests/test_rest_api.py b/patchwork/tests/test_rest_api.py
index 3be8ecf..ddce4d7 100644
--- a/patchwork/tests/test_rest_api.py
+++ b/patchwork/tests/test_rest_api.py
@@ -304,9 +304,8 @@  class TestPatchAPI(APITestCase):
             content='Reviewed-by: Test User <test@example.com>\n')
         resp = self.client.get(self.api_url(patch.id))
         tags = resp.data['tags']
-        self.assertEqual(1, len(tags))
-        self.assertEqual(1, tags[0]['count'])
-        self.assertEqual('Reviewed-by', tags[0]['name'])
+        self.assertEqual(3, len(tags))
+        self.assertEqual(1, tags['Reviewed-by'])
 
     def test_anonymous_create(self):
         """Ensure anonymous "POST" operations are rejected."""