[10/11] Be sensible computing project patch counts

Message ID 20180810080106.10714-11-stewart@linux.ibm.com
State Accepted
Headers show
Series
  • Performance for ALL THE THINGS!
Related show

Commit Message

Stewart Smith Aug. 10, 2018, 8:01 a.m.
Django actively fights constructing a query that isn't insane.

So, let's go and just execute a raw one. This is all very standard
SQL so should execute everywhere without a problem.

With the dataset of patchwork.ozlabs.org, looking at the /project/
page for qemu-devel would take 13 queries and 1500ms,
with this patch it's down to 11 queries in ~250ms.
For the dataset of the netdev list, it's down to 440ms from 1500ms.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
---
 patchwork/views/project.py | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

Comments

Stephen Finucane Aug. 31, 2018, 2:16 p.m. | #1
On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
> Django actively fights constructing a query that isn't insane.
> 
> So, let's go and just execute a raw one. This is all very standard
> SQL so should execute everywhere without a problem.
> 
> With the dataset of patchwork.ozlabs.org, looking at the /project/
> page for qemu-devel would take 13 queries and 1500ms,
> with this patch it's down to 11 queries in ~250ms.
> For the dataset of the netdev list, it's down to 440ms from 1500ms.
> 
> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>

I'm generally reluctant to dive into SQL unless it's absolutely
required. This is simple enough and the gains are significant so

Reviewed-by: Stephen Finucane <stephen@that.guru>

Is this a potential bug report for Django?

Stephen

> ---
>  patchwork/views/project.py | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/patchwork/views/project.py b/patchwork/views/project.py
> index 484455c02d9d..2a75242a06af 100644
> --- a/patchwork/views/project.py
> +++ b/patchwork/views/project.py
> @@ -27,6 +27,8 @@ from patchwork.compat import reverse
>  from patchwork.models import Patch
>  from patchwork.models import Project
>  
> +from django.db import connection
> +
>  
>  def project_list(request):
>      projects = Project.objects.all()
> @@ -44,14 +46,29 @@ def project_list(request):
>  
>  def project_detail(request, project_id):
>      project = get_object_or_404(Project, linkname=project_id)
> -    patches = Patch.objects.filter(project=project)
> +
> +    # So, we revert to raw sql because if we do what you'd think would
> +    # be the correct thing in Django-ese, it ends up doing a *pointless*
> +    # join with patchwork_submissions that ends up ruining the query.
> +    # So, we do not do this, as this is wrong:
> +    #patches = Patch.objects.filter(patch_project_id=project.id).only('archived')
> +    #patches = patches.annotate(c=Count('archived'))
> +    # and instead do this, because it's simple and fast
> +
> +    n_patches = {}
> +    with connection.cursor() as cursor:
> +        c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived',
> +                           [project.id])
> +
> +        for r in cursor:
> +            n_patches[r[0]] = r[1]
>  
>      context = {
>          'project': project,
>          'maintainers': User.objects.filter(
>              profile__maintainer_projects=project),
> -        'n_patches': patches.filter(archived=False).count(),
> -        'n_archived_patches': patches.filter(archived=True).count(),
> +        'n_patches': n_patches[False],
> +        'n_archived_patches': n_patches[True],
>          'enable_xmlrpc': settings.ENABLE_XMLRPC,
>      }
>      return render(request, 'patchwork/project.html', context)
Stephen Finucane Sept. 5, 2018, 7:53 p.m. | #2
On Fri, 2018-08-31 at 15:16 +0100, Stephen Finucane wrote:
> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
> > Django actively fights constructing a query that isn't insane.
> > 
> > So, let's go and just execute a raw one. This is all very standard
> > SQL so should execute everywhere without a problem.
> > 
> > With the dataset of patchwork.ozlabs.org, looking at the /project/
> > page for qemu-devel would take 13 queries and 1500ms,
> > with this patch it's down to 11 queries in ~250ms.
> > For the dataset of the netdev list, it's down to 440ms from 1500ms.
> > 
> > Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
> 
> I'm generally reluctant to dive into SQL unless it's absolutely
> required. This is simple enough and the gains are significant so
> 
> Reviewed-by: Stephen Finucane <stephen@that.guru>

Actually, there's a potential issue here which I spotted while testing
with a smaller (odd, I know) set of patches. I can fix at merge time,
assuming a respin isn't needed.

> Is this a potential bug report for Django?
> 
> Stephen
> 
> > ---
> >  patchwork/views/project.py | 23 ++++++++++++++++++++---
> >  1 file changed, 20 insertions(+), 3 deletions(-)
> > 
> > diff --git a/patchwork/views/project.py b/patchwork/views/project.py
> > index 484455c02d9d..2a75242a06af 100644
> > --- a/patchwork/views/project.py
> > +++ b/patchwork/views/project.py
> > @@ -27,6 +27,8 @@ from patchwork.compat import reverse
> >  from patchwork.models import Patch
> >  from patchwork.models import Project
> >  
> > +from django.db import connection
> > +
> >  
> >  def project_list(request):
> >      projects = Project.objects.all()
> > @@ -44,14 +46,29 @@ def project_list(request):
> >  
> >  def project_detail(request, project_id):
> >      project = get_object_or_404(Project, linkname=project_id)
> > -    patches = Patch.objects.filter(project=project)
> > +
> > +    # So, we revert to raw sql because if we do what you'd think would
> > +    # be the correct thing in Django-ese, it ends up doing a *pointless*
> > +    # join with patchwork_submissions that ends up ruining the query.
> > +    # So, we do not do this, as this is wrong:
> > +    #patches = Patch.objects.filter(patch_project_id=project.id).only('archived')
> > +    #patches = patches.annotate(c=Count('archived'))
> > +    # and instead do this, because it's simple and fast
> > +
> > +    n_patches = {}
> > +    with connection.cursor() as cursor:
> > +        c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived',
> > +                           [project.id])
> > +
> > +        for r in cursor:
> > +            n_patches[r[0]] = r[1]
> >  
> >      context = {
> >          'project': project,
> >          'maintainers': User.objects.filter(
> >              profile__maintainer_projects=project),
> > -        'n_patches': patches.filter(archived=False).count(),
> > -        'n_archived_patches': patches.filter(archived=True).count(),
> > +        'n_patches': n_patches[False],
> > +        'n_archived_patches': n_patches[True],

It's possible that there are (a) no patches, (b) all patches are
archived, or (c) no patches are archived. If any of these are true,
you'll get a key error. We can fix this very simply like so:

  'n_patches': n_patches[False] if False in n_patches else 0,
  'n_archived_patches': n_patches[True] if True in n_patches else 0,

As noted above, I'll fix this at merge time and add a test to ensure we
don't regress.

Stephen

> >          'enable_xmlrpc': settings.ENABLE_XMLRPC,
> >      }
> >      return render(request, 'patchwork/project.html', context)
> 
> 
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Stewart Smith Sept. 10, 2018, 2:55 a.m. | #3
Stephen Finucane <stephen@that.guru> writes:
> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> Django actively fights constructing a query that isn't insane.
>> 
>> So, let's go and just execute a raw one. This is all very standard
>> SQL so should execute everywhere without a problem.
>> 
>> With the dataset of patchwork.ozlabs.org, looking at the /project/
>> page for qemu-devel would take 13 queries and 1500ms,
>> with this patch it's down to 11 queries in ~250ms.
>> For the dataset of the netdev list, it's down to 440ms from 1500ms.
>> 
>> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
>
> I'm generally reluctant to dive into SQL unless it's absolutely
> required. This is simple enough and the gains are significant so

(Imagine all my exasperation at wanting to do that for all the other
patches :)

> Reviewed-by: Stephen Finucane <stephen@that.guru>
>
> Is this a potential bug report for Django?

It probably is. With some luck I'll find the time to write up a small
test case and send it upstream.
Stewart Smith Sept. 10, 2018, 2:56 a.m. | #4
Stephen Finucane <stephen@that.guru> writes:
> On Fri, 2018-08-31 at 15:16 +0100, Stephen Finucane wrote:
>> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> > Django actively fights constructing a query that isn't insane.
>> > 
>> > So, let's go and just execute a raw one. This is all very standard
>> > SQL so should execute everywhere without a problem.
>> > 
>> > With the dataset of patchwork.ozlabs.org, looking at the /project/
>> > page for qemu-devel would take 13 queries and 1500ms,
>> > with this patch it's down to 11 queries in ~250ms.
>> > For the dataset of the netdev list, it's down to 440ms from 1500ms.
>> > 
>> > Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
>> 
>> I'm generally reluctant to dive into SQL unless it's absolutely
>> required. This is simple enough and the gains are significant so
>> 
>> Reviewed-by: Stephen Finucane <stephen@that.guru>
>
> Actually, there's a potential issue here which I spotted while testing
> with a smaller (odd, I know) set of patches. I can fix at merge time,
> assuming a respin isn't needed.
>
>> Is this a potential bug report for Django?
>> 
>> Stephen
>> 
>> > ---
>> >  patchwork/views/project.py | 23 ++++++++++++++++++++---
>> >  1 file changed, 20 insertions(+), 3 deletions(-)
>> > 
>> > diff --git a/patchwork/views/project.py b/patchwork/views/project.py
>> > index 484455c02d9d..2a75242a06af 100644
>> > --- a/patchwork/views/project.py
>> > +++ b/patchwork/views/project.py
>> > @@ -27,6 +27,8 @@ from patchwork.compat import reverse
>> >  from patchwork.models import Patch
>> >  from patchwork.models import Project
>> >  
>> > +from django.db import connection
>> > +
>> >  
>> >  def project_list(request):
>> >      projects = Project.objects.all()
>> > @@ -44,14 +46,29 @@ def project_list(request):
>> >  
>> >  def project_detail(request, project_id):
>> >      project = get_object_or_404(Project, linkname=project_id)
>> > -    patches = Patch.objects.filter(project=project)
>> > +
>> > +    # So, we revert to raw sql because if we do what you'd think would
>> > +    # be the correct thing in Django-ese, it ends up doing a *pointless*
>> > +    # join with patchwork_submissions that ends up ruining the query.
>> > +    # So, we do not do this, as this is wrong:
>> > +    #patches = Patch.objects.filter(patch_project_id=project.id).only('archived')
>> > +    #patches = patches.annotate(c=Count('archived'))
>> > +    # and instead do this, because it's simple and fast
>> > +
>> > +    n_patches = {}
>> > +    with connection.cursor() as cursor:
>> > +        c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived',
>> > +                           [project.id])
>> > +
>> > +        for r in cursor:
>> > +            n_patches[r[0]] = r[1]
>> >  
>> >      context = {
>> >          'project': project,
>> >          'maintainers': User.objects.filter(
>> >              profile__maintainer_projects=project),
>> > -        'n_patches': patches.filter(archived=False).count(),
>> > -        'n_archived_patches': patches.filter(archived=True).count(),
>> > +        'n_patches': n_patches[False],
>> > +        'n_archived_patches': n_patches[True],
>
> It's possible that there are (a) no patches, (b) all patches are
> archived, or (c) no patches are archived. If any of these are true,
> you'll get a key error. We can fix this very simply like so:
>
>   'n_patches': n_patches[False] if False in n_patches else 0,
>   'n_archived_patches': n_patches[True] if True in n_patches else 0,
>
> As noted above, I'll fix this at merge time and add a test to ensure we
> don't regress.

Oh, neat! Yeah, I likely didn't test those scenarios at all, I was
focused on large amounts of data rather than no data.

Patch

diff --git a/patchwork/views/project.py b/patchwork/views/project.py
index 484455c02d9d..2a75242a06af 100644
--- a/patchwork/views/project.py
+++ b/patchwork/views/project.py
@@ -27,6 +27,8 @@  from patchwork.compat import reverse
 from patchwork.models import Patch
 from patchwork.models import Project
 
+from django.db import connection
+
 
 def project_list(request):
     projects = Project.objects.all()
@@ -44,14 +46,29 @@  def project_list(request):
 
 def project_detail(request, project_id):
     project = get_object_or_404(Project, linkname=project_id)
-    patches = Patch.objects.filter(project=project)
+
+    # So, we revert to raw sql because if we do what you'd think would
+    # be the correct thing in Django-ese, it ends up doing a *pointless*
+    # join with patchwork_submissions that ends up ruining the query.
+    # So, we do not do this, as this is wrong:
+    #patches = Patch.objects.filter(patch_project_id=project.id).only('archived')
+    #patches = patches.annotate(c=Count('archived'))
+    # and instead do this, because it's simple and fast
+
+    n_patches = {}
+    with connection.cursor() as cursor:
+        c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived',
+                           [project.id])
+
+        for r in cursor:
+            n_patches[r[0]] = r[1]
 
     context = {
         'project': project,
         'maintainers': User.objects.filter(
             profile__maintainer_projects=project),
-        'n_patches': patches.filter(archived=False).count(),
-        'n_archived_patches': patches.filter(archived=True).count(),
+        'n_patches': n_patches[False],
+        'n_archived_patches': n_patches[True],
         'enable_xmlrpc': settings.ENABLE_XMLRPC,
     }
     return render(request, 'patchwork/project.html', context)