Message ID | 20180810080106.10714-11-stewart@linux.ibm.com |
---|---|
State | Accepted |
Headers | show |
Series | Performance for ALL THE THINGS! | expand |
On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote: > Django actively fights constructing a query that isn't insane. > > So, let's go and just execute a raw one. This is all very standard > SQL so should execute everywhere without a problem. > > With the dataset of patchwork.ozlabs.org, looking at the /project/ > page for qemu-devel would take 13 queries and 1500ms, > with this patch it's down to 11 queries in ~250ms. > For the dataset of the netdev list, it's down to 440ms from 1500ms. > > Signed-off-by: Stewart Smith <stewart@linux.ibm.com> I'm generally reluctant to dive into SQL unless it's absolutely required. This is simple enough and the gains are significant so Reviewed-by: Stephen Finucane <stephen@that.guru> Is this a potential bug report for Django? Stephen > --- > patchwork/views/project.py | 23 ++++++++++++++++++++--- > 1 file changed, 20 insertions(+), 3 deletions(-) > > diff --git a/patchwork/views/project.py b/patchwork/views/project.py > index 484455c02d9d..2a75242a06af 100644 > --- a/patchwork/views/project.py > +++ b/patchwork/views/project.py > @@ -27,6 +27,8 @@ from patchwork.compat import reverse > from patchwork.models import Patch > from patchwork.models import Project > > +from django.db import connection > + > > def project_list(request): > projects = Project.objects.all() > @@ -44,14 +46,29 @@ def project_list(request): > > def project_detail(request, project_id): > project = get_object_or_404(Project, linkname=project_id) > - patches = Patch.objects.filter(project=project) > + > + # So, we revert to raw sql because if we do what you'd think would > + # be the correct thing in Django-ese, it ends up doing a *pointless* > + # join with patchwork_submissions that ends up ruining the query. > + # So, we do not do this, as this is wrong: > + #patches = Patch.objects.filter(patch_project_id=project.id).only('archived') > + #patches = patches.annotate(c=Count('archived')) > + # and instead do this, because it's simple and fast > + > + n_patches = {} > + with connection.cursor() as cursor: > + c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived', > + [project.id]) > + > + for r in cursor: > + n_patches[r[0]] = r[1] > > context = { > 'project': project, > 'maintainers': User.objects.filter( > profile__maintainer_projects=project), > - 'n_patches': patches.filter(archived=False).count(), > - 'n_archived_patches': patches.filter(archived=True).count(), > + 'n_patches': n_patches[False], > + 'n_archived_patches': n_patches[True], > 'enable_xmlrpc': settings.ENABLE_XMLRPC, > } > return render(request, 'patchwork/project.html', context)
On Fri, 2018-08-31 at 15:16 +0100, Stephen Finucane wrote: > On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote: > > Django actively fights constructing a query that isn't insane. > > > > So, let's go and just execute a raw one. This is all very standard > > SQL so should execute everywhere without a problem. > > > > With the dataset of patchwork.ozlabs.org, looking at the /project/ > > page for qemu-devel would take 13 queries and 1500ms, > > with this patch it's down to 11 queries in ~250ms. > > For the dataset of the netdev list, it's down to 440ms from 1500ms. > > > > Signed-off-by: Stewart Smith <stewart@linux.ibm.com> > > I'm generally reluctant to dive into SQL unless it's absolutely > required. This is simple enough and the gains are significant so > > Reviewed-by: Stephen Finucane <stephen@that.guru> Actually, there's a potential issue here which I spotted while testing with a smaller (odd, I know) set of patches. I can fix at merge time, assuming a respin isn't needed. > Is this a potential bug report for Django? > > Stephen > > > --- > > patchwork/views/project.py | 23 ++++++++++++++++++++--- > > 1 file changed, 20 insertions(+), 3 deletions(-) > > > > diff --git a/patchwork/views/project.py b/patchwork/views/project.py > > index 484455c02d9d..2a75242a06af 100644 > > --- a/patchwork/views/project.py > > +++ b/patchwork/views/project.py > > @@ -27,6 +27,8 @@ from patchwork.compat import reverse > > from patchwork.models import Patch > > from patchwork.models import Project > > > > +from django.db import connection > > + > > > > def project_list(request): > > projects = Project.objects.all() > > @@ -44,14 +46,29 @@ def project_list(request): > > > > def project_detail(request, project_id): > > project = get_object_or_404(Project, linkname=project_id) > > - patches = Patch.objects.filter(project=project) > > + > > + # So, we revert to raw sql because if we do what you'd think would > > + # be the correct thing in Django-ese, it ends up doing a *pointless* > > + # join with patchwork_submissions that ends up ruining the query. > > + # So, we do not do this, as this is wrong: > > + #patches = Patch.objects.filter(patch_project_id=project.id).only('archived') > > + #patches = patches.annotate(c=Count('archived')) > > + # and instead do this, because it's simple and fast > > + > > + n_patches = {} > > + with connection.cursor() as cursor: > > + c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived', > > + [project.id]) > > + > > + for r in cursor: > > + n_patches[r[0]] = r[1] > > > > context = { > > 'project': project, > > 'maintainers': User.objects.filter( > > profile__maintainer_projects=project), > > - 'n_patches': patches.filter(archived=False).count(), > > - 'n_archived_patches': patches.filter(archived=True).count(), > > + 'n_patches': n_patches[False], > > + 'n_archived_patches': n_patches[True], It's possible that there are (a) no patches, (b) all patches are archived, or (c) no patches are archived. If any of these are true, you'll get a key error. We can fix this very simply like so: 'n_patches': n_patches[False] if False in n_patches else 0, 'n_archived_patches': n_patches[True] if True in n_patches else 0, As noted above, I'll fix this at merge time and add a test to ensure we don't regress. Stephen > > 'enable_xmlrpc': settings.ENABLE_XMLRPC, > > } > > return render(request, 'patchwork/project.html', context) > > > _______________________________________________ > Patchwork mailing list > Patchwork@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/patchwork
Stephen Finucane <stephen@that.guru> writes: > On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote: >> Django actively fights constructing a query that isn't insane. >> >> So, let's go and just execute a raw one. This is all very standard >> SQL so should execute everywhere without a problem. >> >> With the dataset of patchwork.ozlabs.org, looking at the /project/ >> page for qemu-devel would take 13 queries and 1500ms, >> with this patch it's down to 11 queries in ~250ms. >> For the dataset of the netdev list, it's down to 440ms from 1500ms. >> >> Signed-off-by: Stewart Smith <stewart@linux.ibm.com> > > I'm generally reluctant to dive into SQL unless it's absolutely > required. This is simple enough and the gains are significant so (Imagine all my exasperation at wanting to do that for all the other patches :) > Reviewed-by: Stephen Finucane <stephen@that.guru> > > Is this a potential bug report for Django? It probably is. With some luck I'll find the time to write up a small test case and send it upstream.
Stephen Finucane <stephen@that.guru> writes: > On Fri, 2018-08-31 at 15:16 +0100, Stephen Finucane wrote: >> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote: >> > Django actively fights constructing a query that isn't insane. >> > >> > So, let's go and just execute a raw one. This is all very standard >> > SQL so should execute everywhere without a problem. >> > >> > With the dataset of patchwork.ozlabs.org, looking at the /project/ >> > page for qemu-devel would take 13 queries and 1500ms, >> > with this patch it's down to 11 queries in ~250ms. >> > For the dataset of the netdev list, it's down to 440ms from 1500ms. >> > >> > Signed-off-by: Stewart Smith <stewart@linux.ibm.com> >> >> I'm generally reluctant to dive into SQL unless it's absolutely >> required. This is simple enough and the gains are significant so >> >> Reviewed-by: Stephen Finucane <stephen@that.guru> > > Actually, there's a potential issue here which I spotted while testing > with a smaller (odd, I know) set of patches. I can fix at merge time, > assuming a respin isn't needed. > >> Is this a potential bug report for Django? >> >> Stephen >> >> > --- >> > patchwork/views/project.py | 23 ++++++++++++++++++++--- >> > 1 file changed, 20 insertions(+), 3 deletions(-) >> > >> > diff --git a/patchwork/views/project.py b/patchwork/views/project.py >> > index 484455c02d9d..2a75242a06af 100644 >> > --- a/patchwork/views/project.py >> > +++ b/patchwork/views/project.py >> > @@ -27,6 +27,8 @@ from patchwork.compat import reverse >> > from patchwork.models import Patch >> > from patchwork.models import Project >> > >> > +from django.db import connection >> > + >> > >> > def project_list(request): >> > projects = Project.objects.all() >> > @@ -44,14 +46,29 @@ def project_list(request): >> > >> > def project_detail(request, project_id): >> > project = get_object_or_404(Project, linkname=project_id) >> > - patches = Patch.objects.filter(project=project) >> > + >> > + # So, we revert to raw sql because if we do what you'd think would >> > + # be the correct thing in Django-ese, it ends up doing a *pointless* >> > + # join with patchwork_submissions that ends up ruining the query. >> > + # So, we do not do this, as this is wrong: >> > + #patches = Patch.objects.filter(patch_project_id=project.id).only('archived') >> > + #patches = patches.annotate(c=Count('archived')) >> > + # and instead do this, because it's simple and fast >> > + >> > + n_patches = {} >> > + with connection.cursor() as cursor: >> > + c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived', >> > + [project.id]) >> > + >> > + for r in cursor: >> > + n_patches[r[0]] = r[1] >> > >> > context = { >> > 'project': project, >> > 'maintainers': User.objects.filter( >> > profile__maintainer_projects=project), >> > - 'n_patches': patches.filter(archived=False).count(), >> > - 'n_archived_patches': patches.filter(archived=True).count(), >> > + 'n_patches': n_patches[False], >> > + 'n_archived_patches': n_patches[True], > > It's possible that there are (a) no patches, (b) all patches are > archived, or (c) no patches are archived. If any of these are true, > you'll get a key error. We can fix this very simply like so: > > 'n_patches': n_patches[False] if False in n_patches else 0, > 'n_archived_patches': n_patches[True] if True in n_patches else 0, > > As noted above, I'll fix this at merge time and add a test to ensure we > don't regress. Oh, neat! Yeah, I likely didn't test those scenarios at all, I was focused on large amounts of data rather than no data.
diff --git a/patchwork/views/project.py b/patchwork/views/project.py index 484455c02d9d..2a75242a06af 100644 --- a/patchwork/views/project.py +++ b/patchwork/views/project.py @@ -27,6 +27,8 @@ from patchwork.compat import reverse from patchwork.models import Patch from patchwork.models import Project +from django.db import connection + def project_list(request): projects = Project.objects.all() @@ -44,14 +46,29 @@ def project_list(request): def project_detail(request, project_id): project = get_object_or_404(Project, linkname=project_id) - patches = Patch.objects.filter(project=project) + + # So, we revert to raw sql because if we do what you'd think would + # be the correct thing in Django-ese, it ends up doing a *pointless* + # join with patchwork_submissions that ends up ruining the query. + # So, we do not do this, as this is wrong: + #patches = Patch.objects.filter(patch_project_id=project.id).only('archived') + #patches = patches.annotate(c=Count('archived')) + # and instead do this, because it's simple and fast + + n_patches = {} + with connection.cursor() as cursor: + c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived', + [project.id]) + + for r in cursor: + n_patches[r[0]] = r[1] context = { 'project': project, 'maintainers': User.objects.filter( profile__maintainer_projects=project), - 'n_patches': patches.filter(archived=False).count(), - 'n_archived_patches': patches.filter(archived=True).count(), + 'n_patches': n_patches[False], + 'n_archived_patches': n_patches[True], 'enable_xmlrpc': settings.ENABLE_XMLRPC, } return render(request, 'patchwork/project.html', context)
Django actively fights constructing a query that isn't insane. So, let's go and just execute a raw one. This is all very standard SQL so should execute everywhere without a problem. With the dataset of patchwork.ozlabs.org, looking at the /project/ page for qemu-devel would take 13 queries and 1500ms, with this patch it's down to 11 queries in ~250ms. For the dataset of the netdev list, it's down to 440ms from 1500ms. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> --- patchwork/views/project.py | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)