diff mbox series

[RFC] 3x improvement in patch listing

Message ID 20180808074006.21820-1-stewart@linux.ibm.com
State Superseded
Headers show
Series [RFC] 3x improvement in patch listing | expand

Commit Message

Stewart Smith Aug. 8, 2018, 7:40 a.m. UTC
There's two main bits that are really expensive when composing the list
of patches for a project: the query getting the list, and the query
finding the series for each patch.

If we look at the query getting the list, it gets a lot of unnesseccary
fields such as 'headers' and 'content', even though we tell Django not
to. It turns out that Django seems to ignore the Submission relationship
and I have no idea how to force it to ignore that thing (defer doesn't
work) but if we go only, then it works okay.

From my import of ~8000 messages for a few projects, my laptop query
time (MySQL, as setup by whatever the docker-compose things do) goes
from:

http://localhost:8000/project/linux-kernel/list/
FROM:
342ms SQL queries cold cache, 268ms warm cache
TO:
118ms SQL queries cold cache, 88ms warm cache

Which is... non trivial to say the least.

The big jump is the patches.only change, and the removal of ordering
on the patchseries takes a further 10ms off. For some strange reason, it
seems rather hard to tell Django that you don't care what order the
results come back in for that query (if we do, then the db server has to
do a sort rather than just return each row)

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
---
 patchwork/models.py         | 1 -
 patchwork/views/__init__.py | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

Comments

Konstantin Ryabitsev Aug. 8, 2018, 3:38 p.m. UTC | #1
On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>There's two main bits that are really expensive when composing the list
>of patches for a project: the query getting the list, and the query
>finding the series for each patch.

Stewart:

Thanks for working on this! Do you think this would help with pagination 
as well? I find that in semi-abandoned projects like 
https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
seconds to load the list view due to 800+ pages of unprocessed patches.  
I am currently considering an automated script that would auto-archive 
patches older than 6 months, but if simply improving pagination times 
would fix the issue, then I wouldn't need to bother.

Best,
-K
Daniel Axtens Aug. 8, 2018, 4:17 p.m. UTC | #2
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>There's two main bits that are really expensive when composing the list
>>of patches for a project: the query getting the list, and the query
>>finding the series for each patch.
>
> Stewart:
>
> Thanks for working on this! Do you think this would help with pagination 
> as well? I find that in semi-abandoned projects like 
> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
> seconds to load the list view due to 800+ pages of unprocessed patches.  
> I am currently considering an automated script that would auto-archive 
> patches older than 6 months, but if simply improving pagination times 
> would fix the issue, then I wouldn't need to bother.

It should do. From a brief look at his patch, it fixes a hot path in
__init__.py that goes through the paginator (and which I was *sure* I
had looked at, but apparently I had not!) I'll do a quick sanity check
on it and hopefully apply it to master and stable/2.1 within the next
few days. (but no promises!)

Regards,
Daniel

>
> Best,
> -K
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Daniel Axtens Aug. 8, 2018, 4:57 p.m. UTC | #3
Daniel Axtens <dja@axtens.net> writes:

> Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
>
>> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>>There's two main bits that are really expensive when composing the list
>>>of patches for a project: the query getting the list, and the query
>>>finding the series for each patch.
>>
>> Stewart:
>>
>> Thanks for working on this! Do you think this would help with pagination 
>> as well? I find that in semi-abandoned projects like 
>> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
>> seconds to load the list view due to 800+ pages of unprocessed patches.  
>> I am currently considering an automated script that would auto-archive 
>> patches older than 6 months, but if simply improving pagination times 
>> would fix the issue, then I wouldn't need to bother.
>
> It should do. From a brief look at his patch, it fixes a hot path in
> __init__.py that goes through the paginator (and which I was *sure* I
> had looked at, but apparently I had not!) I'll do a quick sanity check

Ah, I have figured out what I had thought I had fixed!

So you're hitting the issue where the db has to fetch the 'diff',
'content' and 'headers' out of Submission just to list the patches,
which is clearly utterly bonkers. This is *despite* the defer() we
attempt to do - jk reported a Django bug for it: [1].

It looks like Debian picked up the patch for their release, but it
didn't actually make it into Django 1.11.6 ([1], comment 8)

I attempted a monkey-patch inside Patchwork [2] - the patch was NAKed,
but OzLabs is/was carrying a local fix.

I suppose it's worth taking the patch - the Django fix certainly isn't
going to hit the 1.11 stable tree and we're not supporting 2.0 just
yet. I do still want to check it properly first, so ...

> on it and hopefully apply it to master and stable/2.1 within the next
> few days. (but no promises!)

... this lack-of-a-firm-timeline still applies.

Regards,
Daniel

[1] https://code.djangoproject.com/ticket/28549#ticket
[2] https://patchwork.ozlabs.org/patch/809294/
Stewart Smith Aug. 9, 2018, 1:55 a.m. UTC | #4
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>There's two main bits that are really expensive when composing the list
>>of patches for a project: the query getting the list, and the query
>>finding the series for each patch.
>
> Stewart:
>
> Thanks for working on this! Do you think this would help with pagination 
> as well? I find that in semi-abandoned projects like 
> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
> seconds to load the list view due to 800+ pages of unprocessed patches.  
> I am currently considering an automated script that would auto-archive 
> patches older than 6 months, but if simply improving pagination times 
> would fix the issue, then I wouldn't need to bother.

I may be accidentally letting people know I know something about
databases again :)

Out of interest, does the kernel.org instance back onto PostgreSQL or
MySQL? I have a bunch of ideas that would quite dramatically improve
performance on MySQL (properly using primary keys to force disk layout
to be sane, rather than the current screw-locality method that Django
enforces). On PostgreSQL, things are a bit different, but it's possible
an occasional ALTER TABLE CLUSTER BY (or whatever the syntax is) could
help a *lot*.

I'm not sure that archiving the patches would help query performance at
all, but I'd have to check it out a bit. The query that Django generates
is certainly.... "interesting".
Stewart Smith Aug. 9, 2018, 2:06 a.m. UTC | #5
Daniel Axtens <dja@axtens.net> writes:
> Daniel Axtens <dja@axtens.net> writes:
>
>> Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
>>
>>> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>>>There's two main bits that are really expensive when composing the list
>>>>of patches for a project: the query getting the list, and the query
>>>>finding the series for each patch.
>>>
>>> Stewart:
>>>
>>> Thanks for working on this! Do you think this would help with pagination 
>>> as well? I find that in semi-abandoned projects like 
>>> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
>>> seconds to load the list view due to 800+ pages of unprocessed patches.  
>>> I am currently considering an automated script that would auto-archive 
>>> patches older than 6 months, but if simply improving pagination times 
>>> would fix the issue, then I wouldn't need to bother.
>>
>> It should do. From a brief look at his patch, it fixes a hot path in
>> __init__.py that goes through the paginator (and which I was *sure* I
>> had looked at, but apparently I had not!) I'll do a quick sanity check
>
> Ah, I have figured out what I had thought I had fixed!
>
> So you're hitting the issue where the db has to fetch the 'diff',
> 'content' and 'headers' out of Submission just to list the patches,
> which is clearly utterly bonkers. This is *despite* the defer() we
> attempt to do - jk reported a Django bug for it: [1].

Yeah, that'd explain it. I was kind of going "wtf" a lot at that, and
then doing something kind of crazy to try and get things down to
something even *remotely* sane.

I was almost tempted to rewrite the page just as raw SQL rather than
using Django at all, at least then it won't randomly regress in the
future. I may be a cranky database guy with a healthy disdain for the
SQL that comes out of ORMs though :)

> I suppose it's worth taking the patch - the Django fix certainly isn't
> going to hit the 1.11 stable tree and we're not supporting 2.0 just
> yet. I do still want to check it properly first, so ...
>
>> on it and hopefully apply it to master and stable/2.1 within the next
>> few days. (but no promises!)
>
> ... this lack-of-a-firm-timeline still applies.

Any random testing / whatever I can do locally to try and help?

This is literally my first time poking at anything Django.
Konstantin Ryabitsev Aug. 9, 2018, 2:24 p.m. UTC | #6
On Thu, Aug 09, 2018 at 11:55:16AM +1000, Stewart Smith wrote:
>Out of interest, does the kernel.org instance back onto PostgreSQL or
>MySQL? 

We have one of each. The original instance patchwork.kernel.org is on 
MySQL (well, MariaDB), because that's the database cluster we have in 
our PDX DC. When we split the LKML project off to a VM in AWS 
(lore.kernel.org/patchwork), we set up a separate instance that's backed 
by the AWS PostgreSQL instance. However, that instance is for archival 
purposes only, and the one that's actually used for patch tracking and 
management is patchwork.kernel.org.

>I have a bunch of ideas that would quite dramatically improve
>performance on MySQL (properly using primary keys to force disk layout
>to be sane, rather than the current screw-locality method that Django
>enforces). On PostgreSQL, things are a bit different, but it's possible
>an occasional ALTER TABLE CLUSTER BY (or whatever the syntax is) could
>help a *lot*.

If you want to recreate a huge project, you can feed 20 years of LKML 
archives into it. :) 

>I'm not sure that archiving the patches would help query performance at
>all, but I'd have to check it out a bit. The query that Django generates
>is certainly.... "interesting".

I know for a fact that it does, because I have empirical evidence to 
prove it. Go to lore.kernel.org/patchwork and try the patch listing with 
Archived=No and without it. There will be a few seconds difference.

-K
diff mbox series

Patch

diff --git a/patchwork/models.py b/patchwork/models.py
index 6268f5b72e3c..d2b88fc48c91 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -747,7 +747,6 @@  class Series(FilenameMixin, models.Model):
         return self.name if self.name else 'Untitled series #%d' % self.id
 
     class Meta:
-        ordering = ('date',)
         verbose_name_plural = 'Series'
 
 
diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
index f8d23a388ac7..96fd0798af5a 100644
--- a/patchwork/views/__init__.py
+++ b/patchwork/views/__init__.py
@@ -287,6 +287,8 @@  def generic_list(request, project, view, view_args=None, filter_settings=None,
     # rendering the list template
     patches = patches.select_related('state', 'submitter', 'delegate')
 
+    patches = patches.only('state','submitter','delegate','project','name','date')
+
     # we also need checks and series
     patches = patches.prefetch_related('check_set', 'series')