Message ID | 20180808074006.21820-1-stewart@linux.ibm.com |
---|---|
State | Superseded |
Headers | show |
Series | [RFC] 3x improvement in patch listing | expand |
On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote: >There's two main bits that are really expensive when composing the list >of patches for a project: the query getting the list, and the query >finding the series for each patch. Stewart: Thanks for working on this! Do you think this would help with pagination as well? I find that in semi-abandoned projects like https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few seconds to load the list view due to 800+ pages of unprocessed patches. I am currently considering an automated script that would auto-archive patches older than 6 months, but if simply improving pagination times would fix the issue, then I wouldn't need to bother. Best, -K
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes: > On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote: >>There's two main bits that are really expensive when composing the list >>of patches for a project: the query getting the list, and the query >>finding the series for each patch. > > Stewart: > > Thanks for working on this! Do you think this would help with pagination > as well? I find that in semi-abandoned projects like > https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few > seconds to load the list view due to 800+ pages of unprocessed patches. > I am currently considering an automated script that would auto-archive > patches older than 6 months, but if simply improving pagination times > would fix the issue, then I wouldn't need to bother. It should do. From a brief look at his patch, it fixes a hot path in __init__.py that goes through the paginator (and which I was *sure* I had looked at, but apparently I had not!) I'll do a quick sanity check on it and hopefully apply it to master and stable/2.1 within the next few days. (but no promises!) Regards, Daniel > > Best, > -K > _______________________________________________ > Patchwork mailing list > Patchwork@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/patchwork
Daniel Axtens <dja@axtens.net> writes: > Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes: > >> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote: >>>There's two main bits that are really expensive when composing the list >>>of patches for a project: the query getting the list, and the query >>>finding the series for each patch. >> >> Stewart: >> >> Thanks for working on this! Do you think this would help with pagination >> as well? I find that in semi-abandoned projects like >> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few >> seconds to load the list view due to 800+ pages of unprocessed patches. >> I am currently considering an automated script that would auto-archive >> patches older than 6 months, but if simply improving pagination times >> would fix the issue, then I wouldn't need to bother. > > It should do. From a brief look at his patch, it fixes a hot path in > __init__.py that goes through the paginator (and which I was *sure* I > had looked at, but apparently I had not!) I'll do a quick sanity check Ah, I have figured out what I had thought I had fixed! So you're hitting the issue where the db has to fetch the 'diff', 'content' and 'headers' out of Submission just to list the patches, which is clearly utterly bonkers. This is *despite* the defer() we attempt to do - jk reported a Django bug for it: [1]. It looks like Debian picked up the patch for their release, but it didn't actually make it into Django 1.11.6 ([1], comment 8) I attempted a monkey-patch inside Patchwork [2] - the patch was NAKed, but OzLabs is/was carrying a local fix. I suppose it's worth taking the patch - the Django fix certainly isn't going to hit the 1.11 stable tree and we're not supporting 2.0 just yet. I do still want to check it properly first, so ... > on it and hopefully apply it to master and stable/2.1 within the next > few days. (but no promises!) ... this lack-of-a-firm-timeline still applies. Regards, Daniel [1] https://code.djangoproject.com/ticket/28549#ticket [2] https://patchwork.ozlabs.org/patch/809294/
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes: > On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote: >>There's two main bits that are really expensive when composing the list >>of patches for a project: the query getting the list, and the query >>finding the series for each patch. > > Stewart: > > Thanks for working on this! Do you think this would help with pagination > as well? I find that in semi-abandoned projects like > https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few > seconds to load the list view due to 800+ pages of unprocessed patches. > I am currently considering an automated script that would auto-archive > patches older than 6 months, but if simply improving pagination times > would fix the issue, then I wouldn't need to bother. I may be accidentally letting people know I know something about databases again :) Out of interest, does the kernel.org instance back onto PostgreSQL or MySQL? I have a bunch of ideas that would quite dramatically improve performance on MySQL (properly using primary keys to force disk layout to be sane, rather than the current screw-locality method that Django enforces). On PostgreSQL, things are a bit different, but it's possible an occasional ALTER TABLE CLUSTER BY (or whatever the syntax is) could help a *lot*. I'm not sure that archiving the patches would help query performance at all, but I'd have to check it out a bit. The query that Django generates is certainly.... "interesting".
Daniel Axtens <dja@axtens.net> writes: > Daniel Axtens <dja@axtens.net> writes: > >> Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes: >> >>> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote: >>>>There's two main bits that are really expensive when composing the list >>>>of patches for a project: the query getting the list, and the query >>>>finding the series for each patch. >>> >>> Stewart: >>> >>> Thanks for working on this! Do you think this would help with pagination >>> as well? I find that in semi-abandoned projects like >>> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few >>> seconds to load the list view due to 800+ pages of unprocessed patches. >>> I am currently considering an automated script that would auto-archive >>> patches older than 6 months, but if simply improving pagination times >>> would fix the issue, then I wouldn't need to bother. >> >> It should do. From a brief look at his patch, it fixes a hot path in >> __init__.py that goes through the paginator (and which I was *sure* I >> had looked at, but apparently I had not!) I'll do a quick sanity check > > Ah, I have figured out what I had thought I had fixed! > > So you're hitting the issue where the db has to fetch the 'diff', > 'content' and 'headers' out of Submission just to list the patches, > which is clearly utterly bonkers. This is *despite* the defer() we > attempt to do - jk reported a Django bug for it: [1]. Yeah, that'd explain it. I was kind of going "wtf" a lot at that, and then doing something kind of crazy to try and get things down to something even *remotely* sane. I was almost tempted to rewrite the page just as raw SQL rather than using Django at all, at least then it won't randomly regress in the future. I may be a cranky database guy with a healthy disdain for the SQL that comes out of ORMs though :) > I suppose it's worth taking the patch - the Django fix certainly isn't > going to hit the 1.11 stable tree and we're not supporting 2.0 just > yet. I do still want to check it properly first, so ... > >> on it and hopefully apply it to master and stable/2.1 within the next >> few days. (but no promises!) > > ... this lack-of-a-firm-timeline still applies. Any random testing / whatever I can do locally to try and help? This is literally my first time poking at anything Django.
On Thu, Aug 09, 2018 at 11:55:16AM +1000, Stewart Smith wrote: >Out of interest, does the kernel.org instance back onto PostgreSQL or >MySQL? We have one of each. The original instance patchwork.kernel.org is on MySQL (well, MariaDB), because that's the database cluster we have in our PDX DC. When we split the LKML project off to a VM in AWS (lore.kernel.org/patchwork), we set up a separate instance that's backed by the AWS PostgreSQL instance. However, that instance is for archival purposes only, and the one that's actually used for patch tracking and management is patchwork.kernel.org. >I have a bunch of ideas that would quite dramatically improve >performance on MySQL (properly using primary keys to force disk layout >to be sane, rather than the current screw-locality method that Django >enforces). On PostgreSQL, things are a bit different, but it's possible >an occasional ALTER TABLE CLUSTER BY (or whatever the syntax is) could >help a *lot*. If you want to recreate a huge project, you can feed 20 years of LKML archives into it. :) >I'm not sure that archiving the patches would help query performance at >all, but I'd have to check it out a bit. The query that Django generates >is certainly.... "interesting". I know for a fact that it does, because I have empirical evidence to prove it. Go to lore.kernel.org/patchwork and try the patch listing with Archived=No and without it. There will be a few seconds difference. -K
diff --git a/patchwork/models.py b/patchwork/models.py index 6268f5b72e3c..d2b88fc48c91 100644 --- a/patchwork/models.py +++ b/patchwork/models.py @@ -747,7 +747,6 @@ class Series(FilenameMixin, models.Model): return self.name if self.name else 'Untitled series #%d' % self.id class Meta: - ordering = ('date',) verbose_name_plural = 'Series' diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py index f8d23a388ac7..96fd0798af5a 100644 --- a/patchwork/views/__init__.py +++ b/patchwork/views/__init__.py @@ -287,6 +287,8 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, # rendering the list template patches = patches.select_related('state', 'submitter', 'delegate') + patches = patches.only('state','submitter','delegate','project','name','date') + # we also need checks and series patches = patches.prefetch_related('check_set', 'series')
There's two main bits that are really expensive when composing the list of patches for a project: the query getting the list, and the query finding the series for each patch. If we look at the query getting the list, it gets a lot of unnesseccary fields such as 'headers' and 'content', even though we tell Django not to. It turns out that Django seems to ignore the Submission relationship and I have no idea how to force it to ignore that thing (defer doesn't work) but if we go only, then it works okay. From my import of ~8000 messages for a few projects, my laptop query time (MySQL, as setup by whatever the docker-compose things do) goes from: http://localhost:8000/project/linux-kernel/list/ FROM: 342ms SQL queries cold cache, 268ms warm cache TO: 118ms SQL queries cold cache, 88ms warm cache Which is... non trivial to say the least. The big jump is the patches.only change, and the removal of ordering on the patchseries takes a further 10ms off. For some strange reason, it seems rather hard to tell Django that you don't care what order the results come back in for that query (if we do, then the db server has to do a sort rather than just return each row) Signed-off-by: Stewart Smith <stewart@linux.ibm.com> --- patchwork/models.py | 1 - patchwork/views/__init__.py | 2 ++ 2 files changed, 2 insertions(+), 1 deletion(-)