From patchwork Sun Sep 3 13:36:31 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Axtens X-Patchwork-Id: 809294 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [103.22.144.68]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3xlYtz6Q1Sz9t2f for ; Sun, 3 Sep 2017 23:37:31 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=axtens.net header.i=@axtens.net header.b="iwJBv+QF"; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 3xlYtz4fX7zDrmv for ; Sun, 3 Sep 2017 23:37:31 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=axtens.net header.i=@axtens.net header.b="iwJBv+QF"; dkim-atps=neutral X-Original-To: patchwork@lists.ozlabs.org Delivered-To: patchwork@lists.ozlabs.org Received: from mail-pf0-x229.google.com (mail-pf0-x229.google.com [IPv6:2607:f8b0:400e:c00::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3xlYtp73WxzDqYP for ; Sun, 3 Sep 2017 23:37:22 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=axtens.net header.i=@axtens.net header.b="iwJBv+QF"; dkim-atps=neutral Received: by mail-pf0-x229.google.com with SMTP id l87so11626176pfj.1 for ; Sun, 03 Sep 2017 06:37:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=axtens.net; s=google; h=from:to:cc:subject:date:message-id; bh=UkX2epmMNkNOPCtqs5GU92P1VOQfdYjjqBpm7yGZ++Q=; b=iwJBv+QFv48whLeajsZJ6TcXDeGbxrjGa6S3uP9oim2sJMveNc6a+RFM77eT63M/85 oNH9vU7fpFvJkfBViZkGZ/oN69wljQkKAocU2QJlJB9YtuA8s/W9PThNlCPYv7Hj2JdP +ZDqC04kPLauppVBc2pxJabDLyxBIiblLIMGk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=UkX2epmMNkNOPCtqs5GU92P1VOQfdYjjqBpm7yGZ++Q=; b=F+OiopbbzAxJn4Swl/d8z01imnxMjTT0yWhEp019koVU/V+1yM5xTV4jl+mFGtxpc7 F5uHxf8s5UdMAkVk5cCb8/OJnDuC7vFYQxLyYn58O8FFc9Bj3bhu7uuPSpH9wMclctxS Op3OnGMbYkvfnMzfdGO0YHo7UiuKlMIsIWi3b30bOEPy2lLEJXjL1vGq8v4rm7VOnlqB Yiynsl9FN19V73eFGJVGD09ibTmLV2oDL4mo76mArr9x+wDMnzlmNOk/nA/YbywyiRoI ikQJhMAxS81b6m3xh38EeEo+DqJJi98MBDKr/yRnATz8nQf2D/QrUSjLrsA205B8J3JU JLdg== X-Gm-Message-State: AHPjjUgJ781l2Y0wkJY1QjEEGH5av46kd5EMZNbZfnBxgvGLtcvd4cbS CtpCM+OpgrjPODX5YaZXtA== X-Google-Smtp-Source: ADKCNb5vWqhyPqDdCM/gy28IZJtrpQXNLe6J8+WvIo6Rgf1whKd3f205nre0fcW0l8nKLnDqZCTSbQ== X-Received: by 10.99.154.9 with SMTP id o9mr8788194pge.205.1504445840374; Sun, 03 Sep 2017 06:37:20 -0700 (PDT) Received: from localhost.localdomain (124-171-202-56.dyn.iinet.net.au. [124.171.202.56]) by smtp.gmail.com with ESMTPSA id 62sm7077493pfj.86.2017.09.03.06.37.17 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 03 Sep 2017 06:37:19 -0700 (PDT) From: Daniel Axtens To: patchwork@lists.ozlabs.org Subject: [PATCH] RFC: Monkey-patch Django performance bug Date: Sun, 3 Sep 2017 23:36:31 +1000 Message-Id: <20170903133631.13609-1-dja@axtens.net> X-Mailer: git-send-email 2.11.0 X-BeenThere: patchwork@lists.ozlabs.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Patchwork development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: sfr@canb.auug.org.au, jk@ozlabs.org MIME-Version: 1.0 Errors-To: patchwork-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Patchwork" OzLabs noticed a performance regression, which was traced to the loading of the 'headers', 'content' and 'diff' fields, despite the 'defer()' call in views/__init__.py:generic_list() This is a django bug, reported upstream and fixed at: https://code.djangoproject.com/ticket/28549#ticket https://github.com/django/django/pull/8994 I will be preparing a patch for the Debian-based system OzLabs uses, but I am wondering if we need a more generic approach for all our potential users, rather than asking people to patch their systems. This is one way to do it: Monkey-patch the broken method, and add a test to verify correct behaviour. It's also a somewhat terrifying approach, so ther suggestions are very welcome. Signed-off-by: Daniel Axtens --- patchwork/__init__.py | 1 + patchwork/django_bugs.py | 124 +++++++++++++++++++++++++++++++++++++++++++ patchwork/tests/test_list.py | 22 ++++++++ 3 files changed, 147 insertions(+) create mode 100644 patchwork/django_bugs.py diff --git a/patchwork/__init__.py b/patchwork/__init__.py index 98e9b21d066f..8e704e4b30a2 100644 --- a/patchwork/__init__.py +++ b/patchwork/__init__.py @@ -18,6 +18,7 @@ # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA from patchwork.version import get_latest_version +from patchwork import django_bugs # noqa VERSION = (2, 1, 0, 'alpha', 0) diff --git a/patchwork/django_bugs.py b/patchwork/django_bugs.py new file mode 100644 index 000000000000..d0dfbfc5561f --- /dev/null +++ b/patchwork/django_bugs.py @@ -0,0 +1,124 @@ +"""Horrifying workarounds for bugs. + +Please justify your answers and show all reasoning. + +Provide test cases! + +Imported through patchwork's __init__.py + +""" + +# https://code.djangoproject.com/ticket/28549 +# Can't defer() fields from super- and sub-class at the same time +# +# In views/__init__.py in in generic_list() we attempt to defer the +# headers, diff and content of a patch. This broke with pw2, as the +# headers and content belonged to the super-class (Submission) and the +# diff belongs to the sub-class (Patch). As such an attempt to defer +# both ended up deferring neither, and performance was affected very +# noticeably on OzLabs. +# +# jk successfully fixed the bug in collaboration with upstream, but +# this leaves deployments in an unfortunate position, as the fix +# hasn't made it into any released versions yet. People would have to +# patch their system or installation django. That would come with an +# enormous maintainability cost. So monkey-patch the fixed version in +# so long as we're below v1.12. +# +# Drop this after we only support Django v1.12+ + +import django +from django.db.models.constants import LOOKUP_SEP +from django.db.models.sql.query import is_reverse_o2o +from django.db.models.sql.query import add_to_dict + + +def backported_deferred_to_data(self, target, callback): + """ + Convert the self.deferred_loading data structure to an alternate data + structure, describing the field that *will* be loaded. This is used to + compute the columns to select from the database and also by the + QuerySet class to work out which fields are being initialized on each + model. Models that have all their fields included aren't mentioned in + the result, only those that have field restrictions in place. + The "target" parameter is the instance that is populated (in place). + The "callback" is a function that is called whenever a (model, field) + pair need to be added to "target". It accepts three parameters: + "target", and the model and list of fields being added for that model. + """ + field_names, defer = self.deferred_loading + if not field_names: + return + orig_opts = self.get_meta() + seen = {} + must_include = {orig_opts.concrete_model: {orig_opts.pk}} + for field_name in field_names: + parts = field_name.split(LOOKUP_SEP) + cur_model = self.model._meta.concrete_model + opts = orig_opts + for name in parts[:-1]: + old_model = cur_model + source = opts.get_field(name) + if is_reverse_o2o(source): + cur_model = source.related_model + else: + cur_model = source.remote_field.model + opts = cur_model._meta + # Even if we're "just passing through" this model, we must add + # both the current model's pk and the related reference field + # (if it's not a reverse relation) to the things we select. + if not is_reverse_o2o(source): + must_include[old_model].add(source) + add_to_dict(must_include, cur_model, opts.pk) + field = opts.get_field(parts[-1]) + is_reverse_object = field.auto_created and not field.concrete + model = field.related_model if is_reverse_object else field.model + model = model._meta.concrete_model + if model == opts.model: + model = cur_model + if not is_reverse_o2o(field): + add_to_dict(seen, model, field) + + if defer: + # We need to load all fields for each model, except those that + # appear in "seen" (for all models that appear in "seen"). The only + # slight complexity here is handling fields that exist on parent + # models. + workset = {} + for model, values in seen.items(): + for field in model._meta.local_fields: + if field in values: + continue + m = field.model._meta.concrete_model + add_to_dict(workset, m, field) + for model, values in must_include.items(): + # If we haven't included a model in workset, we don't add the + # corresponding must_include fields for that model, since an + # empty set means "include all fields". That's why there's no + # "else" branch here. + if model in workset: + workset[model].update(values) + for model, values in workset.items(): + callback(target, model, values) + else: + for model, values in must_include.items(): + if model in seen: + seen[model].update(values) + else: + # As we've passed through this model, but not explicitly + # included any fields, we have to make sure it's mentioned + # so that only the "must include" fields are pulled in. + seen[model] = values + # Now ensure that every model in the inheritance chain is mentioned + # in the parent list. Again, it must be mentioned to ensure that + # only "must include" fields are pulled in. + for model in orig_opts.get_parent_list(): + if model not in seen: + seen[model] = set() + for model, values in seen.items(): + callback(target, model, values) + + +if django.VERSION < (1, 12): + django.db.models.sql.query.Query.deferred_to_data = \ + backported_deferred_to_data diff --git a/patchwork/tests/test_list.py b/patchwork/tests/test_list.py index 11b9da9c11d3..c891a4cb4191 100644 --- a/patchwork/tests/test_list.py +++ b/patchwork/tests/test_list.py @@ -22,6 +22,8 @@ from __future__ import absolute_import from datetime import datetime as dt import re +import django +from django.db.models.query_utils import DeferredAttribute from django.test import TestCase from django.utils.six.moves import zip @@ -32,6 +34,19 @@ from patchwork.tests.utils import create_person from patchwork.tests.utils import create_project +if django.VERSION < (1, 10): + def count_deferred(obj): + count = 0 + for field in obj._meta.fields: + if isinstance(obj.__class__.__dict__.get(field.attname), + DeferredAttribute): + count += 1 + return count +else: + def count_deferred(obj): + return len(obj.get_deferred_fields()) + + class EmptyPatchListTest(TestCase): def test_empty_patch_list(self): @@ -132,3 +147,10 @@ class PatchOrderTest(TestCase): p2.submitter.name.lower()) self._test_sequence(response, test_fn) + + def test_defers(self): + # see django_bugs.py + patches = Patch.objects.all() + patches = patches.defer('diff', 'content', 'headers') + patch = patches.first() + self.assertEqual(count_deferred(patch), 3)