diff mbox series

[9/9] parser: don't fail on multiple SeriesReferences

Message ID 20180221141716.10908-10-dja@axtens.net
State Changes Requested
Headers show
Series Tools and fixes for parallel parsing | expand

Checks

Context Check Description
dja/snowpatch-0_1_0 success master/apply_patch Successfully applied
dja/snowpatch-snowpatch_job_snowpatch-patchwork success Test snowpatch/job/snowpatch-patchwork on branch master

Commit Message

Daniel Axtens Feb. 21, 2018, 2:17 p.m. UTC
Parallel parsing would occasonally fail with:

patchwork.models.MultipleObjectsReturned: get() returned more than one SeriesReference -- it returned 2!

I think these are happening if you have different processes parsing
e.g. 1/3 and 2/3 simultaneously: both will have a reference to 1/3,
in the case of 1 it will be the msgid, in the case of 2 it will be
in References. So when we come to parse 3/3, .get() finds 2 and
throws the exception.

This does not fix the creation of multiple series references; it
just causes them to be ignored. We still have serious race conditions
with series creation, but I don't yet have clear answers for them.
With this patch, they will at least not stop patches from being
processed - they'll just lead to wonky series, which we already have.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 patchwork/parser.py | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

Comments

Andrew Donnellan Feb. 22, 2018, 4:53 a.m. UTC | #1
On 22/02/18 01:17, Daniel Axtens wrote:
> Parallel parsing would occasonally fail with:
> 
> patchwork.models.MultipleObjectsReturned: get() returned more than one SeriesReference -- it returned 2!
> 
> I think these are happening if you have different processes parsing
> e.g. 1/3 and 2/3 simultaneously: both will have a reference to 1/3,
> in the case of 1 it will be the msgid, in the case of 2 it will be
> in References. So when we come to parse 3/3, .get() finds 2 and
> throws the exception.
> 
> This does not fix the creation of multiple series references; it
> just causes them to be ignored. We still have serious race conditions
> with series creation, but I don't yet have clear answers for them.
> With this patch, they will at least not stop patches from being
> processed - they'll just lead to wonky series, which we already have.
> 
> Signed-off-by: Daniel Axtens <dja@axtens.net>

Label that open bug with a FIXME for searchability? Logging it is a 
great idea.

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>   patchwork/parser.py | 23 +++++++++++++++++++++--
>   1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 0e53e6b9a3af..4c9e636336d9 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -240,6 +240,13 @@ def _find_series_by_references(project, mail):
>                   msgid=ref[:255], series__project=project).series
>           except SeriesReference.DoesNotExist:
>               continue
> +        except SeriesReference.MultipleObjectsReturned:
> +            # Open bug: this can happen when we're processing messages
> +            # in parallel. Pick the first and log.
> +            logger.error("Multiple SeriesReferences for %s in project %s!" %
> +                         (ref[:255], project.name))
> +            return SeriesReference.objects.filter(
> +                msgid=ref[:255], series__project=project).first().series
> 
> 
>   def _find_series_by_markers(project, mail, author):
> @@ -1037,6 +1044,9 @@ def parse_mail(mail, list_id=None):
>                                                   series__project=project)
>                   except SeriesReference.DoesNotExist:
>                       SeriesReference.objects.create(series=series, msgid=ref)
> +                except SeriesReference.MultipleObjectsReturned:
> +                    logger.error("Multiple SeriesReferences for %s"
> +                                 " in project %s!" % (ref, project.name))
> 
>           # add to a series if we have found one, and we have a numbered
>           # patch. Don't add unnumbered patches (for example diffs sent
> @@ -1075,6 +1085,11 @@ def parse_mail(mail, list_id=None):
>                       msgid=msgid, series__project=project).series
>               except SeriesReference.DoesNotExist:
>                   series = None
> +            except SeriesReference.MultipleObjectsReturned:
> +                logger.error("Multiple SeriesReferences for %s"
> +                             " in project %s!" % (msgid, project.name))
> +                series = SeriesReference.objects.filter(
> +                    msgid=msgid, series__project=project).first().series
> 
>               if not series:
>                   series = Series(project=project,
> @@ -1087,8 +1102,12 @@ def parse_mail(mail, list_id=None):
>                   # we don't save the in-reply-to or references fields
>                   # for a cover letter, as they can't refer to the same
>                   # series
> -                SeriesReference.objects.get_or_create(series=series,
> -                                                      msgid=msgid)
> +                try:
> +                    SeriesReference.objects.get_or_create(series=series,
> +                                                          msgid=msgid)
> +                except SeriesReference.MultipleObjectsReturned:
> +                    logger.error("Multiple SeriesReferences for %s"
> +                                 " in project %s!" % (msgid, project.name))
> 
>               cover_letter = CoverLetter(
>                   msgid=msgid,
>
Daniel Axtens Feb. 24, 2018, 2:39 p.m. UTC | #2
Andrew Donnellan <andrew.donnellan@au1.ibm.com> writes:

> On 22/02/18 01:17, Daniel Axtens wrote:
>> Parallel parsing would occasonally fail with:
>> 
>> patchwork.models.MultipleObjectsReturned: get() returned more than one SeriesReference -- it returned 2!
>> 
>> I think these are happening if you have different processes parsing
>> e.g. 1/3 and 2/3 simultaneously: both will have a reference to 1/3,
>> in the case of 1 it will be the msgid, in the case of 2 it will be
>> in References. So when we come to parse 3/3, .get() finds 2 and
>> throws the exception.
>> 
>> This does not fix the creation of multiple series references; it
>> just causes them to be ignored. We still have serious race conditions
>> with series creation, but I don't yet have clear answers for them.
>> With this patch, they will at least not stop patches from being
>> processed - they'll just lead to wonky series, which we already have.
>> 
>> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> Label that open bug with a FIXME for searchability? Logging it is a 
> great idea.

Done!

>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Thanks!

Regards,
Daniel

>
>> ---
>>   patchwork/parser.py | 23 +++++++++++++++++++++--
>>   1 file changed, 21 insertions(+), 2 deletions(-)
>> 
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 0e53e6b9a3af..4c9e636336d9 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -240,6 +240,13 @@ def _find_series_by_references(project, mail):
>>                   msgid=ref[:255], series__project=project).series
>>           except SeriesReference.DoesNotExist:
>>               continue
>> +        except SeriesReference.MultipleObjectsReturned:
>> +            # Open bug: this can happen when we're processing messages
>> +            # in parallel. Pick the first and log.
>> +            logger.error("Multiple SeriesReferences for %s in project %s!" %
>> +                         (ref[:255], project.name))
>> +            return SeriesReference.objects.filter(
>> +                msgid=ref[:255], series__project=project).first().series
>> 
>> 
>>   def _find_series_by_markers(project, mail, author):
>> @@ -1037,6 +1044,9 @@ def parse_mail(mail, list_id=None):
>>                                                   series__project=project)
>>                   except SeriesReference.DoesNotExist:
>>                       SeriesReference.objects.create(series=series, msgid=ref)
>> +                except SeriesReference.MultipleObjectsReturned:
>> +                    logger.error("Multiple SeriesReferences for %s"
>> +                                 " in project %s!" % (ref, project.name))
>> 
>>           # add to a series if we have found one, and we have a numbered
>>           # patch. Don't add unnumbered patches (for example diffs sent
>> @@ -1075,6 +1085,11 @@ def parse_mail(mail, list_id=None):
>>                       msgid=msgid, series__project=project).series
>>               except SeriesReference.DoesNotExist:
>>                   series = None
>> +            except SeriesReference.MultipleObjectsReturned:
>> +                logger.error("Multiple SeriesReferences for %s"
>> +                             " in project %s!" % (msgid, project.name))
>> +                series = SeriesReference.objects.filter(
>> +                    msgid=msgid, series__project=project).first().series
>> 
>>               if not series:
>>                   series = Series(project=project,
>> @@ -1087,8 +1102,12 @@ def parse_mail(mail, list_id=None):
>>                   # we don't save the in-reply-to or references fields
>>                   # for a cover letter, as they can't refer to the same
>>                   # series
>> -                SeriesReference.objects.get_or_create(series=series,
>> -                                                      msgid=msgid)
>> +                try:
>> +                    SeriesReference.objects.get_or_create(series=series,
>> +                                                          msgid=msgid)
>> +                except SeriesReference.MultipleObjectsReturned:
>> +                    logger.error("Multiple SeriesReferences for %s"
>> +                                 " in project %s!" % (msgid, project.name))
>> 
>>               cover_letter = CoverLetter(
>>                   msgid=msgid,
>> 
>
> -- 
> Andrew Donnellan              OzLabs, ADL Canberra
> andrew.donnellan@au1.ibm.com  IBM Australia Limited
diff mbox series

Patch

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 0e53e6b9a3af..4c9e636336d9 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -240,6 +240,13 @@  def _find_series_by_references(project, mail):
                 msgid=ref[:255], series__project=project).series
         except SeriesReference.DoesNotExist:
             continue
+        except SeriesReference.MultipleObjectsReturned:
+            # Open bug: this can happen when we're processing messages
+            # in parallel. Pick the first and log.
+            logger.error("Multiple SeriesReferences for %s in project %s!" %
+                         (ref[:255], project.name))
+            return SeriesReference.objects.filter(
+                msgid=ref[:255], series__project=project).first().series
 
 
 def _find_series_by_markers(project, mail, author):
@@ -1037,6 +1044,9 @@  def parse_mail(mail, list_id=None):
                                                 series__project=project)
                 except SeriesReference.DoesNotExist:
                     SeriesReference.objects.create(series=series, msgid=ref)
+                except SeriesReference.MultipleObjectsReturned:
+                    logger.error("Multiple SeriesReferences for %s"
+                                 " in project %s!" % (ref, project.name))
 
         # add to a series if we have found one, and we have a numbered
         # patch. Don't add unnumbered patches (for example diffs sent
@@ -1075,6 +1085,11 @@  def parse_mail(mail, list_id=None):
                     msgid=msgid, series__project=project).series
             except SeriesReference.DoesNotExist:
                 series = None
+            except SeriesReference.MultipleObjectsReturned:
+                logger.error("Multiple SeriesReferences for %s"
+                             " in project %s!" % (msgid, project.name))
+                series = SeriesReference.objects.filter(
+                    msgid=msgid, series__project=project).first().series
 
             if not series:
                 series = Series(project=project,
@@ -1087,8 +1102,12 @@  def parse_mail(mail, list_id=None):
                 # we don't save the in-reply-to or references fields
                 # for a cover letter, as they can't refer to the same
                 # series
-                SeriesReference.objects.get_or_create(series=series,
-                                                      msgid=msgid)
+                try:
+                    SeriesReference.objects.get_or_create(series=series,
+                                                          msgid=msgid)
+                except SeriesReference.MultipleObjectsReturned:
+                    logger.error("Multiple SeriesReferences for %s"
+                                 " in project %s!" % (msgid, project.name))
 
             cover_letter = CoverLetter(
                 msgid=msgid,