regexec: Fix off-by-one bug in weight comparison [BZ #23036]

Message ID 20180711144341.B61BB43994575@oldenburg.str.redhat.com
State New
Headers show
Series
  • regexec: Fix off-by-one bug in weight comparison [BZ #23036]
Related show

Commit Message

Florian Weimer July 11, 2018, 2:43 p.m.
Each weight is prefixed by its length, and the length does not include
itself in the count.  This can be seen clearly from the find_idx
function in string/strxfrm_l.c, for example.  The old code behaved as if
the length itself counted, thus comparing an additional byte after the
weight, leading to spurious comparison failures and incorrect further
partitioning of character equivalence classes.

(cherry picked from commit 7b2f4cedf044ea83f53f6b43a5bf6871eb9ce969)

2018-07-10  Florian Weimer  <fweimer@redhat.com>

	[BZ #23036]
	* posix/regexec.c (check_node_accept_bytes): When comparing
	weights, do not compare an extra byte after the end of the
	weights.

Comments

Dmitry V. Levin July 11, 2018, 4:20 p.m. | #1
On Wed, Jul 11, 2018 at 04:43:41PM +0200, Florian Weimer wrote:
> Each weight is prefixed by its length, and the length does not include
> itself in the count.  This can be seen clearly from the find_idx
> function in string/strxfrm_l.c, for example.  The old code behaved as if
> the length itself counted, thus comparing an additional byte after the
> weight, leading to spurious comparison failures and incorrect further
> partitioning of character equivalence classes.
> 
> (cherry picked from commit 7b2f4cedf044ea83f53f6b43a5bf6871eb9ce969)
> 
> 2018-07-10  Florian Weimer  <fweimer@redhat.com>
> 
> 	[BZ #23036]
> 	* posix/regexec.c (check_node_accept_bytes): When comparing
> 	weights, do not compare an extra byte after the end of the
> 	weights.
> 
> diff --git a/NEWS b/NEWS
> index 2e7e7837ac..c5c78ffd3b 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -69,6 +69,7 @@ The following bugs are resolved with this release:
>    [22947] FAIL: misc/tst-preadvwritev2
>    [22963] cs_CZ: Add alternative month names
>    [23005] Crash in __res_context_send after memory allocation failure
> +  [23036] regexec: Fix off-by-one bug in weight comparison
>    [23037] initialize msg_flags to zero for sendmmsg() calls
>    [23069] sigaction broken on riscv64-linux-gnu
>    [23102] Incorrect parsing of consecutive $ variables in runpath entries

Thanks for backporting this to 2.27.
Carlos O'Donell July 11, 2018, 4:57 p.m. | #2
On 07/11/2018 12:20 PM, Dmitry V. Levin wrote:
> On Wed, Jul 11, 2018 at 04:43:41PM +0200, Florian Weimer wrote:
>> Each weight is prefixed by its length, and the length does not include
>> itself in the count.  This can be seen clearly from the find_idx
>> function in string/strxfrm_l.c, for example.  The old code behaved as if
>> the length itself counted, thus comparing an additional byte after the
>> weight, leading to spurious comparison failures and incorrect further
>> partitioning of character equivalence classes.
>>
>> (cherry picked from commit 7b2f4cedf044ea83f53f6b43a5bf6871eb9ce969)
>>
>> 2018-07-10  Florian Weimer  <fweimer@redhat.com>
>>
>> 	[BZ #23036]
>> 	* posix/regexec.c (check_node_accept_bytes): When comparing
>> 	weights, do not compare an extra byte after the end of the
>> 	weights.
>>
>> diff --git a/NEWS b/NEWS
>> index 2e7e7837ac..c5c78ffd3b 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -69,6 +69,7 @@ The following bugs are resolved with this release:
>>    [22947] FAIL: misc/tst-preadvwritev2
>>    [22963] cs_CZ: Add alternative month names
>>    [23005] Crash in __res_context_send after memory allocation failure
>> +  [23036] regexec: Fix off-by-one bug in weight comparison
>>    [23037] initialize msg_flags to zero for sendmmsg() calls
>>    [23069] sigaction broken on riscv64-linux-gnu
>>    [23102] Incorrect parsing of consecutive $ variables in runpath entries
> 
> Thanks for backporting this to 2.27.

+1! :-)
Florian Weimer July 11, 2018, 5:07 p.m. | #3
On 07/11/2018 06:20 PM, Dmitry V. Levin wrote:

>> diff --git a/NEWS b/NEWS
>> index 2e7e7837ac..c5c78ffd3b 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -69,6 +69,7 @@ The following bugs are resolved with this release:
>>     [22947] FAIL: misc/tst-preadvwritev2
>>     [22963] cs_CZ: Add alternative month names
>>     [23005] Crash in __res_context_send after memory allocation failure
>> +  [23036] regexec: Fix off-by-one bug in weight comparison
>>     [23037] initialize msg_flags to zero for sendmmsg() calls
>>     [23069] sigaction broken on riscv64-linux-gnu
>>     [23102] Incorrect parsing of consecutive $ variables in runpath entries
> 
> Thanks for backporting this to 2.27.

Thanks for explaining the mystery.  I accidentally passed the wrong 
option to my patch posting script.

Florian

Patch

diff --git a/NEWS b/NEWS
index 2e7e7837ac..c5c78ffd3b 100644
--- a/NEWS
+++ b/NEWS
@@ -69,6 +69,7 @@  The following bugs are resolved with this release:
   [22947] FAIL: misc/tst-preadvwritev2
   [22963] cs_CZ: Add alternative month names
   [23005] Crash in __res_context_send after memory allocation failure
+  [23036] regexec: Fix off-by-one bug in weight comparison
   [23037] initialize msg_flags to zero for sendmmsg() calls
   [23069] sigaction broken on riscv64-linux-gnu
   [23102] Incorrect parsing of consecutive $ variables in runpath entries
diff --git a/posix/regexec.c b/posix/regexec.c
index 4b1ab4ecff..21129432d1 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -3848,30 +3848,27 @@  check_node_accept_bytes (const re_dfa_t *dfa, int node_idx,
 	      indirect = (const int32_t *)
 		_NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB);
 	      int32_t idx = findidx (table, indirect, extra, &cp, elem_len);
+	      int32_t rule = idx >> 24;
+	      idx &= 0xffffff;
 	      if (idx > 0)
-		for (i = 0; i < cset->nequiv_classes; ++i)
-		  {
-		    int32_t equiv_class_idx = cset->equiv_classes[i];
-		    size_t weight_len = weights[idx & 0xffffff];
-		    if (weight_len == weights[equiv_class_idx & 0xffffff]
-			&& (idx >> 24) == (equiv_class_idx >> 24))
-		      {
-			int cnt = 0;
-
-			idx &= 0xffffff;
-			equiv_class_idx &= 0xffffff;
-
-			while (cnt <= weight_len
-			       && (weights[equiv_class_idx + 1 + cnt]
-				   == weights[idx + 1 + cnt]))
-			  ++cnt;
-			if (cnt > weight_len)
-			  {
-			    match_len = elem_len;
-			    goto check_node_accept_bytes_match;
-			  }
-		      }
-		  }
+		{
+		  size_t weight_len = weights[idx];
+		  for (i = 0; i < cset->nequiv_classes; ++i)
+		    {
+		      int32_t equiv_class_idx = cset->equiv_classes[i];
+		      int32_t equiv_class_rule = equiv_class_idx >> 24;
+		      equiv_class_idx &= 0xffffff;
+		      if (weights[equiv_class_idx] == weight_len
+			  && equiv_class_rule == rule
+			  && memcmp (weights + idx + 1,
+				     weights + equiv_class_idx + 1,
+				     weight_len) == 0)
+			{
+			  match_len = elem_len;
+			  goto check_node_accept_bytes_match;
+			}
+		    }
+		}
 	    }
 	}
       else