Message ID | 20180711144341.B61BB43994575@oldenburg.str.redhat.com |
---|---|
State | New |
Headers | show |
Series | regexec: Fix off-by-one bug in weight comparison [BZ #23036] | expand |
On Wed, Jul 11, 2018 at 04:43:41PM +0200, Florian Weimer wrote: > Each weight is prefixed by its length, and the length does not include > itself in the count. This can be seen clearly from the find_idx > function in string/strxfrm_l.c, for example. The old code behaved as if > the length itself counted, thus comparing an additional byte after the > weight, leading to spurious comparison failures and incorrect further > partitioning of character equivalence classes. > > (cherry picked from commit 7b2f4cedf044ea83f53f6b43a5bf6871eb9ce969) > > 2018-07-10 Florian Weimer <fweimer@redhat.com> > > [BZ #23036] > * posix/regexec.c (check_node_accept_bytes): When comparing > weights, do not compare an extra byte after the end of the > weights. > > diff --git a/NEWS b/NEWS > index 2e7e7837ac..c5c78ffd3b 100644 > --- a/NEWS > +++ b/NEWS > @@ -69,6 +69,7 @@ The following bugs are resolved with this release: > [22947] FAIL: misc/tst-preadvwritev2 > [22963] cs_CZ: Add alternative month names > [23005] Crash in __res_context_send after memory allocation failure > + [23036] regexec: Fix off-by-one bug in weight comparison > [23037] initialize msg_flags to zero for sendmmsg() calls > [23069] sigaction broken on riscv64-linux-gnu > [23102] Incorrect parsing of consecutive $ variables in runpath entries Thanks for backporting this to 2.27.
On 07/11/2018 12:20 PM, Dmitry V. Levin wrote: > On Wed, Jul 11, 2018 at 04:43:41PM +0200, Florian Weimer wrote: >> Each weight is prefixed by its length, and the length does not include >> itself in the count. This can be seen clearly from the find_idx >> function in string/strxfrm_l.c, for example. The old code behaved as if >> the length itself counted, thus comparing an additional byte after the >> weight, leading to spurious comparison failures and incorrect further >> partitioning of character equivalence classes. >> >> (cherry picked from commit 7b2f4cedf044ea83f53f6b43a5bf6871eb9ce969) >> >> 2018-07-10 Florian Weimer <fweimer@redhat.com> >> >> [BZ #23036] >> * posix/regexec.c (check_node_accept_bytes): When comparing >> weights, do not compare an extra byte after the end of the >> weights. >> >> diff --git a/NEWS b/NEWS >> index 2e7e7837ac..c5c78ffd3b 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -69,6 +69,7 @@ The following bugs are resolved with this release: >> [22947] FAIL: misc/tst-preadvwritev2 >> [22963] cs_CZ: Add alternative month names >> [23005] Crash in __res_context_send after memory allocation failure >> + [23036] regexec: Fix off-by-one bug in weight comparison >> [23037] initialize msg_flags to zero for sendmmsg() calls >> [23069] sigaction broken on riscv64-linux-gnu >> [23102] Incorrect parsing of consecutive $ variables in runpath entries > > Thanks for backporting this to 2.27. +1! :-)
On 07/11/2018 06:20 PM, Dmitry V. Levin wrote: >> diff --git a/NEWS b/NEWS >> index 2e7e7837ac..c5c78ffd3b 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -69,6 +69,7 @@ The following bugs are resolved with this release: >> [22947] FAIL: misc/tst-preadvwritev2 >> [22963] cs_CZ: Add alternative month names >> [23005] Crash in __res_context_send after memory allocation failure >> + [23036] regexec: Fix off-by-one bug in weight comparison >> [23037] initialize msg_flags to zero for sendmmsg() calls >> [23069] sigaction broken on riscv64-linux-gnu >> [23102] Incorrect parsing of consecutive $ variables in runpath entries > > Thanks for backporting this to 2.27. Thanks for explaining the mystery. I accidentally passed the wrong option to my patch posting script. Florian
diff --git a/NEWS b/NEWS index 2e7e7837ac..c5c78ffd3b 100644 --- a/NEWS +++ b/NEWS @@ -69,6 +69,7 @@ The following bugs are resolved with this release: [22947] FAIL: misc/tst-preadvwritev2 [22963] cs_CZ: Add alternative month names [23005] Crash in __res_context_send after memory allocation failure + [23036] regexec: Fix off-by-one bug in weight comparison [23037] initialize msg_flags to zero for sendmmsg() calls [23069] sigaction broken on riscv64-linux-gnu [23102] Incorrect parsing of consecutive $ variables in runpath entries diff --git a/posix/regexec.c b/posix/regexec.c index 4b1ab4ecff..21129432d1 100644 --- a/posix/regexec.c +++ b/posix/regexec.c @@ -3848,30 +3848,27 @@ check_node_accept_bytes (const re_dfa_t *dfa, int node_idx, indirect = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB); int32_t idx = findidx (table, indirect, extra, &cp, elem_len); + int32_t rule = idx >> 24; + idx &= 0xffffff; if (idx > 0) - for (i = 0; i < cset->nequiv_classes; ++i) - { - int32_t equiv_class_idx = cset->equiv_classes[i]; - size_t weight_len = weights[idx & 0xffffff]; - if (weight_len == weights[equiv_class_idx & 0xffffff] - && (idx >> 24) == (equiv_class_idx >> 24)) - { - int cnt = 0; - - idx &= 0xffffff; - equiv_class_idx &= 0xffffff; - - while (cnt <= weight_len - && (weights[equiv_class_idx + 1 + cnt] - == weights[idx + 1 + cnt])) - ++cnt; - if (cnt > weight_len) - { - match_len = elem_len; - goto check_node_accept_bytes_match; - } - } - } + { + size_t weight_len = weights[idx]; + for (i = 0; i < cset->nequiv_classes; ++i) + { + int32_t equiv_class_idx = cset->equiv_classes[i]; + int32_t equiv_class_rule = equiv_class_idx >> 24; + equiv_class_idx &= 0xffffff; + if (weights[equiv_class_idx] == weight_len + && equiv_class_rule == rule + && memcmp (weights + idx + 1, + weights + equiv_class_idx + 1, + weight_len) == 0) + { + match_len = elem_len; + goto check_node_accept_bytes_match; + } + } + } } } else