Message ID  20180709172046.D1DA743994575@oldenburg.str.redhat.com 

State  New 
Headers  show 
Series 

Related  show 
On 07/09/2018 01:20 PM, Florian Weimer wrote: > 20180709 Florian Weimer <fweimer@redhat.com> > > [BZ #23036] > * posix/regexec.c (check_node_accept_bytes): When comparing > weights, do not compare an extra byte after the end of the > weights. I assume that this fixes the issue and that you have minimally passed the regression testing for x86_64. I would like a test for this, but it can wait. This regression is a blocker for the release, and I appreciate your root cause analysis of the issue. If I understand correctly this is a "offbyone" error and I'll explain below. In the future please try to keep the change to the absolute minimum instead of rewriting it and hoisting constants out of loops. It certainly made the review longer than it should have been. We can always clean this up later. Given all of my assumptions and admonitions, this looks OK. Reviewedby: Carlos O'Donell <carlos@redhat.com> > diff git a/posix/regexec.c b/posix/regexec.c > index 63aef97535..73644c2341 100644 >  a/posix/regexec.c > +++ b/posix/regexec.c > @@ 3878,30 +3878,27 @@ check_node_accept_bytes (const re_dfa_t *dfa, Idx node_idx, > indirect = (const int32_t *) > _NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB); > int32_t idx = findidx (table, indirect, extra, &cp, elem_len); > + int32_t rule = idx >> 24; > + idx &= 0xffffff; OK. > if (idx > 0) >  for (i = 0; i < cset>nequiv_classes; ++i) >  { >  int32_t equiv_class_idx = cset>equiv_classes[i]; >  size_t weight_len = weights[idx & 0xffffff]; >  if (weight_len == weights[equiv_class_idx & 0xffffff] >  && (idx >> 24) == (equiv_class_idx >> 24)) Check length and rule is the same. >  { >  Idx cnt = 0; >  >  idx &= 0xffffff; >  equiv_class_idx &= 0xffffff; >  >  while (cnt <= weight_len >  && (weights[equiv_class_idx + 1 + cnt] >  == weights[idx + 1 + cnt])) Here we start at count 0 and go to <= weight_len. This is one byte too far. In an Nlength weight: L01234...[N1] ^^ ^   weights [idx + 1 + (weight_len  1)]  weights[idx + 1]  weights L == N == weight_len So the loop for cnt <= weight_len goes one byte beyond the weights array. >  ++cnt; >  if (cnt > weight_len) >  { >  match_len = elem_len; >  goto check_node_accept_bytes_match; >  } >  } >  } > + { > + size_t weight_len = weights[idx]; > + for (i = 0; i < cset>nequiv_classes; ++i) > + { > + int32_t equiv_class_idx = cset>equiv_classes[i]; > + int32_t equiv_class_rule = equiv_class_idx >> 24; > + equiv_class_idx &= 0xffffff; > + if (weights[equiv_class_idx] == weight_len OK, check the lengths are the same. > + && equiv_class_rule == rule OK, check the rule is the same. > + && memcmp (weights + idx + 1, > + weights + equiv_class_idx + 1, > + weight_len) == 0) OK. Use memcmp to compare only weight_len bytes, not weight_len + 1 bytes. > + { > + match_len = elem_len; > + goto check_node_accept_bytes_match; OK. > + } > + } > + } > } > } > else > The new code is much more readable too, you can clearly see what is being compared and why. It is better than what we had before.
On 09/07/2018 15:00, Carlos O'Donell wrote: > On 07/09/2018 01:20 PM, Florian Weimer wrote: >> 20180709 Florian Weimer <fweimer@redhat.com> >> >> [BZ #23036] >> * posix/regexec.c (check_node_accept_bytes): When comparing >> weights, do not compare an extra byte after the end of the >> weights. > > I assume that this fixes the issue and that you have minimally passed > the regression testing for x86_64. I would like a test for this, but it > can wait. This regression is a blocker for the release, and I appreciate > your root cause analysis of the issue. I still think we should add a regression test for it, the bug report have one quite simple which we can add as an independent test (which should not invalidate testsuite validation already done).
* Adhemerval Zanella: > On 09/07/2018 15:00, Carlos O'Donell wrote: >> On 07/09/2018 01:20 PM, Florian Weimer wrote: >>> 20180709 Florian Weimer <fweimer@redhat.com> >>> >>> [BZ #23036] >>> * posix/regexec.c (check_node_accept_bytes): When comparing >>> weights, do not compare an extra byte after the end of the >>> weights. >> >> I assume that this fixes the issue and that you have minimally passed >> the regression testing for x86_64. I would like a test for this, but it >> can wait. This regression is a blocker for the release, and I appreciate >> your root cause analysis of the issue. > > I still think we should add a regression test for it, the bug report > have one quite simple which we can add as an independent test (which > should not invalidate testsuite validation already done). I'm working on a test which constructs all matching singlecharacter multibyte strings matching a given pattern in a given locale, so we can check that we get the expected result.
* Carlos O'Donell: >>  while (cnt <= weight_len >>  && (weights[equiv_class_idx + 1 + cnt] >>  == weights[idx + 1 + cnt])) > > Here we start at count 0 and go to <= weight_len. > > This is one byte too far. > > In an Nlength weight: > > L01234...[N1] > ^^ ^ >   weights [idx + 1 + (weight_len  1)] >  weights[idx + 1] >  weights > > L == N == weight_len > > So the loop for cnt <= weight_len goes one byte beyond the weights array. Right. I wasn't able to derive the data layout from locale/programs/ldcollate.c, but there's this code in string/strxfrm_l.c: /* Find next weight and rule index. Inlined since called for every char. */ static __always_inline size_t find_idx (const USTRING_TYPE **us, int32_t *weight_idx, unsigned char *rule_idx, const locale_data_t *l_data, const int pass) { int32_t tmp = findidx (l_data>table, l_data>indirect, l_data>extra, us, 1); *rule_idx = tmp >> 24; int32_t idx = tmp & 0xffffff; size_t len = l_data>weights[idx++]; /* Skip over indices of previous levels. */ for (int i = 0; i < pass; i++) { idx += len; len = l_data>weights[idx++]; } *weight_idx = idx; return len; } This makes it abundantly clear that the length element does not count itself in the length.
On 07/09/2018 02:55 PM, Adhemerval Zanella wrote: > > > On 09/07/2018 15:00, Carlos O'Donell wrote: >> On 07/09/2018 01:20 PM, Florian Weimer wrote: >>> 20180709 Florian Weimer <fweimer@redhat.com> >>> >>> [BZ #23036] >>> * posix/regexec.c (check_node_accept_bytes): When comparing >>> weights, do not compare an extra byte after the end of the >>> weights. >> >> I assume that this fixes the issue and that you have minimally passed >> the regression testing for x86_64. I would like a test for this, but it >> can wait. This regression is a blocker for the release, and I appreciate >> your root cause analysis of the issue. > > I still think we should add a regression test for it, the bug report > have one quite simple which we can add as an independent test (which > should not invalidate testsuite validation already done). I agree. However, the testsuite can take all the way until August 1st for us to get done. I imagine Florian posted this early because it's a release blocker and we want review right away.
diff git a/posix/regexec.c b/posix/regexec.c index 63aef97535..73644c2341 100644  a/posix/regexec.c +++ b/posix/regexec.c @@ 3878,30 +3878,27 @@ check_node_accept_bytes (const re_dfa_t *dfa, Idx node_idx, indirect = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB); int32_t idx = findidx (table, indirect, extra, &cp, elem_len); + int32_t rule = idx >> 24; + idx &= 0xffffff; if (idx > 0)  for (i = 0; i < cset>nequiv_classes; ++i)  {  int32_t equiv_class_idx = cset>equiv_classes[i];  size_t weight_len = weights[idx & 0xffffff];  if (weight_len == weights[equiv_class_idx & 0xffffff]  && (idx >> 24) == (equiv_class_idx >> 24))  {  Idx cnt = 0;   idx &= 0xffffff;  equiv_class_idx &= 0xffffff;   while (cnt <= weight_len  && (weights[equiv_class_idx + 1 + cnt]  == weights[idx + 1 + cnt]))  ++cnt;  if (cnt > weight_len)  {  match_len = elem_len;  goto check_node_accept_bytes_match;  }  }  } + { + size_t weight_len = weights[idx]; + for (i = 0; i < cset>nequiv_classes; ++i) + { + int32_t equiv_class_idx = cset>equiv_classes[i]; + int32_t equiv_class_rule = equiv_class_idx >> 24; + equiv_class_idx &= 0xffffff; + if (weights[equiv_class_idx] == weight_len + && equiv_class_rule == rule + && memcmp (weights + idx + 1, + weights + equiv_class_idx + 1, + weight_len) == 0) + { + match_len = elem_len; + goto check_node_accept_bytes_match; + } + } + } } } else