Message ID | 201712080916.vB89GxSs005509@skeeve.com |
---|---|
State | New |
Headers | show |
Series | None | expand |
On 12/08/2017 10:16 AM, Arnold Robbins wrote: > +#if !defined(__GNUC__) || __GNUC__ < 3 > + static short utf8_sb_map_inited = 0; > + > + if (! utf8_sb_map_inited) > + { > + int i; > + > + utf8_sb_map_inited = 0; > + for (i = 0; i <= 0x80 / BITSET_WORD_BITS - 1; i++) > + utf8_sb_map[i] = BITSET_WORD_MAX; > + } > +#endif This doesn't look like a good idea because it's not thread-safe. Thanks, Florian
Florian Weimer <fweimer@redhat.com> wrote: > On 12/08/2017 10:16 AM, Arnold Robbins wrote: > > +#if !defined(__GNUC__) || __GNUC__ < 3 > > + static short utf8_sb_map_inited = 0; > > + > > + if (! utf8_sb_map_inited) > > + { > > + int i; > > + > > + utf8_sb_map_inited = 0; > > + for (i = 0; i <= 0x80 / BITSET_WORD_BITS - 1; i++) > > + utf8_sb_map[i] = BITSET_WORD_MAX; > > + } > > +#endif > > This doesn't look like a good idea because it's not thread-safe. Is the rest of regex thread safe? I've no objection to something else that does the trick for GLIBC. Gawk doesn't have multiple threads. Thanks, Arnold
On 12/08/2017 10:47 AM, arnold@skeeve.com wrote: > Florian Weimer <fweimer@redhat.com> wrote: > >> On 12/08/2017 10:16 AM, Arnold Robbins wrote: >>> +#if !defined(__GNUC__) || __GNUC__ < 3 >>> + static short utf8_sb_map_inited = 0; >>> + >>> + if (! utf8_sb_map_inited) >>> + { >>> + int i; >>> + >>> + utf8_sb_map_inited = 0; >>> + for (i = 0; i <= 0x80 / BITSET_WORD_BITS - 1; i++) >>> + utf8_sb_map[i] = BITSET_WORD_MAX; >>> + } >>> +#endif >> >> This doesn't look like a good idea because it's not thread-safe. > > Is the rest of regex thread safe? > > I've no objection to something else that does the trick for GLIBC. > Gawk doesn't have multiple threads. It does not matter for glibc because it's compiled with GCC. It might matter if this code is merged into gnulib, which is supposed to be portable to many compilers and environments. Thanks, Florian
On 08/12/2017 09:04, Florian Weimer wrote: > On 12/08/2017 10:47 AM, arnold@skeeve.com wrote: >> Florian Weimer <fweimer@redhat.com> wrote: >> >>> On 12/08/2017 10:16 AM, Arnold Robbins wrote: >>>> +#if !defined(__GNUC__) || __GNUC__ < 3 >>>> + static short utf8_sb_map_inited = 0; >>>> + >>>> + if (! utf8_sb_map_inited) >>>> + { >>>> + int i; >>>> + >>>> + utf8_sb_map_inited = 0; >>>> + for (i = 0; i <= 0x80 / BITSET_WORD_BITS - 1; i++) >>>> + utf8_sb_map[i] = BITSET_WORD_MAX; >>>> + } >>>> +#endif >>> >>> This doesn't look like a good idea because it's not thread-safe. >> >> Is the rest of regex thread safe? >> >> I've no objection to something else that does the trick for GLIBC. >> Gawk doesn't have multiple threads. > > It does not matter for glibc because it's compiled with GCC. It might matter if this code is merged into gnulib, which is supposed to be portable to many compilers and environments. My understanding is we still aim to keep in sync with gnulib, so I think we should first integrate with current gnulib code adding any glibc code (if any and with the idea of integrating it back to gnulib). I do not see much point in deviate even more from gnulib, unless the idea is to really decouple both implementations.
Florian Weimer <fweimer@redhat.com> wrote: > On 12/08/2017 10:47 AM, arnold@skeeve.com wrote: > > Florian Weimer <fweimer@redhat.com> wrote: > > > >> On 12/08/2017 10:16 AM, Arnold Robbins wrote: > >>> +#if !defined(__GNUC__) || __GNUC__ < 3 > >>> + static short utf8_sb_map_inited = 0; > >>> + > >>> + if (! utf8_sb_map_inited) > >>> + { > >>> + int i; > >>> + > >>> + utf8_sb_map_inited = 0; > >>> + for (i = 0; i <= 0x80 / BITSET_WORD_BITS - 1; i++) > >>> + utf8_sb_map[i] = BITSET_WORD_MAX; > >>> + } > >>> +#endif > >> > >> This doesn't look like a good idea because it's not thread-safe. > > > > Is the rest of regex thread safe? I don't think it is, what with re_set_syntax. > > I've no objection to something else that does the trick for GLIBC. > > Gawk doesn't have multiple threads. > > It does not matter for glibc because it's compiled with GCC. It might > matter if this code is merged into gnulib, which is supposed to be > portable to many compilers and environments. I think gnulib deals with it in a different way. They're smart enough to merge things if and when it becomes an issue. If it doesn't matter for glibc, then I hope it can be merged in. Thanks, Arnold
On 12/09/2017 10:07 AM, arnold@skeeve.com wrote:
> I think gnulib deals with it in a different way. They're smart enough > to merge things if and when it becomes an issue. If it doesn't
matter > for glibc, then I hope it can be merged in.
How about if we merge it into gnulib first? Gnulib is easier to merge into.
diff --git a/posix/regcomp.c b/posix/regcomp.c index 83fcc40..22c07ce 100644 --- a/posix/regcomp.c +++ b/posix/regcomp.c @@ -572,12 +572,15 @@ weak_alias (__regerror, regerror) UTF-8 is used. Otherwise we would allocate memory just to initialize it the same all the time. UTF-8 is the preferred encoding so this is a worthwhile optimization. */ -static const bitset_t utf8_sb_map = -{ +#if __GNUC__ >= 3 +static const bitset_t utf8_sb_map = { /* Set the first 128 bits. */ [0 ... 0x80 / BITSET_WORD_BITS - 1] = BITSET_WORD_MAX }; -#endif +#else /* ! (__GNUC__ >= 3) */ +static bitset_t utf8_sb_map; +#endif /* __GNUC__ >= 3 */ +#endif /* RE_ENABLE_I18N */ static void @@ -884,7 +887,21 @@ init_dfa (re_dfa_t *dfa, size_t pat_len) if (dfa->mb_cur_max > 1) { if (dfa->is_utf8) + { +#if !defined(__GNUC__) || __GNUC__ < 3 + static short utf8_sb_map_inited = 0; + + if (! utf8_sb_map_inited) + { + int i; + + utf8_sb_map_inited = 0; + for (i = 0; i <= 0x80 / BITSET_WORD_BITS - 1; i++) + utf8_sb_map[i] = BITSET_WORD_MAX; + } +#endif dfa->sb_char = (re_bitset_ptr_t) utf8_sb_map; + } else { int i, j, ch; diff --git a/posix/regex_internal.c b/posix/regex_internal.c index 19c1bd8..85daf67 100644 --- a/posix/regex_internal.c +++ b/posix/regex_internal.c @@ -515,7 +515,7 @@ re_string_skip_chars (re_string_t *pstr, int new_raw_idx, wint_t *last_wc) /* Then proceed the next character. */ rawbuf_idx += mbclen; } - *last_wc = wc; + *last_wc = (wint_t) wc; return rawbuf_idx; } #endif /* RE_ENABLE_I18N */ diff --git a/posix/regexec.c b/posix/regexec.c index c73294e..7b0f6d8 100644 --- a/posix/regexec.c +++ b/posix/regexec.c @@ -2850,7 +2850,7 @@ check_arrival (re_match_context_t *mctx, state_array_t *path, int top_node, sizeof (re_dfastate_t *) * (path->alloc - old_alloc)); } - str_idx = path->next_idx ?: top_str; + str_idx = path->next_idx ? path->next_idx : top_str; /* Temporary modify MCTX. */ backup_state_log = mctx->state_log;