Message ID | 20170411113622.8C77B403B9917@oldenburg.str.redhat.com |
---|---|
State | New |
Headers | show |
On Tue, Apr 11, 2017 at 7:36 AM, Florian Weimer <fweimer@redhat.com> wrote: > They only modify the state in the dirstream argument, and we > generally do not treat this as a reason to mark a function as > not thread-safe. For an example, see random_r, which is marked > as thread-safe even though the random state is not protected > by a lock. Hmm. There's two issues here: first, POSIX specifically allows readdir to be not thread-safe (although it's unclear to me what that actually means) so it might be appropriate to keep the annotation around to warn people that there is a portability concern; second, if you share a DIR object among threads, a call to readdir in one thread will clobber the previous return value, which might still be live in another thread. Is that sufficient reason to call the *function* thread-unsafe? We don't have any good place to warn people about that *other* than the documentation for readdir. (Note that the text of the @deftypefun does a very bad job of explaining what the problem is.) zw
On 04/11/2017 03:16 PM, Zack Weinberg wrote: > On Tue, Apr 11, 2017 at 7:36 AM, Florian Weimer <fweimer@redhat.com> wrote: >> They only modify the state in the dirstream argument, and we >> generally do not treat this as a reason to mark a function as >> not thread-safe. For an example, see random_r, which is marked >> as thread-safe even though the random state is not protected >> by a lock. > > Hmm. There's two issues here: first, POSIX specifically allows > readdir to be not thread-safe (although it's unclear to me what that > actually means) so it might be appropriate to keep the annotation > around to warn people that there is a portability concern; The cost of that is that people use readdir_r instead, which is not what we want at all. > second, if > you share a DIR object among threads, a call to readdir in one thread > will clobber the previous return value, which might still be live in > another thread. That's true for file positions and FILE * objects, too. > Is that sufficient reason to call the *function* > thread-unsafe? We don't have any good place to warn people about that > *other* than the documentation for readdir. (Note that the text of > the @deftypefun does a very bad job of explaining what the problem > is.) I don't understand your comment about @deftypefun. The problem here is that people avoid readdir and jump through the hoops required to use readdir_r, either introducing security vulnerabilities or interoperability issues in the process. (It's all related to the missing buffer size argument, similar to realpath's second argument.) As usual, people assume that because libcs and POSIX define readdir_r, it is a desirable interface to have, but that is wrong. Thanks, Florian
On Tue, Apr 11, 2017 at 9:22 AM, Florian Weimer <fweimer@redhat.com> wrote: > On 04/11/2017 03:16 PM, Zack Weinberg wrote: >> On Tue, Apr 11, 2017 at 7:36 AM, Florian Weimer <fweimer@redhat.com> >> wrote: >>> >>> They only modify the state in the dirstream argument, and we >>> generally do not treat this as a reason to mark a function as >>> not thread-safe. For an example, see random_r, which is marked >>> as thread-safe even though the random state is not protected >>> by a lock. >> >> Hmm. There's two issues here: first, POSIX specifically allows >> readdir to be not thread-safe (although it's unclear to me what that >> actually means) so it might be appropriate to keep the annotation >> around to warn people that there is a portability concern; > > The cost of that is that people use readdir_r instead, which is not what we > want at all. > >> second, if >> you share a DIR object among threads, a call to readdir in one thread >> will clobber the previous return value, which might still be live in >> another thread. > > That's true for file positions and FILE * objects, too. Both good points. >> Is that sufficient reason to call the *function* >> thread-unsafe? We don't have any good place to warn people about that >> *other* than the documentation for readdir. (Note that the text of >> the @deftypefun does a very bad job of explaining what the problem >> is.) > > I don't understand your comment about @deftypefun. Sorry, I meant to complain about this paragraph (https://sourceware.org/git/?p=glibc.git;a=blob;f=manual/filesys.texi;h=edc7c64d22169bfb9f104321b141a6b4925ce740;hb=38efe8c5a5c1676f54481caa81980a93f7223479#l509): # In POSIX.1-2008, @code{readdir} is not thread-safe. In @theglibc{} # implementation, it is safe to call @code{readdir} concurrently on # different @var{dirstream}s, but multiple threads accessing the same # @var{dirstream} result in undefined behavior. @code{readdir_r} is a # fully thread-safe alternative, but suffers from poor portability (see # below). It is recommended that you use @code{readdir}, with external # locking if multiple threads access the same @var{dirstream}. which should really be something like # @strong{Caution:} The pointer returned by @code{readdir} points to # a buffer within the @code{DIR} object. The data in that buffer will # be overwritten by the next call to @code{readdir}. You must take care, # for instance, to copy the @code{d_name} string if you need it later. # # Because of this, it is not safe to share a @code{DIR} object among # multiple threads, unless you use your own locking to ensure that # no thread calls @code{readdir} while another thread is still using the # data from the previous call. In @theglibc{}, it is safe to call # @code{readdir} from multiple threads as long as each thread uses # its own @code{DIR} object. POSIX.1-2008 does not require this to # be safe, but we are not aware of any operating systems where it # does not work. # # @code{readdir_r} allows you to provide your own buffer for the # @code{struct dirent}, but it is less portable than @code{readdir}, and # has problems with very long filenames (see below). We recommend # you use @code{readdir}, but do not share @code{DIR} objects. with matching changes made to the readdir_r documentation. If we clarified the text along those lines I'd be fine with calling readdir thread-safe in the annotations. zw
On 04/11/2017 03:41 PM, Zack Weinberg wrote: >> I don't understand your comment about @deftypefun. > > Sorry, I meant to complain about this paragraph > (https://sourceware.org/git/?p=glibc.git;a=blob;f=manual/filesys.texi;h=edc7c64d22169bfb9f104321b141a6b4925ce740;hb=38efe8c5a5c1676f54481caa81980a93f7223479#l509): > > # In POSIX.1-2008, @code{readdir} is not thread-safe. In @theglibc{} > # implementation, it is safe to call @code{readdir} concurrently on > # different @var{dirstream}s, but multiple threads accessing the same > # @var{dirstream} result in undefined behavior. @code{readdir_r} is a > # fully thread-safe alternative, but suffers from poor portability (see > # below). It is recommended that you use @code{readdir}, with external > # locking if multiple threads access the same @var{dirstream}. > > which should really be something like > > # @strong{Caution:} The pointer returned by @code{readdir} points to > # a buffer within the @code{DIR} object. The data in that buffer will > # be overwritten by the next call to @code{readdir}. You must take care, > # for instance, to copy the @code{d_name} string if you need it later. > # > # Because of this, it is not safe to share a @code{DIR} object among > # multiple threads, unless you use your own locking to ensure that > # no thread calls @code{readdir} while another thread is still using the > # data from the previous call. In @theglibc{}, it is safe to call > # @code{readdir} from multiple threads as long as each thread uses > # its own @code{DIR} object. POSIX.1-2008 does not require this to > # be safe, but we are not aware of any operating systems where it > # does not work. > # > # @code{readdir_r} allows you to provide your own buffer for the > # @code{struct dirent}, but it is less portable than @code{readdir}, and > # has problems with very long filenames (see below). We recommend > # you use @code{readdir}, but do not share @code{DIR} objects. > > with matching changes made to the readdir_r documentation. If we > clarified the text along those lines I'd be fine with calling readdir > thread-safe in the annotations. Your suggested change makes sense to me. I can install it along with my patch if you want. Thanks, Florian
On Tue, Apr 11, 2017 at 10:31 AM, Florian Weimer <fweimer@redhat.com> wrote: > > Your suggested change makes sense to me. I can install it along with my > patch if you want. Go ahead. I'm thinking about further revisions but I'm not going to get to it today, so. zw
diff --git a/manual/filesys.texi b/manual/filesys.texi index edc7c64..23a8173 100644 --- a/manual/filesys.texi +++ b/manual/filesys.texi @@ -478,7 +478,7 @@ symbols are declared in the header file @file{dirent.h}. @comment dirent.h @comment POSIX.1 @deftypefun {struct dirent *} readdir (DIR *@var{dirstream}) -@safety{@prelim{}@mtunsafe{@mtasurace{:dirstream}}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} +@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} @c This function holds dirstream's non-recursive lock, which brings @c about the usual issues with locks and async signals and cancellation, @c but the lock taking is not enough to make the returned value safe to @@ -592,7 +592,7 @@ of the last two functions. @comment dirent.h @comment LFS @deftypefun {struct dirent64 *} readdir64 (DIR *@var{dirstream}) -@safety{@prelim{}@mtunsafe{@mtasurace{:dirstream}}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} +@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} The @code{readdir64} function is just like the @code{readdir} function except that it returns a pointer to a record of type @code{struct dirent64}. Some of the members of this data type (notably @code{d_ino})