Message ID | 4E9C2445.2000007@net-b.de |
---|---|
State | New |
Headers | show |
On Mon, Oct 17, 2011 at 15:49, Tobias Burnus <burnus@net-b.de> wrote: > This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine and > the FLUSH statement. It removes the _commit from gfortran's buf_flush. Like I argued in this message http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html , I think this is a gross mistake. libgfortran should not require _commit nor fsync in any situation. Those calls are useful for writing databases and other applications which must make data integrity guarantees, and are prepared to pay the performance cost associated with it. It's absolutely not something a language support library should do unless the language spec explicitly requires such data integrity guarantees. > Background: > * gfortran internally buffers the I/O, but it calls the nonbuffering > open/write/read functions (and not, e.g., fopen/fwrite/fread). On > Unix/Darwin/Linux system, the changes become immediately visible to all > processes after a "write()". > * On Windows, there seems to be a file-descriptor specific system buffer > such that the data only becomes available to other processes after either a > "_commit" or after closing the file. > * The Windows _commit() is a combination of a (system) buffer flush - as a > "write()" already implies on POSIX systems, but it also ensures that the > data ends up on the disk, which on POSIX systems is done via fsync(). I admit I'm somewhat confused by this issue, and despite extensive googling I haven't been able to come up with a clear explanation for the behavior seen in PR 44698. That write() would be buffered on windows makes no sense to me, and I have found no documentation supporting this view. The closest what I found was in the documentation for WriteFile and WriteFileEX (which are the win32 native calls, write() is a wrapper around one of these (the EX version if available, presumably)): http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%29.aspx "When writing to a file, the last write time is not fully updated until all handles used for writing have been closed. Therefore, to ensure an accurate last write time, close the file handle immediately after writing to the file." This suggests that metadata updates are not immediately visible to other processes. Or at least it makes sense that it would update the size at the same time it updates the last write time. Which would explain why stat("/path/to/file") would show an old size, as the size hasn't necessarily yet been updated to the directory, although the file data itself is already transferred to the OS. And, while I haven't checked it, it would make sense that fstat(fd,...) would return the up to date info, as that would just need to check the data via the process local file descriptor table rather than looking up the metadata via the directory. That is, I guess that the implementation is something along the lines of these calls not being buffered (thus no data lost if the process crashes after WriteFile(EX)), but the kernel maintains a per-handle metadata cache which is flushed to the filesystem during batched metadata updates (and close(), _commit(), or process ending), at which point it also becomes visible to other processes. Or something like that. That being said, this is just me speculating. If someone knows better, please feel free to share. And, while I'm at it, this kind of "relaxed consistency" is not unheard of in the unix world either. Consider NFS, where data and metadata may not be flushed to the server until fsync() or close() is called, or the attribute cache timeout forces the writeout(?), and thus it's possible for clients to have an inconsistent view of a file. In both cases the remedy is the same; if this kind of consistency matters, the user should close the file or fsync()/_commit() before expecting that the OS metadata is consistent. I think that's a better option than sprinkling _commit() all over the library. So I would rather prefer my own patch from the URL above. Also, I think it would be nice if we could get this fix into 4.6.2..
Hi Janne, On 10/17/2011 05:30 PM, Janne Blomqvist wrote: > On Mon, Oct 17, 2011 at 15:49, Tobias Burnus<burnus@net-b.de> wrote: >> This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine and >> the FLUSH statement. It removes the _commit from gfortran's buf_flush. > Like I argued in this message http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html, I think this is a gross mistake. [...] And I think it is a mistake to not make the data available to other processes as it is indicated by the Fortran 2008 standard: "Execution of a FLUSH statement causes data written to an external le to be available to other processes, or causes data placed in an external file by means other than Fortran to be available to a READ statement. These actions are processor dependent." Thus, I think it makes sense for FLUSH to call _commit on Windows. If you don't want to have a slow down: Simply do not call FLUSH. > libgfortran should not require _commit nor fsync in any situation. Those calls are useful for writing databases and other applications which must make data integrity guarantees, and are prepared to pay the performance cost associated with it. It's absolutely not something a language support library should do unless the language spec explicitly requires such data integrity guarantees. Well, Fortran does not need to write the data to the file, however, the purpose of FLUSH is that I can, e.g., run execute_command_line with the file the program just has written. It will work on Unix/Linux but not on MinGW/MinGW-w64 without a _commit (or without closing the file). > That write() would be buffered on windows makes no sense to me Why shouldn't it be buffed? Typical Windows programs open files with an exclusive lock and as Windows never had the pipes and many small programs as Unix did, having a per-file-descriptor buffer is easier to implement, avoids multi-thread issues and is potentially faster. If a program wants to make the data available, it can just _commit it or close the file handle - that way one also has a perfect data integrity. > And, while I'm at it, this kind of "relaxed consistency" is not > unheard of in the unix world either. Consider NFS, where data and > metadata may not be flushed to the server until fsync() or close() is > called, or the attribute cache timeout forces the writeout(?), and > thus it's possible for clients to have an inconsistent view of a file. Well, most of the time it works well on the same system: If I call execute_command_line, the data is up to date. The issue with NFS only occurs if I want to access the data remotely, which is another issue. If one wants to do that, one can use a parallel access with, e.g., HDF5 or MPIv2 or the Coarray TS (to be written and implemented). > In both cases the remedy is the same; if this kind of consistency matters, the user should close the file or fsync()/_commit() before expecting that the OS metadata is consistent. I think that's a better option than sprinkling _commit() all over the library. No, for the required consistency, FLUSH is enough (including calling _commit on MinGW/MinGW-w64). It makes sure that if the program crashes, the data is still there, it makes the data available for other processes. Only if one wants to have complete integrity, one can call fsync. However, with NFS, Lustre et al., I am not 100% sure that the data is immediately available on all other clients after fsync returned. > So I would rather prefer my own patch from the URL above. Also, I > think it would be nice if we could get this fix into 4.6.2.. I also would like to see this fixed for 4.6.2. However, a prerequisite is that we agree on how to implement it. Regarding your patch: I think it does not solve the FLUSH issue. For the file size itself, I think the patch is okay, but frankly, I think for the performance it does not really matter which approach is taken. And I do not like the test-suite part of your patch. Tobias
On Mon, Oct 17, 2011 at 19:03, Tobias Burnus <burnus@net-b.de> wrote: > Hi Janne, > > On 10/17/2011 05:30 PM, Janne Blomqvist wrote: >> >> On Mon, Oct 17, 2011 at 15:49, Tobias Burnus<burnus@net-b.de> wrote: >>> >>> This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine >>> and >>> the FLUSH statement. It removes the _commit from gfortran's buf_flush. >> >> Like I argued in this message >> http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html, I think this is a gross >> mistake. > > [...] > > And I think it is a mistake to not make the data available to other > processes as it is indicated by the Fortran 2008 standard: > > "Execution of a FLUSH statement causes data written to an external le to be > available to other processes, or causes data placed in an external file by > means other than Fortran to be available to a READ statement. These actions > are processor dependent." > > Thus, I think it makes sense for FLUSH to call _commit on Windows. I'm not actually sure we can draw such conclusions. What we know is that metadata updates to the directory are delayed. It wouldn't surprise me if opening a file (which presumably is a atomic operation in order to avoid race conditions just like on POSIX) forces the kernel to sync metadata of other handles to the same file. > If you don't want to have a slow down: Simply do not call FLUSH. There are legitimate reasons flush which do not entail storing dirty data to stable storage. Which is why things like fflush() and fsync() are separate. >> libgfortran should not require _commit nor fsync in any situation. Those >> calls are useful for writing databases and other applications which must >> make data integrity guarantees, and are prepared to pay the performance cost >> associated with it. It's absolutely not something a language support library >> should do unless the language spec explicitly requires such data integrity >> guarantees. > > Well, Fortran does not need to write the data to the file, however, the > purpose of FLUSH is that I can, e.g., run execute_command_line with the file > the program just has written. It will work on Unix/Linux but not on > MinGW/MinGW-w64 without a _commit (or without closing the file). Do we have any evidence of this failing in reality? >> That write() would be buffered on windows makes no sense to me > > Why shouldn't it be buffed? Because language runtimes already provide buffered IO so there's no reason for what is ostensibly a syscall to provide an extra layer of user space buffering. Secondly, AFAIK once a write is finished, the data is there even if the application crashes, which suggests that the data is transferred to the kernel page cache just like on other OS'es. To clarify, I was talking about a per-process user space buffer, like C stdio or indeed the internal buffering in libgfortran. Of course, like any remotely modern OS Windows does aggressive buffering of files in the kernel using whatever memory is available. > Typical Windows programs open files with an > exclusive lock and as Windows never had the pipes and many small programs as > Unix did, having a per-file-descriptor buffer is easier to implement, avoids > multi-thread issues and is potentially faster. The fact that Windows, like many other OS'es, supports things like shared libraries and shared file mappings, strongly suggests that it has a unified page cache for file data, just like Linux. Also, an OS that would waste memory on caching the same file multiple times would not be performance competitive in today's world. >> And, while I'm at it, this kind of "relaxed consistency" is not >> unheard of in the unix world either. Consider NFS, where data and >> metadata may not be flushed to the server until fsync() or close() is >> called, or the attribute cache timeout forces the writeout(?), and >> thus it's possible for clients to have an inconsistent view of a file. > > Well, most of the time it works well on the same system: Yes, most of the time it works just fine. If we're going to do something with massive performance implications, IMHO we'd better make sure it works always and not just most of the time. >> In both cases the remedy is the same; if this kind of consistency matters, >> the user should close the file or fsync()/_commit() before expecting that >> the OS metadata is consistent. I think that's a better option than >> sprinkling _commit() all over the library. > > No, for the required consistency, FLUSH is enough (including calling _commit > on MinGW/MinGW-w64). It makes sure that if the program crashes, the data is > still there, it makes the data available for other processes. Like I argued above, I'd be VERY surprised if written data is actually lost if the application crashes without first doing a _commit. Also, I find it likely that opening a file will actually provide the correct up-to-date metadata even though the directory might contain slightly stale metadata. > Only if one wants to have complete integrity, one can call fsync. However, > with NFS, Lustre et al., I am not 100% sure that the data is immediately > available on all other clients after fsync returned. With NFS, no. fsync() just guarantees that data is committed on the server; other clients will notice it only when their attribute caches timeout and they refetch the attributes for the file. For Lustre, yes, since Lustre implements a full cache coherency protocol. > Regarding your patch: I think it does not solve the FLUSH issue. Why not? > For the > file size itself, I think the patch is okay, but frankly, I think for the > performance it does not really matter which approach is taken. I do think performance matters. The performance of fsync()/_commit() is limited by the rotation speed of the HD; with a normal 7.2k SATA disk one can do maybe 100 fsync()'s per second. A terribly small number, that is, if one compares to the performance of the CPU and memory on a modern PC. For comparison, I recall I was able to do something like 10**6 or was it even 10**7 simple syscalls per second. > And I do not > like the test-suite part of your patch. ?? Well, with the rest of the patch the FLUSH is unnecessary, but I'm sure it won't do any great harm if it's kept.
On Mon, Oct 17, 2011 at 23:52, Janne Blomqvist <blomqvist.janne@gmail.com> wrote: > On Mon, Oct 17, 2011 at 19:03, Tobias Burnus <burnus@net-b.de> wrote: >> Hi Janne, >> >> On 10/17/2011 05:30 PM, Janne Blomqvist wrote: >>> >>> On Mon, Oct 17, 2011 at 15:49, Tobias Burnus<burnus@net-b.de> wrote: >>>> >>>> This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine >>>> and >>>> the FLUSH statement. It removes the _commit from gfortran's buf_flush. >>> >>> Like I argued in this message >>> http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html, I think this is a gross >>> mistake. >> >> [...] >> >> And I think it is a mistake to not make the data available to other >> processes as it is indicated by the Fortran 2008 standard: >> >> "Execution of a FLUSH statement causes data written to an external le to be >> available to other processes, or causes data placed in an external file by >> means other than Fortran to be available to a READ statement. These actions >> are processor dependent." >> >> Thus, I think it makes sense for FLUSH to call _commit on Windows. > > I'm not actually sure we can draw such conclusions. What we know is > that metadata updates to the directory are delayed. It wouldn't > surprise me if opening a file (which presumably is a atomic operation > in order to avoid race conditions just like on POSIX) forces the > kernel to sync metadata of other handles to the same file. I did some further googling, and http://stackoverflow.com/questions/2883691/fflush-on-stdout seems to suggest that the my explanation above is roughly what is happening. That is, opening and closing the file in another program, "type" in the link above, flushes the metadata to the directory. Also from the MSDN link referenced there "The only guarantee about a file timestamp is that the file time is correctly reflected when the handle that makes the change is closed. " (which would suggest that the same hold for other file metadata as well, such as the size). Explanation of file caching in Windows: http://msdn.microsoft.com/en-us/library/aa364218%28v=VS.85%29.aspx Some MS presentation about how to make apps behave nicely with SMB: http://mschnlnine.vo.llnwd.net/d1/pdc08/PPTX/ES23.pptx Under the heading of "Platform Support for Metadata Caching": "Metadata caching is best effort and there are very limited consistency guarantees Metadata caches expire after a fixed time " (Though it's not entirely clear if the above "platform support" means only SMB or Windows filesystem semantics in general.) So, I think the picture is essentially: - write() (which is a wrapper around WriteFile(EX)) transfers data and metadata to the system cache (kernel page cache in unix terminology). - However, the metadata is written to the directory lazily, allowing applications that do stat() (or the equivalent Win32 API call(s)) on a pathname to see stale data. This, incidentally, is what gfortran is doing and what caused the issue that led to the introduction of _commit in the first place. - Closing the file, or _commit() (a wrapper around FlushFileBuffers()), or (per the stackoverflow link above) opening/closing the file in another process will force the directory flush immediately. Some investigation into how other language support libraries handle this: - For MS-DOS compatibility (back when OS file caching wasn't that advanced, one presumes), the MS C runtime allows the user to link in an extra object COMMODE.OBJ which makes fflush() also call _commit() recreating the MS-DOS behavior. By default, however, this is not done. Also, there are some nonstandard flags that can be passed to fopen() to indicate that one wants the MS-DOS behavior. - For the MS C++ compiler, the same COMMODE.OBJ linking can be done, and there is some nonstandard extension allowing one to get the fd from a C++ stream and then call _commit on it. By default flush() on a stream does not call _commit/FlushFileBuffers, it just does a WriteFile(). - For .NET, there is the FileStream.Flush() method, which flushes the user-space buffer without calling _commit/FlushFileBuffers. In the latest version of .NET, there is a new FileStream.Flush(bool) method which can be used to call FlushFileBuffers. In previous .NET versions, people used PInvoke (the native code calling interface) to call FlushFileBuffers if needed. - For the runtimes provided with GCC, grepping the source tree shows that libgfortran is the only occurence of _commit. In libjava there is a FlushFileBuffers call, but it's #if 0'ed away. That is, except for libgfortran, no other language runtime by default makes an effort to synchronize the metadata. In conclusion, I still think that my previous patch which got rid of the _commit and the reliance on using stat() via pathname is the correct approach. "dir" might show stale data for the file size, but this seems to be the norm on Windows, and opening the file in another process will give the correct data. So in practice I don't think there will be any problems from getting rid of _commit.
2011-10-17 Tobias Burnus <burnus@net-b.de> PR fortran/50016 * intrinsic.texi (FLUSH): Document that it calls _commit on MinGW(-w64). 2011-10-17 Tobias Burnus <burnus@net-b.de> PR fortran/50016 * io/file_pos.c (st_flush): Call _commit on MinGW(-w64). * io/intrinsics.c (flush_i4, flush_i8): Ditto. * io/unix.c (flush_all_units_1, flush_all_units): Ditto. (buf_flush): Remove _commit call. diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi index 11f87a5..1a0e930 100644 --- a/gcc/fortran/intrinsic.texi +++ b/gcc/fortran/intrinsic.texi @@ -4775,7 +4775,10 @@ that the data is committed to disk. On POSIX systems, you can request that all data is transferred to the storage device by calling the @code{fsync} function, with the POSIX file descriptor of the I/O unit as argument (retrieved with GNU intrinsic -@code{FNUM}). The following example shows how: +@code{FNUM}). The following example shows how. On Windows (MinGW and +MinGW-w64), the @code{FLUSH} subroutine and @code{FLUSH} statement always +call @code{_commit} to ensure that other file descriptors see the current +file status; @code{_commit} additionally flushes the data to the disk. @smallexample ! Declare the interface for POSIX fsync function diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c index b034f38..4efdc1f 100644 --- a/libgfortran/io/file_pos.c +++ b/libgfortran/io/file_pos.c @@ -452,6 +452,10 @@ st_flush (st_parameter_filepos *fpp) fbuf_flush (u, u->mode); sflush (u->s); +#ifdef _WIN32 + /* Without _commit, changes are not visible to other file descriptors. */ + _commit (u->s->fd); +#endif unlock_unit (u); } else diff --git a/libgfortran/io/intrinsics.c b/libgfortran/io/intrinsics.c index f48bd77..eaf2e17 100644 --- a/libgfortran/io/intrinsics.c +++ b/libgfortran/io/intrinsics.c @@ -207,6 +207,11 @@ flush_i4 (GFC_INTEGER_4 *unit) if (us != NULL) { sflush (us->s); +#ifdef _WIN32 + /* Without _commit, changes are not visible + to other file descriptors. */ + _commit (u->s->fd); +#endif unlock_unit (us); } } @@ -230,6 +235,11 @@ flush_i8 (GFC_INTEGER_8 *unit) if (us != NULL) { sflush (us->s); +#ifdef _WIN32 + /* Without _commit, changes are not visible + to other file descriptors. */ + _commit (u->s->fd); +#endif unlock_unit (us); } } diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c index 25cb559..a88e83b 100644 --- a/libgfortran/io/unix.c +++ b/libgfortran/io/unix.c @@ -443,10 +443,6 @@ buf_flush (unix_stream * s) if (s->ndirty != 0) return -1; -#ifdef _WIN32 - _commit (s->fd); -#endif - return 0; } @@ -1531,7 +1527,14 @@ flush_all_units_1 (gfc_unit *u, int min_unit) if (__gthread_mutex_trylock (&u->lock)) return u; if (u->s) - sflush (u->s); + { + sflush (u->s); +#ifdef _WIN32 + /* Without _commit, changes are not visible to other + file descriptors. */ + _commit (u->s->fd); +#endif + } __gthread_mutex_unlock (&u->lock); } u = u->right; @@ -1562,6 +1565,11 @@ flush_all_units (void) if (u->closed == 0) { sflush (u->s); +#ifdef _WIN32 + /* Without _commit, changes are not visible to other + file descriptors. */ + _commit (u->s->fd); +#endif __gthread_mutex_lock (&unit_lock); __gthread_mutex_unlock (&u->lock); (void) predec_waiting_locked (u);