Message ID | 20180320093135.26422-1-fl@n621.de |
---|---|
State | New |
Headers | show |
Series | os: truncate pidfile on creation | expand |
On 03/20/2018 04:31 AM, Florian Larysch wrote: > qemu_create_pidfile does not truncate the pidfile when it creates it, > but rather overwrites its contents with the new pid. This works fine as > long as the length of the pid doesn't decrease, but this might happen in > case of wraparounds, causing pidfiles to contain trailing garbage which > breaks operations such as 'kill $(cat pidfile)'. Ouch. Good analysis. However, I have to wonder if we have the opposite problem - when the file exists (truncated) but has not yet been written, how often do we have a race where someone can see the empty file? Should we be going even further and writing the pid into a temporary file and then rename(2)ing that file onto the real destination, so that if the pid file exists at all, it has stable contents that can't cause confusion? > > Instead, always truncate the file before writing it. > > Signed-off-by: Florian Larysch <fl@n621.de> > --- > os-posix.c | 2 +- > os-win32.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/os-posix.c b/os-posix.c > index b9c2343b1e..9a6b874180 100644 > --- a/os-posix.c > +++ b/os-posix.c > @@ -301,7 +301,7 @@ int qemu_create_pidfile(const char *filename) > int len; > int fd; > > - fd = qemu_open(filename, O_RDWR | O_CREAT, 0600); > + fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC, 0600); This part is right, if we don't care about the race with someone reading an empty file. > if (fd == -1) { > return -1; > } > diff --git a/os-win32.c b/os-win32.c > index 586a7c7d49..85dbad7af8 100644 > --- a/os-win32.c > +++ b/os-win32.c > @@ -108,7 +108,7 @@ int qemu_create_pidfile(const char *filename) > memset(&overlap, 0, sizeof(overlap)); > > file = CreateFile(filename, GENERIC_WRITE, FILE_SHARE_READ, NULL, > - OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL); > + CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL); I assume this is right (at least for looking comparable to the POSIX case), although I'm trusting you here. Reviewed-by: Eric Blake <eblake@redhat.com> Worth including in 2.12, unless we want to also avoid the race of an empty pidfile.
On Tue, Mar 20, 2018 at 09:50:19AM -0500, Eric Blake wrote: > However, I have to wonder if we have the opposite problem - when the > file exists (truncated) but has not yet been written, how often do > we have a race where someone can see the empty file? > > Should we be going even further and writing the pid into a temporary > file and then rename(2)ing that file onto the real destination, so > that if the pid file exists at all, it has stable contents that > can't cause confusion? I thought about doing that, but the window of opportunity is sufficiently small that this probably not a problem. Furthermore, the second purpose of the pid file is to provide mutual exclusion between qemu instances using the same pid file by means of the lockf() call. Given that the lock is associated with the inode, doing a rename() would defeat this if we didn't reopen the file before calling lockf(). However, doing that would in turn allow a race between multiple qemu instances (possibly resulting in the incorrect pid remaining in the file). That could also be fixed, but makes the whole affair even more complicated; not sure if this is worth the benefits. > >diff --git a/os-win32.c b/os-win32.c > >index 586a7c7d49..85dbad7af8 100644 > >--- a/os-win32.c > >+++ b/os-win32.c > >@@ -108,7 +108,7 @@ int qemu_create_pidfile(const char *filename) > > memset(&overlap, 0, sizeof(overlap)); > > file = CreateFile(filename, GENERIC_WRITE, FILE_SHARE_READ, NULL, > >- OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL); > >+ CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL); > > I assume this is right (at least for looking comparable to the POSIX > case), although I'm trusting you here. I adapted this for consistency reasons, but I don't have a Windows platform available to test on. I based this on [1] and [2], which seems reasonable, but I'd be happy if someone who is familiar with the Windows API could ack this (get_maintainer.pl suggested Stefan Weil, who is CC'ed). [1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx [2] https://stackoverflow.com/questions/14469607/difference-between-open-always-and-create-always-in-createfile-of-windows-api Florian
On Tue, Mar 20, 2018 at 09:50:19AM -0500, Eric Blake wrote: > On 03/20/2018 04:31 AM, Florian Larysch wrote: > > qemu_create_pidfile does not truncate the pidfile when it creates it, > > but rather overwrites its contents with the new pid. This works fine as > > long as the length of the pid doesn't decrease, but this might happen in > > case of wraparounds, causing pidfiles to contain trailing garbage which > > breaks operations such as 'kill $(cat pidfile)'. > > Ouch. Good analysis. > > However, I have to wonder if we have the opposite problem - when the file > exists (truncated) but has not yet been written, how often do we have a race > where someone can see the empty file? > > Should we be going even further and writing the pid into a temporary file > and then rename(2)ing that file onto the real destination, so that if the > pid file exists at all, it has stable contents that can't cause confusion? > > > > > Instead, always truncate the file before writing it. > > > > Signed-off-by: Florian Larysch <fl@n621.de> > > --- > > os-posix.c | 2 +- > > os-win32.c | 2 +- > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/os-posix.c b/os-posix.c > > index b9c2343b1e..9a6b874180 100644 > > --- a/os-posix.c > > +++ b/os-posix.c > > @@ -301,7 +301,7 @@ int qemu_create_pidfile(const char *filename) > > int len; > > int fd; > > - fd = qemu_open(filename, O_RDWR | O_CREAT, 0600); > > + fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC, 0600); > > This part is right, if we don't care about the race with someone reading an > empty file. No, it is unsafe - we rely on lockf() to get the mutual exclusion. If a QEMU is running with pidfile locked, and its pid written into it, then a 2nd QEMU comes along it will truncate the pidfile owned by the original QEMU because the truncation happens before it has tried to acquire the lock. The 2nd QEMU will still exit, but the original QEMU's pid has now been lost. We must call ftruncate() after lockf(), but before writing the new pid into the file. That ensures there is no window in which it is possible to see the new & old pids mixed together. Regards, Daniel
On 03/20/2018 11:02 AM, Daniel P. Berrangé wrote: > On Tue, Mar 20, 2018 at 09:50:19AM -0500, Eric Blake wrote: >> On 03/20/2018 04:31 AM, Florian Larysch wrote: >>> qemu_create_pidfile does not truncate the pidfile when it creates it, >>> but rather overwrites its contents with the new pid. This works fine as >>> long as the length of the pid doesn't decrease, but this might happen in >>> case of wraparounds, causing pidfiles to contain trailing garbage which >>> breaks operations such as 'kill $(cat pidfile)'. >> >> Ouch. Good analysis. >> >> However, I have to wonder if we have the opposite problem - when the file >> exists (truncated) but has not yet been written, how often do we have a race >> where someone can see the empty file? >> >> This part is right, if we don't care about the race with someone reading an >> empty file. > > No, it is unsafe - we rely on lockf() to get the mutual exclusion. Oh, right, because we aren't using O_EXCL. > If a QEMU is running with pidfile locked, and its pid written into > it, then a 2nd QEMU comes along it will truncate the pidfile owned > by the original QEMU because the truncation happens before it has > tried to acquire the lock. The 2nd QEMU will still exit, but the > original QEMU's pid has now been lost. > > We must call ftruncate() after lockf(), but before writing the new > pid into the file. That ensures there is no window in which it is > possible to see the new & old pids mixed together. So the window of time where the file can be empty still exists, but it must be AFTER the lockf() (and the window where reading the file content can see the wrong pid is longer - from the open until the ftruncate - but at least the wrong data is limited to a previous pid or the empty string, rather than a completely random wrong value from squashing together two unrelated pids). We'll need a v2.
On Tue, Mar 20, 2018 at 04:02:44PM +0000, Daniel P. Berrangé wrote: > No, it is unsafe - we rely on lockf() to get the mutual exclusion. > If a QEMU is running with pidfile locked, and its pid written into > it, then a 2nd QEMU comes along it will truncate the pidfile owned > by the original QEMU because the truncation happens before it has > tried to acquire the lock. The 2nd QEMU will still exit, but the > original QEMU's pid has now been lost. That's correct, thanks for pointing it out. > We must call ftruncate() after lockf(), but before writing the new > pid into the file. That ensures there is no window in which it is > possible to see the new & old pids mixed together. I'll send a revised version doing exactly that. From my reading of the Windows API documentation, this might not be a problem there: The file is opened with FILE_SHARE_READ, which prohibits opening the file in a writable mode and CREATE_ALWAYS will only recreate the file if it is writable. Florian
On Tue, Mar 20, 2018 at 05:21:16PM +0100, Florian Larysch wrote: > On Tue, Mar 20, 2018 at 04:02:44PM +0000, Daniel P. Berrangé wrote: > > No, it is unsafe - we rely on lockf() to get the mutual exclusion. > > If a QEMU is running with pidfile locked, and its pid written into > > it, then a 2nd QEMU comes along it will truncate the pidfile owned > > by the original QEMU because the truncation happens before it has > > tried to acquire the lock. The 2nd QEMU will still exit, but the > > original QEMU's pid has now been lost. > > That's correct, thanks for pointing it out. > > > We must call ftruncate() after lockf(), but before writing the new > > pid into the file. That ensures there is no window in which it is > > possible to see the new & old pids mixed together. > > I'll send a revised version doing exactly that. > > From my reading of the Windows API documentation, this might not be a > problem there: The file is opened with FILE_SHARE_READ, which prohibits > opening the file in a writable mode and CREATE_ALWAYS will only recreate > the file if it is writable. I don't think that's correct either I'm afraid. FILE_SHARE_READ indicates whether other processes are allowed to open the file in read-only mode, while we have it open for write. We're passing GENERIC_WRITE so QEMU has it open for write, thus your change will again cause the 2nd QEMU to blindly replace the file owned by the original QEMU. That said, I think the existing pidfile code for win32 is totally broken because QEMU is closing it immediately after writing its pid. So the only mutual exclusion is taking place in the tiny time window while QEMU is writing the pid. Regards, Daniel
diff --git a/os-posix.c b/os-posix.c index b9c2343b1e..9a6b874180 100644 --- a/os-posix.c +++ b/os-posix.c @@ -301,7 +301,7 @@ int qemu_create_pidfile(const char *filename) int len; int fd; - fd = qemu_open(filename, O_RDWR | O_CREAT, 0600); + fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC, 0600); if (fd == -1) { return -1; } diff --git a/os-win32.c b/os-win32.c index 586a7c7d49..85dbad7af8 100644 --- a/os-win32.c +++ b/os-win32.c @@ -108,7 +108,7 @@ int qemu_create_pidfile(const char *filename) memset(&overlap, 0, sizeof(overlap)); file = CreateFile(filename, GENERIC_WRITE, FILE_SHARE_READ, NULL, - OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL); + CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL); if (file == INVALID_HANDLE_VALUE) { return -1;
qemu_create_pidfile does not truncate the pidfile when it creates it, but rather overwrites its contents with the new pid. This works fine as long as the length of the pid doesn't decrease, but this might happen in case of wraparounds, causing pidfiles to contain trailing garbage which breaks operations such as 'kill $(cat pidfile)'. Instead, always truncate the file before writing it. Signed-off-by: Florian Larysch <fl@n621.de> --- os-posix.c | 2 +- os-win32.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)