diff mbox series

os: truncate pidfile on creation

Message ID 20180320093135.26422-1-fl@n621.de
State New
Headers show
Series os: truncate pidfile on creation | expand

Commit Message

Florian Larysch March 20, 2018, 9:31 a.m. UTC
qemu_create_pidfile does not truncate the pidfile when it creates it,
but rather overwrites its contents with the new pid. This works fine as
long as the length of the pid doesn't decrease, but this might happen in
case of wraparounds, causing pidfiles to contain trailing garbage which
breaks operations such as 'kill $(cat pidfile)'.

Instead, always truncate the file before writing it.

Signed-off-by: Florian Larysch <fl@n621.de>
---
 os-posix.c | 2 +-
 os-win32.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Eric Blake March 20, 2018, 2:50 p.m. UTC | #1
On 03/20/2018 04:31 AM, Florian Larysch wrote:
> qemu_create_pidfile does not truncate the pidfile when it creates it,
> but rather overwrites its contents with the new pid. This works fine as
> long as the length of the pid doesn't decrease, but this might happen in
> case of wraparounds, causing pidfiles to contain trailing garbage which
> breaks operations such as 'kill $(cat pidfile)'.

Ouch.  Good analysis.

However, I have to wonder if we have the opposite problem - when the 
file exists (truncated) but has not yet been written, how often do we 
have a race where someone can see the empty file?

Should we be going even further and writing the pid into a temporary 
file and then rename(2)ing that file onto the real destination, so that 
if the pid file exists at all, it has stable contents that can't cause 
confusion?

> 
> Instead, always truncate the file before writing it.
> 
> Signed-off-by: Florian Larysch <fl@n621.de>
> ---
>   os-posix.c | 2 +-
>   os-win32.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/os-posix.c b/os-posix.c
> index b9c2343b1e..9a6b874180 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -301,7 +301,7 @@ int qemu_create_pidfile(const char *filename)
>       int len;
>       int fd;
>   
> -    fd = qemu_open(filename, O_RDWR | O_CREAT, 0600);
> +    fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC, 0600);

This part is right, if we don't care about the race with someone reading 
an empty file.

>       if (fd == -1) {
>           return -1;
>       }
> diff --git a/os-win32.c b/os-win32.c
> index 586a7c7d49..85dbad7af8 100644
> --- a/os-win32.c
> +++ b/os-win32.c
> @@ -108,7 +108,7 @@ int qemu_create_pidfile(const char *filename)
>       memset(&overlap, 0, sizeof(overlap));
>   
>       file = CreateFile(filename, GENERIC_WRITE, FILE_SHARE_READ, NULL,
> -                      OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
> +                      CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

I assume this is right (at least for looking comparable to the POSIX 
case), although I'm trusting you here.

Reviewed-by: Eric Blake <eblake@redhat.com>

Worth including in 2.12, unless we want to also avoid the race of an 
empty pidfile.
Florian Larysch March 20, 2018, 3:49 p.m. UTC | #2
On Tue, Mar 20, 2018 at 09:50:19AM -0500, Eric Blake wrote:
> However, I have to wonder if we have the opposite problem - when the
> file exists (truncated) but has not yet been written, how often do
> we have a race where someone can see the empty file?
> 
> Should we be going even further and writing the pid into a temporary
> file and then rename(2)ing that file onto the real destination, so
> that if the pid file exists at all, it has stable contents that
> can't cause confusion?

I thought about doing that, but the window of opportunity is
sufficiently small that this probably not a problem.

Furthermore, the second purpose of the pid file is to provide mutual
exclusion between qemu instances using the same pid file by means of the
lockf() call. Given that the lock is associated with the inode, doing a
rename() would defeat this if we didn't reopen the file before calling
lockf(). However, doing that would in turn allow a race between multiple
qemu instances (possibly resulting in the incorrect pid remaining in the
file). That could also be fixed, but makes the whole affair even more
complicated; not sure if this is worth the benefits.

> >diff --git a/os-win32.c b/os-win32.c
> >index 586a7c7d49..85dbad7af8 100644
> >--- a/os-win32.c
> >+++ b/os-win32.c
> >@@ -108,7 +108,7 @@ int qemu_create_pidfile(const char *filename)
> >      memset(&overlap, 0, sizeof(overlap));
> >      file = CreateFile(filename, GENERIC_WRITE, FILE_SHARE_READ, NULL,
> >-                      OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
> >+                      CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
> 
> I assume this is right (at least for looking comparable to the POSIX
> case), although I'm trusting you here.

I adapted this for consistency reasons, but I don't have a Windows
platform available to test on. I based this on [1] and [2], which seems
reasonable, but I'd be happy if someone who is familiar with the Windows
API could ack this (get_maintainer.pl suggested Stefan Weil, who is
CC'ed).

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
[2] https://stackoverflow.com/questions/14469607/difference-between-open-always-and-create-always-in-createfile-of-windows-api

Florian
Daniel P. Berrangé March 20, 2018, 4:02 p.m. UTC | #3
On Tue, Mar 20, 2018 at 09:50:19AM -0500, Eric Blake wrote:
> On 03/20/2018 04:31 AM, Florian Larysch wrote:
> > qemu_create_pidfile does not truncate the pidfile when it creates it,
> > but rather overwrites its contents with the new pid. This works fine as
> > long as the length of the pid doesn't decrease, but this might happen in
> > case of wraparounds, causing pidfiles to contain trailing garbage which
> > breaks operations such as 'kill $(cat pidfile)'.
> 
> Ouch.  Good analysis.
> 
> However, I have to wonder if we have the opposite problem - when the file
> exists (truncated) but has not yet been written, how often do we have a race
> where someone can see the empty file?
> 
> Should we be going even further and writing the pid into a temporary file
> and then rename(2)ing that file onto the real destination, so that if the
> pid file exists at all, it has stable contents that can't cause confusion?
> 
> > 
> > Instead, always truncate the file before writing it.
> > 
> > Signed-off-by: Florian Larysch <fl@n621.de>
> > ---
> >   os-posix.c | 2 +-
> >   os-win32.c | 2 +-
> >   2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/os-posix.c b/os-posix.c
> > index b9c2343b1e..9a6b874180 100644
> > --- a/os-posix.c
> > +++ b/os-posix.c
> > @@ -301,7 +301,7 @@ int qemu_create_pidfile(const char *filename)
> >       int len;
> >       int fd;
> > -    fd = qemu_open(filename, O_RDWR | O_CREAT, 0600);
> > +    fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC, 0600);
> 
> This part is right, if we don't care about the race with someone reading an
> empty file.

No, it is unsafe - we rely on lockf() to get the mutual exclusion.
If a QEMU is running with pidfile locked, and its pid written into
it, then a 2nd QEMU comes along it will truncate the pidfile owned
by the original QEMU because the truncation happens before it has
tried to acquire the lock. The 2nd QEMU will still exit, but the
original QEMU's pid has now been lost.

We must call ftruncate() after lockf(), but before writing the new
pid into the file. That ensures there is no window in which it is
possible to see the new & old pids mixed together.


Regards,
Daniel
Eric Blake March 20, 2018, 4:07 p.m. UTC | #4
On 03/20/2018 11:02 AM, Daniel P. Berrangé wrote:
> On Tue, Mar 20, 2018 at 09:50:19AM -0500, Eric Blake wrote:
>> On 03/20/2018 04:31 AM, Florian Larysch wrote:
>>> qemu_create_pidfile does not truncate the pidfile when it creates it,
>>> but rather overwrites its contents with the new pid. This works fine as
>>> long as the length of the pid doesn't decrease, but this might happen in
>>> case of wraparounds, causing pidfiles to contain trailing garbage which
>>> breaks operations such as 'kill $(cat pidfile)'.
>>
>> Ouch.  Good analysis.
>>
>> However, I have to wonder if we have the opposite problem - when the file
>> exists (truncated) but has not yet been written, how often do we have a race
>> where someone can see the empty file?
>>

>> This part is right, if we don't care about the race with someone reading an
>> empty file.
> 
> No, it is unsafe - we rely on lockf() to get the mutual exclusion.

Oh, right, because we aren't using O_EXCL.

> If a QEMU is running with pidfile locked, and its pid written into
> it, then a 2nd QEMU comes along it will truncate the pidfile owned
> by the original QEMU because the truncation happens before it has
> tried to acquire the lock. The 2nd QEMU will still exit, but the
> original QEMU's pid has now been lost.
> 
> We must call ftruncate() after lockf(), but before writing the new
> pid into the file. That ensures there is no window in which it is
> possible to see the new & old pids mixed together.

So the window of time where the file can be empty still exists, but it 
must be AFTER the lockf() (and the window where reading the file content 
can see the wrong pid is longer - from the open until the ftruncate - 
but at least the wrong data is limited to a previous pid or the empty 
string, rather than a completely random wrong value from squashing 
together two unrelated pids).

We'll need a v2.
Florian Larysch March 20, 2018, 4:21 p.m. UTC | #5
On Tue, Mar 20, 2018 at 04:02:44PM +0000, Daniel P. Berrangé wrote:
> No, it is unsafe - we rely on lockf() to get the mutual exclusion.
> If a QEMU is running with pidfile locked, and its pid written into
> it, then a 2nd QEMU comes along it will truncate the pidfile owned
> by the original QEMU because the truncation happens before it has
> tried to acquire the lock. The 2nd QEMU will still exit, but the
> original QEMU's pid has now been lost.

That's correct, thanks for pointing it out.

> We must call ftruncate() after lockf(), but before writing the new
> pid into the file. That ensures there is no window in which it is
> possible to see the new & old pids mixed together.

I'll send a revised version doing exactly that.

From my reading of the Windows API documentation, this might not be a
problem there: The file is opened with FILE_SHARE_READ, which prohibits
opening the file in a writable mode and CREATE_ALWAYS will only recreate
the file if it is writable.

Florian
Daniel P. Berrangé March 20, 2018, 4:29 p.m. UTC | #6
On Tue, Mar 20, 2018 at 05:21:16PM +0100, Florian Larysch wrote:
> On Tue, Mar 20, 2018 at 04:02:44PM +0000, Daniel P. Berrangé wrote:
> > No, it is unsafe - we rely on lockf() to get the mutual exclusion.
> > If a QEMU is running with pidfile locked, and its pid written into
> > it, then a 2nd QEMU comes along it will truncate the pidfile owned
> > by the original QEMU because the truncation happens before it has
> > tried to acquire the lock. The 2nd QEMU will still exit, but the
> > original QEMU's pid has now been lost.
> 
> That's correct, thanks for pointing it out.
> 
> > We must call ftruncate() after lockf(), but before writing the new
> > pid into the file. That ensures there is no window in which it is
> > possible to see the new & old pids mixed together.
> 
> I'll send a revised version doing exactly that.
> 
> From my reading of the Windows API documentation, this might not be a
> problem there: The file is opened with FILE_SHARE_READ, which prohibits
> opening the file in a writable mode and CREATE_ALWAYS will only recreate
> the file if it is writable.

I don't think that's correct either I'm afraid.

FILE_SHARE_READ indicates whether other processes are allowed to open
the file in read-only mode, while we have it open for write.

We're passing GENERIC_WRITE so QEMU has it open for write, thus your
change will again cause the 2nd QEMU to blindly replace the file
owned by the original QEMU.

That said, I think the existing pidfile code for win32 is totally broken
because QEMU is closing it immediately after writing its pid. So the only
mutual exclusion is taking place in the tiny time window while QEMU is
writing the pid.

Regards,
Daniel
diff mbox series

Patch

diff --git a/os-posix.c b/os-posix.c
index b9c2343b1e..9a6b874180 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -301,7 +301,7 @@  int qemu_create_pidfile(const char *filename)
     int len;
     int fd;
 
-    fd = qemu_open(filename, O_RDWR | O_CREAT, 0600);
+    fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC, 0600);
     if (fd == -1) {
         return -1;
     }
diff --git a/os-win32.c b/os-win32.c
index 586a7c7d49..85dbad7af8 100644
--- a/os-win32.c
+++ b/os-win32.c
@@ -108,7 +108,7 @@  int qemu_create_pidfile(const char *filename)
     memset(&overlap, 0, sizeof(overlap));
 
     file = CreateFile(filename, GENERIC_WRITE, FILE_SHARE_READ, NULL,
-                      OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
+                      CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
 
     if (file == INVALID_HANDLE_VALUE) {
         return -1;