Patchwork [02/10] Add buffered_file_internal constant

Submitter Juan Quintela
Date Nov. 23, 2010, 11:02 p.m.
Message ID <a12e21918775e0c0ae23c373be1e35060b52031b.1290552026.git.quintela@redhat.com>
Permalink /patch/72767/
State New

Comments

Juan Quintela - Nov. 23, 2010, 11:02 p.m.
From: Juan Quintela <quintela@trasno.org>

This is the interval at which the buffered_file timer ticks.

Signed-off-by: Juan Quintela <quintela@trasno.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |    6 ++++--
 buffered_file.h |    2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)
Michael S. Tsirkin - Nov. 24, 2010, 10:40 a.m.
On Wed, Nov 24, 2010 at 12:02:59AM +0100, Juan Quintela wrote:
> From: Juan Quintela <quintela@trasno.org>
> 
> This is the interval at which the buffered_file timer ticks.
> 
> Signed-off-by: Juan Quintela <quintela@trasno.org>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  buffered_file.c |    6 ++++--
>  buffered_file.h |    2 ++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/buffered_file.c b/buffered_file.c
> index 1836e7e..1f492e6 100644
> --- a/buffered_file.c
> +++ b/buffered_file.c
> @@ -20,6 +20,8 @@
> 
>  //#define DEBUG_BUFFERED_FILE
> 
> +const int buffered_file_interval = 100;
> +
>  typedef struct QEMUFileBuffered
>  {
>      BufferedPutFunc *put_buffer;
> @@ -235,7 +237,7 @@ static void buffered_rate_tick(void *opaque)
>          return;
>      }
> 
> -    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + 100);
> +    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + buffered_file_interval);
> 
>      if (s->freeze_output)
>          return;
> @@ -273,7 +275,7 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
> 
>      s->timer = qemu_new_timer(rt_clock, buffered_rate_tick, s);
> 
> -    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + 100);
> +    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + buffered_file_interval);
> 
>      return s->file;
>  }
> diff --git a/buffered_file.h b/buffered_file.h
> index 98d358b..a728316 100644
> --- a/buffered_file.h
> +++ b/buffered_file.h
> @@ -21,6 +21,8 @@ typedef void (BufferedPutReadyFunc)(void *opaque);
>  typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
>  typedef int (BufferedCloseFunc)(void *opaque);
> 
> +extern const int buffered_file_interval;
> +

This shouldn't be exported, IMO.

>  QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
>                                    BufferedPutFunc *put_buffer,
>                                    BufferedPutReadyFunc *put_ready,
> -- 
> 1.7.3.2
>
Juan Quintela - Nov. 24, 2010, 10:52 a.m.
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Nov 24, 2010 at 12:02:59AM +0100, Juan Quintela wrote:
>> From: Juan Quintela <quintela@trasno.org>

>> diff --git a/buffered_file.h b/buffered_file.h
>> index 98d358b..a728316 100644
>> --- a/buffered_file.h
>> +++ b/buffered_file.h
>> @@ -21,6 +21,8 @@ typedef void (BufferedPutReadyFunc)(void *opaque);
>>  typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
>>  typedef int (BufferedCloseFunc)(void *opaque);
>> 
>> +extern const int buffered_file_interval;
>> +
>
> This shouldn't be exported, IMO.

What do you want?  an accessor function?  Notice that it is a constant.
We need the value in other places, see the last patch.

Otherwise, I have to pick random numbers like ... 50ms.

Later, Juan.


>
>>  QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
>>                                    BufferedPutFunc *put_buffer,
>>                                    BufferedPutReadyFunc *put_ready,
>> -- 
>> 1.7.3.2
>>
Michael S. Tsirkin - Nov. 24, 2010, 11:04 a.m.
On Wed, Nov 24, 2010 at 11:52:25AM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Nov 24, 2010 at 12:02:59AM +0100, Juan Quintela wrote:
> >> From: Juan Quintela <quintela@trasno.org>
> 
> >> diff --git a/buffered_file.h b/buffered_file.h
> >> index 98d358b..a728316 100644
> >> --- a/buffered_file.h
> >> +++ b/buffered_file.h
> >> @@ -21,6 +21,8 @@ typedef void (BufferedPutReadyFunc)(void *opaque);
> >>  typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
> >>  typedef int (BufferedCloseFunc)(void *opaque);
> >> 
> >> +extern const int buffered_file_interval;
> >> +
> >
> > This shouldn't be exported, IMO.
> 
> What do you want?  an accessor function?

I mean an abstraction at qemu_file_ level.

>  Notice that it is a constant.
> We need the value in other places, see the last patch.
> 
> Otherwise, I have to pick random numbers like ... 50ms.
> 
> Later, Juan.

If you need a random number, use a random number :)

> 
> >
> >>  QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
> >>                                    BufferedPutFunc *put_buffer,
> >>                                    BufferedPutReadyFunc *put_ready,
> >> -- 
> >> 1.7.3.2
> >>
Juan Quintela - Nov. 24, 2010, 11:13 a.m.
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Nov 24, 2010 at 11:52:25AM +0100, Juan Quintela wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > On Wed, Nov 24, 2010 at 12:02:59AM +0100, Juan Quintela wrote:
>> >> From: Juan Quintela <quintela@trasno.org>
>> 
>> >> diff --git a/buffered_file.h b/buffered_file.h
>> >> index 98d358b..a728316 100644
>> >> --- a/buffered_file.h
>> >> +++ b/buffered_file.h
>> >> @@ -21,6 +21,8 @@ typedef void (BufferedPutReadyFunc)(void *opaque);
>> >>  typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
>> >>  typedef int (BufferedCloseFunc)(void *opaque);
>> >> 
>> >> +extern const int buffered_file_interval;
>> >> +
>> >
>> > This shouldn't be exported, IMO.
>> 
>> What do you want?  an accessor function?
>
> I mean an abstraction at qemu_file_ level.
>
>>  Notice that it is a constant.
>> We need the value in other places, see the last patch.
>> 
>> Otherwise, I have to pick random numbers like ... 50ms.
>> 
>> Later, Juan.
>
> If you need a random number, use a random number :)

It is not random, it is the biggest number that still lets the timer run
every 100ms O:-)

I tried 100ms and 75ms; they gave better throughput, but I still got
stalls of 1.5s.

I could do some search between 50 and 75ms, but the advantage in
bandwidth is not big enough to justify complicating things (150s vs 125s
total migration time or so, and the system is quite unresponsive with
the bigger value).

Later, Juan.
Michael S. Tsirkin - Nov. 24, 2010, 11:19 a.m.
On Wed, Nov 24, 2010 at 12:13:26PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Nov 24, 2010 at 11:52:25AM +0100, Juan Quintela wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> > On Wed, Nov 24, 2010 at 12:02:59AM +0100, Juan Quintela wrote:
> >> >> From: Juan Quintela <quintela@trasno.org>
> >> 
> >> >> diff --git a/buffered_file.h b/buffered_file.h
> >> >> index 98d358b..a728316 100644
> >> >> --- a/buffered_file.h
> >> >> +++ b/buffered_file.h
> >> >> @@ -21,6 +21,8 @@ typedef void (BufferedPutReadyFunc)(void *opaque);
> >> >>  typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
> >> >>  typedef int (BufferedCloseFunc)(void *opaque);
> >> >> 
> >> >> +extern const int buffered_file_interval;
> >> >> +
> >> >
> >> > This shouldn't be exported, IMO.
> >> 
> >> What do you want?  an accessor function?
> >
> > I mean an abstraction at qemu_file_ level.
> >
> >>  Notice that it is a constant.
> >> We need the value in other places, see the last patch.
> >> 
> >> Otherwise, I have to pick random numbers like ... 50ms.
> >> 
> >> Later, Juan.
> >
> > If you need a random number, use a random number :)
> 
> It is not random, it is the bigger number that lets the timer to run at
> each 100ms O:-)

The fact that buffered_file uses a timer internally is an implementation
detail: arch_init uses the qemu file abstraction.

> I tried 100ms and 75ms, they got better throughput, but I got still
> stalls of 1.5s.
> 
> I could do some search between 50 and 75ms, but the advantage in
> bandwidth is not so big to complicate things.  (150 vs 125s for total
> migration time or so, and system is quite irresponsive with the bigger
> value).
> 
> Later, Juan.
Anthony Liguori - Nov. 30, 2010, 1:59 a.m.
On 11/23/2010 05:02 PM, Juan Quintela wrote:
> From: Juan Quintela<quintela@trasno.org>
>
> This is the interval at which the buffered_file timer ticks.
>
> Signed-off-by: Juan Quintela<quintela@trasno.org>
> Signed-off-by: Juan Quintela<quintela@redhat.com>
> ---
>   buffered_file.c |    6 ++++--
>   buffered_file.h |    2 ++
>   2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/buffered_file.c b/buffered_file.c
> index 1836e7e..1f492e6 100644
> --- a/buffered_file.c
> +++ b/buffered_file.c
> @@ -20,6 +20,8 @@
>
>   //#define DEBUG_BUFFERED_FILE
>
> +const int buffered_file_interval = 100;
>    

This should not be a global property but instead be a property of the 
QEMUFileBuffered.

More importantly, it shouldn't be tick intervals.  We should use 
get_ticks_per_sec() to convert from a meaningful unit (like milliseconds).

The original code is bad in this regard so we should take the 
opportunity to make it suck a little less.

Regards,

Anthony Liguori

> +
>   typedef struct QEMUFileBuffered
>   {
>       BufferedPutFunc *put_buffer;
> @@ -235,7 +237,7 @@ static void buffered_rate_tick(void *opaque)
>           return;
>       }
>
> -    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + 100);
> +    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + buffered_file_interval);
>
>       if (s->freeze_output)
>           return;
> @@ -273,7 +275,7 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
>
>       s->timer = qemu_new_timer(rt_clock, buffered_rate_tick, s);
>
> -    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + 100);
> +    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + buffered_file_interval);
>
>       return s->file;
>   }
> diff --git a/buffered_file.h b/buffered_file.h
> index 98d358b..a728316 100644
> --- a/buffered_file.h
> +++ b/buffered_file.h
> @@ -21,6 +21,8 @@ typedef void (BufferedPutReadyFunc)(void *opaque);
>   typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
>   typedef int (BufferedCloseFunc)(void *opaque);
>
> +extern const int buffered_file_interval;
> +
>   QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
>                                     BufferedPutFunc *put_buffer,
>                                     BufferedPutReadyFunc *put_ready,
>
Anthony Liguori - Nov. 30, 2010, 2:19 a.m.
On 11/23/2010 05:02 PM, Juan Quintela wrote:
> From: Juan Quintela<quintela@trasno.org>
>
> This is the interval at which the buffered_file timer ticks.
>
> Signed-off-by: Juan Quintela<quintela@trasno.org>
> Signed-off-by: Juan Quintela<quintela@redhat.com>
>    

This patch is wrong.

The point of the tick is to establish a period during which only a 
certain amount of traffic is allowed to go through.  100 is used to 
indicate a period of 100ms even though a tick doesn't technically equal 
a millisecond (although in practice it happens to be close).

But the crucial part of making this work is scaling the bandwidth 
computation to the right period length.  That is, the following bit of code:

     s->xfer_limit = bytes_per_sec / 10;
     s->put_buffer = put_buffer;
     s->put_ready = put_ready;

To make this generic, you'd need to add:

   s->xfer_limit = bytes_per_sec / (get_ticks_per_sec() / buffered_file_interval);

Regards,

Anthony Liguori

> ---
>   buffered_file.c |    6 ++++--
>   buffered_file.h |    2 ++
>   2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/buffered_file.c b/buffered_file.c
> index 1836e7e..1f492e6 100644
> --- a/buffered_file.c
> +++ b/buffered_file.c
> @@ -20,6 +20,8 @@
>
>   //#define DEBUG_BUFFERED_FILE
>
> +const int buffered_file_interval = 100;
> +
>   typedef struct QEMUFileBuffered
>   {
>       BufferedPutFunc *put_buffer;
> @@ -235,7 +237,7 @@ static void buffered_rate_tick(void *opaque)
>           return;
>       }
>
> -    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + 100);
> +    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + buffered_file_interval);
>
>       if (s->freeze_output)
>           return;
> @@ -273,7 +275,7 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
>
>       s->timer = qemu_new_timer(rt_clock, buffered_rate_tick, s);
>
> -    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + 100);
> +    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + buffered_file_interval);
>
>       return s->file;
>   }
> diff --git a/buffered_file.h b/buffered_file.h
> index 98d358b..a728316 100644
> --- a/buffered_file.h
> +++ b/buffered_file.h
> @@ -21,6 +21,8 @@ typedef void (BufferedPutReadyFunc)(void *opaque);
>   typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
>   typedef int (BufferedCloseFunc)(void *opaque);
>
> +extern const int buffered_file_interval;
> +
>   QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
>                                     BufferedPutFunc *put_buffer,
>                                     BufferedPutReadyFunc *put_ready,
>
Anthony Liguori - Nov. 30, 2010, 2:23 a.m.
On 11/24/2010 04:52 AM, Juan Quintela wrote:
> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>    
>> On Wed, Nov 24, 2010 at 12:02:59AM +0100, Juan Quintela wrote:
>>      
>>> From: Juan Quintela<quintela@trasno.org>
>>>        
>    
>>> diff --git a/buffered_file.h b/buffered_file.h
>>> index 98d358b..a728316 100644
>>> --- a/buffered_file.h
>>> +++ b/buffered_file.h
>>> @@ -21,6 +21,8 @@ typedef void (BufferedPutReadyFunc)(void *opaque);
>>>   typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
>>>   typedef int (BufferedCloseFunc)(void *opaque);
>>>
>>> +extern const int buffered_file_interval;
>>> +
>>>        
>> This shouldn't be exported, IMO.
>>      
> What do you want?  an accessor function?  Notice that it is a constant.
> We need the value in other places, see the last patch.
>    

That one's just as wrong as this one.  TBH, this whole series is 
fundamentally the wrong approach because it's all ad hoc heuristics 
benchmarked against one workload.

There are three fundamental problems: 1) kvm.ko dirty bit tracking 
doesn't scale 2) we lose flow control information because of the 
multiple levels of buffering which means we move more data than we 
should move 3) migration prevents a guest from executing the device 
model because of qemu_mutex.

Those are the problems to fix.  Sprinkling the code with returns in 
semi-random places because it benchmarked well for one particular test 
case is something we'll deeply regret down the road.

Regards,

Anthony Liguori

> Otherwise, I have to pick random numbers like ... 50ms.
>
> Later, Juan.
>
>
>    
>>      
>>>   QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
>>>                                     BufferedPutFunc *put_buffer,
>>>                                     BufferedPutReadyFunc *put_ready,
>>> -- 
>>> 1.7.3.2
>>>
>>>        
>
Juan Quintela - Nov. 30, 2010, 11:56 a.m.
Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/24/2010 04:52 AM, Juan Quintela wrote:
>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>>    
>>> On Wed, Nov 24, 2010 at 12:02:59AM +0100, Juan Quintela wrote:
>>>      
>>>> From: Juan Quintela<quintela@trasno.org>
>>>>        
>>    
>>>> diff --git a/buffered_file.h b/buffered_file.h
>>>> index 98d358b..a728316 100644
>>>> --- a/buffered_file.h
>>>> +++ b/buffered_file.h
>>>> @@ -21,6 +21,8 @@ typedef void (BufferedPutReadyFunc)(void *opaque);
>>>>   typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
>>>>   typedef int (BufferedCloseFunc)(void *opaque);
>>>>
>>>> +extern const int buffered_file_interval;
>>>> +
>>>>        
>>> This shouldn't be exported, IMO.
>>>      
>> What do you want?  an accessor function?  Notice that it is a constant.
>> We need the value in other places, see the last patch.
>>    
>
> That one's just as wrong as this one.  TBH, this whole series is
> fundamentally the wrong approach because it's all ad hoc heuristics
> benchmarked against one workload.

No, I benchmarked against two workloads:
a- an idle guest (because it was faster to test)
b- a busy guest (each test takes forever, which is why I tested it last).

So, I don't agree with that.

> There are three fundamental problems: 1) kvm.ko dirty bit tracking
> doesn't scale

Fully agree, but this patch doesn't address that.

> 2) we lose flow control information because of the
> multiple levels of buffering which means we move more data than we
> should move

Fully agree here, but fixing it correctly is a massive change.

> 3) migration prevents a guest from executing the device
> model because of qemu_mutex.

This is a different problem.

> Those are the problems to fix.

This still doesn't fix the stalls in the main_loop.

So, you are telling me there is this list of problems that need to be
fixed.  They are not enough to fix the problem, and they imply massive
changes.

In the meantime, everybody on stable and 0.14 is not going to be able
to use migration with guests bigger than 2GB/4GB.

>  Sprinkling the code with returns in
> semi-random places because it benchmarked well for one particular test
> case is something we'll deeply regret down the road.

This was mean :(

There are two returns and one heuristic.

- return a) we try to migrate when we know that there is no space,
  an obvious optimization/bug (depends on how you look at it).

- return b) we don't need to handle the TLB bitmap code for kvm.  I fully
  agree that we need to split the bitmaps into something more sensible,
  but the change is quite invasive, and the simple fix works for now.

- heuristic:  if you really think that an io_handler should be able to
  stall the main loop for almost 4 seconds, sorry, I don't agree.
  Again, we can fix this much better, but it needs lots of changes:
   * I asked you if there is a "maximum" time that one io_handler can
     hog the main loop.  Answer: dunno/depends.
   * Long term: migration should be its own thread.  Have I mentioned
     threads?  This is qemu, no threads.
   * qemu_mutex: this is the holy grail; if we drop qemu_mutex in
     ram_save_live(), we let the guest CPUs continue running.
     But, and this is a big but, we still stall the main_loop for 4
     seconds, so this solution is at best partial.

And that is it.  To improve things, I receive complaints left and right
that exporting buffered_file_period/timeout is BAD, very BAD, because
it exposes buffered_file.c internal state.

But I am given the suggestion that I should just create a
buffered_file_write_nop() that just increases the number of bytes
transferred for normal pages.  I agree that this "could" also work, but
it is actually worse in the sense that we are exposing yet more
internals of buffered_file.c.  In the first case, we only expose the
period of a timer; in the second, we are hijacking how
qemu_file_rate_limit() works + how we account for written memory + how
it calculates the amount of stuff it has sent, and with this we
"knowingly" lie about how much stuff we have written.

How this can be "cleaner" than a timeout of 50ms is beyond me.

On the other hand, this is the typical case where you need a lot of
testing for each change you make.  I thought several times that I had
found the "guilty" bit for the stalls, and no luck, there was yet
another problem.

I also thought at points, OK, I can now drop the previous attempts, and
no, that didn't work either.  This was the minimal set of patches able
to fix the stalls.

This is just very depressing.

Later, Juan.
Anthony Liguori - Nov. 30, 2010, 2:02 p.m.
On 11/30/2010 05:56 AM, Juan Quintela wrote:
> No, I benchmarked against two workloads:
> a- idle guest (because it was faster to test)
> b- busy guest (each test takes forever, that is the reason that I tested
> last).
>
> So, I don't agree with that.
>    

But in both cases, it's a large memory guest where the RSS size is far 
smaller than the allocated memory size.  This is simply an unrealistic 
scenario.  Migrating immediately after launch may be common among 
developers, but users typically run useful workloads in their guests and 
run migration after the guest has been running for quite some time.

So the scenario I'm much more interested in is trying to migrate a 
400GB guest after the guest has been doing useful work and has brought 
in most of its RSS.  Also, with a box that big, you're going to be on 
10gbit.

That's going to introduce a whole other set of problems that this series 
is potentially just going to exacerbate even more.

>> There are three fundamental problems: 1) kvm.ko dirty bit tracking
>> doesn't scale
>>      
> Fully agree, but this patch don't took that.
>
>    
>> 2) we lose flow control information because of the
>> multiple levels of buffering which means we move more data than we
>> should move
>>      
> Fully agree here, but this is a "massive change" to fix it correctly.
>    

It's really not massive.

>> 3) migration prevents a guest from executing the device
>> model because of qemu_mutex.
>>      
> This is a different problem.
>    

No, this is really the fundamental one.

>> Those are the problems to fix.
>>      
> This still don't fix the stalls on the main_loop.
>
> So, you are telling me, there are this list of problems that you need to
> fix.  They are not enough to fix the problem, and their imply massive
> changes.
>
> In the middle time, everybody in stable and 0.14 is not going to be able
> to use migration with more than 2GB/4GB guest.
>    

That's simply not true.  You started this series with a statement that 
migration is broken.  It's not; it works perfectly fine.  Migration with 
8GB guests works perfectly fine.

You've identified a corner case for which we have suboptimal behavior, 
and are now declaring that migration is "totally broken".

>>   Sprinkling the code with returns in
>> semi-random places because it benchmarked well for one particular test
>> case is something we'll deeply regret down the road.
>>      
> This was mean :(
>    

It wasn't intended to be mean, but it is the truth.  We need to approach 
these sorts of problems in a more systematic way.  Systematic means 
identifying what the fundamental problems are and fixing them in a 
proper way.

Throwing a magic number in the iteration path after which we let the 
main loop run for a little bit is simply not a solution.  You're still 
going to get main loop starvation.

> There are two returns and one heuristic.
>
> - return a) we try to migrate when we know that there is no space,
>    obvious optimazation/bug (deppends on how to look at it).
>
> - return b) we don't need to handle TLB bitmap code for kvm.  I fully
>    agree that we need to split the bitmaps in something more sensible,
>    but change is quite invasible, and simple fix works for the while.
>
> - heuristic:  if you really think that an io_handler should be able to
>    stall the main loop for almost 4 seconds, sorry, I don't agree.
>    

But fundamentally, why would an iteration of migration take 4 seconds?  
This is really my fundamental objection: the migration loop should, at most, 
take as much time as it takes to fill up an empty socket buffer or until 
it hits the bandwidth limit.

The bandwidth limit offers fine-grained control over exactly how long it 
should take.

If we're burning excess CPU walking a 100MB bitmap, then let's fix that 
problem.  Stopping every 1MB worth of the bitmap to do other work just 
papers over the real problem (that we're walking 100MB bitmap).

>    Again, we can fix this much better, but it needs lots of changes:
>     * I asked you if there is a "maximum" value that one io_handler can
>       hog the main loop.  Answer: dunno/deppends.
>     * Long term: migration should be its own thread, have I told thread?
>       This is qemu, no thread.
>     * qemu_mutex: this is the holy grail, if we drop qemu_mutex in
>       ram_save_live(), we left the guest cpu's to continue running.
>       But, and this is a big but, we still stuck the main_loop for 4
>       seconds, so this solution is at least partial.
>    

It really shouldn't be that hard at all to add a multi-level tree to 
kvm.ko and it fixes a bunch of problems at once.  Likewise, accounting 
zero pages is very simple and it makes bandwidth limiting more effective.

> And that is it.  To improve things, I receive complains left and right
> that exporting buffered_file_period/timeout is BAD, very BAD.  Because
> that exposes buffered_file.c internal state.
>
> But I am given the suggestion that I should just create a
> buffered_file_write_nop() that just increases the number of bytes
> transferred for normal pages.  I agree that this "could" also work, but
> that is indeed worse in the sense that we are exposing yet more
> internals of buffered_file.c.  In the 1st case, we are only exposing the
> periof of a timer, in the second, we are hijaking how
> qemu_file_rate_limit() works + how we account for written memory + how
> it calculates the ammount of staff that it have sent, and with this we
> "knownly" lie about how much stuff we have write.
>
> How this can be "cleaner" than a timeout of 50ms is beyond me.
>    

If the migration loop is running for 50ms than we've already failed if 
the user requests a 30ms downtime for migration.  The 50ms timeout means 
that a VCPU can be blocked for 50ms at a time.

Regards,

Anthony Liguori

> On the other hand, this is the typical case where you need a lot of
> testing for each change that you did.  I thought several times that I
> had found the "guilty" bit for the stalls, and no luck, there was yet
> another problem.
>
> I also thought at points, ok, I can now drop the previous attempts, and
> no, that didn't work either.  This was the minimal set of patches able
> to fix the stalls.
>
> This is just very depressing.
>
> Later, Juan.
>
Michael S. Tsirkin - Nov. 30, 2010, 2:11 p.m.
On Tue, Nov 30, 2010 at 08:02:56AM -0600, Anthony Liguori wrote:
> If we're burning excess CPU walking a 100MB bitmap, then let's fix
> that problem.  Stopping every 1MB worth of the bitmap to do other
> work just papers over the real problem (that we're walking 100MB
> bitmap).

Just using a bit per page will already make the problem 8x smaller,
won't it?  Is that enough to avoid the need for a new kernel interface?
Anthony Liguori - Nov. 30, 2010, 2:22 p.m.
On 11/30/2010 08:11 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 30, 2010 at 08:02:56AM -0600, Anthony Liguori wrote:
>    
>> If we're burning excess CPU walking a 100MB bitmap, then let's fix
>> that problem.  Stopping every 1MB worth of the bitmap to do other
>> work just papers over the real problem (that we're walking 100MB
>> bitmap).
>>      
> Just using a bit per page will already make the problem 8x smaller,
> won't it?  Is that enough to avoid the need for a new kernel interface?
>    

I don't know, but it's definitely a good thing to try.

Regards,

Anthony Liguori
Juan Quintela - Nov. 30, 2010, 3:40 p.m.
Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/30/2010 05:56 AM, Juan Quintela wrote:
>> No, I benchmarked against two workloads:
>> a- idle guest (because it was faster to test)
>> b- busy guest (each test takes forever, that is the reason that I tested
>> last).
>>
>> So, I don't agree with that.
>>    
>
> But in both cases, it's a large memory guest where the RSS size is <<<
> than the allocated memory size.  This is simply an unrealistic
> scenario.  Migrating immediately after launch may be common among
> developers but users typically run useful workloads in their guests
> and run migration after the guest has been running for quite some
> time.
>
> So the scenario I'm much more interested in, is trying to migrate a
> 400GB guest after the guest has been doing useful work and has brought
> in most of it's RSS.  Also with a box that big, you're going to be on
> 10gbit.
>
> That's going to introduce a whole other set of problems that this
> series is potentially just going to exacerbate even more.
>
>>> There are three fundamental problems: 1) kvm.ko dirty bit tracking
>>> doesn't scale
>>>      
>> Fully agree, but this patch don't took that.
>>
>>    
>>> 2) we lose flow control information because of the
>>> multiple levels of buffering which means we move more data than we
>>> should move
>>>      
>> Fully agree here, but this is a "massive change" to fix it correctly.
>>    
>
> It's really not massive.
>
>>> 3) migration prevents a guest from executing the device
>>> model because of qemu_mutex.
>>>      
>> This is a different problem.
>>    
>
> No, this is really the fundamental one.

We have more problems in the main_loop.

>>> Those are the problems to fix.
>>>      
>> This still don't fix the stalls on the main_loop.
>>
>> So, you are telling me, there are this list of problems that you need to
>> fix.  They are not enough to fix the problem, and their imply massive
>> changes.
>>
>> In the middle time, everybody in stable and 0.14 is not going to be able
>> to use migration with more than 2GB/4GB guest.
>>    
>
> That's simply not true.  You started this series with a statement that
> migration is broken.

Well, it depends who you ask.

>   It's not, it works perfectly fine.  Migration
> with 8GB guests work perfectly fine.

It doesn't work perfectly fine.  If you have a migrate_maximum_downtime of
30ms and we are stuck for 1s several times, it is broken in my book.
This was with an idle guest.

Now move to having a loaded guest (more than we can migrate), and we
get stalls of almost a minute where nothing happens.

That is a "very strange" definition of "it works perfectly".  In my book
it is nearer to "it is completely broken".

> You've identified a corner case for which we have suboptimal behavior,
> and are now declaring that migration is "totally broken".

I think it is not a corner case, but it depends on what your "normal"
case is.

>>>   Sprinkling the code with returns in
>>> semi-random places because it benchmarked well for one particular test
>>> case is something we'll deeply regret down the road.
>>>      
>> This was mean :(
>>    
>
> It wasn't intended to be mean but it is the truth.  We need to
> approach these sort of problems in a more systematic way.  Systematic
> means identifying what the fundamental problems are and fixing them in
> a proper way.

It is not a corner case; it is "always" the case once we have enough
memory.  Basically our bitmap handling code is "exponential" in memory
size, so the bigger the amount of memory, the bigger the problems that
appear.

> Throwing a magic number in the iteration path after which we let the
> main loop run for a little bit is simply not a solution.  You're still
> going to get main loop starvation.

That depends on how you approach the problem.  We have io_handlers
that take too much time (for various definitions of "too much").

>> There are two returns and one heuristic.
>>
>> - return a) we try to migrate when we know that there is no space,
>>    obvious optimazation/bug (deppends on how to look at it).
>>
>> - return b) we don't need to handle TLB bitmap code for kvm.  I fully
>>    agree that we need to split the bitmaps in something more sensible,
>>    but change is quite invasible, and simple fix works for the while.
>>
>> - heuristic:  if you really think that an io_handler should be able to
>>    stall the main loop for almost 4 seconds, sorry, I don't agree.
>>    
>
> But fundamentally, why would an iteration of migration take 4 seconds?

That is a good question.  It shouldn't.

> This is really my fundamental object, the migration loop should, at
> most take as much time as it takes to fill up an empty socket buffer
> or until it hits the bandwidth limit.

The bandwidth limit is not hit with zero pages.

> The bandwidth limit offers a fine-grain control over exactly how long
> it should take.

Not if we are not sending data.

> If we're burning excess CPU walking a 100MB bitmap, then let's fix
> that problem.  Stopping every 1MB worth of the bitmap to do other work
> just papers over the real problem (that we're walking 100MB bitmap).

Agreed.

OK.  I am going to state this another way.  We have at least these
problems:
- qemu/kvm bitmap handling
- io_handlers taking too much time
- the interface with kvm
- bandwidth calculation
- zero page optimizations
- qemu_mutex handling.

If I send a patch that fixes only "some" of those, I am going to be
asked for numbers (because the problem would still be there).

Are you telling me that if I send a patch that fixes one of the
individual problems, but the user still sees the same symptom
(due to the other ones), it is going to be accepted?

I tried to get the minimal set of patches that fixes the "real" problem:
i.e. stalls.  Half of them are obvious and you agreed to accept them.

But accepting them, we haven't fixed the problem.  So, what is the next
step?
- getting the problems fixed one by one: would those patches get
  integrated?  Notice that the stalls/numbers will not improve until the
  last problem is fixed.
- or does nothing get integrated until everything is fixed?

My idea was to go the other way around: get the minimal fixes that are
needed to fix the real problem, and then change the implementation of
the things that I listed above.  Why?  Because this way we know that the
problem is fixed (maybe not with the most elegant solution, but fixed),
and we can verify that we are not getting the problem back with each
change.

Testing each change in isolation doesn't allow us to verify that we are
not re-introducing the problem.

Later, Juan.
Michael S. Tsirkin - Nov. 30, 2010, 4:10 p.m.
On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:
> Basically our bitmap handling code is "exponential" on memory size,

I didn't realize this. What makes it exponential?
Juan Quintela - Nov. 30, 2010, 4:32 p.m.
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:
>> Basically our bitmap handling code is "exponential" on memory size,
>
> I didn't realize this. What makes it exponential?

Well, first of all, it is "exponential" when you measure it.

stalls by default are:

1-2GB: milliseconds
2-4GB: 100-200ms
4-8GB: 1s
64GB: 59s
400GB: 24m (yes, minutes)

That sounds really exponential.

Now the other thing is the cache size.

With 64GB of RAM we basically have a 16MB bitmap, i.e. we blow the
cache each time we run ram_save_live().

This is one of the reasons why I don't want to walk the bitmap on
ram_save_remaining().

ram_save_remaining() is linear in memory size (previous to my changes).
ram_save_live() is also linear in memory size, so we are in a worst
case of n^n (notice that this is the worst case; we normally do much
better, n^2, n^3 or so).

Add to this that we are blowing the caches with big amounts of memory
(we don't with smaller sizes), and you can see that our behaviour is
clearly exponential.

As I need to fix them, I will work on them today/tomorrow.

Later, Juan.
Anthony Liguori - Nov. 30, 2010, 4:44 p.m.
On 11/30/2010 10:32 AM, Juan Quintela wrote:
> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>    
>> On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:
>>      
>>> Basically our bitmap handling code is "exponential" on memory size,
>>>        
>> I didn't realize this. What makes it exponential?
>>      
> Well, 1st of all, it is "exponential" as you measure it.
>
> stalls by default are:
>
> 1-2GB: milliseconds
> 2-4GB: 100-200ms
> 4-8GB: 1s
> 64GB: 59s
> 400GB: 24m (yes, minutes)
>
> That sounds really exponential.
>    

How are you measuring stalls btw?

Regards,

Anthony Liguori

Juan Quintela - Nov. 30, 2010, 6:04 p.m.
Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/30/2010 10:32 AM, Juan Quintela wrote:
>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>>    
>>> On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:
>>>      
>>>> Basically our bitmap handling code is "exponential" on memory size,
>>>>        
>>> I didn't realize this. What makes it exponential?
>>>      
>> Well, 1st of all, it is "exponential" as you measure it.
>>
>> stalls by default are:
>>
>> 1-2GB: milliseconds
>> 2-4GB: 100-200ms
>> 4-8GB: 1s
>> 64GB: 59s
>> 400GB: 24m (yes, minutes)
>>
>> That sounds really exponential.
>>    
>
> How are you measuring stalls btw?

At the end of ram_save_live().  This was the reason that I put the
information there.

For the 24min stall (I don't have that machine anymore) I had less
"exact" measurements.  It was the amount that it "decided" to send in
the last non-live part of the memory migration.  With the stall & zero
page accounting, we just got to the point where we had basically
infinite speed.

Later, Juan.

Anthony Liguori - Nov. 30, 2010, 6:54 p.m.
On 11/30/2010 12:04 PM, Juan Quintela wrote:
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>    
>> On 11/30/2010 10:32 AM, Juan Quintela wrote:
>>      
>>> "Michael S. Tsirkin"<mst@redhat.com>   wrote:
>>>
>>>        
>>>> On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:
>>>>
>>>>          
>>>>> Basically our bitmap handling code is "exponential" on memory size,
>>>>>
>>>>>            
>>>> I didn't realize this. What makes it exponential?
>>>>
>>>>          
>>> Well, 1st of all, it is "exponential" as you measure it.
>>>
>>> stalls by default are:
>>>
>>> 1-2GB: milliseconds
>>> 2-4GB: 100-200ms
>>> 4-8GB: 1s
>>> 64GB: 59s
>>> 400GB: 24m (yes, minutes)
>>>
>>> That sounds really exponential.
>>>
>>>        
>> How are you measuring stalls btw?
>>      
> At the end of the ram_save_live().  This was the reason that I put the
> information there.
>
> for the 24mins stall (I don't have that machine anymore) I had less
> "exact" measurements.  It was the amount that it "decided" to sent in
> the last non-live part of memory migration.  With the stalls&  zero page
> account, we just got to the point where we had basically infinity speed.
>    

That's not quite guest visible.

It is only a "stall" if the guest is trying to access device emulation
and acquire the qemu_mutex.  A more accurate measurement would be
something that measured guest availability.  For instance, a tight loop
of while (1) { usleep(100); gettimeofday(); } that then recorded periods
of unavailability > X.

Of course, it's critically important that a working version of pvclock
be available in the guest for this to be accurate.

Regards,

Anthony Liguori

Juan Quintela - Nov. 30, 2010, 7:15 p.m.
Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/30/2010 12:04 PM, Juan Quintela wrote:
>> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>>    
>>> On 11/30/2010 10:32 AM, Juan Quintela wrote:
>>>      
>>>> "Michael S. Tsirkin"<mst@redhat.com>   wrote:
>>>>
>>>>        
>>>>> On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:
>>>>>
>>>>>          
>>>>>> Basically our bitmap handling code is "exponential" on memory size,
>>>>>>
>>>>>>            
>>>>> I didn't realize this. What makes it exponential?
>>>>>
>>>>>          
>>>> Well, 1st of all, it is "exponential" as you measure it.
>>>>
>>>> stalls by default are:
>>>>
>>>> 1-2GB: milliseconds
>>>> 2-4GB: 100-200ms
>>>> 4-8GB: 1s
>>>> 64GB: 59s
>>>> 400GB: 24m (yes, minutes)
>>>>
>>>> That sounds really exponential.
>>>>
>>>>        
>>> How are you measuring stalls btw?
>>>      
>> At the end of the ram_save_live().  This was the reason that I put the
>> information there.
>>
>> for the 24mins stall (I don't have that machine anymore) I had less
>> "exact" measurements.  It was the amount that it "decided" to sent in
>> the last non-live part of memory migration.  With the stalls&  zero page
>> account, we just got to the point where we had basically infinity speed.
>>    
>
> That's not quite guest visible.

Humm, the guest doesn't answer for 24 minutes,
the monitor doesn't answer for 24 minutes,
and ping doesn't answer for 24 minutes.

Are you sure that this is not visible?  The bug report said the guest
had simply died; it was me who waited and saw that it took 24 minutes
to finish.

> It only is a "stall" if the guest is trying to access device emulation
> and acquiring the qemu_mutex.  A more accurate measurement would be
> something that measured guest availability.  For instance, I tight
> loop of while (1) { usleep(100); gettimeofday(); } that then recorded
> periods of unavailability > X.

This is better, and this is what the qemu_mutex change should fix.

> Of course, it's critically important that a working version of pvclock
> be available int he guest for this to be accurate.

If the problem is 24 minutes, we don't need such an "exact" version O:-)

Later, Juan.
Anthony Liguori - Nov. 30, 2010, 8:23 p.m.
On 11/30/2010 01:15 PM, Juan Quintela wrote:
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>    
>> On 11/30/2010 12:04 PM, Juan Quintela wrote:
>>      
>>> Anthony Liguori<anthony@codemonkey.ws>   wrote:
>>>
>>>        
>>>> On 11/30/2010 10:32 AM, Juan Quintela wrote:
>>>>
>>>>          
>>>>> "Michael S. Tsirkin"<mst@redhat.com>    wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> Basically our bitmap handling code is "exponential" on memory size,
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> I didn't realize this. What makes it exponential?
>>>>>>
>>>>>>
>>>>>>              
>>>>> Well, 1st of all, it is "exponential" as you measure it.
>>>>>
>>>>> stalls by default are:
>>>>>
>>>>> 1-2GB: milliseconds
>>>>> 2-4GB: 100-200ms
>>>>> 4-8GB: 1s
>>>>> 64GB: 59s
>>>>> 400GB: 24m (yes, minutes)
>>>>>
>>>>> That sounds really exponential.
>>>>>
>>>>>
>>>>>            
>>>> How are you measuring stalls btw?
>>>>
>>>>          
>>> At the end of the ram_save_live().  This was the reason that I put the
>>> information there.
>>>
>>> for the 24mins stall (I don't have that machine anymore) I had less
>>> "exact" measurements.  It was the amount that it "decided" to sent in
>>> the last non-live part of memory migration.  With the stalls&   zero page
>>> account, we just got to the point where we had basically infinity speed.
>>>
>>>        
>> That's not quite guest visible.
>>      
> Humm, guest don't answer in 24mins
> monitor don't answer in 24mins
> ping don't answer in 24mins
>
> are you sure that this is not visible?  Bug report put that guest had
> just died, it was me who waited to see that it took 24mins to end.
>    

I'm extremely sceptical that any of your patches would address this 
problem.  Even if you had to scan every page in a 400GB guest, it would 
not take 24 minutes.   Something is not quite right here.

24 minutes suggests that there's another problem that is yet to be 
identified.

Regards,

Anthony Liguori

Juan Quintela - Nov. 30, 2010, 8:56 p.m.
Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/30/2010 01:15 PM, Juan Quintela wrote:

>>>> At the end of the ram_save_live().  This was the reason that I put the
>>>> information there.
>>>>
>>>> for the 24mins stall (I don't have that machine anymore) I had less
>>>> "exact" measurements.  It was the amount that it "decided" to sent in
>>>> the last non-live part of memory migration.  With the stalls&   zero page
>>>> account, we just got to the point where we had basically infinity speed.
>>>>
>>>>        
>>> That's not quite guest visible.
>>>      
>> Humm, guest don't answer in 24mins
>> monitor don't answer in 24mins
>> ping don't answer in 24mins
>>
>> are you sure that this is not visible?  Bug report put that guest had
>> just died, it was me who waited to see that it took 24mins to end.
>>    
>
> I'm extremely sceptical that any of your patches would address this
> problem.  Even if you had to scan every page in a 400GB guest, it
> would not take 24 minutes.   Something is not quite right here.
>
> 24 minutes suggests that there's another problem that is yet to be
> identified.

I haven't tested the latest versions of the patches sent to the list,
but a previous version fixed that problem.

If I get my hands on the machine I will try to reproduce the problem
and measure things.

Later, Juan.

Patch

diff --git a/buffered_file.c b/buffered_file.c
index 1836e7e..1f492e6 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -20,6 +20,8 @@ 

 //#define DEBUG_BUFFERED_FILE

+const int buffered_file_interval = 100;
+
 typedef struct QEMUFileBuffered
 {
     BufferedPutFunc *put_buffer;
@@ -235,7 +237,7 @@  static void buffered_rate_tick(void *opaque)
         return;
     }

-    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + 100);
+    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + buffered_file_interval);

     if (s->freeze_output)
         return;
@@ -273,7 +275,7 @@  QEMUFile *qemu_fopen_ops_buffered(void *opaque,

     s->timer = qemu_new_timer(rt_clock, buffered_rate_tick, s);

-    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + 100);
+    qemu_mod_timer(s->timer, qemu_get_clock(rt_clock) + buffered_file_interval);

     return s->file;
 }
diff --git a/buffered_file.h b/buffered_file.h
index 98d358b..a728316 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -21,6 +21,8 @@  typedef void (BufferedPutReadyFunc)(void *opaque);
 typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
 typedef int (BufferedCloseFunc)(void *opaque);

+extern const int buffered_file_interval;
+
 QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
                                   BufferedPutFunc *put_buffer,
                                   BufferedPutReadyFunc *put_ready,