diff mbox

[v3,1/4] serial: reset thri_pending on IER writes with THRI=0

Message ID 1418388243-1886-2-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Dec. 12, 2014, 12:44 p.m. UTC
This is responsible for failure of migration from 2.2 to 2.1, because
thr_ipending is always one in practice.

serial.c is setting thr_ipending unconditionally.  However, thr_ipending
is not used at all if THRI=0, and it will be overwritten again the next
time THRE or THRI changes.  For that reason, we can set thr_ipending to
zero every time THRI is reset.

This has no semantic change and is enough to fix migration.  It does not
change the migration format, so 2.2.0 -> 2.1 will remain broken but we
can fix 2.2.1 -> 2.1 without breaking 2.2.1 <-> 2.2.0.

There is disagreement on whether LSR.THRE should be resampled when IER.THRI
goes from 1 to 1.  Do not touch the code for now.

Cc: qemu-stable@nongnu.org
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/char/serial.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Dr. David Alan Gilbert Dec. 15, 2014, 10:46 a.m. UTC | #1
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> This is responsible for failure of migration from 2.2 to 2.1, because
> thr_ipending is always one in practice.
> 
> serial.c is setting thr_ipending unconditionally.  However, thr_ipending
> is not used at all if THRI=0, and it will be overwritten again the next
> time THRE or THRI changes.  For that reason, we can set thr_ipending to
> zero every time THRI is reset.
> 
> This has no semantic change and is enough to fix migration.  It does not
> change the migration format, so 2.2.0 -> 2.1 will remain broken but we
> can fix 2.2.1 -> 2.1 without breaking 2.2.1 <-> 2.2.0.

I can see this causes the thr_ipending to be 0 more of the time, but I don't
see why it's sufficient.

If the transmitter has just transmitted it's last byte, then thr_ipending is set
true by serial_xmit, however if there is a higher priority interrupt then
serial_update_irq would set tmp_iir to something other than THRI,
so I think serial_thr_ipending_needed  would return true and write the
subsection.

Dave

> There is disagreement on whether LSR.THRE should be resampled when IER.THRI
> goes from 1 to 1.  Do not touch the code for now.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/char/serial.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/char/serial.c b/hw/char/serial.c
> index ebcacdc..8c42d03 100644
> --- a/hw/char/serial.c
> +++ b/hw/char/serial.c
> @@ -350,10 +350,24 @@ static void serial_ioport_write(void *opaque, hwaddr addr, uint64_t val,
>                       s->poll_msl = 0;
>                  }
>              }
> -            if (s->lsr & UART_LSR_THRE) {
> +
> +            /* Turning on the THRE interrupt on IER can trigger the interrupt
> +             * if LSR.THRE=1, even if it had been masked before by reading IIR.
> +             * This is not in the datasheet, but Windows relies on it.  It is
> +             * unclear if THRE has to be resampled every time THRI becomes
> +             * 1, or only on the rising edge.  Bochs does the latter, and Windows
> +             * always toggles IER to all zeroes and back to all ones.  But for
> +             * now leave it as it has always been in QEMU.
> +             *
> +             * If IER.THRI is zero, thr_ipending is not used.  Set it to zero
> +             * so that the thr_ipending subsection is not migrated.
> +             */
> +            if ((s->ier & UART_IER_THRI) && (s->lsr & UART_LSR_THRE)) {
>                  s->thr_ipending = 1;
> -                serial_update_irq(s);
> +            } else {
> +                s->thr_ipending = 0;
>              }
> +            serial_update_irq(s);
>          }
>          break;
>      case 2:
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Paolo Bonzini Dec. 15, 2014, 11:51 a.m. UTC | #2
On 15/12/2014 11:46, Dr. David Alan Gilbert wrote:
> I can see this causes the thr_ipending to be 0 more of the time, but I don't
> see why it's sufficient.
> 
> If the transmitter has just transmitted it's last byte, then thr_ipending is set
> true by serial_xmit, however if there is a higher priority interrupt then
> serial_update_irq would set tmp_iir to something other than THRI,
> so I think serial_thr_ipending_needed  would return true and write the
> subsection.

It's not sufficient in all cases, and it cannot be.  If we could always
reconstruct the state, the subsection would not be necessary at all.
Subsections can be used for two things:

- stuff that was not supported in the previous release, and thus would
always be zero if you can start up the machine on the older release
(with old machine type, no new devices or properties or CPU models).
This is easy, but unfortunately not what this patch does.

- stuff that was migrated wrong in the previous release.  Here, the game
to play with the "needed" and "post_load" functions is to find a
good-enough approximation of missing state from migrated state.  This is
what the thr_ipending subsection does: thr_ipending is approximated by
"is the highest-priority active interrupt the THRI interrupt?"

So, in the case you mentioned 2.1->2.1 migration would have
malfunctioned.  The THRI would have been dropped after acknowledging the
higher priority interrupt, and probably the guest driver would have hung
(at least in the case of Windows).  So, in this case 2.2->2.1 migration
should fail, which the subsection achieves.

There are other examples of this in IDE, where PIO state was not
migrated.  Very early migration (during BIOS) would then hung the VM.
The fix is to break backwards migration in this case, unfortunately you
can't have it both ways. :(

Paolo
Dr. David Alan Gilbert Dec. 15, 2014, 1:30 p.m. UTC | #3
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 15/12/2014 11:46, Dr. David Alan Gilbert wrote:
> > I can see this causes the thr_ipending to be 0 more of the time, but I don't
> > see why it's sufficient.
> > 
> > If the transmitter has just transmitted it's last byte, then thr_ipending is set
> > true by serial_xmit, however if there is a higher priority interrupt then
> > serial_update_irq would set tmp_iir to something other than THRI,
> > so I think serial_thr_ipending_needed  would return true and write the
> > subsection.
> 
> It's not sufficient in all cases, and it cannot be.  If we could always
> reconstruct the state, the subsection would not be necessary at all.
> Subsections can be used for two things:

> 
> - stuff that was not supported in the previous release, and thus would
> always be zero if you can start up the machine on the older release
> (with old machine type, no new devices or properties or CPU models).
> This is easy, but unfortunately not what this patch does.
> 
> - stuff that was migrated wrong in the previous release.  Here, the game
> to play with the "needed" and "post_load" functions is to find a
> good-enough approximation of missing state from migrated state.  This is
> what the thr_ipending subsection does: thr_ipending is approximated by
> "is the highest-priority active interrupt the THRI interrupt?"
> 
> So, in the case you mentioned 2.1->2.1 migration would have
> malfunctioned.  The THRI would have been dropped after acknowledging the
> higher priority interrupt, and probably the guest driver would have hung
> (at least in the case of Windows).  So, in this case 2.2->2.1 migration
> should fail, which the subsection achieves.
> 
> There are other examples of this in IDE, where PIO state was not
> migrated.  Very early migration (during BIOS) would then hung the VM.
> The fix is to break backwards migration in this case, unfortunately you
> can't have it both ways. :(

I think the situation for different versions is:

a   pre-2.2   Guest could lose an interrupt, may hang serial transmit
b   2.2       2.2->2.1 fails migration most of the time
c             2.1->2.2 better chance than older since it tries to
                       reconstruct thr_ipending from iir state, but
                       can still lose interrupt
d             2.2->2.2 works
e your patch  2.2->2.1 still might fail migration if unlucky not common
f             2.1->2.2 as (c)
g             2.2<->patch works
h             patch<->patch works

Anyone who really cares about backwards migration compatibility would
probably have to guard the subsection with the machine-type to avoid
(e) ever happening (the heuristic from (c) might be useful to add).

So since (e) is an improvement:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

although if you do have to recut it, please clarify the text that says
 'and is enough to fix migration.' since it doesn't quite.

(This is an interesting example where with a migration format that allowed
'optional' subsections you wouldn't break backwards migration if the
reader could ignore a section marked as such that it didn't recognise).

> 
> Paolo

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Paolo Bonzini Dec. 15, 2014, 1:34 p.m. UTC | #4
On 15/12/2014 14:30, Dr. David Alan Gilbert wrote:
> Anyone who really cares about backwards migration compatibility would
> probably have to guard the subsection with the machine-type to avoid
> (e) ever happening (the heuristic from (c) might be useful to add).

Right.

> although if you do have to recut it, please clarify the text that says
>  'and is enough to fix migration.' since it doesn't quite.

Ok, I'll clarify this, specifying which case remains broken by design.

> (This is an interesting example where with a migration format that allowed
> 'optional' subsections you wouldn't break backwards migration if the
> reader could ignore a section marked as such that it didn't recognise).

Right, the question is whether you really want to do this. :)

Paolo
Dr. David Alan Gilbert Dec. 15, 2014, 1:37 p.m. UTC | #5
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 15/12/2014 14:30, Dr. David Alan Gilbert wrote:
> > Anyone who really cares about backwards migration compatibility would
> > probably have to guard the subsection with the machine-type to avoid
> > (e) ever happening (the heuristic from (c) might be useful to add).
> 
> Right.
> 
> > although if you do have to recut it, please clarify the text that says
> >  'and is enough to fix migration.' since it doesn't quite.
> 
> Ok, I'll clarify this, specifying which case remains broken by design.
> 
> > (This is an interesting example where with a migration format that allowed
> > 'optional' subsections you wouldn't break backwards migration if the
> > reader could ignore a section marked as such that it didn't recognise).
> 
> Right, the question is whether you really want to do this. :)

Well, it would solve this type of problem less painfully, your migration
would always succeed and destinations that knew what to do with it would
always do the right thing.  Of course you could also solve the problem
if the source knew the version of the destination, but you didn't like
that idea either.

Dave

> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Paolo Bonzini Dec. 15, 2014, 1:45 p.m. UTC | #6
On 15/12/2014 14:37, Dr. David Alan Gilbert wrote:
>> > Right, the question is whether you really want to do this. :)
> Well, it would solve this type of problem less painfully, your migration
> would always succeed and destinations that knew what to do with it would
> always do the right thing.

Yes, that's a determination to make.  I agree that this particular case
is one where losing the subsection is not the end of the world, and
keying off the machine type could be a good call for downstreams.

But that's also because downstreams can also patch older source
versions, while QEMU will not release another 2.1 version.  In fact...

> Of course you could also solve the problem
> if the source knew the version of the destination, but you didn't like
> that idea either.

... if the source could be patched by someone to know the version of the
destination, they could also add the subsection with a "needed" function
that always returns 0.  This way it is never migrated, but it is
interpreted.  You do not need the version of the destination.

Paolo
diff mbox

Patch

diff --git a/hw/char/serial.c b/hw/char/serial.c
index ebcacdc..8c42d03 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -350,10 +350,24 @@  static void serial_ioport_write(void *opaque, hwaddr addr, uint64_t val,
                      s->poll_msl = 0;
                 }
             }
-            if (s->lsr & UART_LSR_THRE) {
+
+            /* Turning on the THRE interrupt on IER can trigger the interrupt
+             * if LSR.THRE=1, even if it had been masked before by reading IIR.
+             * This is not in the datasheet, but Windows relies on it.  It is
+             * unclear if THRE has to be resampled every time THRI becomes
+             * 1, or only on the rising edge.  Bochs does the latter, and Windows
+             * always toggles IER to all zeroes and back to all ones.  But for
+             * now leave it as it has always been in QEMU.
+             *
+             * If IER.THRI is zero, thr_ipending is not used.  Set it to zero
+             * so that the thr_ipending subsection is not migrated.
+             */
+            if ((s->ier & UART_IER_THRI) && (s->lsr & UART_LSR_THRE)) {
                 s->thr_ipending = 1;
-                serial_update_irq(s);
+            } else {
+                s->thr_ipending = 0;
             }
+            serial_update_irq(s);
         }
         break;
     case 2: