[RFC,v1,1/8] vfio-ccw: Return IOINST_CC_NOT_OPERATIONAL for EIO

Message ID 20191115033437.37926-2-farman@linux.ibm.com
State New
Series s390x/vfio-ccw: Channel Path Handling

Commit Message

Eric Farman Nov. 15, 2019, 3:34 a.m. UTC
From: Farhan Ali <alifm@linux.ibm.com>

EIO is returned by vfio-ccw mediated device when the backing
host subchannel is not operational anymore. So return cc=3
back to the guest, rather than returning a unit check.
This way the guest can take appropriate action such as
issuing an 'stsch'.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 hw/vfio/ccw.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Cornelia Huck Nov. 18, 2019, 6:13 p.m. UTC | #1
On Fri, 15 Nov 2019 04:34:30 +0100
Eric Farman <farman@linux.ibm.com> wrote:

> From: Farhan Ali <alifm@linux.ibm.com>
> 
> EIO is returned by vfio-ccw mediated device when the backing
> host subchannel is not operational anymore. So return cc=3
> back to the guest, rather than returning a unit check.
> This way the guest can take appropriate action such as
> issuing an 'stsch'.

Hmm, I'm trying to recall whether that was actually a conscious choice,
but I can't quite remember... the change does make sense at a glance,
however.

> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>

I would need your s-o-b for that one, though :)

> ---
>  hw/vfio/ccw.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 6863f6c69f..0919ddbeb8 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -114,6 +114,7 @@ again:
>          return IOINST_CC_BUSY;
>      case -ENODEV:
>      case -EACCES:
> +    case -EIO:
>          return IOINST_CC_NOT_OPERATIONAL;
>      case -EFAULT:
>      default:
Halil Pasic Nov. 19, 2019, 11:23 a.m. UTC | #2
On Mon, 18 Nov 2019 19:13:34 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> > EIO is returned by vfio-ccw mediated device when the backing
> > host subchannel is not operational anymore. So return cc=3
> > back to the guest, rather than returning a unit check.
> > This way the guest can take appropriate action such as
> > issuing an 'stsch'.
> 
> Hmm, I'm trying to recall whether that was actually a conscious choice,
> but I can't quite remember... the change does make sense at a glance,
> however.

Is EIO returned if and only if the host subchannel/device is not
operational any more, or are there other cases as well? Is the mapping
(cc to condition) documented? By the QEMU code I would think that
we already have ENODEV and EACCES for 'not operational' -- no idea
why we need two codes though.

Regards,
Halil
Cornelia Huck Nov. 19, 2019, 12:02 p.m. UTC | #3
On Tue, 19 Nov 2019 12:23:40 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 18 Nov 2019 19:13:34 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > > EIO is returned by vfio-ccw mediated device when the backing
> > > host subchannel is not operational anymore. So return cc=3
> > > back to the guest, rather than returning a unit check.
> > > This way the guest can take appropriate action such as
> > > issuing an 'stsch'.
> > 
> > Hmm, I'm trying to recall whether that was actually a conscious choice,
> > but I can't quite remember... the change does make sense at a glance,
> > however.  
> 
> Is EIO returned if and only if the host subchannel/device is not
> operational any more, or are there other cases as well?

Ok, I walked through the kernel code, and it seems -EIO can happen
- when we try to do I/O while in the NOT_OPER or STANDBY states... cc 3
  makes sense in those cases
- when the cp is not initialized when trying to fetch the orb... which
  is an internal vfio-ccw kernel module error

Btw., this patch only changes one of the handlers; I think you have to
change all of start/halt/clear?

[Might also be good to double-check the handling for the different
instructions.]

> Is the mapping
> (cc to condition) documented? By the QEMU code I would think that
> we already have ENODEV and EACCES for 'not operational' -- no idea
> why we need two codes though.

-ENODEV: device gone
-EACCES: no path operational

We should be able to distinguish between the two; in the 'no path
operational' case, the device may still be accessible with a different
path mask in the request.
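
To make the distinction concrete, here is a minimal standalone sketch of how a return value could be turned into a condition code. It is not the code in hw/vfio/ccw.c: the `sketch_*` names, the outcome struct and the `eio_is_not_oper` switch are made up for illustration, and the enum only mirrors the cc values 0 to 3. The switch exists purely to show the two options for -EIO that are being discussed here; the fallback branch assumes the error is delivered to the guest as status (a unit check, going by the commit message) rather than as a condition code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of the s390 condition codes (values 0..3). */
enum sketch_cc {
    SKETCH_CC_EXPECTED = 0,
    SKETCH_CC_STATUS_PRESENT = 1,
    SKETCH_CC_BUSY = 2,
    SKETCH_CC_NOT_OPERATIONAL = 3,
};

/* Hypothetical result type: the cc plus what the caller should know. */
struct sketch_outcome {
    enum sketch_cc cc;
    bool error_status;  /* report an error status to the guest instead */
    bool device_gone;   /* -ENODEV: another path mask will not help */
};

/*
 * Map a (negative errno) return value to an outcome.
 * eio_is_not_oper selects between the two readings of -EIO:
 * true  -> treat it like "not operational" (cc 3),
 * false -> treat it like "something is broken" (error status).
 */
static struct sketch_outcome sketch_map_ret(int ret, bool eio_is_not_oper)
{
    struct sketch_outcome out = { SKETCH_CC_EXPECTED, false, false };

    switch (ret) {
    case 0:
        break;
    case -EBUSY:
        out.cc = SKETCH_CC_BUSY;
        break;
    case -ENODEV:
        /* device gone */
        out.cc = SKETCH_CC_NOT_OPERATIONAL;
        out.device_gone = true;
        break;
    case -EACCES:
        /* no path operational; a different path mask may still work */
        out.cc = SKETCH_CC_NOT_OPERATIONAL;
        break;
    case -EIO:
        if (eio_is_not_oper) {
            out.cc = SKETCH_CC_NOT_OPERATIONAL;
            break;
        }
        /* fall through */
    case -EFAULT:
    default:
        out.error_status = true;
        break;
    }
    return out;
}
```

If a helper along these lines were shared by the start, halt and clear handlers, the three would at least stay consistent with each other; which branch -EIO belongs in is a separate question.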
Eric Farman Nov. 19, 2019, 3:42 p.m. UTC | #4
On 11/19/19 7:02 AM, Cornelia Huck wrote:
> On Tue, 19 Nov 2019 12:23:40 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On Mon, 18 Nov 2019 19:13:34 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>>> EIO is returned by vfio-ccw mediated device when the backing
>>>> host subchannel is not operational anymore. So return cc=3
>>>> back to the guest, rather than returning a unit check.
>>>> This way the guest can take appropriate action such as
>>>> issuing an 'stsch'.
>>>
>>> Hmm, I'm trying to recall whether that was actually a conscious choice,
>>> but I can't quite remember... the change does make sense at a glance,
>>> however.  
>>
>> Is EIO returned if and only if the host subchannel/device is not
>> operational any more, or are there other cases as well?
> 
> Ok, I walked through the kernel code, and it seems -EIO can happen
> - when we try to do I/O while in the NOT_OPER or STANDBY states... cc 3
>   makes sense in those cases
> - when the cp is not initialized when trying to fetch the orb... which
>   is an internal vfio-ccw kernel module error
> 
> Btw., this patch only changes one of the handlers; I think you have to
> change all of start/halt/clear?

Correct; this patch must've been written before the halt/clear handlers
landed and I missed that nuance in the rebase.  I'll fix that up...

> 
> [Might also be good to double-check the handling for the different
> instructions.]

...and do the double-checking.

> 
>> Is the mapping
>> (cc to condition) documented? By the QEMU code I would think that
>> we already have ENODEV and EACCES for 'not operational' -- no idea
>> why we need two codes though.
> 
> -ENODEV: device gone
> -EACCES: no path operational
> 
> We should be able to distinguish between the two; in the 'no path
> operational' case, the device may still be accessible with a different
> path mask in the request.
> 

As long as we don't ignore the guest LPM.  Gotta drop that patch.  ;)
Eric Farman Nov. 19, 2019, 3:49 p.m. UTC | #5
On 11/18/19 1:13 PM, Cornelia Huck wrote:
> On Fri, 15 Nov 2019 04:34:30 +0100
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> From: Farhan Ali <alifm@linux.ibm.com>
>>
>> EIO is returned by vfio-ccw mediated device when the backing
>> host subchannel is not operational anymore. So return cc=3
>> back to the guest, rather than returning a unit check.
>> This way the guest can take appropriate action such as
>> issuing an 'stsch'.
> 
> Hmm, I'm trying to recall whether that was actually a conscious choice,
> but I can't quite remember... the change does make sense at a glance,
> however.
> 
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> 
> I would need your s-o-b for that one, though :)

Oops.  :)

> 
>> ---
>>  hw/vfio/ccw.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
>> index 6863f6c69f..0919ddbeb8 100644
>> --- a/hw/vfio/ccw.c
>> +++ b/hw/vfio/ccw.c
>> @@ -114,6 +114,7 @@ again:
>>          return IOINST_CC_BUSY;
>>      case -ENODEV:
>>      case -EACCES:
>> +    case -EIO:
>>          return IOINST_CC_NOT_OPERATIONAL;
>>      case -EFAULT:
>>      default:
>
Halil Pasic Nov. 19, 2019, 5:59 p.m. UTC | #6
On Tue, 19 Nov 2019 13:02:20 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 19 Nov 2019 12:23:40 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 18 Nov 2019 19:13:34 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > > EIO is returned by vfio-ccw mediated device when the backing
> > > > host subchannel is not operational anymore. So return cc=3
> > > > back to the guest, rather than returning a unit check.
> > > > This way the guest can take appropriate action such as
> > > > issuing an 'stsch'.
> > > 
> > > Hmm, I'm trying to recall whether that was actually a conscious choice,
> > > but I can't quite remember... the change does make sense at a glance,
> > > however.  
> > 
> > Is EIO returned if and only if the host subchannel/device is not
> > operational any more, or are there other cases as well?
> 
> Ok, I walked through the kernel code, and it seems -EIO can happen

Thanks Connie for having a look.

> - when we try to do I/O while in the NOT_OPER or STANDBY states... cc 3
>   makes sense in those cases

I do understand NOT_OPER, but I'm not sure about STANDBY.

Here is what the PoP says about cc 3 for SSCH.
"""
Condition code 3 is set, and no other action is
taken, when the subchannel is not operational for
START SUBCHANNEL. A subchannel is not opera-
tional for START SUBCHANNEL if the subchannel is
not provided in the channel subsystem, has no valid
device number associated with it, or is not enabled.
"""

Are we guaranteed to reflect one of these conditions back?

Under what circumstances do we expect that our request will
find the device in STANDBY?

> - when the cp is not initialized when trying to fetch the orb... which
>   is an internal vfio-ccw kernel module error


So the answer seems to be no: EIO is also used for something other than
'device not operational' in a sense of the s390 IO architecture (cc=3
and stuff).

AFAIR the idea was that EIO means something is broken, and we decided
to reflect that as a unit check (because the broader device -- the
actual device + our pass-through code == device for the guest) is broken.
So I think it was a conscious choice.

Regards,
Halil

> 
> Btw., this patch only changes one of the handlers; I think you have to
> change all of start/halt/clear?
> 
> [Might also be good to double-check the handling for the different
> instructions.]
> 
> > Is the mapping
> > (cc to condition) documented? By the QEMU code I would think that
> > we already have ENODEV and EACCES for 'not operational' -- no idea
> > why we need two codes though.
> 
> -ENODEV: device gone
> -EACCES: no path operational
> 
> We should be able to distinguish between the two; in the 'no path
> operational' case, the device may still be accessible with a different
> path mask in the request.
>
Cornelia Huck Nov. 20, 2019, 10:11 a.m. UTC | #7
On Tue, 19 Nov 2019 18:59:11 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 19 Nov 2019 13:02:20 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Tue, 19 Nov 2019 12:23:40 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Mon, 18 Nov 2019 19:13:34 +0100
> > > Cornelia Huck <cohuck@redhat.com> wrote:
> > >   
> > > > > EIO is returned by vfio-ccw mediated device when the backing
> > > > > host subchannel is not operational anymore. So return cc=3
> > > > > back to the guest, rather than returning a unit check.
> > > > > This way the guest can take appropriate action such as
> > > > > issuing an 'stsch'.
> > > > 
> > > > Hmm, I'm trying to recall whether that was actually a conscious choice,
> > > > but I can't quite remember... the change does make sense at a glance,
> > > > however.    
> > > 
> > > Is EIO returned if and only if the host subchannel/device is not
> > > operational any more, or are there other cases as well?
> > 
> > Ok, I walked through the kernel code, and it seems -EIO can happen  
> 
> Thanks Connie for having a look.
> 
> > - when we try to do I/O while in the NOT_OPER or STANDBY states... cc 3
> >   makes sense in those cases  
> 
> I do understand NOT_OPER, but I'm not sure about STANDBY.
> 
> Here is what the PoP says about cc 3 for SSCH.
> """
> Condition code 3 is set, and no other action is
> taken, when the subchannel is not operational for
> START SUBCHANNEL. A subchannel is not opera-
> tional for START SUBCHANNEL if the subchannel is
> not provided in the channel subsystem, has no valid
> device number associated with it, or is not enabled.
> """
> 
> Are we guaranteed to reflect one of these conditions back?
> 
> Under what circumstances do we expect that our request will
> find the device in STANDBY?

IIRC, the subchannel is not enabled when the device is in STANDBY?

Anyway, it seems the check here is more like a safety measure, in case
we messed up.

> 
> > - when the cp is not initialized when trying to fetch the orb... which
> >   is an internal vfio-ccw kernel module error  
> 
> 
> So the answer seems to be no: EIO is also used for something other than
> 'device not operational' in a sense of the s390 IO architecture (cc=3
> and stuff).
> 
> AFAIR the idea was that EIO means something is broken, and we decided
> to reflect that as a unit check (because the broader device -- the
> actual device + our pass-through code == device for the guest) is broken.
> So I think it was a conscious choice.

Hm, if you put it like that... maybe leaving it as -EIO makes more sense.

The main question is: What happens if userspace triggers I/O to be
started and we find the device to have become not operational? Can we
even switch the state to NOT_OPER before we try the ssch (which will
fail with cc 3)? If not, it's probably safe to leave the -EIO in place.
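
For reference, the sources of -EIO found in the walk-through earlier in this thread can be sketched roughly as below. This is a simplified illustration and not the vfio-ccw kernel code: the state enum, `sketch_start_io()` and its `cp_initialized` parameter are hypothetical stand-ins. The point is only that userspace sees the same errno whether the device was in NOT_OPER or STANDBY, or the channel program was never initialized.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of device states, named after the ones in the thread. */
enum sketch_state {
    SKETCH_STATE_NOT_OPER,
    SKETCH_STATE_STANDBY,
    SKETCH_STATE_IDLE,
};

/*
 * Hypothetical I/O request entry point.
 * Returns 0 if the request could be started, -EIO otherwise.
 */
static int sketch_start_io(enum sketch_state state, bool cp_initialized)
{
    /* I/O attempted while not operational or in standby: cc 3 would fit. */
    if (state == SKETCH_STATE_NOT_OPER || state == SKETCH_STATE_STANDBY) {
        return -EIO;
    }

    /* Channel program not initialized: an internal error, not cc 3. */
    if (!cp_initialized) {
        return -EIO;
    }

    return 0;
}
```

With a single errno for both kinds of failure, the caller cannot tell an architected 'not operational' condition from an internal error, which is why the choice between cc 3 and a unit check for -EIO is not obvious.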

Patch

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 6863f6c69f..0919ddbeb8 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -114,6 +114,7 @@  again:
         return IOINST_CC_BUSY;
     case -ENODEV:
     case -EACCES:
+    case -EIO:
         return IOINST_CC_NOT_OPERATIONAL;
     case -EFAULT:
     default: