[F,SRU,0/1] selftests/eeh: Skip ahci adapters

Message ID 20210224071423.30317-1-po-hsu.lin@canonical.com

Message

Po-Hsu Lin Feb. 24, 2021, 7:14 a.m. UTC
[Impact]
When running this test on the P8 node entei with the Focal kernel,
the script tries to break 4 devices, and one of them uses the AHCI
driver, which doesn't support error recovery:

  $ sudo ./eeh-basic.sh
  0000:00:00.0, Skipped: bridge
  0001:00:00.0, Skipped: bridge
  0020:00:00.0, Skipped: bridge
  0021:00:00.0, Skipped: bridge
  0021:01:00.0, Skipped: bridge
  0021:02:01.0, Skipped: bridge
  0021:02:08.0, Skipped: bridge
  0021:02:09.0, Skipped: bridge
  0021:02:0a.0, Skipped: bridge
  0021:02:0b.0, Skipped: bridge
  0021:02:0c.0, Skipped: bridge
  0021:0d:00.0, Added
  0021:0e:00.0, Added
  0021:0f:00.0, Skipped: bridge
  0021:10:00.0, Added
  0022:00:00.0, Skipped: bridge
  0022:01:00.0, Added
  Found 4 breakable devices...
  Breaking 0021:0d:00.0...
  0021:0d:00.0, waited 0/60
  0021:0d:00.0, waited 1/60
  0021:0d:00.0, waited 2/60
  0021:0d:00.0, waited 3/60
  0021:0d:00.0, waited 4/60
  0021:0d:00.0, waited 5/60
  0021:0d:00.0, waited 6/60
  0021:0d:00.0, waited 7/60
  0021:0d:00.0, waited 8/60
  0021:0d:00.0, Recovered after 9 seconds
  Breaking 0021:0e:00.0...
  0021:0e:00.0, waited 0/60
  0021:0e:00.0, waited 1/60
  ./eeh-basic.sh: 74: sleep: Input/output error
  0021:0e:00.0, waited 2/60
  ./eeh-basic.sh: 74: sleep: Input/output error
  ....
  ./eeh-basic.sh: 74: sleep: Input/output error
  0021:0e:00.0, waited 59/60
  ./eeh-basic.sh: 74: sleep: Input/output error
  0021:0e:00.0, waited 60/60
  ./eeh-basic.sh: 74: sleep: Input/output error
  0021:0e:00.0, Failed to recover!
  Breaking 0021:10:00.0...
  Skipping 0021:10:00.0, Initial PE state is not ok
  Breaking 0022:01:00.0...
  Skipping 0022:01:00.0, Initial PE state is not ok
  3 devices failed to recover (4 tested)
  ./eeh-basic.sh: 81: lspci: Input/output error
  ./eeh-basic.sh: 81: diff: Input/output error
  ./eeh-basic.sh: 82: rm: Input/output error
  ./eeh-basic.sh: 84: test: 3: unexpected operator

Once the driver fails to recover, the system starts acting up:
  $ ls
  ls: command not found

and then drops into a read-only state.

[Fixes]
* bbe9064f30f06e ("selftests/eeh: Skip ahci adapters")

This only affects Focal, and the fix can be cherry-picked.
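
For readers unfamiliar with the approach, the idea of skipping
AHCI-bound devices can be sketched roughly as below. This is a minimal
illustration and not the contents of the upstream commit: the function
names, the SYSFS_PCI override and the exact skip message are
assumptions made for the example. It relies only on the fact that a
bound PCI device exposes a "driver" symlink under
/sys/bus/pci/devices/<address>/.

```shell
#!/bin/sh
# Sketch only: decide whether a PCI device should be skipped because it
# is bound to the ahci driver. SYSFS_PCI is a hypothetical override so
# the check can be exercised against a fake tree; on a real system the
# default /sys/bus/pci/devices is used.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Print the name of the driver bound to device $1, or nothing if the
# device has no driver symlink (i.e. it is unbound).
pci_driver_name() {
	link="$SYSFS_PCI/$1/driver"
	[ -L "$link" ] || return 0
	basename "$(readlink "$link")"
}

# Return 0 (true) if device $1 is driven by ahci.
is_ahci() {
	[ "$(pci_driver_name "$1")" = "ahci" ]
}

# Walk a list of candidate devices and report which ones would be
# tested. The message format mimics the "Skipped: bridge" lines above.
filter_devices() {
	for dev in "$@"; do
		if is_ahci "$dev"; then
			echo "$dev, Skipped: ahci doesn't support recovery"
			continue
		fi
		echo "$dev, Added"
	done
}
```

With a check of this kind in the device-selection loop, an AHCI
controller is never handed to the error-injection step, so a disk
hosting the root filesystem is left alone.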

[Test case]
Run the eeh-basic.sh script in tools/testing/selftests/powerpc/eeh/
on the affected P8 node. The test should pass without any issue.

[Where problems could occur]
This fix is limited to a PowerPC testing tool and is already in the
Groovy kernel, so it is unlikely to cause any issues.

Michael Ellerman (1):
  selftests/eeh: Skip ahci adapters

 tools/testing/selftests/powerpc/eeh/eeh-basic.sh | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Kelsey Skunberg Feb. 26, 2021, 1:39 a.m. UTC | #1
Applied to Focal master-next. Thank you!

-Kelsey
