[Cosmic,SRU,Bionic/Xenial/Trusty,0/2] Fixes for partition scan of corrupted AIX disk

Message ID 20180821140153.2848-1-mfo@canonical.com
Series Fixes for partition scan of corrupted AIX disk

Message

Mauricio Faria de Oliveira Aug. 21, 2018, 2:01 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1787281

[Impact]

 * Users with disks/LUNs previously used for AIX operating system
   installations, whose partition table may have been overwritten or
   corrupted, might hit kernel failures during the partition scan of
   such disks/LUNs, possibly hanging the system (seen with retries).

 * The Linux kernel should be robust to corrupted disk data, performing
   better sanitization/checks instead of failing.

 * The fix consists of a couple of simple logic changes that make the
   AIX partition table parser more robust.

[Test Case]

 * Run the partition scan on the (trimmed) disk image of the AIX LUN
   (not provided here, since it contains customer data) with this
   command:

   $ sudo losetup --find --show --partscan rlv_grkgld.1mb

 * On failure, the command hangs, and messages like these are printed
   to the console, depending on the kernel version (see tests below):

   [ 270.506420] partition (null) (3 pp's found) is not contiguous

   [ 270.597428] BUG: unable to handle kernel paging request at 0000000000001000
   [ 270.599525] IP: [<ffffffff81379d4d>] strnlen+0xd/0x40

 * On success, the command prints a loop device name, for example:

   /dev/loop0

[Regression Potential]

 * Low. Both changes are simple improvements in logic.

 * This affects users who mount disks/LUNs from the AIX OS. It should
   only change behavior for users who relied on uninitialized variables
   to work correctly during the partition scan of those disks/LUNs,
   which should be rare, since the code is likely to fail, as observed
   in this scenario.

 * This has been tested on Cosmic, Bionic, Xenial, and Trusty.

Mauricio Faria de Oliveira (2):
  partitions/aix: fix usage of uninitialized lv_info and lvname
    structures
  partitions/aix: append null character to print data from disk

 block/partitions/aix.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Seth Forshee Aug. 23, 2018, 7:51 p.m. UTC | #1
On Tue, Aug 21, 2018 at 11:01:51AM -0300, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1787281
> [snip]

Acked-by: Seth Forshee <seth.forshee@canonical.com>

Applied to cosmic/master-next and unstable/master, thanks!

Stefan Bader Aug. 27, 2018, 12:56 p.m. UTC | #2
On 21.08.2018 16:01, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1787281
> [snip]
Acked-by: Stefan Bader <stefan.bader@canonical.com>

Mauricio Faria de Oliveira Sept. 4, 2018, 12:14 p.m. UTC | #3
On Tue, Aug 21, 2018 at 11:02 AM Mauricio Faria de Oliveira <mfo@canonical.com> wrote:

> BugLink: https://bugs.launchpad.net/bugs/1787281
> [snip]

Hi, pinging just in case; Kleber and I already chatted about this series,
but better safe than sorry.

Thanks,

Kleber Sacilotto de Souza Sept. 4, 2018, 4 p.m. UTC | #4
On 08/21/18 16:01, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1787281
> [snip]

Applied to bionic/master-next, xenial/master-next and trusty/master-next
branches.

Thanks,
Kleber