diff mbox series

[v4] slof/fs/packages/disk-label.fs: improve checking for DOS boot partitions

Message ID 20240328060009.650859-1-kconsul@linux.ibm.com
State New
Headers show
Series [v4] slof/fs/packages/disk-label.fs: improve checking for DOS boot partitions | expand

Commit Message

Kautuk Consul March 28, 2024, 6 a.m. UTC
While testing with a qcow2 with a DOS boot partition it was found that
when we set the logical_block_size in the guest XML to >512 then the
boot would fail in the following interminable loop:
<SNIP>
Trying to load:  from: /pci@800000020000000/scsi@3 ...
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
</SNIP>

Change the "read-sector" Forth subroutine to throw an exception whenever
it fails to read a full block-size length of sector from the disk.
Also change the "open" method to initiate CATCH exception handling for the calls to
try-partitions/try-files which will also call read-sector which could potentially
now throw this new exception.

After making the above changes, it fails properly with the correct error
message as follows:
<SNIP>
Trying to load:  from: /pci@800000020000000/scsi@3 ...
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!
virtioblk_transfer: Access beyond end of device!

E3404: Not a bootable device!

E3407: Load failed

  Type 'boot' and press return to continue booting the system.
  Type 'reset-all' and press return to reboot the system.

Ready!
0 >
</SNIP>

Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com>
---
 slof/fs/packages/disk-label.fs | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Alexey Kardashevskiy April 4, 2024, 12:35 a.m. UTC | #1
First, sorry I am late into the discussion. Comments below.


On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote:
> While testing with a qcow2 with a DOS boot partition it was found that
> when we set the logical_block_size in the guest XML to >512 then the
> boot would fail in the following interminable loop:

Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512?

> <SNIP>
> Trying to load:  from: /pci@800000020000000/scsi@3 ...
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> </SNIP>
> 
> Change the "read-sector" Forth subroutine to throw an exception whenever
> it fails to read a full block-size length of sector from the disk.

Why not throwing an exception from the "beyond end" message point?
Or fail to open a device if SLOF does not like the block size? I forgot the internals :(

> Also change the "open" method to initiate CATCH exception handling for the calls to
> try-partitions/try-files which will also call read-sector which could potentially
> now throw this new exception.
> 
> After making the above changes, it fails properly with the correct error
> message as follows:
> <SNIP>
> Trying to load:  from: /pci@800000020000000/scsi@3 ...
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> virtioblk_transfer: Access beyond end of device!
> 
> E3404: Not a bootable device!
> 
> E3407: Load failed
> 
>   Type 'boot' and press return to continue booting the system.
>   Type 'reset-all' and press return to reboot the system.
> 
> Ready!
> 0 >
> </SNIP>
> 
> Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com>
> ---
> slof/fs/packages/disk-label.fs | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
> index 661c6b0..a6fb231 100644
> --- a/slof/fs/packages/disk-label.fs
> +++ b/slof/fs/packages/disk-label.fs
> @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry
> : read-sector ( sector-number -- )
>     \ block-size is 0x200 on disks, 0x800 on cdrom drives
>     block-size * 0 seek drop      \ seek to sector
> -   block block-size read drop    \ read sector
> +   block block-size read         \ read sector
> +   block-size < IF throw THEN    \ if we read less than the block-size then throw an exception

When it fails, is the number of bytes ever non zero? Thanks,

> ;
>  
> : (.part-entry) ( part-entry )
> @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot
>     THEN
>  
>     partition IF
> -       try-partitions
> +       ['] try-partitions
>     ELSE
> -       try-files
> +       ['] try-files
>     THEN
> +
> +   \ Catch any exception that might happen due to read-sector failing to read
> +   \ block-size number of bytes from any sector of the disk.
> +   CATCH IF false THEN
> +
>     dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again
> ;
>  
> -- 
> 2.31.1
> 
>
Kautuk Consul April 4, 2024, 7:18 a.m. UTC | #2
On 2024-04-04 11:35:43, Alexey Kardashevskiy wrote:
> First, sorry I am late into the discussion. Comments below.
> 
> 
> On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote:
> > While testing with a qcow2 with a DOS boot partition it was found that
> > when we set the logical_block_size in the guest XML to >512 then the
> > boot would fail in the following interminable loop:
> 
> Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512?
Well, we had an image with DOS boot partition and we tested it with
logical_block_size = 1024 and got this infinite loop.
In SLOF the block-size becomes what we configure in the
logical_block_size parameter. This same issue doesn't arise with GPT.
> 
> > <SNIP>
> > Trying to load:  from: /pci@800000020000000/scsi@3 ...
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > </SNIP>
> > 
> > Change the "read-sector" Forth subroutine to throw an exception whenever
> > it fails to read a full block-size length of sector from the disk.
> 
> Why not throwing an exception from the "beyond end" message point?
> Or fail to open a device if SLOF does not like the block size? I forgot the internals :(
This loop is interminable and this "Access beyond end of device!"
message continues forever.
SLOF doesn't have any option other than to use the block-size that was
set in the logical_block_size parameter. It doesn't have any preference
as the code is very generic for both DOS as well as GPT.
> 
> > Also change the "open" method to initiate CATCH exception handling for the calls to
> > try-partitions/try-files which will also call read-sector which could potentially
> > now throw this new exception.
> > 
> > After making the above changes, it fails properly with the correct error
> > message as follows:
> > <SNIP>
> > Trying to load:  from: /pci@800000020000000/scsi@3 ...
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > virtioblk_transfer: Access beyond end of device!
> > 
> > E3404: Not a bootable device!
> > 
> > E3407: Load failed
> > 
> >   Type 'boot' and press return to continue booting the system.
> >   Type 'reset-all' and press return to reboot the system.
> > 
> > Ready!
> > 0 >
> > </SNIP>
> > 
> > Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com>
> > ---
> > slof/fs/packages/disk-label.fs | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
> > index 661c6b0..a6fb231 100644
> > --- a/slof/fs/packages/disk-label.fs
> > +++ b/slof/fs/packages/disk-label.fs
> > @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry
> > : read-sector ( sector-number -- )
> >     \ block-size is 0x200 on disks, 0x800 on cdrom drives
> >     block-size * 0 seek drop      \ seek to sector
> > -   block block-size read drop    \ read sector
> > +   block block-size read         \ read sector
> > +   block-size < IF throw THEN    \ if we read less than the block-size then throw an exception
> 
> When it fails, is the number of bytes ever non zero? Thanks,
No, it doesn't reach 0. It is lesser than the block-size. For example if
we set the logcial_block_size to 1024, the block-size is that much. if
we are reading the last sector which is physically only 512 bytes long
then we read that 512 bytes which is lesser than 1024, which should be
regarded as an error.
> 
> > ;
> >  
> > : (.part-entry) ( part-entry )
> > @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot
> >     THEN
> >  
> >     partition IF
> > -       try-partitions
> > +       ['] try-partitions
> >     ELSE
> > -       try-files
> > +       ['] try-files
> >     THEN
> > +
> > +   \ Catch any exception that might happen due to read-sector failing to read
> > +   \ block-size number of bytes from any sector of the disk.
> > +   CATCH IF false THEN
Segher/Alexey, can we keep this CATCH block or should I remove it ?
> > +
> >     dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again
> > ;
> >  
> > -- 
> > 2.31.1
> > 
> >
Alexey Kardashevskiy April 5, 2024, 3:46 a.m. UTC | #3
On Thu, 4 Apr 2024, at 18:18, Kautuk Consul wrote:
> On 2024-04-04 11:35:43, Alexey Kardashevskiy wrote:
> > First, sorry I am late into the discussion. Comments below.
> > 
> > 
> > On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote:
> > > While testing with a qcow2 with a DOS boot partition it was found that
> > > when we set the logical_block_size in the guest XML to >512 then the
> > > boot would fail in the following interminable loop:
> > 
> > Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512?
> Well, we had an image with DOS boot partition and we tested it with
> logical_block_size = 1024 and got this infinite loop.

This does not really answer to "why" ;)

> In SLOF the block-size becomes what we configure in the
> logical_block_size parameter. This same issue doesn't arise with GPT.

How is GPT different in this regard?

> > > <SNIP>
> > > Trying to load:  from: /pci@800000020000000/scsi@3 ...
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > </SNIP>
> > > 
> > > Change the "read-sector" Forth subroutine to throw an exception whenever
> > > it fails to read a full block-size length of sector from the disk.
> > 
> > Why not throwing an exception from the "beyond end" message point?
> > Or fail to open a device if SLOF does not like the block size? I forgot the internals :(
> This loop is interminable and this "Access beyond end of device!"
> message continues forever.

Where is that loop exactly? Put CATCH in there.

> SLOF doesn't have any option other than to use the block-size that was
> set in the logical_block_size parameter. It doesn't have any preference
> as the code is very generic for both DOS as well as GPT.
> > 
> > > Also change the "open" method to initiate CATCH exception handling for the calls to
> > > try-partitions/try-files which will also call read-sector which could potentially
> > > now throw this new exception.
> > > 
> > > After making the above changes, it fails properly with the correct error
> > > message as follows:
> > > <SNIP>
> > > Trying to load:  from: /pci@800000020000000/scsi@3 ...
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > virtioblk_transfer: Access beyond end of device!
> > > 
> > > E3404: Not a bootable device!
> > > 
> > > E3407: Load failed
> > > 
> > >   Type 'boot' and press return to continue booting the system.
> > >   Type 'reset-all' and press return to reboot the system.
> > > 
> > > Ready!
> > > 0 >
> > > </SNIP>
> > > 
> > > Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com>
> > > ---
> > > slof/fs/packages/disk-label.fs | 12 +++++++++---
> > > 1 file changed, 9 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
> > > index 661c6b0..a6fb231 100644
> > > --- a/slof/fs/packages/disk-label.fs
> > > +++ b/slof/fs/packages/disk-label.fs
> > > @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry
> > > : read-sector ( sector-number -- )
> > >     \ block-size is 0x200 on disks, 0x800 on cdrom drives
> > >     block-size * 0 seek drop      \ seek to sector
> > > -   block block-size read drop    \ read sector
> > > +   block block-size read         \ read sector
> > > +   block-size < IF throw THEN    \ if we read less than the block-size then throw an exception
> > 
> > When it fails, is the number of bytes ever non zero? Thanks,
> No, it doesn't reach 0. It is lesser than the block-size. For example if
> we set the logcial_block_size to 1024, the block-size is that much. if
> we are reading the last sector which is physically only 512 bytes long
> then we read that 512 bytes which is lesser than 1024, which should be
> regarded as an error.

Ah so it only happens when there is an odd number of 512 sectors so reading the last one with block-size==1024 only reads a half => failure, is that right?

> > 
> > > ;
> > >  
> > > : (.part-entry) ( part-entry )
> > > @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot
> > >     THEN
> > >  
> > >     partition IF
> > > -       try-partitions
> > > +       ['] try-partitions
> > >     ELSE
> > > -       try-files
> > > +       ['] try-files
> > >     THEN
> > > +
> > > +   \ Catch any exception that might happen due to read-sector failing to read
> > > +   \ block-size number of bytes from any sector of the disk.
> > > +   CATCH IF false THEN
> Segher/Alexey, can we keep this CATCH block or should I remove it ?

imho the original bug should be handled more gracefully. Seeing exceptions in the code just triggers exceptions in my small brain :) Thanks,

> > > +
> > >     dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again
> > > ;
> > >  
> > > -- 
> > > 2.31.1
> > > 
> > > 
>
Kautuk Consul April 5, 2024, 8:16 a.m. UTC | #4
Hi,

On 2024-04-05 14:46:14, Alexey Kardashevskiy wrote:
> 
> 
> On Thu, 4 Apr 2024, at 18:18, Kautuk Consul wrote:
> > On 2024-04-04 11:35:43, Alexey Kardashevskiy wrote:
> > > First, sorry I am late into the discussion. Comments below.
> > > 
> > > 
> > > On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote:
> > > > While testing with a qcow2 with a DOS boot partition it was found that
> > > > when we set the logical_block_size in the guest XML to >512 then the
> > > > boot would fail in the following interminable loop:
> > > 
> > > Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512?
> > Well, we had an image with DOS boot partition and we tested it with
> > logical_block_size = 1024 and got this infinite loop.
> 
> This does not really answer to "why" ;)

Well, my point is that it is tweakable in virsh/qemu and maybe we should be
handling this error in configuration in SLOF properly ? There shouldn't
be the possibility of an interminable loop in the software anywhere,
right ? :-)

> 
> > In SLOF the block-size becomes what we configure in the
> > logical_block_size parameter. This same issue doesn't arise with GPT.
> 
> How is GPT different in this regard?

In GPT the sector number 1 contains the GPT headers, not
sector 0 as in the case of DOS boot partition. So if the block-size is
incorrect, the GPT magic number itself isn't found and the "E3404: Not a
bootable device!" error is immediately thrown.

> 
> > > > <SNIP>
> > > > Trying to load:  from: /pci@800000020000000/scsi@3 ...
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > </SNIP>
> > > > 
> > > > Change the "read-sector" Forth subroutine to throw an exception whenever
> > > > it fails to read a full block-size length of sector from the disk.
> > > 
> > > Why not throwing an exception from the "beyond end" message point?
> > > Or fail to open a device if SLOF does not like the block size? I forgot the internals :(
> > This loop is interminable and this "Access beyond end of device!"
> > message continues forever.
> 
> Where is that loop exactly? Put CATCH in there.

That loop is in count-dos-logical-partitions. The reason why I didn't
put a CATCH in there is because there is already another CATCH statement
in do-load in slof/fs/boot.fs which covers this throw. For the other
path that doesn't have a CATCH I have inserted a CATCH in the open
subroutine in disk-label.fs.

> 
> > SLOF doesn't have any option other than to use the block-size that was
> > set in the logical_block_size parameter. It doesn't have any preference
> > as the code is very generic for both DOS as well as GPT.
> > > 
> > > > Also change the "open" method to initiate CATCH exception handling for the calls to
> > > > try-partitions/try-files which will also call read-sector which could potentially
> > > > now throw this new exception.
> > > > 
> > > > After making the above changes, it fails properly with the correct error
> > > > message as follows:
> > > > <SNIP>
> > > > Trying to load:  from: /pci@800000020000000/scsi@3 ...
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > virtioblk_transfer: Access beyond end of device!
> > > > 
> > > > E3404: Not a bootable device!
> > > > 
> > > > E3407: Load failed
> > > > 
> > > >   Type 'boot' and press return to continue booting the system.
> > > >   Type 'reset-all' and press return to reboot the system.
> > > > 
> > > > Ready!
> > > > 0 >
> > > > </SNIP>
> > > > 
> > > > Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com>
> > > > ---
> > > > slof/fs/packages/disk-label.fs | 12 +++++++++---
> > > > 1 file changed, 9 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
> > > > index 661c6b0..a6fb231 100644
> > > > --- a/slof/fs/packages/disk-label.fs
> > > > +++ b/slof/fs/packages/disk-label.fs
> > > > @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry
> > > > : read-sector ( sector-number -- )
> > > >     \ block-size is 0x200 on disks, 0x800 on cdrom drives
> > > >     block-size * 0 seek drop      \ seek to sector
> > > > -   block block-size read drop    \ read sector
> > > > +   block block-size read         \ read sector
> > > > +   block-size < IF throw THEN    \ if we read less than the block-size then throw an exception
> > > 
> > > When it fails, is the number of bytes ever non zero? Thanks,
> > No, it doesn't reach 0. It is lesser than the block-size. For example if
> > we set the logcial_block_size to 1024, the block-size is that much. if
> > we are reading the last sector which is physically only 512 bytes long
> > then we read that 512 bytes which is lesser than 1024, which should be
> > regarded as an error.
> 
> Ah so it only happens when there is an odd number of 512 sectors so reading the last one with block-size==1024 only reads a half => failure, is that right?

Yes. Or, block-size if set to 2048 or 4096 will also show the same problem.

> 
> > > 
> > > > ;
> > > >  
> > > > : (.part-entry) ( part-entry )
> > > > @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot
> > > >     THEN
> > > >  
> > > >     partition IF
> > > > -       try-partitions
> > > > +       ['] try-partitions
> > > >     ELSE
> > > > -       try-files
> > > > +       ['] try-files
> > > >     THEN
> > > > +
> > > > +   \ Catch any exception that might happen due to read-sector failing to read
> > > > +   \ block-size number of bytes from any sector of the disk.
> > > > +   CATCH IF false THEN
> > Segher/Alexey, can we keep this CATCH block or should I remove it ?
> 
> imho the original bug should be handled more gracefully. Seeing exceptions in the code just triggers exceptions in my small brain :) Thanks,

:-). But this is the only other path that doesn't have a CATCH
like the do-load subroutine in slof/fs/boot.fs. According to Segher
there shouldn't ever be a problem with throw because if nothing else the
outer-most interpreter loop's CATCH will catch the exception. But I
thought to cover this throw in read-sector more locally in places near
to this functionality. Because the outermost FORTH SLOF interpreter loop is not
really so related to reading a sector in disk-label.fs.

> 
> > > > +
> > > >     dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again
> > > > ;
> > > >  
> > > > -- 
> > > > 2.31.1
> > > > 
> > > > 
> >
Kautuk Consul April 16, 2024, 4:33 a.m. UTC | #5
Hi,

On 2024-04-05 13:46:44, Kautuk Consul wrote:
> Hi,
> 
> On 2024-04-05 14:46:14, Alexey Kardashevskiy wrote:
> > 
> > 
> > On Thu, 4 Apr 2024, at 18:18, Kautuk Consul wrote:
> > > On 2024-04-04 11:35:43, Alexey Kardashevskiy wrote:
> > > > First, sorry I am late into the discussion. Comments below.
> > > > 
> > > > 
> > > > On Thu, 28 Mar 2024, at 17:00, Kautuk Consul wrote:
> > > > > While testing with a qcow2 with a DOS boot partition it was found that
> > > > > when we set the logical_block_size in the guest XML to >512 then the
> > > > > boot would fail in the following interminable loop:
> > > > 
> > > > Why would anyone tweak this? And when you do, what happens inside the SLOF, does it keep using 512?
> > > Well, we had an image with DOS boot partition and we tested it with
> > > logical_block_size = 1024 and got this infinite loop.
> > 
> > This does not really answer to "why" ;)
> 
> Well, my point is that it is tweakable in virsh/qemu and maybe we should be
> handling this error in configuration in SLOF properly ? There shouldn't
> be the possibility of an interminable loop in the software anywhere,
> right ? :-)
> 
> > 
> > > In SLOF the block-size becomes what we configure in the
> > > logical_block_size parameter. This same issue doesn't arise with GPT.
> > 
> > How is GPT different in this regard?
> 
> In GPT the sector number 1 contains the GPT headers, not
> sector 0 as in the case of DOS boot partition. So if the block-size is
> incorrect, the GPT magic number itself isn't found and the "E3404: Not a
> bootable device!" error is immediately thrown.
> 
> > 
> > > > > <SNIP>
> > > > > Trying to load:  from: /pci@800000020000000/scsi@3 ...
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > </SNIP>
> > > > > 
> > > > > Change the "read-sector" Forth subroutine to throw an exception whenever
> > > > > it fails to read a full block-size length of sector from the disk.
> > > > 
> > > > Why not throwing an exception from the "beyond end" message point?
> > > > Or fail to open a device if SLOF does not like the block size? I forgot the internals :(
> > > This loop is interminable and this "Access beyond end of device!"
> > > message continues forever.
> > 
> > Where is that loop exactly? Put CATCH in there.
> 
> That loop is in count-dos-logical-partitions. The reason why I didn't
> put a CATCH in there is because there is already another CATCH statement
> in do-load in slof/fs/boot.fs which covers this throw. For the other
> path that doesn't have a CATCH I have inserted a CATCH in the open
> subroutine in disk-label.fs.
> 
> > 
> > > SLOF doesn't have any option other than to use the block-size that was
> > > set in the logical_block_size parameter. It doesn't have any preference
> > > as the code is very generic for both DOS as well as GPT.
> > > > 
> > > > > Also change the "open" method to initiate CATCH exception handling for the calls to
> > > > > try-partitions/try-files which will also call read-sector which could potentially
> > > > > now throw this new exception.
> > > > > 
> > > > > After making the above changes, it fails properly with the correct error
> > > > > message as follows:
> > > > > <SNIP>
> > > > > Trying to load:  from: /pci@800000020000000/scsi@3 ...
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > virtioblk_transfer: Access beyond end of device!
> > > > > 
> > > > > E3404: Not a bootable device!
> > > > > 
> > > > > E3407: Load failed
> > > > > 
> > > > >   Type 'boot' and press return to continue booting the system.
> > > > >   Type 'reset-all' and press return to reboot the system.
> > > > > 
> > > > > Ready!
> > > > > 0 >
> > > > > </SNIP>
> > > > > 
> > > > > Signed-off-by: Kautuk Consul <kconsul@linux.ibm.com>
> > > > > ---
> > > > > slof/fs/packages/disk-label.fs | 12 +++++++++---
> > > > > 1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
> > > > > index 661c6b0..a6fb231 100644
> > > > > --- a/slof/fs/packages/disk-label.fs
> > > > > +++ b/slof/fs/packages/disk-label.fs
> > > > > @@ -136,7 +136,8 @@ CONSTANT /gpt-part-entry
> > > > > : read-sector ( sector-number -- )
> > > > >     \ block-size is 0x200 on disks, 0x800 on cdrom drives
> > > > >     block-size * 0 seek drop      \ seek to sector
> > > > > -   block block-size read drop    \ read sector
> > > > > +   block block-size read         \ read sector
> > > > > +   block-size < IF throw THEN    \ if we read less than the block-size then throw an exception
> > > > 
> > > > When it fails, is the number of bytes ever non zero? Thanks,
> > > No, it doesn't reach 0. It is lesser than the block-size. For example if
> > > we set the logcial_block_size to 1024, the block-size is that much. if
> > > we are reading the last sector which is physically only 512 bytes long
> > > then we read that 512 bytes which is lesser than 1024, which should be
> > > regarded as an error.
> > 
> > Ah so it only happens when there is an odd number of 512 sectors so reading the last one with block-size==1024 only reads a half => failure, is that right?
> 
> Yes. Or, block-size if set to 2048 or 4096 will also show the same problem.
> 
> > 
> > > > 
> > > > > ;
> > > > >  
> > > > > : (.part-entry) ( part-entry )
> > > > > @@ -723,10 +724,15 @@ CREATE GPT-LINUX-PARTITION 10 allot
> > > > >     THEN
> > > > >  
> > > > >     partition IF
> > > > > -       try-partitions
> > > > > +       ['] try-partitions
> > > > >     ELSE
> > > > > -       try-files
> > > > > +       ['] try-files
> > > > >     THEN
> > > > > +
> > > > > +   \ Catch any exception that might happen due to read-sector failing to read
> > > > > +   \ block-size number of bytes from any sector of the disk.
> > > > > +   CATCH IF false THEN
> > > Segher/Alexey, can we keep this CATCH block or should I remove it ?
> > 
> > imho the original bug should be handled more gracefully. Seeing exceptions in the code just triggers exceptions in my small brain :) Thanks,
> 
> :-). But this is the only other path that doesn't have a CATCH
> like the do-load subroutine in slof/fs/boot.fs. According to Segher
> there shouldn't ever be a problem with throw because if nothing else the
> outer-most interpreter loop's CATCH will catch the exception. But I
> thought to cover this throw in read-sector more locally in places near
> to this functionality. Because the outermost FORTH SLOF interpreter loop is not
> really so related to reading a sector in disk-label.fs.
> 
Alexey/Segher, so what should be the next steps ?
Do you find my explanation above okay or should I simply remove these
CATCH blocks ? Putting a CATCH block in count-dos-logical-partitions is
out of the question as there is already a CATCH in do-load in
slof/fs/boot.fs. This CATCH block in the open subroutine is actually to
prevent the exception to be caught at some non-local place in the code.
Kautuk Consul April 25, 2024, 7:14 a.m. UTC | #6
Hi,

On 2024-04-16 10:03:33, Kautuk Consul wrote:
> > 
> Alexey/Segher, so what should be the next steps ?
> Do you find my explanation above okay or should I simply remove these
> CATCH blocks ? Putting a CATCH block in count-dos-logical-partitions is
> out of the question as there is already a CATCH in do-load in
> slof/fs/boot.fs. This CATCH block in the open subroutine is actually to
> prevent the exception to be caught at some non-local place in the code.

Any ideas on how to proceed ?

> _______________________________________________
> SLOF mailing list
> SLOF@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/slof
diff mbox series

Patch

diff --git a/slof/fs/packages/disk-label.fs b/slof/fs/packages/disk-label.fs
index 661c6b0..a6fb231 100644
--- a/slof/fs/packages/disk-label.fs
+++ b/slof/fs/packages/disk-label.fs
@@ -136,7 +136,8 @@  CONSTANT /gpt-part-entry
 : read-sector ( sector-number -- )
    \ block-size is 0x200 on disks, 0x800 on cdrom drives
    block-size * 0 seek drop      \ seek to sector
-   block block-size read drop    \ read sector
+   block block-size read         \ read sector
+   block-size < IF throw THEN    \ if we read less than the block-size then throw an exception
 ;
 
 : (.part-entry) ( part-entry )
@@ -723,10 +724,15 @@  CREATE GPT-LINUX-PARTITION 10 allot
    THEN
 
    partition IF
-       try-partitions
+       ['] try-partitions
    ELSE
-       try-files
+       ['] try-files
    THEN
+
+   \ Catch any exception that might happen due to read-sector failing to read
+   \ block-size number of bytes from any sector of the disk.
+   CATCH IF false THEN
+
    dup 0= IF debug-disk-label? IF ." not found." cr THEN close THEN \ free memory again
 ;