diff mbox

[M25P80] Make sure not to overrun the internal data buffer.

Message ID 20161224151113.23955-1-jcd@tribudubois.net
State New
Headers show

Commit Message

Jean-Christophe DUBOIS Dec. 24, 2016, 3:11 p.m. UTC
It did happen that the internal data buffer was overrun leading to a Qemu
crash (in particular while emulating the i.MX6 sabrelite board).

This patch makes sure the data array would not be overrun and allow the
sabrelite emulation to run without crash.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
---
 hw/block/m25p80.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

mar.krzeminski Dec. 24, 2016, 5:18 p.m. UTC | #1
Hello,

W dniu 24.12.2016 o 16:11, Jean-Christophe Dubois pisze:
> It did happen that the internal data buffer was overrun leading to a Qemu
> crash (in particular while emulating the i.MX6 sabrelite board).
>
> This patch makes sure the data array would not be overrun and allow the
> sabrelite emulation to run without crash.
>
> Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
> ---
>   hw/block/m25p80.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index d29ff4c..a1c4e5d 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -1117,7 +1117,7 @@ static uint32_t m25p80_transfer8(SSISlave *ss, uint32_t tx)
>           s->data[s->len] = (uint8_t)tx;
>           s->len++;
>   
> -        if (s->len == s->needed_bytes) {
> +        if ((s->len >= s->needed_bytes) || (s->len >= sizeof(s->data))) {
>               complete_collecting_data(s);
>           }
>           break;
Do you have exact scenario that caused the problem?
Generally it should not happen.

Thanks,
Marcin
Jean-Christophe DUBOIS Dec. 24, 2016, 5:41 p.m. UTC | #2
Le 24/12/2016 à 18:18, mar.krzeminski a écrit :
> Hello,
>
> W dniu 24.12.2016 o 16:11, Jean-Christophe Dubois pisze:
>> It did happen that the internal data buffer was overrun leading to a 
>> Qemu
>> crash (in particular while emulating the i.MX6 sabrelite board).
>>
>> This patch makes sure the data array would not be overrun and allow the
>> sabrelite emulation to run without crash.
>>
>> Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
>> ---
>>   hw/block/m25p80.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>> index d29ff4c..a1c4e5d 100644
>> --- a/hw/block/m25p80.c
>> +++ b/hw/block/m25p80.c
>> @@ -1117,7 +1117,7 @@ static uint32_t m25p80_transfer8(SSISlave *ss, 
>> uint32_t tx)
>>           s->data[s->len] = (uint8_t)tx;
>>           s->len++;
>>   -        if (s->len == s->needed_bytes) {
>> +        if ((s->len >= s->needed_bytes) || (s->len >= 
>> sizeof(s->data))) {
>>               complete_collecting_data(s);
>>           }
>>           break;
> Do you have exact scenario that caused the problem?

When booting Xvisor (http://xhypervisor.org/) on top of Qemu emulated 
Sabrelite.

During the boot Qemu would segfault while writing to the SPI flash.

> Generally it should not happen.

The fact is that there is no protection to make sure the data array is 
not overrun.

May be it should not happen but it did happen in this case ....

JC


>
> Thanks,
> Marcin
>
>
mar.krzeminski Dec. 24, 2016, 6:04 p.m. UTC | #3
W dniu 24.12.2016 o 18:41, Jean-Christophe DUBOIS pisze:
> Le 24/12/2016 à 18:18, mar.krzeminski a écrit :
>> Hello,
>>
>> W dniu 24.12.2016 o 16:11, Jean-Christophe Dubois pisze:
>>> It did happen that the internal data buffer was overrun leading to a 
>>> Qemu
>>> crash (in particular while emulating the i.MX6 sabrelite board).
>>>
>>> This patch makes sure the data array would not be overrun and allow the
>>> sabrelite emulation to run without crash.
>>>
>>> Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
>>> ---
>>>   hw/block/m25p80.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>>> index d29ff4c..a1c4e5d 100644
>>> --- a/hw/block/m25p80.c
>>> +++ b/hw/block/m25p80.c
>>> @@ -1117,7 +1117,7 @@ static uint32_t m25p80_transfer8(SSISlave *ss, 
>>> uint32_t tx)
>>>           s->data[s->len] = (uint8_t)tx;
>>>           s->len++;
>>>   -        if (s->len == s->needed_bytes) {
>>> +        if ((s->len >= s->needed_bytes) || (s->len >= 
>>> sizeof(s->data))) {
>>>               complete_collecting_data(s);
>>>           }
>>>           break;
>> Do you have exact scenario that caused the problem?
>
> When booting Xvisor (http://xhypervisor.org/) on top of Qemu emulated 
> Sabrelite.
>
> During the boot Qemu would segfault while writing to the SPI flash.
Thanks, I'll try to take I look.
>
>> Generally it should not happen.
>
> The fact is that there is no protection to make sure the data array is 
> not overrun.
Yes. IMHO it could be nice to log some error here and reset state 
machine instead
of going to next state.
>
> May be it should not happen but it did happen in this case ....
Yeap, but this mean m25p80's state machine goes nuts. Overflow is just a 
symptom
that something wrong is going on.

Thanks,
Marcin
>
> JC
>
>
>>
>> Thanks,
>> Marcin
>>
>>
>
>
Jean-Christophe DUBOIS Dec. 24, 2016, 6:12 p.m. UTC | #4
Le 24/12/2016 à 19:04, mar.krzeminski a écrit :
>
>
> W dniu 24.12.2016 o 18:41, Jean-Christophe DUBOIS pisze:
>> Le 24/12/2016 à 18:18, mar.krzeminski a écrit :
>>> Hello,
>>>
>>> W dniu 24.12.2016 o 16:11, Jean-Christophe Dubois pisze:
>>>> It did happen that the internal data buffer was overrun leading to 
>>>> a Qemu
>>>> crash (in particular while emulating the i.MX6 sabrelite board).
>>>>
>>>> This patch makes sure the data array would not be overrun and allow 
>>>> the
>>>> sabrelite emulation to run without crash.
>>>>
>>>> Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
>>>> ---
>>>>   hw/block/m25p80.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>>>> index d29ff4c..a1c4e5d 100644
>>>> --- a/hw/block/m25p80.c
>>>> +++ b/hw/block/m25p80.c
>>>> @@ -1117,7 +1117,7 @@ static uint32_t m25p80_transfer8(SSISlave 
>>>> *ss, uint32_t tx)
>>>>           s->data[s->len] = (uint8_t)tx;
>>>>           s->len++;
>>>>   -        if (s->len == s->needed_bytes) {
>>>> +        if ((s->len >= s->needed_bytes) || (s->len >= 
>>>> sizeof(s->data))) {
>>>>               complete_collecting_data(s);
>>>>           }
>>>>           break;
>>> Do you have exact scenario that caused the problem?
>>
>> When booting Xvisor (http://xhypervisor.org/) on top of Qemu emulated 
>> Sabrelite.
>>
>> During the boot Qemu would segfault while writing to the SPI flash.
> Thanks, I'll try to take I look.

Once you have built Xvisor for "generic ARMv7" you can run the following 
command.

qemu-system-arm -M sabrelite -display none -serial null -serial stdio 
-kernel ./build/vmm.bin -initrd ./build/vmm.bin  -dtb 
./build/arch/arm/board/generic/dts/imx6/sabrelite-a9/one_guest_sabrelite-a9.dtb

You can also run Qemu  under valgrind that will pinpoint the problem.

JC

>>
>>> Generally it should not happen.
>>
>> The fact is that there is no protection to make sure the data array 
>> is not overrun.
> Yes. IMHO it could be nice to log some error here and reset state 
> machine instead
> of going to next state.
>>
>> May be it should not happen but it did happen in this case ....
> Yeap, but this mean m25p80's state machine goes nuts. Overflow is just 
> a symptom
> that something wrong is going on.
>
> Thanks,
> Marcin
>>
>> JC
>>
>>
>>>
>>> Thanks,
>>> Marcin
>>>
>>>
>>
>>
>
>
Jean-Christophe DUBOIS Dec. 27, 2016, 5:08 p.m. UTC | #5
You can have a more detailed procedure on how to run Xvisor on Qemu 
Sabrelite (with Linux guests if you wish) at the following URL.

https://github.com/avpatel/xvisor-next/blob/master/docs/arm/imx6-sabrelite.txt

You don't need to start the guest to see the crash. Just boot Xvisor ...

JC

Le 24/12/2016 à 19:12, Jean-Christophe DUBOIS a écrit :
> Le 24/12/2016 à 19:04, mar.krzeminski a écrit :
>>
>>
>> W dniu 24.12.2016 o 18:41, Jean-Christophe DUBOIS pisze:
>>> Le 24/12/2016 à 18:18, mar.krzeminski a écrit :
>>>> Hello,
>>>>
>>>> W dniu 24.12.2016 o 16:11, Jean-Christophe Dubois pisze:
>>>>> It did happen that the internal data buffer was overrun leading to 
>>>>> a Qemu
>>>>> crash (in particular while emulating the i.MX6 sabrelite board).
>>>>>
>>>>> This patch makes sure the data array would not be overrun and 
>>>>> allow the
>>>>> sabrelite emulation to run without crash.
>>>>>
>>>>> Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
>>>>> ---
>>>>>   hw/block/m25p80.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>>>>> index d29ff4c..a1c4e5d 100644
>>>>> --- a/hw/block/m25p80.c
>>>>> +++ b/hw/block/m25p80.c
>>>>> @@ -1117,7 +1117,7 @@ static uint32_t m25p80_transfer8(SSISlave 
>>>>> *ss, uint32_t tx)
>>>>>           s->data[s->len] = (uint8_t)tx;
>>>>>           s->len++;
>>>>>   -        if (s->len == s->needed_bytes) {
>>>>> +        if ((s->len >= s->needed_bytes) || (s->len >= 
>>>>> sizeof(s->data))) {
>>>>>               complete_collecting_data(s);
>>>>>           }
>>>>>           break;
>>>> Do you have exact scenario that caused the problem?
>>>
>>> When booting Xvisor (http://xhypervisor.org/) on top of Qemu 
>>> emulated Sabrelite.
>>>
>>> During the boot Qemu would segfault while writing to the SPI flash.
>> Thanks, I'll try to take I look.
>
> Once you have built Xvisor for "generic ARMv7" you can run the 
> following command.
>
> qemu-system-arm -M sabrelite -display none -serial null -serial stdio 
> -kernel ./build/vmm.bin -initrd ./build/vmm.bin  -dtb 
> ./build/arch/arm/board/generic/dts/imx6/sabrelite-a9/one_guest_sabrelite-a9.dtb
>
> You can also run Qemu  under valgrind that will pinpoint the problem.
>
> JC
>
>>>
>>>> Generally it should not happen.
>>>
>>> The fact is that there is no protection to make sure the data array 
>>> is not overrun.
>> Yes. IMHO it could be nice to log some error here and reset state 
>> machine instead
>> of going to next state.
>>>
>>> May be it should not happen but it did happen in this case ....
>> Yeap, but this mean m25p80's state machine goes nuts. Overflow is 
>> just a symptom
>> that something wrong is going on.
>>
>> Thanks,
>> Marcin
>>>
>>> JC
>>>
>>>
>>>>
>>>> Thanks,
>>>> Marcin
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
mar.krzeminski Dec. 30, 2016, 3:39 p.m. UTC | #6
W dniu 27.12.2016 o 18:08, Jean-Christophe DUBOIS pisze:
> You can have a more detailed procedure on how to run Xvisor on Qemu 
> Sabrelite (with Linux guests if you wish) at the following URL.
>
> https://github.com/avpatel/xvisor-next/blob/master/docs/arm/imx6-sabrelite.txt 
>
>
> You don't need to start the guest to see the crash. Just boot Xvisor ...
>
> JC
>
> Le 24/12/2016 à 19:12, Jean-Christophe DUBOIS a écrit :
>> Le 24/12/2016 à 19:04, mar.krzeminski a écrit :
>>>
>>>
>>> W dniu 24.12.2016 o 18:41, Jean-Christophe DUBOIS pisze:
>>>> Le 24/12/2016 à 18:18, mar.krzeminski a écrit :
>>>>> Hello,
>>>>>
>>>>> W dniu 24.12.2016 o 16:11, Jean-Christophe Dubois pisze:
>>>>>> It did happen that the internal data buffer was overrun leading 
>>>>>> to a Qemu
>>>>>> crash (in particular while emulating the i.MX6 sabrelite board).
>>>>>>
>>>>>> This patch makes sure the data array would not be overrun and 
>>>>>> allow the
>>>>>> sabrelite emulation to run without crash.
>>>>>>
>>>>>> Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
>>>>>> ---
>>>>>>   hw/block/m25p80.c | 2 +-
>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>>>>>> index d29ff4c..a1c4e5d 100644
>>>>>> --- a/hw/block/m25p80.c
>>>>>> +++ b/hw/block/m25p80.c
>>>>>> @@ -1117,7 +1117,7 @@ static uint32_t m25p80_transfer8(SSISlave 
>>>>>> *ss, uint32_t tx)
>>>>>>           s->data[s->len] = (uint8_t)tx;
>>>>>>           s->len++;
>>>>>>   -        if (s->len == s->needed_bytes) {
>>>>>> +        if ((s->len >= s->needed_bytes) || (s->len >= 
>>>>>> sizeof(s->data))) {
>>>>>>               complete_collecting_data(s);
>>>>>>           }
>>>>>>           break;
>>>>> Do you have exact scenario that caused the problem?
>>>>
>>>> When booting Xvisor (http://xhypervisor.org/) on top of Qemu 
>>>> emulated Sabrelite.
>>>>
>>>> During the boot Qemu would segfault while writing to the SPI flash.
>>> Thanks, I'll try to take I look.
>>
>> Once you have built Xvisor for "generic ARMv7" you can run the 
>> following command.
>>
>> qemu-system-arm -M sabrelite -display none -serial null -serial stdio 
>> -kernel ./build/vmm.bin -initrd ./build/vmm.bin  -dtb 
>> ./build/arch/arm/board/generic/dts/imx6/sabrelite-a9/one_guest_sabrelite-a9.dtb
>>
I got some time, and reproduced the problem. Here are some logs with 
m25p80 debugs:
: decode_new_cmd: decoded new command:9f
: decode_new_cmd: populated jedec code
: decode_new_cmd: decoded new command:0
: decode_new_cmd: decoded new command:0 //Getting flash Id in above 4 
lines -> OK (but missing CS)
Found sst25vf016b compatible flash device
: decode_new_cmd: decoded new command:6 //Write enable, command without 
payload, so it is ok
: decode_new_cmd: decoded new command:1 //Write to status register, 
guest sends data
INFO: spi0.0: sst25vf016b (2048 Kbytes)
INFO: spi0.0: mtd
   .name = spi0.0,
   .size = 0x200000 (2MiB)
   .erasesize = 0x00001000 (4KiB)
   .numeraseregions = 0
Segmentation fault (core dumped) //Here probably guest try to send some data

The root cause why m25p80 enter strange state is that CS line is not 
selected/deselected at all- there is missing debug from m25p80_cs.
In spi transfer CS line (here qemu_irq) should be 0 before begin of 
every message, and set after end of transmission.
In case of simple WREN command you should see something like this:
: m25p80_cs: deselect
: decode_new_cmd: decoded new command:6
: m25p80_cs: select

Can you check spi controller model code?

Thanks,
Marcin

>> You can also run Qemu  under valgrind that will pinpoint the problem.
>>
>> JC
>>
>>>>
>>>>> Generally it should not happen.
>>>>
>>>> The fact is that there is no protection to make sure the data array 
>>>> is not overrun.
>>> Yes. IMHO it could be nice to log some error here and reset state 
>>> machine instead
>>> of going to next state.
>>>>
>>>> May be it should not happen but it did happen in this case ....
>>> Yeap, but this mean m25p80's state machine goes nuts. Overflow is 
>>> just a symptom
>>> that something wrong is going on.
>>>
>>> Thanks,
>>> Marcin
>>>>
>>>> JC
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Marcin
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
Jean-Christophe DUBOIS Dec. 30, 2016, 5:14 p.m. UTC | #7
Le 30/12/2016 à 16:39, mar.krzeminski a écrit :
> I got some time, and reproduced the problem. Here are some logs with 
> m25p80 debugs:
> : decode_new_cmd: decoded new command:9f
> : decode_new_cmd: populated jedec code
> : decode_new_cmd: decoded new command:0
> : decode_new_cmd: decoded new command:0 //Getting flash Id in above 4 
> lines -> OK (but missing CS)
> Found sst25vf016b compatible flash device
> : decode_new_cmd: decoded new command:6 //Write enable, command 
> without payload, so it is ok
> : decode_new_cmd: decoded new command:1 //Write to status register, 
> guest sends data
> INFO: spi0.0: sst25vf016b (2048 Kbytes)
> INFO: spi0.0: mtd
>   .name = spi0.0,
>   .size = 0x200000 (2MiB)
>   .erasesize = 0x00001000 (4KiB)
>   .numeraseregions = 0
> Segmentation fault (core dumped) //Here probably guest try to send 
> some data
>
> The root cause why m25p80 enter strange state is that CS line is not 
> selected/deselected at all- there is missing debug from m25p80_cs.
> In spi transfer CS line (here qemu_irq) should be 0 before begin of 
> every message, and set after end of transmission.
> In case of simple WREN command you should see something like this:
> : m25p80_cs: deselect
> : decode_new_cmd: decoded new command:6
> : m25p80_cs: select
>
> Can you check spi controller model code?

I'll double check.

But why is the SPI memory/device even responding if CS is not set ?

>
> Thanks,
> Marcin
>
>
mar.krzeminski Dec. 30, 2016, 6:09 p.m. UTC | #8
W dniu 30.12.2016 o 18:14, Jean-Christophe DUBOIS pisze:
> Le 30/12/2016 à 16:39, mar.krzeminski a écrit :
>> I got some time, and reproduced the problem. Here are some logs with 
>> m25p80 debugs:
>> : decode_new_cmd: decoded new command:9f
>> : decode_new_cmd: populated jedec code
>> : decode_new_cmd: decoded new command:0
>> : decode_new_cmd: decoded new command:0 //Getting flash Id in above 4 
>> lines -> OK (but missing CS)
>> Found sst25vf016b compatible flash device
>> : decode_new_cmd: decoded new command:6 //Write enable, command 
>> without payload, so it is ok
>> : decode_new_cmd: decoded new command:1 //Write to status register, 
>> guest sends data
>> INFO: spi0.0: sst25vf016b (2048 Kbytes)
>> INFO: spi0.0: mtd
>>   .name = spi0.0,
>>   .size = 0x200000 (2MiB)
>>   .erasesize = 0x00001000 (4KiB)
>>   .numeraseregions = 0
>> Segmentation fault (core dumped) //Here probably guest try to send 
>> some data
>>
>> The root cause why m25p80 enter strange state is that CS line is not 
>> selected/deselected at all- there is missing debug from m25p80_cs.
>> In spi transfer CS line (here qemu_irq) should be 0 before begin of 
>> every message, and set after end of transmission.
>> In case of simple WREN command you should see something like this:
>> : m25p80_cs: deselect
>> : decode_new_cmd: decoded new command:6
>> : m25p80_cs: select
>>
>> Can you check spi controller model code?
>
> I'll double check.
>
> But why is the SPI memory/device even responding if CS is not set ?
Looking at ssi code it should not.
Flash (so the m25p80) is responding when CS line is low and it seem that 
this is default.

Thanks,
Marcin

>
>>
>> Thanks,
>> Marcin
>>
>>
>
>
Jean-Christophe DUBOIS Jan. 2, 2017, 9:24 p.m. UTC | #9
Le 30/12/2016 à 19:09, mar.krzeminski a écrit :
>>> Can you check spi controller model code?
>>
>> I'll double check.
>>
>> But why is the SPI memory/device even responding if CS is not set ?
> Looking at ssi code it should not.
> Flash (so the m25p80) is responding when CS line is low and it seem 
> that this is default.

So I fixed the CS handling in the i.MX SPI device emulator and I don't 
experience crashes anymore. Thanks for your help.

You may still want to harden the m25p80 code to make sure it doesn't 
overrun its internal buffer.

>
> Thanks,
> Marcin
>
>>
>>>
>>> Thanks,
>>> Marcin
>>>
>>>
>>
>>
>
>
mar.krzeminski Jan. 3, 2017, 5:08 p.m. UTC | #10
W dniu 02.01.2017 o 22:24, Jean-Christophe DUBOIS pisze:
> Le 30/12/2016 à 19:09, mar.krzeminski a écrit :
>>>> Can you check spi controller model code?
>>>
>>> I'll double check.
>>>
>>> But why is the SPI memory/device even responding if CS is not set ?
>> Looking at ssi code it should not.
>> Flash (so the m25p80) is responding when CS line is low and it seem 
>> that this is default.
>
> So I fixed the CS handling in the i.MX SPI device emulator and I don't 
> experience crashes anymore. Thanks for your help.

Cool!
>
> You may still want to harden the m25p80 code to make sure it doesn't 
> overrun its internal buffer.

Yeap, for me it is a bug.
I understand you will not finish this patch, won't you ?:)

Thanks,
Marcin
>
>>
>> Thanks,
>> Marcin
>>
>>>
>>>>
>>>> Thanks,
>>>> Marcin
>>>>
>>>>
>>>
>>>
>>
>>
>
>
Jean-Christophe DUBOIS Jan. 3, 2017, 8:34 p.m. UTC | #11
Le 03/01/2017 à 18:08, mar.krzeminski a écrit :
>
>> You may still want to harden the m25p80 code to make sure it doesn't 
>> overrun its internal buffer.
>
> Yeap, for me it is a bug.
> I understand you will not finish this patch, won't you ?:)

I could give it a shot.

JC
diff mbox

Patch

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index d29ff4c..a1c4e5d 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -1117,7 +1117,7 @@  static uint32_t m25p80_transfer8(SSISlave *ss, uint32_t tx)
         s->data[s->len] = (uint8_t)tx;
         s->len++;
 
-        if (s->len == s->needed_bytes) {
+        if ((s->len >= s->needed_bytes) || (s->len >= sizeof(s->data))) {
             complete_collecting_data(s);
         }
         break;