diff mbox

[v1,1/1] xilinx_axidma.c: Fix up the stream_running() function

Message ID 14e62e727b01a64c188921b7ef2862f0f92734e4.1432711923.git.alistair.francis@xilinx.com
State New
Headers show

Commit Message

Alistair Francis May 27, 2015, 7:37 a.m. UTC
Previously the stream_running() function didn't check
if the DMA was halted. This caused hangs in recent versions
of MicroBlaze u-boot. Correct stream_running() to check
DMASR_HALTED as well as DMACR_RUNSTOP.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Sai Pavan Boddu <saipava@xilinx.com>
---
 hw/dma/xilinx_axidma.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

Comments

Alistair Francis June 3, 2015, 7 a.m. UTC | #1
On Wed, May 27, 2015 at 5:37 PM, Alistair Francis
<alistair.francis@xilinx.com> wrote:
> Previously the stream_running() function didn't check
> if the DMA was halted. This caused hangs in recent versions
> of MicroBlaze u-boot. Correct stream_running() to check
> DMASR_HALTED as well as DMACR_RUNSTOP.
>
> Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
> Reviewed-by: Sai Pavan Boddu <saipava@xilinx.com>

Ping!

Also add QEMU Trivial.

Thanks,

Alistair

> ---
>  hw/dma/xilinx_axidma.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
> index d06002d..27fba40 100644
> --- a/hw/dma/xilinx_axidma.c
> +++ b/hw/dma/xilinx_axidma.c
> @@ -154,7 +154,8 @@ static inline int stream_resetting(struct Stream *s)
>
>  static inline int stream_running(struct Stream *s)
>  {
> -    return s->regs[R_DMACR] & DMACR_RUNSTOP;
> +    return s->regs[R_DMACR] & DMACR_RUNSTOP &&
> +           !(s->regs[R_DMASR] & DMASR_HALTED);
>  }
>
>  static inline int stream_idle(struct Stream *s)
> --
> 1.7.1
>
Peter Crosthwaite June 3, 2015, 10:52 p.m. UTC | #2
On Wed, May 27, 2015 at 12:37 AM, Alistair Francis
<alistair.francis@xilinx.com> wrote:
> Previously the stream_running() function didn't check
> if the DMA was halted. This caused hangs in recent versions
> of MicroBlaze u-boot. Correct stream_running() to check
> DMASR_HALTED as well as DMACR_RUNSTOP.
>

So I'm stuggling with this one. Partly because I think HALTED might be
misimplemented in existing code. I did some digging, and AFAICS,
HALTED is conditional on !DAMCR_RUNSTOP. I think i might have got
210914e29975d17e635f9e8c1f7478c0ed7a208f wrong:

@@ -276,7 +276,7 @@ static void stream_process_mem2s(struct Stream *s,
         stream_desc_load(s, s->regs[R_CURDESC]);

         if (s->desc.status & SDESC_STATUS_COMPLETE) {
-            s->regs[R_DMASR] |= DMASR_IDLE;
+            s->regs[R_DMASR] |= DMASR_HALTED;
             break;
         }

Stepping back and ignoring the existing implementation of HALTED there
are 4 states of RS:H (RUNSTOP and HALTED):

!RS &&  H - this is the off state. doc refers to this as the "halted" state.
 RS && !H - This is the running state.
!RS && !H - This is the transient state. Software has cleared RS but
there s still something on AXI bus so cant assert halted yet.
 RS &&  H - This is an invalid state.

Current code reaches the invalid state on the ring buffer full case
due to the bug above. My thoery is
210914e29975d17e635f9e8c1f7478c0ed7a208f should have just been:

         if (s->desc.status & SDESC_STATUS_COMPLETE) {
-            s->regs[R_DMASR] |= DMASR_IDLE;
             break;
         }

Now I think there is yet another bug in that clearing RS doesn't seem
to be able to reliably set the HALTED bit (only in the unrelated case
of a ring buffer fill).

I'm starting to question whether the HALTED bit as far as QEMU is
concerned should just be a straight negation of RS. Depending on what
the conditions cause a transient and what doesn't, the transient as I
describe above may evaporate as we can get away with this simple
shortcut.

This would make this patch obsolete without fixing your bug :).

So running on the assumption that HALTED is misimplemented your patch
is doing something with that behaviour. The misimplemented HALTED is
currently holding the state of "we are blocked on a full buffer". If
you can point me which of the 3 call sites of stream_running was
giving you problems I might have more clues.

FYI you patch may still be correct but I wondering whether is has
uncovered a bug that should lead to a rework of this.

Regards,
Peter

> Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
> Reviewed-by: Sai Pavan Boddu <saipava@xilinx.com>
> ---
>  hw/dma/xilinx_axidma.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
> index d06002d..27fba40 100644
> --- a/hw/dma/xilinx_axidma.c
> +++ b/hw/dma/xilinx_axidma.c
> @@ -154,7 +154,8 @@ static inline int stream_resetting(struct Stream *s)
>
>  static inline int stream_running(struct Stream *s)
>  {
> -    return s->regs[R_DMACR] & DMACR_RUNSTOP;
> +    return s->regs[R_DMACR] & DMACR_RUNSTOP &&
> +           !(s->regs[R_DMASR] & DMASR_HALTED);
>  }
>
>  static inline int stream_idle(struct Stream *s)
> --
> 1.7.1
>
>
Michael Tokarev June 4, 2015, 10:20 a.m. UTC | #3
03.06.2015 10:00, Alistair Francis wrote:
> On Wed, May 27, 2015 at 5:37 PM, Alistair Francis
> <alistair.francis@xilinx.com> wrote:
>> Previously the stream_running() function didn't check
>> if the DMA was halted. This caused hangs in recent versions
>> of MicroBlaze u-boot. Correct stream_running() to check
>> DMASR_HALTED as well as DMACR_RUNSTOP.
>>
>> Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
>> Reviewed-by: Sai Pavan Boddu <saipava@xilinx.com>
> 
> Ping!
> 
> Also add QEMU Trivial.

As far as I can see, Peter Crosthwaite had some rather long
comment about this patch.  Maybe we should have some conclusion
before applying?

https://patchwork.ozlabs.org/patch/476977/

Thanks,

/mjt
Alistair Francis June 5, 2015, 2 a.m. UTC | #4
On Thu, Jun 4, 2015 at 8:52 AM, Peter Crosthwaite
<peter.crosthwaite@xilinx.com> wrote:
> On Wed, May 27, 2015 at 12:37 AM, Alistair Francis
> <alistair.francis@xilinx.com> wrote:
>> Previously the stream_running() function didn't check
>> if the DMA was halted. This caused hangs in recent versions
>> of MicroBlaze u-boot. Correct stream_running() to check
>> DMASR_HALTED as well as DMACR_RUNSTOP.
>>
>
> So I'm stuggling with this one. Partly because I think HALTED might be
> misimplemented in existing code. I did some digging, and AFAICS,
> HALTED is conditional on !DAMCR_RUNSTOP. I think i might have got
> 210914e29975d17e635f9e8c1f7478c0ed7a208f wrong:
>
> @@ -276,7 +276,7 @@ static void stream_process_mem2s(struct Stream *s,
>          stream_desc_load(s, s->regs[R_CURDESC]);
>
>          if (s->desc.status & SDESC_STATUS_COMPLETE) {
> -            s->regs[R_DMASR] |= DMASR_IDLE;
> +            s->regs[R_DMASR] |= DMASR_HALTED;
>              break;
>          }
>
> Stepping back and ignoring the existing implementation of HALTED there
> are 4 states of RS:H (RUNSTOP and HALTED):
>
> !RS &&  H - this is the off state. doc refers to this as the "halted" state.
>  RS && !H - This is the running state.
> !RS && !H - This is the transient state. Software has cleared RS but
> there s still something on AXI bus so cant assert halted yet.
>  RS &&  H - This is an invalid state.
>
> Current code reaches the invalid state on the ring buffer full case
> due to the bug above. My thoery is
> 210914e29975d17e635f9e8c1f7478c0ed7a208f should have just been:
>
>          if (s->desc.status & SDESC_STATUS_COMPLETE) {
> -            s->regs[R_DMASR] |= DMASR_IDLE;
>              break;
>          }
>
> Now I think there is yet another bug in that clearing RS doesn't seem
> to be able to reliably set the HALTED bit (only in the unrelated case
> of a ring buffer fill).
>
> I'm starting to question whether the HALTED bit as far as QEMU is
> concerned should just be a straight negation of RS. Depending on what
> the conditions cause a transient and what doesn't, the transient as I
> describe above may evaporate as we can get away with this simple
> shortcut.

Hey Peter,

Ok, so you are saying maybe we should only have 'Running' and 'Off'
states?

>
> This would make this patch obsolete without fixing your bug :).
>
> So running on the assumption that HALTED is misimplemented your patch
> is doing something with that behaviour. The misimplemented HALTED is
> currently holding the state of "we are blocked on a full buffer". If
> you can point me which of the 3 call sites of stream_running was
> giving you problems I might have more clues.

So the problem was basically because:
 - axienet_eth_rx_notify() calls the DMA push functions

 - xilinx_axidma_data_stream_can_push() returned true
    - Therefore xilinx_axidma_data_stream_push() could be called

 - This meant that stream_process_s2mem() was called
    - But it would break straight away as
      's->desc.status & SDESC_STATUS_COMPLETE' would return true.
    - This meant it would cause an infinite loop as the stream could be
      pushed, but nothing was ever pushed. As ret was always zero
      the code never broke out of the second while loop in
axienet_eth_rx_notify()

Thanks,

Alistair

>
> FYI you patch may still be correct but I wondering whether is has
> uncovered a bug that should lead to a rework of this.
>
> Regards,
> Peter
>
>> Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
>> Reviewed-by: Sai Pavan Boddu <saipava@xilinx.com>
>> ---
>>  hw/dma/xilinx_axidma.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
>> index d06002d..27fba40 100644
>> --- a/hw/dma/xilinx_axidma.c
>> +++ b/hw/dma/xilinx_axidma.c
>> @@ -154,7 +154,8 @@ static inline int stream_resetting(struct Stream *s)
>>
>>  static inline int stream_running(struct Stream *s)
>>  {
>> -    return s->regs[R_DMACR] & DMACR_RUNSTOP;
>> +    return s->regs[R_DMACR] & DMACR_RUNSTOP &&
>> +           !(s->regs[R_DMASR] & DMASR_HALTED);
>>  }
>>
>>  static inline int stream_idle(struct Stream *s)
>> --
>> 1.7.1
>>
>>
>
Peter Crosthwaite June 5, 2015, 2:58 a.m. UTC | #5
On Thu, Jun 4, 2015 at 7:00 PM, Alistair Francis
<alistair.francis@xilinx.com> wrote:
> On Thu, Jun 4, 2015 at 8:52 AM, Peter Crosthwaite
> <peter.crosthwaite@xilinx.com> wrote:
>> On Wed, May 27, 2015 at 12:37 AM, Alistair Francis
>> <alistair.francis@xilinx.com> wrote:
>>> Previously the stream_running() function didn't check
>>> if the DMA was halted. This caused hangs in recent versions
>>> of MicroBlaze u-boot. Correct stream_running() to check
>>> DMASR_HALTED as well as DMACR_RUNSTOP.
>>>
>>
>> So I'm stuggling with this one. Partly because I think HALTED might be
>> misimplemented in existing code. I did some digging, and AFAICS,
>> HALTED is conditional on !DAMCR_RUNSTOP. I think i might have got
>> 210914e29975d17e635f9e8c1f7478c0ed7a208f wrong:
>>
>> @@ -276,7 +276,7 @@ static void stream_process_mem2s(struct Stream *s,
>>          stream_desc_load(s, s->regs[R_CURDESC]);
>>
>>          if (s->desc.status & SDESC_STATUS_COMPLETE) {
>> -            s->regs[R_DMASR] |= DMASR_IDLE;
>> +            s->regs[R_DMASR] |= DMASR_HALTED;
>>              break;
>>          }
>>
>> Stepping back and ignoring the existing implementation of HALTED there
>> are 4 states of RS:H (RUNSTOP and HALTED):
>>
>> !RS &&  H - this is the off state. doc refers to this as the "halted" state.
>>  RS && !H - This is the running state.
>> !RS && !H - This is the transient state. Software has cleared RS but
>> there s still something on AXI bus so cant assert halted yet.
>>  RS &&  H - This is an invalid state.
>>
>> Current code reaches the invalid state on the ring buffer full case
>> due to the bug above. My thoery is
>> 210914e29975d17e635f9e8c1f7478c0ed7a208f should have just been:
>>
>>          if (s->desc.status & SDESC_STATUS_COMPLETE) {
>> -            s->regs[R_DMASR] |= DMASR_IDLE;
>>              break;
>>          }
>>
>> Now I think there is yet another bug in that clearing RS doesn't seem
>> to be able to reliably set the HALTED bit (only in the unrelated case
>> of a ring buffer fill).
>>
>> I'm starting to question whether the HALTED bit as far as QEMU is
>> concerned should just be a straight negation of RS. Depending on what
>> the conditions cause a transient and what doesn't, the transient as I
>> describe above may evaporate as we can get away with this simple
>> shortcut.
>
> Hey Peter,
>
> Ok, so you are saying maybe we should only have 'Running' and 'Off'
> states?
>
>>
>> This would make this patch obsolete without fixing your bug :).
>>
>> So running on the assumption that HALTED is misimplemented your patch
>> is doing something with that behaviour. The misimplemented HALTED is
>> currently holding the state of "we are blocked on a full buffer". If
>> you can point me which of the 3 call sites of stream_running was
>> giving you problems I might have more clues.
>
> So the problem was basically because:
>  - axienet_eth_rx_notify() calls the DMA push functions
>
>  - xilinx_axidma_data_stream_can_push() returned true
>     - Therefore xilinx_axidma_data_stream_push() could be called
>
>  - This meant that stream_process_s2mem() was called
>     - But it would break straight away as
>       's->desc.status & SDESC_STATUS_COMPLETE' would return true.
>     - This meant it would cause an infinite loop as the stream could be
>       pushed, but nothing was ever pushed. As ret was always zero
>       the code never broke out of the second while loop in
> axienet_eth_rx_notify()
>

Ok I think I know what is up. It happens by fluke, that the bad
implementation of halted is exactly the state you need to correct this
issue. You could capture it as a new boolean in the state struct and
just check it in can_push. I'm looking at ways to refactor it to avoid
the extra calls to _push though. It might be possible to refactor such
that all of stream_running, stream_idle and the SDESC_STATUS_COMPLETE
happen in can_push. This means can_push has to do the
stream_desc_load. But that is ok, as the current solution will already
have spurious fetches of the descriptor. Then make can_push itself the
iteration condition for _mem2s's loop which will handle descriptor
fetches for you in the process.

Regards,
Peter

> Thanks,
>
> Alistair
>
>>
>> FYI you patch may still be correct but I wondering whether is has
>> uncovered a bug that should lead to a rework of this.
>>
>> Regards,
>> Peter
>>
>>> Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
>>> Reviewed-by: Sai Pavan Boddu <saipava@xilinx.com>
>>> ---
>>>  hw/dma/xilinx_axidma.c |    3 ++-
>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
>>> index d06002d..27fba40 100644
>>> --- a/hw/dma/xilinx_axidma.c
>>> +++ b/hw/dma/xilinx_axidma.c
>>> @@ -154,7 +154,8 @@ static inline int stream_resetting(struct Stream *s)
>>>
>>>  static inline int stream_running(struct Stream *s)
>>>  {
>>> -    return s->regs[R_DMACR] & DMACR_RUNSTOP;
>>> +    return s->regs[R_DMACR] & DMACR_RUNSTOP &&
>>> +           !(s->regs[R_DMASR] & DMASR_HALTED);
>>>  }
>>>
>>>  static inline int stream_idle(struct Stream *s)
>>> --
>>> 1.7.1
>>>
>>>
>>
>
Alistair Francis June 19, 2015, 12:04 a.m. UTC | #6
On Fri, Jun 5, 2015 at 12:58 PM, Peter Crosthwaite
<peter.crosthwaite@xilinx.com> wrote:
> On Thu, Jun 4, 2015 at 7:00 PM, Alistair Francis
> <alistair.francis@xilinx.com> wrote:
>> On Thu, Jun 4, 2015 at 8:52 AM, Peter Crosthwaite
>> <peter.crosthwaite@xilinx.com> wrote:
>>> On Wed, May 27, 2015 at 12:37 AM, Alistair Francis
>>> <alistair.francis@xilinx.com> wrote:
>>>> Previously the stream_running() function didn't check
>>>> if the DMA was halted. This caused hangs in recent versions
>>>> of MicroBlaze u-boot. Correct stream_running() to check
>>>> DMASR_HALTED as well as DMACR_RUNSTOP.
>>>>
>>>
>>> So I'm stuggling with this one. Partly because I think HALTED might be
>>> misimplemented in existing code. I did some digging, and AFAICS,
>>> HALTED is conditional on !DAMCR_RUNSTOP. I think i might have got
>>> 210914e29975d17e635f9e8c1f7478c0ed7a208f wrong:
>>>
>>> @@ -276,7 +276,7 @@ static void stream_process_mem2s(struct Stream *s,
>>>          stream_desc_load(s, s->regs[R_CURDESC]);
>>>
>>>          if (s->desc.status & SDESC_STATUS_COMPLETE) {
>>> -            s->regs[R_DMASR] |= DMASR_IDLE;
>>> +            s->regs[R_DMASR] |= DMASR_HALTED;
>>>              break;
>>>          }
>>>
>>> Stepping back and ignoring the existing implementation of HALTED there
>>> are 4 states of RS:H (RUNSTOP and HALTED):
>>>
>>> !RS &&  H - this is the off state. doc refers to this as the "halted" state.
>>>  RS && !H - This is the running state.
>>> !RS && !H - This is the transient state. Software has cleared RS but
>>> there s still something on AXI bus so cant assert halted yet.
>>>  RS &&  H - This is an invalid state.
>>>
>>> Current code reaches the invalid state on the ring buffer full case
>>> due to the bug above. My thoery is
>>> 210914e29975d17e635f9e8c1f7478c0ed7a208f should have just been:
>>>
>>>          if (s->desc.status & SDESC_STATUS_COMPLETE) {
>>> -            s->regs[R_DMASR] |= DMASR_IDLE;
>>>              break;
>>>          }
>>>
>>> Now I think there is yet another bug in that clearing RS doesn't seem
>>> to be able to reliably set the HALTED bit (only in the unrelated case
>>> of a ring buffer fill).
>>>
>>> I'm starting to question whether the HALTED bit as far as QEMU is
>>> concerned should just be a straight negation of RS. Depending on what
>>> the conditions cause a transient and what doesn't, the transient as I
>>> describe above may evaporate as we can get away with this simple
>>> shortcut.
>>
>> Hey Peter,
>>
>> Ok, so you are saying maybe we should only have 'Running' and 'Off'
>> states?
>>
>>>
>>> This would make this patch obsolete without fixing your bug :).
>>>
>>> So running on the assumption that HALTED is misimplemented your patch
>>> is doing something with that behaviour. The misimplemented HALTED is
>>> currently holding the state of "we are blocked on a full buffer". If
>>> you can point me which of the 3 call sites of stream_running was
>>> giving you problems I might have more clues.
>>
>> So the problem was basically because:
>>  - axienet_eth_rx_notify() calls the DMA push functions
>>
>>  - xilinx_axidma_data_stream_can_push() returned true
>>     - Therefore xilinx_axidma_data_stream_push() could be called
>>
>>  - This meant that stream_process_s2mem() was called
>>     - But it would break straight away as
>>       's->desc.status & SDESC_STATUS_COMPLETE' would return true.
>>     - This meant it would cause an infinite loop as the stream could be
>>       pushed, but nothing was ever pushed. As ret was always zero
>>       the code never broke out of the second while loop in
>> axienet_eth_rx_notify()
>>
>
> Ok I think I know what is up. It happens by fluke, that the bad
> implementation of halted is exactly the state you need to correct this
> issue. You could capture it as a new boolean in the state struct and
> just check it in can_push. I'm looking at ways to refactor it to avoid

Do you mean capture the running/!running state in the state struct?

> the extra calls to _push though. It might be possible to refactor such
> that all of stream_running, stream_idle and the SDESC_STATUS_COMPLETE
> happen in can_push. This means can_push has to do the
> stream_desc_load. But that is ok, as the current solution will already
> have spurious fetches of the descriptor. Then make can_push itself the
> iteration condition for _mem2s's loop which will handle descriptor
> fetches for you in the process.

Is it really that bad to be calling _push?

Thanks,

Alistair

>
> Regards,
> Peter
>
>> Thanks,
>>
>> Alistair
>>
>>>
>>> FYI you patch may still be correct but I wondering whether is has
>>> uncovered a bug that should lead to a rework of this.
>>>
>>> Regards,
>>> Peter
>>>
>>>> Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
>>>> Reviewed-by: Sai Pavan Boddu <saipava@xilinx.com>
>>>> ---
>>>>  hw/dma/xilinx_axidma.c |    3 ++-
>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
>>>> index d06002d..27fba40 100644
>>>> --- a/hw/dma/xilinx_axidma.c
>>>> +++ b/hw/dma/xilinx_axidma.c
>>>> @@ -154,7 +154,8 @@ static inline int stream_resetting(struct Stream *s)
>>>>
>>>>  static inline int stream_running(struct Stream *s)
>>>>  {
>>>> -    return s->regs[R_DMACR] & DMACR_RUNSTOP;
>>>> +    return s->regs[R_DMACR] & DMACR_RUNSTOP &&
>>>> +           !(s->regs[R_DMASR] & DMASR_HALTED);
>>>>  }
>>>>
>>>>  static inline int stream_idle(struct Stream *s)
>>>> --
>>>> 1.7.1
>>>>
>>>>
>>>
>>
>
diff mbox

Patch

diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index d06002d..27fba40 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -154,7 +154,8 @@  static inline int stream_resetting(struct Stream *s)
 
 static inline int stream_running(struct Stream *s)
 {
-    return s->regs[R_DMACR] & DMACR_RUNSTOP;
+    return s->regs[R_DMACR] & DMACR_RUNSTOP &&
+           !(s->regs[R_DMASR] & DMASR_HALTED);
 }
 
 static inline int stream_idle(struct Stream *s)