[LEDE-DEV] Transmit timeouts with mtk_eth_soc and MT7621

Message ID CAJ0DADJogZ_3-+yUwdprv4kFVhgDk0ai7fK1uxQnu0q88QcrSw@mail.gmail.com
State New

Commit Message

michael lee July 23, 2017, 2:50 p.m.
Hi,

I wrote a patch. Could you help verify it?

Comments

Kristian Evensen July 23, 2017, 10:46 p.m. | #1
Hi!

On Sun, Jul 23, 2017 at 4:50 PM, Mingyu Li <igvtee@gmail.com> wrote:
> Hi.
> i write a path. could you help to verify it?

Thanks a lot, will do! I am leaving on holiday tomorrow, so testing
might take a little while, but I will get back to you as soon as
possible. Would you mind explaining a bit about what you think
triggers the issue and how the patch addresses it?

Kristian
michael lee July 24, 2017, 2:02 a.m. | #2
Hi,

I guess the problem is that some TX data is not freed even though the
TX interrupt has already been cleared, which causes the TX timeout.
The old code frees the data first and then clears the interrupt, but
new data may arrive after the data is freed and before the interrupt
is cleared. So I changed it to clear the interrupt first and then free
all TX data (also removing the budget limit). If new TX data arrives,
the hardware will set the TX interrupt flag and we will free it the
next time. I also applied the same change to the RX flow.
Kristian Evensen July 24, 2017, 9:19 a.m. | #3
Hi,

On Mon, Jul 24, 2017 at 4:02 AM, Mingyu Li <igvtee@gmail.com> wrote:
> i guest the problem is there are some tx data not free. but tx
> interrupt is clean. cause tx timeout. the old code will free data
> first then clean interrupt. but there maybe new data arrive after free
> data before clean interrupt.
> so change it to clean interrupt first then clean all tx data( also
> remove the budget limit). if new tx data arrive. hardware will set tx
> interrupt flag. then we will free it next time.
> i also apply this to rx flow.

Thanks for the detailed explanation. I have deployed an image with the
patch to some of the routers showing this issue, so let's wait and see.
Of course, all routers have been stable for the last couple of days
(including before the weekend) now, so I will let them run for a week
or so and then report back.

In order to ease testing and make it more controlled, do you have any
suggestions for how to trigger the error? Is it "just" a timing issue
or should I be able to trigger it with for example a specific traffic
pattern?

-Kristian
michael lee July 24, 2017, 3:45 p.m. | #4
I guess other interrupts may be causing the problem, since the
ethernet receive flow can be interrupted by other hardware. Using the
SD card, Wi-Fi, or USB can generate such interrupts.

michael lee Aug. 19, 2017, 3:06 p.m. | #5
Hi Kristian,

Does this patch work?

John Crispin Aug. 19, 2017, 6:16 p.m. | #6
Hi all,

I have a staged commit on my laptop that makes all the (upstream)
ethernet fixes that I pushed for mt7623 work on mt7621. Please hang on
for a few more days until I have finished testing the support. This
will add the latest upstream ethernet support + DSA.

     John


On 19/08/17 17:06, Mingyu Li wrote:
> Hi Kristian.
>
> does this patch works?
> _______________________________________________
> Lede-dev mailing list
> Lede-dev@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/lede-dev
John Crispin Aug. 19, 2017, 9:52 p.m. | #7
On 19/08/17 23:13, Kristian Evensen wrote:
> Hi both,
>
> On Sat, 19 Aug 2017 at 20:16, John Crispin <john@phrozen.org 
> <mailto:john@phrozen.org>> wrote:
>
>     Hi All,
>
>     i have a staged commit on my laptop that makes all the (upstream)
>     ethernet fixes that i pushed to mt7623 work on mt7621. please hang on
>     for a few more days till i finished testing the support. this will add
>     latest upstream ethernet support + DSA
>
>
> Thanks for the follow-up Mingyu and the info John. I have not had time 
> to investigate the issue further (holiday backlog ...), but will start 
> working on trying to reproduce it at the end of next week. I have 
> deployed the patch to some routers and have not seen any regressions, 
> but I would like to know how to reliably trigger the issue before 
> concluding :)
>
> John, do your commits include a fix similar to what Mingyu sent me?


With my fixes, the mt7623 passes a 48h stress test, running the unit
through an iperf test with 200 parallel flows at full wire speed. Once
backported to mt7621, I am pretty confident that the fix will yield
the maximum stable performance we can get.
      John

>
> Kristian
John Crispin Aug. 19, 2017, 10:30 p.m. | #8
On 20/08/17 00:07, Kristian Evensen wrote:
>
> On Sat, 19 Aug 2017 at 23:52, John Crispin <john@phrozen.org 
> <mailto:john@phrozen.org>> wrote:
>
>
>
>     On 19/08/17 23:13, Kristian Evensen wrote:
>     > Hi both,
>     >
>     > On Sat, 19 Aug 2017 at 20:16, John Crispin <john@phrozen.org
>     <mailto:john@phrozen.org>
>     > <mailto:john@phrozen.org <mailto:john@phrozen.org>>> wrote:
>     >
>     >     Hi All,
>     >
>     >     i have a staged commit on my laptop that makes all the
>     (upstream)
>     >     ethernet fixes that i pushed to mt7623 work on mt7621.
>     please hang on
>     >     for a few more days till i finished testing the support.
>     this will add
>     >     latest upstream ethernet support + DSA
>     >
>     >
>     > Thanks for the follow-up Mingyu and the info John. I have not
>     had time
>     > to investigate the issue further (holiday backlog ...), but will
>     start
>     > working on trying to reproduce it at the end of next week. I have
>     > deployed the patch to some routers and have not seen any
>     regressions,
>     > but I would like to know how to reliably trigger the issue before
>     > concluding :)
>     >
>     > John, does your commits include a fix similar to what Mingyu
>     sent me?
>
>
>     with my fixes the mt7623 passes a 48h stress test running the unit
>     on a
>     iperf test with 200 parallel flows at full wire speed. once backported
>     to mt7621 i am pretty confident that the fix will yield the maximum
>     stable performance we can get.
>
>
> Thanks! I will focus on finding a way to reproduce the issue then, and 
> then test Mingyu and your patches. Out of curiosity, when you say 
> maximum stable performance, does that mean that the hwnat will also be 
> backported?
>
> Kristian
>

Correct. In my testing I have been ... with 200 parallel flows ... on
MT7623; we'll have to find out what mt7621 can achieve ... this is all
using hwnat ...
1) TCP - at 50-byte frames I am able to pass 720 Mbit/s, which is > 1M FPS
2) UDP - at 128-byte frames I am able to pass ~450k FPS at ~10% packet
loss ... at near wire speed

In a nutshell: UDP has no transmission control, so the lower the frame
size, the higher the packet loss. The HW NAT will assert the FC (flow
control) bit inside the GMAC. When using TCP, this causes back
pressure that makes the OS stall the connection and reduces the
maximum throughput. In contrast, when using UDP you'll see packet loss
go up instead of throughput dropping, as there is no transmission
control.

I have also managed to make HW QoS work, and am still working on the
best way to integrate it with fw3. HW QoS does perform remarkably well
on mt7623: when saturating the link with lan->wan traffic, I am able
to ssh into the unit with only a slight subjective increase in latency.

    John


Florian Fainelli Aug. 19, 2017, 11:13 p.m. | #9
On 08/19/2017 03:30 PM, John Crispin wrote:
> [...]
> also i have managed to make HW QoS work, still working on the best way
> to integrate this with fw3. HW QoS doe perform remarkably well on
> mt7623. when saturating the link doing lan->wan traffic i am able to ssh
> into the unit and only have a slight subjective increase in latency.

Nice, do you think this could be interfaced with the TC offloads that
the newer kernels support? What kind of HW QoS can you configure on MT7623?
John Crispin Aug. 19, 2017, 11:24 p.m. | #10
On 20/08/17 01:13, Florian Fainelli wrote:
> [...]
> Nice, do you think this could be interfaced with the TC offloads that
> the newer kernels support? What kind of HW QoS can you configure on MT7623?
Hi Florian,

there are 2 tasks:

1) once Pablo finishes his flow table offloading, I will invest
private resources to get the HW NAT code for QCA8k and mt7530
upstreamed. I spoke with Pablo, and my current understanding is that
his patches will accommodate both silicons. Once his patches are
public we'll know more ...
2) figure out how to get HW QoS upstream ... on MediaTek I have fully
rev'ed the silicon, have already looked at TC integration, and believe
we can make it work with those new patches. Regarding QCA8k, I am not
yet fully sure this is possible due to how the silicon works ... it
might require some patching ... TBD

     John
Kristian Evensen Aug. 25, 2017, 2:25 p.m. | #11
Hi all,

On Sun, Aug 20, 2017 at 12:30 AM, John Crispin <john@phrozen.org> wrote:
> correct, in my testing i have been ... with 200 parallel flows ... on
> MT7623, we'll have to find out what mt7621 can achieve ... this is all using
> hwnat ...
> 1) tcp - at 50 byte frames i am able to pass 720 MBit which is > 1M FPS
> 2) udp - at 128 byte frames i am able to pass ~450k FPS at ~10% packet loss
> .. at near wirespeed

I have spent the last two days looking into this. My testing was based
on LEDE master as of yesterday morning and my initial test setup was
the following:

Server (Intel NUC) <-> Gbit Switch <-> ZBT 2926 <-> Client

The switch was tested and confirmed working at gigabit speeds. I used
iperf for my tests, with a payload of 100B and configured port
forwarding of UDP port 1203 from ZBT to client. I then ran the
following command on the NUC in a loop:

iperf -u -c 10.1.2.63 -t 3600 -d -p 1203 -l 100B -b 1000M

I left the test running over night (around 16 hours of pushing data),
but no error had been triggered as of this morning. Using bwm-ng, I
saw that the NUC was able to push around 40 Mbit/s, which, based on
earlier tests I have done where I have used the NUC as traffic
generator, seemed a bit low. I don't know if it is relevant, but when
capturing traffic (on both NUC and client) I saw pause packets quite
frequently.

Since this test did not yield any results, and throughput was low, I
looked at some of the setups where I have seen this error. In all
setups, there is always something placed in front of the 2926 (a
router, switch, ...). I therefore modified my test setup to be as
follows:

Server (Intel NUC) <-> Gbit Switch <-> ZBT 2926 #1 <-> ZBT 2926 #2 <-> Client

I forwarded port 1203 on the new ZBT router and repeated the
experiment. Using this setup, the NUC pushed about 260Mbit/s and I am
reliably able to trigger the error within ~1000 seconds. The error is
always seen on ZBT #1, and sometimes on ZBT #2. If I see the error on
#2 it is always at a later time than #1, so it seems that the two
routers somehow affect each other. When looking at the RX bandwidth on
the client (using bwm-ng), I see that it is very bursty. I receive
data at about 32Mbit/s, then no data for a while, then back to around
32 Mbit/s, and so on, until the error is triggered and the switch (TX)
on the router(s) dies. Pause frames are also seen on both server and
client in this experiment.

After having found a way to reliably trigger the issue, I tested the
patch provided by Mingyu. With this patch, the error is triggered much
faster, usually after around 300 seconds.

Mingyu, do you have any other ideas on what could be wrong or how to fix this?

John, would it be possible to get access to your staged commit, so
that I can repeat the test using your new code?

Thanks for all the help,
Kristian
michael lee Aug. 26, 2017, 5:43 a.m. | #12
Hi,

I checked the code again and found that xmit_more can cause the TX
timeout. You can refer to this:
https://www.mail-archive.com/netdev@vger.kernel.org/msg123334.html
So the patch should look like this (edit mtk_eth_soc.c):

        tx_num = fe_cal_txd_req(skb);
        if (unlikely(fe_empty_txd(ring) <= tx_num)) {
+                if (skb->xmit_more)
+                        fe_reg_w32(ring->tx_next_idx, FE_REG_TX_CTX_IDX0);
                netif_stop_queue(dev);
                netif_err(priv, tx_queued, dev,
                          "Tx Ring full when queue awake!\n");

But I am not sure; maybe the pause frames cause the problem.

Kristian Evensen Aug. 26, 2017, 10:38 a.m. | #13
Hi,

On Sat, Aug 26, 2017 at 7:43 AM, Mingyu Li <igvtee@gmail.com> wrote:
> Hi.
>
> i check the code again. found xmit_more can cause tx timeout. you can
> reference this.
> https://www.mail-archive.com/netdev@vger.kernel.org/msg123334.html
> so the patch should be like this. edit mtk_eth_soc.c
>
>         tx_num = fe_cal_txd_req(skb);
>         if (unlikely(fe_empty_txd(ring) <= tx_num)) {
> +                if (skb->xmit_more)
> +                        fe_reg_w32(ring->tx_next_idx, FE_REG_TX_CTX_IDX0);
>                 netif_stop_queue(dev);
>                 netif_err(priv, tx_queued, dev,
>                           "Tx Ring full when queue awake!\n");
>
> but i am not sure. maybe the pause frame cause the problem.

Thanks for the patch. I tested it, but I unfortunately still see the
error. I also added a print-statement inside the conditional and can
see that the condition is never hit. I also don't see the "Tx Ring
full"-message. One difference which I noticed now though, is that I
don't see the bursty bandwidth pattern I described earlier (32, 0, 32,
0, ...). With your patch, it is always 32, 0, crash.

-Kristian

Patch

--- a/mtk_eth_soc.c     2017-07-22 08:13:52.845251484 +0800
+++ b/mtk_eth_soc.c     2017-07-23 22:38:37.746471417 +0800
@@ -810,6 +810,8 @@ 
        u8 *data, *new_data;
        struct fe_rx_dma *rxd, trxd;
        int done = 0, pad;
+       u32 hwidx;
+       int cnt;

        if (netdev->features & NETIF_F_RXCSUM)
                checksum_bit = soc->checksum_bit;
@@ -821,6 +823,12 @@ 
        else
                pad = NET_IP_ALIGN;

+       /* if more rx packets are pending than the budget, don't clear the interrupt */
+       hwidx = fe_reg_r32(FE_REG_RX_DRX_IDX0);
+       cnt = ((hwidx - idx) & (ring->rx_ring_size - 1)) - 1;
+       if (cnt < budget)
+               fe_reg_w32(rx_intr, FE_REG_FE_INT_STATUS);
+
        while (done < budget) {
                unsigned int pktlen;
                dma_addr_t dma_addr;
@@ -890,14 +898,10 @@ 
                done++;
        }

-       if (done < budget)
-               fe_reg_w32(rx_intr, FE_REG_FE_INT_STATUS);
-
        return done;
 }

-static int fe_poll_tx(struct fe_priv *priv, int budget, u32 tx_intr,
-                     int *tx_again)
+static int fe_poll_tx(struct fe_priv *priv)
 {
        struct net_device *netdev = priv->netdev;
        struct device *dev = &netdev->dev;
@@ -911,7 +915,7 @@ 
        idx = ring->tx_free_idx;
        hwidx = fe_reg_r32(FE_REG_TX_DTX_IDX0);

-       while ((idx != hwidx) && budget) {
+       while (idx != hwidx) {
                tx_buf = &ring->tx_buf[idx];
                skb = tx_buf->skb;

@@ -921,24 +925,12 @@ 
                if (skb != (struct sk_buff *)DMA_DUMMY_DESC) {
                        bytes_compl += skb->len;
                        done++;
-                       budget--;
                }
                fe_txd_unmap(dev, tx_buf);
                idx = NEXT_TX_DESP_IDX(idx);
        }
        ring->tx_free_idx = idx;

-       if (idx == hwidx) {
-               /* read hw index again make sure no new tx packet */
-               hwidx = fe_reg_r32(FE_REG_TX_DTX_IDX0);
-               if (idx == hwidx)
-                       fe_reg_w32(tx_intr, FE_REG_FE_INT_STATUS);
-               else
-                       *tx_again = 1;
-       } else {
-               *tx_again = 1;
-       }
-
        if (done) {
                netdev_completed_queue(netdev, done, bytes_compl);
                smp_mb();
@@ -954,7 +946,7 @@ 
 {
        struct fe_priv *priv = container_of(napi, struct fe_priv, rx_napi);
        struct fe_hw_stats *hwstat = priv->hw_stats;
-       int tx_done, rx_done, tx_again;
+       int tx_done, rx_done;
        u32 status, fe_status, status_reg, mask;
        u32 tx_intr, rx_intr, status_intr;

@@ -965,7 +957,6 @@ 
        status_intr = priv->soc->status_int;
        tx_done = 0;
        rx_done = 0;
-       tx_again = 0;

        if (fe_reg_table[FE_REG_FE_INT_STATUS2]) {
                fe_status = fe_reg_r32(FE_REG_FE_INT_STATUS2);
@@ -974,8 +965,11 @@ 
                status_reg = FE_REG_FE_INT_STATUS;
        }

-       if (status & tx_intr)
-               tx_done = fe_poll_tx(priv, budget, tx_intr, &tx_again);
+       if (status & tx_intr) {
+               fe_reg_w32(tx_intr, FE_REG_FE_INT_STATUS);
+               tx_done = fe_poll_tx(priv);
+               status = fe_reg_r32(FE_REG_FE_INT_STATUS);
+       }

        if (status & rx_intr)
                rx_done = fe_poll_rx(napi, budget, priv, rx_intr);
@@ -995,7 +989,7 @@ 
                            tx_done, rx_done, status, mask);
        }

-       if (!tx_again && (rx_done < budget)) {
+       if (rx_done < budget) {
                status = fe_reg_r32(FE_REG_FE_INT_STATUS);
                if (status & (tx_intr | rx_intr)) {
                        /* let napi poll again */