diff mbox

[net,2/3] net: cdc_mbim: send ZLP after max sized NTBs

Message ID 1358783440-11459-3-git-send-email-bjorn@mork.no
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Bjørn Mork Jan. 21, 2013, 3:50 p.m. UTC
We normally avoid sending ZLPs by padding NTBs with a zero byte
if the NTB is shorter than dwNtbOutMaxSize, resulting in a short
USB packet instead of a ZLP.  But in the case where the NTB length
is exactly dwNtbOutMaxSize and this is an exact multiplum of
wMaxPacketSize, then we must send a ZLP.

This fixes an issue seen on a Sierra Wireless MC7710 device
where the transmission would fail whenever we ended up padding
the NTBs to max size.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
 drivers/net/usb/cdc_mbim.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Alexey ORISHKO Jan. 21, 2013, 4:31 p.m. UTC | #1
> -----Original Message-----

> From: Bjørn Mork [mailto:bjorn@mork.no]

> 

> We normally avoid sending ZLPs by padding NTBs with a zero byte if the

> NTB is shorter than dwNtbOutMaxSize, resulting in a short USB packet

> instead of a ZLP.  But in the case where the NTB length is exactly

> dwNtbOutMaxSize and this is an exact multiplum of wMaxPacketSize, then

> we must send a ZLP.

> 

> This fixes an issue seen on a Sierra Wireless MC7710 device where the

> transmission would fail whenever we ended up padding the NTBs to max

> size.


If you add ZLP for NTBs of dwNtbOutMaxSize, you are heavily affecting CPU
load, increasing interrupt load by factor of 2 in high load traffic
scenario and possibly decreasing throughput for all other devices
which behaves correctly. 

Why should others pay for Sierra Wireless fault?

At least, do check VID/PID in usbnet and send ZLP only for malfunctioning 
devices.

Regards,
Alexey
Yauheni Kaliuta Jan. 21, 2013, 4:56 p.m. UTC | #2
Hi, Bjørn!

>>>>> "BM" == Bjørn Mork writes:

 > We normally avoid sending ZLPs by padding NTBs with a zero byte
 > if the NTB is shorter than dwNtbOutMaxSize, resulting in a short
 > USB packet instead of a ZLP.  But in the case where the NTB length
 > is exactly dwNtbOutMaxSize and this is an exact multiplum of
 > wMaxPacketSize, then we must send a ZLP.

The idea of NCM was to avoid extra ZLPs. If your transfer is exactly
dwNtbOutMaxSize, it's known, you can submit such request on the receiver
side and you do not need any EOT indicatation, so the frametime can be
used for useful data.

I didn't check MBIM specs, but I guess, it wasn't changed. But better get
Alexey's answer for sure.

 > This fixes an issue seen on a Sierra Wireless MC7710 device
 > where the transmission would fail whenever we ended up padding
 > the NTBs to max size.

Is it buggy?

 > Signed-off-by: Bjørn Mork <bjorn@mork.no>
 > ---
 >  drivers/net/usb/cdc_mbim.c |    2 +-
 >  1 file changed, 1 insertion(+), 1 deletion(-)

 > diff --git a/drivers/net/usb/cdc_mbim.c b/drivers/net/usb/cdc_mbim.c
 > index 42f51c7..3a5673a 100644
 > --- a/drivers/net/usb/cdc_mbim.c
 > +++ b/drivers/net/usb/cdc_mbim.c
 > @@ -366,7 +366,7 @@ err:
 >  
 >  static const struct driver_info cdc_mbim_info = {
 >          .description = "CDC MBIM",
 > -        .flags = FLAG_NO_SETINT | FLAG_MULTI_PACKET | FLAG_WWAN,
 > +        .flags = FLAG_NO_SETINT | FLAG_MULTI_PACKET | FLAG_WWAN | FLAG_SEND_ZLP,
 >          .bind = cdc_mbim_bind,
 >          .unbind = cdc_mbim_unbind,
 >          .manage_power = cdc_mbim_manage_power,
 > -- 
 > 1.7.10.4
Bjørn Mork Jan. 21, 2013, 10:01 p.m. UTC | #3
Yauheni Kaliuta <y.kaliuta@gmail.com> writes:
>>>>>> "BM" == Bjørn Mork writes:
>
>  > We normally avoid sending ZLPs by padding NTBs with a zero byte
>  > if the NTB is shorter than dwNtbOutMaxSize, resulting in a short
>  > USB packet instead of a ZLP.  But in the case where the NTB length
>  > is exactly dwNtbOutMaxSize and this is an exact multiplum of
>  > wMaxPacketSize, then we must send a ZLP.
>
> The idea of NCM was to avoid extra ZLPs. If your transfer is exactly
> dwNtbOutMaxSize, it's known, you can submit such request on the receiver
> side and you do not need any EOT indicatation, so the frametime can be
> used for useful data.

Yes, that makes sense.  And I understand that both you and Alexey are of
this opinion.

But this idea is by no means made clear (to me) in the spec.  I do not
think the current wording is precise enough to expect every implementor
to understand any such intent.  The only place I find either "short
packet" or ZLP mentioned in the NCM spec is in tables 3-1 and 3-2,
describing the 16bit and 32bit NTH formats, in the description of the
(d)wBlockLength fields.  And even there it is only mentioned in the
context of the special (d)wBlockLength = 0x0000 handling.

If the intent was to prevent ZLPs, then it would have been wise to write
that in the NCM standard. As it stands you have to use a lot of
imagination to read that intent into the current spec.

> I didn't check MBIM specs, but I guess, it wasn't changed. But better get
> Alexey's answer for sure.

No, there are no changes to this area in the MBIM spec AFAIK, so
whatever is correct for NCM is also correct for MBIM.  The question is
what is correct for NCM.

>  > This fixes an issue seen on a Sierra Wireless MC7710 device
>  > where the transmission would fail whenever we ended up padding
>  > the NTBs to max size.
>
> Is it buggy?

That is not unlikely. However, if so then it is buggy in a way we just
have to deal with because Windows does...

But before claiming this to be a bug I would like to understand how we
have come to this conclusion that NCM and MBIM devices should not need
ZLPs.  I agree that it would have made sense to have such a requirement,
but I cannot find it in the spec.

I would also like to know how Windows works around this issue, because I
am pretty confident that this device works with Windows 8.  If it didn't
then the firmware wouldn't have been flagged for release, and it is.  It
is even distributed by 3rd party integrators (aka laptop vendors).



Bjørn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjørn Mork Jan. 21, 2013, 10:36 p.m. UTC | #4
Alexey ORISHKO <alexey.orishko@stericsson.com> writes:

>> -----Original Message-----
>> From: Bjørn Mork [mailto:bjorn@mork.no]
>> 
>> We normally avoid sending ZLPs by padding NTBs with a zero byte if the
>> NTB is shorter than dwNtbOutMaxSize, resulting in a short USB packet
>> instead of a ZLP.  But in the case where the NTB length is exactly
>> dwNtbOutMaxSize and this is an exact multiplum of wMaxPacketSize, then
>> we must send a ZLP.
>> 
>> This fixes an issue seen on a Sierra Wireless MC7710 device where the
>> transmission would fail whenever we ended up padding the NTBs to max
>> size.
>
> If you add ZLP for NTBs of dwNtbOutMaxSize, you are heavily affecting CPU
> load, increasing interrupt load by factor of 2 in high load traffic
> scenario and possibly decreasing throughput for all other devices
> which behaves correctly. 
>
> Why should others pay for Sierra Wireless fault?

No, they shouldn't.

> At least, do check VID/PID in usbnet and send ZLP only for malfunctioning 
> devices.

Yes, this would be easy to implement by simply creating a vendor/device
specific entry in cdc_mbim.

But I do worry that this isn't Sierra Wireless fault, but Qualcomms
fault.  SW are using Qualcomm firmware to do the low level stuff, and I
know the Qualcomm firmware implement MBIM functionality so it may very
well be the one to blame (I don't know this for sure, and I have no idea
how to verify it). If this is correct than we can expect hundreds of
devices needing a device specific entry in a year or two. That's not
what I was looking forward to with a class driver like cdc_mbim.

And I still believe that there is reason to ask if this is really a
fault at all.


I don't know if this is particularily useful, but here is an usbmon
extract from the driver without the extra ZLP fix:

ffff88022aef3d40 3883111197 S Bo:8:005:1 -115 4096 = 4e434d48 0c000600 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff880213f2d2c0 3883111233 S Bi:8:005:2 -115 16384 <
ffff8802274eab40 3883111255 S Bi:8:005:2 -115 16384 <
ffff88022fd65980 3883111276 S Bi:8:005:2 -115 16384 <
ffff880204351440 3883111297 S Bi:8:005:2 -115 16384 <
ffff880213f2d8c0 3883111317 S Bi:8:005:2 -115 16384 <
ffff88022aef3d40 3883111460 C Bo:8:005:1 0 4096 >
ffff880204351e00 3883965516 S Bo:8:005:1 -115 4096 = 4e434d48 0c000700 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff880204351e00 3883965849 C Bo:8:005:1 0 4096 >
ffff880204351e00 3884973428 S Bo:8:005:1 -115 4096 = 4e434d48 0c000800 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff880204351e00 3884973752 C Bo:8:005:1 0 4096 >
ffff880204351d40 3885981392 S Bo:8:005:1 -115 4096 = 4e434d48 0c000900 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff880204351d40 3885981638 C Bo:8:005:1 0 4096 >
ffff880231b128c0 3886989391 S Bo:8:005:1 -115 4096 = 4e434d48 0c000a00 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff880231b128c0 3886989622 C Bo:8:005:1 0 4096 >
ffff8802041d8a40 3887997401 S Bo:8:005:1 -115 4096 = 4e434d48 0c000b00 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff8802041d8a40 3887997675 C Bo:8:005:1 0 4096 >
ffff8802041d8a40 3889005720 S Bo:8:005:1 -115 4096 = 4e434d48 0c000c00 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff8802041d8a40 3889006019 C Bo:8:005:1 0 4096 >
ffff8802041d8c80 3890013428 S Bo:8:005:1 -115 4096 = 4e434d48 0c000d00 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff8802041d8c80 3890013784 C Bo:8:005:1 0 4096 >
ffff880213f2d2c0 3892433962 C Bi:8:005:2 -104 0
ffff8802274eab40 3892434032 C Bi:8:005:2 -104 0
ffff88022fd65980 3892434078 C Bi:8:005:2 -104 0
ffff880204351440 3892434126 C Bi:8:005:2 -104 0
ffff880213f2d8c0 3892434172 C Bi:8:005:2 -104 0

same driver, with smaller packets keeping the NTB size below the magic
limit:

ffff88022b1de200 3896794916 S Bi:8:005:2 -115 16384 <
ffff88022767d780 3896794928 C Bo:8:005:1 0 262 >
ffff88022b1de140 3896794948 S Bi:8:005:2 -115 16384 <
ffff88022767d780 3896794971 S Bi:8:005:2 -115 16384 <
ffff88022b1deb00 3896794995 S Bi:8:005:2 -115 16384 <
ffff88022767d900 3896864077 C Bi:8:005:2 0 416 = 4e434d48 0c003500 a0010c00 49505300 10000000 1c007a01 00000000 4500017a
ffff88022767d900 3896864129 S Bi:8:005:2 -115 16384 <
ffff88022d639e40 3897644729 S Bo:8:005:1 -115 262 = 4e434d48 0c000f00 06010c00 49505300 10000000 b8004e00 00000000 00000000
ffff88022d639e40 3897644965 C Bo:8:005:1 0 262 >
ffff88022b1de380 3897711770 C Bi:8:005:2 0 112 = 4e434d48 0c003600 70000c00 49505300 10000000 1c004e00 00000000 4500004e
ffff88022b1de380 3897711812 S Bi:8:005:2 -115 16384 <
ffff88022b07c600 3898646616 S Bo:8:005:1 -115 262 = 4e434d48 0c001000 06010c00 49505300 10000000 b8004e00 00000000 00000000
ffff88022b07c600 3898646848 C Bo:8:005:1 0 262 >
ffff88022b1de200 3898697788 C Bi:8:005:2 0 112 = 4e434d48 0c003700 70000c00 49505300 10000000 1c004e00 00000000 4500004e
ffff88022b1de200 3898697829 S Bi:8:005:2 -115 16384 <
ffff88022b1de140 3901425927 C Bi:8:005:2 -104 0


driver with ZLP:

ffff8802041d82c0 3651255294 C Bo:8:005:1 0 4096 >
ffff88021a3ef380 3652103000 S Bo:8:005:1 -115 4096 = 4e434d48 0c003200 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff88021a3ef380 3652103315 C Bo:8:005:1 0 4096 >
ffff8802041af540 3652439129 C Bi:8:005:2 0 416 = 4e434d48 0c002c00 a0010c00 49505300 10000000 1c007a01 00000000 4500017a
ffff8802041af540 3652439183 S Bi:8:005:2 -115 16384 <
ffff8802041af240 3653099148 C Bi:8:005:2 0 416 = 4e434d48 0c002d00 a0010c00 49505300 10000000 1c007a01 00000000 4500017a
ffff8802041af240 3653099183 S Bi:8:005:2 -115 16384 <
ffff88021a3efb00 3653101946 S Bo:8:005:1 -115 4096 = 4e434d48 0c003300 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff88021a3efb00 3653102201 C Bo:8:005:1 0 4096 >
ffff8802041af9c0 3654039362 C Bi:8:005:2 0 416 = 4e434d48 0c002e00 a0010c00 49505300 10000000 1c007a01 00000000 4500017a
ffff8802041af9c0 3654039414 S Bi:8:005:2 -115 16384 <
ffff88021a3ef980 3654101353 S Bo:8:005:1 -115 4096 = 4e434d48 0c003400 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff88021a3ef980 3654101622 C Bo:8:005:1 0 4096 >
ffff88021a3efc80 3654999361 C Bi:8:005:2 0 416 = 4e434d48 0c002f00 a0010c00 49505300 10000000 1c007a01 00000000 4500017a
ffff88021a3efc80 3654999412 S Bi:8:005:2 -115 16384 <
ffff88021a3ef200 3655102396 S Bo:8:005:1 -115 4096 = 4e434d48 0c003500 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff88021a3ef200 3655102737 C Bo:8:005:1 0 4096 >
ffff88021a3efe00 3655949445 C Bi:8:005:2 0 416 = 4e434d48 0c003000 a0010c00 49505300 10000000 1c007a01 00000000 4500017a
ffff88021a3efe00 3655949491 S Bi:8:005:2 -115 16384 <
ffff88021a3ef500 3656103498 S Bo:8:005:1 -115 4096 = 4e434d48 0c003600 00100c00 49505300 10000000 b8007a01 00000000 00000000
ffff88021a3ef500 3656103779 C Bo:8:005:1 0 4096 >
ffff8802041af540 3657229210 C Bi:8:005:2 0 416 = 4e434d48 0c003100 a0010c00 49505300 10000000 1c007a01 00000000 4500017a
ffff8802041af540 3657229251 S Bi:8:005:2 -115 16384 <
ffff8802041af240 3659438084 C Bi:8:005:2 -104 0
ffff8802041af9c0 3659438159 C Bi:8:005:2 -104 0
ffff88021a3efc80 3659438206 C Bi:8:005:2 -104 0



Bjørn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjørn Mork Jan. 22, 2013, 9:54 a.m. UTC | #5
Alexey ORISHKO <alexey.orishko@stericsson.com> writes:

> If you add ZLP for NTBs of dwNtbOutMaxSize, you are heavily affecting CPU
> load, increasing interrupt load by factor of 2 in high load traffic
> scenario and possibly decreasing throughput for all other devices
> which behaves correctly. 

Hello Alexey,

Still trying to understand the mechanisms involved here. I must
apologize for my lack of knowledge of the hardware restrictions
involved.

The current cdc_ncm/cdc_mbim drivers will pad a NTB to the full
dwNtbOutMaxSize whenever it reaches at least 512 bytes.  The reason is
that this allows more efficient device DMA operation.  This is something
we do to adapt to device hardware restrictions even though there is no
such recommendations in the NCM/MBIM specs.  The penalty on the host and
bus should be obvious: Even with a quite small dwNtbOutMaxSize of 4096,
we end up sending 8 x 512-byte data packets instead of the 2 we could
have managed with.

Now you claim that sending 9 packets, where the last one is a zero
length packet, increaes the interrupt load by factor of 2?  How is that?

This is where my lack of hardware knowledge shows up, but I assumed
based on the device DMA argument that the device USB engine would do the
actual packet reception and assembly and DMA the complete 4096-byte data
transfer to the device memory. Won't the device USB engine also handle
the ZLP?  Won't it just collect 9 packets instead of 8 and assemble them
to the 4096-byte data transfer which is DMAed away the same way it would
if there were just 8 packets in the transfer?  The only differences
being that the ZLP variant works even if the device stack for some
reason requests more than dwNtbOutMaxSize bytes (which of course could
be claimed to be stupid, but the spec allows it), and of course that we
waste even more USB bus bandwidth.  But if USB bandwidth is a factor we
consider here then I am going to question the padding to dwNtbOutMaxSize
based on some devices having DMA restrictions making that more efficient.

Why is the device DMA restrictions not a fault, while the device ZLP
requirement is?  Both seem like reasonable device hardware/firmware
implementation imposed restrictions to me.  Something we'll just have to
accept.

Note that the problem with the ZLP on the Sierra devices is made much
worse than necessary by the padding.  Without this we would rarely have
hit the dwNtbOutMaxSize limit, and would have been able to work around
it by simply using tx_max = dwNtbOutMaxSize - 1 if 
(dwNtbOutMaxSize % wMaxPacketSize == 0).

Maybe that is what Windows does?

But we cannot do that due to other devices having issues with transfers
less than dwNtbOutMaxSize..


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey ORISHKO Jan. 22, 2013, 3:51 p.m. UTC | #6
H Bjørn,

> -----Original Message-----

> From: Bjørn Mork [mailto:bjorn@mork.no]

> Sent: Tuesday, January 22, 2013 10:54 AM

> 

> > If you add ZLP for NTBs of dwNtbOutMaxSize, you are heavily affecting

> > CPU load, increasing interrupt load by factor of 2 in high load

> > traffic scenario and possibly decreasing throughput for all other

> > devices which behaves correctly.

>  

> The current cdc_ncm/cdc_mbim drivers will pad a NTB to the full

> dwNtbOutMaxSize whenever it reaches at least 512 bytes.  The reason is

> that this allows more efficient device DMA operation.  This is

> something we do to adapt to device hardware restrictions even though

> there is no such recommendations in the NCM/MBIM specs.  


There are a lot of things which were discussed during development of
specifications, but not ended up in the final version of the spec due to
various reasons. Companies attending USB-IF F2F meetings can benefit
from discussions between member companies and get access to additional
information not visible outside of USB-IF. 

> The penalty on

> the host and bus should be obvious: Even with a quite small

> dwNtbOutMaxSize of 4096, we end up sending 8 x 512-byte data packets

> instead of the 2 we could have managed with.


It was intentional and it's developer choice (see also the last comment).
It's a straightforward approach, but this limit could be a dynamic
value based on statistic and current load.

>  

> Now you claim that sending 9 packets, where the last one is a zero

> length packet, increase the interrupt load by factor of 2?  How is

> that?


You set up DMA job to receive full NTB and get a single interrupt
when job is done. It means while DMA is collecting data, CPU can do
something else (send data to NW stack).

In other protocols you need to indicate the end of data, but in
NCM/MBIM we know for sure that host is not allowed to send more
than dwNtbOutMaxSize bytes, so ZLP is not needed.

If host decides to send ZLP after full NTB, CPU must handle
additional INT per every full NTB instead of doing useful work.
For FTP transfer with constantly full NTBs you get twice amount of
Interrupts.

> Why is the device DMA restrictions not a fault, while the device ZLP

> requirement is?  Both seem like reasonable device hardware/firmware

> implementation imposed restrictions to me.  Something we'll just have

> to accept.


Both specifications (NCM/MBIM) were written from the point of host (most
likely PC with 2/4 core CPU) being more powerful than device and with
requirement for host to honor device limitations.

Regards,
Alexey
Yauheni Kaliuta Jan. 22, 2013, 5:16 p.m. UTC | #7
Hi!

On Tue, Jan 22, 2013 at 12:01 AM, Bjørn Mork <bjorn@mork.no> wrote:
> Yauheni Kaliuta <y.kaliuta@gmail.com> writes:
>>>>>>> "BM" == Bjørn Mork writes:
>>
>>  > We normally avoid sending ZLPs by padding NTBs with a zero byte
>>  > if the NTB is shorter than dwNtbOutMaxSize, resulting in a short
>>  > USB packet instead of a ZLP.  But in the case where the NTB length
>>  > is exactly dwNtbOutMaxSize and this is an exact multiplum of
>>  > wMaxPacketSize, then we must send a ZLP.
>>
>> The idea of NCM was to avoid extra ZLPs. If your transfer is exactly
>> dwNtbOutMaxSize, it's known, you can submit such request on the receiver
>> side and you do not need any EOT indicatation, so the frametime can be
>> used for useful data.
>
> Yes, that makes sense.  And I understand that both you and Alexey are of
> this opinion.
>
> But this idea is by no means made clear (to me) in the spec.  I do not
> think the current wording is precise enough to expect every implementor
> to understand any such intent.  The only place I find either "short
> packet" or ZLP mentioned in the NCM spec is in tables 3-1 and 3-2,
> describing the 16bit and 32bit NTH formats, in the description of the
> (d)wBlockLength fields.  And even there it is only mentioned in the
> context of the special (d)wBlockLength = 0x0000 handling.

I agree, it could be a bit unclear,

"
If exactly dwNtbInMaxSize or dwNtbOutMaxSize
bytes are sent, and the size is a multiple of wMax-
PacketSize for the given pipe, then no ZLP shall be
sent.
"
is an independent clause.

>
> If the intent was to prevent ZLPs, then it would have been wise to write
> that in the NCM standard. As it stands you have to use a lot of
> imagination to read that intent into the current spec.

Well, I hope the guys took your complains into account and will fix
the wording in the future spec versions.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/usb/cdc_mbim.c b/drivers/net/usb/cdc_mbim.c
index 42f51c7..3a5673a 100644
--- a/drivers/net/usb/cdc_mbim.c
+++ b/drivers/net/usb/cdc_mbim.c
@@ -366,7 +366,7 @@  err:
 
 static const struct driver_info cdc_mbim_info = {
 	.description = "CDC MBIM",
-	.flags = FLAG_NO_SETINT | FLAG_MULTI_PACKET | FLAG_WWAN,
+	.flags = FLAG_NO_SETINT | FLAG_MULTI_PACKET | FLAG_WWAN | FLAG_SEND_ZLP,
 	.bind = cdc_mbim_bind,
 	.unbind = cdc_mbim_unbind,
 	.manage_power = cdc_mbim_manage_power,