Message ID | 20200122203253.20652-1-lrizzo@google.com |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Headers | show |
Series | net-xdp: netdev attribute to control xdpgeneric skb linearization | expand |
Luigi Rizzo <lrizzo@google.com> writes: > Add a netdevice flag to control skb linearization in generic xdp mode. > Among the various mechanism to control the flag, the sysfs > interface seems sufficiently simple and self-contained. > The attribute can be modified through > /sys/class/net/<DEVICE>/xdp_linearize > The default is 1 (on) Erm, won't turning off linearization break the XDP program's ability to do direct packet access? -Toke
On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: > Luigi Rizzo <lrizzo@google.com> writes: > >> Add a netdevice flag to control skb linearization in generic xdp mode. >> Among the various mechanism to control the flag, the sysfs >> interface seems sufficiently simple and self-contained. >> The attribute can be modified through >> /sys/class/net/<DEVICE>/xdp_linearize >> The default is 1 (on) Needs documentation in Documentation/ABI/testing/sysfs-class-net. > Erm, won't turning off linearization break the XDP program's ability to > do direct packet access? Yes, in the worst case you only have eth header pulled into linear section. :/ In tc/BPF for direct packet access we have bpf_skb_pull_data() helper which can pull in up to X bytes into linear section on demand. I guess something like this could be done for XDP context as well, e.g. generic XDP would pull when non-linear and native XDP would have nothing todo (though in this case you end up writing the prog specifically for generic XDP with slowdown when you'd load it on native XDP where it's linear anyway, but that could/should be documented if so). Thanks, Daniel
Daniel Borkmann <daniel@iogearbox.net> writes: > On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: >> Luigi Rizzo <lrizzo@google.com> writes: >> >>> Add a netdevice flag to control skb linearization in generic xdp mode. >>> Among the various mechanism to control the flag, the sysfs >>> interface seems sufficiently simple and self-contained. >>> The attribute can be modified through >>> /sys/class/net/<DEVICE>/xdp_linearize >>> The default is 1 (on) > > Needs documentation in Documentation/ABI/testing/sysfs-class-net. > >> Erm, won't turning off linearization break the XDP program's ability to >> do direct packet access? > > Yes, in the worst case you only have eth header pulled into linear > section. :/ In which case an eBPF program could read/write out of bounds since the verifier only verifies checks against xdp->data_end. Right? > In tc/BPF for direct packet access we have bpf_skb_pull_data() helper > which can pull in up to X bytes into linear section on demand. I guess > something like this could be done for XDP context as well, e.g. > generic XDP would pull when non-linear and native XDP would have > nothing todo (though in this case you end up writing the prog > specifically for generic XDP with slowdown when you'd load it on > native XDP where it's linear anyway, but that could/should be > documented if so). Yeah, I really don't think this is a good idea; there are enough gotchas with the difference between generic and native XDP as it is... :/ -Toke
On Thu, Jan 23, 2020 at 7:48 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: > > Luigi Rizzo <lrizzo@google.com> writes: > > > >> Add a netdevice flag to control skb linearization in generic xdp mode. > >> Among the various mechanism to control the flag, the sysfs > >> interface seems sufficiently simple and self-contained. > >> The attribute can be modified through > >> /sys/class/net/<DEVICE>/xdp_linearize > >> The default is 1 (on) > > Needs documentation in Documentation/ABI/testing/sysfs-class-net. > > > Erm, won't turning off linearization break the XDP program's ability to > > do direct packet access? > > Yes, in the worst case you only have eth header pulled into linear section. :/ > In tc/BPF for direct packet access we have bpf_skb_pull_data() helper which can > pull in up to X bytes into linear section on demand. I guess something like this > could be done for XDP context as well, e.g. generic XDP would pull when non-linear > and native XDP would have nothing todo (though in this case you end up writing the > prog specifically for generic XDP with slowdown when you'd load it on native XDP > where it's linear anyway, but that could/should be documented if so). There was some discussion on multi-segment xdp https://www.spinics.net/lists/netdev/msg620140.html https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org with no clear decision as far as I can tell. I wanted to point out that linearization might be an issue for native xdp as well (specifically with NICs that do header split, LRO, scatter-gather, MTU > pagesize ...) and having to unconditionally pay the linearization cost (or disable the above features) by just loading an xdp program may be a big performance hit. cheers luigi > > Thanks, > Daniel
On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Daniel Borkmann <daniel@iogearbox.net> writes: > > > On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: > >> Luigi Rizzo <lrizzo@google.com> writes: > >> > >>> Add a netdevice flag to control skb linearization in generic xdp mode. > >>> Among the various mechanism to control the flag, the sysfs > >>> interface seems sufficiently simple and self-contained. > >>> The attribute can be modified through > >>> /sys/class/net/<DEVICE>/xdp_linearize > >>> The default is 1 (on) > > > > Needs documentation in Documentation/ABI/testing/sysfs-class-net. > > > >> Erm, won't turning off linearization break the XDP program's ability to > >> do direct packet access? > > > > Yes, in the worst case you only have eth header pulled into linear > > section. :/ > > In which case an eBPF program could read/write out of bounds since the > verifier only verifies checks against xdp->data_end. Right? Why out of bounds? Without linearization we construct xdp_buff as follows: mac_len = skb->data - skb_mac_header(skb); hlen = skb_headlen(skb) + mac_len; xdp->data = skb->data - mac_len; xdp->data_end = xdp->data + hlen; xdp->data_hard_start = skb->data - skb_headroom(skb); so we shouldn't go out of bounds. cheers luigi
Luigi Rizzo <lrizzo@google.com> writes: > On Thu, Jan 23, 2020 at 7:48 AM Daniel Borkmann <daniel@iogearbox.net> wrote: >> >> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: >> > Luigi Rizzo <lrizzo@google.com> writes: >> > >> >> Add a netdevice flag to control skb linearization in generic xdp mode. >> >> Among the various mechanism to control the flag, the sysfs >> >> interface seems sufficiently simple and self-contained. >> >> The attribute can be modified through >> >> /sys/class/net/<DEVICE>/xdp_linearize >> >> The default is 1 (on) >> >> Needs documentation in Documentation/ABI/testing/sysfs-class-net. >> >> > Erm, won't turning off linearization break the XDP program's ability to >> > do direct packet access? >> >> Yes, in the worst case you only have eth header pulled into linear section. :/ >> In tc/BPF for direct packet access we have bpf_skb_pull_data() helper which can >> pull in up to X bytes into linear section on demand. I guess something like this >> could be done for XDP context as well, e.g. generic XDP would pull when non-linear >> and native XDP would have nothing todo (though in this case you end up writing the >> prog specifically for generic XDP with slowdown when you'd load it on native XDP >> where it's linear anyway, but that could/should be documented if so). > > There was some discussion on multi-segment xdp > https://www.spinics.net/lists/netdev/msg620140.html > https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org > > with no clear decision as far as I can tell. > > I wanted to point out that linearization might be an issue for native > xdp as well (specifically with NICs that do header split, LRO, > scatter-gather, MTU pagesize ...) and having to unconditionally pay > the linearization cost (or disable the above features) by just loading > an xdp program may be a big performance hit. Right, sure, but then I'd rather fix it for all of XDP instead of introduce (more) differences between native and generic mode... -Toke
Luigi Rizzo <lrizzo@google.com> writes: > On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Daniel Borkmann <daniel@iogearbox.net> writes: >> >> > On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: >> >> Luigi Rizzo <lrizzo@google.com> writes: >> >> >> >>> Add a netdevice flag to control skb linearization in generic xdp mode. >> >>> Among the various mechanism to control the flag, the sysfs >> >>> interface seems sufficiently simple and self-contained. >> >>> The attribute can be modified through >> >>> /sys/class/net/<DEVICE>/xdp_linearize >> >>> The default is 1 (on) >> > >> > Needs documentation in Documentation/ABI/testing/sysfs-class-net. >> > >> >> Erm, won't turning off linearization break the XDP program's ability to >> >> do direct packet access? >> > >> > Yes, in the worst case you only have eth header pulled into linear >> > section. :/ >> >> In which case an eBPF program could read/write out of bounds since the >> verifier only verifies checks against xdp->data_end. Right? > > Why out of bounds? Without linearization we construct xdp_buff as follows: > > mac_len = skb->data - skb_mac_header(skb); > hlen = skb_headlen(skb) + mac_len; > xdp->data = skb->data - mac_len; > xdp->data_end = xdp->data + hlen; > xdp->data_hard_start = skb->data - skb_headroom(skb); > > so we shouldn't go out of bounds. Hmm, right, as long as it's guaranteed that the bit up to hlen is already linear; is it? :) -Toke
On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Luigi Rizzo <lrizzo@google.com> writes: > > > On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> > >> Daniel Borkmann <daniel@iogearbox.net> writes: > >> > >> > On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: > >> >> Luigi Rizzo <lrizzo@google.com> writes: > >> >> > >> >>> Add a netdevice flag to control skb linearization in generic xdp mode. > >> >>> Among the various mechanism to control the flag, the sysfs > >> >>> interface seems sufficiently simple and self-contained. > >> >>> The attribute can be modified through > >> >>> /sys/class/net/<DEVICE>/xdp_linearize > >> >>> The default is 1 (on) > >> > > >> > Needs documentation in Documentation/ABI/testing/sysfs-class-net. > >> > > >> >> Erm, won't turning off linearization break the XDP program's ability to > >> >> do direct packet access? > >> > > >> > Yes, in the worst case you only have eth header pulled into linear > >> > section. :/ > >> > >> In which case an eBPF program could read/write out of bounds since the > >> verifier only verifies checks against xdp->data_end. Right? > > > > Why out of bounds? Without linearization we construct xdp_buff as follows: > > > > mac_len = skb->data - skb_mac_header(skb); > > hlen = skb_headlen(skb) + mac_len; > > xdp->data = skb->data - mac_len; > > xdp->data_end = xdp->data + hlen; > > xdp->data_hard_start = skb->data - skb_headroom(skb); > > > > so we shouldn't go out of bounds. > > Hmm, right, as long as it's guaranteed that the bit up to hlen is > already linear; is it? :) honest question: that would be skb->len - skb->data_len, isn't that the linear part by definition ? cheers luigi > > -Toke >
On Thu, Jan 23, 2020 at 10:00 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Luigi Rizzo <lrizzo@google.com> writes: > > > On Thu, Jan 23, 2020 at 7:48 AM Daniel Borkmann <daniel@iogearbox.net> wrote: ... > > There was some discussion on multi-segment xdp > > https://www.spinics.net/lists/netdev/msg620140.html > > https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org > > > > with no clear decision as far as I can tell. > > > > I wanted to point out that linearization might be an issue for native > > xdp as well (specifically with NICs that do header split, LRO, > > scatter-gather, MTU pagesize ...) and having to unconditionally pay > > the linearization cost (or disable the above features) by just loading > > an xdp program may be a big performance hit. > > Right, sure, but then I'd rather fix it for all of XDP instead of > introduce (more) differences between native and generic mode... FWIW I think the discussion was heading towards 'only pass the first segment to the bpf code' with some other mechanism TBD to at least reliably know that there is more (right now bpf can look at the protocol headers, but of course they can lie). Whatever the fix is, it will probably require extending xdp_buff with extra fields. I am not sure whether this is considered a stable ABI or one in flux... cheers luigi > > -Toke >
On 1/23/20 7:06 PM, Luigi Rizzo wrote: > On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> Luigi Rizzo <lrizzo@google.com> writes: >>> On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >>>> Daniel Borkmann <daniel@iogearbox.net> writes: >>>>> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: >>>>>> Luigi Rizzo <lrizzo@google.com> writes: >>>>>> >>>>>>> Add a netdevice flag to control skb linearization in generic xdp mode. >>>>>>> Among the various mechanism to control the flag, the sysfs >>>>>>> interface seems sufficiently simple and self-contained. >>>>>>> The attribute can be modified through >>>>>>> /sys/class/net/<DEVICE>/xdp_linearize >>>>>>> The default is 1 (on) >>>>> >>>>> Needs documentation in Documentation/ABI/testing/sysfs-class-net. >>>>> >>>>>> Erm, won't turning off linearization break the XDP program's ability to >>>>>> do direct packet access? >>>>> >>>>> Yes, in the worst case you only have eth header pulled into linear >>>>> section. :/ >>>> >>>> In which case an eBPF program could read/write out of bounds since the >>>> verifier only verifies checks against xdp->data_end. Right? >>> >>> Why out of bounds? Without linearization we construct xdp_buff as follows: >>> >>> mac_len = skb->data - skb_mac_header(skb); >>> hlen = skb_headlen(skb) + mac_len; >>> xdp->data = skb->data - mac_len; >>> xdp->data_end = xdp->data + hlen; >>> xdp->data_hard_start = skb->data - skb_headroom(skb); >>> >>> so we shouldn't go out of bounds. >> >> Hmm, right, as long as it's guaranteed that the bit up to hlen is >> already linear; is it? :) > > honest question: that would be skb->len - skb->data_len, isn't that > the linear part by definition ? Yep, that's the linear part by definition. Generic XDP with ->data/->data_end is in this aspect no different from tc/BPF where we operate on skb context. Only linear part can be covered from skb (unless you pull in more via helper for the latter). Thanks, Daniel
Daniel Borkmann <daniel@iogearbox.net> writes: > On 1/23/20 7:06 PM, Luigi Rizzo wrote: >> On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >>> Luigi Rizzo <lrizzo@google.com> writes: >>>> On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >>>>> Daniel Borkmann <daniel@iogearbox.net> writes: >>>>>> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: >>>>>>> Luigi Rizzo <lrizzo@google.com> writes: >>>>>>> >>>>>>>> Add a netdevice flag to control skb linearization in generic xdp mode. >>>>>>>> Among the various mechanism to control the flag, the sysfs >>>>>>>> interface seems sufficiently simple and self-contained. >>>>>>>> The attribute can be modified through >>>>>>>> /sys/class/net/<DEVICE>/xdp_linearize >>>>>>>> The default is 1 (on) >>>>>> >>>>>> Needs documentation in Documentation/ABI/testing/sysfs-class-net. >>>>>> >>>>>>> Erm, won't turning off linearization break the XDP program's ability to >>>>>>> do direct packet access? >>>>>> >>>>>> Yes, in the worst case you only have eth header pulled into linear >>>>>> section. :/ >>>>> >>>>> In which case an eBPF program could read/write out of bounds since the >>>>> verifier only verifies checks against xdp->data_end. Right? >>>> >>>> Why out of bounds? Without linearization we construct xdp_buff as follows: >>>> >>>> mac_len = skb->data - skb_mac_header(skb); >>>> hlen = skb_headlen(skb) + mac_len; >>>> xdp->data = skb->data - mac_len; >>>> xdp->data_end = xdp->data + hlen; >>>> xdp->data_hard_start = skb->data - skb_headroom(skb); >>>> >>>> so we shouldn't go out of bounds. >>> >>> Hmm, right, as long as it's guaranteed that the bit up to hlen is >>> already linear; is it? :) >> >> honest question: that would be skb->len - skb->data_len, isn't that >> the linear part by definition ? > > Yep, that's the linear part by definition. Generic XDP with ->data/->data_end is in > this aspect no different from tc/BPF where we operate on skb context. Only linear part > can be covered from skb (unless you pull in more via helper for the > latter). OK, but then why are we linearising in the first place? Just to get sufficient headroom? -Toke
On Fri, Jan 24, 2020 at 1:57 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Daniel Borkmann <daniel@iogearbox.net> writes: > > > On 1/23/20 7:06 PM, Luigi Rizzo wrote: > >> On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >>> Luigi Rizzo <lrizzo@google.com> writes: > >>>> On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >>>>> Daniel Borkmann <daniel@iogearbox.net> writes: > >>>>>> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: > >>>>>>> Luigi Rizzo <lrizzo@google.com> writes: > >>>>>>> > >>>>>>>> Add a netdevice flag to control skb linearization in generic xdp mode. > >>>>>>>> Among the various mechanism to control the flag, the sysfs > >>>>>>>> interface seems sufficiently simple and self-contained. > >>>>>>>> The attribute can be modified through > >>>>>>>> /sys/class/net/<DEVICE>/xdp_linearize > >>>>>>>> The default is 1 (on) > >>>>>> > >>>>>> Needs documentation in Documentation/ABI/testing/sysfs-class-net. > >>>>>> > >>>>>>> Erm, won't turning off linearization break the XDP program's ability to > >>>>>>> do direct packet access? > >>>>>> > >>>>>> Yes, in the worst case you only have eth header pulled into linear > >>>>>> section. :/ > >>>>> > >>>>> In which case an eBPF program could read/write out of bounds since the > >>>>> verifier only verifies checks against xdp->data_end. Right? > >>>> > >>>> Why out of bounds? Without linearization we construct xdp_buff as follows: > >>>> > >>>> mac_len = skb->data - skb_mac_header(skb); > >>>> hlen = skb_headlen(skb) + mac_len; > >>>> xdp->data = skb->data - mac_len; > >>>> xdp->data_end = xdp->data + hlen; > >>>> xdp->data_hard_start = skb->data - skb_headroom(skb); > >>>> > >>>> so we shouldn't go out of bounds. > >>> > >>> Hmm, right, as long as it's guaranteed that the bit up to hlen is > >>> already linear; is it? :) > >> > >> honest question: that would be skb->len - skb->data_len, isn't that > >> the linear part by definition ? > > > > Yep, that's the linear part by definition. Generic XDP with ->data/->data_end is in > > this aspect no different from tc/BPF where we operate on skb context. Only linear part > > can be covered from skb (unless you pull in more via helper for the > > latter). > > OK, but then why are we linearising in the first place? Just to get > sufficient headroom? Looking at the condition in the if() it is both to make sufficient headroom available and have linear data so the bpf code can access all the packet data. My motivation for this change is that enforcing those guarantees has significant cost (even for native xdp in the cases I mentioned - mtu > 1 page, hw LRO, header split), and this is an interim solution to make generic skb usable without too much penalty. In the long term I think it would be good if the xdp program could express its requirements at load time ("i just need header, I need at least 18 bytes of headroom..") and have the netdev or nic driver reconfigure as appropriate. cheers luigi
Luigi Rizzo <lrizzo@google.com> writes: > On Fri, Jan 24, 2020 at 1:57 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Daniel Borkmann <daniel@iogearbox.net> writes: >> >> > On 1/23/20 7:06 PM, Luigi Rizzo wrote: >> >> On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >>> Luigi Rizzo <lrizzo@google.com> writes: >> >>>> On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >>>>> Daniel Borkmann <daniel@iogearbox.net> writes: >> >>>>>> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote: >> >>>>>>> Luigi Rizzo <lrizzo@google.com> writes: >> >>>>>>> >> >>>>>>>> Add a netdevice flag to control skb linearization in generic xdp mode. >> >>>>>>>> Among the various mechanism to control the flag, the sysfs >> >>>>>>>> interface seems sufficiently simple and self-contained. >> >>>>>>>> The attribute can be modified through >> >>>>>>>> /sys/class/net/<DEVICE>/xdp_linearize >> >>>>>>>> The default is 1 (on) >> >>>>>> >> >>>>>> Needs documentation in Documentation/ABI/testing/sysfs-class-net. >> >>>>>> >> >>>>>>> Erm, won't turning off linearization break the XDP program's ability to >> >>>>>>> do direct packet access? >> >>>>>> >> >>>>>> Yes, in the worst case you only have eth header pulled into linear >> >>>>>> section. :/ >> >>>>> >> >>>>> In which case an eBPF program could read/write out of bounds since the >> >>>>> verifier only verifies checks against xdp->data_end. Right? >> >>>> >> >>>> Why out of bounds? Without linearization we construct xdp_buff as follows: >> >>>> >> >>>> mac_len = skb->data - skb_mac_header(skb); >> >>>> hlen = skb_headlen(skb) + mac_len; >> >>>> xdp->data = skb->data - mac_len; >> >>>> xdp->data_end = xdp->data + hlen; >> >>>> xdp->data_hard_start = skb->data - skb_headroom(skb); >> >>>> >> >>>> so we shouldn't go out of bounds. >> >>> >> >>> Hmm, right, as long as it's guaranteed that the bit up to hlen is >> >>> already linear; is it? :) >> >> >> >> honest question: that would be skb->len - skb->data_len, isn't that >> >> the linear part by definition ? >> > >> > Yep, that's the linear part by definition. Generic XDP with ->data/->data_end is in >> > this aspect no different from tc/BPF where we operate on skb context. Only linear part >> > can be covered from skb (unless you pull in more via helper for the >> > latter). >> >> OK, but then why are we linearising in the first place? Just to get >> sufficient headroom? > > Looking at the condition in the if() it is both to make sufficient > headroom available and have linear data so the bpf code can access all > the packet data. Ohhh, didn't realise that linearising also changes skb_headlen() - makes so much more sense now :) > My motivation for this change is that enforcing those guarantees has > significant cost (even for native xdp in the cases I mentioned - mtu > > 1 page, hw LRO, header split), and this is an interim solution to make > generic skb usable without too much penalty. Sure, that part I understand; I just don't like that this "interim" solution makes generic and native XDP diverge further in their semantics... > In the long term I think it would be good if the xdp program could > express its requirements at load time ("i just need header, I need at > least 18 bytes of headroom..") and have the netdev or nic driver > reconfigure as appropriate. This may be interesting to include in the XDP feature detection capabilities we've been discussing for some time. Our current thinking is that the verifier should detect what a program does, rather than the program having to explicitly declare what features it needs. See https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#notes-implementation-plan for some notes on this :) -Toke
On Fri, Jan 24, 2020 at 7:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Luigi Rizzo <lrizzo@google.com> writes: > ... > > My motivation for this change is that enforcing those guarantees has > > significant cost (even for native xdp in the cases I mentioned - mtu > > > 1 page, hw LRO, header split), and this is an interim solution to make > > generic skb usable without too much penalty. > > Sure, that part I understand; I just don't like that this "interim" > solution makes generic and native XDP diverge further in their > semantics... As a matter of fact I think it would make full sense to use the same approach to control whether native xdp should pay the price converting to linear buffers when the hw cannot guarantee that. To me this seems to be a case of "perfect is enemy of good":.. cheers luigi > > > In the long term I think it would be good if the xdp program could > > express its requirements at load time ("i just need header, I need at > > least 18 bytes of headroom..") and have the netdev or nic driver > > reconfigure as appropriate. > > This may be interesting to include in the XDP feature detection > capabilities we've been discussing for some time. Our current thinking > is that the verifier should detect what a program does, rather than the > program having to explicitly declare what features it needs. See > https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#notes-implementation-plan > for some notes on this :) > > -Toke >
Luigi Rizzo <lrizzo@google.com> writes: > On Fri, Jan 24, 2020 at 7:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Luigi Rizzo <lrizzo@google.com> writes: >> > ... >> > My motivation for this change is that enforcing those guarantees has >> > significant cost (even for native xdp in the cases I mentioned - mtu > >> > 1 page, hw LRO, header split), and this is an interim solution to make >> > generic skb usable without too much penalty. >> >> Sure, that part I understand; I just don't like that this "interim" >> solution makes generic and native XDP diverge further in their >> semantics... > > As a matter of fact I think it would make full sense to use the same approach > to control whether native xdp should pay the price converting to linear buffers > when the hw cannot guarantee that. > > To me this seems to be a case of "perfect is enemy of good":.. Hmm, I can kinda see your point (now that I've actually grok'ed how the length works with skbs and generic XDP :)). I would still worry that only having the header there would lead some XDP programs to just silently fail. But on the other hand, this is opt-in... so IDK - maybe this is fine to merge as-is, and leave improvements for later? -Toke
On Fri, Jan 24, 2020 at 1:28 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Luigi Rizzo <lrizzo@google.com> writes: > > > On Fri, Jan 24, 2020 at 7:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> > >> Luigi Rizzo <lrizzo@google.com> writes: > >> > > ... > >> > My motivation for this change is that enforcing those guarantees has > >> > significant cost (even for native xdp in the cases I mentioned - mtu > > >> > 1 page, hw LRO, header split), and this is an interim solution to make > >> > generic skb usable without too much penalty. > >> > >> Sure, that part I understand; I just don't like that this "interim" > >> solution makes generic and native XDP diverge further in their > >> semantics... > > > > As a matter of fact I think it would make full sense to use the same approach > > to control whether native xdp should pay the price converting to linear buffers > > when the hw cannot guarantee that. > > > > To me this seems to be a case of "perfect is enemy of good":.. > > Hmm, I can kinda see your point (now that I've actually grok'ed how the > length works with skbs and generic XDP :)). I would still worry that > only having the header there would lead some XDP programs to just > silently fail. But on the other hand, this is opt-in... so IDK - maybe > this is fine to merge as-is, and leave improvements for later? Sorry I let this slip, any consensus on this patch? Thanks Luigi
Luigi Rizzo <rizzo@iet.unipi.it> writes: > On Fri, Jan 24, 2020 at 1:28 PM Toke Høiland-Jørgensen <toke@redhat.com> > wrote: > >> Luigi Rizzo <lrizzo@google.com> writes: >> >> > On Fri, Jan 24, 2020 at 7:31 AM Toke Høiland-Jørgensen <toke@redhat.com> >> wrote: >> >> >> >> Luigi Rizzo <lrizzo@google.com> writes: >> >> >> > ... >> >> > My motivation for this change is that enforcing those guarantees has >> >> > significant cost (even for native xdp in the cases I mentioned - mtu > >> >> > 1 page, hw LRO, header split), and this is an interim solution to make >> >> > generic skb usable without too much penalty. >> >> >> >> Sure, that part I understand; I just don't like that this "interim" >> >> solution makes generic and native XDP diverge further in their >> >> semantics... >> > >> > As a matter of fact I think it would make full sense to use the same >> approach >> > to control whether native xdp should pay the price converting to linear >> buffers >> > when the hw cannot guarantee that. >> > >> > To me this seems to be a case of "perfect is enemy of good":.. >> >> Hmm, I can kinda see your point (now that I've actually grok'ed how the >> length works with skbs and generic XDP :)). I would still worry that >> only having the header there would lead some XDP programs to just >> silently fail. But on the other hand, this is opt-in... so IDK - maybe >> this is fine to merge as-is, and leave improvements for later? >> > > Sorry I let this slip, any consensus on this patch? Dunno if there's a consensus, but I certainly ran out of objections ;) -Toke
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2741aa35bec6..ae873fb5ec3c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1958,7 +1958,8 @@ struct net_device { struct netdev_rx_queue *_rx; unsigned int num_rx_queues; - unsigned int real_num_rx_queues; + unsigned int real_num_rx_queues:31; + unsigned int xdp_linearize : 1; struct bpf_prog __rcu *xdp_prog; unsigned long gro_flush_timeout; diff --git a/net/core/dev.c b/net/core/dev.c index 6368c94c9e0a..04c7c8ed1b4a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4484,8 +4484,8 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb, * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also * native XDP provides, thus we need to do it here as well. */ - if (skb_is_nonlinear(skb) || - skb_headroom(skb) < XDP_PACKET_HEADROOM) { + if (skb->dev->xdp_linearize && (skb_is_nonlinear(skb) || + skb_headroom(skb) < XDP_PACKET_HEADROOM)) { int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb); int troom = skb->tail + skb->data_len - skb->end; @@ -9756,6 +9756,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, dev->gso_max_segs = GSO_MAX_SEGS; dev->upper_level = 1; dev->lower_level = 1; + dev->xdp_linearize = 1; INIT_LIST_HEAD(&dev->napi_list); INIT_LIST_HEAD(&dev->unreg_list); diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 4c826b8bf9b1..ec59aa296664 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -442,6 +442,20 @@ static ssize_t proto_down_store(struct device *dev, } NETDEVICE_SHOW_RW(proto_down, fmt_dec); +static int change_xdp_linearize(struct net_device *dev, unsigned long val) +{ + dev->xdp_linearize = !!val; + return 0; +} + +static ssize_t xdp_linearize_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + return netdev_store(dev, attr, buf, len, change_xdp_linearize); +} +NETDEVICE_SHOW_RW(xdp_linearize, fmt_dec); + static ssize_t phys_port_id_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -536,6 +550,7 @@ static struct attribute *net_class_attrs[] __ro_after_init = { &dev_attr_phys_port_name.attr, &dev_attr_phys_switch_id.attr, &dev_attr_proto_down.attr, + &dev_attr_xdp_linearize.attr, &dev_attr_carrier_up_count.attr, &dev_attr_carrier_down_count.attr, NULL,
Add a netdevice flag to control skb linearization in generic xdp mode. Among the various mechanism to control the flag, the sysfs interface seems sufficiently simple and self-contained. The attribute can be modified through /sys/class/net/<DEVICE>/xdp_linearize The default is 1 (on) On a kernel instrumented to grab timestamps around the linearization code in netif_receive_generic_xdp, and heavy netperf traffic with 1500b mtu, I see the following times (nanoseconds/pkt) The receiver generally sees larger packets so the difference is more significant. ns/pkt RECEIVER SENDER p50 p90 p99 p50 p90 p99 LINEARIZATION: 600ns 1090ns 4900ns 149ns 249ns 460ns NO LINEARIZATION: 40ns 59ns 90ns 40ns 50ns 100ns Tested: run tests on an instrumented kernel Change-Id: I69884661167ab86347c50bdece8cae1afa821956 Signed-off-by: Luigi Rizzo <lrizzo@google.com> --- include/linux/netdevice.h | 3 ++- net/core/dev.c | 5 +++-- net/core/net-sysfs.c | 15 +++++++++++++++ 3 files changed, 20 insertions(+), 3 deletions(-)