diff mbox series

net-xdp: netdev attribute to control xdpgeneric skb linearization

Message ID 20200122203253.20652-1-lrizzo@google.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series net-xdp: netdev attribute to control xdpgeneric skb linearization | expand

Commit Message

Luigi Rizzo Jan. 22, 2020, 8:32 p.m. UTC
Add a netdevice flag to control skb linearization in generic xdp mode.
Among the various mechanism to control the flag, the sysfs
interface seems sufficiently simple and self-contained.
The attribute can be modified through
	/sys/class/net/<DEVICE>/xdp_linearize
The default is 1 (on)

On a kernel instrumented to grab timestamps around the linearization
code in netif_receive_generic_xdp, and heavy netperf traffic with 1500b
mtu, I see the following times (nanoseconds/pkt)

The receiver generally sees larger packets so the difference is more
significant.

ns/pkt                   RECEIVER                 SENDER

                    p50     p90     p99       p50   p90    p99

LINEARIZATION:    600ns  1090ns  4900ns     149ns 249ns  460ns
NO LINEARIZATION:  40ns    59ns    90ns      40ns  50ns  100ns

Tested: run tests on an instrumented kernel
Change-Id: I69884661167ab86347c50bdece8cae1afa821956
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
---
 include/linux/netdevice.h |  3 ++-
 net/core/dev.c            |  5 +++--
 net/core/net-sysfs.c      | 15 +++++++++++++++
 3 files changed, 20 insertions(+), 3 deletions(-)

Comments

Toke Høiland-Jørgensen Jan. 23, 2020, 9:53 a.m. UTC | #1
Luigi Rizzo <lrizzo@google.com> writes:

> Add a netdevice flag to control skb linearization in generic xdp mode.
> Among the various mechanism to control the flag, the sysfs
> interface seems sufficiently simple and self-contained.
> The attribute can be modified through
> 	/sys/class/net/<DEVICE>/xdp_linearize
> The default is 1 (on)

Erm, won't turning off linearization break the XDP program's ability to
do direct packet access?

-Toke
Daniel Borkmann Jan. 23, 2020, 3:48 p.m. UTC | #2
On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
> Luigi Rizzo <lrizzo@google.com> writes:
> 
>> Add a netdevice flag to control skb linearization in generic xdp mode.
>> Among the various mechanism to control the flag, the sysfs
>> interface seems sufficiently simple and self-contained.
>> The attribute can be modified through
>> 	/sys/class/net/<DEVICE>/xdp_linearize
>> The default is 1 (on)

Needs documentation in Documentation/ABI/testing/sysfs-class-net.

> Erm, won't turning off linearization break the XDP program's ability to
> do direct packet access?

Yes, in the worst case you only have eth header pulled into linear section. :/
In tc/BPF for direct packet access we have bpf_skb_pull_data() helper which can
pull in up to X bytes into linear section on demand. I guess something like this
could be done for XDP context as well, e.g. generic XDP would pull when non-linear
and native XDP would have nothing todo (though in this case you end up writing the
prog specifically for generic XDP with slowdown when you'd load it on native XDP
where it's linear anyway, but that could/should be documented if so).

Thanks,
Daniel
Toke Høiland-Jørgensen Jan. 23, 2020, 4:14 p.m. UTC | #3
Daniel Borkmann <daniel@iogearbox.net> writes:

> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
>> Luigi Rizzo <lrizzo@google.com> writes:
>> 
>>> Add a netdevice flag to control skb linearization in generic xdp mode.
>>> Among the various mechanism to control the flag, the sysfs
>>> interface seems sufficiently simple and self-contained.
>>> The attribute can be modified through
>>> 	/sys/class/net/<DEVICE>/xdp_linearize
>>> The default is 1 (on)
>
> Needs documentation in Documentation/ABI/testing/sysfs-class-net.
>
>> Erm, won't turning off linearization break the XDP program's ability to
>> do direct packet access?
>
> Yes, in the worst case you only have eth header pulled into linear
> section. :/

In which case an eBPF program could read/write out of bounds since the
verifier only verifies checks against xdp->data_end. Right?

> In tc/BPF for direct packet access we have bpf_skb_pull_data() helper
> which can pull in up to X bytes into linear section on demand. I guess
> something like this could be done for XDP context as well, e.g.
> generic XDP would pull when non-linear and native XDP would have
> nothing todo (though in this case you end up writing the prog
> specifically for generic XDP with slowdown when you'd load it on
> native XDP where it's linear anyway, but that could/should be
> documented if so).

Yeah, I really don't think this is a good idea; there are enough gotchas
with the difference between generic and native XDP as it is... :/

-Toke
Luigi Rizzo Jan. 23, 2020, 5:25 p.m. UTC | #4
On Thu, Jan 23, 2020 at 7:48 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
> > Luigi Rizzo <lrizzo@google.com> writes:
> >
> >> Add a netdevice flag to control skb linearization in generic xdp mode.
> >> Among the various mechanism to control the flag, the sysfs
> >> interface seems sufficiently simple and self-contained.
> >> The attribute can be modified through
> >>      /sys/class/net/<DEVICE>/xdp_linearize
> >> The default is 1 (on)
>
> Needs documentation in Documentation/ABI/testing/sysfs-class-net.
>
> > Erm, won't turning off linearization break the XDP program's ability to
> > do direct packet access?
>
> Yes, in the worst case you only have eth header pulled into linear section. :/
> In tc/BPF for direct packet access we have bpf_skb_pull_data() helper which can
> pull in up to X bytes into linear section on demand. I guess something like this
> could be done for XDP context as well, e.g. generic XDP would pull when non-linear
> and native XDP would have nothing todo (though in this case you end up writing the
> prog specifically for generic XDP with slowdown when you'd load it on native XDP
> where it's linear anyway, but that could/should be documented if so).

There was some discussion on multi-segment xdp
https://www.spinics.net/lists/netdev/msg620140.html
https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org

with no clear decision as far as I can tell.

I wanted to point out that linearization might be an issue for native
xdp as well
(specifically with NICs that do header split, LRO, scatter-gather, MTU
> pagesize ...)
and having to unconditionally pay the linearization cost (or disable
the above features)
by just loading an xdp program may be a big performance hit.

cheers
luigi


>
> Thanks,
> Daniel
Luigi Rizzo Jan. 23, 2020, 5:30 p.m. UTC | #5
On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Daniel Borkmann <daniel@iogearbox.net> writes:
>
> > On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
> >> Luigi Rizzo <lrizzo@google.com> writes:
> >>
> >>> Add a netdevice flag to control skb linearization in generic xdp mode.
> >>> Among the various mechanism to control the flag, the sysfs
> >>> interface seems sufficiently simple and self-contained.
> >>> The attribute can be modified through
> >>>     /sys/class/net/<DEVICE>/xdp_linearize
> >>> The default is 1 (on)
> >
> > Needs documentation in Documentation/ABI/testing/sysfs-class-net.
> >
> >> Erm, won't turning off linearization break the XDP program's ability to
> >> do direct packet access?
> >
> > Yes, in the worst case you only have eth header pulled into linear
> > section. :/
>
> In which case an eBPF program could read/write out of bounds since the
> verifier only verifies checks against xdp->data_end. Right?

Why out of bounds? Without linearization we construct xdp_buff as follows:

mac_len = skb->data - skb_mac_header(skb);
hlen = skb_headlen(skb) + mac_len;
xdp->data = skb->data - mac_len;
xdp->data_end = xdp->data + hlen;
xdp->data_hard_start = skb->data - skb_headroom(skb);

so we shouldn't go out of bounds.

cheers
luigi
Toke Høiland-Jørgensen Jan. 23, 2020, 6 p.m. UTC | #6
Luigi Rizzo <lrizzo@google.com> writes:

> On Thu, Jan 23, 2020 at 7:48 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
>> > Luigi Rizzo <lrizzo@google.com> writes:
>> >
>> >> Add a netdevice flag to control skb linearization in generic xdp mode.
>> >> Among the various mechanism to control the flag, the sysfs
>> >> interface seems sufficiently simple and self-contained.
>> >> The attribute can be modified through
>> >>      /sys/class/net/<DEVICE>/xdp_linearize
>> >> The default is 1 (on)
>>
>> Needs documentation in Documentation/ABI/testing/sysfs-class-net.
>>
>> > Erm, won't turning off linearization break the XDP program's ability to
>> > do direct packet access?
>>
>> Yes, in the worst case you only have eth header pulled into linear section. :/
>> In tc/BPF for direct packet access we have bpf_skb_pull_data() helper which can
>> pull in up to X bytes into linear section on demand. I guess something like this
>> could be done for XDP context as well, e.g. generic XDP would pull when non-linear
>> and native XDP would have nothing todo (though in this case you end up writing the
>> prog specifically for generic XDP with slowdown when you'd load it on native XDP
>> where it's linear anyway, but that could/should be documented if so).
>
> There was some discussion on multi-segment xdp
> https://www.spinics.net/lists/netdev/msg620140.html
> https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
>
> with no clear decision as far as I can tell.
>
> I wanted to point out that linearization might be an issue for native
> xdp as well (specifically with NICs that do header split, LRO,
> scatter-gather, MTU pagesize ...) and having to unconditionally pay
> the linearization cost (or disable the above features) by just loading
> an xdp program may be a big performance hit.

Right, sure, but then I'd rather fix it for all of XDP instead of
introduce (more) differences between native and generic mode...

-Toke
Toke Høiland-Jørgensen Jan. 23, 2020, 6:01 p.m. UTC | #7
Luigi Rizzo <lrizzo@google.com> writes:

> On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>
>> > On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
>> >> Luigi Rizzo <lrizzo@google.com> writes:
>> >>
>> >>> Add a netdevice flag to control skb linearization in generic xdp mode.
>> >>> Among the various mechanism to control the flag, the sysfs
>> >>> interface seems sufficiently simple and self-contained.
>> >>> The attribute can be modified through
>> >>>     /sys/class/net/<DEVICE>/xdp_linearize
>> >>> The default is 1 (on)
>> >
>> > Needs documentation in Documentation/ABI/testing/sysfs-class-net.
>> >
>> >> Erm, won't turning off linearization break the XDP program's ability to
>> >> do direct packet access?
>> >
>> > Yes, in the worst case you only have eth header pulled into linear
>> > section. :/
>>
>> In which case an eBPF program could read/write out of bounds since the
>> verifier only verifies checks against xdp->data_end. Right?
>
> Why out of bounds? Without linearization we construct xdp_buff as follows:
>
> mac_len = skb->data - skb_mac_header(skb);
> hlen = skb_headlen(skb) + mac_len;
> xdp->data = skb->data - mac_len;
> xdp->data_end = xdp->data + hlen;
> xdp->data_hard_start = skb->data - skb_headroom(skb);
>
> so we shouldn't go out of bounds.

Hmm, right, as long as it's guaranteed that the bit up to hlen is
already linear; is it? :)

-Toke
Luigi Rizzo Jan. 23, 2020, 6:06 p.m. UTC | #8
On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Luigi Rizzo <lrizzo@google.com> writes:
>
> > On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Daniel Borkmann <daniel@iogearbox.net> writes:
> >>
> >> > On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
> >> >> Luigi Rizzo <lrizzo@google.com> writes:
> >> >>
> >> >>> Add a netdevice flag to control skb linearization in generic xdp mode.
> >> >>> Among the various mechanism to control the flag, the sysfs
> >> >>> interface seems sufficiently simple and self-contained.
> >> >>> The attribute can be modified through
> >> >>>     /sys/class/net/<DEVICE>/xdp_linearize
> >> >>> The default is 1 (on)
> >> >
> >> > Needs documentation in Documentation/ABI/testing/sysfs-class-net.
> >> >
> >> >> Erm, won't turning off linearization break the XDP program's ability to
> >> >> do direct packet access?
> >> >
> >> > Yes, in the worst case you only have eth header pulled into linear
> >> > section. :/
> >>
> >> In which case an eBPF program could read/write out of bounds since the
> >> verifier only verifies checks against xdp->data_end. Right?
> >
> > Why out of bounds? Without linearization we construct xdp_buff as follows:
> >
> > mac_len = skb->data - skb_mac_header(skb);
> > hlen = skb_headlen(skb) + mac_len;
> > xdp->data = skb->data - mac_len;
> > xdp->data_end = xdp->data + hlen;
> > xdp->data_hard_start = skb->data - skb_headroom(skb);
> >
> > so we shouldn't go out of bounds.
>
> Hmm, right, as long as it's guaranteed that the bit up to hlen is
> already linear; is it? :)

honest question: that would be skb->len - skb->data_len, isn't that
the linear part by definition ?

cheers
luigi
>
> -Toke
>
Luigi Rizzo Jan. 23, 2020, 6:11 p.m. UTC | #9
On Thu, Jan 23, 2020 at 10:00 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Luigi Rizzo <lrizzo@google.com> writes:
>
> > On Thu, Jan 23, 2020 at 7:48 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
...
> > There was some discussion on multi-segment xdp
> > https://www.spinics.net/lists/netdev/msg620140.html
> > https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> >
> > with no clear decision as far as I can tell.
> >
> > I wanted to point out that linearization might be an issue for native
> > xdp as well (specifically with NICs that do header split, LRO,
> > scatter-gather, MTU pagesize ...) and having to unconditionally pay
> > the linearization cost (or disable the above features) by just loading
> > an xdp program may be a big performance hit.
>
> Right, sure, but then I'd rather fix it for all of XDP instead of
> introduce (more) differences between native and generic mode...

FWIW I think the discussion was heading towards 'only pass the first segment
to the bpf code' with some other mechanism TBD to at least reliably know that
there is more (right now bpf can look at the protocol headers, but of
course they can lie).

Whatever the fix is, it will probably require extending xdp_buff with
extra fields.
I am not sure whether this is considered a stable ABI or one in flux...

cheers
luigi
>
> -Toke
>
Daniel Borkmann Jan. 23, 2020, 9:36 p.m. UTC | #10
On 1/23/20 7:06 PM, Luigi Rizzo wrote:
> On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> Luigi Rizzo <lrizzo@google.com> writes:
>>> On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
>>>>>> Luigi Rizzo <lrizzo@google.com> writes:
>>>>>>
>>>>>>> Add a netdevice flag to control skb linearization in generic xdp mode.
>>>>>>> Among the various mechanism to control the flag, the sysfs
>>>>>>> interface seems sufficiently simple and self-contained.
>>>>>>> The attribute can be modified through
>>>>>>>      /sys/class/net/<DEVICE>/xdp_linearize
>>>>>>> The default is 1 (on)
>>>>>
>>>>> Needs documentation in Documentation/ABI/testing/sysfs-class-net.
>>>>>
>>>>>> Erm, won't turning off linearization break the XDP program's ability to
>>>>>> do direct packet access?
>>>>>
>>>>> Yes, in the worst case you only have eth header pulled into linear
>>>>> section. :/
>>>>
>>>> In which case an eBPF program could read/write out of bounds since the
>>>> verifier only verifies checks against xdp->data_end. Right?
>>>
>>> Why out of bounds? Without linearization we construct xdp_buff as follows:
>>>
>>> mac_len = skb->data - skb_mac_header(skb);
>>> hlen = skb_headlen(skb) + mac_len;
>>> xdp->data = skb->data - mac_len;
>>> xdp->data_end = xdp->data + hlen;
>>> xdp->data_hard_start = skb->data - skb_headroom(skb);
>>>
>>> so we shouldn't go out of bounds.
>>
>> Hmm, right, as long as it's guaranteed that the bit up to hlen is
>> already linear; is it? :)
> 
> honest question: that would be skb->len - skb->data_len, isn't that
> the linear part by definition ?

Yep, that's the linear part by definition. Generic XDP with ->data/->data_end is in
this aspect no different from tc/BPF where we operate on skb context. Only linear part
can be covered from skb (unless you pull in more via helper for the latter).

Thanks,
Daniel
Toke Høiland-Jørgensen Jan. 24, 2020, 9:57 a.m. UTC | #11
Daniel Borkmann <daniel@iogearbox.net> writes:

> On 1/23/20 7:06 PM, Luigi Rizzo wrote:
>> On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> Luigi Rizzo <lrizzo@google.com> writes:
>>>> On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>>> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
>>>>>>> Luigi Rizzo <lrizzo@google.com> writes:
>>>>>>>
>>>>>>>> Add a netdevice flag to control skb linearization in generic xdp mode.
>>>>>>>> Among the various mechanism to control the flag, the sysfs
>>>>>>>> interface seems sufficiently simple and self-contained.
>>>>>>>> The attribute can be modified through
>>>>>>>>      /sys/class/net/<DEVICE>/xdp_linearize
>>>>>>>> The default is 1 (on)
>>>>>>
>>>>>> Needs documentation in Documentation/ABI/testing/sysfs-class-net.
>>>>>>
>>>>>>> Erm, won't turning off linearization break the XDP program's ability to
>>>>>>> do direct packet access?
>>>>>>
>>>>>> Yes, in the worst case you only have eth header pulled into linear
>>>>>> section. :/
>>>>>
>>>>> In which case an eBPF program could read/write out of bounds since the
>>>>> verifier only verifies checks against xdp->data_end. Right?
>>>>
>>>> Why out of bounds? Without linearization we construct xdp_buff as follows:
>>>>
>>>> mac_len = skb->data - skb_mac_header(skb);
>>>> hlen = skb_headlen(skb) + mac_len;
>>>> xdp->data = skb->data - mac_len;
>>>> xdp->data_end = xdp->data + hlen;
>>>> xdp->data_hard_start = skb->data - skb_headroom(skb);
>>>>
>>>> so we shouldn't go out of bounds.
>>>
>>> Hmm, right, as long as it's guaranteed that the bit up to hlen is
>>> already linear; is it? :)
>> 
>> honest question: that would be skb->len - skb->data_len, isn't that
>> the linear part by definition ?
>
> Yep, that's the linear part by definition. Generic XDP with ->data/->data_end is in
> this aspect no different from tc/BPF where we operate on skb context. Only linear part
> can be covered from skb (unless you pull in more via helper for the
> latter).

OK, but then why are we linearising in the first place? Just to get
sufficient headroom?

-Toke
Luigi Rizzo Jan. 24, 2020, 2:31 p.m. UTC | #12
On Fri, Jan 24, 2020 at 1:57 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Daniel Borkmann <daniel@iogearbox.net> writes:
>
> > On 1/23/20 7:06 PM, Luigi Rizzo wrote:
> >> On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>> Luigi Rizzo <lrizzo@google.com> writes:
> >>>> On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
> >>>>>> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
> >>>>>>> Luigi Rizzo <lrizzo@google.com> writes:
> >>>>>>>
> >>>>>>>> Add a netdevice flag to control skb linearization in generic xdp mode.
> >>>>>>>> Among the various mechanism to control the flag, the sysfs
> >>>>>>>> interface seems sufficiently simple and self-contained.
> >>>>>>>> The attribute can be modified through
> >>>>>>>>      /sys/class/net/<DEVICE>/xdp_linearize
> >>>>>>>> The default is 1 (on)
> >>>>>>
> >>>>>> Needs documentation in Documentation/ABI/testing/sysfs-class-net.
> >>>>>>
> >>>>>>> Erm, won't turning off linearization break the XDP program's ability to
> >>>>>>> do direct packet access?
> >>>>>>
> >>>>>> Yes, in the worst case you only have eth header pulled into linear
> >>>>>> section. :/
> >>>>>
> >>>>> In which case an eBPF program could read/write out of bounds since the
> >>>>> verifier only verifies checks against xdp->data_end. Right?
> >>>>
> >>>> Why out of bounds? Without linearization we construct xdp_buff as follows:
> >>>>
> >>>> mac_len = skb->data - skb_mac_header(skb);
> >>>> hlen = skb_headlen(skb) + mac_len;
> >>>> xdp->data = skb->data - mac_len;
> >>>> xdp->data_end = xdp->data + hlen;
> >>>> xdp->data_hard_start = skb->data - skb_headroom(skb);
> >>>>
> >>>> so we shouldn't go out of bounds.
> >>>
> >>> Hmm, right, as long as it's guaranteed that the bit up to hlen is
> >>> already linear; is it? :)
> >>
> >> honest question: that would be skb->len - skb->data_len, isn't that
> >> the linear part by definition ?
> >
> > Yep, that's the linear part by definition. Generic XDP with ->data/->data_end is in
> > this aspect no different from tc/BPF where we operate on skb context. Only linear part
> > can be covered from skb (unless you pull in more via helper for the
> > latter).
>
> OK, but then why are we linearising in the first place? Just to get
> sufficient headroom?

Looking at the condition in the if() it is both to make sufficient
headroom available and have
linear data so the bpf code can access all the packet data.

My motivation for this change is that enforcing those guarantees has
significant cost
(even for native xdp in the cases I mentioned - mtu > 1 page, hw LRO,
header split),
and this is an interim solution to make generic skb usable without too
much penalty.

In the long term I think it would be good if the xdp program could
express its requirements
at load time ("i just need header, I need at least 18 bytes of
headroom..") and have the
netdev or nic driver reconfigure as appropriate.

cheers
luigi
Toke Høiland-Jørgensen Jan. 24, 2020, 3:30 p.m. UTC | #13
Luigi Rizzo <lrizzo@google.com> writes:

> On Fri, Jan 24, 2020 at 1:57 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>
>> > On 1/23/20 7:06 PM, Luigi Rizzo wrote:
>> >> On Thu, Jan 23, 2020 at 10:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>> Luigi Rizzo <lrizzo@google.com> writes:
>> >>>> On Thu, Jan 23, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>> >>>>>> On 1/23/20 10:53 AM, Toke Høiland-Jørgensen wrote:
>> >>>>>>> Luigi Rizzo <lrizzo@google.com> writes:
>> >>>>>>>
>> >>>>>>>> Add a netdevice flag to control skb linearization in generic xdp mode.
>> >>>>>>>> Among the various mechanism to control the flag, the sysfs
>> >>>>>>>> interface seems sufficiently simple and self-contained.
>> >>>>>>>> The attribute can be modified through
>> >>>>>>>>      /sys/class/net/<DEVICE>/xdp_linearize
>> >>>>>>>> The default is 1 (on)
>> >>>>>>
>> >>>>>> Needs documentation in Documentation/ABI/testing/sysfs-class-net.
>> >>>>>>
>> >>>>>>> Erm, won't turning off linearization break the XDP program's ability to
>> >>>>>>> do direct packet access?
>> >>>>>>
>> >>>>>> Yes, in the worst case you only have eth header pulled into linear
>> >>>>>> section. :/
>> >>>>>
>> >>>>> In which case an eBPF program could read/write out of bounds since the
>> >>>>> verifier only verifies checks against xdp->data_end. Right?
>> >>>>
>> >>>> Why out of bounds? Without linearization we construct xdp_buff as follows:
>> >>>>
>> >>>> mac_len = skb->data - skb_mac_header(skb);
>> >>>> hlen = skb_headlen(skb) + mac_len;
>> >>>> xdp->data = skb->data - mac_len;
>> >>>> xdp->data_end = xdp->data + hlen;
>> >>>> xdp->data_hard_start = skb->data - skb_headroom(skb);
>> >>>>
>> >>>> so we shouldn't go out of bounds.
>> >>>
>> >>> Hmm, right, as long as it's guaranteed that the bit up to hlen is
>> >>> already linear; is it? :)
>> >>
>> >> honest question: that would be skb->len - skb->data_len, isn't that
>> >> the linear part by definition ?
>> >
>> > Yep, that's the linear part by definition. Generic XDP with ->data/->data_end is in
>> > this aspect no different from tc/BPF where we operate on skb context. Only linear part
>> > can be covered from skb (unless you pull in more via helper for the
>> > latter).
>>
>> OK, but then why are we linearising in the first place? Just to get
>> sufficient headroom?
>
> Looking at the condition in the if() it is both to make sufficient
> headroom available and have linear data so the bpf code can access all
> the packet data.

Ohhh, didn't realise that linearising also changes skb_headlen() - makes
so much more sense now :)

> My motivation for this change is that enforcing those guarantees has
> significant cost (even for native xdp in the cases I mentioned - mtu >
> 1 page, hw LRO, header split), and this is an interim solution to make
> generic skb usable without too much penalty.

Sure, that part I understand; I just don't like that this "interim"
solution makes generic and native XDP diverge further in their
semantics...

> In the long term I think it would be good if the xdp program could
> express its requirements at load time ("i just need header, I need at
> least 18 bytes of headroom..") and have the netdev or nic driver
> reconfigure as appropriate.

This may be interesting to include in the XDP feature detection
capabilities we've been discussing for some time. Our current thinking
is that the verifier should detect what a program does, rather than the
program having to explicitly declare what features it needs. See
https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#notes-implementation-plan
for some notes on this :)

-Toke
Luigi Rizzo Jan. 24, 2020, 5:15 p.m. UTC | #14
On Fri, Jan 24, 2020 at 7:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Luigi Rizzo <lrizzo@google.com> writes:
>
...
> > My motivation for this change is that enforcing those guarantees has
> > significant cost (even for native xdp in the cases I mentioned - mtu >
> > 1 page, hw LRO, header split), and this is an interim solution to make
> > generic skb usable without too much penalty.
>
> Sure, that part I understand; I just don't like that this "interim"
> solution makes generic and native XDP diverge further in their
> semantics...

As a matter of fact I think it would make full sense to use the same approach
to control whether native xdp should pay the price converting to linear buffers
when the hw cannot guarantee that.

To me this seems to be a case of "perfect is enemy of good":..

cheers
luigi


>
> > In the long term I think it would be good if the xdp program could
> > express its requirements at load time ("i just need header, I need at
> > least 18 bytes of headroom..") and have the netdev or nic driver
> > reconfigure as appropriate.
>
> This may be interesting to include in the XDP feature detection
> capabilities we've been discussing for some time. Our current thinking
> is that the verifier should detect what a program does, rather than the
> program having to explicitly declare what features it needs. See
> https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#notes-implementation-plan
> for some notes on this :)
>
> -Toke
>
Toke Høiland-Jørgensen Jan. 24, 2020, 9:27 p.m. UTC | #15
Luigi Rizzo <lrizzo@google.com> writes:

> On Fri, Jan 24, 2020 at 7:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Luigi Rizzo <lrizzo@google.com> writes:
>>
> ...
>> > My motivation for this change is that enforcing those guarantees has
>> > significant cost (even for native xdp in the cases I mentioned - mtu >
>> > 1 page, hw LRO, header split), and this is an interim solution to make
>> > generic skb usable without too much penalty.
>>
>> Sure, that part I understand; I just don't like that this "interim"
>> solution makes generic and native XDP diverge further in their
>> semantics...
>
> As a matter of fact I think it would make full sense to use the same approach
> to control whether native xdp should pay the price converting to linear buffers
> when the hw cannot guarantee that.
>
> To me this seems to be a case of "perfect is enemy of good":..

Hmm, I can kinda see your point (now that I've actually grok'ed how the
length works with skbs and generic XDP :)). I would still worry that
only having the header there would lead some XDP programs to just
silently fail. But on the other hand, this is opt-in... so IDK - maybe
this is fine to merge as-is, and leave improvements for later?

-Toke
Luigi Rizzo Feb. 5, 2020, 3:36 p.m. UTC | #16
On Fri, Jan 24, 2020 at 1:28 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Luigi Rizzo <lrizzo@google.com> writes:
>
> > On Fri, Jan 24, 2020 at 7:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Luigi Rizzo <lrizzo@google.com> writes:
> >>
> > ...
> >> > My motivation for this change is that enforcing those guarantees has
> >> > significant cost (even for native xdp in the cases I mentioned - mtu >
> >> > 1 page, hw LRO, header split), and this is an interim solution to make
> >> > generic skb usable without too much penalty.
> >>
> >> Sure, that part I understand; I just don't like that this "interim"
> >> solution makes generic and native XDP diverge further in their
> >> semantics...
> >
> > As a matter of fact I think it would make full sense to use the same approach
> > to control whether native xdp should pay the price converting to linear buffers
> > when the hw cannot guarantee that.
> >
> > To me this seems to be a case of "perfect is enemy of good":..
>
> Hmm, I can kinda see your point (now that I've actually grok'ed how the
> length works with skbs and generic XDP :)). I would still worry that
> only having the header there would lead some XDP programs to just
> silently fail. But on the other hand, this is opt-in... so IDK - maybe
> this is fine to merge as-is, and leave improvements for later?

Sorry I let this slip, any consensus on this patch?

Thanks
Luigi
Toke Høiland-Jørgensen Feb. 5, 2020, 3:55 p.m. UTC | #17
Luigi Rizzo <rizzo@iet.unipi.it> writes:

> On Fri, Jan 24, 2020 at 1:28 PM Toke Høiland-Jørgensen <toke@redhat.com>
> wrote:
>
>> Luigi Rizzo <lrizzo@google.com> writes:
>>
>> > On Fri, Jan 24, 2020 at 7:31 AM Toke Høiland-Jørgensen <toke@redhat.com>
>> wrote:
>> >>
>> >> Luigi Rizzo <lrizzo@google.com> writes:
>> >>
>> > ...
>> >> > My motivation for this change is that enforcing those guarantees has
>> >> > significant cost (even for native xdp in the cases I mentioned - mtu >
>> >> > 1 page, hw LRO, header split), and this is an interim solution to make
>> >> > generic skb usable without too much penalty.
>> >>
>> >> Sure, that part I understand; I just don't like that this "interim"
>> >> solution makes generic and native XDP diverge further in their
>> >> semantics...
>> >
>> > As a matter of fact I think it would make full sense to use the same
>> approach
>> > to control whether native xdp should pay the price converting to linear
>> buffers
>> > when the hw cannot guarantee that.
>> >
>> > To me this seems to be a case of "perfect is enemy of good":..
>>
>> Hmm, I can kinda see your point (now that I've actually grok'ed how the
>> length works with skbs and generic XDP :)). I would still worry that
>> only having the header there would lead some XDP programs to just
>> silently fail. But on the other hand, this is opt-in... so IDK - maybe
>> this is fine to merge as-is, and leave improvements for later?
>>
>
> Sorry I let this slip, any consensus on this patch?

Dunno if there's a consensus, but I certainly ran out of objections ;)

-Toke
diff mbox series

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2741aa35bec6..ae873fb5ec3c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1958,7 +1958,8 @@  struct net_device {
 
 	struct netdev_rx_queue	*_rx;
 	unsigned int		num_rx_queues;
-	unsigned int		real_num_rx_queues;
+	unsigned int		real_num_rx_queues:31;
+	unsigned int		xdp_linearize : 1;
 
 	struct bpf_prog __rcu	*xdp_prog;
 	unsigned long		gro_flush_timeout;
diff --git a/net/core/dev.c b/net/core/dev.c
index 6368c94c9e0a..04c7c8ed1b4a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4484,8 +4484,8 @@  static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	 * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
 	 * native XDP provides, thus we need to do it here as well.
 	 */
-	if (skb_is_nonlinear(skb) ||
-	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
+	if (skb->dev->xdp_linearize && (skb_is_nonlinear(skb) ||
+	    skb_headroom(skb) < XDP_PACKET_HEADROOM)) {
 		int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
 		int troom = skb->tail + skb->data_len - skb->end;
 
@@ -9756,6 +9756,7 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->gso_max_segs = GSO_MAX_SEGS;
 	dev->upper_level = 1;
 	dev->lower_level = 1;
+	dev->xdp_linearize = 1;
 
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 4c826b8bf9b1..ec59aa296664 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -442,6 +442,20 @@  static ssize_t proto_down_store(struct device *dev,
 }
 NETDEVICE_SHOW_RW(proto_down, fmt_dec);
 
+static int change_xdp_linearize(struct net_device *dev, unsigned long val)
+{
+	dev->xdp_linearize = !!val;
+	return 0;
+}
+
+static ssize_t xdp_linearize_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	return netdev_store(dev, attr, buf, len, change_xdp_linearize);
+}
+NETDEVICE_SHOW_RW(xdp_linearize, fmt_dec);
+
 static ssize_t phys_port_id_show(struct device *dev,
 				 struct device_attribute *attr, char *buf)
 {
@@ -536,6 +550,7 @@  static struct attribute *net_class_attrs[] __ro_after_init = {
 	&dev_attr_phys_port_name.attr,
 	&dev_attr_phys_switch_id.attr,
 	&dev_attr_proto_down.attr,
+	&dev_attr_xdp_linearize.attr,
 	&dev_attr_carrier_up_count.attr,
 	&dev_attr_carrier_down_count.attr,
 	NULL,