diff mbox series

[bpf-next,04/16] net: Add BPF_XDP_EGRESS as a bpf_attach_type

Message ID 20200420200055.49033-5-dsahern@kernel.org
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series net: Add support for XDP in egress path | expand

Commit Message

David Ahern April 20, 2020, 8 p.m. UTC
From: David Ahern <dahern@digitalocean.com>

Add new bpf_attach_type, BPF_XDP_EGRESS, for BPF programs attached
at the XDP layer, but the egress path.

Since egress path will not have ingress_ifindex and rx_queue_index
set, update xdp_is_valid_access to block access to these entries in
the xdp context when a program is attached to egress path.

Update dev_change_xdp_fd to verify expected_attach_type for a program
is BPF_XDP_EGRESS if egress argument is set.

The next patch adds support for the egress ifindex.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/uapi/linux/bpf.h       |  1 +
 net/core/dev.c                 | 11 +++++++++++
 net/core/filter.c              |  8 ++++++++
 tools/include/uapi/linux/bpf.h |  1 +
 4 files changed, 21 insertions(+)

Comments

Toke Høiland-Jørgensen April 21, 2020, 10:14 a.m. UTC | #1
David Ahern <dsahern@kernel.org> writes:

> From: David Ahern <dahern@digitalocean.com>
>
> Add new bpf_attach_type, BPF_XDP_EGRESS, for BPF programs attached
> at the XDP layer, but the egress path.
>
> Since egress path will not have ingress_ifindex and rx_queue_index
> set, update xdp_is_valid_access to block access to these entries in
> the xdp context when a program is attached to egress path.
>
> Update dev_change_xdp_fd to verify expected_attach_type for a program
> is BPF_XDP_EGRESS if egress argument is set.
>
> The next patch adds support for the egress ifindex.
>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> ---
>  include/uapi/linux/bpf.h       |  1 +
>  net/core/dev.c                 | 11 +++++++++++
>  net/core/filter.c              |  8 ++++++++
>  tools/include/uapi/linux/bpf.h |  1 +
>  4 files changed, 21 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2e29a671d67e..a9d384998e8b 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -215,6 +215,7 @@ enum bpf_attach_type {
>  	BPF_TRACE_FEXIT,
>  	BPF_MODIFY_RETURN,
>  	BPF_LSM_MAC,
> +	BPF_XDP_EGRESS,
>  	__MAX_BPF_ATTACH_TYPE
>  };
>  
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 97180458e7cb..e8a62bdb395b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -8732,6 +8732,17 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
>  		if (IS_ERR(prog))
>  			return PTR_ERR(prog);
>  
> +		if (egress && prog->expected_attach_type != BPF_XDP_EGRESS) {
> +			NL_SET_ERR_MSG(extack, "XDP program in Tx path must use BPF_XDP_EGRESS attach type");
> +			bpf_prog_put(prog);
> +			return -EINVAL;
> +		}
> +		if (!egress && prog->expected_attach_type == BPF_XDP_EGRESS) {
> +			NL_SET_ERR_MSG(extack, "XDP program in Rx path can not use BPF_XDP_EGRESS attach type");
> +			bpf_prog_put(prog);
> +			return -EINVAL;
> +		}
> +
>  		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
>  			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
>  			bpf_prog_put(prog);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7d6ceaa54d21..bcb56448f336 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6935,6 +6935,14 @@ static bool xdp_is_valid_access(int off, int size,
>  				const struct bpf_prog *prog,
>  				struct bpf_insn_access_aux *info)
>  {
> +	if (prog->expected_attach_type == BPF_XDP_EGRESS) {
> +		switch (off) {
> +		case offsetof(struct xdp_md, ingress_ifindex):
> +		case offsetof(struct xdp_md, rx_queue_index):
> +			return false;
> +		}
> +	}

As I pointed out on the RFC patch, I'm concerned whether this will work
right with freplace programs attaching to XDP programs. It may just be
that I'm missing something, but in that case please explain why it
works? :)

-Toke
David Ahern April 21, 2020, 12:50 p.m. UTC | #2
On 4/21/20 4:14 AM, Toke Høiland-Jørgensen wrote:
> As I pointed out on the RFC patch, I'm concerned whether this will work
> right with freplace programs attaching to XDP programs. It may just be
> that I'm missing something, but in that case please explain why it
> works? :)

expected_attach_type is not unique to XDP. freplace is not unique to
XDP. IF there is a problem, it is not unique to XDP, and any
enhancements needed to freplace functionality will not be unique to XDP.
Toke Høiland-Jørgensen April 21, 2020, 1:25 p.m. UTC | #3
David Ahern <dsahern@gmail.com> writes:

> On 4/21/20 4:14 AM, Toke Høiland-Jørgensen wrote:
>> As I pointed out on the RFC patch, I'm concerned whether this will work
>> right with freplace programs attaching to XDP programs. It may just be
>> that I'm missing something, but in that case please explain why it
>> works? :)
>
> expected_attach_type is not unique to XDP. freplace is not unique to
> XDP. IF there is a problem, it is not unique to XDP, and any
> enhancements needed to freplace functionality will not be unique to XDP.

Still needs to be fixed, though :)

Also, at least looking through all the is_valid_access functions in
filter.c, they all seem to "fail safe". I.e., specific
expected_attach_type values can permit the program access to additional
ranges. In which case an freplace program that doesn't have the right
attach type will just be rejected if it tries to access such a field.
Whereas here you're *disallowing* something based on a particular
expected_attach_type, so you can end up with an egress program that
should have been rejected by the verifier but isn't because it's missing
the attach_type.

-Toke
David Ahern April 21, 2020, 1:49 p.m. UTC | #4
On 4/21/20 7:25 AM, Toke Høiland-Jørgensen wrote:
> David Ahern <dsahern@gmail.com> writes:
> 
>> On 4/21/20 4:14 AM, Toke Høiland-Jørgensen wrote:
>>> As I pointed out on the RFC patch, I'm concerned whether this will work
>>> right with freplace programs attaching to XDP programs. It may just be
>>> that I'm missing something, but in that case please explain why it
>>> works? :)
>>
>> expected_attach_type is not unique to XDP. freplace is not unique to
>> XDP. IF there is a problem, it is not unique to XDP, and any
>> enhancements needed to freplace functionality will not be unique to XDP.
> 
> Still needs to be fixed, though :)

one problem at a time. I have a long list of items that are directly
relevant to what I want to do.

> 
> Also, at least looking through all the is_valid_access functions in
> filter.c, they all seem to "fail safe". I.e., specific
> expected_attach_type values can permit the program access to additional
> ranges. In which case an freplace program that doesn't have the right
> attach type will just be rejected if it tries to access such a field.
> Whereas here you're *disallowing* something based on a particular
> expected_attach_type, so you can end up with an egress program that
> should have been rejected by the verifier but isn't because it's missing
> the attach_type.

There are 6 existing valid access checks on expected_attach_type doing
the exact same thing - validating access based on attach type.
Toke Høiland-Jørgensen April 22, 2020, 11:21 a.m. UTC | #5
David Ahern <dsahern@gmail.com> writes:

> On 4/21/20 7:25 AM, Toke Høiland-Jørgensen wrote:
>> David Ahern <dsahern@gmail.com> writes:
>> 
>>> On 4/21/20 4:14 AM, Toke Høiland-Jørgensen wrote:
>>>> As I pointed out on the RFC patch, I'm concerned whether this will work
>>>> right with freplace programs attaching to XDP programs. It may just be
>>>> that I'm missing something, but in that case please explain why it
>>>> works? :)
>>>
>>> expected_attach_type is not unique to XDP. freplace is not unique to
>>> XDP. IF there is a problem, it is not unique to XDP, and any
>>> enhancements needed to freplace functionality will not be unique to XDP.
>> 
>> Still needs to be fixed, though :)
>
> one problem at a time. I have a long list of items that are directly
> relevant to what I want to do.

Not saying a fix to freplace *has* to be part of this series; just
saying that I would be more comfortable if that was fixed before we
merge this.

>> Also, at least looking through all the is_valid_access functions in
>> filter.c, they all seem to "fail safe". I.e., specific
>> expected_attach_type values can permit the program access to additional
>> ranges. In which case an freplace program that doesn't have the right
>> attach type will just be rejected if it tries to access such a field.
>> Whereas here you're *disallowing* something based on a particular
>> expected_attach_type, so you can end up with an egress program that
>> should have been rejected by the verifier but isn't because it's missing
>> the attach_type.
>
> There are 6 existing valid access checks on expected_attach_type doing
> the exact same thing - validating access based on attach type.

See my point about default black/white listing, though. You are adding a
new restriction to an existing program type based on this, so surely we
should make sure this restriction actually sticks, no?

-Toke
David Ahern April 22, 2020, 2:51 p.m. UTC | #6
On 4/22/20 5:21 AM, Toke Høiland-Jørgensen wrote:
> Not saying a fix to freplace *has* to be part of this series; just
> saying that I would be more comfortable if that was fixed before we
> merge this.

You don't have a test case that fails, you just think there is a problem
based on code analysis. Have you taken this presumption up with the
author of the freplace code?
Toke Høiland-Jørgensen April 22, 2020, 3:27 p.m. UTC | #7
David Ahern <dsahern@gmail.com> writes:

> On 4/22/20 5:21 AM, Toke Høiland-Jørgensen wrote:
>> Not saying a fix to freplace *has* to be part of this series; just
>> saying that I would be more comfortable if that was fixed before we
>> merge this.
>
> You don't have a test case that fails, you just think there is a problem
> based on code analysis. Have you taken this presumption up with the
> author of the freplace code?

Well this is a mailing list, isn't it? So that's what I'm doing;
hopefully Alexei will chime in... Otherwise I guess I can produce a
testcase tomorrow.

And as I said in the beginning, I'm perfectly happy to be told why I'm
wrong; but so far you have just been arguing that I'm out of scope ;)

-Toke
David Ahern April 22, 2020, 3:33 p.m. UTC | #8
On 4/22/20 9:27 AM, Toke Høiland-Jørgensen wrote:
> And as I said in the beginning, I'm perfectly happy to be told why I'm
> wrong; but so far you have just been arguing that I'm out of scope ;)

you are arguing about a suspected bug with existing code that is no way
touched or modified by this patch set, so yes it is out of scope.

The proper way to discuss a bug in existing code is a thread focused on
that topic, not buried inside comments on a patch set.
Toke Høiland-Jørgensen April 22, 2020, 3:51 p.m. UTC | #9
David Ahern <dsahern@gmail.com> writes:

> On 4/22/20 9:27 AM, Toke Høiland-Jørgensen wrote:
>> And as I said in the beginning, I'm perfectly happy to be told why I'm
>> wrong; but so far you have just been arguing that I'm out of scope ;)
>
> you are arguing about a suspected bug with existing code that is no way
> touched or modified by this patch set, so yes it is out of scope.

Your patch is relying on the (potentially buggy) behaviour, so I don't
think it's out of scope to mention it in this context.

> The proper way to discuss a bug in existing code is a thread focused on
> that topic, not buried inside comments on a patch set.

I'm fine with starting a new thread; will do (but as I said, tomorrow).

-Toke
David Ahern April 22, 2020, 3:56 p.m. UTC | #10
On 4/22/20 9:51 AM, Toke Høiland-Jørgensen wrote:
> Your patch is relying on the (potentially buggy) behaviour, so I don't
> think it's out of scope to mention it in this context.

This is getting ridiculous. I am in no way, shape, or form relying on
freplace.

If you think there is a problem with existing code, then you prove it.
Don't ask someone else to stop what they are doing and prove you right
or wrong.

I guess we'll see tomorrow the outcome of your investigations.
Alexei Starovoitov April 23, 2020, 12:39 a.m. UTC | #11
On Wed, Apr 22, 2020 at 05:51:36PM +0200, Toke Høiland-Jørgensen wrote:
> David Ahern <dsahern@gmail.com> writes:
> 
> > On 4/22/20 9:27 AM, Toke Høiland-Jørgensen wrote:
> >> And as I said in the beginning, I'm perfectly happy to be told why I'm
> >> wrong; but so far you have just been arguing that I'm out of scope ;)
> >
> > you are arguing about a suspected bug with existing code that is no way
> > touched or modified by this patch set, so yes it is out of scope.
> 
> Your patch is relying on the (potentially buggy) behaviour, so I don't
> think it's out of scope to mention it in this context.

Sorry for slow reply.
I'm swamped with other things atm.

Looks like there is indeed a bug in prog_type_ext handling code that
is doing
env->ops = bpf_verifier_ops[tgt_prog->type];
I'm not sure whether the verifier can simply add:
prog->expected_attach_type = tgt_prog->expected_attach_type;
and be done with it.
Likely yes, since expected_attach_type must be zero at that point
that is enforced by bpf_prog_load_check_attach().
So I suspect it's a single line fix.
A selftest to prove or disprove is necessary, of course.

Thanks Toke for bringing it to my attention.
Toke Høiland-Jørgensen April 23, 2020, 3:23 p.m. UTC | #12
David Ahern <dsahern@gmail.com> writes:

> On 4/22/20 9:51 AM, Toke Høiland-Jørgensen wrote:
>> Your patch is relying on the (potentially buggy) behaviour, so I don't
>> think it's out of scope to mention it in this context.
>
> This is getting ridiculous. I am in no way, shape, or form relying on
> freplace.

No, I meant that you're relying on 'expected_attach_type'.

> I guess we'll see tomorrow the outcome of your investigations.

Try this (when running a kernel with your XDP egress patches applied,
obviously):

$ git clone --recurse-submodules -b xdp-egress https://github.com/xdp-project/xdp-tools
$ cd xdp-tools && make && cd xdp-loader
$ sudo ./xdp-loader load --egress eth0 xdp-ctx-test.o xdp-ctx-test.o

Leads to an instant kernel crash (well, assuming there's any traffic on
the device, I suppose):

[  804.417699] general protection fault, probably for non-canonical address 0x10a18210d14ba9: 0000 [#1] SMP PTI
[  804.427518] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.7.0-rc1+ #204
[  804.434053] Hardware name: LENOVO 30B3005DMT/102F, BIOS S00KT56A 01/15/2018
[  804.441012] RIP: 0010:bpf_prog_3525d0165d4bca62_xdp_drop_func+0x1d/0xc98
[  804.447704] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 e5 48 81 ec 00 00 00 00 53 41 55 41 56 41 57 6a 00 48 8b 7f 28 <48> 8b 7f 00 8b bf 00 01 00 00 b8 01 00 00 00 48 85 ff 74 05 b8 02
[  804.466442] RSP: 0018:ffffb177c007ca38 EFLAGS: 00010286
[  804.471662] RAX: 000000000000dd86 RBX: ffffb177c007cb10 RCX: ffff939cde4ae102
[  804.478782] RDX: ffffffffc0bcb680 RSI: ffffb177c016b038 RDI: 0010a18210d14ba9
[  804.485903] RBP: ffffb177c007ca60 R08: ffff939cde4ae184 R09: 0000000000000001
[  804.493188] R10: ffff939d1c8028c0 R11: ffff939d1c3acf00 R12: ffffb177c016b000
[  804.500307] R13: 0000000000000002 R14: ffffb177c0594000 R15: ffff939cde4ae102
[  804.507430] FS:  0000000000000000(0000) GS:ffff939d1fc40000(0000) knlGS:0000000000000000
[  804.515510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  804.521245] CR2: 00007f6705642640 CR3: 000000074860a002 CR4: 00000000003606e0
[  804.528374] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  804.535502] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  804.542629] Call Trace:
[  804.545076]  <IRQ>
[  804.547096]  bpf_prog_79f73a00e07a6bab_F+0x38/0x980
[  804.551974]  do_xdp_generic_core+0x12b/0x320
[  804.556366]  do_xdp_egress_skb+0x52/0x110
[  804.560376]  dev_hard_start_xmit+0x107/0x220
[  804.564644]  sch_direct_xmit+0xec/0x230
[  804.568481]  __qdisc_run+0x140/0x550
[  804.572058]  __dev_queue_xmit+0x4b3/0x780
[  804.576071]  ip6_finish_output2+0x248/0x5b0
[  804.580254]  ip6_output+0x73/0x120
[  804.583656]  ? __ip6_finish_output+0x100/0x100
[  804.588105]  mld_sendpack+0x1c1/0x230
[  804.591766]  mld_ifc_timer_expire+0x1ab/0x310
[  804.596122]  ? ip6_mc_leave_src+0x90/0x90
[  804.600134]  call_timer_fn+0x2b/0x140
[  804.603796]  __run_timers.part.0+0x16f/0x270
[  804.608068]  ? timerqueue_add+0x96/0xb0
[  804.611902]  ? enqueue_hrtimer+0x36/0x90
[  804.615827]  run_timer_softirq+0x26/0x50
[  804.619877]  __do_softirq+0xe1/0x2fe
[  804.623453]  irq_exit+0xa6/0xb0
[  804.626595]  smp_apic_timer_interrupt+0x68/0x130
[  804.631210]  apic_timer_interrupt+0xf/0x20
[  804.635308]  </IRQ>
[  804.637414] RIP: 0010:cpuidle_enter_state+0xc3/0x400
[  804.642377] Code: e8 b2 02 8b ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 0a 03 00 00 31 ff e8 64 14 91 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 52 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
[  804.661114] RSP: 0018:ffffb177c00ffe60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  804.668674] RAX: ffff939d1fc6c8c0 RBX: ffff939d1fc76708 RCX: 000000000000001f
[  804.675801] RDX: 0000000000000000 RSI: 0000000023a34d57 RDI: 0000000000000000
[  804.683035] RBP: ffffffff896e6760 R08: 000000bb4b078647 R09: 0000000000000498
[  804.690163] R10: 0000000000000ae9 R11: ffff939d1fc6ba44 R12: 000000bb4b078647
[  804.697283] R13: 0000000000000004 R14: 0000000000000004 R15: ffff939d1fc76708
[  804.704409]  cpuidle_enter+0x29/0x40
[  804.707987]  do_idle+0x1ff/0x290
[  804.711216]  cpu_startup_entry+0x19/0x20
[  804.715139]  start_secondary+0x161/0x1c0
[  804.719063]  secondary_startup_64+0xa4/0xb0
[  804.723246] Modules linked in: xt_nat mlx5_ib xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp bridge stp llc iptable_filter binfmt_misc nls_iso8859_1 ib_uverbs intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi irqbypass mlx5_core snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core crct10dif_pclmul crc32_pclmul snd_pcm uas ghash_clmulni_intel usb_storage snd_timer wmi_bmof e1000e snd mei_me mei lpc_ich soundcore pata_acpi squashfs mac_hid ib_iser rdma_cm configfs iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear nouveau video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mxm_wmi aesni_intel
[  804.723288]  glue_helper crypto_simd cryptd ahci libahci wmi
[  804.816671] ---[ end trace d70f476e5883fde3 ]---
[  804.866476] RIP: 0010:bpf_prog_3525d0165d4bca62_xdp_drop_func+0x1d/0xc98
[  804.873269] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 e5 48 81 ec 00 00 00 00 53 41 55 41 56 41 57 6a 00 48 8b 7f 28 <48> 8b 7f 00 8b bf 00 01 00 00 b8 01 00 00 00 48 85 ff 74 05 b8 02
[  804.892010] RSP: 0018:ffffb177c007ca38 EFLAGS: 00010286
[  804.897236] RAX: 000000000000dd86 RBX: ffffb177c007cb10 RCX: ffff939cde4ae102
[  804.904362] RDX: ffffffffc0bcb680 RSI: ffffb177c016b038 RDI: 0010a18210d14ba9
[  804.911491] RBP: ffffb177c007ca60 R08: ffff939cde4ae184 R09: 0000000000000001
[  804.918622] R10: ffff939d1c8028c0 R11: ffff939d1c3acf00 R12: ffffb177c016b000
[  804.925750] R13: 0000000000000002 R14: ffffb177c0594000 R15: ffff939cde4ae102
[  804.932879] FS:  0000000000000000(0000) GS:ffff939d1fc40000(0000) knlGS:0000000000000000
[  804.941135] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  804.946876] CR2: 00007f6705642640 CR3: 000000074860a002 CR4: 00000000003606e0
[  804.954006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  804.961134] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  804.968265] Kernel panic - not syncing: Fatal exception in interrupt
[  804.974648] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)


The XDP program being loaded (as a multi-prog with freplace) is simply
this:

SEC("xdp_ctx_test")
int xdp_drop_func(struct xdp_md *ctx)
{
        if (ctx->ingress_ifindex > 0)
                return XDP_PASS;

        return XDP_DROP;
}

-Toke
Toke Høiland-Jørgensen April 23, 2020, 4:40 p.m. UTC | #13
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Wed, Apr 22, 2020 at 05:51:36PM +0200, Toke Høiland-Jørgensen wrote:
>> David Ahern <dsahern@gmail.com> writes:
>> 
>> > On 4/22/20 9:27 AM, Toke Høiland-Jørgensen wrote:
>> >> And as I said in the beginning, I'm perfectly happy to be told why I'm
>> >> wrong; but so far you have just been arguing that I'm out of scope ;)
>> >
>> > you are arguing about a suspected bug with existing code that is no way
>> > touched or modified by this patch set, so yes it is out of scope.
>> 
>> Your patch is relying on the (potentially buggy) behaviour, so I don't
>> think it's out of scope to mention it in this context.
>
> Sorry for slow reply.
> I'm swamped with other things atm.
>
> Looks like there is indeed a bug in prog_type_ext handling code that
> is doing
> env->ops = bpf_verifier_ops[tgt_prog->type];
> I'm not sure whether the verifier can simply add:
> prog->expected_attach_type = tgt_prog->expected_attach_type;
> and be done with it.
> Likely yes, since expected_attach_type must be zero at that point
> that is enforced by bpf_prog_load_check_attach().
> So I suspect it's a single line fix.

Not quite: the check in bpf_tracing_prog_attach() that enforces
prog->expected_attach_type==0 also needs to go. So 5 lines :)

With those two changes, I do seem to get the correct behaviour for XDP
multiprogs in both ingress and egress mode (with David's patch set).

> A selftest to prove or disprove is necessary, of course.

I'll see if I can't hook this into the existing freplace selftest and
send a proper patch. I'm not really sure how to check for any hidden
side effects; in particular I'm a bit twitchy about removing that check
in bpf_tracing_prog_attach() - presumably it is there for a reason? But
I guess I'll send a patch and you can take a look.

> Thanks Toke for bringing it to my attention.

You're quite welcome! And thanks for the hint that the fix was that
simple; I feared something more invasive was needed.

-Toke
Alexei Starovoitov April 23, 2020, 4:52 p.m. UTC | #14
On Thu, Apr 23, 2020 at 9:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Wed, Apr 22, 2020 at 05:51:36PM +0200, Toke Høiland-Jørgensen wrote:
> >> David Ahern <dsahern@gmail.com> writes:
> >>
> >> > On 4/22/20 9:27 AM, Toke Høiland-Jørgensen wrote:
> >> >> And as I said in the beginning, I'm perfectly happy to be told why I'm
> >> >> wrong; but so far you have just been arguing that I'm out of scope ;)
> >> >
> >> > you are arguing about a suspected bug with existing code that is no way
> >> > touched or modified by this patch set, so yes it is out of scope.
> >>
> >> Your patch is relying on the (potentially buggy) behaviour, so I don't
> >> think it's out of scope to mention it in this context.
> >
> > Sorry for slow reply.
> > I'm swamped with other things atm.
> >
> > Looks like there is indeed a bug in prog_type_ext handling code that
> > is doing
> > env->ops = bpf_verifier_ops[tgt_prog->type];
> > I'm not sure whether the verifier can simply add:
> > prog->expected_attach_type = tgt_prog->expected_attach_type;
> > and be done with it.
> > Likely yes, since expected_attach_type must be zero at that point
> > that is enforced by bpf_prog_load_check_attach().
> > So I suspect it's a single line fix.
>
> Not quite: the check in bpf_tracing_prog_attach() that enforces
> prog->expected_attach_type==0 also needs to go. So 5 lines :)

prog_ext's expected_attach_type needs to stay zero.
It needs to be inherited from tgt prog. Hence one line:
prog->expected_attach_type = tgt_prog->expected_attach_type;
Toke Høiland-Jørgensen April 23, 2020, 5:05 p.m. UTC | #15
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Thu, Apr 23, 2020 at 9:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Wed, Apr 22, 2020 at 05:51:36PM +0200, Toke Høiland-Jørgensen wrote:
>> >> David Ahern <dsahern@gmail.com> writes:
>> >>
>> >> > On 4/22/20 9:27 AM, Toke Høiland-Jørgensen wrote:
>> >> >> And as I said in the beginning, I'm perfectly happy to be told why I'm
>> >> >> wrong; but so far you have just been arguing that I'm out of scope ;)
>> >> >
>> >> > you are arguing about a suspected bug with existing code that is no way
>> >> > touched or modified by this patch set, so yes it is out of scope.
>> >>
>> >> Your patch is relying on the (potentially buggy) behaviour, so I don't
>> >> think it's out of scope to mention it in this context.
>> >
>> > Sorry for slow reply.
>> > I'm swamped with other things atm.
>> >
>> > Looks like there is indeed a bug in prog_type_ext handling code that
>> > is doing
>> > env->ops = bpf_verifier_ops[tgt_prog->type];
>> > I'm not sure whether the verifier can simply add:
>> > prog->expected_attach_type = tgt_prog->expected_attach_type;
>> > and be done with it.
>> > Likely yes, since expected_attach_type must be zero at that point
>> > that is enforced by bpf_prog_load_check_attach().
>> > So I suspect it's a single line fix.
>>
>> Not quite: the check in bpf_tracing_prog_attach() that enforces
>> prog->expected_attach_type==0 also needs to go. So 5 lines :)
>
> prog_ext's expected_attach_type needs to stay zero.
> It needs to be inherited from tgt prog. Hence one line:
> prog->expected_attach_type = tgt_prog->expected_attach_type;

Not sure I follow you here? I ended up with the patch below - without
the first hunk I can't attach freplace funcs to an xdp egress prog
(since the expected_attach_type will have been propagated from
verification time), and so that check will fail. Or am I missing
something?

-Toke



diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d85f37239540..40c3103c7233 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2381,10 +2381,6 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
                }
                break;
        case BPF_PROG_TYPE_EXT:
-               if (prog->expected_attach_type != 0) {
-                       err = -EINVAL;
-                       goto out_put_prog;
-               }
                break;
        case BPF_PROG_TYPE_LSM:
                if (prog->expected_attach_type != BPF_LSM_MAC) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 513d9c545176..41c31773a3c4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10485,6 +10485,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
                                return -EINVAL;
                        }
                        env->ops = bpf_verifier_ops[tgt_prog->type];
+                       prog->expected_attach_type = tgt_prog->expected_attach_type;
                }
                if (!tgt_prog->jited) {
                        verbose(env, "Can attach to only JITed progs\n");
Alexei Starovoitov April 23, 2020, 10:44 p.m. UTC | #16
On Thu, Apr 23, 2020 at 07:05:42PM +0200, Toke Høiland-Jørgensen wrote:
> >> >
> >> > Looks like there is indeed a bug in prog_type_ext handling code that
> >> > is doing
> >> > env->ops = bpf_verifier_ops[tgt_prog->type];
> >> > I'm not sure whether the verifier can simply add:
> >> > prog->expected_attach_type = tgt_prog->expected_attach_type;
> >> > and be done with it.
> >> > Likely yes, since expected_attach_type must be zero at that point
> >> > that is enforced by bpf_prog_load_check_attach().
> >> > So I suspect it's a single line fix.
> >>
> >> Not quite: the check in bpf_tracing_prog_attach() that enforces
> >> prog->expected_attach_type==0 also needs to go. So 5 lines :)
> >
> > prog_ext's expected_attach_type needs to stay zero.
> > It needs to be inherited from tgt prog. Hence one line:
> > prog->expected_attach_type = tgt_prog->expected_attach_type;
> 
> Not sure I follow you here? I ended up with the patch below - without
> the first hunk I can't attach freplace funcs to an xdp egress prog
> (since the expected_attach_type will have been propagated from
> verification time), and so that check will fail. Or am I missing
> something?
> 
> -Toke
> 
> 
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index d85f37239540..40c3103c7233 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2381,10 +2381,6 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
>                 }
>                 break;
>         case BPF_PROG_TYPE_EXT:
> -               if (prog->expected_attach_type != 0) {
> -                       err = -EINVAL;
> -                       goto out_put_prog;
> -               }
>                 break;

ahh. that extra check.
I think it's better to keep it for extra safety.
Here all expected_attach_type have clear checks depending on prog_type.
There is no other place where it's that obvious.
The verifier does similar thing earlier, but it's not that clear.
I think the better fix would to set expected_attach_type = 0 for PROG_TYPE_EXT
at the end of do_check, since we're overriding this field temporarily
during verification.

>         case BPF_PROG_TYPE_LSM:
>                 if (prog->expected_attach_type != BPF_LSM_MAC) {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 513d9c545176..41c31773a3c4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -10485,6 +10485,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
>                                 return -EINVAL;
>                         }
>                         env->ops = bpf_verifier_ops[tgt_prog->type];
> +                       prog->expected_attach_type = tgt_prog->expected_attach_type;

In that sense it's like 'env->expected_attach_type'.
Toke Høiland-Jørgensen April 23, 2020, 11:49 p.m. UTC | #17
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Thu, Apr 23, 2020 at 07:05:42PM +0200, Toke Høiland-Jørgensen wrote:
>> >> >
>> >> > Looks like there is indeed a bug in prog_type_ext handling code that
>> >> > is doing
>> >> > env->ops = bpf_verifier_ops[tgt_prog->type];
>> >> > I'm not sure whether the verifier can simply add:
>> >> > prog->expected_attach_type = tgt_prog->expected_attach_type;
>> >> > and be done with it.
>> >> > Likely yes, since expected_attach_type must be zero at that point
>> >> > that is enforced by bpf_prog_load_check_attach().
>> >> > So I suspect it's a single line fix.
>> >>
>> >> Not quite: the check in bpf_tracing_prog_attach() that enforces
>> >> prog->expected_attach_type==0 also needs to go. So 5 lines :)
>> >
>> > prog_ext's expected_attach_type needs to stay zero.
>> > It needs to be inherited from tgt prog. Hence one line:
>> > prog->expected_attach_type = tgt_prog->expected_attach_type;
>> 
>> Not sure I follow you here? I ended up with the patch below - without
>> the first hunk I can't attach freplace funcs to an xdp egress prog
>> (since the expected_attach_type will have been propagated from
>> verification time), and so that check will fail. Or am I missing
>> something?
>> 
>> -Toke
>> 
>> 
>> 
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index d85f37239540..40c3103c7233 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -2381,10 +2381,6 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
>>                 }
>>                 break;
>>         case BPF_PROG_TYPE_EXT:
>> -               if (prog->expected_attach_type != 0) {
>> -                       err = -EINVAL;
>> -                       goto out_put_prog;
>> -               }
>>                 break;
>
> ahh. that extra check.
> I think it's better to keep it for extra safety.
> Here all expected_attach_type have clear checks depending on prog_type.
> There is no other place where it's that obvious.
> The verifier does similar thing earlier, but it's not that clear.
> I think the better fix would to set expected_attach_type = 0 for PROG_TYPE_EXT
> at the end of do_check, since we're overriding this field temporarily
> during verification.

OK, sure, can do. I do agree it's better to keep the check. I'll send a
proper patch tomorrow, then.

As far as a adding a selftest for this, I think the most natural thing
would be to add it on top of David's tests for xdp_egress, since that's
what hit this - would you be OK with that? And if so, should I send the
main patch straight away and hold off on the selftest, or should I split
them, or hold off on the whole thing?

-Toke
Alexei Starovoitov April 24, 2020, 12:53 a.m. UTC | #18
On Fri, Apr 24, 2020 at 01:49:03AM +0200, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Thu, Apr 23, 2020 at 07:05:42PM +0200, Toke Høiland-Jørgensen wrote:
> >> >> >
> >> >> > Looks like there is indeed a bug in prog_type_ext handling code that
> >> >> > is doing
> >> >> > env->ops = bpf_verifier_ops[tgt_prog->type];
> >> >> > I'm not sure whether the verifier can simply add:
> >> >> > prog->expected_attach_type = tgt_prog->expected_attach_type;
> >> >> > and be done with it.
> >> >> > Likely yes, since expected_attach_type must be zero at that point
> >> >> > that is enforced by bpf_prog_load_check_attach().
> >> >> > So I suspect it's a single line fix.
> >> >>
> >> >> Not quite: the check in bpf_tracing_prog_attach() that enforces
> >> >> prog->expected_attach_type==0 also needs to go. So 5 lines :)
> >> >
> >> > prog_ext's expected_attach_type needs to stay zero.
> >> > It needs to be inherited from tgt prog. Hence one line:
> >> > prog->expected_attach_type = tgt_prog->expected_attach_type;
> >> 
> >> Not sure I follow you here? I ended up with the patch below - without
> >> the first hunk I can't attach freplace funcs to an xdp egress prog
> >> (since the expected_attach_type will have been propagated from
> >> verification time), and so that check will fail. Or am I missing
> >> something?
> >> 
> >> -Toke
> >> 
> >> 
> >> 
> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >> index d85f37239540..40c3103c7233 100644
> >> --- a/kernel/bpf/syscall.c
> >> +++ b/kernel/bpf/syscall.c
> >> @@ -2381,10 +2381,6 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
> >>                 }
> >>                 break;
> >>         case BPF_PROG_TYPE_EXT:
> >> -               if (prog->expected_attach_type != 0) {
> >> -                       err = -EINVAL;
> >> -                       goto out_put_prog;
> >> -               }
> >>                 break;
> >
> > ahh. that extra check.
> > I think it's better to keep it for extra safety.
> > Here all expected_attach_type have clear checks depending on prog_type.
> > There is no other place where it's that obvious.
> > The verifier does similar thing earlier, but it's not that clear.
> > I think the better fix would to set expected_attach_type = 0 for PROG_TYPE_EXT
> > at the end of do_check, since we're overriding this field temporarily
> > during verification.
> 
> OK, sure, can do. I do agree it's better to keep the check. I'll send a
> proper patch tomorrow, then.
> 
> As far as a adding a selftest for this, I think the most natural thing
> would be to add it on top of David's tests for xdp_egress, since that's
> what hit this - would you be OK with that? And if so, should I send the
> main patch straight away and hold off on the selftest, or should I split
> them, or hold off on the whole thing?

I think the issue is not related to xdp egress.
Hence I'd like to push the fix along with selftest into bpf tree.
The selftest can be:
void noinline do_bind((struct bpf_sock_addr *ctx)
{
  struct sockaddr_in sa = {};

  bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa));
  return 0;
}
SEC("cgroup/connect4")
int connect_v4_prog(struct bpf_sock_addr *ctx)
{
  return do_bind(ctx);
}

and freplace would replace do_bind() with do_new_bind()
that also calls bpf_bind().
I think without the fix freplace will fail to load, because
availability of bpf_bind() depends on correct prog->expected_attach_type.

I haven't looked at the crash you mentioned in the other email related
to xdp egress set. That could be different issue. I hope it's the same thing :)
David Ahern April 24, 2020, 12:58 a.m. UTC | #19
On 4/23/20 6:53 PM, Alexei Starovoitov wrote:
> 
> I think the issue is not related to xdp egress.

It isn't; that has been my point all along.

> Hence I'd like to push the fix along with selftest into bpf tree.
> The selftest can be:
> void noinline do_bind((struct bpf_sock_addr *ctx)
> {
>   struct sockaddr_in sa = {};
> 
>   bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa));
>   return 0;
> }
> SEC("cgroup/connect4")
> int connect_v4_prog(struct bpf_sock_addr *ctx)
> {
>   return do_bind(ctx);
> }
> 
> and freplace would replace do_bind() with do_new_bind()
> that also calls bpf_bind().
> I think without the fix freplace will fail to load, because
> availability of bpf_bind() depends on correct prog->expected_attach_type.
> 
> I haven't looked at the crash you mentioned in the other email related
> to xdp egress set. That could be different issue. I hope it's the same thing :)
> 

it is. The replaced program is accessing ingress_ifindex from xdp egress
context, and Rx stuff is not set (access is blocked by verifier).
Toke Høiland-Jørgensen April 24, 2020, 8:55 a.m. UTC | #20
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Fri, Apr 24, 2020 at 01:49:03AM +0200, Toke Høiland-Jørgensen wrote:
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> 
>> > On Thu, Apr 23, 2020 at 07:05:42PM +0200, Toke Høiland-Jørgensen wrote:
>> >> >> >
>> >> >> > Looks like there is indeed a bug in prog_type_ext handling code that
>> >> >> > is doing
>> >> >> > env->ops = bpf_verifier_ops[tgt_prog->type];
>> >> >> > I'm not sure whether the verifier can simply add:
>> >> >> > prog->expected_attach_type = tgt_prog->expected_attach_type;
>> >> >> > and be done with it.
>> >> >> > Likely yes, since expected_attach_type must be zero at that point
>> >> >> > that is enforced by bpf_prog_load_check_attach().
>> >> >> > So I suspect it's a single line fix.
>> >> >>
>> >> >> Not quite: the check in bpf_tracing_prog_attach() that enforces
>> >> >> prog->expected_attach_type==0 also needs to go. So 5 lines :)
>> >> >
>> >> > prog_ext's expected_attach_type needs to stay zero.
>> >> > It needs to be inherited from tgt prog. Hence one line:
>> >> > prog->expected_attach_type = tgt_prog->expected_attach_type;
>> >> 
>> >> Not sure I follow you here? I ended up with the patch below - without
>> >> the first hunk I can't attach freplace funcs to an xdp egress prog
>> >> (since the expected_attach_type will have been propagated from
>> >> verification time), and so that check will fail. Or am I missing
>> >> something?
>> >> 
>> >> -Toke
>> >> 
>> >> 
>> >> 
>> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> >> index d85f37239540..40c3103c7233 100644
>> >> --- a/kernel/bpf/syscall.c
>> >> +++ b/kernel/bpf/syscall.c
>> >> @@ -2381,10 +2381,6 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
>> >>                 }
>> >>                 break;
>> >>         case BPF_PROG_TYPE_EXT:
>> >> -               if (prog->expected_attach_type != 0) {
>> >> -                       err = -EINVAL;
>> >> -                       goto out_put_prog;
>> >> -               }
>> >>                 break;
>> >
>> > ahh. that extra check.
>> > I think it's better to keep it for extra safety.
>> > Here all expected_attach_type have clear checks depending on prog_type.
>> > There is no other place where it's that obvious.
>> > The verifier does similar thing earlier, but it's not that clear.
>> > I think the better fix would to set expected_attach_type = 0 for PROG_TYPE_EXT
>> > at the end of do_check, since we're overriding this field temporarily
>> > during verification.
>> 
>> OK, sure, can do. I do agree it's better to keep the check. I'll send a
>> proper patch tomorrow, then.
>> 
>> As far as a adding a selftest for this, I think the most natural thing
>> would be to add it on top of David's tests for xdp_egress, since that's
>> what hit this - would you be OK with that? And if so, should I send the
>> main patch straight away and hold off on the selftest, or should I split
>> them, or hold off on the whole thing?
>
> I think the issue is not related to xdp egress.
> Hence I'd like to push the fix along with selftest into bpf tree.
> The selftest can be:
> void noinline do_bind((struct bpf_sock_addr *ctx)
> {
>   struct sockaddr_in sa = {};
>
>   bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa));
>   return 0;
> }
> SEC("cgroup/connect4")
> int connect_v4_prog(struct bpf_sock_addr *ctx)
> {
>   return do_bind(ctx);
> }
>
> and freplace would replace do_bind() with do_new_bind()
> that also calls bpf_bind().
> I think without the fix freplace will fail to load, because
> availability of bpf_bind() depends on correct
> prog->expected_attach_type.

Right, I'll give this a shot, thanks :)

> I haven't looked at the crash you mentioned in the other email related
> to xdp egress set. That could be different issue. I hope it's the same
> thing :)

Yeah, it is.

-Toke
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2e29a671d67e..a9d384998e8b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -215,6 +215,7 @@  enum bpf_attach_type {
 	BPF_TRACE_FEXIT,
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 97180458e7cb..e8a62bdb395b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8732,6 +8732,17 @@  int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
+		if (egress && prog->expected_attach_type != BPF_XDP_EGRESS) {
+			NL_SET_ERR_MSG(extack, "XDP program in Tx path must use BPF_XDP_EGRESS attach type");
+			bpf_prog_put(prog);
+			return -EINVAL;
+		}
+		if (!egress && prog->expected_attach_type == BPF_XDP_EGRESS) {
+			NL_SET_ERR_MSG(extack, "XDP program in Rx path can not use BPF_XDP_EGRESS attach type");
+			bpf_prog_put(prog);
+			return -EINVAL;
+		}
+
 		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
 			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
 			bpf_prog_put(prog);
diff --git a/net/core/filter.c b/net/core/filter.c
index 7d6ceaa54d21..bcb56448f336 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6935,6 +6935,14 @@  static bool xdp_is_valid_access(int off, int size,
 				const struct bpf_prog *prog,
 				struct bpf_insn_access_aux *info)
 {
+	if (prog->expected_attach_type == BPF_XDP_EGRESS) {
+		switch (off) {
+		case offsetof(struct xdp_md, ingress_ifindex):
+		case offsetof(struct xdp_md, rx_queue_index):
+			return false;
+		}
+	}
+
 	if (type == BPF_WRITE) {
 		if (bpf_prog_is_dev_bound(prog->aux)) {
 			switch (off) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2e29a671d67e..a9d384998e8b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -215,6 +215,7 @@  enum bpf_attach_type {
 	BPF_TRACE_FEXIT,
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };