Message ID | 20190824020028.6242-1-jakub.kicinski@netronome.com |
---|---|
State | Accepted |
Delegated to: | BPF Maintainers |
Headers | show |
Series | [bpf] nfp: bpf: fix latency bug when updating stack index register | expand |
On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski <jakub.kicinski@netronome.com> wrote: > > From: Jiong Wang <jiong.wang@netronome.com> > > NFP is using Local Memory to model stack. LM_addr could be used as base of > a 16 32-bit word region of Local Memory. Then, if the stack offset is > beyond the current region, the local index needs to be updated. The update > needs at least three cycles to take effect, therefore the sequence normally > looks like: > > local_csr_wr[ActLMAddr3, gprB_5] > nop > nop > nop > > If the local index switch happens on a narrow loads, then the instruction > preparing value to zero high 32-bit of the destination register could be > counted as one cycle, the sequence then could be something like: > > local_csr_wr[ActLMAddr3, gprB_5] > nop > nop > immed[gprB_5, 0] > > However, we have zero extension optimization that zeroing high 32-bit could > be eliminated, therefore above IMMED insn won't be available for which case > the first sequence needs to be generated. > > Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen") > Signed-off-by: Jiong Wang <jiong.wang@netronome.com> > Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> I haven't looked into the code yet. But ^^^ should be Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> right?
On Sun, Aug 25, 2019 at 10:37 PM Song Liu <liu.song.a23@gmail.com> wrote: > On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski wrote: > > From: Jiong Wang <jiong.wang@netronome.com> > > > > NFP is using Local Memory to model stack. LM_addr could be used as base of > > a 16 32-bit word region of Local Memory. Then, if the stack offset is > > beyond the current region, the local index needs to be updated. The update > > needs at least three cycles to take effect, therefore the sequence normally > > looks like: > > > > local_csr_wr[ActLMAddr3, gprB_5] > > nop > > nop > > nop > > > > If the local index switch happens on a narrow loads, then the instruction > > preparing value to zero high 32-bit of the destination register could be > > counted as one cycle, the sequence then could be something like: > > > > local_csr_wr[ActLMAddr3, gprB_5] > > nop > > nop > > immed[gprB_5, 0] > > > > However, we have zero extension optimization that zeroing high 32-bit could > > be eliminated, therefore above IMMED insn won't be available for which case > > the first sequence needs to be generated. > > > > Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen") > > Signed-off-by: Jiong Wang <jiong.wang@netronome.com> > > Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> > I haven't looked into the code yet. But ^^^ should be > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> > > right? I prefer Review on code I review, ack on code I ack, and sign-off on code I co-author.
On Mon, Aug 26, 2019 at 8:57 AM Jakub Kicinski <jakub.kicinski@netronome.com> wrote: > > On Sun, Aug 25, 2019 at 10:37 PM Song Liu <liu.song.a23@gmail.com> wrote: > > On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski wrote: > > > From: Jiong Wang <jiong.wang@netronome.com> > > > > > > NFP is using Local Memory to model stack. LM_addr could be used as base of > > > a 16 32-bit word region of Local Memory. Then, if the stack offset is > > > beyond the current region, the local index needs to be updated. The update > > > needs at least three cycles to take effect, therefore the sequence normally > > > looks like: > > > > > > local_csr_wr[ActLMAddr3, gprB_5] > > > nop > > > nop > > > nop > > > > > > If the local index switch happens on a narrow loads, then the instruction > > > preparing value to zero high 32-bit of the destination register could be > > > counted as one cycle, the sequence then could be something like: > > > > > > local_csr_wr[ActLMAddr3, gprB_5] > > > nop > > > nop > > > immed[gprB_5, 0] > > > > > > However, we have zero extension optimization that zeroing high 32-bit could > > > be eliminated, therefore above IMMED insn won't be available for which case > > > the first sequence needs to be generated. > > > > > > Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen") > > > Signed-off-by: Jiong Wang <jiong.wang@netronome.com> > > > Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> > > I haven't looked into the code yet. But ^^^ should be > > > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> > > > > right? > > I prefer Review on code I review, ack on code I ack, and sign-off on > code I co-author. I believe if you're sending somebody else patch you have to add your SOB in addition to their 'Author:' and their SOB fields.
On 8/26/19 6:18 PM, Alexei Starovoitov wrote: > On Mon, Aug 26, 2019 at 8:57 AM Jakub Kicinski > <jakub.kicinski@netronome.com> wrote: >> On Sun, Aug 25, 2019 at 10:37 PM Song Liu <liu.song.a23@gmail.com> wrote: >>> On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski wrote: >>>> From: Jiong Wang <jiong.wang@netronome.com> >>>> >>>> NFP is using Local Memory to model stack. LM_addr could be used as base of >>>> a 16 32-bit word region of Local Memory. Then, if the stack offset is >>>> beyond the current region, the local index needs to be updated. The update >>>> needs at least three cycles to take effect, therefore the sequence normally >>>> looks like: >>>> >>>> local_csr_wr[ActLMAddr3, gprB_5] >>>> nop >>>> nop >>>> nop >>>> >>>> If the local index switch happens on a narrow loads, then the instruction >>>> preparing value to zero high 32-bit of the destination register could be >>>> counted as one cycle, the sequence then could be something like: >>>> >>>> local_csr_wr[ActLMAddr3, gprB_5] >>>> nop >>>> nop >>>> immed[gprB_5, 0] >>>> >>>> However, we have zero extension optimization that zeroing high 32-bit could >>>> be eliminated, therefore above IMMED insn won't be available for which case >>>> the first sequence needs to be generated. >>>> >>>> Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen") >>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> >>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> >>> I haven't looked into the code yet. But ^^^ should be >>> >>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> >>> >>> right? >> >> I prefer Review on code I review, ack on code I ack, and sign-off on >> code I co-author. > > I believe if you're sending somebody else patch you have to add your SOB > in addition to their 'Author:' and their SOB fields. +1, for co-authoring there's a 'Co-authored-by:' tag which seems to be frequently used these days.
On Mon, 26 Aug 2019 18:25:10 +0200, Daniel Borkmann wrote: > On 8/26/19 6:18 PM, Alexei Starovoitov wrote: > > On Mon, Aug 26, 2019 at 8:57 AM Jakub Kicinski > > <jakub.kicinski@netronome.com> wrote: > >> On Sun, Aug 25, 2019 at 10:37 PM Song Liu <liu.song.a23@gmail.com> wrote: > >>> On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski wrote: > >>>> From: Jiong Wang <jiong.wang@netronome.com> > >>>> > >>>> NFP is using Local Memory to model stack. LM_addr could be used as base of > >>>> a 16 32-bit word region of Local Memory. Then, if the stack offset is > >>>> beyond the current region, the local index needs to be updated. The update > >>>> needs at least three cycles to take effect, therefore the sequence normally > >>>> looks like: > >>>> > >>>> local_csr_wr[ActLMAddr3, gprB_5] > >>>> nop > >>>> nop > >>>> nop > >>>> > >>>> If the local index switch happens on a narrow loads, then the instruction > >>>> preparing value to zero high 32-bit of the destination register could be > >>>> counted as one cycle, the sequence then could be something like: > >>>> > >>>> local_csr_wr[ActLMAddr3, gprB_5] > >>>> nop > >>>> nop > >>>> immed[gprB_5, 0] > >>>> > >>>> However, we have zero extension optimization that zeroing high 32-bit could > >>>> be eliminated, therefore above IMMED insn won't be available for which case > >>>> the first sequence needs to be generated. > >>>> > >>>> Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen") > >>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> > >>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> > >>> I haven't looked into the code yet. But ^^^ should be > >>> > >>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> > >>> > >>> right? > >> > >> I prefer Review on code I review, ack on code I ack, and sign-off on > >> code I co-author. > > > > I believe if you're sending somebody else patch you have to add your SOB > > in addition to their 'Author:' and their SOB fields. > > +1, for co-authoring there's a 'Co-authored-by:' tag which seems to be frequently > used these days. Ack, there is a difference between co-author of code, and co-author as step by step guidance. I've been doing this for 6 years now, and nobody ever complained :) Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Is that enough or should I repost?
On 8/24/19 4:00 AM, Jakub Kicinski wrote: > From: Jiong Wang <jiong.wang@netronome.com> > > NFP is using Local Memory to model stack. LM_addr could be used as base of > a 16 32-bit word region of Local Memory. Then, if the stack offset is > beyond the current region, the local index needs to be updated. The update > needs at least three cycles to take effect, therefore the sequence normally > looks like: > > local_csr_wr[ActLMAddr3, gprB_5] > nop > nop > nop > > If the local index switch happens on a narrow loads, then the instruction > preparing value to zero high 32-bit of the destination register could be > counted as one cycle, the sequence then could be something like: > > local_csr_wr[ActLMAddr3, gprB_5] > nop > nop > immed[gprB_5, 0] > > However, we have zero extension optimization that zeroing high 32-bit could > be eliminated, therefore above IMMED insn won't be available for which case > the first sequence needs to be generated. > > Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen") > Signed-off-by: Jiong Wang <jiong.wang@netronome.com> > Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Applied, thanks!
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c index 4054b70d7719..5afcb3c4c2ef 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c @@ -1163,7 +1163,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, bool clr_gpr, lmem_step step) { s32 off = nfp_prog->stack_frame_depth + meta->insn.off + ptr_off; - bool first = true, last; + bool first = true, narrow_ld, last; bool needs_inc = false; swreg stack_off_reg; u8 prev_gpr = 255; @@ -1209,13 +1209,22 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, needs_inc = true; } + + narrow_ld = clr_gpr && size < 8; + if (lm3) { + unsigned int nop_cnt; + emit_csr_wr(nfp_prog, imm_b(nfp_prog), NFP_CSR_ACT_LM_ADDR3); - /* For size < 4 one slot will be filled by zeroing of upper. */ - wrp_nops(nfp_prog, clr_gpr && size < 8 ? 2 : 3); + /* For size < 4 one slot will be filled by zeroing of upper, + * but be careful, that zeroing could be eliminated by zext + * optimization. + */ + nop_cnt = narrow_ld && meta->flags & FLAG_INSN_DO_ZEXT ? 2 : 3; + wrp_nops(nfp_prog, nop_cnt); } - if (clr_gpr && size < 8) + if (narrow_ld) wrp_zext(nfp_prog, meta, gpr); while (size) {