Message ID: AM2PR07MB104298D64EBD0B440536460F8AED0@AM2PR07MB1042.eurprd07.prod.outlook.com
State: Changes Requested
On 11 May 2017 at 09:04, Zoltán Balogh <zoltan.balogh@ericsson.com> wrote:

> Hi Joe,
>
> Thank you for your comments! Please, find my answers below.
>
>> I had trouble trying to review this, in part because I felt like changes to fix multiple issues are being placed into one patch. If there are separate issues being addressed, each issue should be fixed in a separate patch, each with a clear description of the intent of the change (with links to patches that introduced the breakage if possible!). If there is refactoring to do, consider separating the non-functional changes into a separate patch---when looking at the original patch that started this discussion, I found it difficult to review post-merge because the refactoring was included and I couldn't tell if there were actually functional changes.
>>
>> Given that this is taking a bit longer than I thought and we need to take a holistic approach to how all of these interactions should work, I plan to go ahead and apply the revert patch. I look forward to seeing a fresh series proposal that will bring back the benefits of the original patch.
>>
>> More feedback below.
>>
>> > Here comes the patch:
>> >
>> > Since tunneling and forwarding via patch port uses clone action, the tunneled
>>
>> Where does forwarding via patch port use clone action?
>
> Sorry, that was not correct. Patch port does not use clone action. Both patch port and native tunneling use apply_nested_clone_action(). That's what I wanted to express.

OK. I find that function name a bit confusing given it doesn't relate to clone.

>> apply_nested_clone_action() also takes copies of the flow and base_flow, do we need to do it twice in this path? I suspect that this work could benefit from some further refactoring, taking note to have separate patches to refactor code and different patches for functional changes.
>
> Thank you for indicating this.
> There is no need to create a backup copy of flow. We need to store some fields of base_flow. I could use a function like this to create the backup:
>
> static void
> copy_flow_L2L3_data(struct flow *dst, const struct flow *src)
> {
>     dst->packet_type = src->packet_type;
>     dst->dl_dst = src->dl_dst;
>     dst->dl_src = src->dl_src;
>     dst->dl_type = src->dl_type;
>     dst->nw_dst = src->nw_dst;
>     dst->nw_src = src->nw_src;
>     dst->ipv6_dst = src->ipv6_dst;
>     dst->ipv6_src = src->ipv6_src;
>     dst->nw_tos = src->nw_tos;
>     dst->nw_ttl = src->nw_ttl;
>     dst->nw_proto = src->nw_proto;
>     dst->tp_dst = src->tp_dst;
>     dst->tp_src = src->tp_src;
> }

Ah, that's not quite what I meant. Copying the whole flow seems generically correct as per before. The above may be more efficient, but it also seems easier to screw up if 'struct flow' fields are changed. I think we can start with copying the whole flow, and consider an optimization like this separately. My concern was that both build_tunnel_send() and apply_nested_clone_action() were doing the store/restore of the flows.

> ...
>
> /* Backup. */
> copy_flow_L2L3_data(&old_base_flow, &ctx->base_flow);
>
> ...
>
> /* Restore. */
> copy_flow_L2L3_data(&ctx->base_flow, &old_base_flow);

>> > err = tnl_route_lookup_flow(flow, &d_ip6, &s_ip6, &out_dev);
>> > if (err) {
>> > @@ -3216,16 +3248,57 @@ build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
>> > tnl_push_data.tnl_port = odp_to_u32(tunnel_odp_port);
>> > tnl_push_data.out_port = odp_to_u32(out_dev->odp_port);
>> >
>> > + /* After tunnel header has been added, MAC and IP data of flow and
>> > + * base_flow need to be set properly, since there is not recirculation
>> > + * any more when sending packet to tunnel. */
>> > + if (!eth_addr_is_zero(ctx->xin->flow.dl_dst)) {
>> > + ctx->xin->flow.dl_dst = dmac;
>> > + ctx->base_flow.dl_dst = ctx->xin->flow.dl_dst;
>> > + }
>>
>> What's the expectation if ctx->xin->flow.dl_dst is zero? Is that for L3 tunneled packets?
> No, it's not for L3 tunneled packets. Without this, the push_pop_tunnel and push_pop_tunnel_ipv6 unit tests fail. This needs to be fixed.
>
> 783: tunnel_push_pop - action FAILED (tunnel-push-pop.at:109)
> 785: tunnel_push_pop_ipv6 - action FAILED (tunnel-push-pop-ipv6.at:92)

OK, I still don't quite understand how we get into this state of dl_dst is zero in the first place. If it's unexpected, let's fix the bug that causes this state. If it's expected, maybe we can add a short comment above the if statement which describes why the dl_dst may be zero and why it makes sense to leave the flow's mac alone for tunneling.

> ./tunnel-push-pop.at:108: ovs-appctl ofproto/trace ovs-dummy 'in_port(2),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=47,tos=0,ttl=64,frag=no)'
> stdout:
> Flow: ip,in_port=LOCAL,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=1.1.3.88,nw_dst=1.1.3.112,nw_proto=47,nw_tos=0,nw_ecn=0,nw_ttl=64
>
> bridge("int-br")
> ----------------
>  0. priority 32768
>     output:2
>      -> output to native tunnel
>      -> tunneling to 1.1.2.92 via br0
>      -> tunneling from aa:55:aa:55:00:00 1.1.2.88 to f8:bc:12:44:34:b6 1.1.2.92
>
> bridge("br0")
> -------------
>  0. priority 32768
>     NORMAL
>      -> learned port is input port, dropping
>
> Final flow: unchanged
> Megaflow: recirc_id=0,ip,in_port=LOCAL,vlan_tci=0x0000/0x1fff,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_ecn=0,nw_frag=no
> Datapath actions: clone(drop),tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100))
> ./tunnel-push-pop.at:109: tail -1 stdout
> --- - 2017-05-11 14:40:42.205776583 +0200
> +++ /home/ezolbal/work/general_L3_tunneling/ovs/tests/testsuite.dir/at-groups/783/stdout 2017-05-11 14:40:42.200387095 +0200
> @@ -1,2 +1,2 @@
> -Datapath actions: clone(tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100)),1)
> +Datapath actions: clone(drop),tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100))

>> > +
>> > + ctx->xin->flow.dl_src = smac;
>> > + ctx->base_flow.dl_src = ctx->xin->flow.dl_src;
>> > +
>> > + propagate_tunnel_data_to_flow(&ctx->xin->flow, &ctx->base_flow);
>> > +
>> > + if (!tnl_params.is_ipv6) {
>> > + ctx->xin->flow.dl_type = htons(ETH_TYPE_IP);
>> > + } else {
>> > + ctx->xin->flow.dl_type = htons(ETH_TYPE_IPV6);
>> > + }
>>
>> When writing if (!condition) { ... } else { ... }, please consider whether you can swap the conditions and remove the negation. It's less cognitive load if we don't have to negate a negative condition to consider which condition the "else" applies to.
>> (This otherwise looks correct, so just a style request.)
>
> Ok. I'm going to change this.

Thanks.

>> > +
>> > + if (tnl_push_data.tnl_type == OVS_VPORT_TYPE_GRE) {
>> > + ctx->xin->flow.nw_proto = IPPROTO_GRE;
>> > + } else {
>> > + ctx->xin->flow.nw_proto = IPPROTO_UDP;
>> > + }
>> > + ctx->base_flow.nw_proto = ctx->xin->flow.nw_proto;
>>
>> Should we use a switch () statement here to ensure that newer tunnel ports are properly handled in this case if/when they are added? What about things like LISP or STT?
>
> I can use a switch case:
>
> switch (tnl_push_data.tnl_type) {
> case OVS_VPORT_TYPE_GRE:
>     ctx->xin->flow.nw_proto = IPPROTO_GRE;
>     break;
> case OVS_VPORT_TYPE_VXLAN:
> case OVS_VPORT_TYPE_GENEVE:
>     ctx->xin->flow.nw_proto = IPPROTO_UDP;
>     break;
> case OVS_VPORT_TYPE_LISP:
>     /* XXX: UDP? */
> case OVS_VPORT_TYPE_STT:
>     /* XXX: TCP? */
> default:
>     break;
> }
>
> But, how to handle LISP and STT? I have not seen any build_header() function for LISP and STT in netdev-vport.c:
>
> static const struct vport_class vport_classes[] = {
>     TUNNEL_CLASS("geneve", "genev_sys", netdev_geneve_build_header,
>                  netdev_tnl_push_udp_header,
>                  netdev_geneve_pop_header),
>     TUNNEL_CLASS("gre", "gre_sys", netdev_gre_build_header,
>                  netdev_gre_push_header,
>                  netdev_gre_pop_header),
>     TUNNEL_CLASS("vxlan", "vxlan_sys", netdev_vxlan_build_header,
>                  netdev_tnl_push_udp_header,
>                  netdev_vxlan_pop_header),
>     TUNNEL_CLASS("lisp", "lisp_sys", NULL, NULL, NULL),
>     TUNNEL_CLASS("stt", "stt_sys", NULL, NULL, NULL),
> };
>
> Are LISP and STT valid tunnel types for native tunneling? If I'm not wrong, build_tunnel_send() is invoked when native tunneling is used.

Ah! Sugesh pointed this out to me yesterday. I thought there was STT but you're right. For LISP/STT it can just call OVS_NOT_REACHED(). Then it can fail in a pretty obvious way if someone attempts to use those tunnel types in the future.
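Putting those pieces together, the shape could look like this self-contained sketch (the enum values, NOT_REACHED(), and protocol constants below are stand-ins invented for illustration, not the real OVS definitions; in OVS the value would be assigned to ctx->xin->flow.nw_proto):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for OVS_VPORT_TYPE_* and OVS_NOT_REACHED(). */
enum vport_type { VPORT_GRE, VPORT_VXLAN, VPORT_GENEVE, VPORT_LISP, VPORT_STT };
#define NOT_REACHED() abort()

#define PROTO_UDP 17
#define PROTO_GRE 47

/* Map a native-tunnel vport type to the outer IP protocol.  LISP and STT
 * have no build_header() in netdev-vport.c, so reaching them here would be
 * a programming error and fails loudly instead of silently falling
 * through. */
static int
tunnel_nw_proto(enum vport_type type)
{
    switch (type) {
    case VPORT_GRE:
        return PROTO_GRE;
    case VPORT_VXLAN:
    case VPORT_GENEVE:
        return PROTO_UDP;
    case VPORT_LISP:
    case VPORT_STT:
    default:
        NOT_REACHED();
    }
}
```

Listing the unsupported cases explicitly (rather than relying only on default:) also lets compilers warn when a new enum value is added without a corresponding case.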
>> > +
>> > size_t push_action_size = 0;
>> > size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
>> >                                        OVS_ACTION_ATTR_CLONE);
>> > odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
>> > push_action_size = ctx->odp_actions->size;
>> > +
>> > + if (ctx->rule) {
>> > + old_ths = ctx->rule->tunnel_header_size;
>> > + ctx->rule->tunnel_header_size = tnl_push_data.header_len;
>> > + }
>> > +
>> > apply_nested_clone_actions(ctx, xport, out_dev);
>> > if (ctx->odp_actions->size > push_action_size) {
>> >     /* Update the CLONE action only when combined */
>> >     nl_msg_end_nested(ctx->odp_actions, clone_ofs);
>> > }
>>
>> If there are no actions on the underlay bridge (ie drop), the else condition here should probably reset the nl_msg nesting so that we don't generate any actions to the datapath at all. You may also consider adding a xlate_report() call to explain why the drop is occurring - this should help debugging later on using ofproto/trace.
>
> I'm not sure about this.
> Sugesh, how should this be resolved?

>> > if (credit_counts) {
>> > + uint64_t stats_n_bytes = 0;
>> > +
>> > + if (rule->truncated_packet_size) {
>> > + stats_n_bytes = MIN(rule->truncated_packet_size, stats->n_bytes);
>> > + } else {
>> > + stats_n_bytes = stats->n_bytes;
>> > + }
>>
>> Is this fixing a separate issue? I'm confused about whether truncated packet stats are being correctly attributed today on master. If so, I think this should split out into a separate patch. You might also consider whether it makes more sense to modify 'stats' earlier in the path so that each rule doesn't need to individually apply the stats adjustment. I could imagine a store/restore of the stats plus modifying for truncation during the xlate_output_trunc_action() processing rather than pushing this complexity all the way into the rule stats attribution.
> Incorrect byte statistics on the underlay bridge are an unwanted side-effect of the original "Avoid recirculate" commit. By excluding recirculation, there is no upcall for the tunnelled packets, and the packet size is not set by upcall_xlate() again:
>
> static void
> upcall_xlate(struct udpif *udpif, struct upcall *upcall,
>              struct ofpbuf *odp_actions, struct flow_wildcards *wc)
> {
>     struct dpif_flow_stats stats;
>     struct xlate_in xin;
>
>     stats.n_packets = 1;
>     stats.n_bytes = dp_packet_size(upcall->packet);
>     stats.used = time_msec();
>     stats.tcp_flags = ntohs(upcall->flow->tcp_flags);
>
>     xlate_in_init(&xin, upcall->ofproto,
>                   ofproto_dpif_get_tables_version(upcall->ofproto),
>                   upcall->flow, upcall->in_port, NULL,
>                   stats.tcp_flags, upcall->packet, wc, odp_actions);
>
>     if (upcall->type == DPIF_UC_MISS) {
>         xin.resubmit_stats = &stats;
>
> Here upcall_xlate() sets xin.resubmit_stats which will be used for calculating statistics. These are const values, that's why I have not updated them earlier.
>
> /* If nonnull, flow translation credits the specified statistics to each
>  * rule reached through a resubmit or OFPP_TABLE action.
>  *
>  * This is normally null so the client has to set it manually after
>  * calling xlate_in_init(). */
> const struct dpif_flow_stats *resubmit_stats;
>
> If it does not harm, then the const modifier could be removed and stats updated at an earlier stage.

I see, thanks for the explanation. Mostly I'm just trying to push the question of "how can we make this code easier to read and understand?". Const is nice because it tells you these fields won't change. But I think it's probably a little easier if the truncation of the statistics is performed where the truncation occurs, so that the core stats attribution logic can be oblivious of this particular feature.

Cheers,
Joe
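The adjustment described above (clamping the stats where the truncation occurs, so that rule crediting stays oblivious) can be sketched stand-alone like this. The struct and function names are illustrative stand-ins, not the OVS API; the real code would operate on xin->resubmit_stats around the truncated output translation:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative subset of struct dpif_flow_stats. */
struct mini_stats {
    uint64_t n_packets;
    uint64_t n_bytes;
};

/* Cap the byte count as if every packet had been truncated to max_len.
 * For the single-packet upcall case this reduces to the
 * MIN(truncated_packet_size, n_bytes) from the patch under review. */
static uint64_t
truncated_n_bytes(const struct mini_stats *stats, uint64_t max_len)
{
    uint64_t cap = stats->n_packets * max_len;
    return stats->n_bytes < cap ? stats->n_bytes : cap;
}
```

The idea is that translation would store the original stats, substitute the clamped value while the truncated output is translated, and restore the original afterwards, keeping rule_dpif_credit_stats__() unchanged.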
Hi Joe I started to rework my patch based on your comments and suggestions. I had some difficulties with the last one. Let us focus on this below. > >> > if (credit_counts) { > >> > + uint64_t stats_n_bytes = 0; > >> > + > >> > + if (rule->truncated_packet_size) { > >> > + stats_n_bytes = MIN(rule->truncated_packet_size, stats->n_bytes); > >> > + } else { > >> > + stats_n_bytes = stats->n_bytes; > >> > + } > >> > >> Is this fixing a separate issue? I'm confused about whether truncated > >> packet stats are being correctly attributed today on master. If so, I > >> think this should split out into a separate patch. You might also > >> consider whether it makes more sense to modify 'stats' earlier in the > >> path so that each rule doesn't need to individually apply the stats > >> adjustment. I could imagine a store/restore of the stats plus > >> modifying for truncation during the xlate_output_trunc_action() > >> processing rather than pushing this complexity all the way into the > >> rule stats attribution. > > > > > Incorrect bytes statistics on the underlay bridge is an unwanted side-effect of > > the original "Avoid recirculate" commit. 
By exluding recirculation, there is no > > upcall for the tunnelled packets, and the packet size is not set by > > upcall_xlate() again: > > > > static void > > upcall_xlate(struct udpif *udpif, struct upcall *upcall, > > struct ofpbuf *odp_actions, struct flow_wildcards *wc) > > { > > struct dpif_flow_stats stats; > > struct xlate_in xin; > > > > stats.n_packets = 1; > > stats.n_bytes = dp_packet_size(upcall->packet); > > stats.used = time_msec(); > > stats.tcp_flags = ntohs(upcall->flow->tcp_flags); > > > > xlate_in_init(&xin, upcall->ofproto, > > ofproto_dpif_get_tables_version(upcall->ofproto), > > upcall->flow, upcall->in_port, NULL, > > stats.tcp_flags, upcall->packet, wc, odp_actions); > > > > if (upcall->type == DPIF_UC_MISS) { > > xin.resubmit_stats = &stats; > > > > Here upcall_xlate() sets xin.resubmint_stats which will be used for calculating > > statistic. These are const values, that's why I have not updated them earlier. > > > > /* If nonnull, flow translation credits the specified statistics to each > > * rule reached through a resubmit or OFPP_TABLE action. > > * > > * This is normally null so the client has to set it manually after > > * calling xlate_in_init(). */ > > const struct dpif_flow_stats *resubmit_stats; > > > > If it does not harm, then the const modifier could be removed and stats updated > > at earlier stage. > > I see, thanks for the explanation. Mostly I'm just trying to push the > question of "how can we make this code easier to read and > understand?". Const is nice because it tells you these fields won't > change. But I think it's probably a little easier if the truncation of > the statistics is performed where the truncation occurs, so that the > core stats attribution logic can be oblivious of this particular > feature. I removed the const qualifier from resubmit_stats, removed the truncation from rule_dpif_credit_stats__() and applied it to resubmit_stats during translation. 
That works fine for a packet translated due to an upcall. At the end we get a single datapath flow in the netdev datapath. It should look like this:

netdev@ovs-netdev: hit:4 missed:10
br-int:
    br-int 65534/1: (tap)
    br-int-ns1 1/3: (system)
    gre0 2/5: (gre: remote_ip=10.0.0.20)
br-p:
    br-p 65534/2: (tap)
    br-p-ns2 3/4: (system)

flow-dump from non-dpdk interfaces:
recirc_id(0),in_port(3),eth_type(0x0800),ipv4(tos=0/0x3,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:trunc(200),clone(tnl_push(tnl_port(5),header(size=38,type=3,eth(dst=cc:cc:cc:dd:dd:de,src=02:3e:b1:dc:0e:42,dl_type=0x0800),ipv4(src=10.0.0.10,dst=10.0.0.20,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(2)),set(ipv4(tos=0/0x3,ttl=63)),4)

At this point I get a problem when a new packet arrives but there is no need for translation, since we have the datapath flow in the netdev datapath. In this case the byte and packet statistics are calculated by the rule_dpif_credit_stats__() function as well.

static void
rule_dpif_credit_stats__(struct rule_dpif *rule,
                         const struct dpif_flow_stats *stats,
                         bool credit_counts)
    OVS_REQUIRES(rule->stats_mutex)
{
    if (credit_counts) {
        rule->stats.n_packets += stats->n_packets;
        rule->stats.n_bytes += stats->n_bytes;
    }
    rule->stats.used = MAX(rule->stats.used, stats->used);
}

However, in this case the 'stats' argument does not come from resubmit_stats. It is derived from dpif_flow_stats and udpif_flow stats, resulting in the original packet size. I guess this is due to the fact that we have a single datapath flow which can't reflect the size of both the original and the altered packet. Someone could confirm this, to make sure I'm not wrong.

These functions are invoked:

static void
revalidate(struct revalidator *revalidator)
{
    ...
    const struct dpif_flow *f;
    ...
    n_dumped = dpif_flow_dump_next(dump_thread, flows, ARRAY_SIZE(flows));
    ...
    for (f = flows; f < &flows[n_dumped]; f++, index++) {
        long long int used = f->stats.used;
    ...
    result = revalidate_ukey(udpif, ukey, &f->stats, &odp_actions,
                             reval_seq, &recircs);
    ...
}

static enum reval_result
revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
                const struct dpif_flow_stats *stats,
                struct ofpbuf *odp_actions, uint64_t reval_seq,
                struct recirc_refs *recircs)
    OVS_REQUIRES(ukey->mutex)
{
    ...
    push.n_bytes = (stats->n_bytes > ukey->stats.n_bytes
                    ? stats->n_bytes - ukey->stats.n_bytes
                    : 0);
    ...
    xlate_push_stats(ukey->xcache, &push);
    ...
}

void
xlate_push_stats(struct xlate_cache *xcache,
                 const struct dpif_flow_stats *stats)
{
    ...
    xlate_push_stats_entry(entry, stats);
    ...
}

void
xlate_push_stats_entry(struct xc_entry *entry,
                       const struct dpif_flow_stats *stats)
{
    ...
    rule_dpif_credit_stats(entry->rule, stats);
    ...
}

void
rule_dpif_credit_stats(struct rule_dpif *rule,
                       const struct dpif_flow_stats *stats)
{
    ...
    rule_dpif_credit_stats__(rule, stats, true);
    ...
}

So, I'm not sure if stats correction can be solved at any earlier stage than in rule_dpif_credit_stats__(). I would like to get some help or verification from experts. Comments are welcome.

Best regards,
Zoltan
On 17 May 2017 at 04:37, Zoltán Balogh <zoltan.balogh@ericsson.com> wrote: > Hi Joe > > I started to rework my patch based on your comments and suggestions. I had some > difficulties with the last one. Let us focus on this below. > >> >> > if (credit_counts) { >> >> > + uint64_t stats_n_bytes = 0; >> >> > + >> >> > + if (rule->truncated_packet_size) { >> >> > + stats_n_bytes = MIN(rule->truncated_packet_size, stats->n_bytes); >> >> > + } else { >> >> > + stats_n_bytes = stats->n_bytes; >> >> > + } >> >> >> >> Is this fixing a separate issue? I'm confused about whether truncated >> >> packet stats are being correctly attributed today on master. If so, I >> >> think this should split out into a separate patch. You might also >> >> consider whether it makes more sense to modify 'stats' earlier in the >> >> path so that each rule doesn't need to individually apply the stats >> >> adjustment. I could imagine a store/restore of the stats plus >> >> modifying for truncation during the xlate_output_trunc_action() >> >> processing rather than pushing this complexity all the way into the >> >> rule stats attribution. >> >> > >> > Incorrect bytes statistics on the underlay bridge is an unwanted side-effect of >> > the original "Avoid recirculate" commit. 
By exluding recirculation, there is no >> > upcall for the tunnelled packets, and the packet size is not set by >> > upcall_xlate() again: >> > >> > static void >> > upcall_xlate(struct udpif *udpif, struct upcall *upcall, >> > struct ofpbuf *odp_actions, struct flow_wildcards *wc) >> > { >> > struct dpif_flow_stats stats; >> > struct xlate_in xin; >> > >> > stats.n_packets = 1; >> > stats.n_bytes = dp_packet_size(upcall->packet); >> > stats.used = time_msec(); >> > stats.tcp_flags = ntohs(upcall->flow->tcp_flags); >> > >> > xlate_in_init(&xin, upcall->ofproto, >> > ofproto_dpif_get_tables_version(upcall->ofproto), >> > upcall->flow, upcall->in_port, NULL, >> > stats.tcp_flags, upcall->packet, wc, odp_actions); >> > >> > if (upcall->type == DPIF_UC_MISS) { >> > xin.resubmit_stats = &stats; >> > >> > Here upcall_xlate() sets xin.resubmint_stats which will be used for calculating >> > statistic. These are const values, that's why I have not updated them earlier. >> > >> > /* If nonnull, flow translation credits the specified statistics to each >> > * rule reached through a resubmit or OFPP_TABLE action. >> > * >> > * This is normally null so the client has to set it manually after >> > * calling xlate_in_init(). */ >> > const struct dpif_flow_stats *resubmit_stats; >> > >> > If it does not harm, then the const modifier could be removed and stats updated >> > at earlier stage. >> >> I see, thanks for the explanation. Mostly I'm just trying to push the >> question of "how can we make this code easier to read and >> understand?". Const is nice because it tells you these fields won't >> change. But I think it's probably a little easier if the truncation of >> the statistics is performed where the truncation occurs, so that the >> core stats attribution logic can be oblivious of this particular >> feature. > > I removed the const qualifier from resubmit_stats, removed the truncation from > rule_dpif_credit_stats__() and applied it to resubmit_stats during translation. 
> That works fine for a packet translated due to an upcall. At the end we get a > sinlge datapath flow in the netdev datapath. It should look like this: > > netdev@ovs-netdev: hit:4 missed:10 > br-int: > br-int 65534/1: (tap) > br-int-ns1 1/3: (system) > gre0 2/5: (gre: remote_ip=10.0.0.20) > br-p: > br-p 65534/2: (tap) > br-p-ns2 3/4: (system) > > flow-dump from non-dpdk interfaces: > recirc_id(0),in_port(3),eth_type(0x0800),ipv4(tos=0/0x3,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:trunc(200),clone(tnl_push(tnl_port(5) > ,header(size=38,type=3,eth(dst=cc:cc:cc:dd:dd:de,src=02:3e:b1:dc:0e:42,dl_type=0x0800),ipv4(src=10.0.0.10,dst=10.0.0.20,proto=47,tos=0,ttl=64,frag=0x4 > 000),gre((flags=0x0,proto=0x6558))),out_port(2)),set(ipv4(tos=0/0x3,ttl=63)),4) > > > At this point I get a problem, when a new packet arrives but there is no need > for translation since we have the datapath flow in the netdev datapath. > In this case the byte and packet statistics are calculated by the > rule_dpif_credit_stats__() function as well. > > static void > rule_dpif_credit_stats__(struct rule_dpif *rule, > const struct dpif_flow_stats *stats, > bool credit_counts) > OVS_REQUIRES(rule->stats_mutex) > { > if (credit_counts) { > rule->stats.n_packets += stats->n_packets; > rule->stats.n_bytes += stats->n_bytes; > } > rule->stats.used = MAX(rule->stats.used, stats->used); > } > > However, in this case the 'stats' argument does not come from resubmit_stats. > It is derived from dpif_flow_stats and udpif_flow stats resulting in the > original packet size. > I guess, this is due to fact, that we have a single datapath flow which can't > reflect the size of the original and the altered packet size. Someone could > confirm this, to make sure I'm not wrong. upcall_xlate() generates the 'dpif_flow_stats' from the packet in the upcall path. It then populates xin.resubmit_stats and calls xlate_actions(). For revalidator path, it is similar but populated from xlate_key(). 
What I imagined happening during translation is that when the flow with the truncate action is handled, the 'resubmit_stats' would be modified for subsequent translation. That is to say, in xlate_output_trunc_action() where it calls out to the xlate_output_action() common code, there would be store/restore of the resubmit_stats. Backing up a bit for context, the stats attribution goes roughly like this: * First upcall, handler thread calls through the translate code with a packet. The resubmit_stats are derived from that packet. This goes through xlate_actions(). * First dump of flow from revalidator thread fetches the flow and runs the same xlate_actions() with whatever stats it has (may be zero). This time, whenever stats attribution or side effects occur, an xlate_cache entry is generated. * Second and subsequent dumps of flows fetches the flow and shortcuts the xlate_actions() by using the xlate_cache instead - ie a call to xlate_push_stats(). So, in the same place where the resubmit_stats is manipulated, you would also need to generate a new XC entry which would manipulate the stats - this would be a 'side-effect'. I'd imagine that prior to the full output translation there would be a XC_TRUNCATE(truncated_size) then afterwards there would be an XC_TRUNCATE_RESET(). Or it could be just XC_SET_SIZE(...) where 0 is reset and non-zero is a truncate size. In the implementation/execution in xlate_push_stats() when performing XC_TRUNCATE you would need to store the original push_stats size somewhere, then calculate a new 'n_bytes' based on the number of packets and existing bytes*. For XC_TRUNCATE_RESET(), it would restore the original push_stats size. * Hmm, I'm not sure the calculation will be 100% here. Let's say there were 3 packets hit the flow, 50B, 200B, 300B. If output(max_len=100) was executed, then we don't know how many of the packets were truncated. The maximum number of bytes that could be transmitted is 300, but the actual number was 250. 
We could divide the n_bytes by n_packets, subtract the max_len and then multiply back up by the number of packets, which works for this case assuming floating point arithmetic but is slightly off if using integer math.. > These functions are invoked: > > static void > revalidate(struct revalidator *revalidator) > { > ... > const struct dpif_flow *f; > ... > n_dumped = dpif_flow_dump_next(dump_thread, flows, ARRAY_SIZE(flows)); > ... > for (f = flows; f < &flows[n_dumped]; f++, index++) { > long long int used = f->stats.used; > ... > result = revalidate_ukey(udpif, ukey, &f->stats, &odp_actions, > reval_seq, &recircs); > ... > } > > static enum reval_result > revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, > const struct dpif_flow_stats *stats, > struct ofpbuf *odp_actions, uint64_t reval_seq, > struct recirc_refs *recircs) > OVS_REQUIRES(ukey->mutex) > { > ... > push.n_bytes = (stats->n_bytes > ukey->stats.n_bytes > ? stats->n_bytes - ukey->stats.n_bytes > : 0); > ... > xlate_push_stats(ukey->xcache, &push); > ... > } > > void > xlate_push_stats(struct xlate_cache *xcache, > const struct dpif_flow_stats *stats) > { > ... > xlate_push_stats_entry(entry, stats); > ... > } > > void > xlate_push_stats_entry(struct xc_entry *entry, > const struct dpif_flow_stats *stats) > { > ... > rule_dpif_credit_stats(entry->rule, stats); > ... > } > > void > rule_dpif_credit_stats(struct rule_dpif *rule, > const struct dpif_flow_stats *stats) > { > ... > rule_dpif_credit_stats__(rule, stats, true); > ... > } > > So, I'm not sure if stats correction can be solved at an earlier stage but in > rule_dpif_credit_stats__(). I would like to get some help or verification > from experts. Comment are welcome.
Hi Joe, > Backing up a bit for context, the stats attribution goes roughly like this: > * First upcall, handler thread calls through the translate code with a > packet. The resubmit_stats are derived from that packet. This goes > through xlate_actions(). > * First dump of flow from revalidator thread fetches the flow and runs > the same xlate_actions() with whatever stats it has (may be zero). > This time, whenever stats attribution or side effects occur, an > xlate_cache entry is generated. > * Second and subsequent dumps of flows fetches the flow and shortcuts > the xlate_actions() by using the xlate_cache instead - ie a call to > xlate_push_stats(). > > So, in the same place where the resubmit_stats is manipulated, you > would also need to generate a new XC entry which would manipulate the > stats - this would be a 'side-effect'. I'd imagine that prior to the > full output translation there would be a XC_TRUNCATE(truncated_size) > then afterwards there would be an XC_TRUNCATE_RESET(). Or it could be > just XC_SET_SIZE(...) where 0 is reset and non-zero is a truncate > size. In the implementation/execution in xlate_push_stats() when > performing XC_TRUNCATE you would need to store the original push_stats > size somewhere, then calculate a new 'n_bytes' based on the number of > packets and existing bytes*. For XC_TRUNCATE_RESET(), it would restore > the original push_stats size. Thank you for the explanation. > * Hmm, I'm not sure the calculation will be 100% here. Let's say there > were 3 packets hit the flow, 50B, 200B, 300B. If output(max_len=100) > was executed, then we don't know how many of the packets were > truncated. The maximum number of bytes that could be transmitted is > 300, but the actual number was 250. We could divide the n_bytes by > n_packets, subtract the max_len and then multiply back up by the > number of packets, which works for this case assuming floating point > arithmetic but is slightly off if using integer math.. 
I don't think that would be the proper way of calculating n_bytes. Let's say we have 3 packets with 50B, 200B, 200B and max_len=100. The output should be 50 + 100 + 100 = 250B. Following the instructions above we will get:

[(50 + 200 + 200) / 3 - 100] * 3 = [450 / 3 - 100] * 3 = 50 * 3 = 150B

Any other idea how to calculate the truncated size with the xlate cache? Or maybe I did not understand your calculation.

There is one more thing to be taken into consideration. By adding a tunnel header, the size of packets increases as well. But that's a constant value for each packet, so it is easier to account for.

Best regards,
Zoltan
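The objection above can be made concrete: two flows with identical aggregate stats (3 packets, 450 bytes) can have different exact truncated totals, so no formula over the aggregates alone, whether integer or floating point, can recover the exact value. A stand-alone demonstration (illustrative code, not the OVS API):

```c
#include <assert.h>
#include <stdint.h>

/* Exact truncated byte count: each packet is independently capped at
 * max_len, which is what the datapath actually transmits with
 * output(max_len=...). */
static uint64_t
exact_truncated(const uint64_t *sizes, int n, uint64_t max_len)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++) {
        total += sizes[i] < max_len ? sizes[i] : max_len;
    }
    return total;
}
```

With max_len=100, the packet sets {50, 200, 200} and {150, 150, 150} both report n_packets=3 and n_bytes=450 in the aggregate flow stats, yet their exact truncated totals are 250 and 300 respectively. Any estimate computed only from the aggregates has to be approximate for at least one of them.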
Regards
_Sugesh

> -----Original Message-----
> From: Zoltán Balogh [mailto:zoltan.balogh@ericsson.com]
> Sent: Friday, May 26, 2017 3:01 PM
> To: Joe Stringer <joe@ovn.org>
> Cc: Chandran, Sugesh <sugesh.chandran@intel.com>; Andy Zhou
> <azhou@ovn.org>; William Tu <u9012063@gmail.com>;
> gvrose8192@gmail.com; ovs dev <dev@openvswitch.org>; Ben Pfaff
> <blp@ovn.org>; Jan Scheurich <jan.scheurich@ericsson.com>; Gray, Mark D
> <mark.d.gray@intel.com>
> Subject: RE: [ovs-dev] [PATCH v5] tunneling: Avoid recirculation on datapath
> by computing the recirculate actions at translate time.
>
> Hi Joe,
>
> > Backing up a bit for context, the stats attribution goes roughly like this:
> > * First upcall, handler thread calls through the translate code with a
> >   packet. The resubmit_stats are derived from that packet. This goes
> >   through xlate_actions().
> > * First dump of flow from revalidator thread fetches the flow and runs
> >   the same xlate_actions() with whatever stats it has (may be zero).
> >   This time, whenever stats attribution or side effects occur, an
> >   xlate_cache entry is generated.
> > * Second and subsequent dumps of flows fetch the flow and shortcut
> >   the xlate_actions() by using the xlate_cache instead - ie a call to
> >   xlate_push_stats().
> >
> > So, in the same place where the resubmit_stats is manipulated, you
> > would also need to generate a new XC entry which would manipulate the
> > stats - this would be a 'side-effect'. I'd imagine that prior to the
> > full output translation there would be a XC_TRUNCATE(truncated_size),
> > then afterwards there would be an XC_TRUNCATE_RESET(). Or it could be
> > just XC_SET_SIZE(...) where 0 is reset and non-zero is a truncate
> > size. In the implementation/execution in xlate_push_stats(), when
> > performing XC_TRUNCATE you would need to store the original push_stats
> > size somewhere, then calculate a new 'n_bytes' based on the number of
> > packets and existing bytes*. For XC_TRUNCATE_RESET(), it would restore
> > the original push_stats size.
>
> Thank you for the explanation.
>
> > * Hmm, I'm not sure the calculation will be 100% here. Let's say there
> >   were 3 packets hit the flow, 50B, 200B, 300B. If output(max_len=100)
> >   was executed, then we don't know how many of the packets were
> >   truncated. The maximum number of bytes that could be transmitted is
> >   300, but the actual number was 250. We could divide the n_bytes by
> >   n_packets, subtract the max_len and then multiply back up by the
> >   number of packets, which works for this case assuming floating point
> >   arithmetic but is slightly off if using integer math..
>
> I don't think that would be the proper way of calculating n_bytes. Let's say
> we have 3 packets with 50B, 200B, 200B and max_len=100. The output should
> be 50 + 100 + 100 = 250B.
> Following the instructions above we will get
> [(50 + 200 + 200) / 3 - 100] * 3 = [450 / 3 - 100] * 3 = 50 * 3 = 150B
>
> Any other idea how to calculate the truncated size with xlate cache?
> Or maybe I did not understand your calculation.

[Sugesh] Since we have this issue with the trunc action, how about limiting
the combine action to only those tunnels that don't have any post-trunc
action? If there is a trunc action, create two separate rules as now.
Is there any other action that would be considered an exception like this?

> There is one more thing to be taken into consideration. By adding a tunnel
> header, the size of packets increases as well. But that's a constant value
> for each packet, easier to calculate with.
>
> Best regards,
> Zoltan
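[Editor's note] Joe's XC_TRUNCATE / XC_TRUNCATE_RESET proposal can be pictured with a toy model. All names below are hypothetical, not the real xlate_cache API; the sketch only shows how a truncate entry would rescale n_bytes during xlate_push_stats() replay, using the integer-average heuristic whose accuracy Zoltán questions in the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the proposed cache entries; all names are hypothetical. */
enum xc_type { XC_RULE, XC_TRUNCATE, XC_TRUNCATE_RESET };

struct xc_entry {
    enum xc_type type;
    uint64_t max_len;  /* only used by XC_TRUNCATE */
};

struct stats { uint64_t n_packets, n_bytes; };

/* Replay cached entries against fresh stats.  XC_TRUNCATE rescales
 * n_bytes with the average-based heuristic; XC_TRUNCATE_RESET restores
 * the original byte count for the entries that follow. */
static void push_stats(const struct xc_entry *e, size_t n, struct stats *s)
{
    const uint64_t orig_bytes = s->n_bytes;
    for (size_t i = 0; i < n; i++) {
        switch (e[i].type) {
        case XC_TRUNCATE:
            if (s->n_packets && s->n_bytes / s->n_packets > e[i].max_len) {
                s->n_bytes = e[i].max_len * s->n_packets;
            }
            break;
        case XC_TRUNCATE_RESET:
            s->n_bytes = orig_bytes;
            break;
        case XC_RULE:
            /* Attribute the current s->n_packets/n_bytes to a rule here. */
            break;
        }
    }
}
```

With the thread's example (3 packets, 450 bytes total, max_len=100), the XC_TRUNCATE step attributes 3 * 100 = 300 bytes rather than the true 250; this is the same loss of per-packet information Zoltán points out below.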
On 26 May 2017 at 07:00, Zoltán Balogh <zoltan.balogh@ericsson.com> wrote:
>
> Hi Joe,
>
>> Backing up a bit for context, the stats attribution goes roughly like this:
>> * First upcall, handler thread calls through the translate code with a
>>   packet. The resubmit_stats are derived from that packet. This goes
>>   through xlate_actions().
>> * First dump of flow from revalidator thread fetches the flow and runs
>>   the same xlate_actions() with whatever stats it has (may be zero).
>>   This time, whenever stats attribution or side effects occur, an
>>   xlate_cache entry is generated.
>> * Second and subsequent dumps of flows fetch the flow and shortcut
>>   the xlate_actions() by using the xlate_cache instead - ie a call to
>>   xlate_push_stats().
>>
>> So, in the same place where the resubmit_stats is manipulated, you
>> would also need to generate a new XC entry which would manipulate the
>> stats - this would be a 'side-effect'. I'd imagine that prior to the
>> full output translation there would be a XC_TRUNCATE(truncated_size),
>> then afterwards there would be an XC_TRUNCATE_RESET(). Or it could be
>> just XC_SET_SIZE(...) where 0 is reset and non-zero is a truncate
>> size. In the implementation/execution in xlate_push_stats(), when
>> performing XC_TRUNCATE you would need to store the original push_stats
>> size somewhere, then calculate a new 'n_bytes' based on the number of
>> packets and existing bytes*. For XC_TRUNCATE_RESET(), it would restore
>> the original push_stats size.
>
> Thank you for the explanation.
>
>> * Hmm, I'm not sure the calculation will be 100% here. Let's say there
>>   were 3 packets hit the flow, 50B, 200B, 300B. If output(max_len=100)
>>   was executed, then we don't know how many of the packets were
>>   truncated. The maximum number of bytes that could be transmitted is
>>   300, but the actual number was 250. We could divide the n_bytes by
>>   n_packets, subtract the max_len and then multiply back up by the
>>   number of packets, which works for this case assuming floating point
>>   arithmetic but is slightly off if using integer math..
>
> I don't think that would be the proper way of calculating n_bytes. Let's
> say we have 3 packets with 50B, 200B, 200B and max_len=100. The output
> should be 50 + 100 + 100 = 250B.
> Following the instructions above we will get
> [(50 + 200 + 200) / 3 - 100] * 3 = [450 / 3 - 100] * 3 = 50 * 3 = 150B
>
> Any other idea how to calculate the truncated size with xlate cache?
> Or maybe I did not understand your calculation.

Nope, you're absolutely right. It occurred to me a little later that for
more complex statistic sets this doesn't work. I suspect you actually need
datapath support for tracking these stats separately.

> There is one more thing to be taken into consideration. By adding a tunnel
> header, the size of packets increases as well. But that's a constant value
> for each packet, easier to calculate with.

Good point.
On 26 May 2017 at 10:42, Chandran, Sugesh <sugesh.chandran@intel.com> wrote:
>
> Regards
> _Sugesh
>
>> -----Original Message-----
>> From: Zoltán Balogh [mailto:zoltan.balogh@ericsson.com]
>> Sent: Friday, May 26, 2017 3:01 PM
>> To: Joe Stringer <joe@ovn.org>
>> Cc: Chandran, Sugesh <sugesh.chandran@intel.com>; Andy Zhou
>> <azhou@ovn.org>; William Tu <u9012063@gmail.com>;
>> gvrose8192@gmail.com; ovs dev <dev@openvswitch.org>; Ben Pfaff
>> <blp@ovn.org>; Jan Scheurich <jan.scheurich@ericsson.com>; Gray, Mark D
>> <mark.d.gray@intel.com>
>> Subject: RE: [ovs-dev] [PATCH v5] tunneling: Avoid recirculation on datapath
>> by computing the recirculate actions at translate time.
>>
>> Hi Joe,
>>
>> > Backing up a bit for context, the stats attribution goes roughly like this:
>> > * First upcall, handler thread calls through the translate code
>> >   with a packet. The resubmit_stats are derived from that packet.
>> >   This goes through xlate_actions().
>> > * First dump of flow from revalidator thread fetches the flow and
>> >   runs the same xlate_actions() with whatever stats it has (may be zero).
>> >   This time, whenever stats attribution or side effects occur, an
>> >   xlate_cache entry is generated.
>> > * Second and subsequent dumps of flows fetch the flow and shortcut
>> >   the xlate_actions() by using the xlate_cache instead - ie a call to
>> >   xlate_push_stats().
>> >
>> > So, in the same place where the resubmit_stats is manipulated, you
>> > would also need to generate a new XC entry which would manipulate the
>> > stats - this would be a 'side-effect'. I'd imagine that prior to the
>> > full output translation there would be a XC_TRUNCATE(truncated_size),
>> > then afterwards there would be an XC_TRUNCATE_RESET(). Or it could be
>> > just XC_SET_SIZE(...) where 0 is reset and non-zero is a truncate
>> > size. In the implementation/execution in xlate_push_stats(), when
>> > performing XC_TRUNCATE you would need to store the original push_stats
>> > size somewhere, then calculate a new 'n_bytes' based on the number of
>> > packets and existing bytes*. For XC_TRUNCATE_RESET(), it would restore
>> > the original push_stats size.
>>
>> Thank you for the explanation.
>>
>> > * Hmm, I'm not sure the calculation will be 100% here. Let's say there
>> >   were 3 packets hit the flow, 50B, 200B, 300B. If output(max_len=100)
>> >   was executed, then we don't know how many of the packets were
>> >   truncated. The maximum number of bytes that could be transmitted is
>> >   300, but the actual number was 250. We could divide the n_bytes by
>> >   n_packets, subtract the max_len and then multiply back up by the
>> >   number of packets, which works for this case assuming floating point
>> >   arithmetic but is slightly off if using integer math..
>>
>> I don't think that would be the proper way of calculating n_bytes. Let's say
>> we have 3 packets with 50B, 200B, 200B and max_len=100. The output should
>> be 50 + 100 + 100 = 250B.
>> Following the instructions above we will get
>> [(50 + 200 + 200) / 3 - 100] * 3 = [450 / 3 - 100] * 3 = 50 * 3 = 150B
>>
>> Any other idea how to calculate the truncated size with xlate cache?
>> Or maybe I did not understand your calculation.
> [Sugesh] Since we have this issue with the trunc action, how about
> limiting the combine action to only those tunnels that don't have any
> post-trunc action? If there is a trunc action, create two separate
> rules as now. Is there any other action that would be considered an
> exception like this?

There's no need for different rules.

Translation of truncate+output is done from xlate_output_trunc_action(),
which calls the common xlate_output_action() translation after setting up
the packet max_len. Perhaps this path just needs to keep the recirculation,
while the common xlate_output_action() path should be capable of doing this
'combined' (patch-port-style) translation.

I think your last question is effectively asking 'who calls
xlate_output_action'. I see output_reg, output_trunc, enqueue, bundle, and
regular output. I think that truncate is the only 'special' case from this
perspective.
Thank you Joe for the response.

Regards
_Sugesh

> -----Original Message-----
> From: Joe Stringer [mailto:joe@ovn.org]
> Sent: Friday, May 26, 2017 7:02 PM
> To: Chandran, Sugesh <sugesh.chandran@intel.com>
> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; Andy Zhou
> <azhou@ovn.org>; William Tu <u9012063@gmail.com>;
> gvrose8192@gmail.com; ovs dev <dev@openvswitch.org>; Ben Pfaff
> <blp@ovn.org>; Jan Scheurich <jan.scheurich@ericsson.com>; Gray, Mark D
> <mark.d.gray@intel.com>
> Subject: Re: [ovs-dev] [PATCH v5] tunneling: Avoid recirculation on datapath
> by computing the recirculate actions at translate time.
>
> On 26 May 2017 at 10:42, Chandran, Sugesh <sugesh.chandran@intel.com>
> wrote:
> >
> > Regards
> > _Sugesh
> >
> >> -----Original Message-----
> >> From: Zoltán Balogh [mailto:zoltan.balogh@ericsson.com]
> >> Sent: Friday, May 26, 2017 3:01 PM
> >> To: Joe Stringer <joe@ovn.org>
> >> Cc: Chandran, Sugesh <sugesh.chandran@intel.com>; Andy Zhou
> >> <azhou@ovn.org>; William Tu <u9012063@gmail.com>;
> >> gvrose8192@gmail.com; ovs dev <dev@openvswitch.org>; Ben Pfaff
> >> <blp@ovn.org>; Jan Scheurich <jan.scheurich@ericsson.com>; Gray, Mark D
> >> <mark.d.gray@intel.com>
> >> Subject: RE: [ovs-dev] [PATCH v5] tunneling: Avoid recirculation on
> >> datapath by computing the recirculate actions at translate time.
> >>
> >> Hi Joe,
> >>
> >> > Backing up a bit for context, the stats attribution goes roughly like this:
> >> > * First upcall, handler thread calls through the translate code
> >> >   with a packet. The resubmit_stats are derived from that packet.
> >> >   This goes through xlate_actions().
> >> > * First dump of flow from revalidator thread fetches the flow and
> >> >   runs the same xlate_actions() with whatever stats it has (may be zero).
> >> >   This time, whenever stats attribution or side effects occur, an
> >> >   xlate_cache entry is generated.
> >> > * Second and subsequent dumps of flows fetch the flow and shortcut
> >> >   the xlate_actions() by using the xlate_cache instead - ie a call
> >> >   to xlate_push_stats().
> >> >
> >> > So, in the same place where the resubmit_stats is manipulated, you
> >> > would also need to generate a new XC entry which would manipulate
> >> > the stats - this would be a 'side-effect'. I'd imagine that prior
> >> > to the full output translation there would be a
> >> > XC_TRUNCATE(truncated_size), then afterwards there would be an
> >> > XC_TRUNCATE_RESET(). Or it could be just XC_SET_SIZE(...) where 0
> >> > is reset and non-zero is a truncate size. In the
> >> > implementation/execution in xlate_push_stats(), when performing
> >> > XC_TRUNCATE you would need to store the original push_stats size
> >> > somewhere, then calculate a new 'n_bytes' based on the number of
> >> > packets and existing bytes*. For XC_TRUNCATE_RESET(), it would
> >> > restore the original push_stats size.
> >>
> >> Thank you for the explanation.
> >>
> >> > * Hmm, I'm not sure the calculation will be 100% here. Let's say
> >> >   there were 3 packets hit the flow, 50B, 200B, 300B. If
> >> >   output(max_len=100) was executed, then we don't know how many of
> >> >   the packets were truncated. The maximum number of bytes that
> >> >   could be transmitted is 300, but the actual number was 250. We
> >> >   could divide the n_bytes by n_packets, subtract the max_len and
> >> >   then multiply back up by the number of packets, which works for
> >> >   this case assuming floating point arithmetic but is slightly off
> >> >   if using integer math..
> >>
> >> I don't think that would be the proper way of calculating n_bytes.
> >> Let's say we have 3 packets with 50B, 200B, 200B and max_len=100. The
> >> output should be 50 + 100 + 100 = 250B.
> >> Following the instructions above we will get
> >> [(50 + 200 + 200) / 3 - 100] * 3 = [450 / 3 - 100] * 3 = 50 * 3 = 150B
> >>
> >> Any other idea how to calculate the truncated size with xlate cache?
> >> Or maybe I did not understand your calculation.
> > [Sugesh] Since we have this issue with the trunc action, how about
> > limiting the combine action to only those tunnels that don't have any
> > post-trunc action? If there is a trunc action, create two separate
> > rules as now. Is there any other action that would be considered an
> > exception like this?
>
> There's no need for different rules.
[Sugesh] Ok. What I meant here is to do the combine only for the non-trunc
case.
>
> Translation of truncate+output is done from xlate_output_trunc_action(),
> which calls the common xlate_output_action() translation after setting up
> the packet max_len. Perhaps this path just needs to keep the
> recirculation, while the common xlate_output_action() path should be
> capable of doing this 'combined' (patch-port-style) translation.
[Sugesh] Yes.
>
> I think your last question is effectively asking 'who calls
> xlate_output_action'. I see output_reg, output_trunc, enqueue, bundle,
> and regular output. I think that truncate is the only 'special' case
> from this perspective.
[Sugesh] So we can treat trunc as a special case then.
--- -	2017-05-11 14:40:42.205776583 +0200
+++ /home/ezolbal/work/general_L3_tunneling/ovs/tests/testsuite.dir/at-groups/783/stdout	2017-05-11 14:40:42.200387095 +0200
@@ -1,2 +1,2 @@
-Datapath actions: clone(tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100)),1)
+Datapath actions: clone(drop),tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100))