Message ID | 1483640872.9712.1.camel@edumazet-glaptop3.roam.corp.google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On 01/05/2017 07:27 PM, Eric Dumazet wrote: > On Thu, 2017-01-05 at 02:34 +0100, Daniel Borkmann wrote: [...] >> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c >> index 7e39087..ddbda25 100644 >> --- a/net/packet/af_packet.c >> +++ b/net/packet/af_packet.c >> @@ -481,6 +481,9 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame, >> h.h2->tp_nsec = ts.tv_nsec; >> break; >> case TPACKET_V3: >> + h.h3->tp_sec = ts.tv_sec; >> + h.h3->tp_nsec = ts.tv_nsec; >> + break; >> default: >> WARN(1, "TPACKET version not supported.\n"); >> BUG(); > > Gosh. Can we also replace this BUG() into something less aggressive ? There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only for the 'default' TPACKET version spread all over af_packet, so probably makes sense to rather make all of them less aggressive. > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index b9e1a13b4ba36a0bc7edf6a8c2c116c7d48c970c..0c0d268544787dcbef6601c5014e7d3836d16f96 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -476,9 +476,11 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame, > h.h2->tp_nsec = ts.tv_nsec; > break; > case TPACKET_V3: > + h.h3->tp_sec = ts.tv_sec; > + h.h3->tp_nsec = ts.tv_nsec; > + break; > default: > - WARN(1, "TPACKET version not supported.\n"); > - BUG(); > + pr_err_once("TPACKET version %u not supported.\n", po->tp_version); > } > > /* one flush is safe, as both fields always lie on the same cacheline */ > >
>> >> Gosh. Can we also replace this BUG() into something less aggressive ? > > > There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only > for the 'default' TPACKET version spread all over af_packet, so probably > makes sense to rather make all of them less aggressive. > > Very few consumers actually go looking in the kernel logs to see the error-warnings and report them back here. This severity will get them to report the incident which in this case got fixed??
On Mon, Mar 6, 2017 at 12:13 PM, chetan loke <loke.chetan@gmail.com> wrote: >>> >>> Gosh. Can we also replace this BUG() into something less aggressive ? >> >> >> There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only >> for the 'default' TPACKET version spread all over af_packet, so probably >> makes sense to rather make all of them less aggressive. >> >> > > Very few consumers actually go looking in the kernel logs to see the > error-warnings and report them back here. > > This severity will get them to report the incident which in this case > got fixed?? But BUG_ONs in the datapath can cause outages in real production environments. This should not happen for recoverable failures. For users who cannot be bothered to check their logs, there is sysctl kernel.panic_on_warn.
On Mon, Mar 6, 2017 at 9:45 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > On Mon, Mar 6, 2017 at 12:13 PM, chetan loke <loke.chetan@gmail.com> wrote: >>>> >>>> Gosh. Can we also replace this BUG() into something less aggressive ? >>> >>> >>> There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only >>> for the 'default' TPACKET version spread all over af_packet, so probably >>> makes sense to rather make all of them less aggressive. >>> >>> >> >> Very few consumers actually go looking in the kernel logs to see the >> error-warnings and report them back here. >> >> This severity will get them to report the incident which in this case >> got fixed?? > > But BUG_ONs in the datapath can cause outages in real production > environments. This should not happen for recoverable failures. For > users who cannot be bothered to check their logs, there is sysctl > kernel.panic_on_warn. Completely understand(and you should have failover to handle these outages). But then are you ok giving incorrect info to the application? For this specific bug: it is so basic that you should hit this bug 1st time everytime when you are adding support or porting a new header. Correct? And so from that point of view, this BUG_ON has served its purpose.
>>>>> Gosh. Can we also replace this BUG() into something less aggressive ? >>>> >>>> >>>> There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only >>>> for the 'default' TPACKET version spread all over af_packet, so probably >>>> makes sense to rather make all of them less aggressive. >>>> >>>> >>> >>> Very few consumers actually go looking in the kernel logs to see the >>> error-warnings and report them back here. >>> >>> This severity will get them to report the incident which in this case >>> got fixed?? >> >> But BUG_ONs in the datapath can cause outages in real production >> environments. This should not happen for recoverable failures. For >> users who cannot be bothered to check their logs, there is sysctl >> kernel.panic_on_warn. > > > Completely understand(and you should have failover to handle these > outages). Not for correlated failures where all systems can hit the same path. This is especially dangerous when remote packets or untrusted local users can trigger a BUG-enabled path. > But then are you ok giving incorrect info to the > application? No, we should certainly signal an error. For instance, returning TP_STATUS_WRONG_FORMAT instead of TP_STATUS_AVAILABLE. > For this specific bug: it is so basic that you should hit this bug 1st > time everytime when you are adding support or porting a new header. > Correct? Agreed, but that is small consolation if an unprivileged user (say, in a namespace) finds out that it can trigger the codepath. But I agree that this particular BUG_ON is one of the easier to reason about.
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index b9e1a13b4ba36a0bc7edf6a8c2c116c7d48c970c..0c0d268544787dcbef6601c5014e7d3836d16f96 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -476,9 +476,11 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame, h.h2->tp_nsec = ts.tv_nsec; break; case TPACKET_V3: + h.h3->tp_sec = ts.tv_sec; + h.h3->tp_nsec = ts.tv_nsec; + break; default: - WARN(1, "TPACKET version not supported.\n"); - BUG(); + pr_err_once("TPACKET version %u not supported.\n", po->tp_version); } /* one flush is safe, as both fields always lie on the same cacheline */