diff mbox

[net-next] packet: fix panic in __packet_set_timestamp on tpacket_v3 in tx mode

Message ID 1483640872.9712.1.camel@edumazet-glaptop3.roam.corp.google.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Jan. 5, 2017, 6:27 p.m. UTC
On Thu, 2017-01-05 at 02:34 +0100, Daniel Borkmann wrote:
> When TX timestamping is in use with TPACKET_V3's TX ring, then we'll
> hit the BUG() in __packet_set_timestamp() when ring buffer slot is
> returned to user space via tpacket_destruct_skb(). This is due to v3
> being assumed as unreachable here, but since 7f953ab2ba46 ("af_packet:
> TX_RING support for TPACKET_V3") it's not anymore. Fix it by filling
> the timestamp back into the ring slot.
> 
> Fixes: 7f953ab2ba46 ("af_packet: TX_RING support for TPACKET_V3")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  net/packet/af_packet.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 7e39087..ddbda25 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -481,6 +481,9 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
>  		h.h2->tp_nsec = ts.tv_nsec;
>  		break;
>  	case TPACKET_V3:
> +		h.h3->tp_sec = ts.tv_sec;
> +		h.h3->tp_nsec = ts.tv_nsec;
> +		break;
>  	default:
>  		WARN(1, "TPACKET version not supported.\n");
>  		BUG();

Gosh. Can we also replace this BUG() into something less aggressive ?

Comments

Daniel Borkmann Jan. 5, 2017, 7:10 p.m. UTC | #1
On 01/05/2017 07:27 PM, Eric Dumazet wrote:
> On Thu, 2017-01-05 at 02:34 +0100, Daniel Borkmann wrote:
[...]
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 7e39087..ddbda25 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -481,6 +481,9 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
>>   		h.h2->tp_nsec = ts.tv_nsec;
>>   		break;
>>   	case TPACKET_V3:
>> +		h.h3->tp_sec = ts.tv_sec;
>> +		h.h3->tp_nsec = ts.tv_nsec;
>> +		break;
>>   	default:
>>   		WARN(1, "TPACKET version not supported.\n");
>>   		BUG();
>
> Gosh. Can we also replace this BUG() into something less aggressive ?

There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only
for the 'default' TPACKET version spread all over af_packet, so probably
makes sense to rather make all of them less aggressive.

> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index b9e1a13b4ba36a0bc7edf6a8c2c116c7d48c970c..0c0d268544787dcbef6601c5014e7d3836d16f96 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -476,9 +476,11 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
>   		h.h2->tp_nsec = ts.tv_nsec;
>   		break;
>   	case TPACKET_V3:
> +		h.h3->tp_sec = ts.tv_sec;
> +		h.h3->tp_nsec = ts.tv_nsec;
> +		break;
>   	default:
> -		WARN(1, "TPACKET version not supported.\n");
> -		BUG();
> +		pr_err_once("TPACKET version %u not supported.\n", po->tp_version);
>   	}
>
>   	/* one flush is safe, as both fields always lie on the same cacheline */
>
>
chetan L March 6, 2017, 5:13 p.m. UTC | #2
>>
>> Gosh. Can we also replace this BUG() into something less aggressive ?
>
>
> There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only
> for the 'default' TPACKET version spread all over af_packet, so probably
> makes sense to rather make all of them less aggressive.
>
>

Very few consumers actually go looking in the kernel logs to see the
error-warnings and report them back here.

This severity will get them to report the incident which in this case
got fixed??
Willem de Bruijn March 6, 2017, 5:45 p.m. UTC | #3
On Mon, Mar 6, 2017 at 12:13 PM, chetan loke <loke.chetan@gmail.com> wrote:
>>>
>>> Gosh. Can we also replace this BUG() into something less aggressive ?
>>
>>
>> There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only
>> for the 'default' TPACKET version spread all over af_packet, so probably
>> makes sense to rather make all of them less aggressive.
>>
>>
>
> Very few consumers actually go looking in the kernel logs to see the
> error-warnings and report them back here.
>
> This severity will get them to report the incident which in this case
> got fixed??

But BUG_ONs in the datapath can cause outages in real production
environments. This should not happen for recoverable failures. For
users who cannot be bothered to check their logs, there is sysctl
kernel.panic_on_warn.
chetan L March 6, 2017, 9:36 p.m. UTC | #4
On Mon, Mar 6, 2017 at 9:45 AM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Mon, Mar 6, 2017 at 12:13 PM, chetan loke <loke.chetan@gmail.com> wrote:
>>>>
>>>> Gosh. Can we also replace this BUG() into something less aggressive ?
>>>
>>>
>>> There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only
>>> for the 'default' TPACKET version spread all over af_packet, so probably
>>> makes sense to rather make all of them less aggressive.
>>>
>>>
>>
>> Very few consumers actually go looking in the kernel logs to see the
>> error-warnings and report them back here.
>>
>> This severity will get them to report the incident which in this case
>> got fixed??
>
> But BUG_ONs in the datapath can cause outages in real production
> environments. This should not happen for recoverable failures. For
> users who cannot be bothered to check their logs, there is sysctl
> kernel.panic_on_warn.


Completely understand(and you should have failover to handle these
outages). But then are you ok giving incorrect info to the
application?

For this specific bug: it is so basic that you should hit this bug 1st
time everytime when you are adding support or porting a new header.
Correct?

And so from that point of view, this BUG_ON has served its purpose.
Willem de Bruijn March 6, 2017, 10:13 p.m. UTC | #5
>>>>> Gosh. Can we also replace this BUG() into something less aggressive ?
>>>>
>>>>
>>>> There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only
>>>> for the 'default' TPACKET version spread all over af_packet, so probably
>>>> makes sense to rather make all of them less aggressive.
>>>>
>>>>
>>>
>>> Very few consumers actually go looking in the kernel logs to see the
>>> error-warnings and report them back here.
>>>
>>> This severity will get them to report the incident which in this case
>>> got fixed??
>>
>> But BUG_ONs in the datapath can cause outages in real production
>> environments. This should not happen for recoverable failures. For
>> users who cannot be bothered to check their logs, there is sysctl
>> kernel.panic_on_warn.
>
>
> Completely understand(and you should have failover to handle these
> outages).

Not for correlated failures where all systems can hit the same path.
This is especially dangerous when remote packets or untrusted
local users can trigger a BUG-enabled path.

> But then are you ok giving incorrect info to the
> application?

No, we should certainly signal an error. For instance, returning
TP_STATUS_WRONG_FORMAT instead of TP_STATUS_AVAILABLE.

> For this specific bug: it is so basic that you should hit this bug 1st
> time everytime when you are adding support or porting a new header.
> Correct?

Agreed, but that is small consolation if an unprivileged user (say, in
a namespace) finds out that it can trigger the codepath.

But I agree that this particular BUG_ON is one of the easier to
reason about.
diff mbox

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b9e1a13b4ba36a0bc7edf6a8c2c116c7d48c970c..0c0d268544787dcbef6601c5014e7d3836d16f96 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -476,9 +476,11 @@  static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
 		h.h2->tp_nsec = ts.tv_nsec;
 		break;
 	case TPACKET_V3:
+		h.h3->tp_sec = ts.tv_sec;
+		h.h3->tp_nsec = ts.tv_nsec;
+		break;
 	default:
-		WARN(1, "TPACKET version not supported.\n");
-		BUG();
+		pr_err_once("TPACKET version %u not supported.\n", po->tp_version);
 	}
 
 	/* one flush is safe, as both fields always lie on the same cacheline */