
net: stop endless flood about dst entry refcount underflow or overflow

Message ID 20150714114305.17434.53731.stgit@buzz
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Konstantin Khlebnikov July 14, 2015, 11:43 a.m. UTC
The kernel generates a flood of warnings when a dst entry's reference
counter overflows and becomes negative. This patch prints the address of
the dst entry and its refcount, then resets the reference counter to
INT_MAX/2.

This bug was seen several times on machines with outdated 3.10.y kernels.
Most likely it is already fixed upstream. Even so, the flood of warnings
completely kills the machine and makes further debugging impossible.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 net/core/dst.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet July 14, 2015, 12:04 p.m. UTC | #1
On Tue, 2015-07-14 at 14:43 +0300, Konstantin Khlebnikov wrote:
> Kernel generates a lot of warnings when dst entry reference counter
> overflows and becomes negative. This patch prints address of dst entry,
> its refcount and then resets reference counter to INT_MAX/2.
> 
> That bug was seen several times at machines with outdated 3.10.y kernels.
> Most like it's already fixed in upstream. Anyway flood of that warnings
> completely kills machine and makes further debugging impossible.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>  net/core/dst.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/dst.c b/net/core/dst.c
> index e956ce6d1378..2ed91082b3cf 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -284,7 +284,8 @@ void dst_release(struct dst_entry *dst)
>  		int newrefcnt;
>  
>  		newrefcnt = atomic_dec_return(&dst->__refcnt);
> -		WARN_ON(newrefcnt < 0);
> +		if (WARN(newrefcnt < 0, "dst: %p refcnt: %d\n", dst, newrefcnt))
> +			atomic_set(&dst->__refcnt, INT_MAX / 2);
>  		if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt)
>  			call_rcu(&dst->rcu_head, dst_destroy_rcu);
>  	}


WARN_ON_ONCE() if you want, but setting __refcnt like this is absolutely
a dirty hack.
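
For readers unfamiliar with the warn-once idea mentioned above, here is a minimal userspace sketch of it. It is not the kernel's WARN_ON_ONCE() macro: the names warn_on_once(), refcnt_warned, warnings_printed and release_many() are hypothetical, and a single global flag stands in for the per-call-site state the kernel keeps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a warn-once helper: it reports the
 * first time the condition holds, stays silent afterwards, and always
 * returns the value of the condition. One global flag is used here; the
 * kernel tracks this per call site. */
static atomic_flag refcnt_warned = ATOMIC_FLAG_INIT;
static int warnings_printed;	/* demonstration counter, single-threaded */

static bool warn_on_once(bool cond, const char *what)
{
	if (cond && !atomic_flag_test_and_set(&refcnt_warned)) {
		fprintf(stderr, "WARNING: %s\n", what);
		warnings_printed++;
	}
	return cond;
}

/* Model of a release path that keeps hitting a negative refcount.
 * Returns how many times the condition was true. */
static int release_many(int start, int times)
{
	int refcnt = start;
	int hits = 0;

	assert(times >= 0);
	for (int i = 0; i < times; i++) {
		refcnt--;
		if (warn_on_once(refcnt < 0, "refcnt < 0"))
			hits++;
	}
	return hits;
}
```

The condition can be true many times while only one message is printed, which is the trade-off argued over in the replies: the flood stops, but later occurrences are no longer logged.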



Konstantin Khlebnikov July 14, 2015, 12:15 p.m. UTC | #2
On 14.07.2015 15:04, Eric Dumazet wrote:
> On Tue, 2015-07-14 at 14:43 +0300, Konstantin Khlebnikov wrote:
>> Kernel generates a lot of warnings when dst entry reference counter
>> overflows and becomes negative. This patch prints address of dst entry,
>> its refcount and then resets reference counter to INT_MAX/2.
>>
>> That bug was seen several times at machines with outdated 3.10.y kernels.
>> Most like it's already fixed in upstream. Anyway flood of that warnings
>> completely kills machine and makes further debugging impossible.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> ---
>>   net/core/dst.c |    3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/dst.c b/net/core/dst.c
>> index e956ce6d1378..2ed91082b3cf 100644
>> --- a/net/core/dst.c
>> +++ b/net/core/dst.c
>> @@ -284,7 +284,8 @@ void dst_release(struct dst_entry *dst)
>>   		int newrefcnt;
>>
>>   		newrefcnt = atomic_dec_return(&dst->__refcnt);
>> -		WARN_ON(newrefcnt < 0);
>> +		if (WARN(newrefcnt < 0, "dst: %p refcnt: %d\n", dst, newrefcnt))
>> +			atomic_set(&dst->__refcnt, INT_MAX / 2);
>>   		if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt)
>>   			call_rcu(&dst->rcu_head, dst_destroy_rcu);
>>   	}
>
>
> WARN_ON_ONCE() if you want, but setting __refcnt like this is absolutely
> a dirty hack.

A simple warn-once would hide a lot of information that could be useful.
Also, leaking a dst entry is better than freeing an entry that is still
active.

Eric Dumazet July 14, 2015, 12:26 p.m. UTC | #3
On Tue, 2015-07-14 at 15:15 +0300, Konstantin Khlebnikov wrote:

> Simple warn-once will hide a lot of information which could be useful.
> Also dst entry leak is better than freeing actually active entry.

Then BUG_ON().

Really, we need to fix leaks, not brown paper them.



David Miller July 14, 2015, 10:30 p.m. UTC | #4
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 14 Jul 2015 14:26:07 +0200

> On Tue, 2015-07-14 at 15:15 +0300, Konstantin Khlebnikov wrote:
> 
>> Simple warn-once will hide a lot of information which could be useful.
>> Also dst entry leak is better than freeing actually active entry.
> 
> Then BUG_ON() .
> 
> Really, we need to fix leaks, not brown paper them.

No, killing the machine is not the answer.

If you want to rate limit this message, do it on a per-device basis,
but without corrupting the netdev state in the process.
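
One way to picture the per-device rate limiting suggested above is the userspace sketch below. It is an illustration under stated assumptions, not kernel code: struct dev_ratelimit, struct fake_dev, dev_ratelimit_ok() and warn_refcnt() are hypothetical names, and time is passed in as a plain number so the behaviour is deterministic. The point it shows is that the message is throttled per device while the reference counter itself is never modified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-device limiter: at most `burst` messages per
 * `interval` time units. None of these names come from the kernel. */
struct dev_ratelimit {
	long interval;	/* window length */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in this window */
	int missed;	/* messages suppressed in this window */
};

struct fake_dev {
	const char *name;
	struct dev_ratelimit rl;
};

/* Returns true if the caller may print at time `now`. */
static bool dev_ratelimit_ok(struct dev_ratelimit *rl, long now)
{
	assert(rl != NULL);
	if (now - rl->begin >= rl->interval) {
		/* A new window starts: reset the counters. */
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}

/* Warn about a negative refcount; the counter is only read, never reset.
 * Returns true if a message was actually printed. */
static bool warn_refcnt(struct fake_dev *dev, int refcnt, long now)
{
	if (refcnt >= 0 || !dev_ratelimit_ok(&dev->rl, now))
		return false;
	fprintf(stderr, "%s: dst refcnt %d\n", dev->name, refcnt);
	return true;
}
```

Because each device carries its own limiter state, a noisy device does not suppress messages from another one.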

Patch

diff --git a/net/core/dst.c b/net/core/dst.c
index e956ce6d1378..2ed91082b3cf 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -284,7 +284,8 @@  void dst_release(struct dst_entry *dst)
 		int newrefcnt;
 
 		newrefcnt = atomic_dec_return(&dst->__refcnt);
-		WARN_ON(newrefcnt < 0);
+		if (WARN(newrefcnt < 0, "dst: %p refcnt: %d\n", dst, newrefcnt))
+			atomic_set(&dst->__refcnt, INT_MAX / 2);
 		if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt)
 			call_rcu(&dst->rcu_head, dst_destroy_rcu);
 	}
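
To make the effect of the hunk above easier to follow outside the kernel, here is a small userspace model of it using C11 atomics. It is a sketch under stated assumptions: struct fake_dst and fake_dst_release() are hypothetical stand-ins, only the counter is modelled, and the DST_NOCACHE / call_rcu() path is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for struct dst_entry; only the counter is modelled. */
struct fake_dst {
	atomic_int refcnt;
};

/* Mirrors the patched logic: decrement, and on a negative result print
 * the entry and its count, then park the counter at INT_MAX / 2.
 * Returns the value the decrement produced. */
static int fake_dst_release(struct fake_dst *dst)
{
	int newrefcnt;

	assert(dst != NULL);
	/* atomic_fetch_sub() returns the old value, so subtract 1 again
	 * to get what the kernel's atomic_dec_return() would return. */
	newrefcnt = atomic_fetch_sub(&dst->refcnt, 1) - 1;
	if (newrefcnt < 0) {
		fprintf(stderr, "dst: %p refcnt: %d\n", (void *)dst, newrefcnt);
		atomic_store(&dst->refcnt, INT_MAX / 2);
	}
	return newrefcnt;
}
```

After the reset the counter is very unlikely to reach zero again, so the entry is in effect leaked rather than freed. That is the behaviour the submitter defends in the thread and the reviewers object to.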