Linux 4.2.4

Message ID alpine.DEB.2.10.1510252032420.14141@blackhole.kfki.hu
State RFC, archived
Delegated to: David Miller

Commit Message

Jozsef Kadlecsik Oct. 25, 2015, 7:46 p.m. UTC
Hi,

On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:

> On 25.10.2015 10:46, Willy Tarreau wrote:
> > ipset *triggered* the problem. The whole stack dump would tell more. 
> 
> OK, find the stack traces in the bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=1272645
> 
> Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
> and IPv6, details in the bug report  ....

It seems to me it is an architecture-specific alignment issue. I don't 
have Cortex-A7 ARM hardware and qemu doesn't seem to support it either, 
so I'm unable to reproduce it (ipset passes all my tests on my hardware, 
including more complex ones than what breaks here). My first wild guess is 
that the dynamic array of the element structure is not aligned properly. 
Could you give the patch below a try?


If that does not solve it, then could you help to narrow down the issue? 
Does the bug still appear if you remove the counter extension of the set?

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Gerhard Wiesinger Oct. 25, 2015, 8:08 p.m. UTC | #1
On 25.10.2015 20:46, Jozsef Kadlecsik wrote:
> Hi,
>
> On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:
>
>> On 25.10.2015 10:46, Willy Tarreau wrote:
>>> ipset *triggered* the problem. The whole stack dump would tell more.
>> OK, find the stack traces in the bug report:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1272645
>>
>> Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
>> and IPv6, details in the bug report  ....
> It seems to me it is an architecture-specific alignment issue. I don't
> have a Cortex-A7 ARM hardware and qemu doesn't seem to support it either,
> so I'm unable to reproduce it (ipset passes all my tests on my hardware,
> including more complex ones than what breaks here). My first wild guess is
> that the dynamic array of the element structure is not aligned properly.
> Could you give a try to the next patch?
>
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index afe905c..1cf357d 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -1211,6 +1211,9 @@ static const struct ip_set_type_variant mtype_variant = {
>   	.same_set = mtype_same_set,
>   };
>   
> +#define IP_SET_BASE_ALIGN(dtype)	\
> +	ALIGN(sizeof(struct dtype), __alignof__(struct dtype))
> +
>   #ifdef IP_SET_EMIT_CREATE
>   static int
>   IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
> @@ -1319,12 +1322,12 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
>   #endif
>   		set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
>   		set->dsize = ip_set_elem_len(set, tb,
> -				sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)));
> +				IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 4_elem)));
>   #ifndef IP_SET_PROTO_UNDEF
>   	} else {
>   		set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
>   		set->dsize = ip_set_elem_len(set, tb,
> -				sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)));
> +				IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 6_elem)));
>   	}
>   #endif
>   	if (tb[IPSET_ATTR_TIMEOUT]) {
>
> If that does not solve it, then could you help to narrow down the issue?
> Does the bug still appear if your remove the counter extension of the set?
>

Hello Jozsef,

Patch applied well, compiling ...

Interesting that it didn't happen before. The device has been in production 
for more than 2 months without any issue.

Also, any idea regarding the second issue? Or do you think it has the 
same root cause?

Greetings from Vienna, Austria :-)

BTW: You can get the Banana Pi R1 for example at:
http://www.aliexpress.com/item/BPI-R1-Set-1-R1-Board-Clear-Case-5dB-Antenna-Power-Adapter-Banana-PI-R1-Smart/32362127917.html
I can really recommend it as a router. Power consumption is as low as 
3 W. The price is also IMHO very good.

Ciao,
Gerhard

Gerhard Wiesinger Oct. 25, 2015, 9:26 p.m. UTC | #2
On 25.10.2015 21:08, Gerhard Wiesinger wrote:
> On 25.10.2015 20:46, Jozsef Kadlecsik wrote:
>> Hi,
>>
>> On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:
>>
>>> On 25.10.2015 10:46, Willy Tarreau wrote:
>>>> ipset *triggered* the problem. The whole stack dump would tell more.
>>> OK, find the stack traces in the bug report:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1272645
>>>
>>> Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
>>> and IPv6, details in the bug report  ....
>> It seems to me it is an architecture-specific alignment issue. I don't
>> have a Cortex-A7 ARM hardware and qemu doesn't seem to support it either,
>> so I'm unable to reproduce it (ipset passes all my tests on my hardware,
>> including more complex ones than what breaks here). My first wild guess is
>> that the dynamic array of the element structure is not aligned properly.
>> Could you give a try to the next patch?
>>
>> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
>> index afe905c..1cf357d 100644
>> --- a/net/netfilter/ipset/ip_set_hash_gen.h
>> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
>> @@ -1211,6 +1211,9 @@ static const struct ip_set_type_variant mtype_variant = {
>>       .same_set = mtype_same_set,
>>   };
>>
>> +#define IP_SET_BASE_ALIGN(dtype)    \
>> +    ALIGN(sizeof(struct dtype), __alignof__(struct dtype))
>> +
>>   #ifdef IP_SET_EMIT_CREATE
>>   static int
>>   IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
>> @@ -1319,12 +1322,12 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
>>   #endif
>>           set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
>>           set->dsize = ip_set_elem_len(set, tb,
>> -                sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)));
>> +                IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 4_elem)));
>>   #ifndef IP_SET_PROTO_UNDEF
>>       } else {
>>           set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
>>           set->dsize = ip_set_elem_len(set, tb,
>> -                sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)));
>> +                IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 6_elem)));
>>       }
>>   #endif
>>       if (tb[IPSET_ATTR_TIMEOUT]) {
>>
>> If that does not solve it, then could you help to narrow down the issue?
>> Does the bug still appear if your remove the counter extension of the set?
>>
>
> Hello Jozsef,
>
> Patch applied well, compiling ...

Hello Jozsef,

Thank you for the patch, but it still crashes, see: 
https://bugzilla.redhat.com/show_bug.cgi?id=1272645

Any further ideas?

Thank you.

Ciao,
Gerhard

Jozsef Kadlecsik Oct. 25, 2015, 9:53 p.m. UTC | #3
On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:

> On 25.10.2015 21:08, Gerhard Wiesinger wrote:
> > On 25.10.2015 20:46, Jozsef Kadlecsik wrote:
> > > Hi,
> > > 
> > > On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:
> > > 
> > > > On 25.10.2015 10:46, Willy Tarreau wrote:
> > > > > ipset *triggered* the problem. The whole stack dump would tell more.
> > > > OK, find the stack traces in the bug report:
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1272645
> > > > 
> > > > Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
> > > > and IPv6, details in the bug report  ....
> > > It seems to me it is an architecture-specific alignment issue. I don't
> > > have a Cortex-A7 ARM hardware and qemu doesn't seem to support it either,
> > > so I'm unable to reproduce it (ipset passes all my tests on my hardware,
> > > including more complex ones than what breaks here). My first wild guess is
> > > that the dynamic array of the element structure is not aligned properly.
> > > Could you give a try to the next patch?
> > > 
> > > > diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> > > > index afe905c..1cf357d 100644
> > > > --- a/net/netfilter/ipset/ip_set_hash_gen.h
> > > > +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> > > > @@ -1211,6 +1211,9 @@ static const struct ip_set_type_variant mtype_variant = {
> > > >       .same_set = mtype_same_set,
> > > >   };
> > > >
> > > > +#define IP_SET_BASE_ALIGN(dtype)    \
> > > > +    ALIGN(sizeof(struct dtype), __alignof__(struct dtype))
> > > +
> > >   #ifdef IP_SET_EMIT_CREATE
> > >   static int
> > >   IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
> > > @@ -1319,12 +1322,12 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net,
> > > struct ip_set *set,
> > >   #endif
> > >           set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
> > >           set->dsize = ip_set_elem_len(set, tb,
> > > -                sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)));
> > > +                IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 4_elem)));
> > >   #ifndef IP_SET_PROTO_UNDEF
> > >       } else {
> > >           set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
> > >           set->dsize = ip_set_elem_len(set, tb,
> > > -                sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)));
> > > +                IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 6_elem)));
> > >       }
> > >   #endif
> > >       if (tb[IPSET_ATTR_TIMEOUT]) {
> > > 
> > > If that does not solve it, then could you help to narrow down the issue?
> > > Does the bug still appear if your remove the counter extension of the set?
> > > 
>
> Thank you for the patch, but it still crashes, see:
> https://bugzilla.redhat.com/show_bug.cgi?id=1272645
> 
> Any further ideas?

Does it crash without counters? That could narrow down where to look.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary
Gerhard Wiesinger Oct. 26, 2015, 7:27 a.m. UTC | #4
On 25.10.2015 22:53, Jozsef Kadlecsik wrote:
> On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:
>
>> Any further ideas?
> Does it crash without counters? That could narrow down where to look for.
>
>

Hello Jozsef,

So far it doesn't crash if I don't use the counters. So there must be a 
bug with the counters.

Any idea for the root cause?

Thanks.

Ciao,
Gerhard

Jozsef Kadlecsik Oct. 26, 2015, 8:58 a.m. UTC | #5
On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:

> On 25.10.2015 20:46, Jozsef Kadlecsik wrote:
> > Hi,
> > 
> > On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:
> > 
> > > On 25.10.2015 10:46, Willy Tarreau wrote:
> > > > ipset *triggered* the problem. The whole stack dump would tell more.
> > > OK, find the stack traces in the bug report:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1272645
> > > 
> > > Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
> > > and IPv6, details in the bug report  ....
> > It seems to me it is an architecture-specific alignment issue. I don't
> > have a Cortex-A7 ARM hardware and qemu doesn't seem to support it either,
> > so I'm unable to reproduce it (ipset passes all my tests on my hardware,
> > including more complex ones than what breaks here). My first wild guess is
> > that the dynamic array of the element structure is not aligned properly.
> > Could you give a try to the next patch?
> > 
> > diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> > index afe905c..1cf357d 100644
> > --- a/net/netfilter/ipset/ip_set_hash_gen.h
> > +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> > @@ -1211,6 +1211,9 @@ static const struct ip_set_type_variant mtype_variant = {
> >   	.same_set = mtype_same_set,
> >   };
> >
> > +#define IP_SET_BASE_ALIGN(dtype)	\
> > +	ALIGN(sizeof(struct dtype), __alignof__(struct dtype))
> > +
> >   #ifdef IP_SET_EMIT_CREATE
> >   static int
> >   IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
> > @@ -1319,12 +1322,12 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
> >   #endif
> >   		set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
> >   		set->dsize = ip_set_elem_len(set, tb,
> > -				sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)));
> > +				IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 4_elem)));
> >   #ifndef IP_SET_PROTO_UNDEF
> >   	} else {
> >   		set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
> >   		set->dsize = ip_set_elem_len(set, tb,
> > -				sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)));
> > +				IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 6_elem)));
> >   	}
> >   #endif
> >   	if (tb[IPSET_ATTR_TIMEOUT]) {
> > 
> > If that does not solve it, then could you help to narrow down the issue?
> > Does the bug still appear if your remove the counter extension of the set?
> > 
> 
> Patch applied well, compiling ...
> 
> Interesting, that it didn't happen before. Device is in production for 
> more than 2 month without any issue.

You mean the device was stable with the earlier kernels, but starting with 
4.2.3 (and back to 4.1.10) you have got problems, don't you?
 
> Also any idea regarding the second isssue? Or do you think it has the 
> same root cause?

Looking at your Red Hat bugzilla report, the "nf_conntrack: table full, 
dropping packet" and "Alignment trap: not handling instruction" are two 
unrelated issues. The second one is triggered by the unaligned counter 
extension access in ipset, which I'm investigating. I can't think of any 
way those issues could be related to each other.

> Greetings from Vienna, Austria :-)

Quite near to my place :-) 

> BTW: You can get the Banana Pi R1 for example at:
> http://www.aliexpress.com/item/BPI-R1-Set-1-R1-Board-Clear-Case-5dB-Antenna-Power-Adapter-Banana-PI-R1-Smart/32362127917.html
> I can really recommend it as a router. Power consumption is as less as 3W.
> Price is also IMHO very good.

Cool mini gear, indeed!

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary
Gerhard Wiesinger Oct. 26, 2015, 9:11 a.m. UTC | #6
On 26.10.2015 09:58, Jozsef Kadlecsik wrote:
> On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:
>
>> Also any idea regarding the second issue? Or do you think it has the
>> same root cause?
> Looking at your RedHat bugzilla report, the "nf_conntrack: table full,
> dropping packet" and "Alignment trap: not handling instruction" are two
> unrelated issues and the second one is triggered by the unaligned counter
> extension access in ipset, I'm investigating. I can't think of any reason
> how those issues could be related to each other.

Yes, they are unrelated.
Issue 1: nf_conntrack: table full, dropping packet => Fixed with 4.2.4
Issue 2: Alignment trap: not handling instruction => Happens when ipset 
counters are enabled

Please keep in mind it happens with IPv6 commands.

Currently 4.2.4 without ipset counters runs well.

Ciao,
Gerhard
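
The diagnosis reached in this thread (the trap appears only when the counter
extension is enabled) fits a layout where a 64-bit counter is placed directly
after an element whose size is not a multiple of 8. The following is a minimal
sketch of that situation, not the actual ipset code: the struct, the function
names, the 20-byte element size and the 8-byte requirement are all assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment requirement of a 64-bit counter (hypothetical value,
 * picked to match 64-bit atomic access on 32-bit ARM). */
#define DEMO_COUNTER_ALIGN ((size_t)8)

/* Round off up to the next multiple of align (align is a power of two). */
static size_t round_up(size_t off, size_t align)
{
	return (off + align - 1) & ~(align - 1);
}

/* Hypothetical counter extension: two 64-bit counters. */
struct demo_counter {
	uint64_t bytes;
	uint64_t packets;
};

/* Naive layout: the extension starts right where the element ends. */
static size_t counter_offset_naive(size_t elem_size)
{
	return elem_size;
}

/* Aligned layout: pad the offset to the extension's own alignment. */
static size_t counter_offset_aligned(size_t elem_size)
{
	return round_up(elem_size, DEMO_COUNTER_ALIGN);
}

/* Total per-element length when the counter extension is enabled. */
static size_t demo_elem_len(size_t elem_size)
{
	return counter_offset_aligned(elem_size) + sizeof(struct demo_counter);
}
```

With a hypothetical 20-byte element, the naive offset is 20, which is only
4-byte aligned, while the padded offset is 24. Elements whose size already
happens to be a multiple of 8 would not show the problem under either layout.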

Patch

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index afe905c..1cf357d 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -1211,6 +1211,9 @@  static const struct ip_set_type_variant mtype_variant = {
 	.same_set = mtype_same_set,
 };
 
+#define IP_SET_BASE_ALIGN(dtype)	\
+	ALIGN(sizeof(struct dtype), __alignof__(struct dtype))
+
 #ifdef IP_SET_EMIT_CREATE
 static int
 IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
@@ -1319,12 +1322,12 @@  IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
 #endif
 		set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
 		set->dsize = ip_set_elem_len(set, tb,
-				sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)));
+				IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 4_elem)));
 #ifndef IP_SET_PROTO_UNDEF
 	} else {
 		set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
 		set->dsize = ip_set_elem_len(set, tb,
-				sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)));
+				IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 6_elem)));
 	}
 #endif
 	if (tb[IPSET_ATTR_TIMEOUT]) {
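
Comment #2 reports that the crash persists with this patch applied. One
possible reason is that in C the size of a struct already includes tail
padding and is therefore a multiple of the struct's own alignment, so rounding
sizeof() up to __alignof__() of the same type leaves the value unchanged. The
sketch below illustrates this with user-space stand-ins; ALIGN_UP,
DEMO_BASE_ALIGN and the two demo structs are hypothetical names and layouts,
not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical stand-ins for an IPv4 and an IPv6 hash element;
 * the field layout is invented for illustration only. */
struct demo4_elem {
	uint32_t ip;
	uint8_t  cidr;
};

struct demo6_elem {
	uint8_t  ip6[16];
	uint16_t port;
	uint8_t  proto;
};

/* Same shape as IP_SET_BASE_ALIGN() in the patch above. */
#define DEMO_BASE_ALIGN(dtype) \
	ALIGN_UP(sizeof(struct dtype), _Alignof(struct dtype))

/* Checked at compile time: the rounding does not change either size. */
_Static_assert(DEMO_BASE_ALIGN(demo4_elem) == sizeof(struct demo4_elem),
	       "base align is a no-op for demo4_elem");
_Static_assert(DEMO_BASE_ALIGN(demo6_elem) == sizeof(struct demo6_elem),
	       "base align is a no-op for demo6_elem");

/* Returns 1 when the macro leaves both sizes untouched. */
static int base_align_is_noop(void)
{
	return DEMO_BASE_ALIGN(demo4_elem) == sizeof(struct demo4_elem) &&
	       DEMO_BASE_ALIGN(demo6_elem) == sizeof(struct demo6_elem);
}
```

If this reading is right, the base size was never the problem, and the
alignment of whatever is appended after the element (such as the counter
extension discussed in the comments) is the more likely place to look.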