diff mbox

bssgp: Fix call to llist_entry in fc_queue_timer_cfg

Message ID 1430414883-25640-1-git-send-email-jerlbeck@sysmocom.de
State Accepted
Headers show

Commit Message

Jacob Erlbeck April 30, 2015, 5:28 p.m. UTC
Currently the DL sometimes hangs and sometimes a lot of messages
(still not able to send PDU) are logged. This is caused by an invalid
timer delay computation, setting msecs either to 0 or to some big value.

This is due to an '&' operator at the wrong place, accessing some
parts in fc instead of the first element of the list.

This commit fixes that issue.

Sponsored-by: On-Waves ehf
---
 src/gb/gprs_bssgp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Holger Freyther April 30, 2015, 6:01 p.m. UTC | #1
> On 30 Apr 2015, at 19:28, Jacob Erlbeck <jerlbeck@sysmocom.de> wrote:
> 
> Currently the DL sometimes hangs and sometimes a lot of messages
> (still not able to send PDU) are logged. This is caused by an invalid
> timer delay computation, setting msecs either to 0 or to some big value.
> 
> This is due to an '&' operator at the wrong place, accessing some
> parts in fc instead of the first element of the list.

ouch. it appears that this is the only occurrence of this issue. How long did
it take for you to find it? Did ASAN help?
Jacob Erlbeck April 30, 2015, 6:33 p.m. UTC | #2
On 30.04.2015 20:01, Holger Freyther wrote:
> 
>> On 30 Apr 2015, at 19:28, Jacob Erlbeck <jerlbeck@sysmocom.de> wrote:
>> This is due to an '&' operator at the wrong place, accessing some
>> parts in fc instead of the first element of the list.
> 
> ouch. it appears that this is the only occurrence of this issue. How long did
> it take for you to find it? Did ASAN help?
> 
No, since the data seems to be taken from then fc struct, so the memory
area is valid. ASAN was enabled and didn't complain.

Logging helped instead, since the logging messages suggested, that the
timer interval computation might have been broken.

Jacob
Mike McTernan (wavemobile) May 1, 2015, 10:23 a.m. UTC | #3
> On 30.04.2015 20:01, Holger Freyther wrote:
> >
> >> On 30 Apr 2015, at 19:28, Jacob Erlbeck <jerlbeck@sysmocom.de> wrote:
> >> This is due to an '&' operator at the wrong place, accessing some
> >> parts in fc instead of the first element of the list.

Yikes!

I think the definition of container_of() shouldn't cast ptr:

#define container_of(ptr, type, member) ({                     \
-        const typeof( ((type *)0)->member ) *__mptr = (typeof( ((type *)0)->member ) *)(ptr);  \
+        const typeof( ((type *)0)->member ) *__mptr = (ptr);   \
         (type *)( (char *)__mptr - offsetof(type, member) );})

Signed-off-by: Michael McTernan <mike.mcternan@wavemobile.com>

Then we get one nice warning from gcc:

gprs_bssgp.c: In function 'fc_queue_timer_cfg':
gprs_bssgp.c:631:9: warning: initialization from incompatible pointer type [enabled by default]

Everything I've tried* still compiles and runs with this change (and generates no other warnings), but obviously libosmocore could be used somewhere else where container_of type abuse will now make warnings.  If such a case is found where the cast is correct and required, I'd recommend the casting be placed at the calls to container_of() where it is more visible and auditable.

Please try the above patch and consider applying it to libosmocore.  

Jacob, would you also like/be able to check that the above change is good with Coverity, incase that can dig out any other bugs?

Kind Regards,

Mike

* I build the following into my system, some of which use libosmocore:
libdbi-0.9.0
libdbi-drivers-0.9.0
libosmo-abis
libosmocore
libosmo-netif
libosmo-sccp
openbsc
openggsn
ortp-0.22.0
osmo-bts
osmo-pcu
Holger Freyther May 1, 2015, 2:13 p.m. UTC | #4
> On 01 May 2015, at 12:23, Mike McTernan (wavemobile) <mike.mcternan@wavemobile.com> wrote:
> 
> 
> I think the definition of container_of() shouldn't cast ptr:
> 
> #define container_of(ptr, type, member) ({                     \
> -        const typeof( ((type *)0)->member ) *__mptr = (typeof( ((type *)0)->member ) *)(ptr);  \
> +        const typeof( ((type *)0)->member ) *__mptr = (ptr);   \
>         (type *)( (char *)__mptr - offsetof(type, member) );})


good point. I thought we have a void* in our hands but we do have the
llist_head here. The container_of macro in linux 2.6.12-rc2 (initial git
commit) didn’t cast either so we should definitely take your commit.


> Jacob, would you also like/be able to check that the above change is good with Coverity, incase that can dig out any other bugs?

Our jenkins only submits HEAD as build to coverity. We don’t have a try
server or similar
Mike McTernan (wavemobile) May 1, 2015, 3:09 p.m. UTC | #5
Hi Holger,

> good point. I thought we have a void* in our hands but we do have the

> llist_head here. The container_of macro in linux 2.6.12-rc2 (initial git

> commit) didn’t cast either so we should definitely take your commit.


Great!  I subsequently checked too, and couldn't find a version of Linux with the cast. 

The other change to perhaps consider is what happens when the passed ptr is null.  In my projects I tend to use a version of containerof that will return NULL in that case.  This is both convenient for NULL checks and ensures we don't end up with a pointer to the top of memory (which will _probably_ segfault so would have been detected). 

> Our jenkins only submits HEAD as build to coverity. 


Cool!  I just found the Jenkins pages... that's nice to know, thanks.
	
Kind Regards,

Mike
Holger Freyther May 2, 2015, 6:03 a.m. UTC | #6
> On 01 May 2015, at 12:23, Mike McTernan (wavemobile) <mike.mcternan@wavemobile.com> wrote:
> 
> Yikes!
> 
> I think the definition of container_of() shouldn't cast ptr:
> 
> #define container_of(ptr, type, member) ({                     \
> -        const typeof( ((type *)0)->member ) *__mptr = (typeof( ((type *)0)->member ) *)(ptr);  \
> +        const typeof( ((type *)0)->member ) *__mptr = (ptr);   \
>         (type *)( (char *)__mptr - offsetof(type, member) );})


In OpenBSC the cast was added in be68f6fc6cde1367a4481d2e774a64e2cd657267

    Change the variable "new" to "_new" in order to include it from C++ code.
    The define "container_of" will cast pointer before assigning. Compilers
    with stricter options require this. (Andreas Eversberg)

I still think we should revert the cast and see how C++ code complains and then look
at the compiler warning. As you were able to compile osmo-pcu (a C++ app that uses
linked lists) it can’t be that bad.
Mike McTernan (wavemobile) May 5, 2015, 8:25 a.m. UTC | #7
> I still think we should revert the cast and see how C++ code complains 


Is anyone suggesting the cast should be kept?

In my view it would be a kind of madness to keep the cast - C is weakly typed as it is, so removing the little type checking we can get from the compiler is foolhardy and will just create subtle timewasting bugs.  C++ has better typing, generics and true OO, so I don't see why the cast should be present there, or that container_of should be particularly useful when better type-safety can be achieved.

> and then look at the compiler warning.


The particular warning I highlighted is the one fixed by Jacob's recent patch.  It's something of a proof as to how dangerous that cast is.  I'm happy to take a look at others if they can be found (hence I was wondering about Coverity, but it looks like we have to submit a change before we can get the results via Jenkins).

> As you were able to compile osmo-pcu (a C++ app

> that uses linked lists) it can’t be that bad.


Just to recap, I've complied and ran the following without any additional warnings or problems (though I've not ran all the tests, the libosmocore test all pass without the cast):

libdbi-0.9.0 libosmo-abis  libosmo-netif openggsn  osmo-bts  
libdbi-drivers-0.9.0  libosmocore   libosmo-sccp openbsc ortp-0.22.0  osmo-pcu sqlite-autoconf-3080802

There are many packages in the osmocom git - are there any other significant projects which should be checked?  Is there anything significant that Jenkins doesn't cover?

Kind Regards,

Mike
Holger Freyther May 5, 2015, 9:38 a.m. UTC | #8
> On 05 May 2015, at 10:25, Mike McTernan (wavemobile) <mike.mcternan@wavemobile.com> wrote:
> 
>> I still think we should revert the cast and see how C++ code complains 
> 
> Is anyone suggesting the cast should be kept?

no, and I had already pushed a change (outlook mangled your diff so I applied
the change by hand). Thanks for proposing this solution.. i thought we deal with
void* at this point.

http://git.osmocom.org/libosmocore/commit/?id=780bba625d2d09478527ec6038f0f6e15eb6e651
Mike McTernan (wavemobile) May 5, 2015, 9:44 a.m. UTC | #9
> no, and I had already pushed a change

Great!  Thanks, that's one less patch in my quilt :-)

> (outlook mangled your diff so I applied the change by hand). 

Hmm, Outlook is such a pain.

Many Thanks,

Mike
diff mbox

Patch

diff --git a/src/gb/gprs_bssgp.c b/src/gb/gprs_bssgp.c
index 4c93b69..fe4fcca 100644
--- a/src/gb/gprs_bssgp.c
+++ b/src/gb/gprs_bssgp.c
@@ -628,7 +628,7 @@  static int fc_queue_timer_cfg(struct bssgp_flow_control *fc)
 	if (llist_empty(&fc->queue))
 		return 0;
 
-	fcqe = llist_entry(&fc->queue.next, struct bssgp_fc_queue_element,
+	fcqe = llist_entry(fc->queue.next, struct bssgp_fc_queue_element,
 			   list);
 
 	if (fc->bucket_leak_rate != 0) {