Message ID | 1430414883-25640-1-git-send-email-jerlbeck@sysmocom.de |
---|---|
State | Accepted |
Headers | show |
> On 30 Apr 2015, at 19:28, Jacob Erlbeck <jerlbeck@sysmocom.de> wrote: > > Currently the DL sometimes hangs and sometimes a lot of messages > (still not able to send PDU) are logged. This is caused by an invalid > timer delay computation, setting msecs either to 0 or to some big value. > > This is due to an '&' operator at the wrong place, accessing some > parts in fc instead of the first element of the list. ouch. it appears that this is the only occurrence of this issue. How long did it take for you to find it? Did ASAN help?
On 30.04.2015 20:01, Holger Freyther wrote: > >> On 30 Apr 2015, at 19:28, Jacob Erlbeck <jerlbeck@sysmocom.de> wrote: >> This is due to an '&' operator at the wrong place, accessing some >> parts in fc instead of the first element of the list. > > ouch. it appears that this is the only occurrence of this issue. How long did > it take for you to find it? Did ASAN help? > No, since the data seems to be taken from then fc struct, so the memory area is valid. ASAN was enabled and didn't complain. Logging helped instead, since the logging messages suggested, that the timer interval computation might have been broken. Jacob
> On 30.04.2015 20:01, Holger Freyther wrote: > > > >> On 30 Apr 2015, at 19:28, Jacob Erlbeck <jerlbeck@sysmocom.de> wrote: > >> This is due to an '&' operator at the wrong place, accessing some > >> parts in fc instead of the first element of the list. Yikes! I think the definition of container_of() shouldn't cast ptr: #define container_of(ptr, type, member) ({ \ - const typeof( ((type *)0)->member ) *__mptr = (typeof( ((type *)0)->member ) *)(ptr); \ + const typeof( ((type *)0)->member ) *__mptr = (ptr); \ (type *)( (char *)__mptr - offsetof(type, member) );}) Signed-off-by: Michael McTernan <mike.mcternan@wavemobile.com> Then we get one nice warning from gcc: gprs_bssgp.c: In function 'fc_queue_timer_cfg': gprs_bssgp.c:631:9: warning: initialization from incompatible pointer type [enabled by default] Everything I've tried* still compiles and runs with this change (and generates no other warnings), but obviously libosmocore could be used somewhere else where container_of type abuse will now make warnings. If such a case is found where the cast is correct and required, I'd recommend the casting be placed at the calls to container_of() where it is more visible and auditable. Please try the above patch and consider applying it to libosmocore. Jacob, would you also like/be able to check that the above change is good with Coverity, incase that can dig out any other bugs? Kind Regards, Mike * I build the following into my system, some of which use libosmocore: libdbi-0.9.0 libdbi-drivers-0.9.0 libosmo-abis libosmocore libosmo-netif libosmo-sccp openbsc openggsn ortp-0.22.0 osmo-bts osmo-pcu
> On 01 May 2015, at 12:23, Mike McTernan (wavemobile) <mike.mcternan@wavemobile.com> wrote: > > > I think the definition of container_of() shouldn't cast ptr: > > #define container_of(ptr, type, member) ({ \ > - const typeof( ((type *)0)->member ) *__mptr = (typeof( ((type *)0)->member ) *)(ptr); \ > + const typeof( ((type *)0)->member ) *__mptr = (ptr); \ > (type *)( (char *)__mptr - offsetof(type, member) );}) good point. I thought we have a void* in our hands but we do have the llist_head here. The container_of macro in linux 2.6.12-rc2 (initial git commit) didn’t cast either so we should definitely take your commit. > Jacob, would you also like/be able to check that the above change is good with Coverity, incase that can dig out any other bugs? Our jenkins only submits HEAD as build to coverity. We don’t have a try server or similar
Hi Holger, > good point. I thought we have a void* in our hands but we do have the > llist_head here. The container_of macro in linux 2.6.12-rc2 (initial git > commit) didn’t cast either so we should definitely take your commit. Great! I subsequently checked too, and couldn't find a version of Linux with the cast. The other change to perhaps consider is what happens when the passed ptr is null. In my projects I tend to use a version of containerof that will return NULL in that case. This is both convenient for NULL checks and ensures we don't end up with a pointer to the top of memory (which will _probably_ segfault so would have been detected). > Our jenkins only submits HEAD as build to coverity. Cool! I just found the Jenkins pages... that's nice to know, thanks. Kind Regards, Mike
> On 01 May 2015, at 12:23, Mike McTernan (wavemobile) <mike.mcternan@wavemobile.com> wrote: > > Yikes! > > I think the definition of container_of() shouldn't cast ptr: > > #define container_of(ptr, type, member) ({ \ > - const typeof( ((type *)0)->member ) *__mptr = (typeof( ((type *)0)->member ) *)(ptr); \ > + const typeof( ((type *)0)->member ) *__mptr = (ptr); \ > (type *)( (char *)__mptr - offsetof(type, member) );}) In OpenBSC the cast was added in be68f6fc6cde1367a4481d2e774a64e2cd657267 Change the variable "new" to "_new" in order to include it from C++ code. The define "container_of" will cast pointer before assigning. Compilers with stricter options require this. (Andreas Eversberg) I still think we should revert the cast and see how C++ code complains and then look at the compiler warning. As you were able to compile osmo-pcu (a C++ app that uses linked lists) it can’t be that bad.
> I still think we should revert the cast and see how C++ code complains Is anyone suggesting the cast should be kept? In my view it would be a kind of madness to keep the cast - C is weakly typed as it is, so removing the little type checking we can get from the compiler is foolhardy and will just create subtle timewasting bugs. C++ has better typing, generics and true OO, so I don't see why the cast should be present there, or that container_of should be particularly useful when better type-safety can be achieved. > and then look at the compiler warning. The particular warning I highlighted is the one fixed by Jacob's recent patch. It's something of a proof as to how dangerous that cast is. I'm happy to take a look at others if they can be found (hence I was wondering about Coverity, but it looks like we have to submit a change before we can get the results via Jenkins). > As you were able to compile osmo-pcu (a C++ app > that uses linked lists) it can’t be that bad. Just to recap, I've complied and ran the following without any additional warnings or problems (though I've not ran all the tests, the libosmocore test all pass without the cast): libdbi-0.9.0 libosmo-abis libosmo-netif openggsn osmo-bts libdbi-drivers-0.9.0 libosmocore libosmo-sccp openbsc ortp-0.22.0 osmo-pcu sqlite-autoconf-3080802 There are many packages in the osmocom git - are there any other significant projects which should be checked? Is there anything significant that Jenkins doesn't cover? Kind Regards, Mike
> On 05 May 2015, at 10:25, Mike McTernan (wavemobile) <mike.mcternan@wavemobile.com> wrote: > >> I still think we should revert the cast and see how C++ code complains > > Is anyone suggesting the cast should be kept? no, and I had already pushed a change (outlook mangled your diff so I applied the change by hand). Thanks for proposing this solution.. i thought we deal with void* at this point. http://git.osmocom.org/libosmocore/commit/?id=780bba625d2d09478527ec6038f0f6e15eb6e651
> no, and I had already pushed a change Great! Thanks, that's one less patch in my quilt :-) > (outlook mangled your diff so I applied the change by hand). Hmm, Outlook is such a pain. Many Thanks, Mike
diff --git a/src/gb/gprs_bssgp.c b/src/gb/gprs_bssgp.c index 4c93b69..fe4fcca 100644 --- a/src/gb/gprs_bssgp.c +++ b/src/gb/gprs_bssgp.c @@ -628,7 +628,7 @@ static int fc_queue_timer_cfg(struct bssgp_flow_control *fc) if (llist_empty(&fc->queue)) return 0; - fcqe = llist_entry(&fc->queue.next, struct bssgp_fc_queue_element, + fcqe = llist_entry(fc->queue.next, struct bssgp_fc_queue_element, list); if (fc->bucket_leak_rate != 0) {