Message ID | 20220124103445.2459400-1-amorenoz@redhat.com |
---|---|
Headers | show |
Series | Fix undefined behavior in loop macros | expand |
On 1/24/22 11:34, Adrian Moreno wrote: > When running builds with UBSan, some undefined behavior was detected in the iteration of common data data structures in OVS. > Coincidentally, a bug was reported [1] whose root cause whas another, this time undetected, undefined behavior in the iteration macros. > > From both cases, we conclude that the way we're currently iterating the data structures is prone to errors and UB. This series is an attempt to rewrite those macros in a UB-safe manner. Hi Adrian, Thanks for this! The patchset needs a small rebase, nothing major though. That's also why 0-day robot failed to build. > > The core problem is that #define OBJECT_CONTAINING(POINTER, OBJECT, MEMBER) macro is being used on invalid POINTER values. In some cases we use NULL to compute the end-of-loop condition. In others, we allow it to point to non-contained objects (e.g: a non-contained stack allocated "struct ovs_list" as in [1]). > > In order to systematically solve this in all cases this series introduces a new set of macros that implement a multi-variable loop iteration. They declare a hidden iterator variable inside the loop, use to iterate and evaluate the loop condition and only compute its OBJECT_CONTAINING if it satisfies the loop condition. One consequence of this safety guard is that the pointer provided by the user is set to NULL after the loop (if not exited via "break;"). > It seems that sparse is not too happy about these changes. In the documentation we recommend sparse version 0.5.1 or later. However, sparse complains about various things when run with any version older than 0.6.2 (more specifically after [0]). I think we should just recommend a newer version of sparse and that's it. [0] https://git.kernel.org/pub/scm/devel/sparse/sparse.git/commit/?id=ffb24e18c9b83e5878ee9ca4513deb5de235e15c However, that's not the only issue with sparse. It seems on specific distributions (e.g., on my Fedora 34 test machine) sparse fails to use the right headers. I made it work with this change, although I'm not sure this is the best way of doing things: diff --git a/acinclude.m4 b/acinclude.m4 index 0c360fd1ef73..f704bf36cdfe 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -1424,7 +1424,7 @@ AC_DEFUN([OVS_ENABLE_SPARSE], : ${SPARSE=sparse} AC_SUBST([SPARSE]) AC_CONFIG_COMMANDS_PRE( - [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')']) + [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse -I $(top_srcdir)/include $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')']) AC_ARG_ENABLE( [sparse], --- > Apart from normal iteration, many OVS data structures have a _SAFE version of the loop which require the user to declare an extra variable to hold the next value of the iterator. The _SAFE version of the multi-variable iterators have the extra benefit of not requiring such extra variable. > On relevant data structures, an initial patch rewrites the macros in a backwards compatible manner and a second patch modifies all the callers to remove the unneeded variable. The fist patch would be easy to backport and the second would make code cleaner for the master branch. Although this sounds nice, I'm afraid it creates some issues: a. OVN relies in some places on the fact that it can explicitly access the "next" of the iterator, e.g.: https://github.com/ovn-org/ovn/blob/main/lib/expr.c#L1958 We can fix it but it's not too nice. b. There's currently some inconsistency in the way safe container iterators can be used: - SSET_FOR_EACH_SAFE() still expects the user to pass a "next" variable. - the IDL safe iterator helpers in ovsdb-idlc still require the user to provide a "next" variable: https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-idlc.in#L254 c. The changes in this OVS patchset seem to allow easier backport but that won't really be the case for OVN. I wonder if it makes sense to keep both versions: - the old *_SAFE(.., next, ..) - a new *_SAFE_V2 (or a better name) that drops the need for explicitly supplied "next". What do you think? > > Testing notes: > In order to verify this series removes all the loop-related UB, I've tested it on top of Dumitru's series [2] (without patch 1/11, which can hide some still invalid use of OBJECT_CONTAINING). > I've also verified no extra errors are reported through clang-analyzer. > > Limitations: > The proposed approach benefits code readability, therefore the name of the iterator variable is derived from the name of the object pointer given by the caller. > This means that in an unlikely but still possible case in which a caller wants to nest two loops with the same iterating pointer object, the inner loop iterator variable will hide the one declared in the outer loop. This limitation is easy to spot (the compiler will warn) and easy to work around (just declaring another object pointer variable). I found no such code in the ovs tree. > > Credits: > The idea was discussed in [3] and proposed by Jakub Jelinek <jakub@redhat.com>. > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=2014942 > [2] https://patchwork.ozlabs.org/project/openvswitch/list/?series=277900 > [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964 > > Adrian Moreno (13): > util: add multi-variable loop iterator macros > util: add safe multi-variable iterators > list: use multi-variable helpers for list loops > list: ensure iterator is NULL after pop loop > list: remove the next variable in safe loops > hmap: use multi-variable helpers for hmap loops > hmap: implement UB-safe hmap pop iterator > hmap: remove the next variable in safe loops > cmap: use multi-variable iterators > hindex: use multi-variable iterators > hindex: remove the next variable in safe loops > rculist: use multi-variable helpers for loop macros > vtep: use _SAFE iterator if freing the iterator I did try this patchset out (after quickly translating the OVN callers too) and it seems to work fine. There was one place in OVN where we relied on a specific value of the LIST iterator after a complete iteration but I *think* that was the only case: https://github.com/ovn-org/ovn/blob/main/controller/ofctrl.c#L894 Regards, DUmitru
On 2/9/22 16:23, Dumitru Ceara wrote: > However, that's not the only issue with sparse. It seems on specific > distributions (e.g., on my Fedora 34 test machine) sparse fails to use > the right headers. I made it work with this change, although I'm not > sure this is the best way of doing things: > > diff --git a/acinclude.m4 b/acinclude.m4 > index 0c360fd1ef73..f704bf36cdfe 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -1424,7 +1424,7 @@ AC_DEFUN([OVS_ENABLE_SPARSE], > : ${SPARSE=sparse} > AC_SUBST([SPARSE]) > AC_CONFIG_COMMANDS_PRE( > - [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')']) > + [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse -I $(top_srcdir)/include $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')']) > > AC_ARG_ENABLE( > [sparse], > --- Turns out this was kind of my fault because I had OVS headers installed on the test system . Those headers obviously had the old macro definitions. So we need to make sure we use the ones from the tree instead. Regards, Dumitru
On 2/9/22 16:23, Dumitru Ceara wrote: > On 1/24/22 11:34, Adrian Moreno wrote: >> When running builds with UBSan, some undefined behavior was detected in the iteration of common data data structures in OVS. >> Coincidentally, a bug was reported [1] whose root cause whas another, this time undetected, undefined behavior in the iteration macros. >> >> From both cases, we conclude that the way we're currently iterating the data structures is prone to errors and UB. This series is an attempt to rewrite those macros in a UB-safe manner. > > Hi Adrian, > > Thanks for this! The patchset needs a small rebase, nothing major > though. That's also why 0-day robot failed to build. > >> >> The core problem is that #define OBJECT_CONTAINING(POINTER, OBJECT, MEMBER) macro is being used on invalid POINTER values. In some cases we use NULL to compute the end-of-loop condition. In others, we allow it to point to non-contained objects (e.g: a non-contained stack allocated "struct ovs_list" as in [1]). >> >> In order to systematically solve this in all cases this series introduces a new set of macros that implement a multi-variable loop iteration. They declare a hidden iterator variable inside the loop, use to iterate and evaluate the loop condition and only compute its OBJECT_CONTAINING if it satisfies the loop condition. One consequence of this safety guard is that the pointer provided by the user is set to NULL after the loop (if not exited via "break;"). >> > > It seems that sparse is not too happy about these changes. > > In the documentation we recommend sparse version 0.5.1 or later. > However, sparse complains about various things when run with any version > older than 0.6.2 (more specifically after [0]). I think we should just > recommend a newer version of sparse and that's it. > > [0] https://git.kernel.org/pub/scm/devel/sparse/sparse.git/commit/?id=ffb24e18c9b83e5878ee9ca4513deb5de235e15c > > However, that's not the only issue with sparse. It seems on specific > distributions (e.g., on my Fedora 34 test machine) sparse fails to use > the right headers. I made it work with this change, although I'm not > sure this is the best way of doing things: > > diff --git a/acinclude.m4 b/acinclude.m4 > index 0c360fd1ef73..f704bf36cdfe 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -1424,7 +1424,7 @@ AC_DEFUN([OVS_ENABLE_SPARSE], > : ${SPARSE=sparse} > AC_SUBST([SPARSE]) > AC_CONFIG_COMMANDS_PRE( > - [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')']) > + [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse -I $(top_srcdir)/include $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')']) > > AC_ARG_ENABLE( > [sparse], > --- > Thanks for bisecting sparse and pinpoint the problematic patch. I will add your patch and the recommendation to use sparse > 0.6.2 in the next version of the series. >> Apart from normal iteration, many OVS data structures have a _SAFE version of the loop which require the user to declare an extra variable to hold the next value of the iterator. The _SAFE version of the multi-variable iterators have the extra benefit of not requiring such extra variable. >> On relevant data structures, an initial patch rewrites the macros in a backwards compatible manner and a second patch modifies all the callers to remove the unneeded variable. The fist patch would be easy to backport and the second would make code cleaner for the master branch. > > Although this sounds nice, I'm afraid it creates some issues: > a. OVN relies in some places on the fact that it can explicitly access > the "next" of the iterator, e.g.: > https://github.com/ovn-org/ovn/blob/main/lib/expr.c#L1958 > > We can fix it but it's not too nice. > > b. There's currently some inconsistency in the way safe container > iterators can be used: > - SSET_FOR_EACH_SAFE() still expects the user to pass a "next" > variable. > - the IDL safe iterator helpers in ovsdb-idlc still require the user > to provide a "next" variable: > https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-idlc.in#L254 > Good catch! I'll take care of those two iterators as well and make it homogeneous in the next version. > c. The changes in this OVS patchset seem to allow easier backport but > that won't really be the case for OVN. > > I wonder if it makes sense to keep both versions: > - the old *_SAFE(.., next, ..) > - a new *_SAFE_V2 (or a better name) that drops the need for explicitly > supplied "next". > > What do you think? How about *_SAFE_OLD(...) for the old one and *_SAFE(...) for the new one? :-D I'm just thinking that it would be a way to promote the use of the new and cleaner iterator in new code. On the other hand, moving most of the users to the new macro should be a good example while we keep backwards compatibility. But even if we do keep the old name of the old version of the macro, it should have a new implementation that makes sure the "next" variable is not left pointing to something invalid, so I guess the backwards compatibility is broken anyway. What do you think? >> >> Testing notes: >> In order to verify this series removes all the loop-related UB, I've tested it on top of Dumitru's series [2] (without patch 1/11, which can hide some still invalid use of OBJECT_CONTAINING). >> I've also verified no extra errors are reported through clang-analyzer. >> >> Limitations: >> The proposed approach benefits code readability, therefore the name of the iterator variable is derived from the name of the object pointer given by the caller. >> This means that in an unlikely but still possible case in which a caller wants to nest two loops with the same iterating pointer object, the inner loop iterator variable will hide the one declared in the outer loop. This limitation is easy to spot (the compiler will warn) and easy to work around (just declaring another object pointer variable). I found no such code in the ovs tree. >> >> Credits: >> The idea was discussed in [3] and proposed by Jakub Jelinek <jakub@redhat.com>. >> >> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2014942 >> [2] https://patchwork.ozlabs.org/project/openvswitch/list/?series=277900 >> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964 >> >> Adrian Moreno (13): >> util: add multi-variable loop iterator macros >> util: add safe multi-variable iterators >> list: use multi-variable helpers for list loops >> list: ensure iterator is NULL after pop loop >> list: remove the next variable in safe loops >> hmap: use multi-variable helpers for hmap loops >> hmap: implement UB-safe hmap pop iterator >> hmap: remove the next variable in safe loops >> cmap: use multi-variable iterators >> hindex: use multi-variable iterators >> hindex: remove the next variable in safe loops >> rculist: use multi-variable helpers for loop macros >> vtep: use _SAFE iterator if freing the iterator > > I did try this patchset out (after quickly translating the OVN callers > too) and it seems to work fine. There was one place in OVN where we > relied on a specific value of the LIST iterator after a complete > iteration but I *think* that was the only case: > > https://github.com/ovn-org/ovn/blob/main/controller/ofctrl.c#L894 > > Regards, > DUmitru >
>> c. The changes in this OVS patchset seem to allow easier backport but >> that won't really be the case for OVN. >> >> I wonder if it makes sense to keep both versions: >> - the old *_SAFE(.., next, ..) >> - a new *_SAFE_V2 (or a better name) that drops the need for explicitly >> supplied "next". >> >> What do you think? > > How about *_SAFE_OLD(...) for the old one and *_SAFE(...) for the new one? :-D > I'm just thinking that it would be a way to promote the use of the new and cleaner iterator in new code. > > On the other hand, moving most of the users to the new macro should be a good example while we keep backwards compatibility. But even if we do keep the old name of the old version of the macro, it should have a new implementation that makes sure the "next" variable is not left pointing to something invalid, so I guess the backwards compatibility is broken anyway. What do you think? Maybe we can overload the macro instead? With something like this: https://stackoverflow.com/questions/9183993/msvc-variadic-macro-expansion/24028231#24028231 Best regards, Ilya Maximets.
On 2/9/22 16:55, Ilya Maximets wrote: >>> c. The changes in this OVS patchset seem to allow easier backport but >>> that won't really be the case for OVN. >>> >>> I wonder if it makes sense to keep both versions: >>> - the old *_SAFE(.., next, ..) >>> - a new *_SAFE_V2 (or a better name) that drops the need for explicitly >>> supplied "next". >>> >>> What do you think? >> >> How about *_SAFE_OLD(...) for the old one and *_SAFE(...) for the new one? :-D >> I'm just thinking that it would be a way to promote the use of the new and cleaner iterator in new code. >> >> On the other hand, moving most of the users to the new macro should be a good example while we keep backwards compatibility. But even if we do keep the old name of the old version of the macro, it should have a new implementation that makes sure the "next" variable is not left pointing to something invalid, so I guess the backwards compatibility is broken anyway. What do you think? > > > Maybe we can overload the macro instead? > With something like this: > https://stackoverflow.com/questions/9183993/msvc-variadic-macro-expansion/24028231#24028231 > > Best regards, Ilya Maximets. > That's a nice solution, yes. Thanks.
On 2/9/22 16:46, Dumitru Ceara wrote: > On 2/9/22 16:23, Dumitru Ceara wrote: >> However, that's not the only issue with sparse. It seems on specific >> distributions (e.g., on my Fedora 34 test machine) sparse fails to use >> the right headers. I made it work with this change, although I'm not >> sure this is the best way of doing things: >> >> diff --git a/acinclude.m4 b/acinclude.m4 >> index 0c360fd1ef73..f704bf36cdfe 100644 >> --- a/acinclude.m4 >> +++ b/acinclude.m4 >> @@ -1424,7 +1424,7 @@ AC_DEFUN([OVS_ENABLE_SPARSE], >> : ${SPARSE=sparse} >> AC_SUBST([SPARSE]) >> AC_CONFIG_COMMANDS_PRE( >> - [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')']) >> + [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse -I $(top_srcdir)/include $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')']) >> >> AC_ARG_ENABLE( >> [sparse], >> --- > > Turns out this was kind of my fault because I had OVS headers installed > on the test system . Those headers obviously had the old macro > definitions. So we need to make sure we use the ones from the tree instead. > Do you mean we don't need to bump the sparse version, only ensure the tree headers are included? Thanks
On 2/11/22 12:36, Adrian Moreno wrote: > > > On 2/9/22 16:46, Dumitru Ceara wrote: >> On 2/9/22 16:23, Dumitru Ceara wrote: >>> However, that's not the only issue with sparse. It seems on specific >>> distributions (e.g., on my Fedora 34 test machine) sparse fails to use >>> the right headers. I made it work with this change, although I'm not >>> sure this is the best way of doing things: >>> >>> diff --git a/acinclude.m4 b/acinclude.m4 >>> index 0c360fd1ef73..f704bf36cdfe 100644 >>> --- a/acinclude.m4 >>> +++ b/acinclude.m4 >>> @@ -1424,7 +1424,7 @@ AC_DEFUN([OVS_ENABLE_SPARSE], >>> : ${SPARSE=sparse} >>> AC_SUBST([SPARSE]) >>> AC_CONFIG_COMMANDS_PRE( >>> - [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) >>> $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse $(SPARSEFLAGS) >>> $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')']) >>> + [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) >>> $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse -I >>> $(top_srcdir)/include $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc >>> $(CGCCFLAGS),'"$CC"')']) >>> AC_ARG_ENABLE( >>> [sparse], >>> --- >> >> Turns out this was kind of my fault because I had OVS headers installed >> on the test system . Those headers obviously had the old macro >> definitions. So we need to make sure we use the ones from the tree >> instead. >> > > Do you mean we don't need to bump the sparse version, only ensure the > tree headers are included? We need both, sorry for the confusion.