Message ID | 8dcee609af0015c15d2a99ab6d076a65d6ec01a8.1414586968.git.mleitner@redhat.com |
---|---|
State | Changes Requested |
Delegated to: | Pablo Neira |
Headers | show |
Hi Marcelo, On Wed, Oct 29, 2014 at 10:51:14AM -0200, Marcelo Ricardo Leitner wrote: > Currently, we don't check if __build_packet_message fails or not and > that leaves it prone to sending incomplete messages to userspace. > > This commit fixes it by canceling the new message if there was any issue > while building it and makes sure the skb is freed if it was the only > message in it. > > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> > --- > net/netfilter/nfnetlink_log.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c > index b1e3a05794169283ed50d1c0fb4f44d9e7753eeb..045c5956611f61a3f5dad3e15ca7cf19365e5afa 100644 > --- a/net/netfilter/nfnetlink_log.c > +++ b/net/netfilter/nfnetlink_log.c > @@ -568,10 +568,8 @@ __build_packet_message(struct nfnl_log_net *log, > struct nlattr *nla; > int size = nla_attr_size(data_len); > > - if (skb_tailroom(inst->skb) < nla_total_size(data_len)) { > - printk(KERN_WARNING "nfnetlink_log: no tailroom!\n"); > - return -1; > - } > + if (skb_tailroom(inst->skb) < nla_total_size(data_len)) > + goto nla_put_failure; > > nla = (struct nlattr *)skb_put(inst->skb, nla_total_size(data_len)); > nla->nla_type = NFULA_PAYLOAD; > @@ -586,6 +584,7 @@ __build_packet_message(struct nfnl_log_net *log, > > nla_put_failure: > PRINTR(KERN_ERR "nfnetlink_log: error creating log nlmsg\n"); > + nlmsg_cancel(skb, nlh); > return -1; > } > > @@ -708,8 +707,9 @@ nfulnl_log_packet(struct net *net, > > inst->qlen++; > > - __build_packet_message(log, inst, skb, data_len, pf, > - hooknum, in, out, prefix, plen); > + if (__build_packet_message(log, inst, skb, data_len, pf, > + hooknum, in, out, prefix, plen)) > + goto build_failure; > > if (inst->qlen >= qthreshold) > __nfulnl_flush(inst); > @@ -726,6 +726,15 @@ unlock_and_release: > instance_put(inst); > return; > > +build_failure: > + /* If no other messages in it, we're good to free it. */ > + if (!inst->skb->len) { > + kfree_skb(inst->skb); > + inst->skb = NULL; > + } > + > + inst->qlen--; For each message that we put into the batch, we increase inst->qlen in one, so I think this decrement isn't enough to leave things in consistent state. If we at least have one message already in the batch, I think it's good to give it a try to deliver it to userspace: if (inst->skb) __nfulnl_flush(inst); Then, the WARN_ON that Florian added recently should catch that we have size miscalculations from __nfulnl_send(). -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Pablo, On 04-11-2014 14:47, Pablo Neira Ayuso wrote: > Hi Marcelo, > > On Wed, Oct 29, 2014 at 10:51:14AM -0200, Marcelo Ricardo Leitner wrote: >> Currently, we don't check if __build_packet_message fails or not and >> that leaves it prone to sending incomplete messages to userspace. >> >> This commit fixes it by canceling the new message if there was any issue >> while building it and makes sure the skb is freed if it was the only >> message in it. >> >> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> >> --- >> net/netfilter/nfnetlink_log.c | 21 +++++++++++++++------ >> 1 file changed, 15 insertions(+), 6 deletions(-) >> >> diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c >> index b1e3a05794169283ed50d1c0fb4f44d9e7753eeb..045c5956611f61a3f5dad3e15ca7cf19365e5afa 100644 >> --- a/net/netfilter/nfnetlink_log.c >> +++ b/net/netfilter/nfnetlink_log.c >> @@ -568,10 +568,8 @@ __build_packet_message(struct nfnl_log_net *log, >> struct nlattr *nla; >> int size = nla_attr_size(data_len); >> >> - if (skb_tailroom(inst->skb) < nla_total_size(data_len)) { >> - printk(KERN_WARNING "nfnetlink_log: no tailroom!\n"); >> - return -1; >> - } >> + if (skb_tailroom(inst->skb) < nla_total_size(data_len)) >> + goto nla_put_failure; >> >> nla = (struct nlattr *)skb_put(inst->skb, nla_total_size(data_len)); >> nla->nla_type = NFULA_PAYLOAD; >> @@ -586,6 +584,7 @@ __build_packet_message(struct nfnl_log_net *log, >> >> nla_put_failure: >> PRINTR(KERN_ERR "nfnetlink_log: error creating log nlmsg\n"); >> + nlmsg_cancel(skb, nlh); >> return -1; >> } >> >> @@ -708,8 +707,9 @@ nfulnl_log_packet(struct net *net, >> >> inst->qlen++; >> >> - __build_packet_message(log, inst, skb, data_len, pf, >> - hooknum, in, out, prefix, plen); >> + if (__build_packet_message(log, inst, skb, data_len, pf, >> + hooknum, in, out, prefix, plen)) >> + goto build_failure; >> >> if (inst->qlen >= qthreshold) >> __nfulnl_flush(inst); >> @@ -726,6 +726,15 @@ unlock_and_release: >> instance_put(inst); >> return; >> >> +build_failure: >> + /* If no other messages in it, we're good to free it. */ >> + if (!inst->skb->len) { >> + kfree_skb(inst->skb); >> + inst->skb = NULL; >> + } >> + >> + inst->qlen--; > > For each message that we put into the batch, we increase inst->qlen in > one, so I think this decrement isn't enough to leave things in > consistent state. If we at least have one message already in the > batch, I think it's good to give it a try to deliver it to userspace: > > if (inst->skb) > __nfulnl_flush(inst); The idea was to undo just the last attempt and leave the older ones where they were... Currently we: - allocate skb, if needed - increment inst->qlen - call __build_packet_message - if (inst->qlen >= qthreshold) we flush.. Considering __build_packet_message now will call nlmsg_cancel and undo all its work on fails, I thought on just dec'ing inst->qlen and releasing the skbuff, if it's empty. I see your point on attempting the flush, good idea. Otherwise we could hit a situation that it would be stuck on undoing the last message and this situation would be fixed only after inst->timer expires. It could be like: build_failure: inst->qlen--; /* If no other messages in it, we're good to free it. */ if (!inst->skb->len) { kfree_skb(inst->skb); inst->skb = NULL; } else { __nfulnl_flush(inst); } What do you think? > Then, the WARN_ON that Florian added recently should catch that we > have size miscalculations from __nfulnl_send(). Good point. It may not catch it always. If the failed message was bigger than the DONE message, it may go on unnoticed. Actually, even if it warns, we would have no idea that a message was dropped a bit earlier, as the stats aren't there yet. Maybe another WARN_ONCE on build_failure label? Thanks, Marcelo -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 04, 2014 at 04:01:54PM -0200, Marcelo Ricardo Leitner wrote: > >>@@ -708,8 +707,9 @@ nfulnl_log_packet(struct net *net, > >> > >> inst->qlen++; > >> > >>- __build_packet_message(log, inst, skb, data_len, pf, > >>- hooknum, in, out, prefix, plen); > >>+ if (__build_packet_message(log, inst, skb, data_len, pf, > >>+ hooknum, in, out, prefix, plen)) > >>+ goto build_failure; > >> > >> if (inst->qlen >= qthreshold) > >> __nfulnl_flush(inst); > >>@@ -726,6 +726,15 @@ unlock_and_release: > >> instance_put(inst); > >> return; > >> > >>+build_failure: > >>+ /* If no other messages in it, we're good to free it. */ > >>+ if (!inst->skb->len) { > >>+ kfree_skb(inst->skb); > >>+ inst->skb = NULL; > >>+ } > >>+ > >>+ inst->qlen--; > > > >For each message that we put into the batch, we increase inst->qlen in > >one, so I think this decrement isn't enough to leave things in > >consistent state. If we at least have one message already in the > >batch, I think it's good to give it a try to deliver it to userspace: > > > > if (inst->skb) > > __nfulnl_flush(inst); > > The idea was to undo just the last attempt and leave the older ones > where they were... > > Currently we: > - allocate skb, if needed > - increment inst->qlen > - call __build_packet_message > - if (inst->qlen >= qthreshold) > we flush.. > > Considering __build_packet_message now will call nlmsg_cancel and > undo all its work on fails, I thought on just dec'ing inst->qlen and > releasing the skbuff, if it's empty. > > I see your point on attempting the flush, good idea. Otherwise we > could hit a situation that it would be stuck on undoing the last > message and this situation would be fixed only after inst->timer > expires. > > It could be like: > build_failure: > inst->qlen--; We can inst->qlen++ after build_packet_message, so you can skip this line above. > /* If no other messages in it, we're good to free it. */ > if (!inst->skb->len) { Now I'd suggest: if (inst->qlen == 0) { > kfree_skb(inst->skb); > inst->skb = NULL; > } > else { > __nfulnl_flush(inst); > } > > What do you think? > > >Then, the WARN_ON that Florian added recently should catch that we > >have size miscalculations from __nfulnl_send(). > > Good point. It may not catch it always. If the failed message was > bigger than the DONE message, it may go on unnoticed. Actually, even > if it warns, we would have no idea that a message was dropped a bit > earlier, as the stats aren't there yet. Maybe another WARN_ONCE on > build_failure label? That will catch possible miscalculations too, I don't like we're polluting this with many WARN_ONCE, but I don't see any better way to catch this size miscalculation bugs, so ahead add it. BTW, we should also signal the userspace when we fail to build the message via: nfnetlink_set_err(net, 0, group, -ENOBUFS); so it knows that we're losing log messages for whatever reason. Basically, userspace hits -ENOBUFS when calling recv(), which means netlink is losing messages. I don't think we really need the statistics. I can also see this line: nlh->nlmsg_len = inst->skb->tail - old_tail; that can be replaced by nlmsg_end(skb, nlh); It seems this code needs some care. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04-11-2014 16:26, Pablo Neira Ayuso wrote: > On Tue, Nov 04, 2014 at 04:01:54PM -0200, Marcelo Ricardo Leitner wrote: >>>> @@ -708,8 +707,9 @@ nfulnl_log_packet(struct net *net, >>>> >>>> inst->qlen++; >>>> >>>> - __build_packet_message(log, inst, skb, data_len, pf, >>>> - hooknum, in, out, prefix, plen); >>>> + if (__build_packet_message(log, inst, skb, data_len, pf, >>>> + hooknum, in, out, prefix, plen)) >>>> + goto build_failure; >>>> >>>> if (inst->qlen >= qthreshold) >>>> __nfulnl_flush(inst); >>>> @@ -726,6 +726,15 @@ unlock_and_release: >>>> instance_put(inst); >>>> return; >>>> >>>> +build_failure: >>>> + /* If no other messages in it, we're good to free it. */ >>>> + if (!inst->skb->len) { >>>> + kfree_skb(inst->skb); >>>> + inst->skb = NULL; >>>> + } >>>> + >>>> + inst->qlen--; >>> >>> For each message that we put into the batch, we increase inst->qlen in >>> one, so I think this decrement isn't enough to leave things in >>> consistent state. If we at least have one message already in the >>> batch, I think it's good to give it a try to deliver it to userspace: >>> >>> if (inst->skb) >>> __nfulnl_flush(inst); >> >> The idea was to undo just the last attempt and leave the older ones >> where they were... >> >> Currently we: >> - allocate skb, if needed >> - increment inst->qlen >> - call __build_packet_message >> - if (inst->qlen >= qthreshold) >> we flush.. >> >> Considering __build_packet_message now will call nlmsg_cancel and >> undo all its work on fails, I thought on just dec'ing inst->qlen and >> releasing the skbuff, if it's empty. >> >> I see your point on attempting the flush, good idea. Otherwise we >> could hit a situation that it would be stuck on undoing the last >> message and this situation would be fixed only after inst->timer >> expires. >> >> It could be like: >> build_failure: >> inst->qlen--; > > We can inst->qlen++ after build_packet_message, so you can skip this > line above. Ack, okay > >> /* If no other messages in it, we're good to free it. */ >> if (!inst->skb->len) { > > Now I'd suggest: > > if (inst->qlen == 0) { ack > >> kfree_skb(inst->skb); >> inst->skb = NULL; >> } >> else { >> __nfulnl_flush(inst); >> } >> >> What do you think? >> >>> Then, the WARN_ON that Florian added recently should catch that we >>> have size miscalculations from __nfulnl_send(). >> >> Good point. It may not catch it always. If the failed message was >> bigger than the DONE message, it may go on unnoticed. Actually, even >> if it warns, we would have no idea that a message was dropped a bit >> earlier, as the stats aren't there yet. Maybe another WARN_ONCE on >> build_failure label? > > That will catch possible miscalculations too, I don't like we're > polluting this with many WARN_ONCE, but I don't see any better way to > catch this size miscalculation bugs, so ahead add it. > > BTW, we should also signal the userspace when we fail to build the > message via: > > nfnetlink_set_err(net, 0, group, -ENOBUFS); > > so it knows that we're losing log messages for whatever reason. > Basically, userspace hits -ENOBUFS when calling recv(), which means > netlink is losing messages. I don't think we really need the > statistics. Cool. Then we probably don't need the WARN_ON unless we really want those numbers, or even so. As we will be calling nfnetlink_set_err, we have other ways of debugging. Wouldn't be as ready as a WARN_ONCE in there, yes.. but a systemtap script can easily get a hold in there. So no WARN_ONCE, just the nfnetlink_set_err() call? Like this? > if (inst->qlen == 0) { >> kfree_skb(inst->skb); >> inst->skb = NULL; >> } >> else { >> __nfulnl_flush(inst); >> } nfnetlink_set_err(net, 0, group, -ENOBUFS); So we send whatever we had, and signal the failure.. > > I can also see this line: > > nlh->nlmsg_len = inst->skb->tail - old_tail; > > that can be replaced by nlmsg_end(skb, nlh); > > It seems this code needs some care. Agreed. I'll try :) Thanks, Marcelo -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 04, 2014 at 04:52:31PM -0200, Marcelo Ricardo Leitner wrote: > >That will catch possible miscalculations too, I don't like we're > >polluting this with many WARN_ONCE, but I don't see any better way to > >catch this size miscalculation bugs, so ahead add it. > > > >BTW, we should also signal the userspace when we fail to build the > >message via: > > > >nfnetlink_set_err(net, 0, group, -ENOBUFS); > > > >so it knows that we're losing log messages for whatever reason. > >Basically, userspace hits -ENOBUFS when calling recv(), which means > >netlink is losing messages. I don't think we really need the > >statistics. > > Cool. Then we probably don't need the WARN_ON unless we really want > those numbers, or even so. As we will be calling nfnetlink_set_err, > we have other ways of debugging. Wouldn't be as ready as a WARN_ONCE > in there, yes.. but a systemtap script can easily get a hold in > there. > > So no WARN_ONCE, just the nfnetlink_set_err() call? Like this? The ENOBUFS notice won't be enough. This can be triggered if the rate of log message is too high, or the userspace process is too slow to empty the socket queue. I think we need both WARN_ONCE (tells us that miscalculation is happening, ie. we have a bug) and the nfnetlink_set_err (tells userspace it's losing log messages, so logging becomes unreliable/approximative). -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04-11-2014 17:04, Pablo Neira Ayuso wrote: > On Tue, Nov 04, 2014 at 04:52:31PM -0200, Marcelo Ricardo Leitner wrote: >>> That will catch possible miscalculations too, I don't like we're >>> polluting this with many WARN_ONCE, but I don't see any better way to >>> catch this size miscalculation bugs, so ahead add it. >>> >>> BTW, we should also signal the userspace when we fail to build the >>> message via: >>> >>> nfnetlink_set_err(net, 0, group, -ENOBUFS); >>> >>> so it knows that we're losing log messages for whatever reason. >>> Basically, userspace hits -ENOBUFS when calling recv(), which means >>> netlink is losing messages. I don't think we really need the >>> statistics. >> >> Cool. Then we probably don't need the WARN_ON unless we really want >> those numbers, or even so. As we will be calling nfnetlink_set_err, >> we have other ways of debugging. Wouldn't be as ready as a WARN_ONCE >> in there, yes.. but a systemtap script can easily get a hold in >> there. >> >> So no WARN_ONCE, just the nfnetlink_set_err() call? Like this? > > The ENOBUFS notice won't be enough. This can be triggered if the rate > of log message is too high, or the userspace process is too slow to > empty the socket queue. > > I think we need both WARN_ONCE (tells us that miscalculation is > happening, ie. we have a bug) and the nfnetlink_set_err (tells > userspace it's losing log messages, so logging becomes > unreliable/approximative). yeah.. okay, will put it in too. Thanks, Marcelo -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > BTW, we should also signal the userspace when we fail to build the > message via: > > nfnetlink_set_err(net, 0, group, -ENOBUFS); > > so it knows that we're losing log messages for whatever reason. > Basically, userspace hits -ENOBUFS when calling recv(), which means > netlink is losing messages. I don't think we really need the > statistics. Not sure if this is a good idea. a) __build_packet_message must never fail. If it does, the kernel has a size accoutning bug somewhere. b) I see no meaningful way for userspace to handle this error; there is nothing it can do about it. c) If it happens, it might be that some userspace logging daemon suddently dies because it sees an unexpected 'fatal' error. > It seems this code needs some care. Agreed... -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 04, 2014 at 08:11:20PM +0100, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > BTW, we should also signal the userspace when we fail to build the > > message via: > > > > nfnetlink_set_err(net, 0, group, -ENOBUFS); > > > > so it knows that we're losing log messages for whatever reason. > > Basically, userspace hits -ENOBUFS when calling recv(), which means > > netlink is losing messages. I don't think we really need the > > statistics. > > Not sure if this is a good idea. > > a) __build_packet_message must never fail. > If it does, the kernel has a size accoutning bug somewhere. > b) I see no meaningful way for userspace to handle this error; > there is nothing it can do about it. > c) If it happens, it might be that some userspace logging daemon > suddently dies because it sees an unexpected 'fatal' error. userspace should be handling -ENOBUFS already, netlink reports this if the buffer overruns. Although there's nothing userspace can do to recover lost messages, this reports that the logging became unreliable. People that don't mind about this can disable it via NETLINK_NO_ENOBUFS socket option. The new nfnetlink_set_err() will also report skb allocation failures to userspace, which is missing. I would remove those printk there to report OOM, there's nothing userspace can do with that. For the unlikely size miscalculation case, this probably will make it more evident to userspace that we have a bug since userspace will receive very frequent ENOBUFS report, even under very low netlink traffic. I would also add a WARN_ONCE there as you added if __build_packet_message returns an error, so we have two ways to rise alarms. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Tue, Nov 04, 2014 at 08:11:20PM +0100, Florian Westphal wrote: > > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > BTW, we should also signal the userspace when we fail to build the > > > message via: > > > > > > nfnetlink_set_err(net, 0, group, -ENOBUFS); > > > > > > so it knows that we're losing log messages for whatever reason. > > > Basically, userspace hits -ENOBUFS when calling recv(), which means > > > netlink is losing messages. I don't think we really need the > > > statistics. > > > > Not sure if this is a good idea. > > > > a) __build_packet_message must never fail. > > If it does, the kernel has a size accoutning bug somewhere. > > b) I see no meaningful way for userspace to handle this error; > > there is nothing it can do about it. > > c) If it happens, it might be that some userspace logging daemon > > suddently dies because it sees an unexpected 'fatal' error. > > userspace should be handling -ENOBUFS already, netlink reports this if > the buffer overruns. You're right. > I would remove those printk there to > report OOM, there's nothing userspace can do with that. Agreed. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c index b1e3a05794169283ed50d1c0fb4f44d9e7753eeb..045c5956611f61a3f5dad3e15ca7cf19365e5afa 100644 --- a/net/netfilter/nfnetlink_log.c +++ b/net/netfilter/nfnetlink_log.c @@ -568,10 +568,8 @@ __build_packet_message(struct nfnl_log_net *log, struct nlattr *nla; int size = nla_attr_size(data_len); - if (skb_tailroom(inst->skb) < nla_total_size(data_len)) { - printk(KERN_WARNING "nfnetlink_log: no tailroom!\n"); - return -1; - } + if (skb_tailroom(inst->skb) < nla_total_size(data_len)) + goto nla_put_failure; nla = (struct nlattr *)skb_put(inst->skb, nla_total_size(data_len)); nla->nla_type = NFULA_PAYLOAD; @@ -586,6 +584,7 @@ __build_packet_message(struct nfnl_log_net *log, nla_put_failure: PRINTR(KERN_ERR "nfnetlink_log: error creating log nlmsg\n"); + nlmsg_cancel(skb, nlh); return -1; } @@ -708,8 +707,9 @@ nfulnl_log_packet(struct net *net, inst->qlen++; - __build_packet_message(log, inst, skb, data_len, pf, - hooknum, in, out, prefix, plen); + if (__build_packet_message(log, inst, skb, data_len, pf, + hooknum, in, out, prefix, plen)) + goto build_failure; if (inst->qlen >= qthreshold) __nfulnl_flush(inst); @@ -726,6 +726,15 @@ unlock_and_release: instance_put(inst); return; +build_failure: + /* If no other messages in it, we're good to free it. */ + if (!inst->skb->len) { + kfree_skb(inst->skb); + inst->skb = NULL; + } + + inst->qlen--; + alloc_failure: /* FIXME: statistics */ goto unlock_and_release;
Currently, we don't check if __build_packet_message fails or not and that leaves it prone to sending incomplete messages to userspace. This commit fixes it by canceling the new message if there was any issue while building it and makes sure the skb is freed if it was the only message in it. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> --- net/netfilter/nfnetlink_log.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)