Message ID | 4e0db5bc0907130939k48b16256j8f60c786a7e5e44c@mail.gmail.com |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
Gautam Kachroo wrote: > use a new netlink socket when sending flush messages to avoid reading > any pending data on the existing netlink socket. > > read all of the response from the netlink request -- this response can > be split over multiple recv calls, pretty much one per netlink request > message. ENOENT errors, which correspond to attempts to delete an > already deleted entry, are ignored. Other errors are not ignored. In which case would there be any pending data? From what I can see, this can only happen when using batching, but in that case the previous command should continue reading until it has received all responses (which the netlink functions appear to be doing properly). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 14, 2009 at 2:38 AM, Patrick McHardy<kaber@trash.net> wrote: > Gautam Kachroo wrote: >> use a new netlink socket when sending flush messages to avoid reading >> any pending data on the existing netlink socket. >> >> read all of the response from the netlink request -- this response can >> be split over multiple recv calls, pretty much one per netlink request >> message. ENOENT errors, which correspond to attempts to delete an >> already deleted entry, are ignored. Other errors are not ignored. > > In which case would there be any pending data? From what I can see, > this can only happen when using batching, but in that case the > previous command should continue reading until it has received all > responses (which the netlink functions appear to be doing properly). What is the "previous command"? Are you referring to rtnl_dump_filter? If rtnl_send_check comes across a failure, rtnl_dump_filter will not continue reading. Here's the situation that I'm referring to: If rtnl_send_check detects an error, it returns -1. rtnl_send_check is called from flush_update. The multiple implementations of flush_update (e.g. in ipneigh.c, ipaddress.c) propagate this return value to their caller, e.g. print_neigh or print_addrinfo. print_neigh, print_addrinfo, etc. are called from rtnl_dump_filter. rtnl_dump_filter sits in a loop calling recvmsg on the netlink socket. However, it returns the error value if the filter function (e.g. print_neigh) returns an error. In this case, rtnl_dump_filter can return before it's read all the responses. The error return from rtnl_dump_filter causes the program to exit. Note, rtnl_send_check treats any pending data as an error. It could be changed to only complain if the response contains NLMSG_ERROR messages, but that delays error detection because it will read all the responses to the dump request first. thanks, -gk -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Gautam Kachroo wrote: > On Tue, Jul 14, 2009 at 2:38 AM, Patrick McHardy<kaber@trash.net> wrote: >> Gautam Kachroo wrote: >>> use a new netlink socket when sending flush messages to avoid reading >>> any pending data on the existing netlink socket. >>> >>> read all of the response from the netlink request -- this response can >>> be split over multiple recv calls, pretty much one per netlink request >>> message. ENOENT errors, which correspond to attempts to delete an >>> already deleted entry, are ignored. Other errors are not ignored. >> >> In which case would there be any pending data? From what I can see, >> this can only happen when using batching, but in that case the >> previous command should continue reading until it has received all >> responses (which the netlink functions appear to be doing properly). > > What is the "previous command"? The last command before the one executing when using batching. > Are you referring to rtnl_dump_filter? If rtnl_send_check comes across > a failure, rtnl_dump_filter will not continue reading. > > Here's the situation that I'm referring to: > > If rtnl_send_check detects an error, it returns -1. rtnl_send_check is > called from flush_update. The multiple implementations of flush_update > (e.g. in ipneigh.c, ipaddress.c) propagate this return value to their > caller, e.g. print_neigh or print_addrinfo. > > print_neigh, print_addrinfo, etc. are called from rtnl_dump_filter. > rtnl_dump_filter sits in a loop calling recvmsg on the netlink socket. > However, it returns the error value if the filter function (e.g. > print_neigh) returns an error. In this case, rtnl_dump_filter can > return before it's read all the responses. > The error return from rtnl_dump_filter causes the program to exit. Yes, and I agree with your patch so far. My question is why you need another socket. > use a new netlink socket when sending flush messages to avoid reading > any pending data on the existing netlink socket. Under what circumstances would there be pending data when performing a new iproute operation? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2009 at 8:19 AM, Patrick McHardy<kaber@trash.net> wrote: > Gautam Kachroo wrote: >> On Tue, Jul 14, 2009 at 2:38 AM, Patrick McHardy<kaber@trash.net> wrote: >>> Gautam Kachroo wrote: >>>> use a new netlink socket when sending flush messages to avoid reading >>>> any pending data on the existing netlink socket. >>>> >>>> read all of the response from the netlink request -- this response can >>>> be split over multiple recv calls, pretty much one per netlink request >>>> message. ENOENT errors, which correspond to attempts to delete an >>>> already deleted entry, are ignored. Other errors are not ignored. >>> >>> In which case would there be any pending data? From what I can see, >>> this can only happen when using batching, but in that case the >>> previous command should continue reading until it has received all >>> responses (which the netlink functions appear to be doing properly). >> >> What is the "previous command"? > > The last command before the one executing when using batching. This is independent of batching (I assume you're referring to the -batch option to the ip command). It happens when running a command like "ip neigh flush to 0.0.0.0/0" if there are many neighbor entries. The implementation of flush commands, e.g. ip neigh flush, sends a dump request, e.g. RTM_GETNEIGH, and then sends requests, e.g. RTM_DELNEIGH, *while* there can be unread data from the dump request. There would be unread data if the response to the dump request was split over multiple calls to recvmsg. >> Are you referring to rtnl_dump_filter? If rtnl_send_check comes across >> a failure, rtnl_dump_filter will not continue reading. >> >> Here's the situation that I'm referring to: >> >> If rtnl_send_check detects an error, it returns -1. rtnl_send_check is >> called from flush_update. The multiple implementations of flush_update >> (e.g. in ipneigh.c, ipaddress.c) propagate this return value to their >> caller, e.g. print_neigh or print_addrinfo. >> >> print_neigh, print_addrinfo, etc. are called from rtnl_dump_filter. >> rtnl_dump_filter sits in a loop calling recvmsg on the netlink socket. >> However, it returns the error value if the filter function (e.g. >> print_neigh) returns an error. In this case, rtnl_dump_filter can >> return before it's read all the responses. >> The error return from rtnl_dump_filter causes the program to exit. > > Yes, and I agree with your patch so far. My question is why you > need another socket. > >> use a new netlink socket when sending flush messages to avoid reading >> any pending data on the existing netlink socket. > > Under what circumstances would there be pending data when > performing a new iproute operation? As above, it's not that there is pending data when performing a new iproute operation, it's that there can be pending data while performing a single iproute operation, namely ip <object> flush. The benefit of a new socket is that it won't have any data from the dump request waiting for it. thanks, -gk -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 15 Jul 2009 10:50:57 -0700 Gautam Kachroo <gk@aristanetworks.com> wrote: > On Wed, Jul 15, 2009 at 8:19 AM, Patrick McHardy<kaber@trash.net> wrote: > > Gautam Kachroo wrote: > >> On Tue, Jul 14, 2009 at 2:38 AM, Patrick McHardy<kaber@trash.net> wrote: > >>> Gautam Kachroo wrote: > >>>> use a new netlink socket when sending flush messages to avoid reading > >>>> any pending data on the existing netlink socket. > >>>> > >>>> read all of the response from the netlink request -- this response can > >>>> be split over multiple recv calls, pretty much one per netlink request > >>>> message. ENOENT errors, which correspond to attempts to delete an > >>>> already deleted entry, are ignored. Other errors are not ignored. > >>> > >>> In which case would there be any pending data? From what I can see, > >>> this can only happen when using batching, but in that case the > >>> previous command should continue reading until it has received all > >>> responses (which the netlink functions appear to be doing properly). > >> > >> What is the "previous command"? > > > > The last command before the one executing when using batching. > > This is independent of batching (I assume you're referring to the > -batch option to the ip command). > It happens when running a command like "ip neigh flush to 0.0.0.0/0" > if there are many neighbor entries. > > The implementation of flush commands, e.g. ip neigh flush, sends a > dump request, e.g. RTM_GETNEIGH, and then sends requests, e.g. > RTM_DELNEIGH, *while* there can be unread data from the dump request. > There would be unread data if the response to the dump request was > split over multiple calls to recvmsg. > > >> Are you referring to rtnl_dump_filter? If rtnl_send_check comes across > >> a failure, rtnl_dump_filter will not continue reading. > >> > >> Here's the situation that I'm referring to: > >> > >> If rtnl_send_check detects an error, it returns -1. rtnl_send_check is > >> called from flush_update. The multiple implementations of flush_update > >> (e.g. in ipneigh.c, ipaddress.c) propagate this return value to their > >> caller, e.g. print_neigh or print_addrinfo. > >> > >> print_neigh, print_addrinfo, etc. are called from rtnl_dump_filter. > >> rtnl_dump_filter sits in a loop calling recvmsg on the netlink socket. > >> However, it returns the error value if the filter function (e.g. > >> print_neigh) returns an error. In this case, rtnl_dump_filter can > >> return before it's read all the responses. > >> The error return from rtnl_dump_filter causes the program to exit. > > > > Yes, and I agree with your patch so far. My question is why you > > need another socket. > > > >> use a new netlink socket when sending flush messages to avoid reading > >> any pending data on the existing netlink socket. > > > > Under what circumstances would there be pending data when > > performing a new iproute operation? > > As above, it's not that there is pending data when performing a new > iproute operation, it's that there can be pending data while > performing a single iproute operation, namely ip <object> flush. > The benefit of a new socket is that it won't have any data from the > dump request waiting for it. I posted a better fix (using MSG_PEEK). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2009 at 12:19 PM, Stephen Hemminger<shemminger@vyatta.com> wrote: > On Wed, 15 Jul 2009 10:50:57 -0700 > Gautam Kachroo <gk@aristanetworks.com> wrote: > >> On Wed, Jul 15, 2009 at 8:19 AM, Patrick McHardy<kaber@trash.net> wrote: >> > Gautam Kachroo wrote: >> >> On Tue, Jul 14, 2009 at 2:38 AM, Patrick McHardy<kaber@trash.net> wrote: >> >>> Gautam Kachroo wrote: >> >>>> use a new netlink socket when sending flush messages to avoid reading >> >>>> any pending data on the existing netlink socket. >> >>>> >> >>>> read all of the response from the netlink request -- this response can >> >>>> be split over multiple recv calls, pretty much one per netlink request >> >>>> message. ENOENT errors, which correspond to attempts to delete an >> >>>> already deleted entry, are ignored. Other errors are not ignored. >> >>> >> >>> In which case would there be any pending data? From what I can see, >> >>> this can only happen when using batching, but in that case the >> >>> previous command should continue reading until it has received all >> >>> responses (which the netlink functions appear to be doing properly). >> >> >> >> What is the "previous command"? >> > >> > The last command before the one executing when using batching. >> >> This is independent of batching (I assume you're referring to the >> -batch option to the ip command). >> It happens when running a command like "ip neigh flush to 0.0.0.0/0" >> if there are many neighbor entries. >> >> The implementation of flush commands, e.g. ip neigh flush, sends a >> dump request, e.g. RTM_GETNEIGH, and then sends requests, e.g. >> RTM_DELNEIGH, *while* there can be unread data from the dump request. >> There would be unread data if the response to the dump request was >> split over multiple calls to recvmsg. >> >> >> Are you referring to rtnl_dump_filter? If rtnl_send_check comes across >> >> a failure, rtnl_dump_filter will not continue reading. >> >> >> >> Here's the situation that I'm referring to: >> >> >> >> If rtnl_send_check detects an error, it returns -1. rtnl_send_check is >> >> called from flush_update. The multiple implementations of flush_update >> >> (e.g. in ipneigh.c, ipaddress.c) propagate this return value to their >> >> caller, e.g. print_neigh or print_addrinfo. >> >> >> >> print_neigh, print_addrinfo, etc. are called from rtnl_dump_filter. >> >> rtnl_dump_filter sits in a loop calling recvmsg on the netlink socket. >> >> However, it returns the error value if the filter function (e.g. >> >> print_neigh) returns an error. In this case, rtnl_dump_filter can >> >> return before it's read all the responses. >> >> The error return from rtnl_dump_filter causes the program to exit. >> > >> > Yes, and I agree with your patch so far. My question is why you >> > need another socket. >> > >> >> use a new netlink socket when sending flush messages to avoid reading >> >> any pending data on the existing netlink socket. >> > >> > Under what circumstances would there be pending data when >> > performing a new iproute operation? >> >> As above, it's not that there is pending data when performing a new >> iproute operation, it's that there can be pending data while >> performing a single iproute operation, namely ip <object> flush. >> The benefit of a new socket is that it won't have any data from the >> dump request waiting for it. > > I posted a better fix (using MSG_PEEK). Where did you post the fix? I didn't see it on netdev or in the iproute2 git... I had considered using MSG_PEEK in rtnl_send_check, but I don't think that notices errors with the requests in the "buf" argument of rtnl_send_check if there is already pending data -- the recv will peek the next chunk of the dump response. The error response will be waiting in the queue after the dump response. Of course, an error, e.g. EPERM, will eventually be noticed, just not as early... thanks, -gk -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2009 at 3:04 PM, Gautam Kachroo<gk@aristanetworks.com> wrote: > On Wed, Jul 15, 2009 at 12:19 PM, Stephen > Hemminger<shemminger@vyatta.com> wrote: >> On Wed, 15 Jul 2009 10:50:57 -0700 >> Gautam Kachroo <gk@aristanetworks.com> wrote: >> >>> On Wed, Jul 15, 2009 at 8:19 AM, Patrick McHardy<kaber@trash.net> wrote: >>> > Gautam Kachroo wrote: >>> >> On Tue, Jul 14, 2009 at 2:38 AM, Patrick McHardy<kaber@trash.net> wrote: >>> >>> Gautam Kachroo wrote: >>> >>>> use a new netlink socket when sending flush messages to avoid reading >>> >>>> any pending data on the existing netlink socket. >>> >>>> >>> >>>> read all of the response from the netlink request -- this response can >>> >>>> be split over multiple recv calls, pretty much one per netlink request >>> >>>> message. ENOENT errors, which correspond to attempts to delete an >>> >>>> already deleted entry, are ignored. Other errors are not ignored. >>> >>> >>> >>> In which case would there be any pending data? From what I can see, >>> >>> this can only happen when using batching, but in that case the >>> >>> previous command should continue reading until it has received all >>> >>> responses (which the netlink functions appear to be doing properly). >>> >> >>> >> What is the "previous command"? >>> > >>> > The last command before the one executing when using batching. >>> >>> This is independent of batching (I assume you're referring to the >>> -batch option to the ip command). >>> It happens when running a command like "ip neigh flush to 0.0.0.0/0" >>> if there are many neighbor entries. >>> >>> The implementation of flush commands, e.g. ip neigh flush, sends a >>> dump request, e.g. RTM_GETNEIGH, and then sends requests, e.g. >>> RTM_DELNEIGH, *while* there can be unread data from the dump request. >>> There would be unread data if the response to the dump request was >>> split over multiple calls to recvmsg. >>> >>> >> Are you referring to rtnl_dump_filter? If rtnl_send_check comes across >>> >> a failure, rtnl_dump_filter will not continue reading. >>> >> >>> >> Here's the situation that I'm referring to: >>> >> >>> >> If rtnl_send_check detects an error, it returns -1. rtnl_send_check is >>> >> called from flush_update. The multiple implementations of flush_update >>> >> (e.g. in ipneigh.c, ipaddress.c) propagate this return value to their >>> >> caller, e.g. print_neigh or print_addrinfo. >>> >> >>> >> print_neigh, print_addrinfo, etc. are called from rtnl_dump_filter. >>> >> rtnl_dump_filter sits in a loop calling recvmsg on the netlink socket. >>> >> However, it returns the error value if the filter function (e.g. >>> >> print_neigh) returns an error. In this case, rtnl_dump_filter can >>> >> return before it's read all the responses. >>> >> The error return from rtnl_dump_filter causes the program to exit. >>> > >>> > Yes, and I agree with your patch so far. My question is why you >>> > need another socket. >>> > >>> >> use a new netlink socket when sending flush messages to avoid reading >>> >> any pending data on the existing netlink socket. >>> > >>> > Under what circumstances would there be pending data when >>> > performing a new iproute operation? >>> >>> As above, it's not that there is pending data when performing a new >>> iproute operation, it's that there can be pending data while >>> performing a single iproute operation, namely ip <object> flush. >>> The benefit of a new socket is that it won't have any data from the >>> dump request waiting for it. >> >> I posted a better fix (using MSG_PEEK). > > Where did you post the fix? I didn't see it on netdev or in the iproute2 git... > I had considered using MSG_PEEK in rtnl_send_check, but I don't think > that notices errors with the requests in the "buf" argument of > rtnl_send_check if there is already pending data -- the recv will peek > the next chunk of the dump response. The error response will be > waiting in the queue after the dump response. > Of course, an error, e.g. EPERM, will eventually be noticed, just not > as early... I saw commit 2d8240f8d95dfdc276dcf447623129fb5ccedcd6. Using MSG_PEEK will prevent pending data from being removed during the check for errors, but re-using the same socket means that errors won't be detected until all the pending data has been read. rtnl_send_check still treats ENOENT as an error. It seems better for flush to ignore ENOENT. That way a flush will not be disrupted by an entry being removed since that's not really an error for a flush operation. thanks, -gk > thanks, > -gk > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/lib/libnetlink.c +++ b/lib/libnetlink.c @@ -111,7 +111,10 @@ int rtnl_send(struct rtnl_handle *rth, const char *buf, int len) return send(rth->fd, buf, len, 0); } -int rtnl_send_check(struct rtnl_handle *rth, const char *buf, int len) +/* send the message in buf using the socket in rth and check for errors + (ignoring ENOENT errors) +*/ +int rtnl_send_check_impl(struct rtnl_handle *rth, const char *buf, int len) { struct nlmsghdr *h;
use a new netlink socket when sending flush messages to avoid reading any pending data on the existing netlink socket. read all of the response from the netlink request -- this response can be split over multiple recv calls, pretty much one per netlink request message. ENOENT errors, which correspond to attempts to delete an already deleted entry, are ignored. Other errors are not ignored. Signed-off-by: Gautam Kachroo <gk@aristanetworks.com> --- resending in plain text format rtnl_send_check calls recv after sending the delete messages. It does this to catch errors, e.g. user doesn't have permissions to send netlink messages. It treats *any* response as an error, even if the message is not an NLMSG_ERROR. However, this doesn't work if the dump is in the middle of being processed. recv will return the next chunk of the dump. This was causing the flush operation to bail early, e.g. when there is a large arp cache. Ignoring ENOENT lets flush succeed even if entries have been deleted from underneath the flush. int status; @@ -122,28 +125,51 @@ int rtnl_send_check(struct rtnl_handle *rth, const char *buf, int len) return status; /* Check for errors */ - status = recv(rth->fd, resp, sizeof(resp), MSG_DONTWAIT); - if (status < 0) { - if (errno == EAGAIN) - return 0; - return -1; - } + for (;;) { + status = recv(rth->fd, resp, sizeof(resp), MSG_DONTWAIT); + if (status < 0) { + if (errno == EAGAIN) + return 0; + return -1; + } - for (h = (struct nlmsghdr *)resp; NLMSG_OK(h, status); - h = NLMSG_NEXT(h, status)) { - if (h->nlmsg_type == NLMSG_ERROR) { - struct nlmsgerr *err = (struct nlmsgerr*)NLMSG_DATA(h); - if (h->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr))) - fprintf(stderr, "ERROR truncated\n"); - else - errno = -err->error; + for (h = (struct nlmsghdr *)resp; NLMSG_OK(h, status); + h = NLMSG_NEXT(h, status)) { + if (h->nlmsg_type == NLMSG_ERROR) { + struct nlmsgerr *err = (struct nlmsgerr*)NLMSG_DATA(h); + if (h->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr))) { + fprintf(stderr, "ERROR truncated\n"); + return -1; + } + else { + if (err->error != -ENOENT) { + errno = -err->error; + return -1; + } + } + } } - return -1; } return 0; } +/* creates a netlink socket and passes it to rtnl_send_check_impl to send the + message in buf +*/ +int rtnl_send_check(struct rtnl_handle *unused, const char *buf, int len) +{ + struct rtnl_handle rth = { .fd = -1 }; + int ret; + if (rtnl_open(&rth, 0) < 0) { + fprintf(stderr, "Cannot open rtnetlink\n"); + return -1; + } + ret = rtnl_send_check_impl(&rth, buf, len); + rtnl_close(&rth); + return ret; +} + int rtnl_dump_request(struct rtnl_handle *rth, int type, void *req, int len) { struct nlmsghdr nlh; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html