Message ID | 1418275427-18921-3-git-send-email-rich.tollerton@ni.com |
---|---|
State | New |
----- Original Message -----
> Some drivers set RDT=RDH. Oddly, this works on real hardware. To work
> around this, autodecrement RDT when this happens.
>
> Signed-off-by: Richard Tollerton <rich.tollerton@ni.com>
> Signed-off-by: Jeff Westfahl <jeff.westfahl@ni.com>
> ---
>  hw/net/e1000.c | 6 ++++++
>  1 file changed, 6 insertions(+)

Please describe more details on the issue.

The spec 3.2.6 said:

"
When the head pointer is equal to the tail pointer, the ring is empty.
"

So RDT=RDH in fact empty the ring. No?

>
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index 44ae3a8..b8cbfc1 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -1152,6 +1152,12 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>  static void
>  set_rdt(E1000State *s, int index, uint32_t val)
>  {
> +    if (val == s->mac_reg[RDH]) { /* Decrement RDT if it's too big */
> +        if (val == 0) {
> +            val = s->mac_reg[RDLEN] / sizeof(struct e1000_rx_desc);
> +        }
> +        val--;
> +    }
>      s->mac_reg[index] = val & 0xffff;
>      if (e1000_has_rxbufs(s, 1)) {
>          qemu_flush_queued_packets(qemu_get_queue(s->nic));
> --
> 2.1.3
>
>
>
On Thu, Dec 18, 2014 at 12:01:48AM -0500, Jason Wang wrote:
>
>
> ----- Original Message -----
> > Some drivers set RDT=RDH. Oddly, this works on real hardware. To work
> > around this, autodecrement RDT when this happens.
> >
> > Signed-off-by: Richard Tollerton <rich.tollerton@ni.com>
> > Signed-off-by: Jeff Westfahl <jeff.westfahl@ni.com>
> > ---
> >  hw/net/e1000.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
>
> Please describe more details on the issue.
>
> The spec 3.2.6 said:
>
> "
> When the head pointer is equal to the tail pointer, the ring is empty.
> "
>
> So RDT=RDH in fact empty the ring. No?

Richard, can you respond please?
I'd like to see this clarified in code comment or
commit message before applying this patchset.

> >
> > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> > index 44ae3a8..b8cbfc1 100644
> > --- a/hw/net/e1000.c
> > +++ b/hw/net/e1000.c
> > @@ -1152,6 +1152,12 @@ mac_writereg(E1000State *s, int index, uint32_t val)
> >  static void
> >  set_rdt(E1000State *s, int index, uint32_t val)
> >  {
> > +    if (val == s->mac_reg[RDH]) { /* Decrement RDT if it's too big */
> > +        if (val == 0) {
> > +            val = s->mac_reg[RDLEN] / sizeof(struct e1000_rx_desc);
> > +        }
> > +        val--;
> > +    }
> >      s->mac_reg[index] = val & 0xffff;
> >      if (e1000_has_rxbufs(s, 1)) {
> >          qemu_flush_queued_packets(qemu_get_queue(s->nic));
> > --
> > 2.1.3
> >
> >
> >

"Michael S. Tsirkin" <mst@redhat.com> writes:

> Richard, can you respond please?
> I'd like to see this clarified in code comment or
> commit message before applying this patchset.

Apologies, and thanks for reminding me.

On Thu, Dec 18, 2014 at 12:01:48AM -0500, Jason Wang wrote:

> > Some drivers set RDT=RDH. Oddly, this works on real hardware. To work
> > around this, autodecrement RDT when this happens.
>
> Please describe more details on the issue. The spec 3.2.6 said: "When
> the head pointer is equal to the tail pointer, the ring is empty." So
> RDT=RDH in fact empty the ring. No?

That is incorrect; the spec explicitly states that RDT=RDH means the
ring is full. The linux e1000 driver more or less implies the same
thing.

You forgot to include the sentence after that in section 3.2.6 :)

"When the head pointer is equal to the tail pointer, the ring is empty.
Hardware stops storing packets in system memory until software advances
the tail pointer, making more receive buffers available."

Yeah, this seems really poorly worded to me too. :( You appear to be
interpreting "ring is empty" in the usual sense, i.e. "all N elements of
the ring buffer are available for use by hardware". In fact, the correct
interpretation [1] is the exact opposite, "none of the elements are
available for use by hardware". The last sentence in the quote makes
this explicit. See also linux e1000 driver sources at [2] [3] [4].

See also [5] which implies that hardware DMA is kicked off by setting
tail != head at initialization. I'm *guessing* (?) that the DMA engine
isn't correspondingly stopped when software sets RDT=RDH, so that once
packets start getting received, the hardware can more or less ignore it.
In this context, my patch makes sense.

(Yes, this is totally an ex-post-facto justification for the patch; it
arrived to me secondhand, and I had not been familiar with the driver
source before now.)

[1] http://sourceforge.net/p/e1000/mailman/message/29280078/
[2] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000_main.c#L398
[3] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000.h#L215
[4] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000_main.c#L4302
[5] http://sourceforge.net/p/e1000/mailman/message/29969887/

>> > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> > index 44ae3a8..b8cbfc1 100644
>> > --- a/hw/net/e1000.c
>> > +++ b/hw/net/e1000.c
>> > @@ -1152,6 +1152,12 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>> >  static void
>> >  set_rdt(E1000State *s, int index, uint32_t val)
>> >  {
>> > +    if (val == s->mac_reg[RDH]) { /* Decrement RDT if it's too big */
>> > +        if (val == 0) {
>> > +            val = s->mac_reg[RDLEN] / sizeof(struct e1000_rx_desc);
>> > +        }
>> > +        val--;
>> > +    }
>> >      s->mac_reg[index] = val & 0xffff;
>> >      if (e1000_has_rxbufs(s, 1)) {
>> >          qemu_flush_queued_packets(qemu_get_queue(s->nic));
>> > --
>> > 2.1.3
>> >
>> >
>> >
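The reading of section 3.2.6 argued for in the message above can be written down as a small sketch. It assumes that hardware owns the descriptors from RDH up to, but not including, RDT. The name `rx_descs_available` and the ring sizes in the usage note are made up for illustration; this is not code from QEMU or from the Linux driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helper (hypothetical name, not QEMU or Linux code):
 * number of receive descriptors the hardware may still fill, assuming
 * it owns the half-open range [rdh, rdt) of a ring with ring_len slots.
 */
static uint32_t rx_descs_available(uint32_t rdh, uint32_t rdt,
                                   uint32_t ring_len)
{
    assert(ring_len > 0 && rdh < ring_len && rdt < ring_len);
    /* Modular distance from head to tail; 0 when rdh == rdt. */
    return (rdt + ring_len - rdh) % ring_len;
}
```

With a 256-entry ring, `rx_descs_available(5, 5, 256)` is 0, which is "empty" in the spec's sense, while `rx_descs_available(5, 4, 256)` is 255, the most a driver can hand over under this model.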
On Tue, Jan 13, 2015 at 3:12 AM, Richard Tollerton <rich.tollerton@ni.com> wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
>> Richard, can you respond please?
>> I'd like to see this clarified in code comment or
>> commit message before applying this patchset.
>
> Apologies, and thanks for reminding me.
>
> On Thu, Dec 18, 2014 at 12:01:48AM -0500, Jason Wang wrote:
>
>> > Some drivers set RDT=RDH. Oddly, this works on real hardware. To work
>> > around this, autodecrement RDT when this happens.
>>
>> Please describe more details on the issue. The spec 3.2.6 said: "When
>> the head pointer is equal to the tail pointer, the ring is empty." So
>> RDT=RDH in fact empty the ring. No?
>
> That is incorrect; the spec explicitly states that RDT=RDH means the
> ring is full. The linux e1000 driver more or less implies the same
> thing.
>
> You forgot to include the sentence after that in section 3.2.6 :)
>
> "When the head pointer is equal to the tail pointer, the ring is empty.
> Hardware stops storing packets in system memory until software advances
> the tail pointer, making more receive buffers available."
>
> Yeah, this seems really poorly worded to me too. :( You appear to be
> interpreting "ring is empty" in the usual sense, i.e. "all N elements of
> the ring buffer are available for use by hardware". In fact, the correct
> interpretation [1] is the exact opposite, "none of the elements are
> available for use by hardware".

Yes, I do think 'empty' means no available buffer for device to receive
:)

> The last sentence in the quote makes
> this explicit. See also linux e1000 driver sources at [2] [3] [4].

Btw, [2],[3],[4] are all codes that deal with driver's internal
variable, not the one that the hardware use.

>
> See also [5] which implies that hardware DMA is kicked off by setting
> tail != head at initialization.

Yes, and we trigger receiving in set_rdt().

> I'm *guessing* (?) that the DMA engine
> isn't correspondingly stopped when software sets RDT=RDH, so that once
> packets start getting received,

Do you mean in qemu? I/O are single threaded, so looks like we are safe.

> the hardware can more or less ignore it.
> In this context, my patch makes sense.
>
> (Yes, this is totally an ex-post-facto justification for the patch; it
> arrived to me secondhand, and I had not been familiar with the driver
> source before now.)

True, we've found many undocumented behavior in the past (some even
conflicts with spec). I don't have a 82540EM in my hand, but I think the
best thing is to check this behavior in real hardware to prevent this
patch from breaking many existing drivers.

>
> [1] http://sourceforge.net/p/e1000/mailman/message/29280078/

This issue mentioned in the thread seems solved. Current
e1000_has_rxbufs() will return false if RDT==RDH.

>
> [2] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000_main.c#L398
> [3] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000.h#L215
> [4] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000_main.c#L4302
> [5] http://sourceforge.net/p/e1000/mailman/message/29969887/

Looks like what mentioned in this thread was also solved.

We check both RCTL and e1000_has_rxbufs() in e1000_can_receive().
And flush queued packets in set_rx_control().

>
>>> > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>>> > index 44ae3a8..b8cbfc1 100644
>>> > --- a/hw/net/e1000.c
>>> > +++ b/hw/net/e1000.c
>>> > @@ -1152,6 +1152,12 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>>> >  static void
>>> >  set_rdt(E1000State *s, int index, uint32_t val)
>>> >  {
>>> > +    if (val == s->mac_reg[RDH]) { /* Decrement RDT if it's too big */
>>> > +        if (val == 0) {
>>> > +            val = s->mac_reg[RDLEN] / sizeof(struct e1000_rx_desc);
>>> > +        }
>>> > +        val--;
>>> > +    }
>>> >      s->mac_reg[index] = val & 0xffff;
>>> >      if (e1000_has_rxbufs(s, 1)) {
>>> >          qemu_flush_queued_packets(qemu_get_queue(s->nic));
>>> > --
>>> > 2.1.3
>>> >
>>> >
>>> >
>
> --
> Richard Tollerton <rich.tollerton@ni.com>
>
Jason Wang <jasowang@redhat.com> writes:

> On Tue, Jan 13, 2015 at 3:12 AM, Richard Tollerton <rich.tollerton@ni.com> wrote:
>> On Thu, Dec 18, 2014 at 12:01:48AM -0500, Jason Wang wrote:
>>
>>> > Some drivers set RDT=RDH. Oddly, this works on real hardware. To
>>> > work around this, autodecrement RDT when this happens.
>>>
>>> Please describe more details on the issue. The spec 3.2.6 said:
>>> "When the head pointer is equal to the tail pointer, the ring is
>>> empty." So RDT=RDH in fact empty the ring. No?
>>
>> That is incorrect; the spec explicitly states that RDT=RDH means the
>> ring is full. The linux e1000 driver more or less implies the same
>> thing.
>>
>> You forgot to include the sentence after that in section 3.2.6 :)
>>
>> "When the head pointer is equal to the tail pointer, the ring is
>> empty. Hardware stops storing packets in system memory until software
>> advances the tail pointer, making more receive buffers available."
>>
>> Yeah, this seems really poorly worded to me too. :( You appear to be
>> interpreting "ring is empty" in the usual sense, i.e. "all N elements
>> of the ring buffer are available for use by hardware". In fact, the
>> correct interpretation [1] is the exact opposite, "none of the
>> elements are available for use by hardware".
>
> Yes, I do think 'empty' means no available buffer for device to receive
> :)

Ah, ok. But if you're concerned about breaking drivers with this... what
legitimate reason could a driver possibly have to set RDT=RDH? (Besides a
mistaken attempt to free the ring for hardware use, as I posit?)

The only reason I can think of is that maybe a driver is trying to
implement some sort of crude flow control. But if that actually worked,
then major packet loss would be observed under load, as any packets
received by hardware but not yet processed by software would get
dropped.

I'm going to go out (further) on a limb and assert that no driver ever
sets RDT=RDH to stop receives, because no hardware implements that
behavior. The driver I'm trying to get working appears to have been
setting RDT=RDH at the end of *every* iteration of the receive loop,
since it was originally written in 2003. If I am to trust the comments,
it's been ported/supported on 28 different chipset variants, and it's
definitely been tested for performance and packet loss under load for a
good number of those, including under polling modes where the ring is
only checked every few milliseconds. If RDT=RDH ever did anything except
free all buffers for hardware use, surely catastrophic packet loss would
have been observed?

>> The last sentence in the quote makes this explicit. See also linux
>> e1000 driver sources at [2] [3] [4].
>
> Btw, [2],[3],[4] are all codes that deal with driver's internal
> variable, not the one that the hardware use.

No, it's directly used by hardware -- current_count increments RDT. See
e1000_alloc_rx_buffers().

>> I'm *guessing* (?) that the DMA engine isn't correspondingly stopped
>> when software sets RDT=RDH, so that once packets start getting
>> received,
>
> Do you mean in qemu? I/O are single threaded, so looks like we are
> safe.

I'm referring to the possibility that physical hardware is doing this.

Interesting side note: while this driver sets RDT=RDH on every
iteration, it *initializes* RDT=0 and RDH=1...

>> the hardware can more or less ignore it. In this context, my patch
>> makes sense.
>>
>> (Yes, this is totally an ex-post-facto justification for the patch; it
>> arrived to me secondhand, and I had not been familiar with the driver
>> source before now.)
>
> True, we've found many undocumented behavior in the past (some even
> conflicts with spec). I don't have a 82540EM in my hand, but I think
> the best thing is to check this behavior in real hardware to prevent
> this patch from breaking many existing drivers.

Can you be more specific in regards to what information you're
requesting? Are you wanting additional confirmation (i.e. via
instrumented code in addition to code inspection) that setting RDT=RDH
does not stop packet receive once it has started?

>> [1] http://sourceforge.net/p/e1000/mailman/message/29280078/
>
> This issue mentioned in the thread seems solved.

Indeed; I was citing this thread (and the other thread) for the
discussions, not the issues themselves.

>> [2] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000_main.c#L398
>> [3] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000.h#L215
>> [4] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000_main.c#L4302
>> [5] http://sourceforge.net/p/e1000/mailman/message/29969887/
>
> Looks like what mentioned in this thread was also solved.
>
> We check both RCTL and e1000_has_rxbufs() in e1000_can_receive().
> And flush queued packets in set_rx_control().
>>
>>>> > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>>>> > index 44ae3a8..b8cbfc1 100644
>>>> > --- a/hw/net/e1000.c
>>>> > +++ b/hw/net/e1000.c
>>>> > @@ -1152,6 +1152,12 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>>>> >  static void
>>>> >  set_rdt(E1000State *s, int index, uint32_t val)
>>>> >  {
>>>> > +    if (val == s->mac_reg[RDH]) { /* Decrement RDT if it's too big */
>>>> > +        if (val == 0) {
>>>> > +            val = s->mac_reg[RDLEN] / sizeof(struct e1000_rx_desc);
>>>> > +        }
>>>> > +        val--;
>>>> > +    }
>>>> >      s->mac_reg[index] = val & 0xffff;
>>>> >      if (e1000_has_rxbufs(s, 1)) {
>>>> >          qemu_flush_queued_packets(qemu_get_queue(s->nic));
>>>> > --
>>>> > 2.1.3
>>>> >
>>>> >
>>>> >
>>
>> --
>> Richard Tollerton <rich.tollerton@ni.com>
>>
>
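The start state mentioned in the side note above (RDT=0, RDH=1) can be made concrete with a toy model of the device side. It assumes the literal reading of section 3.2.6, which is exactly the point under dispute in this thread: the device fills one descriptor per packet, advances RDH with wrap-around, and stops once RDH reaches RDT. The name `packets_until_stall` is hypothetical, and multi-descriptor packets are ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy device model (hypothetical, not QEMU code): count how many
 * single-descriptor packets can be stored before RDH catches up with
 * RDT, if the driver never refills the ring.
 */
static uint32_t packets_until_stall(uint32_t rdh, uint32_t rdt,
                                    uint32_t ring_len)
{
    uint32_t stored = 0;

    assert(ring_len > 0 && rdh < ring_len && rdt < ring_len);
    while (rdh != rdt) {
        rdh = (rdh + 1) % ring_len; /* device consumes one descriptor */
        stored++;
    }
    return stored;
}
```

For an 8-entry ring, `packets_until_stall(1, 0, 8)` gives 7, so the RDT=0, RDH=1 start state hands over every descriptor but one, while `packets_until_stall(0, 0, 8)` gives 0. Whether real hardware actually stalls when a driver later writes RDT=RDH is the open question here; the sketch only shows what the strict reading predicts.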
On 01/14/2015 05:06 AM, Richard Tollerton wrote:
> Jason Wang <jasowang@redhat.com> writes:
>
>> On Tue, Jan 13, 2015 at 3:12 AM, Richard Tollerton <rich.tollerton@ni.com> wrote:
>>> On Thu, Dec 18, 2014 at 12:01:48AM -0500, Jason Wang wrote:
>>>
>>>> > Some drivers set RDT=RDH. Oddly, this works on real hardware. To
>>>> > work around this, autodecrement RDT when this happens.
>>>>
>>>> Please describe more details on the issue. The spec 3.2.6 said:
>>>> "When the head pointer is equal to the tail pointer, the ring is
>>>> empty." So RDT=RDH in fact empty the ring. No?
>>> That is incorrect; the spec explicitly states that RDT=RDH means the
>>> ring is full. The linux e1000 driver more or less implies the same
>>> thing.
>>>
>>> You forgot to include the sentence after that in section 3.2.6 :)
>>>
>>> "When the head pointer is equal to the tail pointer, the ring is
>>> empty. Hardware stops storing packets in system memory until software
>>> advances the tail pointer, making more receive buffers available."
>>>
>>> Yeah, this seems really poorly worded to me too. :( You appear to be
>>> interpreting "ring is empty" in the usual sense, i.e. "all N elements
>>> of the ring buffer are available for use by hardware". In fact, the
>>> correct interpretation [1] is the exact opposite, "none of the
>>> elements are available for use by hardware".
>> Yes, I do think 'empty' means no available buffer for device to receive
>> :)
> Ah, ok. But if you're concerned about breaking drivers with this... what
> legitimate reason could a driver possibly have to set RDT=RDH? (Besides a
> mistaken attempt to free the ring for hardware use, as I posit?)

One example is initialization in Linux (e1000_configure_rx()). What it
does is:

    /* disable receives while setting up the descriptors */
    rctl = er32(RCTL);
    ew32(RCTL, rctl & ~E1000_RCTL_EN);
    ...
    ew32(RDT, 0);
    ew32(RDH, 0);
    ...
    /* Enable Receives */
    ew32(RCTL, rctl | E1000_RCTL_EN);

And the rx buffer allocations were done later. So with your patch, after
rx was enabled but before rx buffer were allocated. Since RDT was set to
RDLEN, e1000_has_rxbuf() will return true. Qemu will try to receive
packet to uninitialized buffers. This seems wrong.

>
> The only reason I can think of is that maybe a driver is trying to
> implement some sort of crude flow control. But if that actually worked,
> then major packet loss would be observed under load, as any packets
> received by hardware but not yet processed by software would get
> dropped.
>
> I'm going to go out (further) on a limb and assert that no driver ever
> sets RDT=RDH to stop receives, because no hardware implements that
> behavior. The driver I'm trying to get working appears to have been
> setting RDT=RDH at the end of *every* iteration of the receive loop,
> since it was originally written in 2003. If I am to trust the comments,
> it's been ported/supported on 28 different chipset variants, and it's
> definitely been tested for performance and packet loss under load for a
> good number of those, including under polling modes where the ring is
> only checked every few milliseconds. If RDT=RDH ever did anything except
> free all buffers for hardware use, surely catastrophic packet loss would
> have been observed?

From device's point of view. It just need stop receiving when RDT=RDH.
It does not care whether the ring was full of received packets or empty.

I probably miss something but could you please show the (pseudo) code of
your driver to see why current qemu does not work?

>
>>> The last sentence in the quote makes this explicit. See also linux
>>> e1000 driver sources at [2] [3] [4].
>> Btw, [2],[3],[4] are all codes that deal with driver's internal
>> variable, not the one that the hardware use.
> No, it's directly used by hardware -- current_count increments RDT. See
> e1000_alloc_rx_buffers().

Do you mean cleaned_count? (Didn't find current_count).

>
>>> I'm *guessing* (?) that the DMA engine isn't correspondingly stopped
>>> when software sets RDT=RDH, so that once packets start getting
>>> received,
>> Do you mean in qemu? I/O are single threaded, so looks like we are
>> safe.
> I'm referring to the possibility that physical hardware is doing this.
>
> Interesting side note: while this driver sets RDT=RDH on every
> iteration, it *initializes* RDT=0 and RDH=1...

I want to know more about this driver. RDT=0 and RDH=1 means all buffers
were available for device to receive. If no rx buffer refilling happens,
RDH will be finally advanced by device and finally equal to RDT and then
receiving is stopped. When will driver set RDT=RDH?

>>> the hardware can more or less ignore it. In this context, my patch
>>> makes sense.
>>>
>>> (Yes, this is totally an ex-post-facto justification for the patch; it
>>> arrived to me secondhand, and I had not been familiar with the driver
>>> source before now.)
>> True, we've found many undocumented behavior in the past (some even
>> conflicts with spec). I don't have a 82540EM in my hand, but I think
>> the best thing is to check this behavior in real hardware to prevent
>> this patch from breaking many existing drivers.
> Can you be more specific in regards to what information you're
> requesting? Are you wanting additional confirmation (i.e. via
> instrumented code in addition to code inspection) that setting RDT=RDH
> does not stop packet receive once it has started?
>

Just a test to see if a real 82540 card behaves like this patch. E.g.

    write(RDH, 0)
    write(RDT, 0)
    read(RDT)

To see if RDT is zero or RDLEN.

>>> [1] http://sourceforge.net/p/e1000/mailman/message/29280078/
>> This issue mentioned in the thread seems solved.
> Indeed; I was citing this thread (and the other thread) for the
> discussions, not the issues themselves.
>
>>> [2] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000_main.c#L398
>>> [3] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000.h#L215
>>> [4] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/e1000/e1000_main.c#L4302
>>> [5] http://sourceforge.net/p/e1000/mailman/message/29969887/
>> Looks like what mentioned in this thread was also solved.
>>
>> We check both RCTL and e1000_has_rxbufs() in e1000_can_receive().
>> And flush queued packets in set_rx_control().
>>>>> > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>>>>> > index 44ae3a8..b8cbfc1 100644
>>>>> > --- a/hw/net/e1000.c
>>>>> > +++ b/hw/net/e1000.c
>>>>> > @@ -1152,6 +1152,12 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>>>>> >  static void
>>>>> >  set_rdt(E1000State *s, int index, uint32_t val)
>>>>> >  {
>>>>> > +    if (val == s->mac_reg[RDH]) { /* Decrement RDT if it's too big */
>>>>> > +        if (val == 0) {
>>>>> > +            val = s->mac_reg[RDLEN] / sizeof(struct e1000_rx_desc);
>>>>> > +        }
>>>>> > +        val--;
>>>>> > +    }
>>>>> >      s->mac_reg[index] = val & 0xffff;
>>>>> >      if (e1000_has_rxbufs(s, 1)) {
>>>>> >          qemu_flush_queued_packets(qemu_get_queue(s->nic));
>>>>> > --
>>>>> > 2.1.3
>>>>> >
>>>>> >
>>>>> >
>>> --
>>> Richard Tollerton <rich.tollerton@ni.com>
>>>
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 44ae3a8..b8cbfc1 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1152,6 +1152,12 @@ mac_writereg(E1000State *s, int index, uint32_t val)
 static void
 set_rdt(E1000State *s, int index, uint32_t val)
 {
+    if (val == s->mac_reg[RDH]) { /* Decrement RDT if it's too big */
+        if (val == 0) {
+            val = s->mac_reg[RDLEN] / sizeof(struct e1000_rx_desc);
+        }
+        val--;
+    }
     s->mac_reg[index] = val & 0xffff;
     if (e1000_has_rxbufs(s, 1)) {
         qemu_flush_queued_packets(qemu_get_queue(s->nic));
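To see what value the patched `set_rdt()` would store for a given write, the adjustment can be pulled out into a stand-alone function. This is only a sketch: `adjust_rdt` and `RX_DESC_SIZE` are hypothetical names, the register file is replaced by plain arguments, and the descriptor size is assumed to be the 16-byte legacy receive descriptor rather than taken from `sizeof(struct e1000_rx_desc)`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size in bytes of one legacy receive descriptor. */
#define RX_DESC_SIZE 16u

/*
 * Hypothetical stand-alone model of the adjustment in the patch: if the
 * driver writes RDT == RDH, step RDT back by one descriptor, wrapping
 * from index 0 to the last index in the ring. rdlen is in bytes.
 */
static uint32_t adjust_rdt(uint32_t val, uint32_t rdh, uint32_t rdlen)
{
    assert(rdlen >= RX_DESC_SIZE);
    if (val == rdh) {
        if (val == 0) {
            val = rdlen / RX_DESC_SIZE; /* number of descriptors */
        }
        val--;
    }
    return val & 0xffff; /* same 16-bit mask as the original code */
}
```

Under this model, writing 0 while RDH is 0 on a 4096-byte ring stores 255 rather than 0, so RDT no longer equals RDH after the init sequence quoted from `e1000_configure_rx()`. That is the case the review raises, and a read-back of RDT on real hardware would show whether the device does anything similar.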