
[1/2] vhost: enable any layout feature

Message ID 20161010073721-mutt-send-email-mst@kernel.org
State New

Commit Message

Michael S. Tsirkin Oct. 10, 2016, 4:39 a.m. UTC
On Mon, Oct 10, 2016 at 04:16:19AM +0000, Wang, Zhihong wrote:
> 
> 
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> > Sent: Monday, October 10, 2016 11:59 AM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Stephen Hemminger
> > <stephen@networkplumber.org>; dev@dpdk.org; qemu-
> > devel@nongnu.org; Wang, Zhihong <zhihong.wang@intel.com>
> > Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
> > 
> > On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > > >
> > > > > >
> > > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > > Yes but two points.
> > > > >
> > > > > 1. why is this memset expensive?
> > > >
> > > > I don't have the exact answer, but just some rough thoughts:
> > > >
> > > > It's an external C library function: there is a call stack and the
> > > > IP register will bounce back and forth.
> > >
> > > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> > 
> > Good to know!
> > 
> > > > overkill to use that for resetting a 14-byte structure.
> > > >
> > > > Some trick like
> > > >     *(struct virtio_net_hdr *)hdr = {0, };
> > > >
> > > > Or even
> > > >     hdr->xxx = 0;
> > > >     hdr->yyy = 0;
> > > >
> > > > should behave better.
> > > >
> > > > There was an example: the vhost enqueue optimization patchset from
> > > > Zhihong [0] uses memset, and it introduces more than a 15% drop (IIRC)
> > > > on my Ivybridge server: it has no such issue on his server though.
> > > >
> > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > >
> > > > 	--yliu
> > >
> > > I'd say that's weird. What's your config? Any chance you
> > > are using an old compiler?
> > 
> > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > he said the memset is not well optimized for Ivybridge server.
> 
> The dst is remote in that case. It's fine on Haswell but has a complication
> on Ivy Bridge which (wasn't supposed to but) causes a serious frontend issue.
> 
> I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.


So try something like this then:

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>



Generally, the pointer chasing in vq->hw->vtnet_hdr_size can't be good
for performance. Move the fields used on the data path into vq
and use them from there to avoid the indirections?
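To illustrate the indirection cost being discussed, here is a minimal sketch. The structures are simplified stand-ins for the DPDK ones, and the cached_hdr_size field is a hypothetical addition for illustration, not the actual patch:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the DPDK structures; only the fields
 * relevant to the discussion are shown. */
struct virtio_hw {
	uint16_t vtnet_hdr_size;	/* lives one pointer hop away */
};

struct virtqueue {
	struct virtio_hw *hw;
	uint16_t cached_hdr_size;	/* hypothetical copy, filled at setup */
};

/* Hot path before: two dependent loads (vq->hw, then ->vtnet_hdr_size). */
static uint16_t hdr_size_indirect(const struct virtqueue *vq)
{
	return vq->hw->vtnet_hdr_size;
}

/* Hot path after: a single load from the vq itself. */
static uint16_t hdr_size_cached(const struct virtqueue *vq)
{
	return vq->cached_hdr_size;
}
```

The copy would be filled in once at queue setup, so the dependent load chain disappears from the per-packet path.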

Comments

Yuanhan Liu Oct. 11, 2016, 6:57 a.m. UTC | #1
On Mon, Oct 10, 2016 at 07:39:59AM +0300, Michael S. Tsirkin wrote:
> > > > > > 1. why is this memset expensive?
> > > > >
> > > > > I don't have the exact answer, but just some rough thoughts:
> > > > >
> > > > > It's an external C library function: there is a call stack and the
> > > > > IP register will bounce back and forth.
> > > >
> > > > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> > > 
> > > Good to know!
> > > 
> > > > > overkill to use that for resetting a 14-byte structure.
> > > > >
> > > > > Some trick like
> > > > >     *(struct virtio_net_hdr *)hdr = {0, };
> > > > >
> > > > > Or even
> > > > >     hdr->xxx = 0;
> > > > >     hdr->yyy = 0;
> > > > >
> > > > > should behave better.
> > > > >
> > > > > There was an example: the vhost enqueue optimization patchset from
> > > > > Zhihong [0] uses memset, and it introduces more than a 15% drop (IIRC)
> > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > >
> > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > >
> > > > > 	--yliu
> > > >
> > > > I'd say that's weird. What's your config? Any chance you
> > > > are using an old compiler?
> > > 
> > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > he said the memset is not well optimized for Ivybridge server.
> > 
> > The dst is remote in that case. It's fine on Haswell but has a complication
> > on Ivy Bridge which (wasn't supposed to but) causes a serious frontend issue.
> > 
> > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> 
> 
> So try something like this then:

Yes, I saw memset is inlined when this diff is applied.

So, would you mind sending a formal patch? You might want to at least try to
build it first: it doesn't build as-is.

 
> Generally, the pointer chasing in vq->hw->vtnet_hdr_size can't be good
> for performance. Move the fields used on the data path into vq
> and use them from there to avoid the indirections?

Good suggestion!

	--yliu
Yuanhan Liu Oct. 12, 2016, 3:21 a.m. UTC | #2
On Tue, Oct 11, 2016 at 02:57:49PM +0800, Yuanhan Liu wrote:
> > > > > > There was an example: the vhost enqueue optimization patchset from
> > > > > > Zhihong [0] uses memset, and it introduces more than a 15% drop (IIRC)

Though it doesn't matter now, I verified it yesterday (with and
without memset), and the drop could be up to 30+%.

This is to let you know that it could behave badly if memset is not
inlined.

> > > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > > >
> > > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > > >
> > > > > > 	--yliu
> > > > >
> > > > > I'd say that's weird. What's your config? Any chance you
> > > > > are using an old compiler?
> > > > 
> > > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > > he said the memset is not well optimized for Ivybridge server.
> > > 
> > > The dst is remote in that case. It's fine on Haswell but has a complication
> > > on Ivy Bridge which (wasn't supposed to but) causes a serious frontend issue.
> > > 
> > > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> > 
> > 
> > So try something like this then:
> 
> Yes, I saw memset is inlined when this diff is applied.

I have another concern though: it's a trick that could let gcc do the
inlining, and I am not quite sure whether that's true with other compilers
(i.e. clang, icc, or even older gcc).

For this case, I think I still prefer some trick like
    *(struct ..*) = {0, }

Or even, we could introduce rte_memset(). IIRC, that has been
proposed somewhere before?

	--yliu
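The zeroing variants compared in this thread can be sketched side by side. The struct below is a simplified stand-in for struct virtio_net_hdr with illustrative field names, not the exact DPDK definition:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative 10-byte header standing in for struct virtio_net_hdr. */
struct net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Variant 1: library call -- may or may not be inlined by the compiler. */
static void clear_memset(struct net_hdr *hdr)
{
	memset(hdr, 0, sizeof(*hdr));
}

/* Variant 2: struct assignment from a zero compound literal -- the
 * compiler emits plain stores, so no out-of-line call is possible. */
static void clear_assign(struct net_hdr *hdr)
{
	*hdr = (struct net_hdr){ 0 };
}

/* Variant 3: explicit per-field stores, as in "hdr->xxx = 0; ...". */
static void clear_fields(struct net_hdr *hdr)
{
	hdr->flags = 0;
	hdr->gso_type = 0;
	hdr->hdr_len = 0;
	hdr->gso_size = 0;
	hdr->csum_start = 0;
	hdr->csum_offset = 0;
}
```

At typical optimization levels all three usually compile to the same handful of stores; the difference the thread describes appears only when the memset call is not inlined.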
Yang, Zhiyong Oct. 13, 2016, 2:52 a.m. UTC | #3
Hi, Yuanhan:

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuanhan Liu
> Sent: Wednesday, October 12, 2016 11:22 AM
> To: Michael S. Tsirkin <mst@redhat.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: Wang, Zhihong <zhihong.wang@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Stephen Hemminger
> <stephen@networkplumber.org>; dev@dpdk.org; qemu-
> devel@nongnu.org
> Subject: Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout
> feature
> 
> On Tue, Oct 11, 2016 at 02:57:49PM +0800, Yuanhan Liu wrote:
> > > > > > > There was an example: the vhost enqueue optimization patchset
> > > > > > > from Zhihong [0] uses memset, and it introduces more than a
> > > > > > > 15% drop (IIRC)
> 
> Though it doesn't matter now, I verified it yesterday (with and
> without memset), and the drop could be up to 30+%.
> 
> This is to let you know that it could behave badly if memset is not inlined.
> 
> > > > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > > > >
> > > > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > > > >
> > > > > > > 	--yliu
> > > > > >
> > > > > > I'd say that's weird. What's your config? Any chance you are
> > > > > > using an old compiler?
> > > > >
> > > > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more.
> > > > > IIRC, he said the memset is not well optimized for Ivybridge server.
> > > >
> > > > The dst is remote in that case. It's fine on Haswell but has a
> > > > complication on Ivy Bridge which (wasn't supposed to but) causes a
> > > > serious frontend issue.
> > > >
> > > > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> > >
> > >
> > > So try something like this then:
> >
> > Yes, I saw memset is inlined when this diff is applied.
> 
> I have another concern though: it's a trick that could let gcc do the
> inlining, and I am not quite sure whether that's true with other compilers
> (i.e. clang, icc, or even older gcc).
> 
> For this case, I think I still prefer some trick like
>     *(struct ..*) = {0, }
> 
> Or even, we could introduce rte_memset(). IIRC, that has been
> proposed somewhere before?
> 

I'm trying to introduce rte_memset and have a prototype. It has gotten
some performance enhancement for small sizes; I'm optimizing it further.

--Zhiyong

> 	--yliu
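A force-inlined rte_memset() along the proposed lines might look roughly like this. The name, signature, and attribute usage are assumptions for illustration; no such API exists in DPDK at this point:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a hypothetical rte_memset(): forcing inlining ensures the
 * compiler sees the (often constant, small) size at the call site and
 * can lower the loop to a few plain stores instead of a libc call. */
static inline __attribute__((always_inline)) void
rte_memset_sketch(void *dst, int c, size_t n)
{
	uint8_t *d = (uint8_t *)dst;
	size_t i;

	for (i = 0; i < n; i++)
		d[i] = (uint8_t)c;
}
```

With a constant n such as sizeof(struct virtio_net_hdr), the loop is fully unrolled, which is exactly the behavior the thread wants to guarantee across compilers.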

Patch

diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index dd7693f..7a3f88e 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -292,6 +292,16 @@  vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
 	return (hw->guest_features & (1ULL << bit)) != 0;
 }
 
+static inline int
+vtnet_hdr_size(struct virtio_hw *hw)
+{
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
+	    vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
+		return sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		return sizeof(struct virtio_net_hdr);
+}
+
 /*
  * Function declaration from virtio_pci.c
  */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index a27208e..21a45e1 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -216,7 +216,7 @@  virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	struct vring_desc *start_dp;
 	uint16_t seg_num = cookie->nb_segs;
 	uint16_t head_idx, idx;
-	uint16_t head_size = vq->hw->vtnet_hdr_size;
+	uint16_t head_size = vtnet_hdr_size(vq->hw);
 	unsigned long offs;
 
 	head_idx = vq->vq_desc_head_idx;