
[4/7] netprio_cgroup: reimplement priomap expansion

Message ID 1353400211-5182-5-git-send-email-tj@kernel.org
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Tejun Heo Nov. 20, 2012, 8:30 a.m. UTC
netprio kept track of the highest prioidx allocated and resized
priomaps accordingly when necessary.  This makes it necessary to keep
track of prioidx allocation and may end up resizing on every new
prioidx.

Update extend_netdev_table() such that it takes @target_idx, which the
priomap should be able to accommodate.  If the priomap is already large
enough, nothing happens; otherwise, its size is doubled until @target_idx
can be accommodated.

This makes max_prioidx and write_update_netdev_table() unnecessary.
write_priomap() now calls extend_netdev_table() directly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
---
 net/core/netprio_cgroup.c | 56 ++++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 23 deletions(-)
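
For reference, the sizing rule described above boils down to the following
standalone sketch (plain userspace C; struct priomap, MIN_SZ and pick_size()
are simplified stand-ins for the patch's netprio_map, PRIOMAP_MIN_SZ and
extend_netdev_table(), not the kernel definitions):

#include <stdio.h>
#include <stddef.h>

/* simplified stand-in for struct netprio_map */
struct priomap {
	unsigned int len;
	unsigned int map[];
};

#define MIN_SZ	128	/* stand-in for PRIOMAP_MIN_SZ */

/*
 * Mirror the doubling rule: start at MIN_SZ bytes and double until the
 * flexible array at the end of the object can hold index target_idx.
 */
static size_t pick_size(unsigned int target_idx)
{
	size_t sz = MIN_SZ;

	while ((sz - offsetof(struct priomap, map)) / sizeof(unsigned int)
	       <= target_idx)
		sz *= 2;
	return sz;
}

int main(void)
{
	unsigned int idx;

	for (idx = 0; idx < 200; idx += 40)
		printf("target_idx %3u -> %zu bytes\n", idx, pick_size(idx));
	return 0;
}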

Comments

Daniel Wagner Nov. 20, 2012, 8:46 a.m. UTC | #1
Hi Tejun,

On 20.11.2012 09:30, Tejun Heo wrote:
> netprio kept track of the highest prioidx allocated and resized
> priomaps accordingly when necessary.  This makes it necessary to keep
> track of prioidx allocation and may end up resizing on every new
> prioidx.
> 
> Update extend_netdev_table() such that it takes @target_idx, which the
> priomap should be able to accommodate.  If the priomap is already large
> enough, nothing happens; otherwise, its size is doubled until @target_idx
> can be accommodated.
> 
> This makes max_prioidx and write_update_netdev_table() unnecessary.
> write_priomap() now calls extend_netdev_table() directly.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> ---
>   net/core/netprio_cgroup.c | 56 ++++++++++++++++++++++++++++-------------------
>   1 file changed, 33 insertions(+), 23 deletions(-)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index 92cc54c..569d83d 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -27,11 +27,11 @@
>   
>   #include <linux/fdtable.h>
>   
> +#define PRIOMAP_MIN_SZ		128
>   #define PRIOIDX_SZ 128
>   
>   static unsigned long prioidx_map[PRIOIDX_SZ];
>   static DEFINE_SPINLOCK(prioidx_map_lock);
> -static atomic_t max_prioidx = ATOMIC_INIT(0);
>   
>   static inline struct cgroup_netprio_state *cgrp_netprio_state(struct cgroup *cgrp)
>   {
> @@ -51,8 +51,6 @@ static int get_prioidx(u32 *prio)
>   		return -ENOSPC;
>   	}
>   	set_bit(prioidx, prioidx_map);
> -	if (atomic_read(&max_prioidx) < prioidx)
> -		atomic_set(&max_prioidx, prioidx);
>   	spin_unlock_irqrestore(&prioidx_map_lock, flags);
>   	*prio = prioidx;
>   	return 0;
> @@ -67,15 +65,40 @@ static void put_prioidx(u32 idx)
>   	spin_unlock_irqrestore(&prioidx_map_lock, flags);
>   }
>   
> -static int extend_netdev_table(struct net_device *dev, u32 new_len)
> +/*
> + * Extend @dev->priomap so that it's large enough to accommodate
> + * @target_idx.  @dev->priomap.priomap_len > @target_idx after successful
> + * return.  Must be called under rtnl lock.
> + */
> +static int extend_netdev_table(struct net_device *dev, u32 target_idx)
>   {
> -	size_t new_size = sizeof(struct netprio_map) +
> -			   ((sizeof(u32) * new_len));
> -	struct netprio_map *new = kzalloc(new_size, GFP_KERNEL);
> -	struct netprio_map *old;
> +	struct netprio_map *old, *new;
> +	size_t new_sz, new_len;
>   
> +	/* is the existing priomap large enough? */
>   	old = rtnl_dereference(dev->priomap);
> +	if (old && old->priomap_len > target_idx)
> +		return 0;
> +
> +	/*
> +	 * Determine the new size.  Let's keep it power-of-two.  We start
> +	 * from PRIOMAP_MIN_SZ and double it until it's large enough to
> +	 * accommodate @target_idx.
> +	 */
> +	new_sz = PRIOMAP_MIN_SZ;
> +	while (true) {
> +		new_len = (new_sz - offsetof(struct netprio_map, priomap)) /
> +			sizeof(new->priomap[0]);
> +		if (new_len > target_idx)
> +			break;
> +		new_sz *= 2;
> +		/* overflowed? */
> +		if (WARN_ON(new_sz < PRIOMAP_MIN_SZ))
> +			return -ENOSPC;
> +	}
>   
> +	/* allocate & copy */
> +	new = kzalloc(new_sz, GFP_KERNEL);
>   	if (!new) {
>   		pr_warn("Unable to alloc new priomap!\n");
>   		return -ENOMEM;
> @@ -87,26 +110,13 @@ static int extend_netdev_table(struct net_device *dev, u32 new_len)
>   
>   	new->priomap_len = new_len;
>   
> +	/* install the new priomap */
>   	rcu_assign_pointer(dev->priomap, new);
>   	if (old)
>   		kfree_rcu(old, rcu);
>   	return 0;
>   }

Okay, I might just be too stupid to see the beauty in what you are doing, so
please bear with me when I ask these questions.

struct netprio_map {
	struct rcu_head rcu;
	struct netprio_aux *aux;	/* auxiliary config array */
	u32 priomap_len;
	u32 priomap[];
};

Is there a specific reason why aux and priomap are handled differently? Couldn't you
just use the same approach for both variables, e.g. re/allocate only them here and
leave the allocation of struct netprio_map in cgrp_css_alloc()?

Also, the algorithm to figure out the size of the array might be a bit too aggressive
in my opinion. You always start at PRIOMAP_MIN_SZ and then try to double
the size until target_idx fits. Wouldn't it make sense to start looking
for the new size at old->priomap_len and then do the power-of-two
increase?

cheers,
daniel
Tejun Heo Nov. 20, 2012, 2:38 p.m. UTC | #2
Hello, Daniel.

On Tue, Nov 20, 2012 at 09:46:22AM +0100, Daniel Wagner wrote:
> struct netprio_map {
> 	struct rcu_head rcu;
> 	struct netprio_aux *aux;	/* auxiliary config array */
> 	u32 priomap_len;
> 	u32 priomap[];
> };
> 
> Is there a specific reason why aux and priomap are handled
> differently? Couldn't you just use the same approach for both variables,
> e.g. re/allocate only them here and leave the allocation of struct
> netprio_map in cgrp_css_alloc()?

->aux is no longer added, so the consistency issue doesn't exist
anymore.  The reason why they were handled differently before (or
rather why I didn't change priomap[] to be allocated separately) was
that pointer chasing tends to be more expensive than offsetting.  I
don't know how much effect it would have in this case, but things
sitting in the packet in/out paths can be very hot, so I didn't want
to disturb that.
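
A rough sketch of the pointer-chasing point (map_embedded, map_indirect and
the lookup helpers below are made-up illustrations, not the kernel structs):
with the flexible array member the lookup touches a single allocation, while
a separately allocated array adds one more pointer load on every lookup.

/* priomap embedded as a flexible array member: one object to follow */
struct map_embedded {
	unsigned int len;
	unsigned int prio[];	/* lives right after len in the same allocation */
};

/* priomap allocated separately: a second pointer to chase on each lookup */
struct map_indirect {
	unsigned int len;
	unsigned int *prio;	/* points at a separate allocation */
};

static inline unsigned int lookup_embedded(const struct map_embedded *m,
					   unsigned int idx)
{
	return idx < m->len ? m->prio[idx] : 0;	/* load m, index into it */
}

static inline unsigned int lookup_indirect(const struct map_indirect *m,
					   unsigned int idx)
{
	return idx < m->len ? m->prio[idx] : 0;	/* load m, load m->prio, index */
}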

> Also, the algorithm to figure out the size of the array might be a
> bit too aggressive in my opinion. You always start at
> PRIOMAP_MIN_SZ and then try to double the size until target_idx
> fits. Wouldn't it make sense to start looking for the new size
> at old->priomap_len and then do the power-of-two increase?

The only downside of always starting from PRIOMAP_MIN_SZ is
iterating several more times in the sizing loop, which isn't really
anything to worry about.  The loop is structured that way because I
wanted to keep the size of the whole thing a power of two.  Due to the
fields before priomap[], if we size priomap_len to a power of two, we'll
always end up with something slightly over a power of two, which is
usually the worst size to allocate.
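
To put rough numbers on the first iteration of that loop (assuming a 64-bit
build where struct rcu_head is two pointers and, with ->aux gone, only the
rcu head and priomap_len precede priomap[]):

	new_sz  = PRIOMAP_MIN_SZ                        = 128 bytes (whole object)
	header  = offsetof(struct netprio_map, priomap)
	        = 16 (rcu) + 4 (priomap_len)            = 20 bytes
	new_len = (128 - 20) / sizeof(u32)              = 27 entries

	Sizing priomap_len itself to a power of two (e.g. 32 entries) would
	instead give 20 + 32 * 4 = 148 bytes, just past the 128-byte
	power-of-two size.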

Thanks.
Daniel Wagner Nov. 20, 2012, 3:09 p.m. UTC | #3
On 20.11.2012 15:38, Tejun Heo wrote:
> Hello, Daniel.
>
> On Tue, Nov 20, 2012 at 09:46:22AM +0100, Daniel Wagner wrote:
>> struct netprio_map {
>> 	struct rcu_head rcu;
>> 	struct netprio_aux *aux;	/* auxiliary config array */
>> 	u32 priomap_len;
>> 	u32 priomap[];
>> };
>>
>> Is there a specific reason why aux and priomap are handled
>> differently? Couldn't you just use the same approach for both variables,
>> e.g. re/allocate only them here and leave the allocation of struct
>> netprio_map in cgrp_css_alloc()?
>
> ->aux is no longer added, so the consistency issue doesn't exist
> anymore.

Right, I got confused looking at v1 and v2.

> The reason why they were handled differently before (or
> rather why I didn't change priomap[] to be allocated separately) was
> that pointer chasing tends to be more expensive than offsetting.  I
> don't know how much effect it would have in this case, but things
> sitting in the packet in/out paths can be very hot, so I didn't want
> to disturb that.

I see.

>> Also, the algorithm to figure out the size of the array might be a
>> bit too aggressive in my opinion. You always start at
>> PRIOMAP_MIN_SZ and then try to double the size until target_idx
>> fits. Wouldn't it make sense to start looking for the new size
>> at old->priomap_len and then do the power-of-two increase?
>
> The only downside of always starting from PRIOMAP_MIN_SZ is
> iterating several more times in the sizing loop, which isn't really
> anything to worry about.  The loop is structured that way because I
> wanted to keep the size of the whole thing a power of two.  Due to the
> fields before priomap[], if we size priomap_len to a power of two, we'll
> always end up with something slightly over a power of two, which is
> usually the worst size to allocate.

Thanks for the explanation. I was pondering whether the power-of-two
growth could be a bit too excessive and whether the allocation step could
be linear instead, e.g. stick to 4096. target_idx increases linearly,
therefore a linear increase might also be enough, no?

cheers,
daniel

Tejun Heo Nov. 20, 2012, 3:13 p.m. UTC | #4
Hello,

On Tue, Nov 20, 2012 at 04:09:22PM +0100, Daniel Wagner wrote:
> Thanks for the explanation. I was pondering whether the power-of-two
> growth could be a bit too excessive and whether the allocation step
> could be linear instead, e.g. stick to 4096. target_idx increases
> linearly, therefore a linear increase might also be enough, no?

Well, power-of-two resizing tends to behave relatively well in most
cases, and slab allocations are binned by power-of-two sizes, so
growing it linearly doesn't really buy anything.
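
For a concrete feel of the binning argument, a tiny userspace sketch
(assuming the simple case where the allocator rounds every request up to
the next power of two, which is roughly what the larger kmalloc caches do):

#include <stdio.h>

/* round a request up to the next power-of-two bin */
static size_t bin_size(size_t sz)
{
	size_t bin = 1;

	while (bin < sz)
		bin <<= 1;
	return bin;
}

int main(void)
{
	size_t sizes[] = { 128, 200, 2100, 4096, 5000 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("request %4zu -> bin %4zu\n", sizes[i], bin_size(sizes[i]));
	return 0;
}

Under that rounding a linearly grown priomap of, say, 2100 bytes would come
out of the 4096-byte bin anyway, so the power-of-two steps don't waste
anything extra.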

Thanks.
Daniel Wagner Nov. 20, 2012, 3:16 p.m. UTC | #5
On 20.11.2012 16:13, Tejun Heo wrote:
> Hello,
>
> On Tue, Nov 20, 2012 at 04:09:22PM +0100, Daniel Wagner wrote:
>> Thanks for the explanation. I was pondering whether the power-of-two
>> growth could be a bit too excessive and whether the allocation step
>> could be linear instead, e.g. stick to 4096. target_idx increases
>> linearly, therefore a linear increase might also be enough, no?
>
> Well, power-of-two resizing tends to behave relatively well in most
> cases, and slab allocations are binned by power-of-two sizes, so
> growing it linearly doesn't really buy anything.

Convinced :)

Tested and Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>

thanks,
daniel

Patch

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 92cc54c..569d83d 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -27,11 +27,11 @@ 
 
 #include <linux/fdtable.h>
 
+#define PRIOMAP_MIN_SZ		128
 #define PRIOIDX_SZ 128
 
 static unsigned long prioidx_map[PRIOIDX_SZ];
 static DEFINE_SPINLOCK(prioidx_map_lock);
-static atomic_t max_prioidx = ATOMIC_INIT(0);
 
 static inline struct cgroup_netprio_state *cgrp_netprio_state(struct cgroup *cgrp)
 {
@@ -51,8 +51,6 @@  static int get_prioidx(u32 *prio)
 		return -ENOSPC;
 	}
 	set_bit(prioidx, prioidx_map);
-	if (atomic_read(&max_prioidx) < prioidx)
-		atomic_set(&max_prioidx, prioidx);
 	spin_unlock_irqrestore(&prioidx_map_lock, flags);
 	*prio = prioidx;
 	return 0;
@@ -67,15 +65,40 @@  static void put_prioidx(u32 idx)
 	spin_unlock_irqrestore(&prioidx_map_lock, flags);
 }
 
-static int extend_netdev_table(struct net_device *dev, u32 new_len)
+/*
+ * Extend @dev->priomap so that it's large enough to accommodate
+ * @target_idx.  @dev->priomap.priomap_len > @target_idx after successful
+ * return.  Must be called under rtnl lock.
+ */
+static int extend_netdev_table(struct net_device *dev, u32 target_idx)
 {
-	size_t new_size = sizeof(struct netprio_map) +
-			   ((sizeof(u32) * new_len));
-	struct netprio_map *new = kzalloc(new_size, GFP_KERNEL);
-	struct netprio_map *old;
+	struct netprio_map *old, *new;
+	size_t new_sz, new_len;
 
+	/* is the existing priomap large enough? */
 	old = rtnl_dereference(dev->priomap);
+	if (old && old->priomap_len > target_idx)
+		return 0;
+
+	/*
+	 * Determine the new size.  Let's keep it power-of-two.  We start
+	 * from PRIOMAP_MIN_SZ and double it until it's large enough to
+	 * accommodate @target_idx.
+	 */
+	new_sz = PRIOMAP_MIN_SZ;
+	while (true) {
+		new_len = (new_sz - offsetof(struct netprio_map, priomap)) /
+			sizeof(new->priomap[0]);
+		if (new_len > target_idx)
+			break;
+		new_sz *= 2;
+		/* overflowed? */
+		if (WARN_ON(new_sz < PRIOMAP_MIN_SZ))
+			return -ENOSPC;
+	}
 
+	/* allocate & copy */
+	new = kzalloc(new_sz, GFP_KERNEL);
 	if (!new) {
 		pr_warn("Unable to alloc new priomap!\n");
 		return -ENOMEM;
@@ -87,26 +110,13 @@  static int extend_netdev_table(struct net_device *dev, u32 new_len)
 
 	new->priomap_len = new_len;
 
+	/* install the new priomap */
 	rcu_assign_pointer(dev->priomap, new);
 	if (old)
 		kfree_rcu(old, rcu);
 	return 0;
 }
 
-static int write_update_netdev_table(struct net_device *dev)
-{
-	int ret = 0;
-	u32 max_len;
-	struct netprio_map *map;
-
-	max_len = atomic_read(&max_prioidx) + 1;
-	map = rtnl_dereference(dev->priomap);
-	if (!map || map->priomap_len < max_len)
-		ret = extend_netdev_table(dev, max_len);
-
-	return ret;
-}
-
 static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp)
 {
 	struct cgroup_netprio_state *cs;
@@ -191,7 +201,7 @@  static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
 
 	rtnl_lock();
 
-	ret = write_update_netdev_table(dev);
+	ret = extend_netdev_table(dev, prioidx);
 	if (ret)
 		goto out_unlock;