From patchwork Thu Sep 11 23:38:22 2008
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andre Detsch
X-Patchwork-Id: 257
X-Patchwork-Delegate: jk@ozlabs.org
From: Andre Detsch
To: cbe-oss-dev@ozlabs.org
Date: Thu, 11 Sep 2008 20:38:22 -0300
User-Agent: KMail/1.9.6
References: <200809111955.28780.adetsch@br.ibm.com>
In-Reply-To: <200809111955.28780.adetsch@br.ibm.com>
Message-Id: <200809112038.22459.adetsch@br.ibm.com>
Cc: LukeBrowning@us.ibm.com, Jeremy Kerr
Subject: [Cbe-oss-dev] [PATCH 09/11] powerpc/spufs: Limit size of gangs to avoid starvation due to reserved spus
List-Id: Discussion about Open Source Software for the Cell Broadband Engine

At context creation, determine the number of spus available for general
scheduling, to avoid creating a gang that can't be placed by the
scheduler. Gangs require concurrent scheduling, so multiple spus have to
be allocated at the same time. Similarly, prevent the reservation of an
spu if it would make an existing job impossible to schedule.

A new data structure is introduced to keep track of active gangs. It is
designed to show the size of the largest active gang and is coded to
handle the dynamic addition and deletion of contexts within gangs. An
array is allocated that has an element for each spu in the system. As
contexts are added and removed, elements are incremented and decremented
to show the number of gangs of at least that size. For example, element 0
represents the number of active gangs with at least 1 context, element 1
is the number of gangs with at least 2 contexts, and so on. A high water
mark is kept to track the largest gang.
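For illustration only, and not part of the patch: a minimal user-space
sketch of the counting array described above. It assumes plain ints
instead of atomic_t and a single thread, so it shows the bookkeeping but
none of the kernel's synchronization. NR_SPUS, struct gang_tracker,
tracker_add_ctx() and tracker_remove_ctx() are hypothetical names; the
patch itself uses spu_active_gangs[], largest_active_gang,
inc_active_gangs() and dec_active_gangs(). The sketch follows the 0-based
indexing of the description above.

```c
#include <assert.h>

#define NR_SPUS 8	/* hypothetical number of spus in the system */

/* Hypothetical stand-in for struct spu_gang: only the context count. */
struct gang {
	int contexts;
};

struct gang_tracker {
	/* at_least[i] counts the gangs that have at least i + 1 contexts */
	int at_least[NR_SPUS];
	/* high water mark: size of the largest active gang, 0 if none */
	int largest;
};

/* A gang growing from k to k + 1 contexts now counts in slot k. */
static void tracker_add_ctx(struct gang_tracker *t, struct gang *g)
{
	assert(g->contexts < NR_SPUS);	/* a gang cannot exceed the spu count */
	t->at_least[g->contexts]++;
	g->contexts++;
	if (g->contexts > t->largest)
		t->largest = g->contexts;
}

/* A gang shrinking from k to k - 1 contexts no longer counts in slot k - 1. */
static void tracker_remove_ctx(struct gang_tracker *t, struct gang *g)
{
	assert(g->contexts > 0);
	g->contexts--;
	t->at_least[g->contexts]--;
	/* Lower the high water mark while no gang of that size remains. */
	while (t->largest > 0 && t->at_least[t->largest - 1] == 0)
		t->largest--;
}
```

With one gang of three contexts and one gang of a single context, the
array starts with 2, 1, 1 and the high water mark is 3. Removing a
context from the larger gang lowers the mark to 2.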
Signed-off-by: Luke Browning
Signed-off-by: Andre Detsch

diff --git a/arch/powerpc/platforms/cell/spufs/context.c b/arch/powerpc/platforms/cell/spufs/context.c
index c472519..7ca787e 100644
--- a/arch/powerpc/platforms/cell/spufs/context.c
+++ b/arch/powerpc/platforms/cell/spufs/context.c
@@ -32,6 +32,22 @@
 
 atomic_t nr_spu_contexts = ATOMIC_INIT(0);
 
+static void inc_active_gangs(struct spu_gang *gang)
+{
+	if (atomic_inc_return(&spu_active_gangs[gang->contexts]) == 1) {
+		atomic_set(&largest_active_gang, gang->contexts);
+		mb(); /* XXX atomic_set doesn't have a sync */
+	}
+}
+
+static void dec_active_gangs(struct spu_gang *gang)
+{
+	if (!atomic_dec_return(&spu_active_gangs[gang->contexts])) {
+		atomic_set(&largest_active_gang, gang->contexts);
+		mb(); /* XXX atomic_set doesn't have a sync */
+	}
+}
+
 struct spu_context *alloc_spu_context(struct spu_gang *gang)
 {
 	struct spu_context *ctx;
@@ -57,6 +73,8 @@ struct spu_context *alloc_spu_context(struct spu_gang *gang)
 	if (spu_init_csa(&ctx->csa))
 		goto out_free_gang;
 
+	inc_active_gangs(gang);
+
 	/* If the gang is running, it needs to be stopped, since we have a
 	 * new context that needs to be gang scheduled. Gangs are allowed
 	 * to grow and shrink over time, but they are unscheduled when it
@@ -89,6 +107,7 @@ struct spu_context *alloc_spu_context(struct spu_gang *gang)
 	ctx->stats.util_state = SPU_UTIL_IDLE_LOADED;
 
 	atomic_inc(&nr_spu_contexts);
+
 	goto out;
 
 out_free_gang:
@@ -123,6 +142,7 @@ void destroy_spu_context(struct kref *kref)
 	if (ctx->prof_priv_kref)
 		kref_put(ctx->prof_priv_kref, ctx->prof_priv_release);
 	atomic_dec(&nr_spu_contexts);
+	dec_active_gangs(gang);
 	kfree(ctx->switch_log);
 	kfree(ctx);
 }
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index cf97761..c455a44 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -263,6 +263,7 @@ spufs_mkdir(struct inode *dir, struct dentry *dentry, unsigned int flags,
 	struct inode *inode;
 	struct spu_context *ctx, *gang_ctx;
 	struct spu_gang *gang;
+	int node, avail_spus;
 
 	ret = -ENOSPC;
 	inode = spufs_new_inode(dir->i_sb, mode | S_IFDIR);
@@ -280,6 +281,27 @@ spufs_mkdir(struct inode *dir, struct dentry *dentry, unsigned int flags,
 		}
 	}
 
+	for (node = 0, avail_spus = 0; node < MAX_NUMNODES; node++) {
+		avail_spus += cbe_spu_info[node].n_spus - atomic_read(
+			&cbe_spu_info[node].reserved_spus);
+	}
+
+	/* Ensure there are enough available spus for scheduling. */
+	if (flags & SPU_CREATE_NOSCHED) {
+		/* Can't reserve an spu if it would starve an active gang */
+		if (avail_spus <= atomic_read(&largest_active_gang) + 1) {
+			ret = -EPERM;
+			goto out_iput;
+		}
+	}
+	else {
+		/* Can't create a gang too big either. */
+		if (!avail_spus || (gang && gang->contexts + 1 > avail_spus)) {
+			ret = -EPERM;
+			goto out_iput;
+		}
+	}
+
 	if (dir->i_mode & S_ISGID) {
 		inode->i_gid = dir->i_gid;
 		inode->i_mode &= S_ISGID;
diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c
index f3dee8d..8326034 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -90,6 +90,9 @@ static struct timer_list spusched_timer;
 static struct timer_list spuloadavg_timer;
 static void spu_unschedule(struct spu_gang *gang);
 
+atomic_t *spu_active_gangs;
+atomic_t largest_active_gang;
+
 /*
  * Priority of a normal, non-rt, non-niced'd process (aka nice level 0).
  */
@@ -1492,12 +1495,27 @@ static const struct file_operations spu_loadavg_fops = {
 int __init spu_sched_init(void)
 {
 	struct proc_dir_entry *entry;
-	int err = -ENOMEM, i;
+	int err = -ENOMEM, node, nspus, i;
 
 	spu_prio = kzalloc(sizeof(struct spu_prio_array), GFP_KERNEL);
 	if (!spu_prio)
 		goto out;
 
+	/*
+	 * A gang cannot be larger than the number of spus in the system
+	 * since they have to be scheduled at the same time. Allocate an
+	 * array of that length to keep track of the size of active gangs.
+	 * We need to limit the number of spus that can be reserved to
+	 * prevent the starvation of gangs. A reserved spu cannot be used
+	 * by the scheduler.
+	 */
+	for (node = 0, nspus = 0; node < MAX_NUMNODES; node++)
+		nspus += cbe_spu_info[node].n_spus;
+	spu_active_gangs = kzalloc(sizeof(atomic_t) * nspus, GFP_KERNEL);
+	if (!spu_active_gangs)
+		goto out_free_spu_prio;
+	atomic_set(&largest_active_gang, 0);
+
 	for (i = 0; i < MAX_PRIO; i++) {
 		INIT_LIST_HEAD(&spu_prio->runq[i]);
 		__clear_bit(i, spu_prio->bitmap);
@@ -1551,5 +1569,6 @@ void spu_sched_exit(void)
 		}
 		spin_unlock(&cbe_spu_info[node].list_lock);
 	}
+	kfree(spu_active_gangs);
 	kfree(spu_prio);
 }
diff --git a/arch/powerpc/platforms/cell/spufs/spufs.h b/arch/powerpc/platforms/cell/spufs/spufs.h
index de436f2..6afc514 100644
--- a/arch/powerpc/platforms/cell/spufs/spufs.h
+++ b/arch/powerpc/platforms/cell/spufs/spufs.h
@@ -297,6 +297,17 @@ int put_spu_gang(struct spu_gang *gang);
 void spu_gang_remove_ctx(struct spu_gang *gang, struct spu_context *ctx);
 void spu_gang_add_ctx(struct spu_gang *gang, struct spu_context *ctx);
 
+/*
+ * Each element of spu_active_gangs[] identifies the number of active
+ * gangs of at least that size. largest_active_gang identifies the size of
+ * the largest active gang in the system. Array elements are incremented
+ * as contexts are created and they are decremented as contexts are destroyed.
+ * The first context in a gang increments element[1], the second element[2],
+ * and so on. largest_active_gang is set to the highest non-zero array element.
+ */
+extern atomic_t largest_active_gang;
+extern atomic_t *spu_active_gangs;
+
 /* fault handling */