From patchwork Fri Oct 4 20:15:07 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Jelinek X-Patchwork-Id: 280731 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id EB7492C00C1 for ; Sat, 5 Oct 2013 06:15:22 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; q=dns; s=default; b=aPJRozL1ihyeOv8O8aTd0fyHGBnol CZIiGFVFzQuqZQesRkPIXnzDOw21V24u6Vd6UIj7CR/luVyozKazyPtIdGdgm6wQ 8HYJFmxTOEL473yyx6Z5PPsn67yphL9+FONDUSLLBFwjdKpmnvugMSBaI/mw1ej1 eEQi0ddwjoXT14= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; s=default; bh=zDMqU0dgSiIkstFq55pMIrsn1O8=; b=rpt rRrPnflV78pqHTSBChFJm4V6Y16sk5rAiOUR+H+jOiAnoK+9XMRDbRtvGO73n7cd he+G2y9Rtp4UnkhZHpeagz0oamarzGUrLFAghtBjTzwVu2mokUQFfoEdaUwZz25Q vS3M4YZFrbOkKSWiPUAGwo72lDIZT4Ixae6q4CVg= Received: (qmail 7039 invoked by alias); 4 Oct 2013 20:15:15 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 7021 invoked by uid 89); 4 Oct 2013 20:15:14 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-4.4 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD, SPF_HELO_PASS, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 04 Oct 2013 20:15:12 +0000 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r94KFACs024910 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 4 Oct 2013 16:15:10 -0400 Received: from tucnak.zalov.cz (vpn1-5-163.ams2.redhat.com [10.36.5.163]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r94KF8q3011908 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 4 Oct 2013 16:15:09 -0400 Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.14.7/8.14.7) with ESMTP id r94KF7Dl018407; Fri, 4 Oct 2013 22:15:07 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.14.7/8.14.7/Submit) id r94KF7GW018406; Fri, 4 Oct 2013 22:15:07 +0200 Date: Fri, 4 Oct 2013 22:15:07 +0200 From: Jakub Jelinek To: Richard Henderson , Torvald Riegel , "Michael V. Zolotukhin" Cc: gcc-patches@gcc.gnu.org Subject: [gomp4] Target fallback ICV handling, ICV fixes Message-ID: <20131004201507.GF30970@tucnak.zalov.cz> Reply-To: Jakub Jelinek MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Hi! I've committed the following patch to gomp-4.0-branch. The omp-low.c changes are to fix some bugs with if clause on #pragma omp target{, data, update}. The c-cppbuiltin.c is to finally announce OpenMP 4.0 support for C/C++. The libgomp changes are: 1) as required by OpenMP 4.0, thread_limit_var is now a per-data-environment ICV, rather than global var 2) gomp_remaining_threads_count has been removed, instead as required by the spec ThreadsBusy from the spec is tracked per contention group inside of thread_pool; if there is just one contention group, then the new thr->thread_pool->threads_busy should be the difference between icv->thread_limit_var and the old gomp_remaining_threads_count; so, in gomp_resolve_num_threads now we add nthreads - 1 to it rather than subtracting it 3) apparently the old OMP_THREAD_LIMIT code was buggy, because GOMP_parallel_end was also subtracting from gomp_remaining_threads_count rather than adding to it (with the new code it is correct to subtract; when I get spare time I'll write a small alternative patch for the release branches together with thread-limit-1.c testcase) 4) as the threads_busy count is now per-contention group, if a parallel isn't nested, we actually don't need to atomically update the counter, because there is just one thread in the contention group 5) gomp_managed_threads counter remains to be a global var, that is used to decide about spinning length, that is desirable to be global and is not user observable thing covered by the standard; I've just renamed the mutex guarding it 6) for GOMP_target host fallback, the function will create a new initial thread by making a copy of the old TLS *gomp_thread () and clearing it (except for affinity place and reinitializing it's place var to the whole place list), then restoring back 7) I've noticed that &thr->release semaphore is never used for the master threads, so there is no point initializing it; we were initializing it just for the first initial thread, e.g. not in subsequent user pthread_create created threads that encounter #pragma omp constructs; and the semaphore wasn't ever destroyed 8) GOMP_teams is now implemented for the host fallback just by adjusting icv->thread_limit_var 9) on the target-7.c testcase I found several issues in the var remapping code (some fields could be uninitialized in certain cases) Tested on x86_64-linux, committed. 2013-10-04 Jakub Jelinek * omp-low.c (expand_omp_target): When handling IF clause on #pragma omp target, split new_bb rather than entry_bb. If not GF_OMP_TARGET_KIND_REGION, split new_bb right before the GOMP_TARGET stmt, rather than after labels. gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins): Predefine _OPENMP to 201307 instead of 201107. libgomp/ * libgomp.h (struct gomp_task_icv): Add thread_limit_var. (gomp_thread_limit_var, gomp_remaining_threads_count, gomp_remaining_threads_lock): Remove. (gomp_managed_threads_lock): New variable. (struct gomp_thread_pool): Add threads_busy field. (gomp_free_thread): New prototype. * parallel.c (gomp_resolve_num_threads): Adjust for thread_limit now being in icv->thread_limit_var. Use UINT_MAX instead of ULONG_MAX as infinity. If not nested, just return minimum of max_num_threads and icv->thread_limit_var and if thr->thread_pool, set threads_busy to the returned value. Otherwise, don't update atomically gomp_remaining_threads_count, but instead thr->thread_pool->threads_busy. (GOMP_parallel_end): Adjust for thread_limit now being in icv->thread_limit_var. Use UINT_MAX instead of ULONG_MAX as infinity. Adjust threads_busy in the pool rather than gomp_remaining_threads_count. Remember team->nthreads and call gomp_team_end before adjusting threads_busy, if not nested afterwards, just set it to 1 non-atomically. * team.c (gomp_thread_start): Clear thr->thread_pool and thr->task before returning. (gomp_free_pool_helper): Clear thr->thread_pool and thr->task before calling pthread_exit. (gomp_free_thread): No longer static. Use gomp_managed_threads_lock instead of gomp_remaining_threads_lock. (gomp_team_start): Set thr->thread_pool->threads_busy to nthreads immediately after creating new pool. Use gomp_managed_threads_lock instead of gomp_remaining_threads_lock. (gomp_team_end): Use gomp_managed_threads_lock instead of gomp_remaining_threads_lock. (initialize_team): Don't call gomp_sem_init here. * env.c (gomp_global_icv): Initialize thread_limit_var field. (gomp_thread_limit_var, gomp_remaining_threads_count, gomp_remaining_threads_lock): Remove. (gomp_managed_threads_locks): New variable. (handle_omp_display_env): Adjust for thread_limit var being in gomp_global_icv. (initialize_env): Likewise. If user provided value is > INT_MAX, set gomp_global_icv.thread_limit_var to UINT_MAX (infinity). Initialize gomp_managed_threads_lock instead of gomp_remaining_threads_lock. (omp_get_thread_limit): Adjust for thread_limit var in icv->thread_limit_var. * target.c: Include limits.h. (resolve_device): If device_id is < -1, return NULL. (gomp_map_vars): Make sure tgt->array, tgt->to_free, tgt->tgt_start, tgt->tgt_end and k->copy_from aren't left uninitialized. (GOMP_target): Arrange for host callback to be performed in a separate initial thread and contention group, inheriting ICVs from gomp_global_icv etc. (GOMP_teams): Adjust thread_limit_var ICV. * testsuite/libgomp.c/affinity-1.c (main): Call omp_set_dynamic (0). * testsuite/libgomp.c/target-3.c: New test. * testsuite/libgomp.c/target-4.c: New test. * testsuite/libgomp.c/target-5.c: New test. * testsuite/libgomp.c/target-6.c: New test. * testsuite/libgomp.c/target-7.c: New test. * testsuite/libgomp.c/thread-limit-1.c: New test. * testsuite/libgomp.c/thread-limit-2.c: New test. * testsuite/libgomp.c/thread-limit-3.c: New test. Jakub --- gcc/omp-low.c.jj 2013-09-26 09:53:03.000000000 +0200 +++ gcc/omp-low.c 2013-10-04 21:39:19.927947940 +0200 @@ -7881,9 +7881,16 @@ expand_omp_target (struct omp_region *re tree tmp_var; tmp_var = create_tmp_var (TREE_TYPE (device), NULL); - e = split_block (entry_bb, NULL); + if (kind != GF_OMP_TARGET_KIND_REGION) + { + gsi = gsi_last_bb (new_bb); + gsi_prev (&gsi); + e = split_block (new_bb, gsi_stmt (gsi)); + } + else + e = split_block (new_bb, NULL); cond_bb = e->src; - entry_bb = e->dest; + new_bb = e->dest; remove_edge (e); then_bb = create_empty_bb (cond_bb); @@ -7892,7 +7899,7 @@ expand_omp_target (struct omp_region *re set_immediate_dominator (CDI_DOMINATORS, else_bb, cond_bb); stmt = gimple_build_cond_empty (cond); - gsi = gsi_start_bb (cond_bb); + gsi = gsi_last_bb (cond_bb); gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING); gsi = gsi_start_bb (then_bb); @@ -7911,8 +7918,8 @@ expand_omp_target (struct omp_region *re add_bb_to_loop (then_bb, cond_bb->loop_father); add_bb_to_loop (else_bb, cond_bb->loop_father); } - make_edge (then_bb, entry_bb, EDGE_FALLTHRU); - make_edge (else_bb, entry_bb, EDGE_FALLTHRU); + make_edge (then_bb, new_bb, EDGE_FALLTHRU); + make_edge (else_bb, new_bb, EDGE_FALLTHRU); device = tmp_var; } --- gcc/c-family/c-cppbuiltin.c.jj 2013-06-26 12:13:47.000000000 +0200 +++ gcc/c-family/c-cppbuiltin.c 2013-10-04 21:10:43.884067723 +0200 @@ -896,7 +896,7 @@ c_cpp_builtins (cpp_reader *pfile) cpp_define (pfile, "__SSP__=1"); if (flag_openmp) - cpp_define (pfile, "_OPENMP=201107"); + cpp_define (pfile, "_OPENMP=201307"); if (int128_integer_type_node != NULL_TREE) builtin_define_type_sizeof ("__SIZEOF_INT128__", --- libgomp/libgomp.h.jj 2013-10-04 10:28:53.000000000 +0200 +++ libgomp/libgomp.h 2013-10-04 13:48:39.460550732 +0200 @@ -232,6 +232,7 @@ struct gomp_task_icv enum gomp_schedule_type run_sched_var; int run_sched_modifier; int default_device_var; + unsigned int thread_limit_var; bool dyn_var; bool nest_var; char bind_var; @@ -240,10 +241,8 @@ struct gomp_task_icv }; extern struct gomp_task_icv gomp_global_icv; -extern unsigned long gomp_thread_limit_var; -extern unsigned long gomp_remaining_threads_count; #ifndef HAVE_SYNC_BUILTINS -extern gomp_mutex_t gomp_remaining_threads_lock; +extern gomp_mutex_t gomp_managed_threads_lock; #endif extern unsigned long gomp_max_active_levels_var; extern bool gomp_cancel_var; @@ -431,6 +430,8 @@ struct gomp_thread_pool unsigned threads_size; unsigned threads_used; struct gomp_team *last_team; + /* Number of threads running in this contention group. */ + unsigned long threads_busy; /* This barrier holds and releases threads waiting in threads. */ gomp_barrier_t threads_dock; @@ -580,6 +581,7 @@ extern struct gomp_team *gomp_new_team ( extern void gomp_team_start (void (*) (void *), void *, unsigned, unsigned, struct gomp_team *); extern void gomp_team_end (void); +extern void gomp_free_thread (void *); /* target.c */ --- libgomp/parallel.c.jj 2013-10-02 16:01:09.000000000 +0200 +++ libgomp/parallel.c 2013-10-04 14:54:47.843337279 +0200 @@ -37,18 +37,19 @@ unsigned gomp_resolve_num_threads (unsigned specified, unsigned count) { - struct gomp_thread *thread = gomp_thread(); + struct gomp_thread *thr = gomp_thread (); struct gomp_task_icv *icv; unsigned threads_requested, max_num_threads, num_threads; - unsigned long remaining; + unsigned long busy; + struct gomp_thread_pool *pool; icv = gomp_icv (false); if (specified == 1) return 1; - else if (thread->ts.active_level >= 1 && !icv->nest_var) + else if (thr->ts.active_level >= 1 && !icv->nest_var) return 1; - else if (thread->ts.active_level >= gomp_max_active_levels_var) + else if (thr->ts.active_level >= gomp_max_active_levels_var) return 1; /* If NUM_THREADS not specified, use nthreads_var. */ @@ -72,30 +73,46 @@ gomp_resolve_num_threads (unsigned speci max_num_threads = count; } - /* ULONG_MAX stands for infinity. */ - if (__builtin_expect (gomp_thread_limit_var == ULONG_MAX, 1) + /* UINT_MAX stands for infinity. */ + if (__builtin_expect (icv->thread_limit_var == UINT_MAX, 1) || max_num_threads == 1) return max_num_threads; + /* The threads_busy counter lives in thread_pool, if there + isn't a thread_pool yet, there must be just one thread + in the contention group. If thr->team is NULL, this isn't + nested parallel, so there is just one thread in the + contention group as well, no need to handle it atomically. */ + pool = thr->thread_pool; + if (thr->ts.team == NULL) + { + num_threads = max_num_threads; + if (num_threads > icv->thread_limit_var) + num_threads = icv->thread_limit_var; + if (pool) + pool->threads_busy = num_threads; + return num_threads; + } + #ifdef HAVE_SYNC_BUILTINS do { - remaining = gomp_remaining_threads_count; + busy = pool->threads_busy; num_threads = max_num_threads; - if (num_threads > remaining) - num_threads = remaining + 1; + if (icv->thread_limit_var - busy + 1 < num_threads) + num_threads = icv->thread_limit_var - busy + 1; } - while (__sync_val_compare_and_swap (&gomp_remaining_threads_count, - remaining, remaining - num_threads + 1) - != remaining); + while (__sync_val_compare_and_swap (&pool->threads_busy, + busy, busy + num_threads - 1) + != busy); #else - gomp_mutex_lock (&gomp_remaining_threads_lock); + gomp_mutex_lock (&gomp_managed_threads_lock); num_threads = max_num_threads; - remaining = gomp_remaining_threads_count; - if (num_threads > remaining) - num_threads = remaining + 1; - gomp_remaining_threads_count -= num_threads - 1; - gomp_mutex_unlock (&gomp_remaining_threads_lock); + busy = pool->threads_busy; + if (icv->thread_limit_var - busy + 1 < num_threads) + num_threads = icv->thread_limit_var - busy + 1; + pool->threads_busy += num_threads - 1; + gomp_mutex_unlock (&gomp_managed_threads_lock); #endif return num_threads; @@ -111,23 +128,34 @@ GOMP_parallel_start (void (*fn) (void *) void GOMP_parallel_end (void) { - if (__builtin_expect (gomp_thread_limit_var != ULONG_MAX, 0)) + struct gomp_task_icv *icv = gomp_icv (false); + if (__builtin_expect (icv->thread_limit_var != UINT_MAX, 0)) { struct gomp_thread *thr = gomp_thread (); struct gomp_team *team = thr->ts.team; - if (team && team->nthreads > 1) + unsigned int nthreads = team ? team->nthreads : 1; + gomp_team_end (); + if (nthreads > 1) { + /* If not nested, there is just one thread in the + contention group left, no need for atomicity. */ + if (thr->ts.team == NULL) + thr->thread_pool->threads_busy = 1; + else + { #ifdef HAVE_SYNC_BUILTINS - __sync_fetch_and_add (&gomp_remaining_threads_count, - 1UL - team->nthreads); + __sync_fetch_and_add (&thr->thread_pool->threads_busy, + 1UL - nthreads); #else - gomp_mutex_lock (&gomp_remaining_threads_lock); - gomp_remaining_threads_count -= team->nthreads - 1; - gomp_mutex_unlock (&gomp_remaining_threads_lock); + gomp_mutex_lock (&gomp_managed_threads_lock); + thr->thread_pool->threads_busy -= nthreads - 1; + gomp_mutex_unlock (&gomp_managed_threads_lock); #endif + } } } - gomp_team_end (); + else + gomp_team_end (); } ialias (GOMP_parallel_end) --- libgomp/team.c.jj 2013-10-03 16:27:34.000000000 +0200 +++ libgomp/team.c 2013-10-04 18:44:31.100494492 +0200 @@ -128,6 +128,8 @@ gomp_thread_start (void *xdata) } gomp_sem_destroy (&thr->release); + thr->thread_pool = NULL; + thr->task = NULL; return NULL; } @@ -204,16 +206,19 @@ static struct gomp_thread_pool *gomp_new static void gomp_free_pool_helper (void *thread_pool) { + struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = (struct gomp_thread_pool *) thread_pool; gomp_barrier_wait_last (&pool->threads_dock); - gomp_sem_destroy (&gomp_thread ()->release); + gomp_sem_destroy (&thr->release); + thr->thread_pool = NULL; + thr->task = NULL; pthread_exit (NULL); } /* Free a thread pool and release its threads. */ -static void +void gomp_free_thread (void *arg __attribute__((unused))) { struct gomp_thread *thr = gomp_thread (); @@ -241,9 +246,9 @@ gomp_free_thread (void *arg __attribute_ __sync_fetch_and_add (&gomp_managed_threads, 1L - pool->threads_used); #else - gomp_mutex_lock (&gomp_remaining_threads_lock); + gomp_mutex_lock (&gomp_managed_threads_lock); gomp_managed_threads -= pool->threads_used - 1L; - gomp_mutex_unlock (&gomp_remaining_threads_lock); + gomp_mutex_unlock (&gomp_managed_threads_lock); #endif } free (pool->threads); @@ -285,6 +290,7 @@ gomp_team_start (void (*fn) (void *), vo if (__builtin_expect (thr->thread_pool == NULL, 0)) { thr->thread_pool = gomp_new_thread_pool (); + thr->thread_pool->threads_busy = nthreads; pthread_setspecific (gomp_thread_destructor, thr); } pool = thr->thread_pool; @@ -678,9 +684,9 @@ gomp_team_start (void (*fn) (void *), vo #ifdef HAVE_SYNC_BUILTINS __sync_fetch_and_add (&gomp_managed_threads, diff); #else - gomp_mutex_lock (&gomp_remaining_threads_lock); + gomp_mutex_lock (&gomp_managed_threads_lock); gomp_managed_threads += diff; - gomp_mutex_unlock (&gomp_remaining_threads_lock); + gomp_mutex_unlock (&gomp_managed_threads_lock); #endif } @@ -822,9 +828,9 @@ gomp_team_start (void (*fn) (void *), vo #ifdef HAVE_SYNC_BUILTINS __sync_fetch_and_add (&gomp_managed_threads, diff); #else - gomp_mutex_lock (&gomp_remaining_threads_lock); + gomp_mutex_lock (&gomp_managed_threads_lock); gomp_managed_threads += diff; - gomp_mutex_unlock (&gomp_remaining_threads_lock); + gomp_mutex_unlock (&gomp_managed_threads_lock); #endif } if (__builtin_expect (affinity_thr != NULL, 0) @@ -871,9 +877,9 @@ gomp_team_end (void) #ifdef HAVE_SYNC_BUILTINS __sync_fetch_and_add (&gomp_managed_threads, 1L - team->nthreads); #else - gomp_mutex_lock (&gomp_remaining_threads_lock); + gomp_mutex_lock (&gomp_managed_threads_lock); gomp_managed_threads -= team->nthreads - 1L; - gomp_mutex_unlock (&gomp_remaining_threads_lock); + gomp_mutex_unlock (&gomp_managed_threads_lock); #endif /* This barrier has gomp_barrier_wait_last counterparts and ensures the team can be safely destroyed. */ @@ -914,8 +920,6 @@ gomp_team_end (void) static void __attribute__((constructor)) initialize_team (void) { - struct gomp_thread *thr; - #ifndef HAVE_TLS static struct gomp_thread initial_thread_tls_data; @@ -925,13 +929,6 @@ initialize_team (void) if (pthread_key_create (&gomp_thread_destructor, gomp_free_thread) != 0) gomp_fatal ("could not create thread pool destructor."); - -#ifdef HAVE_TLS - thr = &gomp_tls_data; -#else - thr = &initial_thread_tls_data; -#endif - gomp_sem_init (&thr->release, 0); } static void __attribute__((destructor)) --- libgomp/env.c.jj 2013-10-04 10:28:53.000000000 +0200 +++ libgomp/env.c 2013-10-04 14:54:47.843337279 +0200 @@ -54,6 +54,7 @@ struct gomp_task_icv gomp_global_icv = { .nthreads_var = 1, + .thread_limit_var = UINT_MAX, .run_sched_var = GFS_DYNAMIC, .run_sched_modifier = 1, .default_device_var = 0, @@ -64,11 +65,9 @@ struct gomp_task_icv gomp_global_icv = { }; unsigned long gomp_max_active_levels_var = INT_MAX; -unsigned long gomp_thread_limit_var = ULONG_MAX; bool gomp_cancel_var = false; -unsigned long gomp_remaining_threads_count; #ifndef HAVE_SYNC_BUILTINS -gomp_mutex_t gomp_remaining_threads_lock; +gomp_mutex_t gomp_managed_threads_lock; #endif unsigned long gomp_available_cpus = 1, gomp_managed_threads = 1; unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var; @@ -1126,8 +1125,8 @@ handle_omp_display_env (unsigned long st /* GOMP's default value is actually neither active nor passive. */ fprintf (stderr, " OMP_WAIT_POLICY = '%s'\n", wait_policy > 0 ? "ACTIVE" : "PASSIVE"); - fprintf (stderr, " OMP_THREAD_LIMIT = '%lu'\n", - gomp_thread_limit_var); + fprintf (stderr, " OMP_THREAD_LIMIT = '%u'\n", + gomp_global_icv.thread_limit_var); fprintf (stderr, " OMP_MAX_ACTIVE_LEVELS = '%lu'\n", gomp_max_active_levels_var); @@ -1156,7 +1155,7 @@ handle_omp_display_env (unsigned long st static void __attribute__((constructor)) initialize_env (void) { - unsigned long stacksize; + unsigned long thread_limit_var, stacksize; int wait_policy; /* Do a compile time check that mkomp_h.pl did good job. */ @@ -1169,11 +1168,13 @@ initialize_env (void) parse_int ("OMP_DEFAULT_DEVICE", &gomp_global_icv.default_device_var, true); parse_unsigned_long ("OMP_MAX_ACTIVE_LEVELS", &gomp_max_active_levels_var, true); - parse_unsigned_long ("OMP_THREAD_LIMIT", &gomp_thread_limit_var, false); - if (gomp_thread_limit_var != ULONG_MAX) - gomp_remaining_threads_count = gomp_thread_limit_var - 1; + if (parse_unsigned_long ("OMP_THREAD_LIMIT", &thread_limit_var, false)) + { + gomp_global_icv.thread_limit_var + = thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var; + } #ifndef HAVE_SYNC_BUILTINS - gomp_mutex_init (&gomp_remaining_threads_lock); + gomp_mutex_init (&gomp_managed_threads_lock); #endif gomp_init_num_threads (); gomp_available_cpus = gomp_global_icv.nthreads_var; @@ -1325,7 +1326,8 @@ omp_get_max_threads (void) int omp_get_thread_limit (void) { - return gomp_thread_limit_var > INT_MAX ? INT_MAX : gomp_thread_limit_var; + struct gomp_task_icv *icv = gomp_icv (false); + return icv->thread_limit_var > INT_MAX ? INT_MAX : icv->thread_limit_var; } void --- libgomp/target.c.jj 2013-09-23 16:07:25.000000000 +0200 +++ libgomp/target.c 2013-10-04 21:20:22.013998048 +0200 @@ -26,6 +26,7 @@ creation and termination. */ #include "libgomp.h" +#include #include #include #include @@ -144,8 +145,9 @@ resolve_device (int device_id) struct gomp_task_icv *icv = gomp_icv (false); device_id = icv->default_device_var; } - if (device_id >= gomp_get_num_devices () - && device_id != 257) + if (device_id < 0 + || (device_id >= gomp_get_num_devices () + && device_id != 257)) return NULL; /* FIXME: Temporary hack for testing non-shared address spaces on host. */ @@ -239,11 +241,18 @@ gomp_map_vars (struct gomp_device_descr tgt->tgt_start = (tgt->tgt_start + tgt_align - 1) & ~(tgt_align - 1); tgt->tgt_end = tgt->tgt_start + tgt_size; } + else + { + tgt->to_free = NULL; + tgt->tgt_start = 0; + tgt->tgt_end = 0; + } tgt_size = 0; if (is_target) tgt_size = mapnum * sizeof (void *); + tgt->array = NULL; if (not_found_cnt) { tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array)); @@ -273,6 +282,7 @@ gomp_map_vars (struct gomp_device_descr k->tgt = tgt; k->tgt_offset = tgt_size; tgt_size += k->host_end - k->host_start; + k->copy_from = false; if ((kinds[i] & 7) == 2 || (kinds[i] & 7) == 3) k->copy_from = true; k->refcount = 1; @@ -475,13 +485,33 @@ GOMP_target (int device, void (*fn) (voi if (devicep == NULL) { /* Host fallback. */ + struct gomp_thread old_thr, *thr = gomp_thread (); + old_thr = *thr; + memset (thr, '\0', sizeof (*thr)); + if (gomp_places_list) + { + thr->place = old_thr.place; + thr->ts.place_partition_len = gomp_places_list_len; + } fn (hostaddrs); + gomp_free_thread (thr); + *thr = old_thr; return; } struct target_mem_desc *tgt = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, true); + struct gomp_thread old_thr, *thr = gomp_thread (); + old_thr = *thr; + memset (thr, '\0', sizeof (*thr)); + if (gomp_places_list) + { + thr->place = old_thr.place; + thr->ts.place_partition_len = gomp_places_list_len; + } fn ((void *) tgt->tgt_start); + gomp_free_thread (thr); + *thr = old_thr; gomp_unmap_vars (tgt); } @@ -541,6 +571,13 @@ GOMP_target_update (int device, const vo void GOMP_teams (unsigned int num_teams, unsigned int thread_limit) { + if (thread_limit) + { + struct gomp_task_icv *icv = gomp_icv (true); + icv->thread_limit_var + = thread_limit > INT_MAX ? UINT_MAX : thread_limit; + } + (void) num_teams; } #ifdef PLUGIN_SUPPORT --- libgomp/testsuite/libgomp.c/affinity-1.c.jj 2013-10-04 09:51:49.000000000 +0200 +++ libgomp/testsuite/libgomp.c/affinity-1.c 2013-10-04 15:58:13.639326486 +0200 @@ -244,6 +244,7 @@ main () print_affinity (places_array[test_places].places[0]); printf ("\n"); omp_set_nested (1); + omp_set_dynamic (0); #pragma omp parallel if (0) { --- libgomp/testsuite/libgomp.c/target-3.c.jj 2013-10-04 15:55:43.360132627 +0200 +++ libgomp/testsuite/libgomp.c/target-3.c 2013-10-04 17:14:07.560142600 +0200 @@ -0,0 +1,17 @@ +#include +#include + +int +main () +{ + if (omp_get_level ()) + abort (); + #pragma omp target if (0) + if (omp_get_level ()) + abort (); + #pragma omp target if (0) + #pragma omp teams + if (omp_get_level ()) + abort (); + return 0; +} --- libgomp/testsuite/libgomp.c/target-4.c.jj 2013-10-04 15:55:43.360132627 +0200 +++ libgomp/testsuite/libgomp.c/target-4.c 2013-10-04 15:57:32.422553332 +0200 @@ -0,0 +1,14 @@ +#include +#include + +int +main () +{ + omp_set_dynamic (0); + #pragma omp parallel num_threads (4) + #pragma omp target if (0) + #pragma omp single + if (omp_get_num_threads () != 1) + abort (); + return 0; +} --- libgomp/testsuite/libgomp.c/target-5.c.jj 2013-10-04 15:55:43.360132627 +0200 +++ libgomp/testsuite/libgomp.c/target-5.c 2013-10-04 17:05:23.632931518 +0200 @@ -0,0 +1,83 @@ +#include +#include + +int +main () +{ + int d_o = omp_get_dynamic (); + int n_o = omp_get_nested (); + omp_sched_t s_o; + int c_o; + omp_get_schedule (&s_o, &c_o); + int m_o = omp_get_max_threads (); + omp_set_dynamic (1); + omp_set_nested (1); + omp_set_schedule (omp_sched_static, 2); + omp_set_num_threads (4); + int d = omp_get_dynamic (); + int n = omp_get_nested (); + omp_sched_t s; + int c; + omp_get_schedule (&s, &c); + int m = omp_get_max_threads (); + if (!omp_is_initial_device ()) + abort (); + #pragma omp target if (0) + { + omp_sched_t s_c; + int c_c; + omp_get_schedule (&s_c, &c_c); + if (d_o != omp_get_dynamic () + || n_o != omp_get_nested () + || s_o != s_c + || c_o != c_c + || m_o != omp_get_max_threads ()) + abort (); + omp_set_dynamic (0); + omp_set_nested (0); + omp_set_schedule (omp_sched_dynamic, 4); + omp_set_num_threads (2); + if (!omp_is_initial_device ()) + abort (); + } + if (!omp_is_initial_device ()) + abort (); + omp_sched_t s_c; + int c_c; + omp_get_schedule (&s_c, &c_c); + if (d != omp_get_dynamic () + || n != omp_get_nested () + || s != s_c + || c != c_c + || m != omp_get_max_threads ()) + abort (); + #pragma omp target if (0) + #pragma omp teams + { + omp_sched_t s_c; + int c_c; + omp_get_schedule (&s_c, &c_c); + if (d_o != omp_get_dynamic () + || n_o != omp_get_nested () + || s_o != s_c + || c_o != c_c + || m_o != omp_get_max_threads ()) + abort (); + omp_set_dynamic (0); + omp_set_nested (0); + omp_set_schedule (omp_sched_dynamic, 4); + omp_set_num_threads (2); + if (!omp_is_initial_device ()) + abort (); + } + if (!omp_is_initial_device ()) + abort (); + omp_get_schedule (&s_c, &c_c); + if (d != omp_get_dynamic () + || n != omp_get_nested () + || s != s_c + || c != c_c + || m != omp_get_max_threads ()) + abort (); + return 0; +} --- libgomp/testsuite/libgomp.c/target-6.c.jj 2013-10-04 15:55:43.360132627 +0200 +++ libgomp/testsuite/libgomp.c/target-6.c 2013-10-04 18:12:54.097493969 +0200 @@ -0,0 +1,68 @@ +#include +#include + +int +main () +{ + omp_set_dynamic (0); + omp_set_nested (1); + if (omp_in_parallel ()) + abort (); + #pragma omp parallel num_threads (3) + if (omp_get_thread_num () == 2) + { + if (!omp_in_parallel ()) + abort (); + #pragma omp parallel num_threads (3) + if (omp_get_thread_num () == 1) + { + if (!omp_in_parallel () + || omp_get_level () != 2 + || omp_get_ancestor_thread_num (0) != 0 + || omp_get_ancestor_thread_num (1) != 2 + || omp_get_ancestor_thread_num (2) != 1 + || omp_get_ancestor_thread_num (3) != -1) + abort (); + #pragma omp target if (0) + { + if (omp_in_parallel () + || omp_get_level () != 0 + || omp_get_ancestor_thread_num (0) != 0 + || omp_get_ancestor_thread_num (1) != -1) + abort (); + #pragma omp parallel num_threads (2) + { + if (!omp_in_parallel () + || omp_get_level () != 1 + || omp_get_ancestor_thread_num (0) != 0 + || omp_get_ancestor_thread_num (1) + != omp_get_thread_num () + || omp_get_ancestor_thread_num (2) != -1) + abort (); + } + } + #pragma omp target if (0) + { + #pragma omp teams thread_limit (2) + { + if (omp_in_parallel () + || omp_get_level () != 0 + || omp_get_ancestor_thread_num (0) != 0 + || omp_get_ancestor_thread_num (1) != -1) + abort (); + #pragma omp parallel num_threads (2) + { + if (!omp_in_parallel () + || omp_get_level () != 1 + || omp_get_ancestor_thread_num (0) != 0 + || omp_get_ancestor_thread_num (1) + != omp_get_thread_num () + || omp_get_ancestor_thread_num (2) != -1) + abort (); + } + } + } + } + } + return 0; +} --- libgomp/testsuite/libgomp.c/target-7.c.jj 2013-10-04 19:05:13.452909674 +0200 +++ libgomp/testsuite/libgomp.c/target-7.c 2013-10-04 21:47:21.814387228 +0200 @@ -0,0 +1,111 @@ +#include +#include + +volatile int v; + +void +foo (int f) +{ + int d = f ? omp_get_num_devices () : omp_get_default_device (); + int h = 5; + #pragma omp target device (d) + if (omp_get_level () != 0) + abort (); + #pragma omp target if (v > 1) + if (omp_get_level () != 0 || !omp_is_initial_device ()) + abort (); + #pragma omp target device (d) if (v > 1) + if (omp_get_level () != 0 || !omp_is_initial_device ()) + abort (); + #pragma omp target if (v <= 1) + if (omp_get_level () != 0 || (f && !omp_is_initial_device ())) + abort (); + #pragma omp target device (d) if (v <= 1) + if (omp_get_level () != 0 || (f && !omp_is_initial_device ())) + abort (); + #pragma omp target if (0) + if (omp_get_level () != 0 || !omp_is_initial_device ()) + abort (); + #pragma omp target device (d) if (0) + if (omp_get_level () != 0 || !omp_is_initial_device ()) + abort (); + #pragma omp target if (1) + if (omp_get_level () != 0 || (f && !omp_is_initial_device ())) + abort (); + #pragma omp target device (d) if (1) + if (omp_get_level () != 0 || (f && !omp_is_initial_device ())) + abort (); + #pragma omp target data device (d) map (to: h) + { + #pragma omp target device (d) + if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 5) + abort (); + #pragma omp target update device (d) from (h) + } + #pragma omp target data if (v > 1) map (to: h) + { + #pragma omp target if (v > 1) + if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 6) + abort (); + #pragma omp target update if (v > 1) from (h) + } + #pragma omp target data device (d) if (v > 1) map (to: h) + { + #pragma omp target device (d) if (v > 1) + if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 7) + abort (); + #pragma omp target update device (d) if (v > 1) from (h) + } + #pragma omp target data if (v <= 1) map (to: h) + { + #pragma omp target if (v <= 1) + if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 8) + abort (); + #pragma omp target update if (v <= 1) from (h) + } + #pragma omp target data device (d) if (v <= 1) map (to: h) + { + #pragma omp target device (d) if (v <= 1) + if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 9) + abort (); + #pragma omp target update device (d) if (v <= 1) from (h) + } + #pragma omp target data if (0) map (to: h) + { + #pragma omp target if (0) + if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 10) + abort (); + #pragma omp target update if (0) from (h) + } + #pragma omp target data device (d) if (0) map (to: h) + { + #pragma omp target device (d) if (0) + if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 11) + abort (); + #pragma omp target update device (d) if (0) from (h) + } + #pragma omp target data if (1) map (to: h) + { + #pragma omp target if (1) + if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 12) + abort (); + #pragma omp target update if (1) from (h) + } + #pragma omp target data device (d) if (1) map (to: h) + { + #pragma omp target device (d) if (1) + if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 13) + abort (); + #pragma omp target update device (d) if (1) from (h) + } + if (h != 14) + abort (); +} + +int +main () +{ + foo (0); + foo (1); + return 0; +} --- libgomp/testsuite/libgomp.c/thread-limit-1.c.jj 2013-10-04 15:36:59.363996164 +0200 +++ libgomp/testsuite/libgomp.c/thread-limit-1.c 2013-10-04 15:41:48.479547631 +0200 @@ -0,0 +1,35 @@ +/* { dg-do run } */ +/* { dg-set-target-env-var OMP_THREAD_LIMIT "6" } */ + +#include +#include + +int +main () +{ + if (omp_get_thread_limit () != 6) + return 0; + omp_set_dynamic (0); + omp_set_nested (1); + #pragma omp parallel num_threads (3) + if (omp_get_num_threads () != 3) + abort (); + #pragma omp parallel num_threads (3) + if (omp_get_num_threads () != 3) + abort (); + #pragma omp parallel num_threads (8) + if (omp_get_num_threads () > 6) + abort (); + #pragma omp parallel num_threads (6) + if (omp_get_num_threads () != 6) + abort (); + int cnt = 0; + #pragma omp parallel num_threads (5) + #pragma omp parallel num_threads (5) + #pragma omp parallel num_threads (2) + #pragma omp atomic + cnt++; + if (cnt > 6) + abort (); + return 0; +} --- libgomp/testsuite/libgomp.c/thread-limit-2.c.jj 2013-10-04 15:42:52.229212860 +0200 +++ libgomp/testsuite/libgomp.c/thread-limit-2.c 2013-10-04 15:48:28.230486490 +0200 @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-set-target-env-var OMP_THREAD_LIMIT "9" } */ + +#include +#include + +int +main () +{ + if (omp_get_thread_limit () != 9) + return 0; + omp_set_dynamic (0); + #pragma omp parallel num_threads (8) + if (omp_get_num_threads () != 8) + abort (); + #pragma omp parallel num_threads (16) + if (omp_get_num_threads () > 9) + abort (); + #pragma omp target if (0) + #pragma omp teams thread_limit (6) + { + if (omp_get_thread_limit () > 6) + abort (); + if (omp_get_thread_limit () == 6) + { + omp_set_dynamic (0); + omp_set_nested (1); + #pragma omp parallel num_threads (3) + if (omp_get_num_threads () != 3) + abort (); + #pragma omp parallel num_threads (3) + if (omp_get_num_threads () != 3) + abort (); + #pragma omp parallel num_threads (8) + if (omp_get_num_threads () > 6) + abort (); + #pragma omp parallel num_threads (6) + if (omp_get_num_threads () != 6) + abort (); + int cnt = 0; + #pragma omp parallel num_threads (5) + #pragma omp parallel num_threads (5) + #pragma omp parallel num_threads (2) + #pragma omp atomic + cnt++; + if (cnt > 6) + abort (); + } + } + return 0; +} --- libgomp/testsuite/libgomp.c/thread-limit-3.c.jj 2013-10-04 15:54:02.289657212 +0200 +++ libgomp/testsuite/libgomp.c/thread-limit-3.c 2013-10-04 15:55:04.677326917 +0200 @@ -0,0 +1,12 @@ +#include +#include + +int +main () +{ + #pragma omp target if (0) + #pragma omp teams thread_limit (1) + if (omp_get_thread_limit () != 1) + abort (); + return 0; +}