From: Vladimir Makarov
To: "gcc-patches@gcc.gnu.org"
Subject: patch to fix PR57193
Date: Fri, 9 Feb 2018 13:52:07 -0500
X-Patchwork-Id: 871529

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

  The patch introduces a new heuristic that changes the order of
coloring.  Allocnos conflicting with other allocnos that prefer hard
registers are colored first when the other, higher-level heuristics
cannot distinguish between them.  Concretely, allocnos with a smaller
conflict_allocno_hard_prefs value are pushed onto the coloring stack
first, so they are popped, and therefore colored, later.

  On x86_64 the new heuristic results in practically the same SPEC2000
performance and code size.

  The patch was successfully bootstrapped and tested on x86-64.

  Committed as rev. 257537.
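To make the new ordering concrete, here is a minimal, self-contained C sketch of the tie-breaking rule. It is a toy model under stated assumptions, not IRA code: `struct toy_allocno`, its fields, and the functions `toy_init_conflict_prefs`, `toy_bucket_compare` and `toy_sort_bucket` are hypothetical. A single `available_regs_num` key stands in for all of the earlier keys of `bucket_allocno_compare_func`, a plain bitmask stands in for `HARD_REG_SET`, and, as the comparator comment in the patch says, element 0 of the sorted bucket is assumed to be pushed first.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX 8

/* Hypothetical toy allocno for illustration; the field names echo the
   patch, but this is not GCC's allocno or allocno_color_data.  */
struct toy_allocno
{
  int num;                  /* Allocno number: the final tie-breaker.  */
  int available_regs_num;   /* Stand-in for all higher-level keys.  */
  unsigned profitable_regs; /* Bitmask standing in for a HARD_REG_SET.  */
  int pref_freq;            /* Sum of this allocno's own pref->freq.  */
  int conflict_hard_prefs;  /* Filled in by toy_init_conflict_prefs.  */
};

/* CONFLICT[I][J] != 0 means allocnos I and J interfere.  Each allocno
   adds its preference frequency to every conflicting allocno whose
   profitable register set intersects its own, in the spirit of
   update_conflict_allocno_hard_prefs.  */
static void
toy_init_conflict_prefs (struct toy_allocno *a, int n,
                         int conflict[TOY_MAX][TOY_MAX])
{
  assert (n >= 0 && n <= TOY_MAX);
  for (int i = 0; i < n; i++)
    a[i].conflict_hard_prefs = 0;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      if (i != j && conflict[i][j]
          && (a[i].profitable_regs & a[j].profitable_regs) != 0)
        a[j].conflict_hard_prefs += a[i].pref_freq;
}

/* qsort comparator mirroring the tail of bucket_allocno_compare_func:
   more available registers first, then the new key (smaller
   conflict_hard_prefs first), then the larger allocno number first.  */
static int
toy_bucket_compare (const void *v1p, const void *v2p)
{
  const struct toy_allocno *a1 = v1p;
  const struct toy_allocno *a2 = v2p;
  int diff;

  if ((diff = a2->available_regs_num - a1->available_regs_num) != 0)
    return diff;
  if ((diff = a1->conflict_hard_prefs - a2->conflict_hard_prefs) != 0)
    return diff;
  return a2->num - a1->num;
}

/* Sort the bucket; element 0 is the allocno that would be pushed first.  */
static void
toy_sort_bucket (struct toy_allocno *a, int n)
{
  qsort (a, (size_t) n, sizeof *a, toy_bucket_compare);
}
```

For example, take three allocnos 0, 1 and 2 that tie on the earlier key, where only allocno 2 has a hard register preference (frequency 10) and only allocno 0 conflicts with it. Allocno 0 gets a sum of 10 and sorts last, so the sorted order is 2, 1, 0. Allocno 0 would be pushed last and, the stack being last-in first-out, popped and colored first, which is the behavior the description above claims.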
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 257536)
+++ ChangeLog	(working copy)
@@ -1,3 +1,15 @@
+2018-02-09  Vladimir Makarov
+
+	PR rtl-optimization/57193
+	* ira-color.c (struct allocno_color_data): Add member
+	conflict_allocno_hard_prefs.
+	(update_conflict_allocno_hard_prefs): New.
+	(bucket_allocno_compare_func): Add a preference based on
+	conflict_allocno_hard_prefs.
+	(push_allocno_to_stack): Update conflict_allocno_hard_prefs.
+	(color_allocnos): Remove dead code.  Initialize
+	conflict_allocno_hard_prefs.  Call update_conflict_allocno_hard_prefs.
+
 2018-02-09  Jakub Jelinek
 
 	PR target/84226
Index: testsuite/ChangeLog
===================================================================
--- testsuite/ChangeLog	(revision 257536)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2018-02-09  Vladimir Makarov
+
+	PR rtl-optimization/57193
+	* gcc.target/i386/pr57193.c: New.
+
 2018-02-09  Jakub Jelinek
 
 	PR target/84226
Index: ira-color.c
===================================================================
--- ira-color.c	(revision 257157)
+++ ira-color.c	(working copy)
@@ -112,6 +112,9 @@ struct allocno_color_data
      available for the allocno allocation.  It is number of the
      profitable hard regs.  */
   int available_regs_num;
+  /* Sum of frequencies of hard register preferences of all
+     conflicting allocnos which are not on the coloring stack yet.  */
+  int conflict_allocno_hard_prefs;
   /* Allocnos in a bucket (used in coloring) chained by the following
      two members.  */
   ira_allocno_t next_bucket_allocno;
@@ -1435,6 +1438,36 @@ update_costs_from_copies (ira_allocno_t
   update_costs_from_allocno (allocno, hard_regno, 1, decr_p, record_p);
 }
 
+/* Update conflict_allocno_hard_prefs of allocnos conflicting with
+   ALLOCNO.  */
+static void
+update_conflict_allocno_hard_prefs (ira_allocno_t allocno)
+{
+  int l, nr = ALLOCNO_NUM_OBJECTS (allocno);
+
+  for (l = 0; l < nr; l++)
+    {
+      ira_object_t conflict_obj, obj = ALLOCNO_OBJECT (allocno, l);
+      ira_object_conflict_iterator oci;
+
+      FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
+	{
+	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
+	  allocno_color_data_t conflict_data = ALLOCNO_COLOR_DATA (conflict_a);
+	  ira_pref_t pref;
+
+	  if (!(hard_reg_set_intersect_p
+		(ALLOCNO_COLOR_DATA (allocno)->profitable_hard_regs,
+		 conflict_data->profitable_hard_regs)))
+	    continue;
+	  for (pref = ALLOCNO_PREFS (allocno);
+	       pref != NULL;
+	       pref = pref->next_pref)
+	    conflict_data->conflict_allocno_hard_prefs += pref->freq;
+	}
+    }
+}
+
 /* Restore costs of allocnos connected to ALLOCNO by copies as it
    was before updating costs of these allocnos from given allocno.
    This is a wise thing to do as if given allocno did not get an expected
@@ -2223,7 +2256,7 @@ bucket_allocno_compare_func (const void
 {
   ira_allocno_t a1 = *(const ira_allocno_t *) v1p;
   ira_allocno_t a2 = *(const ira_allocno_t *) v2p;
-  int diff, freq1, freq2, a1_num, a2_num;
+  int diff, freq1, freq2, a1_num, a2_num, pref1, pref2;
   ira_allocno_t t1 = ALLOCNO_COLOR_DATA (a1)->first_thread_allocno;
   ira_allocno_t t2 = ALLOCNO_COLOR_DATA (a2)->first_thread_allocno;
   int cl1 = ALLOCNO_CLASS (a1), cl2 = ALLOCNO_CLASS (a2);
@@ -2253,6 +2286,11 @@ bucket_allocno_compare_func (const void
   a2_num = ALLOCNO_COLOR_DATA (a2)->available_regs_num;
   if ((diff = a2_num - a1_num) != 0)
     return diff;
+  /* Push allocnos with minimal conflict_allocno_hard_prefs first.  */
+  pref1 = ALLOCNO_COLOR_DATA (a1)->conflict_allocno_hard_prefs;
+  pref2 = ALLOCNO_COLOR_DATA (a2)->conflict_allocno_hard_prefs;
+  if ((diff = pref1 - pref2) != 0)
+    return diff;
   return ALLOCNO_NUM (a2) - ALLOCNO_NUM (a1);
 }
 
@@ -2339,7 +2377,8 @@ delete_allocno_from_bucket (ira_allocno_
 /* Put allocno A onto the coloring stack without removing it from its
    bucket.  Pushing allocno to the coloring stack can result in moving
    conflicting allocnos from the uncolorable bucket to the colorable
-   one.  */
+   one.  Update conflict_allocno_hard_prefs of the conflicting
+   allocnos which are not on stack yet.  */
 static void
 push_allocno_to_stack (ira_allocno_t a)
 {
@@ -2369,15 +2408,19 @@ push_allocno_to_stack (ira_allocno_t a)
       FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
 	{
 	  ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
-
+	  ira_pref_t pref;
+
 	  conflict_data = ALLOCNO_COLOR_DATA (conflict_a);
-	  if (conflict_data->colorable_p
-	      || ! conflict_data->in_graph_p
+	  if (! conflict_data->in_graph_p
 	      || ALLOCNO_ASSIGNED_P (conflict_a)
 	      || !(hard_reg_set_intersect_p
 		   (ALLOCNO_COLOR_DATA (a)->profitable_hard_regs,
 		    conflict_data->profitable_hard_regs)))
 	    continue;
+	  for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = pref->next_pref)
+	    conflict_data->conflict_allocno_hard_prefs -= pref->freq;
+	  if (conflict_data->colorable_p)
+	    continue;
 	  ira_assert (bitmap_bit_p (coloring_allocno_bitmap,
 				    ALLOCNO_NUM (conflict_a)));
 	  if (update_left_conflict_sizes_p (conflict_a, a, size))
@@ -3048,21 +3091,12 @@ color_allocnos (void)
   setup_profitable_hard_regs ();
   EXECUTE_IF_SET_IN_BITMAP (coloring_allocno_bitmap, 0, i, bi)
     {
-      int l, nr;
-      HARD_REG_SET conflict_hard_regs;
       allocno_color_data_t data;
       ira_pref_t pref, next_pref;
 
       a = ira_allocnos[i];
-      nr = ALLOCNO_NUM_OBJECTS (a);
-      CLEAR_HARD_REG_SET (conflict_hard_regs);
-      for (l = 0; l < nr; l++)
-	{
-	  ira_object_t obj = ALLOCNO_OBJECT (a, l);
-	  IOR_HARD_REG_SET (conflict_hard_regs,
-			    OBJECT_CONFLICT_HARD_REGS (obj));
-	}
       data = ALLOCNO_COLOR_DATA (a);
+      data->conflict_allocno_hard_prefs = 0;
       for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = next_pref)
 	{
 	  next_pref = pref->next_pref;
@@ -3072,6 +3106,7 @@ color_allocnos (void)
 	    ira_remove_pref (pref);
 	}
     }
+
   if (flag_ira_algorithm == IRA_ALGORITHM_PRIORITY)
     {
       n = 0;
@@ -3134,6 +3169,7 @@ color_allocnos (void)
 	    {
 	      ALLOCNO_COLOR_DATA (a)->in_graph_p = true;
 	      update_costs_from_prefs (a);
+	      update_conflict_allocno_hard_prefs (a);
 	    }
 	  else
 	    {
Index: testsuite/gcc.target/i386/pr57193.c
===================================================================
--- testsuite/gcc.target/i386/pr57193.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr57193.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "movdqa" 2 } } */
+
+#include 
+
+void test1(const __m128i* in1, const __m128i* in2, __m128i* out,
+	   __m128i f, __m128i zero)
+{
+  __m128i c = _mm_avg_epu8(*in1, *in2);
+  __m128i l = _mm_unpacklo_epi8(c, zero);
+  __m128i h = _mm_unpackhi_epi8(c, zero);
+  __m128i m = _mm_mulhi_epu16(l, f);
+  __m128i n = _mm_mulhi_epu16(h, f);
+  *out = _mm_packus_epi16(m, n);
+}
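The patch keeps conflict_allocno_hard_prefs up to date incrementally: update_conflict_allocno_hard_prefs adds an allocno's preference frequencies to its conflicting allocnos when the allocno enters the graph, and push_allocno_to_stack subtracts them again when it is pushed. The diff also moves the colorable_p test below the subtraction, so conflicting allocnos that are already in the colorable bucket are updated as well. The sketch below is a toy model of only that bookkeeping, written to check the invariant stated in the new field's comment. All names (`struct toy_graph`, `toy_add_to_graph`, `toy_push`, `toy_recount`) are hypothetical; an adjacency matrix replaces IRA's conflict objects, a bitmask replaces `HARD_REG_SET`, `on_stack` stands in for `in_graph_p` being cleared, and `ALLOCNO_ASSIGNED_P` and multi-object allocnos are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_N 4

/* Hypothetical toy model, not GCC's data structures: allocno I has
   preference frequency PREF_FREQ[I] and profitable register mask REGS[I];
   CONFLICT[I][J] marks interference and is assumed symmetric.  */
struct toy_graph
{
  int n;
  int pref_freq[TOY_N];
  unsigned regs[TOY_N];
  bool conflict[TOY_N][TOY_N];
  bool on_stack[TOY_N];
  int conflict_hard_prefs[TOY_N];
};

/* True if I and J conflict and their profitable register sets intersect.  */
static bool
toy_related_p (const struct toy_graph *g, int i, int j)
{
  return i != j && g->conflict[i][j] && (g->regs[i] & g->regs[j]) != 0;
}

/* Modeled on update_conflict_allocno_hard_prefs: when allocno I enters
   the graph, add its preference frequency to every related allocno.  */
static void
toy_add_to_graph (struct toy_graph *g, int i)
{
  for (int j = 0; j < g->n; j++)
    if (toy_related_p (g, i, j))
      g->conflict_hard_prefs[j] += g->pref_freq[i];
}

/* Modeled on the new loop in push_allocno_to_stack: once P is on the
   stack, it no longer counts toward related allocnos still in the graph.  */
static void
toy_push (struct toy_graph *g, int p)
{
  assert (!g->on_stack[p]);
  g->on_stack[p] = true;
  for (int j = 0; j < g->n; j++)
    if (!g->on_stack[j] && toy_related_p (g, p, j))
      g->conflict_hard_prefs[j] -= g->pref_freq[p];
}

/* Recompute from scratch, for allocno J, the sum of preference
   frequencies of related allocnos that are not on the stack yet.  */
static int
toy_recount (const struct toy_graph *g, int j)
{
  int sum = 0;

  for (int i = 0; i < g->n; i++)
    if (!g->on_stack[i] && toy_related_p (g, i, j))
      sum += g->pref_freq[i];
  return sum;
}
```

In this model, once toy_add_to_graph has been called for every allocno, the incrementally maintained value and toy_recount agree for every allocno still off the stack after any sequence of toy_push calls. Subtracting on each push instead of recomputing keeps the update proportional to the pushed allocno's conflicts.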