From patchwork Mon Jun 21 14:52:29 2021
Date: Mon, 21 Jun 2021 15:52:29 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de
Subject: [PATCH]middle-end[RFC] slp: new implementation of complex numbers

Hi Richi,

This patch is still very much incomplete
and I do know that it is missing things, but it is complete enough that the
examples work and it lets me show what I'm working towards.

Note that this approach will remove a lot of code in tree-vect-slp-patterns,
but to keep the diff readable I've left it in and just commented out the
calls or removed them where needed.

The patch rewrites the complex number detection by splitting the detection
of structure from the dataflow analysis.  In principle the biggest
difference between this and the previous implementation is that instead of
trying to detect valid complex operations it *makes* an operation a valid
complex operation.

To do this each operation gets a dual optab which matches the same structure
but has no dataflow requirement.  In this patch I added four: ADDSUB,
SUBADD, MUL_ADDSUB and MUL_SUBADD.  There is then a mapping between these
and their variants with the dataflow:

 * ADDSUB -> COMPLEX_ADD_ROT270
 * SUBADD -> COMPLEX_ADD_ROT90
 * MUL_ADDSUB -> COMPLEX_MUL_CONJ
 * MUL_SUBADD -> COMPLEX_MUL

The intention is that when we detect the structure of an operation we query
the backend for both optabs.  This should result in one of four states:

 * Supports neither: move on.
 * Supports ADDSUB only: rewrite using ADDSUB, set type to 'cannot_transform'.
 * Supports COMPLEX_ADD_ROT270 only: rewrite using ADDSUB, set type to
   'must_transform'.
 * Supports both: rewrite using ADDSUB, set type to 'can_transform'.

The idea behind `can_transform` is to check the cost of the inverse permute
needed to use the complex operation, and if that is very expensive then
stick with ADDSUB.  This requires the target to be able to cost the
operations reasonably correctly.
So for ADD this looks like:

=== vect_match_slp_patterns ===
Analyzing SLP tree 0x494e970 for patterns
Found ADDSUB pattern in SLP tree
Target does not support ADDSUB for vector type vector(4) float
Found COMPLEX_ADD_ROT270 pattern in SLP tree
Target supports COMPLEX_ADD_ROT270 vectorization with mode vector(4) float
Pattern matched SLP tree
node 0x494e970 (max_nunits=4, refcnt=1)
op template: REALPART_EXPR <*_10> = _23;
    stmt 0 REALPART_EXPR <*_10> = _23;
    stmt 1 IMAGPART_EXPR <*_10> = _22;
    children 0x494ea00
node 0x494ea00 (max_nunits=4, refcnt=1)
op template: slp_patt_39 = .ADDSUB (_23, _23);
    stmt 0 _23 = _6 + _13;
    stmt 1 _22 = _12 - _8;
    children 0x494eb20 0x494ebb0
node 0x494eb20 (max_nunits=4, refcnt=1)
op template: _13 = REALPART_EXPR <*_3>;
    stmt 0 _13 = REALPART_EXPR <*_3>;
    stmt 1 _12 = IMAGPART_EXPR <*_3>;
node 0x494ebb0 (max_nunits=4, refcnt=1)
op: VEC_PERM_EXPR
    { }
    lane permutation { 0[1] 0[0] }
    children 0x494ec40
node 0x494ec40 (max_nunits=1, refcnt=2)
op template: _8 = REALPART_EXPR <*_5>;
    stmt 0 _8 = REALPART_EXPR <*_5>;
    stmt 1 _6 = IMAGPART_EXPR <*_5>;
    load permutation { 0 1 }

and later during optimize_slp we get:

Transforming SLP expression from ADDSUB to COMPLEX_ADD_ROT270
processing node 0x494ebb0
simplifying permute node 0x494ebb0
Optimized SLP instances:
node 0x494e970 (max_nunits=4, refcnt=1)
op template: REALPART_EXPR <*_10> = _23;
    stmt 0 REALPART_EXPR <*_10> = _23;
    stmt 1 IMAGPART_EXPR <*_10> = _22;
    children 0x494ea00
node 0x494ea00 (max_nunits=4, refcnt=1)
op template: slp_patt_39 = .COMPLEX_ADD_ROT270 (_23, _23);
    stmt 0 _23 = _6 + _13;
    stmt 1 _22 = _12 - _8;
    children 0x494eb20 0x494ebb0
node 0x494eb20 (max_nunits=4, refcnt=1)
op template: _13 = REALPART_EXPR <*_3>;
    stmt 0 _13 = REALPART_EXPR <*_3>;
    stmt 1 _12 = IMAGPART_EXPR <*_3>;
node 0x494ebb0 (max_nunits=4, refcnt=1)
op: VEC_PERM_EXPR
    { }
    lane permutation { 0[0] 0[1] }
    children 0x494ec40
node 0x494ec40 (max_nunits=1, refcnt=2)
op template: _8 = REALPART_EXPR <*_5>;
    stmt 1 _6 = IMAGPART_EXPR <*_5>;

Now I still have to elide the VEC_PERM_EXPR here, but that's easy.

This works for ADD, but it doesn't work well when there's a complicated
sequence of loads.  So for example

#define N 200
void g (double complex a[restrict N], double complex b[restrict N],
	double complex c[restrict N])
{
  for (int i = 0; i < N; i++)
    {
      c[i] = a[i] * b[i];
    }
}

will result in an SLP tree where each operand of the multiply does not get
to see all the original vectors:

Final SLP tree for instance 0x5678a30:
node 0x55965a0 (max_nunits=4, refcnt=2)
op template: REALPART_EXPR <*_7> = _25;
    stmt 0 REALPART_EXPR <*_7> = _25;
    stmt 1 IMAGPART_EXPR <*_7> = _26;
    children 0x5596630
node 0x5596630 (max_nunits=4, refcnt=2)
op: VEC_PERM_EXPR
    stmt 0 _25 = _17 - _22;
    stmt 1 _26 = _23 + _24;
    lane permutation { 0[0] 1[1] }
    children 0x5596a20 0x5596ab0
node 0x5596a20 (max_nunits=1, refcnt=1)
op template: _25 = _17 - _22;
    { }
    children 0x55966c0 0x5596870
node 0x55966c0 (max_nunits=4, refcnt=3)
op template: _17 = _10 * _19;
    stmt 0 _17 = _10 * _19;
    stmt 1 _23 = _10 * _18;
    children 0x5596750 0x55967e0
node 0x5596750 (max_nunits=4, refcnt=2)
op template: _10 = REALPART_EXPR <*_3>;
    stmt 0 _10 = REALPART_EXPR <*_3>;
    stmt 1 _10 = REALPART_EXPR <*_3>;
    load permutation { 0 0 }
node 0x55967e0 (max_nunits=4, refcnt=2)
op template: _19 = REALPART_EXPR <*_5>;
    stmt 0 _19 = REALPART_EXPR <*_5>;
    stmt 1 _18 = IMAGPART_EXPR <*_5>;
    load permutation { 0 1 }
node 0x5596870 (max_nunits=4, refcnt=3)
op template: _22 = _9 * _18;
    stmt 0 _22 = _9 * _18;
    stmt 1 _24 = _9 * _19;
    children 0x5596900 0x5596990
node 0x5596900 (max_nunits=4, refcnt=2)
op template: _9 = IMAGPART_EXPR <*_3>;
    stmt 0 _9 = IMAGPART_EXPR <*_3>;
    stmt 1 _9 = IMAGPART_EXPR <*_3>;
    load permutation { 1 1 }
node 0x5596990 (max_nunits=4, refcnt=2)
op template: _18 = IMAGPART_EXPR <*_5>;
    stmt 0 _18 = IMAGPART_EXPR <*_5>;
    stmt 1 _19 = REALPART_EXPR <*_5>;
    load permutation { 1 0 }
node 0x5596ab0 (max_nunits=1, refcnt=1)
op template: _26 = _23 + _24;
    { }
    children 0x55966c0 0x5596870

So depending on the node, each multiply sees either REAL _3 + REAL _5 +
IMAG _5 or IMAG _3 + IMAG _5 + REAL _5.  Since we are removing the
TWO_OPERANDS node we need to drop one of the multiplies, and so we need to
give the remaining one the original two vectors as parameters.

The current implementation emits a permute operation to combine the loads
again and later pushes the permute down.  This is problematic as it again
requires us to do dataflow analysis early.  To counter this, in the patch I
have changed loads to no longer come out of build_slp with LOAD_PERMUTES
but instead to have a VEC_PERM above each load.  This is only temporary,
and the idea is that optimize SLP will push them back down into the loads
once it has determined which permutes are still needed.

So COMPLEX MUL becomes:

Final SLP tree for instance 0x46d9c80:
node 0x45f76c0 (max_nunits=4, refcnt=2)
op template: REALPART_EXPR <*_7> = _25;
    stmt 0 REALPART_EXPR <*_7> = _25;
    stmt 1 IMAGPART_EXPR <*_7> = _26;
    children 0x45f7750
node 0x45f7750 (max_nunits=4, refcnt=2)
op: VEC_PERM_EXPR
    stmt 0 _25 = _17 - _22;
    stmt 1 _26 = _23 + _24;
    lane permutation { 0[0] 1[1] }
    children 0x45f7bd0 0x45f7c60
node 0x45f7bd0 (max_nunits=1, refcnt=1)
op template: _25 = _17 - _22;
    { }
    children 0x45f77e0 0x45f7a20
node 0x45f77e0 (max_nunits=4, refcnt=3)
op template: _17 = _10 * _19;
    stmt 0 _17 = _10 * _19;
    stmt 1 _23 = _10 * _18;
    children 0x45f7870 0x45f7990
node 0x45f7870 (max_nunits=4, refcnt=2)
op: VEC_PERM_EXPR
    { }
    lane permutation { 0[0] 0[0] }
    children 0x45f7900
node 0x45f7900 (max_nunits=1, refcnt=4)
op template: _10 = REALPART_EXPR <*_3>;
    stmt 0 _10 = REALPART_EXPR <*_3>;
    stmt 1 _9 = IMAGPART_EXPR <*_3>;
    load permutation { 0 1 }
node 0x45f7990 (max_nunits=4, refcnt=3)
op template: _19 = REALPART_EXPR <*_5>;
    stmt 0 _19 = REALPART_EXPR <*_5>;
    stmt 1 _18 = IMAGPART_EXPR <*_5>;
    load permutation { 0 1 }
node 0x45f7a20 (max_nunits=4, refcnt=3)
op template: _22 = _9 * _18;
    stmt 0 _22 = _9 * _18;
    stmt 1 _24 = _9 * _19;
    children 0x45f7ab0 0x45f7b40
node 0x45f7ab0 (max_nunits=4, refcnt=2)
op: VEC_PERM_EXPR
    { }
    lane permutation { 0[1] 0[1] }
    children 0x45f7900
node 0x45f7b40 (max_nunits=4, refcnt=2)
op: VEC_PERM_EXPR
    { }
    lane permutation { 0[1] 0[0] }
    children 0x45f7990
node 0x45f7c60 (max_nunits=1, refcnt=1)
op template: _26 = _23 + _24;
    { }
    children 0x45f77e0 0x45f7a20

This representation has a number of advantages:

 * Complex mul becomes trivial, as trivial as add.

 * Add now detects itself as part of a sequence of complex mul, i.e. if you
   have only an add it will use it, instead of failing to follow the
   dataflow as it does now.

 * It opens the door a bit more to vectorizing the first loop in x264.  The
   code there has a cross-iteration mix of loading the full vector, but in
   an order the vectorizer thinks it can't use a single linear load for,
   e.g.

     .. = (a[0] + b[0]) - (a[4] + b[4])
     .. = (a[1] + b[1]) - (a[5] + b[5])
     .. = (a[2] + b[2]) - (a[6] + b[6])
     .. = (a[3] + b[3]) - (a[7] + b[7])

   Ultimately it thinks it's cheaper to use scalar loads here, while (at
   least for AArch64) if we see the unrolled loop together we can pattern
   match this sequence and rewrite it to something that uses the linear
   load.  This works because the addition and subtraction are widening, and
   the widening arithmetic operations we already have only read half the
   vector.  For this to work we need to see all 16 bytes of loads, so the
   full unrolled loop of 4 iterations.  That currently happens for SVE but
   not for Neon.  (Though I wonder, now that you've added SLP
   decomposition, whether the initial SLP build shouldn't allow building
   SLP trees that are wider than the vector length and later just decompose
   them or bump up the VF.)  However, that is something for another
   discussion :).

Changing the loads like this makes things a lot simpler and we require no
additional work anywhere else.
The complexity ends up in optimize SLP, which now has the following
changes:

 * Loads no longer introduce permutes themselves; instead each load can be
   topped by a VEC_PERM node, and that node is what carries the permute.

 * As we're moving permutes up, we stop at call boundaries as well.

 * While calculating materialization points for a permute we inspect the
   node and check whether we can transform it; if so, we mark it for
   transformation.

 * After calculating the materialization and transform points we iterate
   over the transform points, checking whether we should transform.  If we
   should, we modify the permute at the materialization point (which should
   be at the transform point) into the inverse permute.

 * During materialization, as the permute is applied, this undoes the
   permutation if we transformed the instruction.

 * After computing this we elide the permute node, or push it down towards
   the load and then back into the load itself, splitting the node if
   required.

After this the tree should be back in its normal form.  I still need to
work out when and where one needs to push the permute all the way down to
the loads, but this covers my thoughts.
Any feedback is appreciated to get it right early on :) Regards, Tamar --- inline copy of patch -- diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index b2f414d2131b867eda337cd30f5ed40ed7c9fa10..73d1d742414d16e06e8312967990d565eddb218d 100644 diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index b2f414d2131b867eda337cd30f5ed40ed7c9fa10..73d1d742414d16e06e8312967990d565eddb218d 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -277,6 +277,10 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary) DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary) DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary) DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary) +DEF_INTERNAL_OPTAB_FN (ADDSUB, ECF_CONST, addsub, binary) +DEF_INTERNAL_OPTAB_FN (SUBADD, ECF_CONST, subadd, binary) +DEF_INTERNAL_OPTAB_FN (MUL_ADDSUB, ECF_CONST, mul_addsub, binary) +DEF_INTERNAL_OPTAB_FN (MUL_SUBADD, ECF_CONST, mul_subadd, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) diff --git a/gcc/optabs.def b/gcc/optabs.def index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..73d23c49bdfa7423f67a6240650c558cfaf3f2f0 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -290,6 +290,10 @@ OPTAB_D (atan_optab, "atan$a2") OPTAB_D (atanh_optab, "atanh$a2") OPTAB_D (copysign_optab, "copysign$F$a3") OPTAB_D (xorsign_optab, "xorsign$F$a3") +OPTAB_D (addsub_optab, "addsub$a3") +OPTAB_D (subadd_optab, "subadd$a3") +OPTAB_D (mul_addsub_optab, "mul_addsub$a3") +OPTAB_D (mul_subadd_optab, "mul_subadd$a3") OPTAB_D (cadd90_optab, "cadd90$a3") OPTAB_D (cadd270_optab, "cadd270$a3") OPTAB_D (cmul_optab, "cmul$a3") diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c index 2ed49cd9edcabd7948b365dd60d7405b79079a7b..ce8d2daa1f06f17eda2cb08255d76bac9cf263f0 100644 --- a/gcc/tree-vect-slp-patterns.c +++ 
b/gcc/tree-vect-slp-patterns.c @@ -68,6 +68,39 @@ along with GCC; see the file COPYING3. If not see * vect_pattern class ******************************************************************************/ +static bool +vect_pattern_validate_optab (internal_fn ifn, slp_tree node); + + +/* Check if the base or complex equivalents are supported. */ +static bool +vect_pattern_validate_optab_or_eq (internal_fn ifn, slp_tree node) +{ + if (vect_pattern_validate_optab (ifn, node)) + return true; + + switch (ifn) + { + case IFN_ADDSUB: + ifn = IFN_COMPLEX_ADD_ROT270; + break; + case IFN_SUBADD: + ifn = IFN_COMPLEX_ADD_ROT90; + break; + case IFN_MUL_ADDSUB: + ifn = IFN_COMPLEX_MUL_CONJ; + break; + case IFN_MUL_SUBADD: + ifn = IFN_COMPLEX_MUL; + break; + default: + gcc_unreachable (); + } + + return vect_pattern_validate_optab (ifn, node); +} + + /* Default implementation of recognize that performs matching, validation and replacement of nodes but that can be overriden if required. */ @@ -630,8 +663,7 @@ complex_add_pattern::build (vec_info *vinfo) /* First re-arrange the children. */ SLP_TREE_CHILDREN (*this->m_node)[0] = children[0]; - SLP_TREE_CHILDREN (*this->m_node)[1] = - vect_build_swap_evenodd_node (children[1]); + SLP_TREE_CHILDREN (*this->m_node)[1] = children[1]; SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (*this->m_node)[0])++; SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (*this->m_node)[1])++; @@ -670,9 +702,9 @@ complex_add_pattern::matches (complex_operation_t op, Rotation 0 and 180 can be handled by normal SIMD code, so we don't need to care about them here. */ if (op == MINUS_PLUS) - ifn = IFN_COMPLEX_ADD_ROT90; + ifn = IFN_SUBADD; else if (op == PLUS_MINUS) - ifn = IFN_COMPLEX_ADD_ROT270; + ifn = IFN_ADDSUB; else return ifn; @@ -680,17 +712,7 @@ complex_add_pattern::matches (complex_operation_t op, we support. */ gcc_assert (ops->length () == 2); - vec children = SLP_TREE_CHILDREN ((*ops)[0]); - - /* First node must be unpermuted. 
*/ - if (linear_loads_p (perm_cache, children[0]) != PERM_EVENODD) - return IFN_LAST; - - /* Second node must be permuted. */ - if (linear_loads_p (perm_cache, children[1]) != PERM_ODDEVEN) - return IFN_LAST; - - if (!vect_pattern_validate_optab (ifn, *node)) + if (!vect_pattern_validate_optab_or_eq (ifn, *node)) return IFN_LAST; return ifn; @@ -1501,7 +1523,8 @@ vect_pattern_decl_t slp_patterns[] order patterns from the largest to the smallest. Especially if they overlap in what they can detect. */ - SLP_PATTERN (complex_operations_pattern), + SLP_PATTERN (complex_add_pattern), + //SLP_PATTERN (complex_operations_pattern), }; #undef SLP_PATTERN diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 0c1f85beeb2e9f3fb7c66c15d4d30594b2570f9e..f24ad85d15780807c49153895b7340c9b4d4d1f3 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1764,25 +1764,69 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, { *max_nunits = this_max_nunits; (*tree_size)++; - node = vect_create_new_slp_node (node, stmts, 0); - SLP_TREE_VECTYPE (node) = vectype; + node = vect_create_new_slp_node (node, stmts, 1); /* And compute the load permutation. Whether it is actually a permutation depends on the unrolling factor which is - decided later. */ - vec load_permutation; + decided later. We specifically materialize permutes at this + early stage and leave it up to optimize SLP to push them back + into the load if needed. */ + lane_permutation_t lane_permutation; + + /* See the loads with a linear permute, this so that optimize_slp + can later push the permute upwards and downwards without needing + to inspect the parent node. 
*/ + load_permutation_t load_permutation; int j; - stmt_vec_info load_info; + stmt_vec_info load_info, curr_load; + lane_permutation.create (group_size); load_permutation.create (group_size); + vec group_stmts; + bool linear = true; stmt_vec_info first_stmt_info = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (node)[0]); + group_stmts.create (DR_GROUP_SIZE (first_stmt_info)); + curr_load = first_stmt_info; FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load_info) { int load_place = vect_get_place_in_interleaving_chain (load_info, first_stmt_info); gcc_assert (load_place != -1); - load_permutation.safe_push (load_place); + + lane_permutation.safe_push (std::make_pair (0, load_place)); + load_permutation.quick_push (j); + + group_stmts.quick_push (curr_load); + linear = linear && curr_load == load_info; + curr_load = DR_GROUP_NEXT_ELEMENT (curr_load); + } + + if (!linear) + { + slp_tree child; + if (slp_tree *load = bst_map->get (group_stmts)) + child = *load; + else + { + child = vect_create_new_slp_node (group_stmts.copy (), 0); + SLP_TREE_VECTYPE (child) = vectype; + /* One for me, one for the BST map. */ + SLP_TREE_REF_COUNT (child)++; + bst_map->put (group_stmts, child); + } + + /* Set some metadata on the load node. */ + SLP_TREE_REF_COUNT (child)++; + SLP_TREE_LANES (child) = SLP_TREE_LANES (node); + SLP_TREE_LOAD_PERMUTATION (child) = load_permutation; + + /* But return a permute node. 
*/ + SLP_TREE_LANE_PERMUTATION (node) = lane_permutation; + SLP_TREE_CODE (node) = VEC_PERM_EXPR; + SLP_TREE_CHILDREN (node).quick_push (child); + SLP_TREE_SCALAR_STMTS (node) = vNULL; } - SLP_TREE_LOAD_PERMUTATION (node) = load_permutation; + + SLP_TREE_VECTYPE (node) = vectype; return node; } } @@ -3430,8 +3474,8 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size) FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance) { slp_tree root = SLP_INSTANCE_TREE (instance); - optimize_load_redistribution (bst_map, vinfo, SLP_TREE_LANES (root), - &load_map, root); + //optimize_load_redistribution (bst_map, vinfo, SLP_TREE_LANES (root), + // &load_map, root); } } @@ -3548,6 +3592,83 @@ vect_slp_perms_eq (const vec > &perms, sizeof (unsigned) * perms[perm_a].length ()) == 0)); } +static bool +is_pattern_convert_candidate_p (slp_tree node) +{ + stmt_vec_info stmt_info; + if (!(stmt_info = SLP_TREE_REPRESENTATIVE (node)) + || !STMT_VINFO_SLP_VECT_ONLY_PATTERN (stmt_info)) + return false; + + gimple* stmt = STMT_VINFO_STMT (stmt_info); + if (!is_gimple_call (stmt)) + return false; + + if (!gimple_call_internal_p (stmt)) + return false; + + internal_fn ifn = gimple_call_internal_fn (stmt); + + switch (ifn) + { + case IFN_ADDSUB: + case IFN_SUBADD: + case IFN_MUL_ADDSUB: + case IFN_MUL_SUBADD: + return true; + default: + break; + } + + return false; +} + +static internal_fn +vect_get_transformed_pattern (internal_fn ifn, slp_tree /* node */) +{ + switch (ifn) + { + case IFN_ADDSUB: + ifn = IFN_COMPLEX_ADD_ROT270; + break; + case IFN_SUBADD: + ifn = IFN_COMPLEX_ADD_ROT90; + break; + case IFN_MUL_ADDSUB: + ifn = IFN_COMPLEX_MUL_CONJ; + break; + case IFN_MUL_SUBADD: + ifn = IFN_COMPLEX_MUL; + break; + default: + gcc_unreachable (); + } + + return ifn; +} + + +static load_permutation_t +vect_reverse_perm (load_permutation_t perm) +{ + auto_vec > pairs; + pairs.create (perm.length ()); + unsigned i; + for (i = 0; i < perm.length (); i++) + pairs.quick_push 
+      (std::make_pair (i, perm[i]));
+
+  typedef const std::pair<unsigned, unsigned>* cmp_t;
+  pairs.qsort ([](const void* a, const void* b) -> int
+	       { return (int)((cmp_t)a)->second - (int)((cmp_t)b)->second; });
+
+  load_permutation_t rev_perm;
+  rev_perm.create (perm.length ());
+  for (i = 0; i < perm.length (); i++)
+    rev_perm.quick_push (pairs[i].first);
+
+  return rev_perm;
+}
+
 /* Optimize the SLP graph of VINFO.  */
 
 void
@@ -3578,11 +3699,13 @@ vect_optimize_slp (vec_info *vinfo)
   auto_sbitmap n_visited (vertices.length ());
   auto_sbitmap n_materialize (vertices.length ());
+  auto_sbitmap n_transform (vertices.length ());
   auto_vec<int> n_perm (vertices.length ());
   auto_vec<vec<unsigned> > perms;
 
   bitmap_clear (n_visited);
   bitmap_clear (n_materialize);
+  bitmap_clear (n_transform);
   n_perm.quick_grow_cleared (vertices.length ());
   perms.safe_push (vNULL); /* zero is no permute */
 
@@ -3602,55 +3725,73 @@ vect_optimize_slp (vec_info *vinfo)
 	 as entries to the reverse graph.  */
       if (!slpg->vertices[idx].succ)
 	bitmap_set_bit (n_visited, idx);
-      /* Loads are the only thing generating permutes.  */
-      if (!SLP_TREE_LOAD_PERMUTATION (node).exists ())
-	continue;
-      /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the
-	 node unpermuted, record this permute.  */
-      stmt_vec_info dr_stmt = SLP_TREE_REPRESENTATIVE (node);
-      if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt))
-	continue;
-      dr_stmt = DR_GROUP_FIRST_ELEMENT (dr_stmt);
-      unsigned imin = DR_GROUP_SIZE (dr_stmt) + 1, imax = 0;
-      bool any_permute = false;
-      for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
+      int perm_seed = 0;
+
+      for (graph_edge *pred = slpg->vertices[idx].pred;
+	   pred; pred = pred->pred_next)
 	{
-	  unsigned idx = SLP_TREE_LOAD_PERMUTATION (node)[j];
-	  imin = MIN (imin, idx);
-	  imax = MAX (imax, idx);
-	  if (idx - SLP_TREE_LOAD_PERMUTATION (node)[0] != j)
-	    any_permute = true;
-	}
-      /* If there's no permute no need to split one out.  */
-      if (!any_permute)
-	continue;
-      /* If the span doesn't match we'd disrupt VF computation, avoid
-	 that for now.
-	 */
-      if (imax - imin + 1 != SLP_TREE_LANES (node))
-	continue;
+	  int src_idx = pred->src;
+	  slp_tree src_node = vertices[src_idx];
 
-      /* For now only handle true permutes, like
-	 vect_attempt_slp_rearrange_stmts did.  This allows us to be lazy
-	 when permuting constants and invariants keeping the permute
-	 bijective.  */
-      auto_sbitmap load_index (SLP_TREE_LANES (node));
-      bitmap_clear (load_index);
-      for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
-	bitmap_set_bit (load_index, SLP_TREE_LOAD_PERMUTATION (node)[j] - imin);
-      unsigned j;
-      for (j = 0; j < SLP_TREE_LANES (node); ++j)
-	if (!bitmap_bit_p (load_index, j))
-	  break;
-      if (j != SLP_TREE_LANES (node))
-	continue;
+	  /* Loads are the only thing generating permutes, however the permutes
+	     are not yet lowered into the loads.  So instead chase up the chain
+	     to find a permute node.  */
+	  if (!SLP_TREE_LANE_PERMUTATION (src_node).exists ())
+	    continue;
+
+	  /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the
+	     node unpermuted, record this permute.  */
+	  stmt_vec_info dr_stmt = SLP_TREE_REPRESENTATIVE (node);
+	  if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt))
+	    continue;
+	  dr_stmt = DR_GROUP_FIRST_ELEMENT (dr_stmt);
+	  unsigned imin = DR_GROUP_SIZE (dr_stmt) + 1, imax = 0;
+	  bool any_permute = false;
+	  for (unsigned j = 0; j < SLP_TREE_LANES (src_node); ++j)
+	    {
+	      unsigned src_op = SLP_TREE_LANE_PERMUTATION (src_node)[j].first;
+	      unsigned idx = SLP_TREE_LANE_PERMUTATION (src_node)[j].second;
+	      /* This isn't a load permute moved out.  */
+	      if (src_op != 0)
+		continue;
+	      imin = MIN (imin, idx);
+	      imax = MAX (imax, idx);
+	      if (idx - SLP_TREE_LANE_PERMUTATION (src_node)[0].second != j)
+		any_permute = true;
+	    }
+	  /* If there's no permute no need to split one out.  */
+	  if (!any_permute)
+	    continue;
+	  /* If the span doesn't match we'd disrupt VF computation, avoid
+	     that for now.
+	     */
+	  if (imax - imin + 1 != SLP_TREE_LANES (node))
+	    continue;
+
+	  /* For now only handle true permutes, like
+	     vect_attempt_slp_rearrange_stmts did.  This allows us to be lazy
+	     when permuting constants and invariants keeping the permute
+	     bijective.  */
+	  auto_sbitmap load_index (SLP_TREE_LANES (src_node));
+	  bitmap_clear (load_index);
+	  for (unsigned j = 0; j < SLP_TREE_LANES (src_node); ++j)
+	    bitmap_set_bit (load_index,
+			    SLP_TREE_LANE_PERMUTATION (src_node)[j].second - imin);
+	  unsigned j;
+	  for (j = 0; j < SLP_TREE_LANES (src_node); ++j)
+	    if (!bitmap_bit_p (load_index, j))
+	      break;
+	  if (j != SLP_TREE_LANES (src_node))
+	    continue;
 
-      vec<unsigned> perm = vNULL;
-      perm.safe_grow (SLP_TREE_LANES (node), true);
-      for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
-	perm[j] = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
-      perms.safe_push (perm);
-      n_perm[idx] = perms.length () - 1;
+	  vec<unsigned> perm = vNULL;
+	  perm.safe_grow (SLP_TREE_LANES (src_node), true);
+	  for (unsigned j = 0; j < SLP_TREE_LANES (src_node); ++j)
+	    perm[j] = SLP_TREE_LANE_PERMUTATION (src_node)[j].second - imin;
+	  perms.safe_push (perm);
+	  n_perm[src_idx] = perms.length () - 1;
+	  perm_seed = -1;
+	}
+      n_perm[idx] = perm_seed;
     }
 
   /* Propagate permutes along the graph and compute materialization points.  */
@@ -3667,12 +3808,12 @@ vect_optimize_slp (vec_info *vinfo)
       slp_tree node = vertices[idx];
       /* For leafs there's nothing to do - we've seeded permutes on those
 	 above.  */
-      if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
+      if (SLP_TREE_DEF_TYPE (node) != vect_internal_def
+	  || !slpg->vertices[idx].succ)
 	continue;
 
       bitmap_set_bit (n_visited, idx);
 
-      /* We cannot move a permute across a store.  */
+      /* We cannot move a permute across a store.  TODO: or a call.
+	 */
       if (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (node))
	  && DR_IS_WRITE
	       (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (node))))
@@ -3723,6 +3864,12 @@ vect_optimize_slp (vec_info *vinfo)
	      changed = true;
	    }
 
+      /* Check to see if this node can be transformed during permute
+	 materialization.  */
+      bool patt_trans_cand_p = is_pattern_convert_candidate_p (node);
+      if (patt_trans_cand_p)
+	bitmap_set_bit (n_transform, idx);
+
       if (perm == 0)
	continue;
@@ -3764,6 +3911,46 @@ vect_optimize_slp (vec_info *vinfo)
     }
   while (changed || iteration == 1);
 
+  bitmap_clear (n_visited);
+
+  /* Transform and record any permutes required for the use of the
+     instruction.  TODO: Check whether it is cheaper to keep ADDSUB if
+     available, or to use COMPLEX_ADD, based on the reverse permute that is
+     needed.  */
+  sbitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (n_transform, 0, i, bi)
+    {
+      slp_tree node = vertices[i];
+
+      for (graph_edge *succ = slpg->vertices[i].succ;
+	   succ; succ = succ->succ_next)
+	{
+	  int perm_id = n_perm[succ->dest];
+	  if (perm_id <= 0)
+	    continue;
+
+	  load_permutation_t linperm = vect_reverse_perm (perms[perm_id]);
+	  perms[perm_id].release ();
+	  perms[perm_id] = linperm;
+
+	  bitmap_set_bit (n_materialize, succ->dest);
+	}
+
+      /* Transform the node if we have determined that it is beneficial to do
+	 so.  When we do, the permute that has been calculated for the
+	 children of the node will be applied.  */
+      stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
+      gimple* stmt = STMT_VINFO_STMT (stmt_info);
+      internal_fn old_ifn = gimple_call_internal_fn (stmt);
+      internal_fn ifn = vect_get_transformed_pattern (old_ifn, node);
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "Transforming SLP expression from %s to %s\n",
+			 internal_fn_name (old_ifn), internal_fn_name (ifn));
+      gimple_call_set_internal_fn (as_a <gcall *> (stmt), ifn);
+    }
+
   /* Materialize.
      */
   for (i = 0; i < vertices.length (); ++i)
     {
@@ -3798,6 +3985,11 @@ vect_optimize_slp (vec_info *vinfo)
					  SLP_TREE_SCALAR_OPS (child), true);
	    }
 
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "processing node %p\n", node);
+
       if (bitmap_bit_p (n_materialize, i))
	{
	  if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
@@ -3986,6 +4178,16 @@ vect_optimize_slp (vec_info *vinfo)
	    }
	}
     }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "Optimized SLP instances: \n");
+      hash_set<slp_tree> visited;
+      slp_instance instance;
+      FOR_EACH_VEC_ELT (vinfo->slp_instances, i, instance)
+	vect_print_slp_graph (MSG_NOTE, vect_location,
+			      SLP_INSTANCE_TREE (instance), visited);
+    }
 }
 
 /* Gather loads reachable from the individual SLP graph entries.  */