[OG9,amdgcn,committed] Use GFX9 granulated sgprs count correctly

Message ID ac6db32d-c54e-dc25-aaa6-7cf9e9f17605@codesourcery.com
State New

Commit Message

Andrew Stubbs Sept. 10, 2019, 11:38 a.m. UTC
This patch adjusts the "granulated sgpr count" kernel setting for
GFX9 devices.

I followed the description I found here:
   http://llvm.org/docs/AMDGPUUsage.html

Basically, GFX9 allocates SGPRs in blocks of 16, not 8, so with the old
GFX8 formula there was some danger of requesting too many registers,
which would hurt performance.
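
To illustrate the arithmetic, here is a minimal standalone sketch of the two
formulas used in the patch. The helper names are hypothetical and exist only
for this example; in the patch itself the count passed in corresponds to
`sgpr + extra_regs`.

```c
#include <assert.h>

/* Hypothetical helper: GFX8 encoding, SGPRs allocated in blocks of 8.
   TOTAL_SGPRS stands in for "sgpr + extra_regs" in the patch.  */
static int
granulated_sgprs_gfx8 (int total_sgprs)
{
  assert (total_sgprs > 0);
  /* Round up to a whole number of 8-register blocks, minus one.  */
  return (total_sgprs + 7) / 8 - 1;
}

/* Hypothetical helper: GFX9 encoding, SGPRs allocated in blocks of 16.
   The result is always even because each block of 16 counts as two
   units in the encoded field.  */
static int
granulated_sgprs_gfx9 (int total_sgprs)
{
  assert (total_sgprs > 0);
  /* Round up to a whole number of 16-register blocks, minus one,
     then double.  */
  return 2 * ((total_sgprs + 15) / 16 - 1);
}
```

For example, with 16 registers the GFX8 formula yields 1 while the GFX9
formula yields 0; with 17 registers both yield 2.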

Andrew
Patch

Use GFX9 granulated sgprs count correctly.

2019-09-10  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/gcn/gcn.c (gcn_hsa_declare_function_name): Calculate
	granulated_sgprs according to architecture.

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 66854b6f9c5..f8434e4a4f1 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -4884,6 +4884,14 @@  gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
 	sgpr = 102 - extra_regs;
     }
 
+  /* GFX8 allocates SGPRs in blocks of 8.
+     GFX9 uses blocks of 16.  */
+  int granulated_sgprs;
+  if (TARGET_GCN3)
+    granulated_sgprs = (sgpr + extra_regs + 7) / 8 - 1;
+  else if (TARGET_GCN5)
+    granulated_sgprs = 2 * ((sgpr + extra_regs + 15) / 16 - 1);
+
   fputs ("\t.align\t256\n", file);
   fputs ("\t.type\t", file);
   assemble_name (file, name);
@@ -4922,7 +4930,7 @@  gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
 	   "\t\tcompute_pgm_rsrc2_excp_en = 0\n",
 	   (vgpr - 1) / 4,
 	   /* Must match wavefront_sgpr_count */
-	   (sgpr + extra_regs + 7) / 8 - 1,
+	   granulated_sgprs,
 	   /* The total number of SGPR user data registers requested.  This
 	      number must match the number of user data registers enabled.  */
 	   cfun->machine->args.nsgprs);