[11/11] Increase MAX_MAX_OPERANDS limit

Message ID 20180613185805.7833-12-dimitar@dinux.eu
State New
Headers show
Series
  • New backend for the TI PRU processor
Related show

Commit Message

Dimitar Dimitrov June 13, 2018, 6:58 p.m.
The PRU load/store instructions can access memory with byte
granularity for all 30 of its 32-bit GP registers. Examples:

   # Load 17 bytes from address r0[0] into registers r10.b1-r14.b2
   lbbo r10.b1, r0, 0, 17

   # Load 100 bytes from address r28[0] into registers r0-r25
   lbbo r0.b0, r28, 0, 100

The load/store multiple patterns declare all subsequent registers
as distinct operands. Hence the need to increase the limit.

Increase the value to just 60 in order to avoid modifying regrename.c.

2018-06-13  Dimitar Dimitrov  <dimitar@dinux.eu>

        * genoutput.c (MAX_MAX_OPERANDS): Increase to 60.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
---
 gcc/genoutput.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jeff Law June 22, 2018, 5:33 p.m. | #1
On 06/13/2018 12:58 PM, Dimitar Dimitrov wrote:
> The PRU load/store instructions can access memory with byte
> granularity for all 30 of its 32-bit GP registers. Examples:
> 
>    # Load 17 bytes from address r0[0] into registers r10.b1-r14.b2
>    lbbo r10.b1, r0, 0, 17
> 
>    # Load 100 bytes from address r28[0] into registers r0-r25
>    lbbo r0.b0, r28, 0, 100
> 
> The load/store multiple patterns declare all subsequent registers
> as distinct operands. Hence the need to increase the limit.
> 
> Increase the value to just 60 in order to avoid modifying regrename.c.
> 
> 2018-06-13  Dimitar Dimitrov  <dimitar@dinux.eu>
> 
>         * genoutput.c (MAX_MAX_OPERANDS): Increase to 60.
Not ideal.  I'm not aware of any other port which puts each register
into a class then has an alternative for each class in its patterns.
I'm not even really sure what that's buying you WRT the feature your
noted above.

I'll also note that there a MAX_OPERANDS for gensupport.c which you
probably should investigate.

It's probably not a huge deal, but I'm going to defer judgment on this
right now.

jeff
Jakub Jelinek June 22, 2018, 5:41 p.m. | #2
On Fri, Jun 22, 2018 at 11:33:06AM -0600, Jeff Law wrote:
> On 06/13/2018 12:58 PM, Dimitar Dimitrov wrote:
> > The PRU load/store instructions can access memory with byte
> > granularity for all 30 of its 32-bit GP registers. Examples:
> > 
> >    # Load 17 bytes from address r0[0] into registers r10.b1-r14.b2
> >    lbbo r10.b1, r0, 0, 17
> > 
> >    # Load 100 bytes from address r28[0] into registers r0-r25
> >    lbbo r0.b0, r28, 0, 100
> > 
> > The load/store multiple patterns declare all subsequent registers
> > as distinct operands. Hence the need to increase the limit.

Can't you have a look on how other targets, e.g. arm, aarch64, s390x
etc. handle load/store multiple patterns, e.g. with match_parallel or
match_par_dup?
The instructions then don't have dozens of operands, and the predicate
is just supposed to check everything is the way it should be.

	Jakub
Dimitar Dimitrov June 23, 2018, 12:26 p.m. | #3
On петък, 22 юни 2018 г. 19:41:55 EEST Jakub Jelinek wrote:
> On Fri, Jun 22, 2018 at 11:33:06AM -0600, Jeff Law wrote:
> > On 06/13/2018 12:58 PM, Dimitar Dimitrov wrote:
> > > The PRU load/store instructions can access memory with byte
> > > 
> > > granularity for all 30 of its 32-bit GP registers. Examples:
> > >    # Load 17 bytes from address r0[0] into registers r10.b1-r14.b2
> > >    lbbo r10.b1, r0, 0, 17
> > >    
> > >    # Load 100 bytes from address r28[0] into registers r0-r25
> > >    lbbo r0.b0, r28, 0, 100
> > > 
> > > The load/store multiple patterns declare all subsequent registers
> > > as distinct operands. Hence the need to increase the limit.
> 
> Can't you have a look on how other targets, e.g. arm, aarch64, s390x
> etc. handle load/store multiple patterns, e.g. with match_parallel or
> match_par_dup?
> The instructions then don't have dozens of operands, and the predicate
> is just supposed to check everything is the way it should be.
I took arm/ldmstm.md as an inspiration. See attached machine description for 
PRU that requires the increase. I omitted this machine-generated MD file from 
my first patch set, but per comments will include it in v2.

PRU has a total of 32 32-bit registers with flexible subregister addressing. 
The PRU GCC port represents the register file as 128 individual 8-bit 
registers. Rationale: http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html

Load/store instructions can load anywhere between 1 and 124 consecutive 8-bit 
registers. The load/store-multiple patterns seem to require const_int_operand 
offsets for each loaded register, hence the explosion of operands.

I make no distintion for class - patterns accept any GP register.

Regards,
Dimitar
Jakub Jelinek June 23, 2018, 6:35 p.m. | #4
On Sat, Jun 23, 2018 at 03:26:50PM +0300, Dimitar Dimitrov wrote:
> I took arm/ldmstm.md as an inspiration. See attached machine description for 
> PRU that requires the increase. I omitted this machine-generated MD file from 
> my first patch set, but per comments will include it in v2.
> 
> PRU has a total of 32 32-bit registers with flexible subregister addressing. 
> The PRU GCC port represents the register file as 128 individual 8-bit 
> registers. Rationale: http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html
> 
> Load/store instructions can load anywhere between 1 and 124 consecutive 8-bit 
> registers. The load/store-multiple patterns seem to require const_int_operand 
> offsets for each loaded register, hence the explosion of operands.

If it is consecutive only, then you could represent those that load a lot of
registers using wider modes, so represent e.g. that 124 register load as
15 DImode loads + 1 SImode.

	Jakub
Jeff Law June 28, 2018, 3:29 a.m. | #5
On 06/23/2018 06:26 AM, Dimitar Dimitrov wrote:
> On петък, 22 юни 2018 г. 19:41:55 EEST Jakub Jelinek wrote:
>> On Fri, Jun 22, 2018 at 11:33:06AM -0600, Jeff Law wrote:
>>> On 06/13/2018 12:58 PM, Dimitar Dimitrov wrote:
>>>> The PRU load/store instructions can access memory with byte
>>>>
>>>> granularity for all 30 of its 32-bit GP registers. Examples:
>>>>    # Load 17 bytes from address r0[0] into registers r10.b1-r14.b2
>>>>    lbbo r10.b1, r0, 0, 17
>>>>    
>>>>    # Load 100 bytes from address r28[0] into registers r0-r25
>>>>    lbbo r0.b0, r28, 0, 100
>>>>
>>>> The load/store multiple patterns declare all subsequent registers
>>>> as distinct operands. Hence the need to increase the limit.
>>
>> Can't you have a look on how other targets, e.g. arm, aarch64, s390x
>> etc. handle load/store multiple patterns, e.g. with match_parallel or
>> match_par_dup?
>> The instructions then don't have dozens of operands, and the predicate
>> is just supposed to check everything is the way it should be.
> I took arm/ldmstm.md as an inspiration. See attached machine description for 
> PRU that requires the increase. I omitted this machine-generated MD file from 
> my first patch set, but per comments will include it in v2.
> 
> PRU has a total of 32 32-bit registers with flexible subregister addressing. 
> The PRU GCC port represents the register file as 128 individual 8-bit 
> registers. Rationale: http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html
> 
> Load/store instructions can load anywhere between 1 and 124 consecutive 8-bit 
> registers. The load/store-multiple patterns seem to require const_int_operand 
> offsets for each loaded register, hence the explosion of operands.
> 
> I make no distintion for class - patterns accept any GP register.
Right, but is that level of generality really all that useful?  Based on
what I know about the PRU I'd probably stick mostly to 32bit registers
and only expose the byte level addressibility when it's clearly a win,
particularly for bitfield insertions/extractions.  I probably wouldn't
expose operations which cross 32bit boundaries, except perhaps for
arithmetic through the carry.

I guess my point is I'd like to see a stronger justification for
exposing this much of the architecture before bumping up the maximum
operand limits.

jeff
Dimitar Dimitrov July 20, 2018, 2:12 a.m. | #6
On събота, 23 юни 2018 г. 20:35:23 EEST Jakub Jelinek wrote:
> On Sat, Jun 23, 2018 at 03:26:50PM +0300, Dimitar Dimitrov wrote:
> > I took arm/ldmstm.md as an inspiration. See attached machine description
> > for PRU that requires the increase. I omitted this machine-generated MD
> > file from my first patch set, but per comments will include it in v2.
> > 
> > PRU has a total of 32 32-bit registers with flexible subregister
> > addressing. The PRU GCC port represents the register file as 128
> > individual 8-bit registers. Rationale:
> > http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html
> > 
> > Load/store instructions can load anywhere between 1 and 124 consecutive
> > 8-bit registers. The load/store-multiple patterns seem to require
> > const_int_operand offsets for each loaded register, hence the explosion
> > of operands.
> If it is consecutive only, then you could represent those that load a lot of
> registers using wider modes, so represent e.g. that 124 register load as 15
> DImode loads + 1 SImode.
> 
> 	Jakub
Jeff, Jakub, thank you for raising a concern that increasing MAX_MAX_OPERANDS 
is suspicous.

I think a better approach is to altogether avoid expansion, and instead 
declare define_insn. Advantages are that:
  - machine description is greatly simplified;
  - there is no machine-generated code;
  - I don't need to increase MAX_MAX_OPERANDS.

I'll revise the PRU port and send patch v2. Here is how I intend to implement 
the pattern:

(define_insn "load_multiple"
  [(unspec_volatile
    [(parallel [(match_operand:QI 0 "register_operand" "=r")
                (match_operand:BLK 1 "memory_operand" "m")
                (match_operand:VOID 2 "const_int_operand" "i")])]
    UNSPECV_LOAD_MULTPLE)]
  ""
  "lb%B1o\\t%b0, %1, %2"
  [(set_attr "type" "ld")
   (set_attr "length" "4")])

Regards,
Dimitar
Jeff Law July 23, 2018, 10:22 p.m. | #7
On 07/19/2018 08:12 PM, Dimitar Dimitrov wrote:
> On събота, 23 юни 2018 г. 20:35:23 EEST Jakub Jelinek wrote:
>> On Sat, Jun 23, 2018 at 03:26:50PM +0300, Dimitar Dimitrov wrote:
>>> I took arm/ldmstm.md as an inspiration. See attached machine description
>>> for PRU that requires the increase. I omitted this machine-generated MD
>>> file from my first patch set, but per comments will include it in v2.
>>>
>>> PRU has a total of 32 32-bit registers with flexible subregister
>>> addressing. The PRU GCC port represents the register file as 128
>>> individual 8-bit registers. Rationale:
>>> http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html
>>>
>>> Load/store instructions can load anywhere between 1 and 124 consecutive
>>> 8-bit registers. The load/store-multiple patterns seem to require
>>> const_int_operand offsets for each loaded register, hence the explosion
>>> of operands.
>> If it is consecutive only, then you could represent those that load a lot of
>> registers using wider modes, so represent e.g. that 124 register load as 15
>> DImode loads + 1 SImode.
>>
>> 	Jakub
> Jeff, Jakub, thank you for raising a concern that increasing MAX_MAX_OPERANDS 
> is suspicous.
> 
> I think a better approach is to altogether avoid expansion, and instead 
> declare define_insn. Advantages are that:
>   - machine description is greatly simplified;
>   - there is no machine-generated code;
>   - I don't need to increase MAX_MAX_OPERANDS.
> 
> I'll revise the PRU port and send patch v2. Here is how I intend to implement 
> the pattern:
> 
> (define_insn "load_multiple"
>   [(unspec_volatile
>     [(parallel [(match_operand:QI 0 "register_operand" "=r")
>                 (match_operand:BLK 1 "memory_operand" "m")
>                 (match_operand:VOID 2 "const_int_operand" "i")])]
>     UNSPECV_LOAD_MULTPLE)]
>   ""
>   "lb%B1o\\t%b0, %1, %2"
>   [(set_attr "type" "ld")
>    (set_attr "length" "4")])
So my only worry with that is dataflow -- ie, how many registers have
their values changed isn't expressed in the pattern in a way that the
generic parts of the compiler understand.  That's likely to cause some
problems.

Jeff
Dimitar Dimitrov July 27, 2018, 5:23 a.m. | #8
On Monday, 23/7/2018 16:22:24 EEST Jeff Law wrote:
> On 07/19/2018 08:12 PM, Dimitar Dimitrov wrote:
> > On събота, 23 юни 2018 г. 20:35:23 EEST Jakub Jelinek wrote:
> >> On Sat, Jun 23, 2018 at 03:26:50PM +0300, Dimitar Dimitrov wrote:
> >>> I took arm/ldmstm.md as an inspiration. See attached machine description
> >>> for PRU that requires the increase. I omitted this machine-generated MD
> >>> file from my first patch set, but per comments will include it in v2.
> >>> 
> >>> PRU has a total of 32 32-bit registers with flexible subregister
> >>> addressing. The PRU GCC port represents the register file as 128
> >>> individual 8-bit registers. Rationale:
> >>> http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html
> >>> 
> >>> Load/store instructions can load anywhere between 1 and 124 consecutive
> >>> 8-bit registers. The load/store-multiple patterns seem to require
> >>> const_int_operand offsets for each loaded register, hence the explosion
> >>> of operands.
> >> 
> >> If it is consecutive only, then you could represent those that load a lot
> >> of registers using wider modes, so represent e.g. that 124 register load
> >> as 15 DImode loads + 1 SImode.
> >> 
> >> 	Jakub
> > 
> > Jeff, Jakub, thank you for raising a concern that increasing
> > MAX_MAX_OPERANDS is suspicous.
> > 
> > I think a better approach is to altogether avoid expansion, and instead
> > 
> > declare define_insn. Advantages are that:
> >   - machine description is greatly simplified;
> >   - there is no machine-generated code;
> >   - I don't need to increase MAX_MAX_OPERANDS.
> > 
> > I'll revise the PRU port and send patch v2. Here is how I intend to
> > implement the pattern:
> > 
> > (define_insn "load_multiple"
> > 
> >   [(unspec_volatile
> >   
> >     [(parallel [(match_operand:QI 0 "register_operand" "=r")
> >     
> >                 (match_operand:BLK 1 "memory_operand" "m")
> >                 (match_operand:VOID 2 "const_int_operand" "i")])]
> >     
> >     UNSPECV_LOAD_MULTPLE)]
> >   
> >   ""
> >   "lb%B1o\\t%b0, %1, %2"
> >   [(set_attr "type" "ld")
> >   
> >    (set_attr "length" "4")])
> 
> So my only worry with that is dataflow -- ie, how many registers have
> their values changed isn't expressed in the pattern in a way that the
> generic parts of the compiler understand.  That's likely to cause some
> problems.
My intention was to simplify the machine description, but apparently I went 
too far. I'll instead use the s390x port that Jakub recommended as a starting 
point. It seems to fit the PRU requirements.

Thanks,
Dimitar

Patch

diff --git a/gcc/genoutput.c b/gcc/genoutput.c
index 06456f4400c..d2eb179e813 100644
--- a/gcc/genoutput.c
+++ b/gcc/genoutput.c
@@ -96,7 +96,7 @@  along with GCC; see the file COPYING3.  If not see
    arbitrary limit, but what machine will have an instruction with
    this many operands?  */
 
-#define MAX_MAX_OPERANDS 40
+#define MAX_MAX_OPERANDS 60
 
 static char general_mem[] = { TARGET_MEM_CONSTRAINT, 0 };