RISC-V: Allow more load/stores to be compressed

Message ID f413b5c6-1ba5-b83a-a7b7-991fb0e3d8b2@embecosm.com
State New
Series: RISC-V: Allow more load/stores to be compressed

Commit Message

Craig Blackmore Sept. 12, 2019, 4:19 p.m. UTC
This patch aims to allow more load/store instructions to be compressed by
replacing a load/store of 'base register + large offset' with a new load/store
of 'new base + small offset'. If the new base gets stored in a compressed
register, then the new load/store can be compressed. Since there is an overhead
in creating the new base, this change is only attempted when 'base register' is
referenced in at least 4 load/stores in a basic block.
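
For example (register choices are illustrative; the offset split mirrors
the patch's 'offset & ~124' masking):

  lw a1, 400(a0)      # offset 400 > 124: cannot use c.lw
  lw a2, 404(a0)
->
  addi a3, a0, 384    # new base: 384 = 400 & ~124
  lw a1, 16(a3)       # offsets 16/20 are in [0, 124]: c.lw candidates
  lw a2, 20(a3)       # (provided a3 is allocated to x8-x15)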

The optimization is implemented in a new RISC-V specific pass called
shorten_memrefs which is enabled for RVC targets. It has been developed for the
32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.

The patch saves 164 bytes (0.3%) on a proprietary application (59450 bytes
before the patch, 59286 bytes after) compiled for rv32imc bare metal with
-Os. On the
Embench benchmark suite (https://www.embench.org/) we see code size reductions
of up to 18 bytes (0.7%) and only two cases where code size is increased
slightly, by 2 bytes each:

Embench results (.text size in bytes, excluding .rodata)

Benchmark       Without patch  With patch  Diff
aha-mont64      1052           1052        0
crc32           232            232         0
cubic           2446           2448        2
edn             1454           1450        -4
huffbench       1642           1642        0
matmult-int     420            420         0
minver          1056           1056        0
nbody           714            714         0
nettle-aes      2888           2884        -4
nettle-sha256   5566           5564        -2
nsichneu        15052          15052       0
picojpeg        8078           8078        0
qrduino         6140           6140        0
sglib-combined  2444           2444        0
slre            2438           2420        -18
st              880            880         0
statemate       3842           3842        0
ud              702            702         0
wikisort        4278           4280        2
-------------------------------------------------
Total           61324          61300       -24

The patch has been tested on the following bare metal targets using QEMU
and there were no regressions:

  rv32i
  rv32iac
  rv32im
  rv32imac
  rv32imafc
  rv64imac
  rv64imafdc

We noticed that sched2 undoes some of the address transformations made by
this optimization and consequently increases code size; therefore, this
patch adds a check in sched-deps.c to avoid changes that are expected to
increase code size when not optimizing for speed. Since this change touches
target-independent code, the patch has been bootstrapped and tested on x86
with no regressions.

gcc/ChangeLog

	* config/riscv/riscv.c (tree-pass.h): New include.
	(cfg.h): Likewise.
	(context.h): Likewise.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(class pass_shorten_memrefs): New pass.
	(pass_shorten_memrefs::execute): Likewise.
	(make_pass_shorten_memrefs): Likewise.
	(riscv_option_override): Register shorten_memrefs pass for
	TARGET_RVC.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	* sched-deps.c (attempt_change): When optimizing for code size,
	don't make the change if it increases code size.

---
 gcc/config/riscv/riscv.c | 179 +++++++++++++++++++++++++++++++++++++++++++++--
 gcc/sched-deps.c         |  10 +++
 2 files changed, 183 insertions(+), 6 deletions(-)

Comments

Kito Cheng Sept. 18, 2019, 10:01 a.m. UTC | #1
Hi Craig:

Some general review comments:
- Split the new pass into a new file.
- Add a new option to enable/disable this pass (see the riscv.opt sketch
  after this list).
- Could you extend this patch to support lw/sw/ld/sd/flw/fsw/fld/fsd?
  I think there is a lot of common logic in supporting the other types of
  compressed load/store instruction, and I'd like to see that support
  added at once.
- Do you have experimental data about doing this after register
  allocation/reload? I'd prefer doing such an optimization after RA,
  because then we can accurately estimate how many bytes we gain. I guess
  the code size increase happens because RA didn't assign a suitable
  src/dest reg or base reg?

On Fri, Sep 13, 2019 at 12:20 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
>
> [commit message, Embench results, ChangeLog and diffstat snipped --
> quoted in full above]
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 39bf87a..e510314 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -55,6 +55,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "diagnostic.h"
>  #include "builtins.h"
>  #include "predict.h"
> +#include "tree-pass.h"
> +#include "cfg.h"
> +#include "context.h"
>
>  /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
>  #define UNSPEC_ADDRESS_P(X)                                    \
> @@ -848,6 +851,44 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
>    return riscv_classify_address (&addr, x, mode, strict_p);
>  }
>
> +/* Return true if hard reg REGNO can be used in compressed instructions.  */
> +
> +static bool
> +riscv_compressed_reg_p (int regno)
> +{
> +  /* x8-x15/f8-f15 are compressible registers.  */
> +  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
> +         || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
> +}
> +
> +/* Return true if load/store from/to address x can be compressed.  */
> +
> +static bool
> +riscv_compressed_lw_address_p (rtx x)
> +{
> +  struct riscv_address_info addr;
> +  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
> +                                       reload_completed);
> +
> +  /* Before reload, assuming all load/stores of valid addresses get compressed
> +     gives better code size than checking if the address is reg + small_offset
> +     early on.  */
> +  if (result && !reload_completed)
> +    return true;
> +
> +  /* Return false if address is not compressed_reg + small_offset.  */
> +  if (!result
> +      || addr.type != ADDRESS_REG
> +      || (!riscv_compressed_reg_p (REGNO (addr.reg))
> +           && addr.reg != stack_pointer_rtx)
> +      || !CONST_INT_P (addr.offset)
> +      || (INTVAL (addr.offset) & 3) != 0
> +      || !IN_RANGE (INTVAL (addr.offset), 0, 124))
> +    return false;
> +
> +  return result;
> +}
> +
>  /* Return the number of instructions needed to load or store a value
>     of mode MODE at address X.  Return 0 if X isn't valid for MODE.
>     Assume that multiword moves may need to be split into word moves
> @@ -1318,7 +1359,9 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
>    if (riscv_split_symbol (NULL, x, mode, &addr))
>      return riscv_force_address (addr, mode);
>
> -  /* Handle BASE + OFFSET using riscv_add_offset.  */
> +  /* When optimizing for size, try to convert BASE + LARGE_OFFSET into
> +     NEW_BASE + SMALL_OFFSET to allow possible compressed load/store, otherwise,
> +     handle BASE + OFFSET using riscv_add_offset.  */
>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
>        && INTVAL (XEXP (x, 1)) != 0)
>      {
> @@ -1327,7 +1370,24 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
>
>        if (!riscv_valid_base_register_p (base, mode, false))
>         base = copy_to_mode_reg (Pmode, base);
> -      addr = riscv_add_offset (NULL, base, offset);
> +      if (optimize_function_for_size_p (cfun)
> +         && (strcmp (current_pass->name, "shorten_memrefs") == 0)
> +         && mode == SImode
> +         && (offset & 3) == 0
> +         && !IN_RANGE (offset, 0, 124))

I think the offset check can be relaxed here: you can put part of the
offset into HIGH, and the remaining offset for the compressed load/store
can still be aligned to fit the instruction format.

e.g.
  lw a1, 125(a0)
  lw a2, 129(a0)
->
  addi a3, a0, 125
  lw a1, 0(a3)
  lw a2, 4(a3)

> +       {
> +         rtx high;
> +
> +         /* Leave OFFSET as a 7-bit offset and put the excess in HIGH.  */
> +         high = GEN_INT (offset & ~124);
> +         offset &= 124;

124 appears 4 times in this patch as a magic number. I know it comes
from (((2^5) << 2) - 1) & ~0x3 = 124, but you should find some way to
make this more readable.
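
One way to name it (a sketch; CSW_MAX_OFFSET is a hypothetical name, not
something the patch defines):

  /* Maximum offset reachable by a compressed lw/sw: a 5-bit immediate
     scaled by 4, i.e. offsets 0, 4, ..., 124.  */
  #define CSW_MAX_OFFSET (((4LL << 5) - 1) & ~3)

The range check would then read
IN_RANGE (INTVAL (addr.offset), 0, CSW_MAX_OFFSET).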

> +         if (!SMALL_OPERAND (INTVAL (high)))
> +           high = force_reg (Pmode, high);
> +         base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
> +         addr = plus_constant (Pmode, base, offset);
> +       }
> +      else
> +       addr = riscv_add_offset (NULL, base, offset);

Could you split this logic into a new function?

>        return riscv_force_address (addr, mode);
>      }
>
> @@ -1812,7 +1872,10 @@ riscv_address_cost (rtx addr, machine_mode mode,
>                     addr_space_t as ATTRIBUTE_UNUSED,
>                     bool speed ATTRIBUTE_UNUSED)
>  {
> -  return riscv_address_insns (addr, mode, false);
> +  if (!speed && mode == SImode
> +      && riscv_compressed_lw_address_p (addr))
> +    return 1;
> +  return !speed + riscv_address_insns (addr, mode, false);
>  }
>
>  /* Return one word of double-word value OP.  HIGH_P is true to select the
> @@ -4541,6 +4604,106 @@ riscv_init_machine_status (void)
>    return ggc_cleared_alloc<machine_function> ();
>  }
>
> +namespace {
> +
> +const pass_data pass_data_shorten_memrefs =
> +{
> +  RTL_PASS, /* type */
> +  "shorten_memrefs", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_shorten_memrefs : public rtl_opt_pass
> +{
> +public:
> +  pass_shorten_memrefs (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return optimize > 0; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_shorten_memrefs
> +
> +/* Try to make more use of compressed load and store instructions by replacing
> +   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
> +   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
> +   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
> +   the change is only attempted when BASE is referenced by at least four
> +   load/stores in the same basic block.  */
> +unsigned int
> +pass_shorten_memrefs::execute (function *fn)
> +{
> +  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
> +  typedef hash_map <regno_hash, int> regno_map;
> +
> +  basic_block bb;
> +  rtx_insn *insn;
> +
> +  regstat_init_n_sets_and_refs ();
> +
> +  FOR_ALL_BB_FN (bb, fn)
> +  {
> +    regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
> +    for (int pass = 0; !optimize_bb_for_speed_p (bb) && pass < 2; pass++)

It seems like you collect info in the first pass and do the transform in
the second pass, but you do the transform in the first pass too; splitting
the two passes into two separate parts would be more readable.
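
For instance (a sketch; analyze_block and transform_block are hypothetical
helpers, not functions from the patch):

  FOR_ALL_BB_FN (bb, fn)
    {
      if (optimize_bb_for_speed_p (bb))
        continue;
      regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
      analyze_block (bb, m);    /* Count load/store uses of each base.  */
      transform_block (bb, m);  /* Rewrite addresses whose count > 3.  */
    }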

> +      FOR_BB_INSNS (bb, insn)
> +       {
> +         if (!NONJUMP_INSN_P (insn))
> +           continue;
> +         rtx pat = PATTERN (insn);
> +         if (GET_CODE (pat) != SET)
> +           continue;
> +         start_sequence ();
> +         for (int i = 0; i < 2; i++)
> +           {
> +             rtx mem = XEXP (pat, i);
> +             if (MEM_P (mem) && GET_MODE (mem) == SImode)
> +               {
> +                 rtx addr = XEXP (mem, 0);
> +                 if (GET_CODE (addr) != PLUS)
> +                   continue;
> +                 if (!REG_P (XEXP (addr, 0)))
> +                   continue;
> +                 HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
> +                 if (REG_N_REFS (regno) < 4)
> +                   continue;
> +                 if (pass == 0)
> +                   m->get_or_insert (regno)++;
> +                 else if (m->get_or_insert (regno) > 3)
> +                   {
> +                     addr
> +                       = riscv_legitimize_address (addr, addr, GET_MODE (mem));
> +                     XEXP (pat, i) = replace_equiv_address (mem, addr);
> +                     df_insn_rescan (insn);
> +                   }
> +               }
> +           }
> +         rtx_insn *seq = get_insns ();
> +         end_sequence ();
> +         emit_insn_before (seq, insn);
> +       }
> +
> +  }
> +  regstat_free_n_sets_and_refs ();
> +
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +opt_pass *
> +make_pass_shorten_memrefs (gcc::context *ctxt)
> +{
> +  return new pass_shorten_memrefs (ctxt);
> +}
> +
>  /* Implement TARGET_OPTION_OVERRIDE.  */
>
>  static void
> @@ -4637,6 +4800,10 @@ riscv_option_override (void)
>      error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
>            " [%<-mriscv-attribute%>]");
>  #endif
> +
> +  if (TARGET_RVC)
> +    register_pass (make_pass_shorten_memrefs (g),
> +                  PASS_POS_INSERT_AFTER, "store_motion", 1);
>  }

GCC has a new interface for registering target-specific optimization
passes. You can create a new file riscv-passes.def, like
aarch64-passes.def/i386-passes.def, and add
PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def to t-riscv.
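
For this pass, riscv-passes.def might contain a single entry modelled on
aarch64-passes.def (a sketch, assuming the pass keeps its current
insertion point after store_motion):

  INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);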

>
>  /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
> @@ -4676,9 +4843,9 @@ riscv_conditional_register_usage (void)
>  static int
>  riscv_register_priority (int regno)
>  {
> -  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
> -  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
> -                    || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
> +  /* Favor compressed registers to improve the odds of RVC instruction
> +     selection.  */
> +  if (riscv_compressed_reg_p (regno))
>      return 1;
>
>    return 0;
> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
> index 52db3cc..92a0893 100644
> --- a/gcc/sched-deps.c
> +++ b/gcc/sched-deps.c
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "sched-int.h"
>  #include "params.h"
>  #include "cselib.h"
> +#include "predict.h"
>
>  #ifdef INSN_SCHEDULING
>
> @@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
>    rtx mem = *mii->mem_loc;
>    rtx new_mem;
>
> +  /* When not optimizing for speed, avoid changes that are expected to make code
> +     size larger.  */
> +  addr_space_t as = MEM_ADDR_SPACE (mem);
> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
> +  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
> +  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
> +  if (new_cost > old_cost && !speed)

I think !speed should not be needed here; it would mean address_cost is
incorrect if it generated worse code. But this change will affect all
other targets, so I think it would be better to split it into a separate
patch and CC the related reviewers.


> +    return NULL_RTX;
> +
>    /* Jump through a lot of hoops to keep the attributes up to date.  We
>       do not want to call one of the change address variants that take
>       an offset even though we know the offset in many cases.  These
>
Patch

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 39bf87a..e510314 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,9 @@  along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
+#include "cfg.h"
+#include "context.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)					\
@@ -848,6 +851,44 @@  riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if load/store from/to address x can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get compressed
+     gives better code size than checking if the address is reg + small_offset
+     early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	    && addr.reg != stack_pointer_rtx)
+      || !CONST_INT_P (addr.offset)
+      || (INTVAL (addr.offset) & 3) != 0
+      || !IN_RANGE (INTVAL (addr.offset), 0, 124))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1318,7 +1359,9 @@  riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* When optimizing for size, try to convert BASE + LARGE_OFFSET into
+     NEW_BASE + SMALL_OFFSET to allow possible compressed load/store, otherwise,
+     handle BASE + OFFSET using riscv_add_offset.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1327,7 +1370,24 @@  riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
 	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode
+	  && (offset & 3) == 0
+	  && !IN_RANGE (offset, 0, 124))
+	{
+	  rtx high;
+
+	  /* Leave OFFSET as a 7-bit offset and put the excess in HIGH.  */
+	  high = GEN_INT (offset & ~124);
+	  offset &= 124;
+	  if (!SMALL_OPERAND (INTVAL (high)))
+	    high = force_reg (Pmode, high);
+	  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+	  addr = plus_constant (Pmode, base, offset);
+	}
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1812,7 +1872,10 @@  riscv_address_cost (rtx addr, machine_mode mode,
 		    addr_space_t as ATTRIBUTE_UNUSED,
 		    bool speed ATTRIBUTE_UNUSED)
 {
-  return riscv_address_insns (addr, mode, false);
+  if (!speed && mode == SImode
+      && riscv_compressed_lw_address_p (addr))
+    return 1;
+  return !speed + riscv_address_insns (addr, mode, false);
 }
 
 /* Return one word of double-word value OP.  HIGH_P is true to select the
@@ -4541,6 +4604,106 @@  riscv_init_machine_status (void)
   return ggc_cleared_alloc<machine_function> ();
 }
 
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return optimize > 0; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_shorten_memrefs
+
+/* Try to make more use of compressed load and store instructions by replacing
+   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
+   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
+   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
+   the change is only attempted when BASE is referenced by at least four
+   load/stores in the same basic block.  */
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  basic_block bb;
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_ALL_BB_FN (bb, fn)
+  {
+    regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
+    for (int pass = 0; !optimize_bb_for_speed_p (bb) && pass < 2; pass++)
+      FOR_BB_INSNS (bb, insn)
+	{
+	  if (!NONJUMP_INSN_P (insn))
+	    continue;
+	  rtx pat = PATTERN (insn);
+	  if (GET_CODE (pat) != SET)
+	    continue;
+	  start_sequence ();
+	  for (int i = 0; i < 2; i++)
+	    {
+	      rtx mem = XEXP (pat, i);
+	      if (MEM_P (mem) && GET_MODE (mem) == SImode)
+		{
+		  rtx addr = XEXP (mem, 0);
+		  if (GET_CODE (addr) != PLUS)
+		    continue;
+		  if (!REG_P (XEXP (addr, 0)))
+		    continue;
+		  HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+		  if (REG_N_REFS (regno) < 4)
+		    continue;
+		  if (pass == 0)
+		    m->get_or_insert (regno)++;
+		  else if (m->get_or_insert (regno) > 3)
+		    {
+		      addr
+			= riscv_legitimize_address (addr, addr, GET_MODE (mem));
+		      XEXP (pat, i) = replace_equiv_address (mem, addr);
+		      df_insn_rescan (insn);
+		    }
+		}
+	    }
+	  rtx_insn *seq = get_insns ();
+	  end_sequence ();
+	  emit_insn_before (seq, insn);
+	}
+
+  }
+  regstat_free_n_sets_and_refs ();
+
+  return 0;
+}
+
+} // anon namespace
+
+opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
+
 /* Implement TARGET_OPTION_OVERRIDE.  */
 
 static void
@@ -4637,6 +4800,10 @@  riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
 	   " [%<-mriscv-attribute%>]");
 #endif
+
+  if (TARGET_RVC)
+    register_pass (make_pass_shorten_memrefs (g),
+		   PASS_POS_INSERT_AFTER, "store_motion", 1);
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4676,9 +4843,9 @@  riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 52db3cc..92a0893 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -38,6 +38,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "sched-int.h"
 #include "params.h"
 #include "cselib.h"
+#include "predict.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -4707,6 +4708,15 @@  attempt_change (struct mem_inc_info *mii, rtx new_addr)
   rtx mem = *mii->mem_loc;
   rtx new_mem;
 
+  /* When not optimizing for speed, avoid changes that are expected to make code
+     size larger.  */
+  addr_space_t as = MEM_ADDR_SPACE (mem);
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
+  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
+  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
+  if (new_cost > old_cost && !speed)
+    return NULL_RTX;
+
   /* Jump through a lot of hoops to keep the attributes up to date.  We
      do not want to call one of the change address variants that take
      an offset even though we know the offset in many cases.  These