diff mbox

have -Wformat-overflow handle -fexec-charset (PR 80503)

Message ID 7b8f6999-942c-1540-c6e6-a6284d57baf5@gmail.com
State New
Headers show

Commit Message

Martin Sebor April 27, 2017, 9:05 p.m. UTC
On 04/26/2017 04:34 PM, Jakub Jelinek wrote:
> On Wed, Apr 26, 2017 at 10:26:56PM +0000, Joseph Myers wrote:
>> On Wed, 26 Apr 2017, Martin Sebor wrote:
>>
>>> Testing my solution for bug 77671 (missing -Wformat-overflow
>>> sprintf with "%s") caused a regression in the charset/builtin2.c
>>> test for bug 25120 (builtin *printf handlers are confused by
>>> -fexec-charset).  That led me to realize that like -Wformat
>>> itself, the whole gimple-ssa-sprintf pass is oblivious to
>>> potential differences between the source character set on
>>> the host and the execution character set on the target.  As
>>> a result, when the host and target sets are different, the
>>> pass misinterprets ordinary format characters as special
>>> (e.g., parts of directives) and vice versa.
>>>
>>> The attached patch implements a simple solution to this problem
>>> by introducing a mapping between the two sets.
>>
>> target_strtol10 appears to do no checking for overflow, which I'd expect
>> would result in nonsensical results for large width values overflowing
>> host long (whereas strtol would reliably return LONG_MAX in such cases).

Thanks, good catch!  I've added overflow detection (along with
a warning) in the updated version.

>
> Also, can't there be a way to shortcut all this processing if the
> charsets are the same?  And is it a good idea if every pass that needs to do
> something with the exec charset chars caches its own results of the
> langhook?

The biggest overhead is calling lang_hooks.to_target_charset
to initialize each character in the source set.  That could
be avoided if there were a way to determine in the middle-end
whether the input and target charsets are the same, but I don't
see one.  Alternatively, it could be optimized by converting
all the characters in one shot as a string, but without adding
a new target hook to do that I don't see how to do that either.
It might be a useful enhancement but given the scope it feels
like it should be done independently of this change.  But if
you know of a solution that escaped me please let me know.

The overhead of the additional processing should be negligible
irrespective of whether the charsets are different or the same
(all it entails is indexing into a table).

I agree that sharing data would be a good thing but as it is,
the little that can be be shared already is (the target_percent
character with builtins.c).  The rest of it (i.e., the mapping)
is only being used by gimple-ssa-sprintf.c.

If/when -Wformat is ever enhanced to handle -fexec-charset, or
if another area needs to, then implementinng some more general
would be worthwhile.

Attached is an updated patch with just the overflow handling
suggested by Joseph.

Martin

Comments

Jeff Law April 28, 2017, 4:22 p.m. UTC | #1
On 04/27/2017 03:05 PM, Martin Sebor wrote:
> On 04/26/2017 04:34 PM, Jakub Jelinek wrote:
>>
>> Also, can't there be a way to shortcut all this processing if the
>> charsets are the same?  And is it a good idea if every pass that needs 
>> to do
>> something with the exec charset chars caches its own results of the
>> langhook?
> 
> The biggest overhead is calling lang_hooks.to_target_charset
> to initialize each character in the source set.  That could
> be avoided if there were a way to determine in the middle-end
> whether the input and target charsets are the same, but I don't
> see one.  Alternatively, it could be optimized by converting
> all the characters in one shot as a string, but without adding
> a new target hook to do that I don't see how to do that either.
> It might be a useful enhancement but given the scope it feels
> like it should be done independently of this change.  But if
> you know of a solution that escaped me please let me know.
> 
> The overhead of the additional processing should be negligible
> irrespective of whether the charsets are different or the same
> (all it entails is indexing into a table).
So the initialization could be done once per translation unit rather 
than once per function -- assuming the target character set doesn't 
change within a translation unit.

That seems like it ought to be easy.

The table-lookup seems like a reasonable cost and I don't think we 
should litter the code with conditionals to conditionally lookup based 
on whether or not the charsets are the same.

> 
> I agree that sharing data would be a good thing but as it is,
> the little that can be be shared already is (the target_percent
> character with builtins.c).  The rest of it (i.e., the mapping)
> is only being used by gimple-ssa-sprintf.c.
If we ever get to a point where we need this level of mapping elsewhere, 
we can always pull the code out of gimple-ssa-sprintf into a more 
generic location.  I don't think we need to over-engineering sharing 
into this right now.


> 
> If/when -Wformat is ever enhanced to handle -fexec-charset, or
> if another area needs to, then implementinng some more general
> would be worthwhile.
Right.

> 
> Attached is an updated patch with just the overflow handling
> suggested by Joseph.
> 
> Martin
> 
> 
> gcc-80523.diff
> 
> 
> PR tree-optimization/80523 -  -Wformat-overflow doesn't consider -fexec-charset
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/80523
> 	* gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
> 	(init_target_to_host_charmap, target_to_host, target_strtol10): New
> 	functions.
> 	(maybe_warn, format_directive, parse_directive): Use new functions.
> 	(pass_sprintf_length::execute): Call init_target_to_host_charmap.	
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/80523
> 	* gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.
> 
> Index: gcc/gimple-ssa-sprintf.c
> ===================================================================
> --- gcc/gimple-ssa-sprintf.c	(revision 247263)
> +++ gcc/gimple-ssa-sprintf.c	(working copy)
> +/* Return a string consisting of charavters in the host source character
s/charavters/characters/


> +   set corresponding to the string TARGSTR consisting of characters in
> +   the execution character set.  */
> +
> +static const char*
> +target_to_host (const char *targstr)
> +{
> +  /* The interesting subset of source and execution characters are
> +     the same so no conversion is necessary.  */
> +  if (target_to_host_charmap['\0'] == 1)
> +    return targstr;
> +
> +  /* Convert the initial substring of TARGSTR to the corresponding
> +     characters in the host set, appending "..." if TARGSTR is too
> +     long to fit.  Using the static buffer assumes the function is
> +     not called in between sequence points (which it isn't).  */
> +  static char hostr[32];
> +  for (char *ph = hostr; ; ++targstr)
> +    {
> +      *ph++ = target_to_host (*targstr);
> +      if (!*targstr)
> +	break;
> +
> +      if (ph - hostr == sizeof hostr - 4)
> +	{
> +	  *ph = '\0';
> +	  strcat (ph, "...");
> +	  break;
> +	}
> +    }
> +
> +  return hostr;
Ewww.  I guess the alternative would be something like:

Expand the return value to include a indicator of whether or not the 
original string was returned (common case) or if a new string was 
returned and thus needs to be deallocated by the caller.

That's probably pretty unpleasant given we don't have a central place 
where we call target_to_host, so the caller side would be much uglier.

Are there any downstream impacts when the string is too long to covert 
other than not warning for things which were unconverted?

Jeff
Jakub Jelinek April 28, 2017, 4:27 p.m. UTC | #2
On Fri, Apr 28, 2017 at 10:22:29AM -0600, Jeff Law wrote:
> So the initialization could be done once per translation unit rather than
> once per function -- assuming the target character set doesn't change within
> a translation unit.
> 
> That seems like it ought to be easy.
> 
> The table-lookup seems like a reasonable cost and I don't think we should
> litter the code with conditionals to conditionally lookup based on whether
> or not the charsets are the same.

One option would be to have the cache array initialized say to 0 for
all chars except for % (which can be taken from target_percent or how is
that called), and then query (and cache) chars you need (you don't need
anything until % is seen, then obviously you need to translate chars
following that until end of the format string is recognized, then again
skip until next % etc.

And/or enhance libcpp and the langhooks so that they will tell you when
the exec charset is identical to host (or at least the subset format strings
care about).

	Jakub
Jeff Law April 28, 2017, 4:36 p.m. UTC | #3
On 04/28/2017 10:27 AM, Jakub Jelinek wrote:
> On Fri, Apr 28, 2017 at 10:22:29AM -0600, Jeff Law wrote:
>> So the initialization could be done once per translation unit rather than
>> once per function -- assuming the target character set doesn't change within
>> a translation unit.
>>
>> That seems like it ought to be easy.
>>
>> The table-lookup seems like a reasonable cost and I don't think we should
>> litter the code with conditionals to conditionally lookup based on whether
>> or not the charsets are the same.
> 
> One option would be to have the cache array initialized say to 0 for
> all chars except for % (which can be taken from target_percent or how is
> that called), and then query (and cache) chars you need (you don't need
> anything until % is seen, then obviously you need to translate chars
> following that until end of the format string is recognized, then again
> skip until next % etc.
That seems like a significant over-complication for little real world 
benefit.

Jeff
diff mbox

Patch

PR tree-optimization/80523 -  -Wformat-overflow doesn't consider -fexec-charset

gcc/ChangeLog:

	PR tree-optimization/80523
	* gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
	(init_target_to_host_charmap, target_to_host, target_strtol10): New
	functions.
	(maybe_warn, format_directive, parse_directive): Use new functions.
	(pass_sprintf_length::execute): Call init_target_to_host_charmap.	

gcc/testsuite/ChangeLog:

	PR tree-optimization/80523
	* gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.

Index: gcc/gimple-ssa-sprintf.c
===================================================================
--- gcc/gimple-ssa-sprintf.c	(revision 247263)
+++ gcc/gimple-ssa-sprintf.c	(working copy)
@@ -66,6 +66,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "calls.h"
 #include "cfgloop.h"
 #include "intl.h"
+#include "langhooks.h"
 
 #include "builtins.h"
 #include "stor-layout.h"
@@ -273,6 +274,143 @@  target_size_max ()
   return tree_to_uhwi (TYPE_MAX_VALUE (size_type_node));
 }
 
+/* A straightforward mapping from the execution character set to the host
+   character set indexed by execution character.  */
+
+static char target_to_host_charmap[256];
+
+/* Initialize a mapping from the execution character set to the host
+   character set.  */
+
+static bool
+init_target_to_host_charmap ()
+{
+  if (!init_target_chars ())
+    return false;
+
+  /* The subset of the source character set used by printf conversion
+     specifications (strictly speaking, not all letters are used but
+     they are included here for the sake of simplicity).  The dollar
+     sign must be included even though it's not in the basic source
+     character set.  */
+  const char srcset[] = " 0123456789!\"#%&'()*+,-./:;<=>?[\\]^_{|}~$"
+    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
+
+  /* Set the mapping for all characters to some ordinary value (i,e.,
+     not none used in printf conversion specifications) and overwrite
+     those that are used by conversion specifications to their
+     corresponding values.  */
+  memset (target_to_host_charmap + 1, '?', sizeof target_to_host_charmap - 1);
+
+  /* Are the two sets of characters the same?  */
+  bool all_same_p = true;
+
+  for (const char *pc = srcset; *pc; ++pc)
+    {
+      /* Slice off the high end bits in case target characters are
+	 signed.  All values are expected to be non-nul, otherwise
+	 there's a problem.  */
+      if (unsigned char tc = lang_hooks.to_target_charset (*pc))
+	{
+	  target_to_host_charmap[tc] = *pc;
+	  if (tc != *pc)
+	    all_same_p = false;
+	}
+      else
+	return false;
+
+    }
+
+  /* Set the first element to a non-zero value if the mapping
+     is 1-to-1, otherwise leave it clear (NUL is assumed to be
+     the same in both character sets).  */
+  target_to_host_charmap[0] = all_same_p;
+
+  return true;
+}
+
+/* Return the host source character corresponding to the character
+   CH in the execution character set if one exists, or some innocuous
+   (non-special, non-nul) source character otherwise.  */
+
+static inline unsigned char
+target_to_host (unsigned char ch)
+{
+  return target_to_host_charmap[ch];
+}
+
+/* Return a string consisting of charavters in the host source character
+   set corresponding to the string TARGSTR consisting of characters in
+   the execution character set.  */
+
+static const char*
+target_to_host (const char *targstr)
+{
+  /* The interesting subset of source and execution characters are
+     the same so no conversion is necessary.  */
+  if (target_to_host_charmap['\0'] == 1)
+    return targstr;
+
+  /* Convert the initial substring of TARGSTR to the corresponding
+     characters in the host set, appending "..." if TARGSTR is too
+     long to fit.  Using the static buffer assumes the function is
+     not called in between sequence points (which it isn't).  */
+  static char hostr[32];
+  for (char *ph = hostr; ; ++targstr)
+    {
+      *ph++ = target_to_host (*targstr);
+      if (!*targstr)
+	break;
+
+      if (ph - hostr == sizeof hostr - 4)
+	{
+	  *ph = '\0';
+	  strcat (ph, "...");
+	  break;
+	}
+    }
+
+  return hostr;
+}
+
+/* Convert the sequence of decimal digits in the execution character
+   starting at S to a long, just like strtol does.  Return the result
+   and set *END to one past the last converted character.  On range
+   error set ERANGE to the digit that caused it.  */
+
+static inline long
+target_strtol10 (const char **ps, const char **erange)
+{
+  unsigned HOST_WIDE_INT val = 0;
+  for ( ; ; ++*ps)
+    {
+      unsigned char c = target_to_host (**ps);
+      if (ISDIGIT (c))
+	{
+	  c -= '0';
+
+	  /* Check for overflow.  */
+	  if (val > (LONG_MAX - c) / 10LU)
+	    {
+	      val = LONG_MAX;
+	      *erange = *ps;
+
+	      /* Skip the remaining digits.  */
+	      do
+		c = target_to_host (*++*ps);
+	      while (ISDIGIT (c));
+	      break;
+	    }
+	  else
+	    val = val * 10 + c;
+	}
+      else
+	break;
+    }
+
+  return val;
+}
+
 /* Return the constant initial value of DECL if available or DECL
    otherwise.  Same as the synonymous function in c/c-typeck.c.  */
 
@@ -2289,7 +2427,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
       /* The size of the destination region is exact.  */
       unsigned HOST_WIDE_INT navail = avail_range.max;
 
-      if (*dir.beg != '%')
+      if (target_to_host (*dir.beg) != '%')
 	{
 	  /* For plain character directives (i.e., the format string itself)
 	     but not others, point the caret at the first character that's
@@ -2340,7 +2478,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
 		       "into a region of size %wu")));
 	  return fmtwarn (dirloc, pargrange, NULL,
 			  info.warnopt (), fmtstr,
-			  dir.len, dir.beg, res.min,
+			  dir.len, target_to_host (dir.beg), res.min,
 			  navail);
 	}
 
@@ -2357,7 +2495,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
 		    "into a region of size %wu"));
 	  return fmtwarn (dirloc, pargrange, NULL,
 			  info.warnopt (), fmtstr,
-			  dir.len, dir.beg,
+			  dir.len, target_to_host (dir.beg),
 			  res.max, navail);
 	}
 
@@ -2377,7 +2515,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
 		    "into a region of size %wu"));
 	  return fmtwarn (dirloc, pargrange, NULL,
 			  info.warnopt (), fmtstr,
-			  dir.len, dir.beg,
+			  dir.len, target_to_host (dir.beg),
 			  res.likely, navail);
 	}
 
@@ -2394,7 +2532,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
 		    "%wu bytes into a region of size %wu"));
 	  return fmtwarn (dirloc, pargrange, NULL,
 			  info.warnopt (), fmtstr,
-			  dir.len, dir.beg,
+			  dir.len, target_to_host (dir.beg),
 			  res.min, res.max,
 			  navail);
 	}
@@ -2410,13 +2548,13 @@  maybe_warn (substring_loc &dirloc, source_range *p
 		"into a region of size %wu"));
       return fmtwarn (dirloc, pargrange, NULL,
 		      info.warnopt (), fmtstr,
-		      dir.len, dir.beg,
+		      dir.len, target_to_host (dir.beg),
 		      res.min, navail);
     }
 
   /* The size of the destination region is a range.  */
 
-  if (*dir.beg != '%')
+  if (target_to_host (*dir.beg) != '%')
     {
       unsigned HOST_WIDE_INT navail = avail_range.max;
 
@@ -2469,7 +2607,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
 
       return fmtwarn (dirloc, pargrange, NULL,
 		      info.warnopt (), fmtstr,
-		      dir.len, dir.beg, res.min,
+		      dir.len, target_to_host (dir.beg), res.min,
 		      avail_range.min, avail_range.max);
     }
 
@@ -2488,7 +2626,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
 		"into a region of size between %wu and %wu"));
       return fmtwarn (dirloc, pargrange, NULL,
 		      info.warnopt (), fmtstr,
-		      dir.len, dir.beg, res.max,
+		      dir.len, target_to_host (dir.beg), res.max,
 		      avail_range.min, avail_range.max);
     }
 
@@ -2510,7 +2648,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
 		"into a region of size between %wu and %wu"));
       return fmtwarn (dirloc, pargrange, NULL,
 		      info.warnopt (), fmtstr,
-		      dir.len, dir.beg, res.likely,
+		      dir.len, target_to_host (dir.beg), res.likely,
 		      avail_range.min, avail_range.max);
     }
 
@@ -2529,7 +2667,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
 		"%wu bytes into a region of size between %wu and %wu"));
       return fmtwarn (dirloc, pargrange, NULL,
 		      info.warnopt (), fmtstr,
-		      dir.len, dir.beg,
+		      dir.len, target_to_host (dir.beg),
 		      res.min, res.max,
 		      avail_range.min, avail_range.max);
     }
@@ -2547,7 +2685,7 @@  maybe_warn (substring_loc &dirloc, source_range *p
 	    "into a region of size between %wu and %wu"));
   return fmtwarn (dirloc, pargrange, NULL,
 		  info.warnopt (), fmtstr,
-		  dir.len, dir.beg,
+		  dir.len, target_to_host (dir.beg),
 		  res.min,
 		  avail_range.min, avail_range.max);
 }
@@ -2636,7 +2774,7 @@  format_directive (const pass_sprintf_length::call_
     {
       fmtwarn (dirloc, pargrange, NULL, info.warnopt (),
 	       "%<%.*s%> directive argument is null",
-	       dirlen, dir.beg);
+	       dirlen, target_to_host (dir.beg));
 
       /* Don't bother processing the rest of the format string.  */
       res->warned = true;
@@ -2703,7 +2841,7 @@  format_directive (const pass_sprintf_length::call_
 			  info.warnopt (),
 			  "%<%.*s%> directive output of %wu bytes exceeds "
 			  "minimum required size of 4095",
-			  dirlen, dir.beg, fmtres.range.min);
+			  dirlen, target_to_host (dir.beg), fmtres.range.min);
       else
 	{
 	  const char *fmtstr
@@ -2715,7 +2853,7 @@  format_directive (const pass_sprintf_length::call_
 
 	  warned = fmtwarn (dirloc, pargrange, NULL,
 			    info.warnopt (), fmtstr,
-			    dirlen, dir.beg,
+			    dirlen, target_to_host (dir.beg),
 			    fmtres.range.min, fmtres.range.max);
 	}
     }
@@ -2744,7 +2882,7 @@  format_directive (const pass_sprintf_length::call_
 	warned = fmtwarn (dirloc, pargrange, NULL, info.warnopt (),
 			  "%<%.*s%> directive output of %wu bytes causes "
 			  "result to exceed %<INT_MAX%>",
-			  dirlen, dir.beg, fmtres.range.min);
+			  dirlen, target_to_host (dir.beg), fmtres.range.min);
       else
 	{
 	  const char *fmtstr
@@ -2755,7 +2893,7 @@  format_directive (const pass_sprintf_length::call_
 		     "bytes may cause result to exceed %<INT_MAX%>"));
 	  warned = fmtwarn (dirloc, pargrange, NULL,
 			    info.warnopt (), fmtstr,
-			    dirlen, dir.beg,
+			    dirlen, target_to_host (dir.beg),
 			    fmtres.range.min, fmtres.range.max);
 	}
     }
@@ -2847,7 +2985,7 @@  parse_directive (pass_sprintf_length::call_info &i
 		 directive &dir, format_result *res,
 		 const char *str, unsigned *argno)
 {
-  const char *pcnt = strchr (str, '%');
+  const char *pcnt = strchr (str, target_percent);
   dir.beg = str;
 
   if (size_t len = pcnt ? pcnt - str : *str ? strlen (str) : 1)
@@ -2873,7 +3011,7 @@  parse_directive (pass_sprintf_length::call_info &i
   const char *pf = pcnt + 1;
 
     /* POSIX numbered argument index or zero when none.  */
-  unsigned dollar = 0;
+  HOST_WIDE_INT dollar = 0;
 
   /* With and precision.  -1 when not specified, HOST_WIDE_INT_MIN
      when given by a va_list argument, and a non-negative value
@@ -2881,6 +3019,17 @@  parse_directive (pass_sprintf_length::call_info &i
   HOST_WIDE_INT width = -1;
   HOST_WIDE_INT precision = -1;
 
+  /* Pointers to the beginning of the width and precision decimal
+     string (if any) within the directive.  */
+  const char *pwidth = 0;
+  const char *pprec = 0;
+
+  /* When the value of the decimal string that specifies width or
+     precision is out of range, points to the digit that causes
+     the value to exceed the limit.  */
+  const char *werange = NULL;
+  const char *perange = NULL;
+
   /* Width specified via the asterisk.  Need not be INTEGER_CST.
      For vararg functions set to void_node.  */
   tree star_width = NULL_TREE;
@@ -2889,17 +3038,16 @@  parse_directive (pass_sprintf_length::call_info &i
      For vararg functions set to void_node.  */
   tree star_precision = NULL_TREE;
 
-  if (ISDIGIT (*pf))
+  if (ISDIGIT (target_to_host (*pf)))
     {
       /* This could be either a POSIX positional argument, the '0'
 	 flag, or a width, depending on what follows.  Store it as
 	 width and sort it out later after the next character has
 	 been seen.  */
-      char *end;
-      width = strtol (pf, &end, 10);
-      pf = end;
+      pwidth = pf;
+      width = target_strtol10 (&pf, &werange);
     }
-  else if ('*' == *pf)
+  else if (target_to_host (*pf) == '*')
     {
       /* Similarly to the block above, this could be either a POSIX
 	 positional argument or a width, depending on what follows.  */
@@ -2910,7 +3058,7 @@  parse_directive (pass_sprintf_length::call_info &i
       ++pf;
     }
 
-  if (*pf == '$')
+  if (target_to_host (*pf) == '$')
     {
       /* Handle the POSIX dollar sign which references the 1-based
 	 positional argument number.  */
@@ -2925,7 +3073,7 @@  parse_directive (pass_sprintf_length::call_info &i
       /* Bail when the numbered argument is out of range (it will
 	 have already been diagnosed by -Wformat).  */
       if (dollar == 0
-	  || dollar == info.argidx
+	  || dollar == (int)info.argidx
 	  || dollar > gimple_call_num_args (info.callstmt))
 	return false;
 
@@ -2959,7 +3107,7 @@  parse_directive (pass_sprintf_length::call_info &i
 	 the next field is the optional flags followed by an optional
 	 width.  */
       for ( ; ; ) {
-	switch (*pf)
+	switch (target_to_host (*pf))
 	  {
 	  case ' ':
 	  case '0':
@@ -2966,7 +3114,7 @@  parse_directive (pass_sprintf_length::call_info &i
 	  case '+':
 	  case '-':
 	  case '#':
-	    dir.set_flag (*pf++);
+	    dir.set_flag (target_to_host (*pf++));
 	    break;
 
 	  default:
@@ -2975,13 +3123,13 @@  parse_directive (pass_sprintf_length::call_info &i
       }
 
     start_width:
-      if (ISDIGIT (*pf))
+      if (ISDIGIT (target_to_host (*pf)))
 	{
-	  char *end;
-	  width = strtol (pf, &end, 10);
-	  pf = end;
+	  werange = 0;
+	  pwidth = pf;
+	  width = target_strtol10 (&pf, &werange);
 	}
-      else if ('*' == *pf)
+      else if (target_to_host (*pf) == '*')
 	{
 	  if (*argno < gimple_call_num_args (info.callstmt))
 	    star_width = gimple_call_arg (info.callstmt, (*argno)++);
@@ -2993,7 +3141,7 @@  parse_directive (pass_sprintf_length::call_info &i
 	    }
 	  ++pf;
 	}
-      else if ('\'' == *pf)
+      else if (target_to_host (*pf) == '\'')
 	{
 	  /* The POSIX apostrophe indicating a numeric grouping
 	     in the current locale.  Even though it's possible to
@@ -3005,17 +3153,16 @@  parse_directive (pass_sprintf_length::call_info &i
     }
 
  start_precision:
-  if ('.' == *pf)
+  if (target_to_host (*pf) == '.')
     {
       ++pf;
 
-      if (ISDIGIT (*pf))
+      if (ISDIGIT (target_to_host (*pf)))
 	{
-	  char *end;
-	  precision = strtol (pf, &end, 10);
-	  pf = end;
+	  pprec = pf;
+	  precision = target_strtol10 (&pf, &perange);
 	}
-      else if ('*' == *pf)
+      else if (target_to_host (*pf) == '*')
 	{
 	  if (*argno < gimple_call_num_args (info.callstmt))
 	    star_precision = gimple_call_arg (info.callstmt, (*argno)++);
@@ -3035,10 +3182,10 @@  parse_directive (pass_sprintf_length::call_info &i
 	}
     }
 
-  switch (*pf)
+  switch (target_to_host (*pf))
     {
     case 'h':
-      if (pf[1] == 'h')
+      if (target_to_host (pf[1]) == 'h')
 	{
 	  ++pf;
 	  dir.modifier = FMT_LEN_hh;
@@ -3059,7 +3206,7 @@  parse_directive (pass_sprintf_length::call_info &i
       break;
 
     case 'l':
-      if (pf[1] == 'l')
+      if (target_to_host (pf[1]) == 'l')
 	{
 	  ++pf;
 	  dir.modifier = FMT_LEN_ll;
@@ -3080,7 +3227,7 @@  parse_directive (pass_sprintf_length::call_info &i
       break;
     }
 
-  switch (*pf)
+  switch (target_to_host (*pf))
     {
       /* Handle a sole '%' character the same as "%%" but since it's
 	 undefined prevent the result from being folded.  */
@@ -3141,8 +3288,11 @@  parse_directive (pass_sprintf_length::call_info &i
       return 0;
     }
 
-  dir.specifier = *pf++;
+  dir.specifier = target_to_host (*pf++);
 
+  /* Store the length of the format directive.  */
+  dir.len = pf - pcnt;
+
   if (star_width)
     {
       if (INTEGRAL_TYPE_P (TREE_TYPE (star_width)))
@@ -3156,8 +3306,26 @@  parse_directive (pass_sprintf_length::call_info &i
 	}
     }
   else
-    dir.set_width (width);
+    {
+      if (width == LONG_MAX && werange)
+	{
+	  size_t begin = dir.beg - info.fmtstr + (pwidth - pcnt);
+	  size_t caret = begin + (werange - pcnt);
+	  size_t end = pf - info.fmtstr - 1;
 
+	  /* Create a location for the width part of the directive,
+	     pointing the caret at the first out-of-range digit.  */
+	  substring_loc dirloc (info.fmtloc, TREE_TYPE (info.format),
+				caret, begin, end);
+
+	  fmtwarn (dirloc, NULL, NULL,
+		   info.warnopt (), "%<%.*s%> directive width out of range",
+		   dir.len, target_to_host (dir.beg));
+	}
+
+      dir.set_width (width);
+    }
+
   if (star_precision)
     {
       if (INTEGRAL_TYPE_P (TREE_TYPE (star_precision)))
@@ -3171,8 +3339,27 @@  parse_directive (pass_sprintf_length::call_info &i
 	}
     }
   else
-    dir.set_precision (precision);
+    {
+      if (precision == LONG_MAX && perange)
+	{
+	  size_t begin = dir.beg - info.fmtstr + (pprec - pcnt) - 1;
+	  size_t caret = dir.beg - info.fmtstr + (perange - pcnt) - 1;
+	  size_t end = pf - info.fmtstr - 2;
 
+	  /* Create a location for the precision part of the directive,
+	     including the leading period, pointing the caret at the first
+	     out-of-range digit .  */
+	  substring_loc dirloc (info.fmtloc, TREE_TYPE (info.format),
+				caret, begin, end);
+
+	  fmtwarn (dirloc, NULL, NULL,
+		   info.warnopt (), "%<%.*s%> directive precision out of range",
+		   dir.len, target_to_host (dir.beg));
+	}
+
+      dir.set_precision (precision);
+    }
+
   /* Extract the argument if the directive takes one and if it's
      available (e.g., the function doesn't take a va_list).  Treat
      missing arguments the same as va_list, even though they will
@@ -3181,9 +3368,6 @@  parse_directive (pass_sprintf_length::call_info &i
       && *argno < gimple_call_num_args (info.callstmt))
     dir.arg = gimple_call_arg (info.callstmt, dollar ? dollar : (*argno)++);
 
-  /* Return the length of the format directive.  */
-  dir.len = pf - pcnt;
-
   if (dump_file)
     {
       fprintf (dump_file, "  Directive %u at offset %llu: \"%.*s\"",
@@ -3708,6 +3892,8 @@  pass_sprintf_length::handle_gimple_call (gimple_st
 unsigned int
 pass_sprintf_length::execute (function *fun)
 {
+  init_target_to_host_charmap ();
+
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
     {
Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c	(working copy)
@@ -0,0 +1,129 @@ 
+/* PR tree-optimization/80523 - -Wformat-overflow doesn't consider
+   -fexec-charset
+   { dg-do compile }
+   { dg-require-iconv "IBM1047" }
+   { dg-options "-O2 -Wall -Wno-format -Wformat-overflow -fexec-charset=IBM1047 -ftrack-macro-expansion=0" } */
+
+char buf[1];
+void sink (void*);
+
+#define T(...) (__builtin_sprintf (buf + 1, __VA_ARGS__), sink (buf))
+
+/* Exercise all special C and POSIX characters. */
+
+void test_characters ()
+{
+  T ("%%");           /* { dg-warning ".%%. directive writing 1 byte" } */
+
+  T ("%A",    0.0);   /* { dg-warning ".%A. directive writing between 6 and 20 " } */
+  T ("%a",    0.0);   /* { dg-warning ".%a. directive writing between 6 and 20 " } */
+
+  T ("%C",    'a');   /* { dg-warning ".%C. directive writing 1 byte" "bug 80537" { xfail *-*-* } } */
+  T ("%c",    'a');   /* { dg-warning ".%c. directive writing 1 byte" } */
+
+  T ("%d",     12);   /* { dg-warning ".%d. directive writing 2 bytes" } */
+  T ("% d",    12);   /* { dg-warning ".% d. directive writing 3 bytes" } */
+  T ("%-d",   123);   /* { dg-warning ".%-d. directive writing 3 bytes" } */
+  T ("%+d",  1234);   /* { dg-warning ".%\\+d. directive writing 5 bytes" } */
+  T ("%'d",  1234);   /* { dg-warning ".%'d. directive writing 5 bytes" "bug 80535" { xfail *-*-* } } */
+  T ("%1$d", 2345);   /* { dg-warning ".%1\\\$d. directive writing 4 bytes" } */
+
+  /* Verify that digits are correctly interpreted as width and precision.  */
+  T ("%0d", 12345);   /* { dg-warning ".%0d. directive writing 5 bytes" } */
+  T ("%1d", 12345);   /* { dg-warning ".%1d. directive writing 5 bytes" } */
+  T ("%2d", 12345);   /* { dg-warning ".%2d. directive writing 5 bytes" } */
+  T ("%3d", 12345);   /* { dg-warning ".%3d. directive writing 5 bytes" } */
+  T ("%4d", 12345);   /* { dg-warning ".%4d. directive writing 5 bytes" } */
+  T ("%5d", 12345);   /* { dg-warning ".%5d. directive writing 5 bytes" } */
+  T ("%6d", 12345);   /* { dg-warning ".%6d. directive writing 6 bytes" } */
+  T ("%7d", 12345);   /* { dg-warning ".%7d. directive writing 7 bytes" } */
+  T ("%8d", 12345);   /* { dg-warning ".%8d. directive writing 8 bytes" } */
+  T ("%9d", 12345);   /* { dg-warning ".%9d. directive writing 9 bytes" } */
+
+  T ("%.0d", 12345);  /* { dg-warning ".%.0d. directive writing 5 bytes" } */
+  T ("%.1d", 12345);  /* { dg-warning ".%.1d. directive writing 5 bytes" } */
+  T ("%.2d", 12345);  /* { dg-warning ".%.2d. directive writing 5 bytes" } */
+  T ("%.3d", 12345);  /* { dg-warning ".%.3d. directive writing 5 bytes" } */
+  T ("%.4d", 12345);  /* { dg-warning ".%.4d. directive writing 5 bytes" } */
+  T ("%.5d", 12345);  /* { dg-warning ".%.5d. directive writing 5 bytes" } */
+  T ("%.6d", 12345);  /* { dg-warning ".%.6d. directive writing 6 bytes" } */
+  T ("%.7d", 12345);  /* { dg-warning ".%.7d. directive writing 7 bytes" } */
+  T ("%.8d", 12345);  /* { dg-warning ".%.8d. directive writing 8 bytes" } */
+  T ("%.9d", 12345);  /* { dg-warning ".%.9d. directive writing 9 bytes" } */
+
+  T ("%hhd",    12);   /* { dg-warning ".%hhd. directive writing 2 bytes" } */
+  T ("%hd",    234);   /* { dg-warning ".%hd. directive writing 3 bytes" } */
+
+  {
+    const __PTRDIFF_TYPE__ i = 3456;
+    T ("%jd",   i);  /* { dg-warning ".%jd. directive writing 4 bytes" } */
+  }
+
+  T ("%ld",  45678L);  /* { dg-warning ".%ld. directive writing 5 bytes" } */
+
+  {
+    const __PTRDIFF_TYPE__ i = 56789;
+    T ("%td",   i);  /* { dg-warning ".%td. directive writing 5 bytes" } */
+  }
+
+  {
+    const __SIZE_TYPE__ i = 67890;
+    T ("%zd",   i);  /* { dg-warning ".%zd. directive writing 5 bytes" } */
+  }
+
+  T ("%E",    0.0);   /* { dg-warning ".%E. directive writing 12 bytes" } */
+  T ("%e",    0.0);   /* { dg-warning ".%e. directive writing 12 bytes" } */
+  T ("%F",    0.0);   /* { dg-warning ".%F. directive writing 8 bytes" } */
+  T ("%f",    0.0);   /* { dg-warning ".%f. directive writing 8 bytes" } */
+  T ("%G",    0.0);   /* { dg-warning ".%G. directive writing 1 byte" } */
+  T ("%g",    0.0);   /* { dg-warning ".%g. directive writing 1 byte" } */
+
+  T ("%i",     123);  /* { dg-warning ".%i. directive writing 3 bytes" } */
+
+  {
+    int n;
+
+    T ("%n",    &n);  /* { dg-warning "writing a terminating nul" } */
+    T ("%nH",   &n);  /* { dg-warning ".H. directive writing 1 byte" } */
+  }
+
+  T ("%o",     999);  /* { dg-warning ".%o. directive writing 4 bytes" } */
+  T ("%#o",    999);  /* { dg-warning ".%#o. directive writing 5 bytes" } */
+
+  T ("%x",    1234);  /* { dg-warning ".%x. directive writing 3 bytes" } */
+  T ("%#X",   1235);  /* { dg-warning ".%#X. directive writing 5 bytes" } */
+
+  T ("%S",    L"1");  /* { dg-warning ".%S. directive writing 1 byte" } */
+  T ("%-s",    "1");  /* { dg-warning ".%-s. directive writing 1 byte" } */
+
+  /* Verify that characters in the source character set appear in
+     the text of the warning unchanged (i.e., not as their equivalents
+     in the execution character set on the target).  The trailing %%
+     disables sprintf->strcpy optimization.  */
+  T ("ABCDEFGHIJ%%");   /* { dg-warning ".ABCDEFGHIJ. directive writing 10 bytes" } */
+  T ("KLMNOPQRST%%");   /* { dg-warning ".KLMNOPQRST. directive writing 10 bytes" } */
+  T ("UVWXYZ%%");       /* { dg-warning ".UVWXYZ. directive writing 6 bytes" } */
+
+  T ("abcdefghij%%");   /* { dg-warning ".abcdefghij. directive writing 10 bytes" } */
+  T ("klmnopqrst%%");   /* { dg-warning ".klmnopqrst. directive writing 10 bytes" } */
+  T ("uvwxyz%%");       /* { dg-warning ".uvwxyz. directive writing 6 bytes" } */
+}
+
+#undef T
+#define T(...) (__builtin_sprintf (d, __VA_ARGS__), sink (d))
+
+void test_width_and_precision_out_of_range (char *d)
+{
+#if __LONG_MAX__ == 2147483647
+#  define   MAX_P1_STR "2147483648"
+#elif __LONG_MAX__ == 9223372036854775807
+#  define MAX_P1_STR "9223372036854775808"
+#endif
+
+  T ("%" MAX_P1_STR "i", 0);    /* { dg-warning "width out of range" } */
+  /* { dg-warning "result to exceed .INT_MAX. " "" { target *-*-* } .-1 } */
+  T ("%." MAX_P1_STR "i", 0);   /* { dg-warning "precision out of range" } */
+
+  /* The following is diagnosed by -Wformat (disabled here).  */
+  /* T ("%" MAX_P1_STR "$i", 0); */
+}