diff mbox

[4/4] c-format.c: suggest the correct format string to use (PR c/64955)

Message ID 1470239113-42666-4-git-send-email-dmalcolm@redhat.com
State New
Headers show

Commit Message

David Malcolm Aug. 3, 2016, 3:45 p.m. UTC
This adds fix-it hints to c-format.c so that it can (sometimes) suggest
the format string the user should have used.

The patch adds selftests for the new code in c-format.c.  These
selftests are thus lang-specific.  This is the first time we've had
lang-specific selftests, and hence the patch also adds a langhook for
running them.  (Note that currently the Makefile only invokes the
selftests for cc1).

Successfully bootstrapped&regrtested in conjunction with the rest of the
patch kit on x86_64-pc-linux-gnu.

(The v2 version of the patch had a successful selftest run for stage 1 on
powerpc-ibm-aix7.1.3.0 (gcc111) in conjunction with the rest of the patch
kit, and a successful build of stage1 for all targets via config-list.mk;
the patch has only been rebased since)

OK for trunk if it passes testing?

gcc/c-family/ChangeLog:
	PR c/64955
	* c-common.h (selftest::c_format_c_tests): New declaration.
	(selftest::run_c_tests): New declaration.
	* c-format.c: Include "selftest.h.
	(format_warning_va): Add param "corrected_substring" and use
	it to add a replacement fix-it hint.
	(format_warning_at_substring): Likewise.
	(format_warning_at_char): Update for new param of
	format_warning_va.
	(check_format_info_main): Pass "fki" to check_format_types.
	(check_format_types): Add param "fki" and pass it to
	format_type_warning.
	(deref_n_times): New function.
	(get_modifier_for_format_len): New function.
	(selftest::test_get_modifier_for_format_len): New function.
	(get_format_for_type): New function.
	(format_type_warning): Add param "fki" and use it to attempt
	to provide hints for argument types when calling
	format_warning_at_substring.
	(selftest::get_info): New function.
	(selftest::assert_format_for_type_streq): New function.
	(ASSERT_FORMAT_FOR_TYPE_STREQ): New macro.
	(selftest::test_get_format_for_type_printf): New function.
	(selftest::test_get_format_for_type_scanf): New function.
	(selftest::c_format_c_tests): New function.

gcc/c/ChangeLog:
	PR c/64955
	* c-lang.c (LANG_HOOKS_RUN_LANG_SELFTESTS): If CHECKING_P, wire
	this up to selftest::run_c_tests.
	(selftest::run_c_tests): New function.

gcc/ChangeLog:
	PR c/64955
	* langhooks-def.h (LANG_HOOKS_RUN_LANG_SELFTESTS): New default
	do-nothing langhook.
	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_RUN_LANG_SELFTESTS.
	* langhooks.h (struct lang_hooks): Add run_lang_selftests.
	* selftest-run-tests.c: Include "tree.h" and "langhooks.h".
	(selftest::run_tests): Call lang_hooks.run_lang_selftests.

gcc/testsuite/ChangeLog:
	PR c/64955
	* gcc.dg/format/diagnostic-ranges.c: Add fix-it hints to expected
	output.
---
 gcc/c-family/c-common.h                         |   7 +
 gcc/c-family/c-format.c                         | 268 ++++++++++++++++++++++--
 gcc/c/c-lang.c                                  |  22 ++
 gcc/langhooks-def.h                             |   4 +-
 gcc/langhooks.h                                 |   3 +
 gcc/selftest-run-tests.c                        |   5 +
 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c |  30 ++-
 7 files changed, 319 insertions(+), 20 deletions(-)

Comments

Jeff Law Aug. 4, 2016, 7:55 p.m. UTC | #1
On 08/03/2016 09:45 AM, David Malcolm wrote:
> This adds fix-it hints to c-format.c so that it can (sometimes) suggest
> the format string the user should have used.
>
> The patch adds selftests for the new code in c-format.c.  These
> selftests are thus lang-specific.  This is the first time we've had
> lang-specific selftests, and hence the patch also adds a langhook for
> running them.  (Note that currently the Makefile only invokes the
> selftests for cc1).
>
> Successfully bootstrapped&regrtested in conjunction with the rest of the
> patch kit on x86_64-pc-linux-gnu.
>
> (The v2 version of the patch had a successful selftest run for stage 1 on
> powerpc-ibm-aix7.1.3.0 (gcc111) in conjunction with the rest of the patch
> kit, and a successful build of stage1 for all targets via config-list.mk;
> the patch has only been rebased since)
>
> OK for trunk if it passes testing?
>
> gcc/c-family/ChangeLog:
> 	PR c/64955
> 	* c-common.h (selftest::c_format_c_tests): New declaration.
> 	(selftest::run_c_tests): New declaration.
> 	* c-format.c: Include "selftest.h.
> 	(format_warning_va): Add param "corrected_substring" and use
> 	it to add a replacement fix-it hint.
> 	(format_warning_at_substring): Likewise.
> 	(format_warning_at_char): Update for new param of
> 	format_warning_va.
> 	(check_format_info_main): Pass "fki" to check_format_types.
> 	(check_format_types): Add param "fki" and pass it to
> 	format_type_warning.
> 	(deref_n_times): New function.
> 	(get_modifier_for_format_len): New function.
> 	(selftest::test_get_modifier_for_format_len): New function.
> 	(get_format_for_type): New function.
> 	(format_type_warning): Add param "fki" and use it to attempt
> 	to provide hints for argument types when calling
> 	format_warning_at_substring.
> 	(selftest::get_info): New function.
> 	(selftest::assert_format_for_type_streq): New function.
> 	(ASSERT_FORMAT_FOR_TYPE_STREQ): New macro.
> 	(selftest::test_get_format_for_type_printf): New function.
> 	(selftest::test_get_format_for_type_scanf): New function.
> 	(selftest::c_format_c_tests): New function.
>
> gcc/c/ChangeLog:
> 	PR c/64955
> 	* c-lang.c (LANG_HOOKS_RUN_LANG_SELFTESTS): If CHECKING_P, wire
> 	this up to selftest::run_c_tests.
> 	(selftest::run_c_tests): New function.
>
> gcc/ChangeLog:
> 	PR c/64955
> 	* langhooks-def.h (LANG_HOOKS_RUN_LANG_SELFTESTS): New default
> 	do-nothing langhook.
> 	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_RUN_LANG_SELFTESTS.
> 	* langhooks.h (struct lang_hooks): Add run_lang_selftests.
> 	* selftest-run-tests.c: Include "tree.h" and "langhooks.h".
> 	(selftest::run_tests): Call lang_hooks.run_lang_selftests.
>
> gcc/testsuite/ChangeLog:
> 	PR c/64955
> 	* gcc.dg/format/diagnostic-ranges.c: Add fix-it hints to expected
> 	output.
So presumably we always use the type of the argument as the "correct" 
type and assume the format string is what needs to be fixed (with the 
exception of getting the right amount of *s handled).  That seems 
intuitively the right thing to do, but do we have a hit rate better than 
50% in practice?

This is OK.  I'm really just curious about your thoughts/experiences on 
the heuristics.

jeff
David Malcolm Aug. 4, 2016, 9:06 p.m. UTC | #2
On Thu, 2016-08-04 at 13:55 -0600, Jeff Law wrote:
> On 08/03/2016 09:45 AM, David Malcolm wrote:
> > This adds fix-it hints to c-format.c so that it can (sometimes)
> > suggest
> > the format string the user should have used.
> > 
> > The patch adds selftests for the new code in c-format.c.  These
> > selftests are thus lang-specific.  This is the first time we've had
> > lang-specific selftests, and hence the patch also adds a langhook
> > for
> > running them.  (Note that currently the Makefile only invokes the
> > selftests for cc1).
> > 
> > Successfully bootstrapped&regrtested in conjunction with the rest
> > of the
> > patch kit on x86_64-pc-linux-gnu.
> > 
> > (The v2 version of the patch had a successful selftest run for
> > stage 1 on
> > powerpc-ibm-aix7.1.3.0 (gcc111) in conjunction with the rest of the
> > patch
> > kit, and a successful build of stage1 for all targets via config
> > -list.mk;
> > the patch has only been rebased since)
> > 
> > OK for trunk if it passes testing?
> > 
> > gcc/c-family/ChangeLog:
> > 	PR c/64955
> > 	* c-common.h (selftest::c_format_c_tests): New declaration.
> > 	(selftest::run_c_tests): New declaration.
> > 	* c-format.c: Include "selftest.h.
> > 	(format_warning_va): Add param "corrected_substring" and use
> > 	it to add a replacement fix-it hint.
> > 	(format_warning_at_substring): Likewise.
> > 	(format_warning_at_char): Update for new param of
> > 	format_warning_va.
> > 	(check_format_info_main): Pass "fki" to check_format_types.
> > 	(check_format_types): Add param "fki" and pass it to
> > 	format_type_warning.
> > 	(deref_n_times): New function.
> > 	(get_modifier_for_format_len): New function.
> > 	(selftest::test_get_modifier_for_format_len): New function.
> > 	(get_format_for_type): New function.
> > 	(format_type_warning): Add param "fki" and use it to attempt
> > 	to provide hints for argument types when calling
> > 	format_warning_at_substring.
> > 	(selftest::get_info): New function.
> > 	(selftest::assert_format_for_type_streq): New function.
> > 	(ASSERT_FORMAT_FOR_TYPE_STREQ): New macro.
> > 	(selftest::test_get_format_for_type_printf): New function.
> > 	(selftest::test_get_format_for_type_scanf): New function.
> > 	(selftest::c_format_c_tests): New function.
> > 
> > gcc/c/ChangeLog:
> > 	PR c/64955
> > 	* c-lang.c (LANG_HOOKS_RUN_LANG_SELFTESTS): If CHECKING_P, wire
> > 	this up to selftest::run_c_tests.
> > 	(selftest::run_c_tests): New function.
> > 
> > gcc/ChangeLog:
> > 	PR c/64955
> > 	* langhooks-def.h (LANG_HOOKS_RUN_LANG_SELFTESTS): New default
> > 	do-nothing langhook.
> > 	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_RUN_LANG_SELFTESTS.
> > 	* langhooks.h (struct lang_hooks): Add run_lang_selftests.
> > 	* selftest-run-tests.c: Include "tree.h" and "langhooks.h".
> > 	(selftest::run_tests): Call lang_hooks.run_lang_selftests.
> > 
> > gcc/testsuite/ChangeLog:
> > 	PR c/64955
> > 	* gcc.dg/format/diagnostic-ranges.c: Add fix-it hints to
> > expected
> > 	output.
> So presumably we always use the type of the argument as the "correct"
> type and assume the format string is what needs to be fixed (with the
> exception of getting the right amount of *s handled).  That seems 
> intuitively the right thing to do, but do we have a hit rate better
> than 
> 50% in practice?
> 
> This is OK.  I'm really just curious about your thoughts/experiences
> on 
> the heuristics.

Our current behavior is to emit a warning of the form:

test.c:9:18: warning: format ‘%i’ expects argument of type ‘int’, but
argument 2 has type ‘const char *’ [-Wformat=]
   printf("hello %i", msg);
                  ^

Note that the text of the diagnostic tells the user the two types
involved within the message.

My experience with printf-style issues of this form is that I know want
expression I want to print, but I don't necessarily know exactly
whether it's say, an int vs a long: it's not that I want to print "an
int", it's that I wanted to print some specific expression.  Hence 
 (assuming this experience is typical) for mismatches of printf-style
calls, I think it's most helpful to the user to tell the user what
format code they need for the expression they want to print, which the
patch does, via the fix-it:

test.c:9:18: warning: format ‘%i’ expects argument of type ‘int’, but
argument 2 has type ‘const char *’ [-Wformat=]
   printf("hello %i",
msg);
                 ~^
                 %s

Note how the text of the diagnostic is unchanged; it's just the fixit that's new (and the underline, which is patch 3 of the kit).

For scanf-style calls this argument may not be so strong, but I think it still holds for the "do I have an int or a long?" cases (giving int * vs long *).   It would be nice for scanf to detect the "you forgot to put an & in front of the destination lvalue" case and offer a fix-it hint for it, but I think that's a followup.

So I think it's likely to be much better than 50%, but that's a gut feeling, based on the above arguments.

I'm not aware of anyone who's done formal usability testing of a compiler's diagnostics.  I think there are two distinct types of activity:
(a) the edit->try to compile->edit->try to compile -> "it compiles!" cycle (nested within a debug cycle)
(b) compiling pre-existing code, perhaps with a different configuration than it was written on, often written by someone else

I'd be very interested in seeing usability studies of both aspects of a compiler, but in particular of (a): is there a published corpus somewhere of the kind of half-written non-compiling code that happens during (a) out there?  (I know this invites various sarcastic responses, but I'm serious :) )

(For myself, I attempt to run with a relatively recent build of gcc trunk as my day-to-day compiler, and if I see a diagnostic that could be improved whilst I'm hacking on something I make a note of it, and try to come back to it later as an RFE; bug reports of this form are most welcome).

Dave
diff mbox

Patch

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 7b5da57..61f9ced 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1533,4 +1533,11 @@  extern bool valid_array_size_p (location_t, tree, tree);
 extern bool cilk_ignorable_spawn_rhs_op (tree);
 extern bool cilk_recognize_spawn (tree, tree *);
 
+#if CHECKING_P
+namespace selftest {
+  extern void c_format_c_tests (void);
+  extern void run_c_tests (void);
+} // namespace selftest
+#endif /* #if CHECKING_P */
+
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 5b79588..f5a4011 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -30,6 +30,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-format.h"
 #include "diagnostic.h"
+#include "selftest.h"
 
 /* Handle attributes associated with format checking.  */
 
@@ -126,11 +127,21 @@  static int format_flags (int format_num);
      printf(fmt, msg);
             ^~~  ~~~
 
+   If CORRECTED_SUBSTRING is non-NULL, use it for cases 1 and 2 to provide
+   a fix-it hint, suggesting that it should replace the text within the
+   substring range.  For example:
+
+     test.c:90:10: warning: problem with '%i' here [-Wformat=]
+     printf ("hello %i", msg);
+                    ~^
+                    %s
+
    Return true if a warning was emitted, false otherwise.  */
 
-ATTRIBUTE_GCC_DIAG (4,0)
+ATTRIBUTE_GCC_DIAG (5,0)
 static bool
 format_warning_va (const substring_loc &fmt_loc, source_range *param_range,
+		   const char *corrected_substring,
 		   int opt, const char *gmsgid, va_list *ap)
 {
   bool substring_within_range = false;
@@ -174,6 +185,9 @@  format_warning_va (const substring_loc &fmt_loc, source_range *param_range,
       richloc.add_range (param_loc, false);
     }
 
+  if (!err && corrected_substring && substring_within_range)
+    richloc.add_fixit_replace (fmt_substring_range, corrected_substring);
+
   diagnostic_info diagnostic;
   diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc, DK_WARNING);
   diagnostic.option_index = opt;
@@ -182,22 +196,31 @@  format_warning_va (const substring_loc &fmt_loc, source_range *param_range,
   if (!err && substring_loc && !substring_within_range)
     /* Case 2.  */
     if (warned)
-      inform (substring_loc, "format string is defined here");
+      {
+	rich_location substring_richloc (line_table, substring_loc);
+	if (corrected_substring)
+	  substring_richloc.add_fixit_replace (fmt_substring_range,
+					       corrected_substring);
+	inform_at_rich_loc (&substring_richloc,
+			    "format string is defined here");
+      }
 
   return warned;
 }
 
 /* Variadic call to format_warning_va.  */
 
-ATTRIBUTE_GCC_DIAG (4,0)
+ATTRIBUTE_GCC_DIAG (5,0)
 static bool
 format_warning_at_substring (const substring_loc &fmt_loc,
 			     source_range *param_range,
+			     const char *corrected_substring,
 			     int opt, const char *gmsgid, ...)
 {
   va_list ap;
   va_start (ap, gmsgid);
-  bool warned = format_warning_va (fmt_loc, param_range, opt, gmsgid, &ap);
+  bool warned = format_warning_va (fmt_loc, param_range, corrected_substring,
+				   opt, gmsgid, &ap);
   va_end (ap);
 
   return warned;
@@ -225,7 +248,7 @@  format_warning_at_char (location_t fmt_string_loc, tree format_string_cst,
   char_idx -= 1;
 
   substring_loc fmt_loc (fmt_string_loc, string_type, char_idx, char_idx);
-  bool warned = format_warning_va (fmt_loc, NULL, opt, gmsgid, &ap);
+  bool warned = format_warning_va (fmt_loc, NULL, NULL, opt, gmsgid, &ap);
   va_end (ap);
 
   return warned;
@@ -1126,11 +1149,13 @@  static const format_flag_spec *get_flag_spec (const format_flag_spec *,
 					      int, const char *);
 
 static void check_format_types (const substring_loc &fmt_loc,
-				format_wanted_type *);
+				format_wanted_type *,
+				const format_kind_info *fki);
 static void format_type_warning (const substring_loc &fmt_loc,
 				 source_range *param_range,
 				 format_wanted_type *, tree,
-				 tree);
+				 tree,
+				 const format_kind_info *fki);
 
 /* Decode a format type from a string, returning the type, or
    format_type_error if not valid, in which case the caller should print an
@@ -2534,7 +2559,7 @@  check_format_info_main (format_check_results *res,
 	  ptrdiff_t offset_to_format_end = (format_chars - 1) - orig_format_chars;
 	  substring_loc fmt_loc (fmt_param_loc, TREE_TYPE (format_string_cst),
 				 offset_to_format_start, offset_to_format_end);
-	  check_format_types (fmt_loc, first_wanted_type);
+	  check_format_types (fmt_loc, first_wanted_type, fki);
 	}
     }
 
@@ -2558,7 +2583,7 @@  check_format_info_main (format_check_results *res,
    location of the format conversion.  */
 static void
 check_format_types (const substring_loc &fmt_loc,
-		    format_wanted_type *types)
+		    format_wanted_type *types, const format_kind_info *fki)
 {
   for (; types != 0; types = types->next)
     {
@@ -2585,7 +2610,7 @@  check_format_types (const substring_loc &fmt_loc,
       cur_param = types->param;
       if (!cur_param)
         {
-          format_type_warning (fmt_loc, NULL, types, wanted_type, NULL);
+	  format_type_warning (fmt_loc, NULL, types, wanted_type, NULL, fki);
           continue;
         }
 
@@ -2670,7 +2695,7 @@  check_format_types (const substring_loc &fmt_loc,
 	  else
 	    {
 	      format_type_warning (fmt_loc, param_range_ptr,
-				   types, wanted_type, orig_cur_type);
+				   types, wanted_type, orig_cur_type, fki);
 	      break;
 	    }
 	}
@@ -2739,10 +2764,115 @@  check_format_types (const substring_loc &fmt_loc,
 	continue;
       /* Now we have a type mismatch.  */
       format_type_warning (fmt_loc, param_range_ptr, types,
-			   wanted_type, orig_cur_type);
+			   wanted_type, orig_cur_type, fki);
+    }
+}
+
+/* Given type TYPE, attempt to dereference the type N times
+   (e.g. from ("int ***", 2) to "int *")
+
+   Return the derefenced type, with any qualifiers
+   such as "const" stripped from the result, or
+   NULL if unsuccessful (e.g. TYPE is not a pointer type).  */
+
+static tree
+deref_n_times (tree type, int n)
+{
+  gcc_assert (type);
+
+  for (int i = n; i > 0; i--)
+    {
+      if (TREE_CODE (type) != POINTER_TYPE)
+	return NULL_TREE;
+      type = TREE_TYPE (type);
     }
+  /* Strip off any "const" etc.  */
+  return build_qualified_type (type, 0);
 }
 
+/* Lookup the format code for FORMAT_LEN within FLI,
+   returning the string code for expressing it, or NULL
+   if it is not found.  */
+
+static const char *
+get_modifier_for_format_len (const format_length_info *fli,
+			     enum format_lengths format_len)
+{
+  for (; fli->name; fli++)
+    {
+      if (fli->index == format_len)
+	return fli->name;
+      if (fli->double_index == format_len)
+	return fli->double_name;
+    }
+  return NULL;
+}
+
+#if CHECKING_P
+
+namespace selftest {
+
+static void
+test_get_modifier_for_format_len ()
+{
+  ASSERT_STREQ ("h",
+		get_modifier_for_format_len (printf_length_specs, FMT_LEN_h));
+  ASSERT_STREQ ("hh",
+		get_modifier_for_format_len (printf_length_specs, FMT_LEN_hh));
+  ASSERT_STREQ ("L",
+		get_modifier_for_format_len (printf_length_specs, FMT_LEN_L));
+  ASSERT_EQ (NULL,
+	     get_modifier_for_format_len (printf_length_specs, FMT_LEN_none));
+}
+
+} // namespace selftest
+
+#endif /* CHECKING_P */
+
+/* Generate a string containing the format string that should be
+   used to format arguments of type ARG_TYPE within FKI (effectively
+   the inverse of the checking code).
+
+   If successful, returns a non-NULL string which should be freed
+   by the called.
+   Otherwise, returns NULL.  */
+
+static char *
+get_format_for_type (const format_kind_info *fki, tree arg_type)
+{
+  gcc_assert (arg_type);
+
+  const format_char_info *spec;
+  for (spec = &fki->conversion_specs[0];
+       spec->format_chars;
+       spec++)
+    {
+      tree effective_arg_type = deref_n_times (arg_type,
+					       spec->pointer_count);
+      if (!effective_arg_type)
+	continue;
+      for (int i = 0; i < FMT_LEN_MAX; i++)
+	{
+	  const format_type_detail *ftd = &spec->types[i];
+	  if (!ftd->type)
+	    continue;
+	  if (TYPE_CANONICAL (*ftd->type)
+	      == TYPE_CANONICAL (effective_arg_type))
+	    {
+	      const char *len_modifier
+		= get_modifier_for_format_len (fki->length_char_specs,
+					       (enum format_lengths)i);
+	      if (!len_modifier)
+		len_modifier = "";
+
+	      return xasprintf ("%%%s%c",
+				len_modifier,
+				spec->format_chars[0]);
+	    }
+	}
+   }
+  return NULL;
+}
 
 /* Give a warning at FMT_LOC about a format argument of different type
    from that expected.  If non-NULL, PARAM_RANGE is the source range of the
@@ -2756,9 +2886,10 @@  static void
 format_type_warning (const substring_loc &fmt_loc,
 		     source_range *param_range,
 		     format_wanted_type *type,
-		     tree wanted_type, tree arg_type)
+		     tree wanted_type, tree arg_type,
+		     const format_kind_info *fki)
 {
-  int kind = type->kind;
+  enum format_specifier_kind kind = type->kind;
   const char *wanted_type_name = type->wanted_type_name;
   const char *format_start = type->format_start;
   int format_length = type->format_length;
@@ -2797,12 +2928,18 @@  format_type_warning (const substring_loc &fmt_loc,
       p[pointer_count + 1] = 0;
     }
 
+  /* Attempt to provide hints for argument types, but not for field widths
+     and precisions.  */
+  char *format_for_type = NULL;
+  if (arg_type && kind == CF_KIND_FORMAT)
+    format_for_type = get_format_for_type (fki, arg_type);
+
   if (wanted_type_name)
     {
       if (arg_type)
 	format_warning_at_substring
 	  (fmt_loc, param_range,
-	   OPT_Wformat_,
+	   format_for_type, OPT_Wformat_,
 	   "%s %<%s%.*s%> expects argument of type %<%s%s%>, "
 	   "but argument %d has type %qT",
 	   gettext (kind_descriptions[kind]),
@@ -2812,7 +2949,7 @@  format_type_warning (const substring_loc &fmt_loc,
       else
 	format_warning_at_substring
 	  (fmt_loc, param_range,
-	   OPT_Wformat_,
+	   format_for_type, OPT_Wformat_,
 	   "%s %<%s%.*s%> expects a matching %<%s%s%> argument",
 	   gettext (kind_descriptions[kind]),
 	   (kind == CF_KIND_FORMAT ? "%" : ""),
@@ -2823,7 +2960,7 @@  format_type_warning (const substring_loc &fmt_loc,
       if (arg_type)
 	format_warning_at_substring
 	  (fmt_loc, param_range,
-	   OPT_Wformat_,
+	   format_for_type, OPT_Wformat_,
 	   "%s %<%s%.*s%> expects argument of type %<%T%s%>, "
 	   "but argument %d has type %qT",
 	   gettext (kind_descriptions[kind]),
@@ -2833,12 +2970,14 @@  format_type_warning (const substring_loc &fmt_loc,
       else
 	format_warning_at_substring
 	  (fmt_loc, param_range,
-	   OPT_Wformat_,
+	   format_for_type, OPT_Wformat_,
 	   "%s %<%s%.*s%> expects a matching %<%T%s%> argument",
 	   gettext (kind_descriptions[kind]),
 	   (kind == CF_KIND_FORMAT ? "%" : ""),
 	   format_length, format_start, wanted_type, p);
     }
+
+  free (format_for_type);
 }
 
 
@@ -3359,3 +3498,96 @@  handle_format_attribute (tree *node, tree ARG_UNUSED (name), tree args,
 
   return NULL_TREE;
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Selftests of location handling.  */
+
+/* Get the format_kind_info with the given name.  */
+
+static const format_kind_info *
+get_info (const char *name)
+{
+  int idx = decode_format_type (name);
+  const format_kind_info *fki = &format_types[idx];
+  ASSERT_STREQ (fki->name, name);
+  return fki;
+}
+
+/* Verify that get_format_for_type (FKI, TYPE) is EXPECTED_FORMAT.  */
+
+static void
+assert_format_for_type_streq (const location &loc, const format_kind_info *fki,
+			      const char *expected_format, tree type)
+{
+  gcc_assert (fki);
+  gcc_assert (expected_format);
+  gcc_assert (type);
+
+  char *actual_format = get_format_for_type (fki, type);
+  ASSERT_STREQ_AT (loc, expected_format, actual_format);
+  free (actual_format);
+}
+
+/* Selftests for get_format_for_type.  */
+
+#define ASSERT_FORMAT_FOR_TYPE_STREQ(EXPECTED_FORMAT, TYPE) \
+  assert_format_for_type_streq (SELFTEST_LOCATION, (fki), (EXPECTED_FORMAT), (TYPE))
+
+/* Selftest for get_format_for_type for "printf"-style functions.  */
+
+static void
+test_get_format_for_type_printf ()
+{
+  const format_kind_info *fki = get_info ("gnu_printf");
+  ASSERT_NE (fki, NULL);
+
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%f", double_type_node);
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%Lf", long_double_type_node);
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%d", integer_type_node);
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%o", unsigned_type_node);
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%ld", long_integer_type_node);
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%lo", long_unsigned_type_node);
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%lld", long_long_integer_type_node);
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%llo", long_long_unsigned_type_node);
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%s", build_pointer_type (char_type_node));
+}
+
+/* Selftest for get_format_for_type for "scanf"-style functions.  */
+
+static void
+test_get_format_for_type_scanf ()
+{
+  const format_kind_info *fki = get_info ("gnu_scanf");
+  ASSERT_NE (fki, NULL);
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%d", build_pointer_type (integer_type_node));
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%u", build_pointer_type (unsigned_type_node));
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%ld",
+				build_pointer_type (long_integer_type_node));
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%lu",
+				build_pointer_type (long_unsigned_type_node));
+  ASSERT_FORMAT_FOR_TYPE_STREQ
+    ("%lld", build_pointer_type (long_long_integer_type_node));
+  ASSERT_FORMAT_FOR_TYPE_STREQ
+    ("%llu", build_pointer_type (long_long_unsigned_type_node));
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%e", build_pointer_type (float_type_node));
+  ASSERT_FORMAT_FOR_TYPE_STREQ ("%le", build_pointer_type (double_type_node));
+}
+
+#undef ASSERT_FORMAT_FOR_TYPE_STREQ
+
+/* Run all of the selftests within this file.  */
+
+void
+c_format_c_tests ()
+{
+  test_get_modifier_for_format_len ();
+  test_get_format_for_type_printf ();
+  test_get_format_for_type_scanf ();
+}
+
+} // namespace selftest
+
+#endif /* CHECKING_P */
diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c
index 89954b7..b26be6a 100644
--- a/gcc/c/c-lang.c
+++ b/gcc/c/c-lang.c
@@ -38,7 +38,29 @@  enum c_language_kind c_language = clk_c;
 #undef LANG_HOOKS_INIT_TS
 #define LANG_HOOKS_INIT_TS c_common_init_ts
 
+#if CHECKING_P
+#undef LANG_HOOKS_RUN_LANG_SELFTESTS
+#define LANG_HOOKS_RUN_LANG_SELFTESTS selftest::run_c_tests
+#endif /* #if CHECKING_P */
+
 /* Each front end provides its own lang hook initializer.  */
 struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
 
+#if CHECKING_P
+
+namespace selftest {
+
+/* Implementation of LANG_HOOKS_RUN_LANG_SELFTESTS for the C frontend.  */
+
+void
+run_c_tests (void)
+{
+  c_format_c_tests ();
+}
+
+} // namespace selftest
+
+#endif /* #if CHECKING_P */
+
+
 #include "gtype-c.h"
diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index 034b3b7..c17f998 100644
--- a/gcc/langhooks-def.h
+++ b/gcc/langhooks-def.h
@@ -120,6 +120,7 @@  extern bool lhd_omp_mappable_type (tree);
 #define LANG_HOOKS_BLOCK_MAY_FALLTHRU	hook_bool_const_tree_true
 #define LANG_HOOKS_EH_USE_CXA_END_CLEANUP	false
 #define LANG_HOOKS_DEEP_UNSHARING	false
+#define LANG_HOOKS_RUN_LANG_SELFTESTS   lhd_do_nothing
 
 /* Attribute hooks.  */
 #define LANG_HOOKS_ATTRIBUTE_TABLE		NULL
@@ -319,7 +320,8 @@  extern void lhd_end_section (void);
   LANG_HOOKS_EH_PROTECT_CLEANUP_ACTIONS, \
   LANG_HOOKS_BLOCK_MAY_FALLTHRU, \
   LANG_HOOKS_EH_USE_CXA_END_CLEANUP, \
-  LANG_HOOKS_DEEP_UNSHARING \
+  LANG_HOOKS_DEEP_UNSHARING, \
+  LANG_HOOKS_RUN_LANG_SELFTESTS \
 }
 
 #endif /* GCC_LANG_HOOKS_DEF_H */
diff --git a/gcc/langhooks.h b/gcc/langhooks.h
index 0593424..169a678 100644
--- a/gcc/langhooks.h
+++ b/gcc/langhooks.h
@@ -505,6 +505,9 @@  struct lang_hooks
      gimplification.  */
   bool deep_unsharing;
 
+  /* Run all lang-specific selftests.  */
+  void (*run_lang_selftests) (void);
+
   /* Whenever you add entries here, make sure you adjust langhooks-def.h
      and langhooks.c accordingly.  */
 };
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index 85e101d..9d75a8e 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -21,6 +21,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "selftest.h"
+#include "tree.h"
+#include "langhooks.h"
 
 /* This function needed to be split out from selftest.c as it references
    tests from the whole source tree, and so is within
@@ -70,6 +72,9 @@  selftest::run_tests ()
   /* This one relies on most of the above.  */
   function_tests_c_tests ();
 
+  /* Run any lang-specific selftests.  */
+  lang_hooks.run_lang_selftests ();
+
   /* Finished running tests.  */
   long finish_time = get_run_time ();
   long elapsed_time = finish_time - start_time;
diff --git a/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
index 9e86b52..ff51833 100644
--- a/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
+++ b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
@@ -12,6 +12,25 @@  void test_mismatching_types (const char *msg)
 /* { dg-begin-multiline-output "" }
    printf("hello %i", msg);
                  ~^
+                 %s
+   { dg-end-multiline-output "" } */
+
+
+  printf("hello %s", 42);  /* { dg-warning "format '%s' expects argument of type 'char \\*', but argument 2 has type 'int'" } */
+/* TODO: ideally would also underline "42".  */
+/* { dg-begin-multiline-output "" }
+   printf("hello %s", 42);
+                 ~^
+                 %d
+   { dg-end-multiline-output "" } */
+
+
+  printf("hello %i", (long)0);  /* { dg-warning "format '%i' expects argument of type 'int', but argument 2 has type 'long int' " } */
+/* TODO: ideally would also underline the argument.  */
+/* { dg-begin-multiline-output "" }
+   printf("hello %i", (long)0);
+                 ~^
+                 %ld
    { dg-end-multiline-output "" } */
 }
 
@@ -23,6 +42,7 @@  void test_multiple_arguments (void)
 /* { dg-begin-multiline-output "" }
    printf ("arg0: %i  arg1: %s arg 2: %i",
                             ~^
+                            %d
    { dg-end-multiline-output "" } */
 }
 
@@ -33,6 +53,7 @@  void test_multiple_arguments_2 (int i, int j)
 /* { dg-begin-multiline-output "" }
    printf ("arg0: %i  arg1: %s arg 2: %i",
                             ~^
+                            %d
            100, i + j, 102);
                 ~~~~~         
    { dg-end-multiline-output "" } */
@@ -67,6 +88,7 @@  void test_hex (const char *msg)
 /* { dg-begin-multiline-output "" }
    printf("hello \x25\x69", msg);
                  ~~~~~~~^
+                 %s
    { dg-end-multiline-output "" } */
 }
 
@@ -80,6 +102,7 @@  void test_oct (const char *msg)
 /* { dg-begin-multiline-output "" }
    printf("hello \045\151", msg);
                  ~~~~~~~^
+                 %s
    { dg-end-multiline-output "" } */
 }
 
@@ -98,6 +121,7 @@  void test_multiple (const char *msg)
 /* { dg-begin-multiline-output "" }
    printf("prefix"  "\x25"  "\151"  "suffix",
                      ~~~~~~~~~~~^
+                     %s
   { dg-end-multiline-output "" } */
 }
 
@@ -108,6 +132,7 @@  void test_u8 (const char *msg)
 /* { dg-begin-multiline-output "" }
    printf(u8"hello %i", msg);
                    ~^
+                   %s
    { dg-end-multiline-output "" } */
 }
 
@@ -117,6 +142,7 @@  void test_param (long long_i, long long_j)
 /* { dg-begin-multiline-output "" }
    printf ("foo %s bar", long_i + long_j);
                 ~^       ~~~~~~~~~~~~~~~
+                %ld
    { dg-end-multiline-output "" } */
 }
 
@@ -192,13 +218,14 @@  void test_macro (const char *msg)
 /* { dg-begin-multiline-output "" }
  #define INT_FMT "%i"
                   ~^
+                  %s
    { dg-end-multiline-output "" } */
 }
 
 void test_non_contiguous_strings (void)
 {
   __builtin_printf(" %" "d ", 0.5); /* { dg-warning "20: format .%d. expects argument of type .int., but argument 2 has type .double." } */
-                                    /* { dg-message "26: format string is defined here" "" { target *-*-* } 200 } */
+                                    /* { dg-message "26: format string is defined here" "" { target *-*-* } 227 } */
   /* { dg-begin-multiline-output "" }
    __builtin_printf(" %" "d ", 0.5);
                     ^~~~
@@ -206,6 +233,7 @@  void test_non_contiguous_strings (void)
   /* { dg-begin-multiline-output "" }
    __builtin_printf(" %" "d ", 0.5);
                       ~~~~^
+                      %f
    { dg-end-multiline-output "" } */
 }