diff mbox series

[1/2] Flag CPP_W_BIDIRECTIONAL so that source lines are escaped

Message ID 20211102205801.1202228-2-dmalcolm@redhat.com
State New
Headers show
Series Re: [PATCH] libcpp: Implement -Wbidirectional for CVE-2021-42574 [PR103026] | expand

Commit Message

David Malcolm Nov. 2, 2021, 8:58 p.m. UTC
Before:

  Wbidirectional-1.c: In function ‘main’:
  Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
      6 |     /*‮ } ⁦if (isAdmin)⁩ ⁦ begin admins only */
        |                                           ^
  Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
      9 |     /* end admins only ‮ { ⁦*/
        |                            ^

  Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidirectional=]
      6 | int LRE_‪_PDF_\u202c;
        |               ^

After setting rich_loc.set_escape_on_output (true):

  Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
      6 |     /*<U+202E> } <U+2066>if (isAdmin)<U+2069> <U+2066> begin admins only */
        |                                                                           ^
  Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
      9 |     /* end admins only <U+202E> { <U+2066>*/
        |                                            ^

  Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidirectional=]
      6 | int LRE_<U+202A>_PDF_\u202c;
        |                       ^

libcpp/ChangeLog:
	* lex.c (maybe_warn_bidi_on_close): Use a rich_location
	and call set_escape_on_output (true) on it.
	(maybe_warn_bidi_on_char): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
 libcpp/lex.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

Comments

David Malcolm Nov. 2, 2021, 9:07 p.m. UTC | #1
On Tue, 2021-11-02 at 16:58 -0400, David Malcolm wrote:
> Before:
> 
>   Wbidirectional-1.c: In function ‘main’:
>   Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional
> character detected [-Wbidirectional=]
>       6 |     /*‮ } ⁦if (isAdmin)⁩ ⁦ begin admins only */
>         |                                           ^
>   Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional
> character detected [-Wbidirectional=]
>       9 |     /* end admins only ‮ { ⁦*/
>         |                            ^
> 
>   Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when
> closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-
> Wbidirectional=]
>       6 | int LRE_‪_PDF_\u202c;
>         |               ^
> 
> After setting rich_loc.set_escape_on_output (true):
> 
>   Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional
> character detected [-Wbidirectional=]
>       6 |     /*<U+202E> } <U+2066>if (isAdmin)<U+2069> <U+2066>
> begin admins only */
>        
> |                                                                    
>        ^
>   Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional
> character detected [-Wbidirectional=]
>       9 |     /* end admins only <U+202E> { <U+2066>*/
>         |                                            ^
> 
>   Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when
> closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-
> Wbidirectional=]
>       6 | int LRE_<U+202A>_PDF_\u202c;
>         |                       ^
> 
> libcpp/ChangeLog:
>         * lex.c (maybe_warn_bidi_on_close): Use a rich_location
>         and call set_escape_on_output (true) on it.
>         (maybe_warn_bidi_on_char): Likewise.
> 
> Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[...snip...]

To be more explicit: part of the benefit of escaping non-ASCII bytes in
the source line is that it further mitigates against CVE-2021-42574,
since it "defangs" the bidi control characters - turning everything
into ASCII, so that the user can see the logical ordering of the
characters directly.  A similar consideration applies to homoglyph
attacks.

Dave
diff mbox series

Patch

diff --git a/libcpp/lex.c b/libcpp/lex.c
index f7a86fbe4b5..88aba307991 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -1387,9 +1387,11 @@  maybe_warn_bidi_on_close (cpp_reader *pfile, const uchar *p)
       const location_t loc
 	= linemap_position_for_column (pfile->line_table,
 				       CPP_BUF_COLUMN (pfile->buffer, p));
-      cpp_warning_with_line (pfile, CPP_W_BIDIRECTIONAL, loc, 0,
-			     "unpaired UTF-8 bidirectional character "
-			     "detected");
+      rich_location rich_loc (pfile->line_table, loc);
+      rich_loc.set_escape_on_output (true);
+      cpp_warning_at (pfile, CPP_W_BIDIRECTIONAL, &rich_loc,
+		      "unpaired UTF-8 bidirectional character "
+		      "detected");
     }
   /* We're done with this context.  */
   bidi::on_close ();
@@ -1414,6 +1416,9 @@  maybe_warn_bidi_on_char (cpp_reader *pfile, const uchar *p, bidi::kind kind,
       const location_t loc
 	= linemap_position_for_column (pfile->line_table,
 				       CPP_BUF_COLUMN (pfile->buffer, p));
+      rich_location rich_loc (pfile->line_table, loc);
+      rich_loc.set_escape_on_output (true);
+
       /* It seems excessive to warn about a PDI/PDF that is closing
 	 an opened context because we've already warned about the
 	 opening character.  Except warn when we have a UCN x UTF-8
@@ -1422,20 +1427,20 @@  maybe_warn_bidi_on_char (cpp_reader *pfile, const uchar *p, bidi::kind kind,
 	{
 	  if (warn_bidi == bidirectional_unpaired
 	      && bidi::current_ctx_ucn_p () != ucn_p)
-	    cpp_warning_with_line (pfile, CPP_W_BIDIRECTIONAL, loc, 0,
-				   "UTF-8 vs UCN mismatch when closing "
-				   "a context by \"%s\"", bidi::to_str (kind));
+	    cpp_warning_at (pfile, CPP_W_BIDIRECTIONAL, &rich_loc,
+			    "UTF-8 vs UCN mismatch when closing "
+			    "a context by \"%s\"", bidi::to_str (kind));
 	}
       else if (warn_bidi == bidirectional_any)
 	{
 	  if (kind == bidi::kind::PDF || kind == bidi::kind::PDI)
-	    cpp_warning_with_line (pfile, CPP_W_BIDIRECTIONAL, loc, 0,
-				   "\"%s\" is closing an unopened context",
-				   bidi::to_str (kind));
+	    cpp_warning_at (pfile, CPP_W_BIDIRECTIONAL, &rich_loc,
+			    "\"%s\" is closing an unopened context",
+			    bidi::to_str (kind));
 	  else
-	    cpp_warning_with_line (pfile, CPP_W_BIDIRECTIONAL, loc, 0,
-				   "found problematic Unicode character \"%s\"",
-				   bidi::to_str (kind));
+	    cpp_warning_at (pfile, CPP_W_BIDIRECTIONAL, &rich_loc,
+			    "found problematic Unicode character \"%s\"",
+			    bidi::to_str (kind));
 	}
     }
   /* We're done with this context.  */