diff mbox series

Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677)

Message ID ri64jl1cyky.fsf@suse.cz
State New
Headers show
Series Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677) | expand

Commit Message

Martin Jambor Aug. 14, 2023, 5:39 p.m. UTC
Hello,

this patch addresses an issue uncovered by the undefined behavior
sanitizer.  In function resolve_structure_cons in resolve.cc there is
a test starting with:

      if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
	  && comp->ts.u.cl->length
	  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT

and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
integer value 1818451807 which is outside of the value range expr_t
enum.  If I understand the code correctly it the entire load was
unwanted because comp->ts.type in those cases is BT_CLASS and not
BT_CHARACTER.  This patch simply adds a check to make sure it is only
accessed in those cases.

I have verified that the UPBSAN failure goes away with this patch, it
also passes bootstrap and testing on x86_64-linux.  OK for master?

Thanks,

Martin



gcc/fortran/ChangeLog:

2023-08-14  Martin Jambor  <mjambor@suse.cz>

	PR fortran/110677
	* resolve.cc (resolve_structure_cons): Check comp->ts is character
	type before accessing stuff through comp->ts.u.cl.
---
 gcc/fortran/resolve.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Harald Anlauf Aug. 14, 2023, 7:28 p.m. UTC | #1
Hi Martin,

Am 14.08.23 um 19:39 schrieb Martin Jambor:
> Hello,
>
> this patch addresses an issue uncovered by the undefined behavior
> sanitizer.  In function resolve_structure_cons in resolve.cc there is
> a test starting with:
>
>        if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
> 	  && comp->ts.u.cl->length
> 	  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
>
> and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
> integer value 1818451807 which is outside of the value range expr_t
> enum.  If I understand the code correctly it the entire load was
> unwanted because comp->ts.type in those cases is BT_CLASS and not
> BT_CHARACTER.  This patch simply adds a check to make sure it is only
> accessed in those cases.
>
> I have verified that the UPBSAN failure goes away with this patch, it
> also passes bootstrap and testing on x86_64-linux.  OK for master?

this looks good to me.

Looking at that code block, there is a potential other UB a few lines
below, where (hopefully integer) string lengths are to be passed to
mpz_cmp.

If the string length is ill-defined (e.g. non-integer), value.integer
is undefined.  We've seen this elsewhere, where on BE platforms that
undefined value was interpreted as some large integer and giving
failures on those platforms.  One could similarly add the following
checks here (on top of your patch):

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index e7c8d919bef..43095406c16 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -1401,6 +1401,8 @@ resolve_structure_cons (gfc_expr *expr, int init)
  	  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
  	  && cons->expr->ts.u.cl && cons->expr->ts.u.cl->length
  	  && cons->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT
+	  && cons->expr->ts.u.cl->length->ts.type == BT_INTEGER
+	  && comp->ts.u.cl->length->ts.type == BT_INTEGER
  	  && mpz_cmp (cons->expr->ts.u.cl->length->value.integer,
  		      comp->ts.u.cl->length->value.integer) != 0)
  	{

It is up to you whether you want to add this.

Thanks for the patch!

Harald


> Thanks,
>
> Martin
>
>
>
> gcc/fortran/ChangeLog:
>
> 2023-08-14  Martin Jambor  <mjambor@suse.cz>
>
> 	PR fortran/110677
> 	* resolve.cc (resolve_structure_cons): Check comp->ts is character
> 	type before accessing stuff through comp->ts.u.cl.
> ---
>   gcc/fortran/resolve.cc | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
> index e7c8d919bef..5b4dfc5fcd2 100644
> --- a/gcc/fortran/resolve.cc
> +++ b/gcc/fortran/resolve.cc
> @@ -1396,8 +1396,9 @@ resolve_structure_cons (gfc_expr *expr, int init)
>   	 the one of the structure, ensure this if the lengths are known at
>    	 compile time and when we are dealing with PARAMETER or structure
>   	 constructors.  */
> -      if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
> -	  && comp->ts.u.cl->length
> +      if (cons->expr->ts.type == BT_CHARACTER
> +	  && comp->ts.type == BT_CHARACTER
> +	  && comp->ts.u.cl && comp->ts.u.cl->length
>   	  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
>   	  && cons->expr->ts.u.cl && cons->expr->ts.u.cl->length
>   	  && cons->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT
Martin Jambor Aug. 15, 2023, 3:18 p.m. UTC | #2
Hello,

On Mon, Aug 14 2023, Harald Anlauf via Gcc-patches wrote:
> Hi Martin,
>
> Am 14.08.23 um 19:39 schrieb Martin Jambor:
>> Hello,
>> 
>> this patch addresses an issue uncovered by the undefined behavior
>> sanitizer.  In function resolve_structure_cons in resolve.cc there is
>> a test starting with:
>> 
>>        if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
>> 	  && comp->ts.u.cl->length
>> 	  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
>> 
>> and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
>> integer value 1818451807 which is outside of the value range expr_t
>> enum.  If I understand the code correctly it the entire load was
>> unwanted because comp->ts.type in those cases is BT_CLASS and not
>> BT_CHARACTER.  This patch simply adds a check to make sure it is only
>> accessed in those cases.
>> 
>> I have verified that the UPBSAN failure goes away with this patch, it
>> also passes bootstrap and testing on x86_64-linux.  OK for master?
>
> this looks good to me.
>
> Looking at that code block, there is a potential other UB a few lines
> below, where (hopefully integer) string lengths are to be passed to
> mpz_cmp.
>
> If the string length is ill-defined (e.g. non-integer), value.integer
> is undefined.  We've seen this elsewhere, where on BE platforms that
> undefined value was interpreted as some large integer and giving
> failures on those platforms.  One could similarly add the following
> checks here (on top of your patch):

Thank you very much for the approval and the improvement.  I have
committed the following (after another round of testing).

Martin



Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677)

This patch addresses an issue uncovered by the undefined behavior
sanitizer.  In function resolve_structure_cons in resolve.cc there is
a test starting with:

      if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
	  && comp->ts.u.cl->length
	  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT

and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
integer value 1818451807 which is outside of the value range expr_t
enum.  If I understand the code correctly it the entire load was
unwanted because comp->ts.type in those cases is BT_CLASS and not
BT_CHARACTER.  This patch simply adds a check to make sure it is only
accessed in those cases.

During review, Harald Anlauf noticed that length types also need to be
checked and so I added also checks that he suggested to the condition.

Co-authored-by: Harald Anlauf <anlauf@gmx.de>

gcc/fortran/ChangeLog:

2023-08-14  Martin Jambor  <mjambor@suse.cz>

	PR fortran/110677
	* resolve.cc (resolve_structure_cons): Check comp->ts is character
	type before accessing stuff through comp->ts.u.cl.
---
 gcc/fortran/resolve.cc | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index e7c8d919bef..f51674f7faa 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -1396,11 +1396,14 @@ resolve_structure_cons (gfc_expr *expr, int init)
 	 the one of the structure, ensure this if the lengths are known at
  	 compile time and when we are dealing with PARAMETER or structure
 	 constructors.  */
-      if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
-	  && comp->ts.u.cl->length
+      if (cons->expr->ts.type == BT_CHARACTER
+	  && comp->ts.type == BT_CHARACTER
+	  && comp->ts.u.cl && comp->ts.u.cl->length
 	  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
 	  && cons->expr->ts.u.cl && cons->expr->ts.u.cl->length
 	  && cons->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT
+	  && cons->expr->ts.u.cl->length->ts.type == BT_INTEGER
+	  && comp->ts.u.cl->length->ts.type == BT_INTEGER
 	  && mpz_cmp (cons->expr->ts.u.cl->length->value.integer,
 		      comp->ts.u.cl->length->value.integer) != 0)
 	{
diff mbox series

Patch

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index e7c8d919bef..5b4dfc5fcd2 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -1396,8 +1396,9 @@  resolve_structure_cons (gfc_expr *expr, int init)
 	 the one of the structure, ensure this if the lengths are known at
  	 compile time and when we are dealing with PARAMETER or structure
 	 constructors.  */
-      if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
-	  && comp->ts.u.cl->length
+      if (cons->expr->ts.type == BT_CHARACTER
+	  && comp->ts.type == BT_CHARACTER
+	  && comp->ts.u.cl && comp->ts.u.cl->length
 	  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
 	  && cons->expr->ts.u.cl && cons->expr->ts.u.cl->length
 	  && cons->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT